Leveraging Groups for aggregate data

A Group object can keep track of any number of Sim and Group objects that it counts as members, and it can store datasets derived from those objects. Just as a Sim manages data obtained from a single simulation, a Group manages data obtained in aggregate from a collection of Sim or Group objects.

As with a Sim, to generate a Group from scratch we need only give it a name. We can also pass any number of existing Sim or Group objects to add them as members

>>> from mdsynthesis import Group
>>> g = Group('gruffy', members=[s1, s2, s3, g4, g5])
>>> g
<Group: 'gruffy' | 5 Members: 3 Sim, 2 Group>

This will create a directory gruffy that contains a single file (Group.<uuid>.h5). That file is a persistent representation of the Group on disk. We can access its members with

>>> g.members
<Members(['marklar', 'scruffy', 'fluffy', 'buffy', 'gorp'])>
>>> g.members[2]
<Sim: 'fluffy'>

and we can slice, too

>>> g.members[2:]
[<Sim: 'fluffy'>, <Group: 'buffy'>, <Group: 'gorp'>]

Note

Members are generated from their state files on disk upon access. This means that for a Group with hundreds of members, there will be a delay when trying to access them all at once.

A Group can even be a member of itself

>>> g.members.add(g)
>>> g
<Group: 'gruffy' | 6 Members: 3 Sim, 3 Group>
>>> g.members[-1]
<Group: 'gruffy' | 6 Members: 3 Sim, 3 Group>
>>> g.members[-1].members[-1]
<Group: 'gruffy' | 6 Members: 3 Sim, 3 Group>

As a technical aside, note that a Group returned as a member of itself is not the same object in memory as the Group that returned it. They are two different instances of the same Group

>>> g2 = g.members[-1]
>>> g2 is g
False

But since they pull their state from the same file on disk, they will reflect the same stored information at all times

>>> g.tags.add('kinases')
>>> g2.tags
<Tags(['kinases'])>
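
This behavior follows from holding no state in memory: every instance reads from and writes to the same file. The following is a minimal sketch of that pattern, not mdsynthesis code. It assumes a JSON file in place of the HDF5 state file, and the class FileBackedTags and its methods are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


class FileBackedTags:
    """Hypothetical stand-in for an object whose tags live only on disk."""

    def __init__(self, statefile):
        self.statefile = statefile
        if not os.path.exists(statefile):
            self._write([])

    def _write(self, tags):
        with open(self.statefile, "w") as f:
            json.dump(sorted(tags), f)

    @property
    def tags(self):
        # Re-read the file on every access, so no instance holds stale state.
        with open(self.statefile) as f:
            return json.load(f)

    def add_tag(self, tag):
        self._write(set(self.tags) | {tag})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    a = FileBackedTags(path)
    b = FileBackedTags(path)  # a second instance backed by the same file
    a.add_tag("kinases")
    same_object = a is b
    tags_seen_by_b = b.tags
```

Here a and b are distinct objects, yet a tag added through one is visible through the other because neither caches anything.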

Reference: Group

class mdsynthesis.Group(group, members=None, location='.', coordinator=None, categories=None, tags=None)

The Group object is a collection of Sims and Groups.

Generate a new or regenerate an existing (on disk) Group object.

Required Arguments:
 
group

if generating a new Group, the desired name to give it; if regenerating an existing Group, string giving the path to the directory containing the Group object’s state file

Optional arguments when generating a new Group:
 
members

a list of Sims and/or Groups to immediately add as members

location

directory to place Group object; default is the current directory

coordinator

directory of the Coordinator to associate with this object; if the Coordinator does not exist, it is created; if None, the Group will not associate with any Coordinator

categories

dictionary with user-defined keys and values; used to give Groups distinguishing characteristics

tags

list with user-defined values; like categories, but useful for adding many distinguishing descriptors

Note: optional arguments are ignored when regenerating an existing Group

basedir

Absolute path to the Container’s base directory.

This is a convenience property; the same result can be obtained by joining location and name.
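
As a rough illustration of that join, using only the standard library. The function below is a hypothetical helper, not part of mdsynthesis, and it assumes that location may be given as a relative path.

```python
import os


def basedir(location, name):
    """Hypothetical helper: absolute path of a Container's directory."""
    return os.path.abspath(os.path.join(location, name))


# A Group named 'gruffy' created in the current directory.
path = basedir(".", "gruffy")
```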

categories

The categories of the Container.

Categories are user-added key-value pairs that can be used to distinguish Containers from one another through Coordinator or Group queries. They can also be useful as flags for external code to determine how to handle the Container.

containertype

The type of the Container.

coordinators

The locations of the associated Coordinators.

Change this to associate the Container with one or more existing or new Coordinators.

data

The data of the Container.

Data are user-generated pandas objects (e.g. Series, DataFrames), numpy arrays, or any other pickleable Python objects that are stored in the Container for easy recall later. Each data instance is given its own directory in the Container’s tree.

location

The location of the Container.

Setting the location to a new path physically moves the Container to the given location. This only works if the new location is an empty or nonexistent directory.

members

The members of the Group.

A Group is useful as an interface to a collection of Containers, and it allows direct access to each member of that collection. Often a Group is used to store datasets derived from this collection in aggregate.

Queries can also be made on the Group’s members to return a subselection of the members based on some search criteria. This can be useful to define new Groups from members of existing ones.

name

The name of the Container.

The name of a Container need not be unique with respect to other Containers, but it is used as part of the Container’s displayed representation.

tags

The tags of the Container.

Tags are user-added strings that can be used to distinguish Containers from one another through Coordinator or Group queries. They can also be useful as flags for external code to determine how to handle the Container.

uuid

Get Container uuid.

A Container’s uuid is used by other Containers to identify it. The uuid is given in the Container’s state file name for fast filesystem searching. For example, a Sim object with state file:

'Sim.7dd9305a-d7d9-4a7b-b513-adf5f4205e09.h5'

has uuid:

'7dd9305a-d7d9-4a7b-b513-adf5f4205e09'

Changing this string will alter the Container’s uuid. This is not generally recommended.

Returns:
uuid

unique identifier string for this Container
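
Because the uuid is embedded in the state file name, it can be recovered without opening the file. The sketch below assumes the '<containertype>.<uuid>.h5' naming pattern shown in the example above. The function parse_statefile is hypothetical and not part of mdsynthesis.

```python
import os
import uuid


def parse_statefile(path):
    """Split '<containertype>.<uuid>.h5' into (containertype, uuid string)."""
    filename = os.path.basename(path)
    # Unpacking raises ValueError if the name does not have three parts.
    containertype, uuid_str, extension = filename.split(".")
    if extension != "h5":
        raise ValueError("not a state file: {}".format(filename))
    uuid.UUID(uuid_str)  # raises ValueError if the string is not a valid uuid
    return containertype, uuid_str


kind, ident = parse_statefile("Sim.7dd9305a-d7d9-4a7b-b513-adf5f4205e09.h5")
```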

Reference: Members

The class mdsynthesis.core.aggregators.Members is the interface used by a Group to manage its members. It is not intended to be used on its own, but is shown here to give a detailed view of its methods.

class mdsynthesis.core.aggregators.Members(container, containerfile, logger)

Member manager for Groups.

add(*containers)

Add any number of members to this collection.

Arguments:
containers

Sims and/or Groups to be added; may be a list of Sims and/or Groups; Sims or Groups can be given as either objects or paths to directories that contain object state files

containertypes

Return a list of member containertypes.

data

The data of the Container.

Data are user-generated pandas objects (e.g. Series, DataFrames), numpy arrays, or any other pickleable Python objects that are stored in the Container for easy recall later. Each data instance is given its own directory in the Container’s tree.

names

Return a list of member names.

Members that can’t be found will have name None.

Returns:
names

list giving the name of each member, in order; members that are missing will have name None
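
The None placeholder keeps the list aligned with member order even when a member's directory has been moved or deleted. A minimal sketch of that behavior follows. It assumes a member's name is the last component of its base directory, and the helper member_names is hypothetical rather than the library's implementation.

```python
import os
import tempfile


def member_names(basedirs):
    """Hypothetical helper: name of each member, or None if it is missing."""
    return [
        os.path.basename(d) if os.path.isdir(d) else None
        for d in basedirs
    ]


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "marklar")
    missing = os.path.join(tmp, "scruffy")  # never created on disk
    os.mkdir(present)
    names = member_names([present, missing])
```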

remove(*members, **kwargs)

Remove any number of members from the Group.

Arguments:
members

instances or indices of the members to remove

Keywords:
all

When True, remove all members; default is False
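
The calling convention described above (members given by index or by instance, plus an all keyword) can be modeled on a plain list. This is only a sketch of the documented semantics under that assumption. The function remove_members is hypothetical and says nothing about how mdsynthesis implements removal.

```python
def remove_members(members, *targets, **kwargs):
    """Hypothetical model of remove(): targets are indices or instances."""
    if kwargs.get("all", False):
        return []
    # Resolve every target to a position first, so that removing one
    # member does not shift the positions of the others.
    doomed = set()
    for target in targets:
        if isinstance(target, int):
            # Indexing a range normalizes negative indices and raises
            # IndexError for positions that are out of bounds.
            doomed.add(range(len(members))[target])
        else:
            doomed.add(members.index(target))
    return [m for i, m in enumerate(members) if i not in doomed]


names = ["marklar", "scruffy", "fluffy", "buffy", "gorp"]
by_index = remove_members(names, 0, -1)
by_instance = remove_members(names, "fluffy")
emptied = remove_members(names, all=True)
```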

uuids

Return a list of member uuids.

Returns:
uuids

list giving the uuid of each member, in order