Using Sims to dissect trajectories

Sim objects are designed to store datasets that are obtained from a single simulation, and they give a direct interface to trajectory data by way of the MDAnalysis Universe object.

To generate a Sim from scratch, we need only give it a name. The name is used to distinguish the Sim from others, though it need not be unique. We can also give it topology and trajectory files, as we would an MDAnalysis Universe

>>> from mdsynthesis import Sim
>>> s = Sim('scruffy', universe=['path/to/topology', 'path/to/trajectory'])

This will create a directory scruffy that contains a single file (Sim.<uuid>.h5). That file is a persistent representation of the Sim on disk. We can access trajectory data by way of

>>> s.universe
<Universe with 47681 atoms>

The Sim can also store selections by giving the usual inputs to Universe.selectAtoms

>>> s.selections.add('backbone', 'name CA', 'name N', 'name C')

And the AtomGroup can be conveniently obtained with

>>> s.selections['backbone']
<AtomGroup with 642 atoms>

Note

Only selection strings are stored, not the resulting atoms of those selections. This means that if the topology of the Universe is replaced or altered, the AtomGroup returned by a particular selection may change.
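The consequence of storing strings rather than atoms can be pictured with a small toy model. This is only a sketch and not mdsynthesis code: the "atoms" are plain dicts, a "selection string" is reduced to a single `name X` test, and the names `select` and `StoredSelections` are hypothetical.

```python
# Toy model only: atoms are dicts, and a selection string is just "name X".
def select(atoms, *selections):
    """Return the atoms matching each string, in the order the strings are given."""
    picked = []
    for sel in selections:
        key, value = sel.split()
        picked.extend(a for a in atoms if a[key] == value)
    return picked


class StoredSelections:
    """Keep selection strings by handle and evaluate them only on lookup."""

    def __init__(self, get_atoms):
        self._get_atoms = get_atoms      # callable returning the current atoms
        self._definitions = {}

    def add(self, handle, *selections):
        self._definitions[handle] = list(selections)

    def __getitem__(self, handle):
        return select(self._get_atoms(), *self._definitions[handle])


topology = [{"name": "N"}, {"name": "CA"}, {"name": "C"}, {"name": "O"}]
stored = StoredSelections(lambda: topology)
stored.add("backbone", "name CA", "name N", "name C")
before = [a["name"] for a in stored["backbone"]]

# Swap in a different topology: the same stored strings now pick other atoms.
topology = [{"name": "N"}, {"name": "CA"}, {"name": "CB"}, {"name": "CA"}]
after = [a["name"] for a in stored["backbone"]]
```

Here `before` and `after` differ even though the stored definition never changed, which is the behaviour the note warns about.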

Multiple Universes

Often it is necessary to post-process a simulation trajectory to get it into a useful form for analysis. This may involve coordinate transformations that center on a particular set of atoms or fit to a structure, removal of water, skipping of frames, etc. This can mean that for a given simulation multiple versions of the raw trajectory may be needed.

For this reason, a Sim can store multiple Universe definitions. To add a definition, we need a topology and a trajectory file

>>> s.universes.add('anotherU', 'path/to/topology', 'path/to/trajectory')
>>> s.universes
<Universes(['anotherU', 'main'])>

and we can make this the active Universe with

>>> s.universes['anotherU']
>>> s
<Sim: 'scruffy' | active universe: 'anotherU'>

Only a single Universe may be active at a time. Atom selections that are stored correspond to the currently active Universe, since different selection strings may be required to achieve the same selection under a different Universe definition. For convenience, we can copy the selections corresponding to another Universe to the active Universe with

>>> s.selections.copy('main')

Need two Universe definitions to be active at the same time? Re-generate a second Sim instance from its representation on disk and activate the desired Universe.
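The bookkeeping described in this section (named definitions, a single active Universe, and selections kept per definition) can be sketched in plain Python. The class below is a hypothetical stand-in written for illustration; it stores paths only and does not load any trajectory or call mdsynthesis.

```python
class UniverseDefinitions:
    """Toy stand-in: named (topology, trajectory) definitions, one active."""

    def __init__(self):
        self.definitions = {}     # handle -> (topology, [trajectory, ...])
        self.selections = {}      # handle -> {selection name: [strings]}
        self.default = None
        self.active = None

    def add(self, handle, topology, *trajectory):
        self.definitions[handle] = (topology, list(trajectory))
        self.selections.setdefault(handle, {})   # keep any existing selections
        if self.default is None:
            self.default = handle                # first definition is the default

    def activate(self, handle=None):
        handle = self.default if handle is None else handle
        if handle not in self.definitions:
            raise KeyError(handle)
        self.active = handle

    def copy_selections(self, source):
        """Copy selection strings from `source` into the active definition."""
        for name, strings in self.selections[source].items():
            self.selections[self.active][name] = list(strings)


u = UniverseDefinitions()
u.add("main", "path/to/topology", "path/to/trajectory")
u.add("anotherU", "path/to/topology", "path/to/trajectory")
u.selections["main"]["backbone"] = ["name CA", "name N", "name C"]

u.activate("anotherU")
u.copy_selections("main")
```

Because only one handle is ever recorded as active, a second independent instance is the natural way to work with two definitions at once.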

Resnums can also be stored

Depending on the simulation package used, it may not be possible to have the resids of the protein match those given in, say, the canonical PDB structure. This can make selections by resid cumbersome at best. For this reason, residues can also be assigned resnums.

For example, say the resids for the protein in our Universe range from 1 to 214, but they should actually go from 10 to 223. If we can’t change the topology to reflect this, we could set the resnums for these residues to the canonical values

>>> prot = s.universe.selectAtoms('protein')
>>> prot.residues.set_resnum(prot.residues.resids() + 9)
>>> prot.residues.resnums()
array([ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19,  20,  21,  22,
        23,  24,  25,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,
        36,  37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,  48,
        49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,
        62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
        75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,
        88,  89,  90,  91,  92,  93,  94,  95,  96,  97,  98,  99, 100,
       101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
       114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126,
       127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
       140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152,
       153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165,
       166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178,
       179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
       192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204,
       205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217,
       218, 219, 220, 221, 222, 223])

We can now select residue 95 from the PDB structure with

>>> s.universe.selectAtoms('protein and resnum 95')

and we might save selections using resnums as well. However, resnums aren’t stored in the topology, so to avoid having to reset resnums manually each time we load the Universe, we can just store the resnum definition with

>>> s.universes.resnums('main', s.universe.residues.resnums())

and the resnum definition will be applied to the Universe both now and every time it is activated.
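The arithmetic behind the resnum example is a constant offset between the two numbering schemes. A minimal sketch with plain Python lists, independent of MDAnalysis, using the 1 to 214 and 10 to 223 ranges from the example:

```python
# Resids as numbered by the simulation package (1..214 in the example).
resids = list(range(1, 215))

# Canonical numbering starts at 10, so the offset is 10 - 1 = 9.
offset = 10 - resids[0]
resnums = [r + offset for r in resids]

# Which simulation resid carries canonical residue number 95?
resid_for_95 = resids[resnums.index(95)]
```

A list like `resnums`, with one entry per residue, is the kind of value the stored resnum definition holds.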

Reference: Sim

class mdsynthesis.Sim(sim, universe=None, uname='main', location='.', coordinator=None, categories=None, tags=None)

The Sim object is an interface to data for single simulations.

Generate a new or regenerate an existing (on disk) Sim object.

Required arguments:
 
sim

if generating a new Sim, the desired name to give it; if regenerating an existing Sim, string giving the path to the directory containing the Sim object’s state file

Optional arguments when generating a new Sim:
 
uname

desired name to associate with universe; this universe will be made the default (can always be changed later)

universe

arguments usually given to an MDAnalysis Universe that defines the topology and trajectory of the atoms

location

directory to place Sim object; default is the current directory

coordinator

directory of the Coordinator to associate with the Sim; if the Coordinator does not exist, it is created; if None, the Sim will not associate with any Coordinator

categories

dictionary with user-defined keys and values; used to give Sims distinguishing characteristics

tags

list with user-defined values; like categories, but useful for adding many distinguishing descriptors

Note: optional arguments are ignored when regenerating an existing Sim

basedir

Absolute path to the Container’s base directory.

This is a convenience property; the same result can be obtained by joining location and name.

categories

The categories of the Container.

Categories are user-added key-value pairs that can be used to distinguish Containers from one another through Coordinator or Group queries. They can also be useful as flags for external code to determine how to handle the Container.

containertype

The type of the Container.

coordinators

The locations of the associated Coordinators.

Change this to associate the Container with one or more existing or new Coordinators.

data

The data of the Container.

Data are user-generated pandas objects (e.g. Series, DataFrames), numpy arrays, or any other pickleable Python objects, stored in the Container for easy recall later. Each data instance is given its own directory in the Container’s tree.

location

The location of the Container.

Setting the location to a new path physically moves the Container to the given location. This only works if the new location is an empty or nonexistent directory.
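The constraint on the new location can be illustrated with the standard library alone. This is a sketch under assumptions, not the library's implementation: `move_container` is a hypothetical helper, and it assumes the Container's base directory is moved into the new location under its existing name.

```python
import os
import shutil
import tempfile


def move_container(basedir, new_location):
    """Move `basedir` into `new_location`, which must be empty or absent."""
    if os.path.isdir(new_location) and os.listdir(new_location):
        raise ValueError("target directory is not empty: %s" % new_location)
    os.makedirs(new_location, exist_ok=True)
    # shutil.move places the source inside an existing destination directory
    # and returns the resulting path.
    return shutil.move(basedir, new_location)


root = tempfile.mkdtemp()
try:
    old_basedir = os.path.join(root, "old", "scruffy")
    os.makedirs(old_basedir)
    new_basedir = move_container(old_basedir, os.path.join(root, "new"))
    moved = os.path.isdir(new_basedir) and not os.path.exists(old_basedir)
finally:
    shutil.rmtree(root)   # clean up the temporary tree
```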

name

The name of the Container.

The name of a Container need not be unique with respect to other Containers, but it is used as part of the Container’s displayed representation.

selections

Stored atom selections for the active universe.

Useful atom selections can be stored for the active universe and recalled later. Selections are stored separately for each defined universe, since the same selection may require a different selection string for different universes.

tags

The tags of the Container.

Tags are user-added strings that can be used to distinguish Containers from one another through Coordinator or Group queries. They can also be useful as flags for external code to determine how to handle the Container.
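To make the difference between categories (key-value pairs) and tags (plain strings) concrete, here is a small sketch of filtering on each. The records, names, and helper functions are hypothetical and stand in for Containers; this is not how Coordinator or Group queries are implemented.

```python
# Hypothetical records standing in for Containers; not the mdsynthesis API.
containers = [
    {"name": "scruffy", "categories": {"forcefield": "charmm"}, "tags": ["production"]},
    {"name": "fluffy", "categories": {"forcefield": "amber"}, "tags": ["production", "apo"]},
    {"name": "spot", "categories": {"forcefield": "charmm"}, "tags": ["apo"]},
]


def with_category(items, key, value):
    """Names of records whose category `key` equals `value`."""
    return [c["name"] for c in items if c["categories"].get(key) == value]


def with_tag(items, tag):
    """Names of records carrying `tag`."""
    return [c["name"] for c in items if tag in c["tags"]]


charmm = with_category(containers, "forcefield", "charmm")
apo = with_tag(containers, "apo")
```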

universe

The active universe of the Sim.

Universes are interfaces to raw simulation data. The Sim can store multiple universe definitions corresponding to different versions of the same simulation output (e.g. post-processed trajectories derived from the same raw trajectory). The Sim has at most one universe definition that is “active” at one time, with stored selections for this universe directly available via Sim.selections.

To have more than one universe available as “active” at the same time, generate as many instances of the Sim object from the same statefile on disk as needed, and make a universe active for each one.

universes

Manage the defined universes of the Sim.

Universes are interfaces to raw simulation data. The Sim can store multiple universe definitions corresponding to different versions of the same simulation output (e.g. post-processed trajectories derived from the same raw trajectory). The Sim has at most one universe definition that is “active” at one time, with stored selections for this universe directly available via Sim.selections.

The Sim can also store a preference for a “default” universe, which is activated on a call to Sim.universe when no other universe is active.

uuid

Get Container uuid.

A Container’s uuid is used by other Containers to identify it. The uuid is given in the Container’s state file name for fast filesystem searching. For example, a Sim object with state file:

'Sim.7dd9305a-d7d9-4a7b-b513-adf5f4205e09.h5'

has uuid:

'7dd9305a-d7d9-4a7b-b513-adf5f4205e09'

Changing this string will alter the Container’s uuid. This is not generally recommended.

Returns:
uuid

unique identifier string for this Container
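Since the uuid is embedded in the state file name, it can be recovered without opening the file. A minimal sketch, assuming the `<type>.<uuid>.h5` pattern shown in the example above; the function `uuid_from_statefile` is a hypothetical helper, not part of mdsynthesis.

```python
import uuid


def uuid_from_statefile(filename):
    """Split '<type>.<uuid>.h5' and return (type, uuid string)."""
    containertype, identifier, extension = filename.split(".")
    if extension != "h5":
        raise ValueError("not a state file: %r" % filename)
    uuid.UUID(identifier)   # raises ValueError if the string is malformed
    return containertype, identifier


kind, ident = uuid_from_statefile("Sim.7dd9305a-d7d9-4a7b-b513-adf5f4205e09.h5")
```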

Reference: Universes

The class mdsynthesis.core.aggregators.Universes is the interface used by a Sim to manage Universe definitions. It is not intended to be used on its own, but is shown here to give a detailed view of its methods.

class mdsynthesis.core.aggregators.Universes(container, containerfile, logger)

Interface to universes.

activate(handle=None)

Make the selected universe active.

Only one universe definition can be active in a Sim at one time. The active universe can be accessed from Sim.universe. Stored selections for the active universe can be accessed as items in Sim.selections.

If no handle is given, the default universe is loaded.

If a resnum definition exists for the universe, it is applied.

Arguments:
handle

given name for selecting the universe; if None, default universe selected

add(handle, topology, *trajectory)

Add a universe definition to the Sim object.

A universe is an MDAnalysis object that gives access to the details of a simulation trajectory. A Sim object can contain multiple universe definitions (topology and trajectory pairs), since it is often convenient to have different post-processed versions of the same raw trajectory.

Using an existing universe handle will replace the topology and trajectory for that definition; selections for that universe will be retained.

If there is no current default universe, then the added universe will become the default.

Arguments:
handle

given name for selecting the universe

topology

path to the topology file

trajectory

path to the trajectory file; multiple files may be given and these will be used in order as frames for the trajectory

current()

Return the name of the currently active universe.

Returns:
handle

name of currently active universe

deactivate()

Deactivate the current universe.

Deactivating the current universe may be necessary to conserve memory, since the universe can then be garbage collected.

default(handle=None)

Mark the selected universe as the default, or get the default universe.

The default universe is loaded on calls to Sim.universe or Sim.selections when no other universe is active.

If no handle is given, returns the current default universe.

Arguments:
handle

given name for selecting the universe; if None, default universe is unchanged

Returns:
default

handle of the default universe

define(handle, pathtype='abspath')

Get the stored path to the topology and trajectory used for the specified universe.

Note: this method does not check whether these paths are valid. To check this, try activating the universe.

Arguments:
handle

name of universe to get definition for

Keywords:
pathtype

type of path to return; ‘abspath’ gives an absolute path, ‘relCont’ gives a path relative to the Sim’s state file

Returns:
topology

path to the topology file

trajectory

list of paths to trajectory files
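The two path types differ only in their reference point. A short sketch with the standard library, using made-up paths and assuming the relative form is taken with respect to the directory that holds the state file:

```python
import os.path

# Hypothetical locations, for illustration only.
statefile_dir = "/data/sims/scruffy"
trajectory = "/data/raw/run1/traj.xtc"

# Relative form: how to reach the trajectory starting from the state file's directory.
relative = os.path.relpath(trajectory, start=statefile_dir)

# Joining the relative form back onto that directory recovers the absolute path.
recovered = os.path.normpath(os.path.join(statefile_dir, relative))
```

A relative definition of this kind stays valid when the Sim and its trajectory files are moved together, whereas an absolute one stays valid when only the Sim moves.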

remove(*handle)

Remove a universe definition.

Also removes any selections associated with the universe.

Arguments:
handle

name of universe(s) to delete

resnums(handle, resnums)

Define resnums for the given universe.

Resnums are useful for referring to residues by their canonical resid, for instance the one stored in the PDB. When a resnum definition is given for a universe, it is applied to that universe on activation.

Overwrites the existing resnum definition, if any.

Arguments:
handle

name of universe to apply resnums to

resnums

list giving the resnum for each residue in the topology, in atom index order; giving None will delete resnum definition

Reference: Selections

The class mdsynthesis.core.aggregators.Selections is the interface used by a Sim to access its stored selections. It is not intended to be used on its own, but is shown here to give a detailed view of its methods.

class mdsynthesis.core.aggregators.Selections(container, containerfile, logger)

Selection manager for Sims.

Selections are accessible as items using their handles. Each time they are called, they are regenerated from the universe that is currently active. In this way, changes in the universe topology are reflected in the selections.

add(handle, *selection)

Add an atom selection for the active universe.

AtomGroups are needed to obtain useful information from raw coordinate data. It is useful to store AtomGroup selections for later use, since they can be complex and atom order may matter.

If a selection with the given handle already exists, it is replaced.

Arguments:
handle

name to use for the selection

selection

selection string; multiple strings may be given and their order will be preserved, which is useful for e.g. structural alignments

asAtomGroup(handle)

Get the named selection as an AtomGroup of the active universe.

If the named selection doesn’t exist, a KeyError is raised.

Arguments:
handle

name of selection to return as an AtomGroup

Returns:
AtomGroup

the named selection as an AtomGroup of the active universe

copy(universe)

Copy defined selections of another universe to the active universe.

Arguments:
universe

name of universe definition to copy selections from

define(handle)

Get selection definition for given handle and the active universe.

If the named selection doesn’t exist, a KeyError is raised.

Arguments:
handle

name of selection to get definition of

Returns:
definition

list of strings defining the atom selection

keys()

Return a list of all selection handles.

remove(*handle)

Remove an atom selection for the active universe.

If the named selection doesn’t exist, a KeyError is raised.

Arguments:
handle

name of selection(s) to remove