Description of the program SIMULAID: a collection of utilities designed to help setting up and analyze molecular simulations.

Mihaly Mezei

Department of Pharmacological Sciences,
Icahn Sinai School of Medicine at Mount Sinai,
New York, NY 10029

Apr. 17, 2023.

Reference: M. Mezei, Simulaid: a simulation facilitator and analysis program, J. Comp. Chem., 31, 2658-2668 (2010). DOI:10.1002/jcc.21551

0. GENERAL INTRODUCTION TO SIMULATION SETUPS

I. DESCRIPTION OF THE FUNCTIONS OF THE PROGRAM

II. RUNNING THE PROGRAM - INPUT

III. THE OUTPUT ON THE TERMINAL

IV. FILE FORMATS

V. INSTALLATION

VI. CHANGING DIMENSION

VII. IRIS GL GRAPHICS VERSION

VIII. REFERENCES

0.	GENERAL INTRODUCTION TO SIMULATION SETUPS
I.	DESCRIPTION OF THE FUNCTIONS OF THE PROGRAM
II.	RUNNING THE PROGRAM - INPUT
III.	THE OUTPUT ON THE TERMINAL
IV.	FILE FORMATS
V.	INSTALLATION
VI.	CHANGING DIMENSION
VII.	IRIS GL GRAPHICS VERSION
VIII.	REFERENCES

0. GENERAL INTRODUCTION TO SIMULATION SETUPS

In general, a simulation requires three different types of information:

Description of the interaction energy terms
Description of the molecules in the system
Description of the boundary conditions.

For analysis, the

Trajectory (history) file

may also be needed.

Simulaid will help in various aspects of establishing these. In order to help clarify the various functionalities of Simulaid, a brief description is given of the various components of these descriptions. Frequent reference will be made to the PDB (Protein Data Bank) format that is considered the 'lingua franca' of molecular simulations (although it has a number of 'dialects') and to formats used by the simulation packages Amber, Charmm, Gromos/Gromacs, Discover, Macromodel, and MMC.

0.1. Description of the interaction energy terms

Most simulation packages rely on the concept of atomtype for the specification of the interaction potential, supplemented by a partial charge. It is generally assumed that force constants, dispersion and exhange repulsion terms are transferable to a larger extent than the partial charges.

0. GENERAL INTRODUCTION TO SIMULATION SETUPS

In general, a simulation requires three different types of information:

Description of the interaction energy terms
Description of the molecules in the system
Description of the boundary conditions.

For analysis, the

Trajectory (history) file

may also be needed.

0.1. Description of the interaction energy terms

The atomtypes can be defined by alphanumeric labels (Amber, Charmm, Gromos/Gromacs, Discover) or simple integers (Macromodel, OPLS) or both (MMC). In most cases, a parameter file contains the data necessary to extract the various coefficients to be used in the energy expressions (Amber, Charmm, Gromos/Gromacs, Discover) but they may be built into the program source code (Macromodel) or both (MMC).

0.2. Description of the molecules in the system

Most simulation packages use a hierarchical approach to define the assembly of atoms to be simulated.

The largest unit is called a segment in Charmm, a chain in PDB. MMC has two segments: the solvent and the solute. Segments or chains are likely to be different (macro)molecules, although they don't have to be.
Each segment is generally broken into residues. A residue is usually a repeating unit that may occur several times at different places (i.e., an amino or nucleic acid).
In some instances (Charmm, Discover) residues are further partioned into groups whose total charge is likely to be an integer (in most cases zero).
In any event, each residue involves a collection of atoms.

Simulaid can read molecular structures in the following formats:

Charmm .CRD
Charmm .CRD, extended format
Brookhaven PDB
Charmm PDB
Autodock3 .pdbqs
Autodock4 .pdbqt
Macromodel
MMC .slt
Gromos/Gromacs
Tripos .mol2
Schrodinger Maestro .mae
PDBx/mmCIF .cif
InsightII .car
NXYZ (atomic number and coordinates)
SXYZ (chemical symbol and coordinates)
SXYZRQ (chemical symbol, coordinates, atomic radius and charge)
Grasp .crg

Depending on the input format used, Simulaid can also write in most of the formats listed above. It can not write a .mae or a .cif file and it can only write a .mol2 file if the input was also in .mol2 format.

Since residues are considered frequently occuring molecular units, several packages (Amber, Charmm, Discover) use a so-called residue topology file (RTF) for their definition. For amino and nucleic acids the PDB nomenclature is followed, but only approximately. This standard includes definitions for amino acid and nucleic acid residues. These definitions consist of a three-letter residue name and atomnames that can be up to four characters long.

Simulaid can read topology files in the following formats:

Charmm RTF
Amber top

To arrive at the fully defined system, most packages rely on an annotated coordinate file containing the Cartesian coordinates of the atoms in the system as well as on the RTF and parameter files. For Amber, the coordinate file is in the PDB format while for Charmm it is in the so-called CRD format. Once the list of residues and atoms are known to Amber or Charmm, they will turn to their RTF file where the combination of residue names and atomnames will provide for each atom their atomtypes and partial charges. The .car file of Discover, the .dat file of Macromodel, the .mol2 file of Tripos, as well as the .slt file of MMC, however will contain not only the residue names and atomnames, but the atom types and partial charges as well.

The record formats of the formatted input records can be listed by Simulaid.

The RTF file of Amber and Charmm also specifies the bonding pattern. Macromodel and Tripos, on the other hand, require the bond list in the coordinate file while MMC will deduce the bonding pattern from the coordinates themselves.

Segments are usually defined by the input stream.

0.3. Description of the boundary conditions.

The simulated system may be a droplet in vacuum or a unit cell of a periodic system. Each program has to be explicitly instructed in the input stream as to the size and shape of the periodic cell - the options vary from package to package. Charmm uses an additional file of the type .img that specifies the centers of all surrounding cells, but most packages use built-in routines for the different cell types allowed.

0.4. Trajectories. Simulaid can read trajectories in the following formats:

Charmm DCD
Amber
MMC (both binary and ASCII formats)
Macromodel
Stack of PDB files
Stack of Charmm CRD files

If a trajectory is in multiple files then the file names should be derived from the name of the first segment. Assuming that the trajectory starts with file x.dcd, subsequent segments should be called x_2.dcd, x_3.dcd, etc.

NOTE on binary files: it is important to compile Simulad with the same (type) of compilers as the program that generated the trajectory was compiled with. Binary files generated with Fortran-77 and Fortran-90 are different; similarly, they can be different for 32-bit and 64-bit systems.

0.5. Clustering options. Simulaid has implemented several clustering algorithms.

Single link clustering: For a given threshold distance for any pair of clusters the shortest distance between representatives of each cluster is larger than the threshold. For any member of a cluster (of more than one members) there is at least one other cluster member within the threshold distance.
K-means/K-medoids clustering: with the number of clusters set by the user, the program iteratively assigns the objects to the cluster whose center is the nearest to it. The center of the cluster can be established in two ways:
- K-means: Calculate the center-of-mass of each cluster and set the center there. This is meaningful when the coordinates of the nodes are also known and the distances represent the cartesian distance between these nodes.
- K-medoids: Pick the member of each cluster as its center whose distance from the member farthes from is is the minimum. This option is useful when the distances are not representing cartesian distances.
A clustering based on maximum number of neighbors: Pick the (an) object that has the most number of neighbors (i.e., within threshold distance); and call a cluster and remove it and it neighbors from the list. Adjust the neighbor counts and repeat until the list is exhausted.
Density-based clusterings (REF1). These are a variant of single link clustering where the bond definition is tied to the neighborhood:
- the number of common neighbors is not less than an input minimum (CNN)
- as above and also the nodes have to be within a maximum distance
- both nodes have a minimum number of neighbors and are within a maximum distance (DBSCAN)
Clustering into cliques, i.e., clusters where every pair of members in a cluster is within a threshold distance. For a given threshold this clustering is not unique since the clustering is necessarily a partitioning but there can be cliques that overlap.

Single-link and maximum neighbor based clustering can be run either by specifying a distance cutoff, resulting in an unspecified number of clusters or by specifying the number of clusters in which case the cutoff is udjusted until this number of cluster results.

Furthermore, single-link clustering can be used to do a hierarchical clustering using several different distance cutoffs. In this case clusters obtained with a larger cutoff are further clustered with the next smaller cutoff.

I. DESCRIPTION OF THE FUNCTIONS OF THE PROGRAM

I.0. Online help: Any prompt containing a '?' within the bracketed field (e.g., 'Do you want ... [y/n/?] ?' or 'Number of ... [100,?]=') will respond with an explanation if the user types just a '?'. Some of the prompts asking for a selection from a list also has a help option that can be activated by typing a '?'.

I.1. Operations performed by several functions:

Online tips: Any prompt containing a '+' within the bracketed field (e.g., 'Do you want ... [y/n/+] ?') will provide a tip if the user types just a '+'.

In each case, the prompt is repeated after the explanation has been printed.

For translation/rotation and trajectory conversions Simulaid offers the option to reset the system into the periodic cell. The use of periodic boundary conditions may be restricted to two or one dimensions; i.e., the resetting is applied only in a selected coordinae plane or along a selected axis.

This option also selects periodic images of each solute molecule in such a way that the solute appears as compact as possible. Since Simulaid considers each solute segment a single molecule, this may result in moving groups of molecules (e.g., lipids) always together. Thus, for functions that may involve resetting the system into a PBC cell, Simulaid offers the users the option to define a number of residues as molecular residues, i.e., groups of atoms that are to be reset into the PBC cell together but independent of the rest of the solute. Furthermore, residues can also be defined as ions - these are also treated individually but they are not moved around only to keep them in the simulation cell, but not in an attempt to make the solute more compact,

I.2. Specific operations performed by Simulaid:

Geometry optimizations
- Find the sphere of smallest radius still enclosing a set of atoms and translate the atoms to the center of this smallest enclosing sphere (REF2). This allows the solvation of a solute with the smallest number of solvents providing a solvation layer of given minimum thickness. It is possible to input a solvated system where the solvents are excluded from the sphere calculations but will be subsequently translated along with the atoms included (i.e., the solute); solvents outside a layer of user-specified thickness will be eliminated.
- Find the orientation of the solute molecule so that the volume of the cube or rectangle enclosing it is the smallest (REF3). Smallest enclosing cube is optimal for a Delphi run, smallest enclosing rectangle is optimal for grand-canonical runs where a bulk solvent layer is required. The user is asked if cube or rectangle is required. The optimization is performed starting from the input orientation as well as from a user defined number of random orientations to provide several local minima of the optimization problem from which the best will be chosen. Option is provided for limiting the optimization to rotations around one of the coordinate axes.
- Find the orientation of the solute molecule that maximizes the smallest image-image distance (REF1). The optimization is performed starting from the input orientation as well as from a user defined number of random orientations to provide several local minima of the optimization problem. The user has a choice of optimizing the orientation in a periodic simulation cell defined on input or the construction of the smallest possible periodic simulation cell with a given minimum image-image distance. Again, option is provided for limiting the optimization to rotations around one of the coordinate axes.
  For the minimization of the cellsize the initial cell dimensions are obtained from the maximum extension of the molecule in the X, Y, and Z direction for cubic, rectangular and hexagonal prism cells, from the smallest enclosing sphere for a face-centered cubic, hexagonal close packed, or truncated octahedron cell and from input for a cell defined by a Charmm image file. Once the best orientation is found, the cell's dimensions are reduced until the smallest image-image distance reaches a user-defined minimum. Comparing the runs with different starting orientations, the program choses the one with the smallest volume.
Perform cleanup operations. Cleanup operations include:
- ensuring that atom and residue numbers are consecutive (starting from user-defined atom and residue numbers);
- ensuring that all atoms of a residue are in a contiguous stretch;
- ensuring that all PDB files have a chain (segment) ID. When none present in the input file, the first chain id will be 'A'. Every time a 'TER' record is encountered in the input file the segment id will be changed to the next upper-case letter, followed by the next lower case letters and finally, the digits '0' to '9'.
- ensuring that Macromodel files have both 1-letter and 3-letter residue names.
- For PDB output files, it gives an option to write CONECT records as well
- for MMC's .slt and Autodock's .pdbqs or .pdbqt files it provides an option to adjust charges to result in integral residue charges.
Perform a variety of data conversions.
- Conversions changing file formats
  - Rearrange atoms within a residue according to a template. Atomnames within a residue can be sorted and selected to conform to a prescribed template. This may be useful, for example, when a trajectory file is created from configurations that don't conform to the RTF file used by the program reading the trajectory, or when the configuration contains extra atoms (such as lone pairs). The atoms and their desired order is specified by a template file and/or by the RTF file (in Amber, Charmm or Gromacs formats only) describing the residues in the desired order. Currently, template files are provided for Amber, Amber94, and Charmm22. If the residue is not found in the template or the number of atoms in the template is less than the number in the structure or if the correspondence between the two atomsets is not one-to-one then that residue will be left unchanged. The names of the template files (found in the installation directory) are listed by the program. Again, they can be modified by the user and placed in their own directory to allow the treatment of non-standard, patched or other terminal residues.
  - Convert to a different file syntax. Conversions to most of the recognized input formats can be requested. The following limitations apply to these conversions:
    - If the input format was PDB, the charges will be left as zero;
    - Macromodel output may select generic atomtypes for the potential function specification when it fails to recognize the bonding pattern around an atom
    - For Macromodel output, the program may not be able to determine the proper bond orders. The bond orders will be especially problematic if the input file does not contain hydrogens
    - InsightII .car files will have atom symbols for potential types
    - Can not convert to Grasp .crg format - use the Create a Grasp .crg file option for that
    - Can not convert to Tripos .mol2 format
  - Create an input configuration for the MMC Monte Carlo program. The MMC Monte Carlo program's input requires coordinates, charges and potential type information for each atom. There are three options to establish atomtypes and partial charges:
    - For Macromodel input and Insight .car file input (assuming that the latter has been prepared with AMBER potential types) the program will use the Amber codes;
    - For Charmm or Amber input the atom codes can be established from the residue and atomnames from the Charmm PSF file or the Amber top file corresponding to the input structure.
    - If no PSF or top file is available, a conversion files can be used: one of amb_prot_ua.dat, amb_all.dat, amb_all94.dat, and cha_all22.dat are provided in the distribution or a file prepared by the user. Terminal residues may have to be corrected manually, though. The Charmm conversion file also has the protonated ASP and GLU residue information, using residue names ASX and GLX, respectively.
    - For Charmm, Amber and Gromacs the conversion information can be also extracted from multiple RTF file(s) - the user will be prompted for the filenames.
  - Convert trajectories. This is currently limited to Charmm, Amber, MMC and Macromodel formats. During conversions, the user has the option to
    - rearrange the atoms within residues.
    - place only selected configurations on the output file.
    - center the solute molecule(s) in a periodic cell.
    - rotate the whole system for each frame around a selected axis or by an inputted rotation matrix.
    - Split an MMC trajectory generated in the grand-canonical ensemble into Amber trajectories with fixed number of solvents.
  - Create an input for Amsol 3.5. Amsol requires the specification of atomtypes according to its own convention besides the atomic coordinates and atomic numbers. This functionality determines these codes based on the molecule's connectivity table.
  - Create an input for Gaussian's ONIOM option. Besides the X, Y, Z coordinates of each atom, this option requires the atomic number, the Amber atom type, partial charge of the atoms as well as the treatment category (H, M, or L). This option works analogously to the input for creating input for MMC: it can use the built-in list for deducing types and charges from atom names and/or topology file(s) in Amber, Charmm or Gromos/Gromacs format. The H, M, and L regions can be selected by user provided lists or by spheres of different radii drawn around a user-selected atom. For the latter, option is provided to extend the list specified by the inner spheres to avoid breaking non C-C bonds. Optionally, it also produces a connectivity input, with bond orders deduced from the coordinates, just like for conversion to Macromodel format.
  - List the positions of the various items in a coordinate file record for formatted input types.
- Conversions changing atom and or residue names
  - 'Regularize' PDB atomnames. Brookhaven specification uses the first two characters of the name as the right-adjusted chemical symbol and put trailing numeric digits to the place of the first space for hydrogens. This functionality enforces this convention.
  - Undo PDB regularization.This functionality moves all leading numeric characters found in atomnames to the end of the name.
  - Left-adjust atom names.
  - Convert residue and atomnames from different conventions. Unfortunately, PDB atomnames (and even file syntax) follow many different conventions and are slightly different in the various software systems in use. This functionality modifies atom and residue names based on the source of the names and the software system in use. Conversion of the atom and residue names are governed by a file called pdb_nam.dat (found in the installation directory). Residues or atomnames not in the file will be left unchanged. At present, the file contains data to interconvert protein and nucleic acid residues for the Amber, Amber94, Charmm, Moil, IUPAC, Macromodel, ChemX and PDB conventions. Conversions from Macromodel would allow for dropping lone-pair 'atoms' and the change of names of the form C0* to their corresponding PDB-type names only. As a result, conversions to Macromodel convention have been disabled. When converting to PDB convention, the one selected by InsightII will be used. When converting from PDB convention, a list of known variants will be examined (most of these variants were compiled by Dr. D. Strahs). If there is a copy of the conversion file pdb_nam.dat in the local directory (the directory Simulaid is run from), it will be used instead, so users can add additional conversions to the existing residues or add new residues. A local copy of pdb_nam.dat can also be used for making selected changes by eliminating or changing conversion information from the 'master' file.
Trajectory - structure file conversions
- Unpack a collection of configurations into single files. This functionality takes a large file containing a number of configurations and separates them into single configurations. There are several options of selecting the files to be unpacked. An input file XXX.YYY will result in files XXX.1.YYY, XXX.2.YYY, etc.
- Unpack a trajectory into single configurations, either in a single file or in separate files. The selection options and the naming convention of the new files are the same as for unpacking configurations (see above). The format of the unpacked file can be specified by the user. For PDB output, option is given mark eachs structure as MODEL.
- Convert (pack) a series of configurations into a trajectory file. This option will create a Charmm, Amber or MMC binary trajectory file and put the input configuration as well as all subsequent configurations into this file as consecutive snapshots. When packing into MMC binary format, Simulaid will search for energy information if the input structures are in PDB or Charmm CRD format s follows:
  - In Charmm .CRD files, Checking the title records for the string '* energy=' or '* Energy=', or '* E=', followed by the energy
  - In PDB files, Checking the REMARK records for the string 'REMARK energy=' or 'REMARK Energy=', or 'REMARK E=', followed by the energy
Edit the conformation (translate/rotate, add/delete atoms, etc)
- Edit the configuration. Editing allows to
  - select (keep only one) chain/segment
  - excise (drop only one) chain/segment
  - keep only the backbone
  - keep only the alpha carbons
  - drop all aliphatic (bonded to carbon) hydrogens (and add its charge to the carbon's)
  - select atom range to keep
  - select atom range to drop
  - select residue range to keep
  - select residue range to drop
  - select residue name to drop
- Translate and/or rotate the inputted structure
  - The following translations can be performed
    - Translate by a user-specified vector.
    - Translate by a user-specified vector and reset the molecules into a user-specified periodic cell.
    - Center the solute molecule(s) in a user-specified periodic cell. For more than one solute molecules, the images resulting in the most compact solute will be selected.
    Translation is useful when switching between sytems with the origin at the corner (e.g., Amber) and at the center (e.g., Charmm and MMC) or when a simulation lets molecules move away from the periodic cell.
  - Rotation can be performed around a user-specified axis by a user-specied angle or by a user-specified rotation matrix
  - Scale the coordinates (e.g., change units)
  - Separate molecules (to obtain a clearer picture)
  - Swap the chirality of a selected atom
- Permute the coordinates of the configuration read.
- Modify the configuration. Modifying allows to
  - replace an atom
  - add an atom
  When replacing a terminal atom (i.e., on with only one bonded neighbor) the bond length can be changed. When adding a new atom X, the program will let the user establish a bonded sequence of heavy atoms, R3-R2-R1-X and the user will have to specify the R1-X distance, R2-R1-X angle and the R3-R2-R1-X torsion angle.
- Replace the coordinates of a configuration. This option allows the replacement of just the coordinates in a structure without changing the rest of the file. The replacement can be
  - simply a copy of an other structure
  - the coordinates of the original structure translated and rotated to obtain the best overlay to an inputted reference structure
  The source of the coordinates or the reference structure can be in any format except .mol2 for the system to be changed.
- Overlay a structure on a reference structure
- Transform a peptide to retro-inverso. This option essentially swaps the petide >C=O with the paptide >NH. It is recommended to do a short minimization on the transformed structure, especially when prolenes are involved.
- Replace the charges in a configuration. For now, both the input and the replacing structure must be in Autodock's .pdbqt format.
- Extract the CA carbons of the protein backbone into a PDB file
- Change the (solvent) water molecules to their gas phase geometry. Minimizations or molecular dynamics may have been run without restraining the solvents to their experimental geometry. This option 'fixes' these solvents.
- Eliminate solvents outside a simulation cell. Simulation setup may involve the 'soaking' of a solute in some solvent bath. This option allows the carving out of a periodic cell of any of the above mentioned types.
- Reverse the optimization. This option transforms back the structure in the original position/orientation. The translation/rotation information is read from the comment lines in the input structure file as saved there by Simulaid.
- Create a full unit cell from the asymmetric unit. This option extracts from the PDB file the transformation information that defines the symmetry-related images of the asymmetric unit and performs these transformations.
- Create biological oligomers from the asymmetric unit. This option extracts from the PDB file the transformation information that defines the symmetry-related images forming the biological oligomers and performs these transformations.
- Gather all crystal contacts of the asymmetric unit. This option gathers all copies of the asymmetric unit that are in contact with the first copy. Each copy may be related to the first asymmetric unit by symmetry transformation and/or unit cell translation.
- Create the coordinates of all atoms in the neighboring PBC cells
Miscellaneous structure-derived files, analyses (sequence, Charmm RTF, Amber energy decomposition, UHBD dat, etc)
- Extract the sequence list from the inputted structure. For structures carrying residue information this function will extract the list of the residues in a user selectable format:
  - Charmm segment definition format
  - PDB (SEQRES) format
  - PIR format (first line: title; subsequent lines: one letter residue codes without any separator).
  - GCG - the proprietary format of the Wisconsin GCG program
- Convert sequences
- Create a Grasp .crg file from RTF file(s)
- Create a UHBD .dat file from RTF and parameter file(s) (for now, only in Charmm format)
- Print the centers of neigboring image cells and cell vertices. The cell centers will written as a pdb file called imagecell.pdb and the vertices in an other PDB file called pbcvertices.pdb, complete with CONECT statements to draw the cell polygon's edges.
- Write a PDB file for a user-defined rectangle
- Generate torsion angle input for Monte Carlo calculations:
  - Torsion angle list in Macromodel (Bmin) format (general or protein (i.e., with rigid peptide bond))
  - Torsion angle list in MMC format for proteins (backbone with rigid peptide bond and sidechains or sidechains only) or for a general molecule.
- Create Charmm topology file residue records for residues specified by the user. When the input is in InsightII .car format, the atom types are also converted, using the algorithm (and routines) of the program Intocham.
- Extract and tabulate results for selected properties from the large output tables created by the script mm_pbsa.pl that is part of the Amber suite of programs analyzing Amber trajectories. Simulaid will recognize if the input tables contain residue-residue contributions or residue-molecule contributions. The table colums are the selected energy contribution terms, except for the case when one property is selected to tabulate from from residue-residue tables - in that case the table printed will have both rows and colums as residues, the range of which is specified by the user. In addition, correlations between selected properties can also be calculated.
Perform a number of conformation analyses on the solute.
The analyses can be done on the configuration read or on a trajectory specified by the user. When analyzing a trajectory, options are provided for analyzing only selected configurations. The time axis of plots of the evolution of various properties may be labeled by the frame number, or in units of time (ps, ns, or ms). The following analyses are implemented:
- Geometry/topology analyses
  - Print a complete list of neighbors, bond lenths, angles and torsions and mark atoms that are chiral centers (in some rare cases the algorithm may miss a subtle chirality).
  - Print a complete list of torsion angles and the corresponding 1-4 distances
  - Perform a functional group analysis that groups atoms into functional groups, list these groups and prints the neighbors of each member of each group
  - Print the longest connected chain of the molecule (one atom per line) and for each atom prints the longest side chain.
  - Print a complete list of bonds types and their lengths, angles and their values as well as statistics (min, max, average) of the lengths of bonds and values on angles between various elements
  - Allow the user to change the bond-threshold values for selected atoms
- Finding and tracking various bonds
  The following types of bonds are considered:
  - hydrogen bond
  - salt bridge
  - hydrophobic bond,
  - VdW contact based on distance
  - contact based on mututal proximity
  - atoms connected by hydrogen-bond bridges
  Simulaid finds the atom pairs that form the selected type of bond.
  For trajectories (except for hydrogen-bond bridges), Simulaid
  - Saves or reads a previously written set of bond tracks
  - Plots the history of each hydrogen bond (i.e., tracks when it was on or off)
  - Maps the atom pairs to residue pairs and plot the residue-residue bond history
  - Calculates the frequencies of residue pairs being bonded and generate a 2D map of these frequencies.
  - Optionally, calculates the correlation c(i,j) between selected hydrogen bonds
  - Optionally, clusters these hydrogen bonds using (1-c(i,j))^e or (1/c(i,j))^e as the distance function; the exponent e can be 1,2,3,... or 1/2, 1/3, 1/4, ...
  - Calculates the fraction of on and off states and the # of state changes
  - Plots a matrix (map) of the residue-residue bond strengths
  - Compares the residue-residue bond map calculated above from two different matrices of the same size (e.g., a wild type and a mutant). The difference plot can use
    - the actual differences
    - the absolute value of the differences
    - the discretized differences: -1 if structure 1 lost the bonds between this pair; +1 if structure 2 lost the bonds between this pair; zero otherwise.
  - Optionally, calculates the bond autocorrelation functions to estimate bond lifetimes
  - Optionally, writes a PDB file of the system where
    - the B factor column contains the fraction of time that residue was bonded (either the CA atom or, optionally, all atoms of that residue)
    - a line is drawn between the CA atom (or the first atom of the residue if no CA atom was found) with a He atom at the middle of the line. The He atoms form a separate segment withc chain id X and the residue number of eash He atom is just the bond number. Furthermore, the occupancy column contans the fraction of time the residue pair to which the He atom belongs was bonded.
    - if the residue-residue bond correlations were calculated then Simulaid can also draw a bond between the He atoms of bonds that are correlated (i.e., the bond correlation distance function is below a user specified threshold or between the He atoms of bonds that are anticorrelated (i.e., the bond correlation distance function is abowe a user specified threshold.
  Actions specific to different bonds:
  - hydrogen bonds:
    - Specify the distance thresholds for hydrogen bonds
    - Select atoms whose hydrogen bonds are to be tracked
    - Print a list of hydrogen bonds with the geometric parameters.
  - salt bridges:
    - Specify the atom name and/or partial charge that can form salt bridge
    - Specify the distance threshold for salt bridges
    - Select atoms whose salt bridges are to be tracked
    - Print a list of salt bridges with their length.
  - hydrophobic bonds:
    - Specify the carbon atom type that are considered hydrophobic
    - Specify the distance threshold for hydrophobic bonds
    - Select atoms whose hydrophobic bonds are to be tracked
    - Print a list of hydrophobic bonds with their length.
  - heavy-atom mutually proximal ontact bonds:
    - Select atoms whose contacts are to be tracked
  - heavy-atom contact bonds:
    - Specify the distance threshold for the contact
    - Select atoms whose contacts are to be tracked
  - hydrogen-bonded bridges
    - Select atoms whose bridges are to be tracked
    - Specify the residue name of the bridging molecule (e.g., HOH)
    - Specify the bridge lengths (maximum length considered: 4 bonds)
- Calculate atomic properties to be written on the data column of the output structure file.
  - Get the hydrophobicity (hydropathy) values of the residue the atom belongs to. The user can choose among Kyte-Doolittle, Eisenberg or White's scale or input a new one. A good resource (references, values) for the large number of such scales is the paper by J. Cornette et al. (REF4).
  - Calculate circular variance of the solute atoms and solvent molecules w.r.t. the solute (REF5).
  - Read in a DELPHI potential map (written in InsightII format) and calculate the potential at the sites of the solute atoms (InsightII has a DELPHI spectrum built in for coloring).
- Calculate molecular properties
  - Calculate a Monte Carlo estimate of the volume of the solute's combined vdW spheres, the volume of the shell region excluded by the solvent (of user-specified radius), the volume of the first solvation shell and, for multimolecular solutes the volume of the interface beween the solute molecules (the combined total and the pairwise values). A region is considered interface if it is between two atoms on different molecules and the solvation shells of the two atoms overlap. For trajectories, a plot of the evolution and the distributions of the excluded volume and first solvation shell volume is also given, as well as averages and correlations among the volumes calculated.
  - Calculate the principal axes of a selected solute domains. For trajectory scan, the angle between the initial and the current axes are also calculated and plotted. The initial structure can be set to either the first structure of the trajectory or the structure inputted at the start. The calculation of the principal axes can be restricted to a subset of the atoms in the selected solute molecule.
    For calculation on a single structure option is given to align the solute's principle axes to the Cartesian coordinate axes.
  - Calculate the radius of gyration, hydrodynamic radius and - when partial charges are available - the dipole moment of a selected solute molecule.
  - Average B factors for each chain/segment
- Membrane flatnesss
- Tracking solvent transit through a membrane
- RMSD's
  - Calculate the RMSD between the reference structure and the structures and the RMSF in a trajectory. The atoms used for overlay and calculation of the RMSD can be limited to a subset of the system (see the Edit the configuration option). The RMSF calculation gives the root mean square average of the differences in the atomic positions of all atoms in a residue, averaged over all frames in the trajectory.
  - Calculate the 2D-RMSD map over a trajectory. The calculations will give
    - The RMSD between all pairs of structures in a trajectory, printed and (optionally) visualized in a color-coded Postscript 2D map. Option is given to avarege and condnese blocks in the plot.
    - The average RMSD will also be calculated as a function of the run length.
    - The distribution of the MSD's, printed and (optionally) displayed in a Postscript plot.
    - The distribution of the MSD's divided by the distribution from an uncorrelated random walk (R^2 = c * N), printed and (optionally) displayed in a Postscript plot.
    - The structure for which the largest RMSD between itself and the rest of the structures is minimal
    - The structure for which the sum of MSD's between itself and the rest of the structures is minimal
    - Optionally the program can cluster the structures, using the calculated RMSD's as the distance measure.
      The clustering is based either on a user-given distance threshold, or on the user-defined requested number of clusters. Representative members (to be considered at the center of the cluster) of the cluster will be selected based on the following criteria:
      - The structure with the smallest maximum RMSD with the rest of the cluster members
      - The structure with the smallest average RMSD with the rest
      - The structure with the smallest energy within the cluster. This option is only available if energy was read from the trajectory. Currently, energies can be read from some of MMC formatted trajectories:
        
        Binary MMC trajectory
        MMC trajectory that is a set of Charmm .CRD files, having in the title the string '* energy=' or '* Energy=', or '* E=', followed by the energy
        MMC trajectory that is a set of PDB files, containing REMARK lines of the form 'REMARK energy=' or 'REMARK Energy=', or 'REMARK E=', followed by the energy
        When the energy is available, for each cluster the average energy will also be calculated and printed.
    - Optionally, separate trajectories can be generated containing the frames of the different clusters.
  - Calculate the cross RMSD map between two trajectories run on the same system. The calculations will give
    - The color-coded map of the RMSD's
    - The distribution of the RMSD's
    - If requested, for each structure in the first trajectory Simulaid will print the number of the one in the second with the smallest RMSD and the same for the second trajectory. Structure pairs that are mutully the closest ones are marked.
    - If the 2D RMSD result of the two trajectories are available then Simulaid can read in the .rd2 files of these calculation, the result file (the file with .rdx extension) of the cross RMSD calculation, perform a clustering on the .rd2 files and calculate the overlap among the clusters of the two trajectory.
  - Calculate the difference between two sets of RMSFs calculated by Simulaid.
- Measure various distances
  - Find (protein) residue pairs that are in "contact"; one residue should be in a user-specified reference range and the other in the user-specified neighborhood range. Contacts are defined in three different ways:
    - The distance between the between representative atoms of the two residues considered (C(alpha) by default (unless the user specifies a different representative atom) is within a certain threshold distance
    - the distance between the pair of atoms (one on each residue under consideration) nearest to each other is within a certain threshold distance
    - There is a pair of atoms (R,N) in the residue pair considered such that R is closest to N when considering the whole reference range and N is closest to R when considering the whole neighborhood range.
    The distance list is followed by a nonredundant list of neigbour residues as well as a nonredundant list of the non-neighbour residues. When analyzing trajectories, the average residue-residue distances between the selected residue range will also be calculated. Besides the residue-residue distances, the contacts (defined as mutually proximal atoms in the reference and neighbour residue sets are also calculated and printed.
  - Calculate the matrix of distances between the molecules (i.e., chains/segments)
  - Perform an analysis of the adjaceny matrix defined between residues based on the idea of A. Schuyler et al. (REF6) that looks at the column sum of successive powers of the adjacency matrix and its generalization that uses weighted edges.
  - Calculate the distance between two selected atoms (including PBC imaging, if requested).
    When scanning a trajectory,
    - option is given to use the first atom of the pair from the inputted reference structure
    - the evolution of the distance and its components are plotted as function of simulation length.
  - Track PBC-adjusted movements of the solvent molecules. When a membrane is present, count the transits of the solvents, both via the membrane or (if present) a membrane-embedded molecule.
  - For a set of pairs of atoms calculate the average distances and their standard deviation and the distribution of the distances between the atom pairs.
  - Calculate the atom-atom SD matrix from a trajectory
  - Check for potentially unphysical distances in the system
  - Compare two sets of trajectory-averaged distances and plot the matrix of these distances. Option is provided to plot the absoulet value of the distances. Optionally, atom pairs can be replaced by pairs of small clusters (e.g., the hydrogens of a given methyl group); the distance between such clusters is defined as the distance between the closest pair of two atoms in different clusters.
- Plot the PBC cell sizes and volume for a trajectory
- Calculate the protein backbone torsion angles Phi and Psi. The following outputs can be obtained:
  - A Ramachandran map of the protein backbone torsions. For a single configuration, the points are color-coded continuously with the residue number on the rainbow scale red-yellow-green-cyan-blue, and for trajectory scan they are color-coded with the frame number.
  - A file with the calculated angles. For trajectory scan, the distribution of angles in 60^o grids and the PROSS assignment is also printed.
  - Optionally, for residues selected by the user (currently up to 25), the time-evolution of the angles is also shown both with dial plots and with a path in the Ramachandran map and plots of the autocorrelation function of the conformation of each residue is also prepared (to quantify the rate of convergence).
- Calculate selected torsion angles (currently up to 49). For analyzing trajectories, Simulaid prepares
  - dial plots showing the evolution of the selected torsions.
  - calculate their average, CV. For each CV value the S.D. of a matching normal distribution is also printed.
  - If requested, all pairs of angles will be used to calculate the Pearson correlation coefficent and either Fisher-Lee circular correlation coefficients:
    
    C(a,b) = Dab / (A²*B²) where Dab = SUM^N_i=1 SUM^N_j=i+1 sin(a_i-a_j) * sin(b_i-b_j) A = SUM^N_i=1 SUM^N_j=i+1 sin(a_i-a_j)² B = SUM^N_i=1 SUM^N_j=i+1 sin(b_i-b_j)² or Jammalamadaka and SenGupta circular correlation coefficients: C(a,b) = [SUM^N_i=1 sin(a_i-av(a))*sin(b_i-av(b))]/
    [SUM^N_i=1 sin(a_i-av(a))² * sin(b_i-av(b))²]^1/2
    - If requested, the circular correlation coefficients are plotted as a heatmap
    - If requested, the eigenvales, eigenvectors of the circular correlation coefficient matrix is calculated
    - If requested, the angles are clustered using 1-abs(C(a,b)) as the distance measure
- Perform the analysis of proline kinks as implemented in the program ProKink by Visiers et al. (REF7)
  - As an enhacement ot the original algorithm, option is porvided to fit a plane to the proline ring and use the projection of the alpha-carbon to this plane for the definition of the wobble and phase shift angles.
  - When reading a trajectory, the kink angles are plotted onto a Postscript file as dial plots.
  - The psudorotation angle with the nitrogen as the apex is also calculated and plotted for each structure.
- Perform helix analyses (TRAJHELIX) with a selected set of helices for each helix (REF8),
  - calculate its axis
  - find and plot the angles between the axis and the laboratory frame coordinate axes
  - calculate the and plot radius if circle fitted to the alpha carbons (a bend indicator) and provide a novel shape estimator.
  - calculate and plot the length of the helix
  - calculate and plot the average turn angle per residue.
  - when more than one helix is selected for analysis, the angles between each pair and the distance between the centers of each helix will also be calculated
    When scanning a trajectory, for each frame
    - calculate and plot the angle between the axis in the current and in the reference structure (read in at the start of the run)
    - calculate and plot the angle the helix was rotated around the axis (again, compared to the reference structure). The overall rotation of the protein can be applied to the reference helix before the comparison.
    - calculate and plot the displacement fo the helix center compared to the reference structure. The displacement of the protein's COM can be subtracted.
    - calculate the angle between the normals of the plane fitted to the reference structure and the current structure, rotate both normals so that the reference planes normal is parallel to the Z axids and plot the progress of the X-Y projection of the current (rotated) normal.
    At the end of the trajectory scan the correlation coefficients among all pairs of calculated properties will be calculated. For properties involving two angles the Fisher-Lee circular correlation coefficient will also be calculated: For properties involving one angle only the Mardia linear-circular correlation coefficient will also be calculated:
    
    C(a,x) = (r12²+r12²-2*r12*r13*r23)/(1-r23²) where r12 = corr(sin(a),x); r13 = corr(cos(a),x); r23 = corr(sin(a),cos(a))
- Calculate pseudorotation angles as defined by Cremer and Pople (REF9) and for 5-membered rings also by Altona and Sundaralingam (REF10).
- Find helices and sheets using the formalism of Kabsch and Sander, implemented in the program DSSP (REF11). When scanning a trajectory, a plot of the secondary structure evolution is also prepared.
- Calculate circular variance map of the solute and plot it onto a Postscript file:
  The circular variance map of a macromolecule of N residues is an NxN matrix:
  
  CV_i,j = [ SUM_{k between i and j} (r_i,k/|r_i,k|)]/|i-j+1| ,
  while its weighted variant is defined as:
  CV^w_i,j = [ SUM_{k between i and j} r_i,k]/ [ SUM_{k between i and j}|r_i,k|]
- Calculate the the covariance and correlation matrices of selected residues over a trajectory. Both matrices are is printed and a color-coded map is plotted for both. Furtermore, if requested, the eigenvalues and eigenvectors of the covariance matrix will also be calulated.
- Summarize Amber energy decomposition table results (old format)
- Filter solvents
  - Delete solvents whose nearest solute atom is above threshold
  - Delete solvents whose circular variance (CV) w.r.t. solute is below threshold
  - Delete solvents satisfyin both distance and CV criteria
  - Delete solvents that are not in the interface between selected solute molecules
  - Delete solvents that are not bridging two different solute residues with hydrogen bonds.
General clustering
- Cluster atoms. This option takes the input structure, and clusters the atoms based on their distances. This function can be used for a data set representing a density distribution - the clustering will delineate the regions of different density maxima.
- Cluster based on an input matrix This option reads in a matrix representing distances and performs clustering on it. The input is assumed to be in the following form:
  Size of the matrix (N)
  N times:
  This matrix can be transformed before clustering using several formulae:
  - R(i,j) = 1 - R(i,j)
  - R(i,j) = R(i,j)^N
  - R(i,j) = R(i,j)^(1/N)
  - R(i,j) = (1- R(i,j))^N
  - R(i,j) = (1- R(i,j))^(1/N)
For K-means and K-medoid clustering it can produce an inertia plot that helps deciding on the number of clusters to request.
Start logging the keyboard input. When logging is turned on, the program writes all keyboard input on a file that can be later used as input to repeat the run:
simulaid < <logfile>
Allow Simulaid to make the input predictable, otherwise you have to remove or rename all files crated before 'replaying' the log file.
Make the interactive quiz predictable. In certain cases, Simulaid has extra questions or omits some questions, depending on the data. This can confuse runs that use a log file as input. To avoid this problem, when this option is set, the extra qestions will assume default answers and the questions that may be eliminated in some cases will not be.

II.RUNNING THE PROGRAM - INPUT

The program is run interactively: the user is first asked to select the operation to be performed then (if needed) the

name of the input STRUCTURE file. Input syntax can be
- (.CRD) Charmm CRD, either in the original (80 characters/atoms, 4-character names) to the extended format introduced with Charmm 32 (132 characters/atoms, 8-character names, 10-character integers)
- (.pdb) Original Protein Data Bank format - there are several variants:
  - Brookhaven format
  - Charmm PDB (Charmm PDB differs from Brookhaven PDB mainly in the placement of the segment ID)
  - Autodock PDBQS - an extension of the PDB format containig partial charge and solvation parameters (used by Autoock 3)
  - Autodock PDBQT - an extension of the PDB format containig additional partial charge (used by Autoock 4)
- Macromodel
- Macromodel/Xcluster (for more than one conformations, a Macromodel format conformation is followed by only x,y,z information as provided by Xcluster)
- (.slt) MMC solute description file format
- (.gro) Gromos/Gromacs coordinate file format
- (.mol2) Tripos Mol2 coordinate file format (only for input)
- (.car) Insight car format
- Free format as defined by the Insight format filenxyz.frm (number of atoms in the first line, followed by atomic number and x,y,z coordinates in each of the successive lines)
- Free format as defined by the Insight format filesxyz.frm (like nxyz.frm, except that chemical symbols are expected instead of atomic numbers)
- Free format as defined by the Insight format filesxyzqr.frm (like sxyz, but appended by charge and residue number).
- (.crg) Grasp charge file
If the extension of the input file is neither of the extensions specified above the program asks the format of the input data.
type of calculation to be run,
name of the output file.
Whenever relevant, it asks the
- simulation cell shape. The following (periodic) boundary conditions are recognized:
  - cubic
  - rectangular
  - hexagonal prism (both regular or irregular hexagon), with user-specified coordinate axis along the prism axis and with user-specified coordiante axis going through the hexagon vertex (e.g., X axis along the prism axis and vertices of the hexagon are on the Z axis - as used by MMC)
  - face-centered cubic
  - truncated octahedron, X axis normal to a square face - as used by Charmm
  - truncated octahedron, X axis normal to a hexgon face - as used by Amber
  - hexagonal close-packed
  - image-file based
  - sphere (used only for trimming solvents)
  - no boundary conditions
- residue name of the solvent (default: HOH for PDB, WTR for Insight and TIP3 otherwise)
- number of solvent atoms for solvents other than TIP3, WTR or HOH.
- For runs optimizing position or orientation it offers the option of excluding hydrogens.
- Sphere optimization asks the width of the extra solvent layer - this width will be used for volume calculations and (when relevant) excluding excess solvents.
- Orienting run asks the minimum image-image distance to be achieved. The final cell size will be determined with this value in sight.
- Trajectory conversions will use the input configuration to define the order of the atoms in the input trajectory and may be given a second configuration file to establish the target trajectory's atom order.

Creation of a Grasp .crg file involves specifying the RTF file name(s). Note that the Grasp format limits atom names to four characters and residue names to three characters. If either of these limits is exceded, the program will truncate the names and print a warning. Also, the program requires a valid 'input' filename, but it will be ignored.

For most runs with conversions or with cleanup the program asks the number of additional structures to process in the same way. This allows the processing of a batch of files with one run. For Charmm format output, the segment id of the additional structures will be appended by 1,2, etc., as long as there is 'room' for this additional information in the four-digit sequence id (i.e., the original segment id contained enough blanks).

Additional input structures may be either in the same file or in separate files - you will be asked which. Each file may contain several structures - you will be asked how many. If they are in separate files, and the first file was called ether A.B or A.1.B then the remaining files have to be called A.2.B, A.3.B, etc. Similar naming convention applies to the output files.

The output file(s) will contain the modified molecule. Except for runs requesting a transformation of syntax, it will use the same syntax type as the input file. The header comments for Charmm and PDB formats will be extended with a description of the transformation(s) done by the program and/or the results of the optimization.

Running Simulaid on a batch of files

Simulaid has been succesfully run from a c-shell script on a batch of files. The script requires a copy of all the answers to the interactive quiz durig the Simulaid run. This can be obtained by running Simulaid and turn on on input logging. It is advisable to set the input predictable in this case. This list has to be put into the script file between the simulaid << fin command and the terminator string fin. In the example below all .pdb files in the directory are converted from PDB atom/residue name convention to Charmm's convention. Also, the generation of the list of files to be processed has to be adapted to each case.

#! /bin/csh -f
#
# script to convert from PDB name convention to Charmm convention
#
set files = `ls *.pdb`
# The variable $file lists all the .pdb files in this directory
foreach file ($files)
set file1 = `echo $file | sed s/\.pdb/_ch\.pdb/`
# Generated a new file name from the original .pdb file name to use as output
echo Reading $file and writing file $file1
simulaid << fin >> junk
p
n
f
$file
b
$file1
y


9
3



fin
end
#rm junk
exit

III. THE OUTPUT ON THE TERMINAL

The terminal output will include the title cards and the number of atoms found in the input file (the number of hydrogens also if they are not to be considered) for all types of runs.

For enclosing sphere calculation:
- The initial radius found based on the center determined by shifting along the X, Y, and Z axes by (X(min)+X(max))/2, (Y(min)+Y(max))/2, (Z(min)+Z(max))/2, respectively, where X(min), X(max) are the smallest and largest X coordinates, etc.;
- The optimized radius;
- The volume of the initial and optimized sphere and the number of waters required to fill it in.
- The volume of the initial and optimized spheres increased by the water layer width and the number of waters required to fill it in.
- The vector shifting the molecule to the center of the smallest sphere.
- If solvents were present in the input file, the number of them remaining that is within the layer of specified width around the optimized sphere.
For minimizing the bounding box:
- The initial bounding box sixe
- Messages describing the progress of optimization.
- The smallest bounding boxes found in each minimization try
- The optimal bounding box sixe
- The vector shifting the molecule's COM to (0,0,0) and the rotation matrix to the optimal orientation
For orientation optimization:
- The shortest distance between atoms on different images in the original and in the new orientation. The optimization will be performed starting from the input orientation as well as from a user-defined number of different random orientations.
- Messages describing the progress of optimization.
- The cell parameter(s), like the edge lengths, corresponding to the cell shape used in the calculation.
- The cell volume and the number of waters required to fill it in (without the solute present).
- The vector shifting the molecule's COM to (0,0,0) and the rotation matrix to the optimal orientation
For solvent geometry fixing calculations:
- The number of solvents found.
For elimination of waters outside a simulation cell:
- The number of waters kept and the number of non-water atoms found.
The cleanup operations, performed automatically at the end of the run or at special request. Note, that automatically invoked cleanup operations start the residue and atomnumbers from one.
- The number of atoms found detached from their residue
- A message if resequencing was or was not needed
Conversion output
- For each type of conversion a variety of warnings or error messages may be obtained indicating that the input file does not conform to some assumptions or the templates used are not prepared for the particular input. These messages should be heeded.
- For residue-name bearing input a sequence list can be written to a file.
The analysis option provides a menu from which a choice should be made and ask if trajectory scan is requested. After the analysis requested is performed the program returns to this menu allowing the user to do an other one or to stop. If additional configurations are read from the input file, the last analysis will be repeated.
- Each type of analysis prints the name of the output file (generated from the input file name by adding a special suffix) where the requested information is written.
- Various statistics generated over the lists are also printed on the terminal.

Furthermore, several internal consistency checks are implemented to increase the chance of detecting any error that might have remained in the program. Failing such a test results in a message preceded by PROGRAM ERROR. This always indicates a bug in the program and should be brought to the attention of the author, together with the input file that was processed when the error occurred.

IV. FILE FORMATS

The name conversion file pdb_nam.dat is of the following syntax:
- Part 1: nconv,ncol (free format)
  - nconv: Number of name conventions included.
  - ncol: number of data columns in the file.
- Part 2: 8-character names of the conventions (in one line):
  "Amber ", "Amber94 ", "Charmm ", "Moil ", "IUPAC ", "Macromod","ChemX ", " PDB "
  PDB has to be the last convention.
- Part 3: Data for residue name conversions. Each line has ncol entries.
  - The first entry is a generic residue name.
  - Entries two to nconv+1 are the actual residue names corresponding to possible variants in the various conventions.
  - Entries from nconv+2 to ncol are possible additional PDB variants. A blank entry means that conversions to that convention of the corresponding atoms will retain the original name. An entry of "****" means that conversions to that convention will result in dropping those atoms.
  - This section is concluded with a line for residue name "DONE".
- Part 4: Data for atom name conversions. Each line has ncol entries. The first entry is a generic residue name as defined in Part 3. Entries two to nconv+1 are the atom names corresponding to the variants in the various conventions.
  Entries from nconv+2 to ncol are possible additional PDB variants. Entries for PDB varians have to conform to the PDB convention requiring that atom names for elements with single-character chemical id be in the second position - leaving the first one either blank or filled with a number (e.g., "_OD1" or "2HA1").
  There are a number of entries with blank generic residue names - they represent names that convert to each other for any residue. This section is also concluded with a line for residue name "DONE".
The template file for the orders of the atoms within each residue has 8-character entries in the form "RESN ","ATNM ". The current versions have been directly generated from the RTF files of Amber and Charmm.
The file containing the information needed to convert to MMC Monte Carlo input has 8-character entries "RESN ", "ATNM ", "PFCD ", "Charge","GROUPa b" where
- RESN is the residue name
- ATNM is the atom name
- PFCD is the left-adjusted atom-type name
- Charge is the partial charge given with decimal point
- GROUPa b gives information about the carous charge groups within a residue. For each group, a=1 for the first atom of the group and a=0 for the rest; b is the sequence number of the group within the residue (i.e., b=1,...,b=2,..., etc).
Again, they have been extracted from the RTF files of Amber and Charmm, respectively. Note that the atom order template file has the same syntax as the MMC conversion file for the residue and atomnames thus an MMC conversion file can be also used for atom order template file.

V. INSTALLATION

The distribution contains the source code and some data files. Installation of the program requires the following steps:

Putting these data files into a directory of your choice, referred to as the installation directory.
Changing in the block data
- the string MYPATH to the path of the installation directory
- the fourth number in the data ldatapaths statement (00) to the number of characters in /MYPATH/
Removing the C@** (i.e., deleting the first four characters in the line) comments from lines corresponding to your environment in the subroutine datprt (near the end of the source code) for the printing of the date on output files as follows:
- "C@UG" --> "" - SGI IRIX Fortran compiler
- "C@G7" --> "" - g77 Fortran compiler
- "C@AB" --> "" - Absoft Fortran compiler
- "C@EF" --> "" - Intel Fortran compiler
- "C@HP" --> "" - Hewlett-Packard Fortran compiler
- "C@AX" --> "" - IBM AIX Fortran compiler
The subroutine datprt checks if the program is executed on certain hostnames, assumed to be the headnodes of a cluster. The hostnames (e.g., minerva2 in the distribution copy) should be replaced with the head node hostname where Simulaid is installed. This will enable Simulaid to warn the user that longer runs should be sent to a compute node.
Compile the program with Fortran, e.g.,:
f77 -o simulaid.bin simulaid.f
to obtain the executable simulaid.bin

Some compilers fail due to a so-called 'relocation error' when optimizing at higher levels or specifying larger dimensions (see below) than the default. When using the Intel Fortran compiler (ifort), adding the compiler directives
-mcmodel=medium -share_intel
solved the problem. With some of the other compiler (but not the GNU compiler) the compilation key -fpic was found to solve the problem.
If the program is to be compiled with Fortran95 then there is a preprocessor in the distribution f77tof95.f and f77tof95_f95.f (the second one is the Fortran95 version of the first one) that changes the syntax to conform to Fortran95 requirements.

The distribution directory also contains the files nxyz.frm, sxyz.frm and sxyzrq.frm that give the decription of the data format for some of the InsightII free-formats. The Charmm image files for the non-rectangular boundary conditions, hexy.img, hexz.img, fcc.img, to.img, and hcp.img are also given.

VI. CHANGING DIMENSIONS

The sizes of the arrays are established with parameter statements throughout the code. Several symbols user used for this purpose. There are certain relations between these symbols, so changing one of them is likely to require changes in some others. Below is a list of these symbols and their relations that have to be maintained (the program checks for violations).

IMPORTANT: Parameter statements for most symbols occur several places in the program. When a change is required, it has to be carried out at ALL occurences!

MAXPHI: maximum number of Delphi gridpoints
MAXREC: Maximum number of records to read
MAXRSD: Maximum number of residues
MAXNEIG: Maximum number of neighbors of an atom in the neighbor list
MAXBONDS: Maximum number of bonds (hydrogen, hydrophobic or salt bridge)
MAX2D: maximum number of frames for the 2-D RMSD map
MAXCV: maximum number of residues for the CV-based domain map
MAXRCORR: maximum number of residues for the residue correlation map
MAXCONN: maximum number of residues for the residue distance map

The current version uses MAXREC=200000, MAXRSD=70000, MAXNEIG=25, MAXPHI=400, MAXBONDS=10000, MAX2D=5000, MAXCV=2000, MAXRCORR=5000 . Valid values for a larger version (ca. 1.1 Gb) are MAXPHI=640, MAX2D=10000, MAXCV=800, MAXRCORR=10000.

In additions to the symbols above, there are other size parameters whose value can be chosen independently (the current value is in braces):

MAXFRAMES {50000}: the number of trajectory frames to store for analysis
MAXCOPY {600}: the number of different data (e.g., torsion angles) per frame to store for analysis
MAXHX {50}: the maximum number of residues per protein helix
MAXNHX {16}: the maximum number of helices for a single TRAJHELIX run
MAXBRDIGEATOM {8000}: Maximum number of bridge anchor atoms:
MAXBRDIGETYPE {500}: Maximum number of bridge destination atoms:
MAXBRIDGELEN {4}: Maximum number of H bonds in a bridge:
MAXDDISTR {200}: Maximum number of atom pairs to calculate the distance distribution
MAXDDBIN {20}: The number of distance bins in the distance distributions
MAXCDLIST {2000}: The total number of atom pairs that can be involved in distance distribution calculation

VII. IRIS GL GRAPHICS VERSION

Earlier versions of Simulaid included optional code for graphics output. Since support for Iris GL stoppped in the early 90s, the graphics code has been removed from Simulaid. The latest version with this code is saved as simulaid_IGL.f and simulaid_IGL.html.

VIII. REFERENCES

1. O. Lemke and B.G. Keller, Density-based cluster algorithms for the identification of core sets, J. Chem. Phys.,145, 164104 (2016). DOI:10.1063/1.4965440.

2. M. Mezei, Optimal Position of the Solute for Simulations, J. Comp. Chem. 18,812-815 (1997). DOI:10.1002/(SICI)1096-987X(19970430)18:6<812::AID-JCC6>3.0.CO;2-V.

3. M. Mezei, Finding of the smallest enclosing cube to improve molecular modeling, Information Newsletter for Computer Simulation of Condensed Phases, CCP5, Daresbury Lab., No 47, (2000).

4. J. Cornette, K. B. Cease, H. Margalit, J. L. Spouge, J. A. Berzofsky and C. DeLisi, Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins, J. Mol. Biol., 195, 659-685 (1987). DOI:10.1016/0022-2836(87)90189-6.

5. M. Mezei, A new method for mapping macromolecular topography, J. Molec. Modeling and Simul., 21, 463-472 (2003). DOI:10.1016/S1093-3263(02)00203-6.

6. Adam D. Schuyler, Heather A. Carlson, and Eva L. Feldman, Computational Methods for Predicting Sites of Functionally Important Dynamics J. Phys. Chem. B,113, 6613-6622 (2009) 10.1021/jp808736c

7. I. Visiers, B.B. Braunheim, and H. Weinstein, Prokink: a protocol for numerical evaluation of helix distortions by proline. Prot. Engrg., 13, 603-606 (2000). DOI:10.1093/protein/13.9.603

8. M. Mezei and M. Filizola, TRAJELIX: A computational tool for the geometric characterization of protein helices during molecular dynamics simulations. J. Computer-Aided Molecular Design, 20, 97-107 (2006). DOI:10.1007/s10822-006-9039-1.

9. D. Cremer and J.A. Pople, General definition of ring puckering coordinates, J. Am. Chem. Soc., 97, 1354 (1975). DOI:10.1021/ja00839a011.

10. C. Altona and M. Sundaralingam, Conformational analysis of the sugar ring in nucleosides and nucleotides. New description using the concept of pseudorotation J. Am. Chem. Soc., 94, 8205 (1972). DOI:10.1021/ja00778a043.

11. W. Kabsch and C. Sander, Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637 (1983). DOI:10.1002/bip.360221211.