Description of the program SIMLOC for the detection of

locally similar substructures in two molecules.

Mihaly Mezei

Department of Pharmacological Sciences,

Icahn School of Medicine at Mount Sinai

New York, NY 10029

Mihaly.Mezei@mssm.edu

Nov. 14, 2002.

The program SIMLOC (written in C) takes as input the coordinates of two molecules (or two conformations of one molecule) and generates substructures with low RMS, as described in Protein Engineering, Vol. 7, p. 331 (1994). The substructures found depend on a threshold parameter, RMSlim: in general, the smaller RMSlim is, the smaller the substructure RMS values will be. RMSlim is not an upper bound on the substructure RMS values, though.

At startup, the program allows you to change the values of some algorithmic parameters: TESTLEV, TOLERANCE and SIM. When TESTLEV > 0, the program prints detailed information during the run. TOLERANCE is the largest number considered to be zero (used for analyzing the eigenvalues during the RMS minimization). SIM=0 means that the program will search for dissimilar substructures.

The program asks for an (optional) output file name, followed by the input syntax and the name of the input file(s). The following input formats are allowed:

In either case, the input items are to be separated by blanks.

Next, the program offers an option to filter the data. The following options are implemented:

After filtering you will have the option to ask for a listing of the result of filtering operations.

The program assumes that, once filtering is done, the atoms in the two copies correspond to each other. If the original structures had different numbers of atoms (e.g., two different proteins were read, but both are reduced to their backbone), the output will refer to the atom numbers of the first structure.

The program then calculates and prints the RMS and maximum distances between the two structures and determines the bonding pattern (based on the first conformation). Next, for each bond it calculates the RMS between the atoms bonded to the endpoints of the bond as an indicator of the 'rigidity' of that bond. It prints the range and distribution of these bond-related RMS values and then asks for a value of RMSlim. This should fall between the printed minimum and maximum.

If the first molecule was found to be disconnected, the program prints the number of molecules it consists of and (if disk output was also requested) the atoms belonging to each molecule.

For the RMSlim value read, the program builds a new bond list that keeps only the bonds whose bond-related RMS (calculated above) is below this threshold. Next, it determines the connected substructures of this new bond list, calculates the local RMS of each, and compares it with the RMS of the same substructure in the overall fit calculated at the outset (the 'global' RMS). The maximum deviation between atoms of the two structures (DEV) is also calculated and printed; it is usually about twice the RMS value.
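The thresholding and substructure-finding step can be illustrated with a small union-find sketch. It is not the routine used in simloc.c, whose internals are not described here: the Bond record, the function names and the choice of union-find are all assumptions made for the example. Bonds whose bond-related RMS is not below rmslim are skipped, and the atoms still connected by the remaining bonds receive the same label.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bond record: atom indices i and j (0-based) and the
   bond-related RMS of that bond. Not taken from simloc.c. */
typedef struct { int i, j; double bond_rms; } Bond;

/* Follows parent links to the root of atom a, shortening the path. */
static int find_root(int *parent, int a)
{
    while (parent[a] != a) {
        parent[a] = parent[parent[a]];  /* path halving */
        a = parent[a];
    }
    return a;
}

/* Keeps only the bonds with bond_rms < rmslim and labels every atom
   with a representative atom of its connected substructure.
   label must have room for natoms entries.
   Returns the number of substructures, counting isolated atoms. */
int label_substructures(const Bond *bonds, size_t nbonds, int natoms,
                        double rmslim, int *label)
{
    int count = natoms;  /* every atom starts as its own substructure */

    assert(natoms > 0);
    for (int a = 0; a < natoms; a++)
        label[a] = a;

    for (size_t k = 0; k < nbonds; k++) {
        if (bonds[k].bond_rms >= rmslim)
            continue;  /* bond too flexible: dropped from the new list */
        int ri = find_root(label, bonds[k].i);
        int rj = find_root(label, bonds[k].j);
        if (ri != rj) {
            label[rj] = ri;  /* merge the two substructures */
            count--;
        }
    }

    /* Flatten so that label[a] is the root itself for every atom. */
    for (int a = 0; a < natoms; a++)
        label[a] = find_root(label, a);
    return count;
}
```

In this sketch an atom left without any retained bond forms a substructure of its own; whether SIMLOC reports such single-atom substructures is not stated above.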

The (optional) output file will contain additional information: if the input syntax was 0, the atom numbers of the substructure members; if it was 1, the atom name and residue number of the substructure members. The lines containing this additional information are prefixed with *, so they can be deleted with the command

sed '/*/d' outputfile > outputfile.new

where the file outputfile.new will be the shortened output.

The program then offers the option to input a different RMSlim value to calculate a new substructure set. Entering zero for RMSlim stops the calculation.

Installation

The program is written in C. Installation requires only that the source code be compiled. On most Unix systems this can be done by issuing the command

cc -o simloc simloc.c -lm

After compilation, typing simloc will start the run.