Description of the program Compligset:
Further analysis of the data provided by the program Dockres
Mihaly Mezei
Department of Pharmacological Sciences,
Icahn School of Medicine at Mount Sinai,
New York, NY 10029
Mihaly.Mezei@mssm.edu
Dec. 16, 2019.
The program Compligset operates on the results of docking runs,
in most cases processed by Dockres.
The following inputs are implemented:
- A PDB file of the docked complexes, generated by Dockres
- A list file (extension .lst) of ligand ids and scores, generated by Dockres
- A Glide ligand file, either straight out of Glide (*_pv.mae) or extracted
from the *_pv.mae file by Dockres
- A combined score list generated by Dockres
- A PDB file of the top-scoring ligands generated by Dockres
Most operations require the user to specify the number of ligands
to be searched from each list (Nsearch).
If ligand names
containing '_' or '-' are seen (these usually indicate different
tautomers of the same ligand), the user has to specify
whether the '_*' or '-*' part can be ignored in the ligand names.
The user can also specify minimum COM-COM distance and minimum RMSD thresholds,
beyond which two ligand poses will be treated as different ligands
(and labeled accordingly).
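
The name and pose handling described above can be illustrated with a short
Python sketch. It is not taken from the Compligset source. It assumes that the
name is cut at the first '_' or '-', that the geometric center of the atoms
stands in for the center of mass, that the RMSD is computed without
superposition over atoms in the same order, and that exceeding either
threshold is enough to treat two poses as different. The function names are
hypothetical.

```python
import math
import re


def base_name(name, ignore_suffix=True):
    """Return the ligand name without its '_*' or '-*' part.

    Assumption: the name is cut at the first '_' or '-'.
    """
    if not ignore_suffix:
        return name
    stem = re.split(r"[_-]", name, maxsplit=1)[0]
    return stem or name  # keep the full name if nothing precedes the separator


def centroid(coords):
    """Geometric center of a list of (x, y, z) tuples (stand-in for the COM)."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def rmsd(a, b):
    """RMSD between two poses with identical atom order, no superposition."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of atoms")
    sq = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(sq / len(a))


def poses_differ(a, b, min_com_dist, min_rmsd):
    """True if either threshold is exceeded (assumed combination rule)."""
    com_dist = math.dist(centroid(a), centroid(b))
    return com_dist > min_com_dist or rmsd(a, b) > min_rmsd
```

With these assumptions, base_name("ZINC123_2") and base_name("ZINC123-1") both
reduce to "ZINC123", so the two entries would be treated as the same ligand.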
The following operations are implemented:
- Look for overlap between ligand sets
For each ligand among the first Nsearch of each set, the program lists the
targets on whose lists it appears. For each pair of targets it counts the number
of common ligands (among the top Nsearch of each set). It also reports
the number of ligands appearing on exactly, or at least, a given number of targets' lists.
There is also an option to extract ligands that are on at least a user-specified
number of targets' lists, plus a user-specified number of top-scoring ligands
for each target that are not on the consensus list.
- Combine (average) scores among ligand sets
For all lists read, the scores and ranks of the first Nsearch ligands
are averaged and the ligands are sorted either by score or by rank.
Ligands that are on fewer lists than a user-specified limit can be
excluded.
Rank correlation between all pairs of lists as well as the average rank correlation
between the list ranks (either by score or by ligand rank) are also computed
when the lists are duplicate-free.
- Look for selectivity (different targets)
For all pairs of targets, search the first Nsearch ligands and list
those whose scores on the two targets differ by a user-defined threshold.
- Rank lists by top score (averages)
This option calculates the average score of the first Nsearch ligands
for each target - these averages may be used to rank the targets.
- Merge lists
- Remove duplicates from a .lst file
- Combine ligand-target contact maps
This option reads the Dockres output (.res) files from screening different targets
with the same ligand set, extracts the contact maps showing which ligand is
docked to which target residues and combines them into a single map.
- Extract ligand-target complex
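
The overlap, score-averaging and selectivity operations listed above can be
sketched in Python as follows. This is an illustration only, not the
Compligset implementation. It assumes that each target's list is a
duplicate-free sequence of (ligand, score) pairs sorted best first, that a
lower score is better, that selectivity is checked only for ligands within
the top Nsearch of both targets, and that "differ by a threshold" means an
absolute difference of at least that threshold. All function names are
hypothetical.

```python
from collections import Counter
from itertools import combinations


def overlap(lists, nsearch):
    """Count shared ligands among the top-nsearch entries of each target list."""
    tops = {t: {lig for lig, _ in entries[:nsearch]}
            for t, entries in lists.items()}
    # Number of common ligands for every pair of targets.
    pair_common = {(a, b): len(tops[a] & tops[b])
                   for a, b in combinations(sorted(tops), 2)}
    # On how many targets' lists does each ligand appear?
    n_lists = Counter(lig for ligs in tops.values() for lig in ligs)
    exactly = Counter(n_lists.values())
    at_least = {k: sum(c for m, c in exactly.items() if m >= k)
                for k in range(1, len(tops) + 1)}
    return pair_common, exactly, at_least


def average_scores(lists, nsearch, min_lists=1):
    """Average score and rank per ligand, sorted by average score."""
    scores, ranks = {}, {}
    for entries in lists.values():
        for rank, (lig, score) in enumerate(entries[:nsearch], start=1):
            scores.setdefault(lig, []).append(score)
            ranks.setdefault(lig, []).append(rank)
    # Drop ligands that are on fewer lists than min_lists.
    rows = [(lig, sum(s) / len(s), sum(ranks[lig]) / len(s))
            for lig, s in scores.items() if len(s) >= min_lists]
    return sorted(rows, key=lambda row: row[1])  # lower score first (assumed)


def selective(lists, nsearch, threshold):
    """List ligands whose scores on two targets differ by at least threshold."""
    tops = {t: dict(entries[:nsearch]) for t, entries in lists.items()}
    hits = []
    for a, b in combinations(sorted(tops), 2):
        for lig in sorted(tops[a].keys() & tops[b].keys()):
            if abs(tops[a][lig] - tops[b][lig]) >= threshold:
                hits.append((a, b, lig, tops[a][lig], tops[b][lig]))
    return hits
```

Sorting by average rank instead of average score only requires changing the
sort key to the third element of each row.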
Compilation of the program
The program is written in Fortran 77. Its size is governed by
the following parameters, established in the first line of the program
(the number between the braces is the value set in the source code):
- MAXLIG {125000} - maximum number of ligands per target to read
- MAXTOP {125000} - maximum number of ligands per target to compare
- MAXLIGAT {200} - maximum number of atoms per ligand
- MAXTARGET {12} - maximum number of targets
- MAXPOSE {10000} - maximum number of ligands per target to
use in combining (averaging) scores/ranks.
- MAXDUP {100000} - maximum number of ligand pairs in the 'duplicate list'
- MAXCMEM {200} - maximum number of ligand poses to combine (average)
during overlap search
It should be compiled at the highest optimization level for maximum speed.
For example, with the g77 compiler the program can be compiled
by
g77 -O4 -o compligset.exe compligset.f