Description of the program Dockres:

Summary of the results of docking a library to a target by Autodock-4, Autodock-Vina, eHiTS, PLANTS, GOLD, DOCK or Glide

Mihaly Mezei

Department of Pharmacological Sciences
Icahn School of Medicine at Mount Sinai
New York, NY 10029

Mihaly.Mezei@mssm.edu

Aug. 14, 2020

The program Dockres scans the results of docking runs performed on a series of ligands with Autodock (Version 4), Autodock-Vina, eHiTS, PLANTS, GOLD, DOCK or Glide. It gathers the top binders and displays a variety of statistics, both on the ligand set and on the top binding poses. The ligands selected can be filtered by a number of criteria.

File convention

Dockres uses a one letter code for the screening software used:

This code is included in the names of the various directories and files created by Dockres.

Input of the program

Besides the structure file for the target macromolecule (of the form macro.pdb* or, for GOLD, macro.mol2), Dockres assumes the availability of the following files (macro stands for the name of the macromolecule file without the .pdbqs, .pdbqt, .pdb or .mol2 extension):

  • Except for DOCK and Glide, a file called macro_<sw>.dir listing the docking result files (for PLANTS, the directories holding the docking result of each ligand), where <sw> is a one-letter code for the screening software used. It can be created with the script getdir.csh (or by the user with a text editor) prior to running Dockres.

    Format of the file macro_<sw>.dir:

    For example, docking the ligands ligx.mol2, ligy.mol2 and ligz.mol2 to the macromolecule mm.pdbqt with Autodock-4 will result in the files ligx.mm.dlg, ligy.mm.dlg and ligz.mm.dlg. Thus the user has to prepare a file called mm.dir with the following content:
    mm.gpf
    1 ligx.mm.dlg
    2 ligy.mm.dlg
    3 ligz.mm.dlg
    
  • Note that for GOLD, Dockres assumes that each pose is in a separate file in the result directory and that the files are named gold_soln_<structure>#l_#p.mol2, where #l is the ligand number and #p is the pose number.
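
    The listing file shown in the Autodock-4 example above can also be generated by a short script instead of a text editor. The following is a minimal Python sketch, not part of Dockres: the function name dir_file_text is hypothetical, and the layout (one leading line, then "index filename" lines) only mirrors the example, since the exact format Dockres expects is not spelled out in this section.

```python
def dir_file_text(first_line, result_files):
    """Build listing text in the layout of the mm.dir example.

    Assumption: the first line (mm.gpf in the example) is supplied by the
    caller, followed by one '<index> <filename>' line per docking result,
    with indices starting at 1.
    """
    lines = [first_line]
    for index, name in enumerate(result_files, start=1):
        lines.append(f"{index} {name}")
    return "\n".join(lines) + "\n"


# Reproduce the content of the example listing.
example = dir_file_text("mm.gpf", ["ligx.mm.dlg", "ligy.mm.dlg", "ligz.mm.dlg"])
print(example, end="")
```

    The returned text can be written to the listing file with any file-writing call; the script getdir.csh mentioned above remains the documented way to create the file.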

    In addition, Dockres needs

    Dockres can be run either interactively from a terminal or in batch mode, specifying the run parameters as command-line options. The terminal inputs can also be logged to a file that can be used to rerun a job, possibly after editing it. When compiled with the parallel code included, it has to be run in batch mode (with concomitant restrictions, vide infra).

    In interactive mode it starts by asking for (possibly a subset of) the following information:

    Once this information is given, the docking result files are read and the data is extracted from each. Besides the coordinates of the pose, the program extracts the docking score. For Autodock-4 and eHiTS the user has to select between the two scores calculated by the docking program. This extraction may take some time; for larger libraries the program periodically prints a progress report. Once the data is gathered, a checkpoint file is written and the result summary starts.

    The result summary starts with printing on the terminal the list of the top-scoring poses, the number of poses in the top-score ranges, and a plot showing the distribution of the location of poses over the macromolecule's residues. The program then gives the user the option to

    If no (more) repetition of calculations is requested, the program proceeds to the last stage where it offers the user the following options:

    For each of the options above the user can specify the number of top-scoring ligands to use and decide if different isomers/tautomers of the same ligand should be included.

    In batch mode the following information can be specified:

    A possible batch run call is

    > dockres -mm hemoglobin -sw eHiTS -np 20 -ol 2 -ib 99

    The first two items (-mm and -sw) are compulsory; preferably, they should be the first two items, allowing all warning and error messages to be printed to the log file macro_<sw>.res. For the rest of the input that can be specified in interactive mode the default values are used.

    Batch runs with a flexible macromolecule have not yet been implemented.
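
    When batch calls are generated by a script, the ordering rule above can be enforced automatically. The following is a minimal Python sketch under that assumption; the helper name build_dockres_args is hypothetical, and the flags other than -mm and -sw are simply passed through as given, without any knowledge of their meaning.

```python
def build_dockres_args(macro, software, options=None):
    """Return a dockres argument list with -mm and -sw placed first.

    'options' maps further command-line flags to their values; they are
    appended in insertion order after the two compulsory items.
    """
    args = ["dockres", "-mm", macro, "-sw", software]
    for flag, value in (options or {}).items():
        if flag in ("-mm", "-sw"):
            raise ValueError(f"{flag} is set through the first two arguments")
        args += [flag, str(value)]
    return args


# Rebuild the example call shown above.
call = build_dockres_args("hemoglobin", "eHiTS", {"-np": 20, "-ol": 2, "-ib": 99})
print(" ".join(call))
```

    The resulting list can be handed to a process launcher such as subprocess.run, or joined with spaces for a job script.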

    Some inputs currently can only be provided in the interactive mode (e.g., hydrogen-bond thresholds, filtering options). To use a non-default option for which no command-line input is implemented, an interactive run is required that can be started from the checkpoint file. It will not be CPU intensive since the time-consuming data gathering has been completed already.

    Output of the program

    Dockres will create the following files:

  • A log file called macro_<sw>.res where all results will be printed. If it is already present, Dockres will write instead to macro_<sw>_N.res, where N is the smallest integer such that no file with that number exists.
  • A file called macro_<sw>.ckp containing all the information gathered, allowing the repeated extraction of data with different filtering criteria without having to perform the time-consuming scan of the .dlg files.
  • Additional output files can be created optionally; see above the batch directives -nx, -py, -tl, -ta, -tr, -cs, -gr and -gc. For example, -nx creates one or more PDB files containing extracted ligand poses with the macromolecule.
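
    The naming rule for the log file can be used by wrapper scripts to predict which file a run will write to. The following is a minimal Python sketch of the rule as stated above, not the Dockres source: the function name next_log_name is hypothetical, "x" is a placeholder for the one-letter software code, and the assumption that N starts at 1 is not confirmed by this description.

```python
import os
import tempfile


def next_log_name(macro, sw, directory=".", first_n=1):
    """Return macro_<sw>.res, or macro_<sw>_N.res with the smallest free N.

    Assumption: numbering starts at first_n (1 by default).
    """
    base = os.path.join(directory, f"{macro}_{sw}.res")
    if not os.path.exists(base):
        return base
    n = first_n
    while True:
        candidate = os.path.join(directory, f"{macro}_{sw}_{n}.res")
        if not os.path.exists(candidate):
            return candidate
        n += 1


# Demonstration in a scratch directory: the second call sees the first file.
with tempfile.TemporaryDirectory() as demo_dir:
    first = next_log_name("mm", "x", demo_dir)
    open(first, "w").close()
    second = next_log_name("mm", "x", demo_dir)
    print(os.path.basename(first), os.path.basename(second))
```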

    The file macro_<sw>.res will contain

    Compilation of the program

    The program is written in Fortran 77. Its size is governed by the parameters (the number between the braces is the value set in the source code), established in the first line of the program

    There are some constraints on the parameters. Several of the parameter definitions are repeated in different subroutines; if any one is changed, all of its occurrences have to be changed in the same way.

    The program uses several arrays of size MAXMOL*MAXPOSE, dominating the memory requirement.
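
    A rough estimate of this dominant term can help when choosing MAXMOL and MAXPOSE. The following is a minimal Python sketch; the function name, the number of arrays and the 4-byte element size are all assumptions for illustration, since the actual array count and types have to be read from the source code.

```python
def array_memory_bytes(maxmol, maxpose, n_arrays, bytes_per_element=4):
    """Estimate the memory taken by n_arrays arrays of size MAXMOL*MAXPOSE.

    Assumption: every array has the same element size (4 bytes by default).
    """
    return maxmol * maxpose * n_arrays * bytes_per_element


# Hypothetical values, not the defaults set in dockres.f.
estimate = array_memory_bytes(maxmol=100000, maxpose=10, n_arrays=5)
print(f"{estimate / 1e6:.1f} MB")
```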

    It should be compiled at the highest optimization level for maximum speed. For example, using the f77 compiler the compilation can be executed by

    f77 -O4 -o dockres.exe dockres.f

    Some compilers fail with a so-called 'relocation error' when optimization at a level higher than one is requested. When using the Intel Fortran compiler (ifort), adding the compiler options
    -mcmodel=medium -shared-intel
    solved the problem. With some of the other compilers (but not the GNU compiler) the compiler option -fpic was found to solve the problem.

    The optional parallelization uses the MPI library. Note that this requires running in batch mode, with the concomitant restrictions. Furthermore, the parallel version does not work for DOCK or Glide. To compile Dockres with the parallel code included, first remove the 'C@DM' string from the source code:

    > cat dockres.f | sed 's/C@DM//' > dockres_mpi.f

    > f77mpi -o dockres -O4 dockres_mpi.f

    The name of the MPI-enabled compiler may be different on your system, and additional libraries may also need to be linked.
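
    On systems without sed, the removal of the 'C@DM' string can be done with a few lines of Python. This is a minimal sketch under the assumption that deleting the first occurrence of the marker on each line is what is needed; the function name strip_marker and the two sample Fortran lines are hypothetical and not taken from dockres.f.

```python
def strip_marker(source_text, marker="C@DM"):
    """Remove the first occurrence of the marker from every line.

    Lines without the marker are returned unchanged; line endings are kept.
    """
    lines = source_text.splitlines(keepends=True)
    return "".join(line.replace(marker, "", 1) for line in lines)


# Hypothetical sample: the marked line becomes active code once the
# marker is removed, the unmarked line is left as it was.
sample = "C@DM      call mpi_init(ierr)\n      x = 1\n"
converted = strip_marker(sample)
print(converted, end="")
```

    To process the real source, read dockres.f into a string, pass it through the function, and write the result to dockres_mpi.f.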

    For parallelized runs, the parameter MAXMOL can be set to less than the total number of ligands; it should be just large enough to hold the data for Nmolec/NCPU ligands. In this case, however, the program stops after writing the checkpoint file, and a separate single-CPU run, using an executable compiled with MAXMOL set large enough to hold all ligands, should be used to print/save the results. This option is useful for distributed-memory systems where the majority of nodes have relatively small memory.
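
    The smallest usable MAXMOL for the parallel stage follows from the Nmolec/NCPU rule above. The following is a minimal Python sketch, assuming the ligands are split as evenly as possible across the processes (the description does not confirm the exact distribution); the function name min_maxmol and the example numbers are hypothetical.

```python
def min_maxmol(nmolec, ncpu):
    """Return ceil(nmolec / ncpu), the per-process ligand count.

    Assumption: ligands are distributed as evenly as possible.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return -(-nmolec // ncpu)  # integer ceiling division


# Hypothetical library of 100000 ligands on 48 processes.
per_process = min_maxmol(100000, 48)
print(per_process)
```

    The single-CPU run that reads the checkpoint file still needs MAXMOL to be at least the full number of ligands.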

    Note that if Fortran-90 is used for one compilation, then it should be used for the version reading the checkpoint file as well, otherwise the binary files will be incompatible.