Description of the programs and scripts written for the screening of a ligand library with Autodock (4 or Vina), eHiTS, Glide, or PLANTS

Mihaly Mezei

Department of Pharmacological Sciences,

Icahn School of Medicine at Mount Sinai

New York, NY 10029

Sept. 01, 2015.

The screening of a library of ligands by Autodock-4 or Autodock-Vina or eHiTS or Glide or PLANTS requires

Preparation of the target molecule
Preparation of the ligand set
Running the dockings, preferably using many processors of a cluster
Extracting, filtering and analyzing the docked poses obtained

The programs and scripts in this package help in all four steps. It is assumed that the ligands are available in one of the following formats:

Tripos' .mol2 format
Autodock's .pdbqt format (for Autodock only)
.sdf format (for eHiTS or Glide only)
.mae format (for Glide only)

and the requisite executables are available, and (for Autodock) Python 2.4 (or higher) is installed. Semiempirical partial charge calculations also require the availability of the program Gaussian.

Furthermore, some of the programs in this package have to be compiled with a Fortran compiler (e.g., g77). To obtain an executable from the source file program.f, type the following (replace g77 with your compiler's name, if different):

>g77 -O4 -o program program.f

When used on systems other than the one(s) tested, the C-shell scripts are likely to need some customization. Special attention has to be given to the definitions of executables and software paths, and to the loading of modules.

I. Target preparation
- I.1. Autodock-4 and Autodock-Vina
- I.2. eHiTS
- I.3. PLANTS
- I.4. Glide
II. Ligand preparation
III. Run the screening
- III.1 Submit the job(s)
- III.2 Track the jobs
- III.3 Change the number of CPU's used
IV. Post-processing

I. Target preparation

I.1. Autodock-4 and Autodock-Vina

Autodock requires the target in .pdbqt format. It can be conveniently prepared with the program AutoDockTools. Only polar hydrogens are needed and the charges are to be prepared with the Gasteiger algorithm. Note that Vina does not use the partial charge information. When running Autodock-4, for best results the charge sum of each residue should be an integer. The residue charge sums can be checked with Simulaid.

I.2. eHiTS

eHiTS requires the target in PDB, .sdf or Tripos .mol2 formats. The .pdbqt files have to be converted to regular PDB format; again Simulaid can do that.

I.3. PLANTS

PLANTS requires the target in .mol2 format. If waters are to be used in the binding site then they should be part of the file, with residue name HOH.

I.4. Glide

The target for Glide has to be in .mae format. If only a .pdb file is available, the script fullscreen.csh will call Schrodinger's prepwizard utility.

II. Ligand preparation

The programs and scripts used for preparation make sure that there is a single directory containing files with a single molecule each, with likely valid partial charges.

II.1. Prepare individual .mol2 or .pdb* files from a single file

The program splitmol (written by D.A. Gschwend and adapted by M. Mezei) reads in a file containing several .mol2, .sdf or .pdb* structures and creates several files with a limited number (e.g., one) of structures each. A sample run looks like this:

% splitmol
 Running splitmol DAG/MM v.2.16...        24-Oct-0

 This program supports command-line arguments.
 Type splitmol -h for a brief description.

 Use PDB-type or mol2 file (<P>/M/S)? m
 Enter name of the mol2 file: ../split_test.mol2
 Enter name to add to the molecule number: xxx
 Enter starting and ending molecule numbers to extract.
 (Enter 0 0 to do all, enter a negative number to use a list file.) 0 0
 Split this file into groups of how many molecules? 1
 Label by absolute or relative molecule # (<A>/R)? a
 Output file names will be of the form <number>.xxx.mol2
 All molecules in input will be extracted.
 Each molecule will be put into an individual file.
 Absolute molecule numbers will be used.

 Working...

 Execution completed.
 2 molecules written.
 2 file(s) created.
%

II.2. Partition ligand files into subdirectories

Directories containing a large number of files (O(100,000) or more) tend to negatively affect the overall system performance, and system managers generally discourage them. The script split_db.csh takes a directory containing a lot of files and creates a new directory that contains several subdirectories with a limited number of files each:

$ split_db.csh
Name of the directory to split=LIB
Name of the directory to contain the split list=SPLITLIB
Extension of the files to split=mol2
Do you want to count the number of mol2 files in SPLITLIB (y/n) [n] y
Number of mol2 file in directory LIB = 21
Number of files in each subdirectory=7
The directory of files will be written into the file SPLITLIB/SPLITLIB.dir
First subdirectory name= SPLITLIB_1
Directory SPLITLIB/SPLITLIB_1 is completed
Directory SPLITLIB/SPLITLIB_2 is completed
Directory SPLITLIB/SPLITLIB_3 is completed
Copied 21 files into 3 directories

Besides the subdirectories, the script prepares a file with the list of subdirectory names and another file with the names of the ligand files (including the subdirectory they are in):

$ ls SPLITLIB
SPLITLIB.dir  SPLITLIB.list  SPLITLIB_1  SPLITLIB_2  SPLITLIB_3
$ more SPLITLIB/SPLITLIB.dir
SPLITLIB_1
SPLITLIB_2
SPLITLIB_3
$ more SPLITLIB/SPLITLIB.list
SPLITLIB_1/MS0023153.S1.mol2
SPLITLIB_1/MS0033023.S1.mol2
. . .
SPLITLIB_3/MS0056323.S1.mol2
SPLITLIB_3/MS0056323.S2.mol2
SPLITLIB_3/MS0056324.S1.mol2
SPLITLIB_3/MS0056328.S1.mol2
$
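
The partitioning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation of split_db.csh: the function name split_directory and its arguments are hypothetical, and the files are assumed to be taken in alphabetical order.

```python
import shutil
from pathlib import Path


def split_directory(src, dest, ext, per_dir):
    """Copy src/*.ext into dest/<dest>_1, dest/<dest>_2, ... (hypothetical helper).

    Each subdirectory receives at most per_dir files. Two index files are
    written, mirroring the .dir and .list files described in the text.
    """
    src, dest = Path(src), Path(dest)
    files = sorted(p for p in src.iterdir() if p.suffix == "." + ext)
    dest.mkdir(parents=True, exist_ok=True)
    subdirs, listing = [], []
    for i, path in enumerate(files):
        if i % per_dir == 0:
            # The previous subdirectory is full (or none exists): start a new one.
            sub = dest / f"{dest.name}_{len(subdirs) + 1}"
            sub.mkdir(exist_ok=True)
            subdirs.append(sub.name)
        shutil.copy2(path, dest / subdirs[-1] / path.name)
        listing.append(f"{subdirs[-1]}/{path.name}")
    # Index of subdirectory names and index of ligand files.
    (dest / f"{dest.name}.dir").write_text("".join(s + "\n" for s in subdirs))
    (dest / f"{dest.name}.list").write_text("".join(s + "\n" for s in listing))
    return subdirs, listing


# Example call (assumes a directory LIB with .mol2 files exists):
# split_directory("LIB", "SPLITLIB", "mol2", 7)
```

With 21 files and 7 files per subdirectory this produces three subdirectories, as in the sample run above.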

II.3. Filter a .mol2 ligand directory

The C-shell script filtermol2.csh filters files by the following criteria:

Drop files whose molecular weight is outside a range specified by the user
Drop files whose formal charge is outside a range specified by the user
Drop files where steric clashes exceed a limit specified by the user
Make sure the ligand is a single molecule (if not, drop the smaller part)
Correct some of the commonly occurring incorrect atom names

Running filtermol2.csh requires the executable filtermol2.

A sample run looks like this:

% filtermol2.csh
Filtering a directory of .mol2 files - Version 03/10/2014
This script requires the following file to be present in this directory:
     The executable program filtermol2
 
Name of the input directory containing the ligand .mol2 files=test_lig
Name of the directory containing the filtered ligand .mol2 files=filt_lig
 Checking for filtermol2
Minimum molecular weight to keep [0]=
Maximum molecular weight to keep [9999999]=250
Minimum formal charge to keep [-99]=0
Maximum formal charge to keep [99]=0
Do you want to check the ligands for short bonds and steric clashes (y/n) [y]?
Maximum steric clash to allow [9999999]=0.4
Do you want to check the ligands for connectivity (y/n) [y]?
 
Number of files found in test_lig : 21
Trying file MS0023153.S1.mol2
Trying file MS0033023.S1.mol2
Trying file MS0048677.S1.mol2
Trying file MS0051036.S1.mol2
Trying file MS0054531.S1.mol2
Trying file MS0056306.S1.mol2
Trying file MS0056310.S1.mol2
Trying file MS0056313.S1.mol2
Trying file MS0056314.S1.mol2
Trying file MS0056315.S1.mol2
Trying file MS0056317.S1T1.mol2
Trying file MS0056317.S1T2.mol2
Trying file MS0056320.S1.mol2
Trying file MS0056321.S1.mol2
Trying file MS0056322.S1T1.mol2
Trying file MS0056322.S1T2.mol2
Trying file MS0056322.S1T3.mol2
Trying file MS0056323.S1.mol2
Trying file MS0056323.S2.mol2
Trying file MS0056324.S1.mol2
Trying file MS0056328.S1.mol2
Number of files kept: 6
Check the file filtermol2.log for details
%

II.4. Get a list of Autodock 4 atom types used in a ligand library

The C-shell script get_typlist.csh reads all the ligand structures in a directory, extracts a list of Autodock 4 atom types and creates a file all.pdbqt with atoms having all the atom types found. This file will be needed for the screening run.

Running get_typlist.csh requires the executable get_typlist and the template file all0.pdbqt. It also uses Python and thus has to set the path to it.

A sample run looks like this:

 % get_typlist.csh 
Gather atom types used in this library - written by Mihaly Mezei
Version 10/22/2007 - Autodock 4
Running Thu Oct 25 11:23:52 EDT 2007

Running on faradis.mssm.edu
Host specific information (to be customized):
    MGLROOT = /Library/MGLTools/1.4.5
    pythonutil = /Library/MGLTools/1.4.5/MGLToolsPckgs/AutoDockTools/Utilities24
End host specific information

Name of the ligand  database file contaning the .mol2 files=../mol2_files_t
/private/var/automount/Users/mezei/autodock4/mol2_files_t
Number of files found in ../mol2_files_t : 3
 get_typlist: file 000007.pdbqt opened OK
 get_typlist: file all0.pdbqt opened
 Read  26  lines from 000007.pdbqt
 Read  24 lines from all0.pdbqt
 Types found so far= C  OA
 Types found so far= C  OA A  Cl NA HD
 get_typlist: file all0.pdbqt opened as new
 mol2 file No: 1
 get_typlist: file 000008.pdbqt opened OK
 get_typlist: file all.pdbqt opened
 Read  34  lines from 000008.pdbqt
 Read  24 lines from all.pdbqt
 Types found so far= C  OA A  Cl NA HD
 Types found so far= C  OA A  Cl NA HD N 
 get_typlist: file all.pdbqt opened as new
 mol2 file No: 2
 get_typlist: file 000009.pdbqt opened OK
 get_typlist: file all.pdbqt opened
 Read  21  lines from 000009.pdbqt
 Read  24 lines from all.pdbqt
 Types found so far= C  OA A  Cl NA HD N 
 mol2 file No: 3
 get_typlist: file 000011.pdbqt opened OK
 get_typlist: file all.pdbqt opened
 Read  23  lines from 000011.pdbqt
 Read  24 lines from all.pdbqt

Final all.pdbqt file (copied to /private/var/automount/Users/mezei/autodock4/alkynes):
REMARK  3 active torsions:
REMARK  status: ('A' for Active; 'I' for Inactive)
ROOT
ATOM      1  C1  <0> d           1.327   3.455  -0.398  0.00  0.00     0.036 C
ATOM      2  O2  <0> d           1.420   1.980  -0.001  0.00  0.00    -0.032 OA
ATOM      3  C1  <0> d           2.376   1.251  -0.914  0.00  0.00     0.031 A
ATOM      4 C12  <0> d           3.741   1.077  -0.220  0.00  0.00     0.035 Cl
ATOM      5  N5  <0> d           4.048   2.342   0.567  0.00  0.00     0.013 NA
ATOM      6  H19 <0> d           3.145   2.387   1.810  0.00  0.00     0.008 HD
ATOM      7  N9  <0> d           1.805   1.812   1.483  0.00  0.00     0.013 N
ATOM      8  C8  <0> d           0.592   2.429   2.198  0.00  0.00     0.027 C
ATOM      9  C9  <0> d          -0.610   1.949   1.340  0.00  0.00     0.006 C
ATOM     10  C10 <0> d          -0.017   1.420   0.010  0.00  0.00     0.240 C
ATOM     11  C13 <0> d           3.178   3.801   2.370  0.00  0.00     0.033 C
ATOM     12  C14 <0> d           4.601   4.036   2.906  0.00  0.00     0.070 C
ATOM     13  C15 <0> d           5.624   3.720   1.831  0.00  0.00    -0.059 C
ATOM     14  C16 <0> d           6.588   4.614   1.625  0.00  0.00    -0.092 C
ATOM     15  C17 <0> d           7.646   4.378   0.641  0.00  0.00     0.383 C
ATOM     16  O2  <0> d           8.375   5.279   0.269  0.00  0.00    -0.452 OA
ATOM     17  C18 <0> d           7.797   2.971   0.104  0.00  0.00     0.029 C
ATOM     18  C19 <0> d           6.408   2.416  -0.189  0.00  0.00     0.030 C
ATOM     19  C20 <0> d           5.503   2.449   1.031  0.00  0.00     0.172 C
ENDROOT
TORSDOF 0
Processed 23 .mol2 files
%

II.5. Replace the .mol2 partial charges with AM1-calculated charges

The C-shell script gausscharge.csh runs Gaussian for all ligands in a directory, and replaces the partial charges with the result of the Mulliken population analysis of their AM1 wavefunction. Optionally, the conformation can be minimized with Gaussian. Running gausscharge.csh requires the C-shell script mol2togauss.csh and the executables gausstomol2 and mol2togauss. The program mol2togauss prepares the input file for Gaussian, and gausstomol2 extracts the charges from the Gaussian output file and replaces the values in the .mol2 file.

The programs mol2togauss and gausstomol2 can be run independently. The first command-line argument is the full name of the .mol2 file. mol2togauss takes the second argument as an indicator for optimization: it will set up input for an optimization run if the first character is 'y' or 'Y'. For example,
% mol2togauss test.mol2 yes
will create an input file test.mol2.g99 from the structure in test.mol2 asking for optimization.
% gausstomol2 test.mol2
will replace the charges and coordinates of test.mol2 by the charges and coordinates calculated by Gaussian (read from the Gaussian output file test.mol2.g99out) and write a new mol2 file called test.mol2.am1.

A sample run looks like this:

% gausscharge.csh
Name of the directory containing the ligand .mol2 files=hits_mol2
Name of the directory to put the converted .mol2 files=hits_mol2_am1
First molecule to use [1]=
Last molecule to use [1000000]=
Name of the Gaussian executable [gaussian]=
Is your Gaussian version 03 or higher? (y/n)? n
Do you want to remove all Gaussian files (y/n)? n
Number of CPUs to use [10]=4
Do you want to optimize the structure as well (y/n)? n
Number of CPUs to be used: 4
Current directory: /hosts/fulcrum/home/mezei/MOLMOD/autodock/morph
 
Converting ligands in directory hits_mol2 to directory hits_mol2_am1
Number of files found in hits_mol2 : 2
Starting conversion for 5236332.1.mol2
WARNING:
Reserve directory  /hosts/pepi/reserve/mezei is not found
/hosts/pepi/scr/mezei  will be used instead
more WARNING:
You do not even have a scr directory here, the current directory:
/hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph  will be used instead

II.6. Creating an eHiTS clip file from Autodock grid information

The program make_clipfile.f can read the description of an Autodock grid (easily obtainable with AutoDockTools) and create a clip file that eHiTS can read. make_clipfile can be run independently or can be called from fullscreen.csh, the script setting up a virtual screening.

When run interactively, the user has the option to also write CONECT records (so the box edges will show in the graphics display) and to write a skeleton .gpf file that can be read by the fullscreen.csh script.

An interactive run of make_clipfile looks like this:

 >make_clipfile.exe
 Creating a clip box pdb file - Version 04/25/2014
 Macromolecule=kinase
 Writing clip file kinase_clip.pdb
 X coordinate of the box center [ 0.00000 ]=41
 Y coordinate of the box center [ 0.00000 ]=22
 Z coordinate of the box center [ 0.00000 ]=38
 Number of gridpoints in the X direction [   0]=60
 Number of gridpoints in the Y direction [  60]=66
 Number of gridpoints in the Z direction [  60]=60
 Grid size [ 0.37500 ]=
 Do you want CONECT record in the clip file (y/n)[n]? y
 Do you want a skeleton .gpf file (y/n)[n]? y
 Writing skeleton .gpf file kinase_0.gpf
>

Note that by default eHiTS extends the box thus defined by 10 Å in each direction to define the target atoms that are included in the docking calculations.
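
The relation between the grid parameters entered above and the resulting box can be illustrated with a short Python sketch. It assumes the usual Autodock convention that the box spans (number of gridpoints) x (grid spacing) along each axis and is centered on the grid center; whether make_clipfile uses exactly this definition is an assumption. The function names grid_box and box_corners are hypothetical, and the margin argument only mimics the 10 Å extension mentioned above.

```python
import itertools


def grid_box(center, npts, spacing=0.375, margin=0.0):
    """Return the (lower, upper) corners of a box centered on `center`.

    Assumption: the box edge along each axis is npts * spacing, optionally
    extended by `margin` (in A) in each direction.
    """
    half = [n * spacing / 2.0 + margin for n in npts]
    lower = tuple(c - h for c, h in zip(center, half))
    upper = tuple(c + h for c, h in zip(center, half))
    return lower, upper


def box_corners(lower, upper):
    """Return the eight corner points of the box."""
    return list(itertools.product(*zip(lower, upper)))


# Values taken from the interactive run shown above.
lower, upper = grid_box((41.0, 22.0, 38.0), (60, 66, 60))
extended = grid_box((41.0, 22.0, 38.0), (60, 66, 60), margin=10.0)
```

For the run above this gives a box from (29.75, 9.625, 26.75) to (52.25, 34.375, 49.25); with the 10 Å extension the box runs from (19.75, -0.375, 16.75) to (62.25, 44.375, 59.25).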

II.7. Aggregating individual ligand files to a single file

The program eHiTS requires the ligands in a single file. The program aggregate can create such a file from a directory of individual ligand files. The program can be run either with command-line input (from the directory where the files are):

$ ../aggregate -fn ../AGG.mol2 -ex mol2 -mn a -sm 1
 Aggregating ligand files into a single file  - Version 11/20/2013
 All files with extension mol2 in the current directory will be aggregated
 into file ../AGG.mol2
 Actions taken will be logged on file ../AGG.mol2.log
 Files with filenames of the form X.M.Y will be skipped if only M changed
 The ligand file name will be added to the molecule name
 Processed       21 ligands
 Number of ligands skipped=       4
$

or interactively:

$ ../aggregate
 Aggregating ligand files into a single file  - Version 11/20/2013
 Aggregate ligand file name=../AGG.mol2
 Ligand file extension=mol2
 Do you want to skip files with just middle extension changed (y/n) [n] y
 Select molecule name treatment option
 Leave the molecule name read <U>nchanged . . . . . . : u
 <R>eplace the molecule name with the ligand file name: r
 <A>dd the ligand file name to the molecule name . . .: a [u] a
 Output file appears to exist - do you wan to replace it (y/n) [n] y
 Log file appears to exist - do you wan to replace it (y/n) [n] y
 All files with extension mol2 in the current directory will be aggregated
 into file ../AGG.mol2
 Actions taken will be logged on file ../AGG.mol2.log
 Files with filenames of the form X.M.Y will be skipped if only M changed
 The ligand file name will be added to the molecule name
 Processed       21 ligands
 Number of ligands skipped=       4
$

The following command-line options are implemented:

-fn NAME: Aggregate file name
-ex EXT: Ligand file extension (mol2 or sdf or pdb)
-mn C: Molecule name option (for mol2 files only):
- C=u: molecule names are kept as read after the MOLECULE tag
- C=a: File name will be added to the molecule name read
- C=r: File name replaces the molecule name read
-sm C: C=1: for ligand files of the form X.M.E skip the file if only M changed; C=0: keep all ligands
-st C: C=1: for ligand files of the form X.SnTm.E skip the file if only m changed; C=0: keep all ligands
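
The effect of the -sm and -st options can be illustrated with the following Python sketch. It is not the code of aggregate: the function select_ligands is hypothetical, and it assumes that files are processed in alphabetical order and that the first file of each group is the one kept.

```python
import re


def select_ligands(names, skip_middle=False, skip_tautomers=False):
    """Split file names of the form X.M.E into (kept, skipped) lists.

    skip_middle    (like -sm 1): keep one file per X.E, whatever M is.
    skip_tautomers (like -st 1): for M of the form SnTm, keep one file per
                                 X.Sn.E, whatever m is.
    Assumption: the first file in alphabetical order is the one kept.
    """
    kept, skipped, seen = [], [], set()
    for name in sorted(names):
        parts = name.split(".")
        key = name  # by default every file is its own group
        if len(parts) == 3:
            head, middle, ext = parts
            if skip_middle:
                key = (head, ext)
            elif skip_tautomers:
                match = re.fullmatch(r"(S\d+)T\d+", middle)
                if match:
                    key = (head, match.group(1), ext)
        if key in seen:
            skipped.append(name)
        else:
            seen.add(key)
            kept.append(name)
    return kept, skipped


# A few file names from the sample library used in this document.
names = [
    "MS0056317.S1T1.mol2", "MS0056317.S1T2.mol2",
    "MS0056323.S1.mol2", "MS0056323.S2.mol2", "MS0056324.S1.mol2",
]
kept_sm, skipped_sm = select_ligands(names, skip_middle=True)
kept_st, skipped_st = select_ligands(names, skip_tautomers=True)
```

Here skip_middle drops MS0056317.S1T2.mol2 and MS0056323.S2.mol2, while skip_tautomers drops only MS0056317.S1T2.mol2.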

III. Running the screening

III.1. Submitting the job(s)

The screening script fullscreen.csh requires individual .mol2 or .pdbqt or .sdf files in a single directory. It also requires a sample file all.pdbqt for Autodock 4 containing all the atom types that are used in the library. These files are provided in the distribution - see also the script get_typlist.csh.

There are placeholders in the script to specify the path to the

Autodock, PLANTS or eHiTS executables
Python executable (Pythonsh)
Python library (Pythonutil)
Queue name to submit the jobs to
Maximum number of CPUs allowed to be used

If these placeholders are not set in the script for the system on which the screening is to be run, the user will be prompted for them.

The script fullscreen.csh asks the user to specify

The docking software to be used
The macromolecule file name macro (without the .pdbqt, .pdb, .mol2, or .mae extension) to be run.
The ligand library directory path or ligand file name
The number of CPU's to use
Optionally, for Autodock 4 or Vina, the flexible macromolecule file name
The number of dockings per ligand
The GA algorithm parameters
The grid parameters (center, sizes, gridsize)
For ligands in .mol2 format, options are provided for filtering out ligands
The extent of cleanup in the docking directory

Running fullscreen.csh creates the directories needed for docking:

macro_dock_A for Autodock 4
macro_dock_V for Autodock Vina
macro_ehits and macro_work for eHiTS
macro_dock_P for PLANTS
macro.glide for Glide

as well as a log file macro_<sw>.log. Here sw is a 1-character symbol specifying the software used for docking (see the documentation of the program Dockres). This is achieved by calling the script screenlist_loop.csh or screen_setup.csh for Autodock 4, Autodock Vina, and PLANTS; by calling ehits.sh for eHiTS; and by calling the requisite Schrodinger utilities for Glide. Note that ehits.sh (part of the eHiTS distribution) has been modified to accept a few more parameters; both the modified and the original ehits.sh files are included in the distribution.

screenlist_loop.csh and screen_setup.csh run on systems with the SGE, PBS or LSF queuing system, on any shared-memory Unix/Linux system, or on any Unix/Linux system using a single CPU. For other queuing systems the scripts have to be extended.

For Autodock 4, these C-shell scripts prepare the input for Autogrid, run it and for each ligand prepare the corresponding input file for Autodock and run it. They use the executable filtermol2 and a system-dependent script submit_*.csh that calls Autodock.

Autodock Vina does not need a separate Autogrid run, just the preparation of the ligand input file(s) in .pdbqt format. There are submit scripts for three queuing systems:

dockit_SGE.csh for systems using the Sun Grid Engine (SGE)
dockit_PBS.csh for systems using the Portable Batch System (PBS)
dockit_LSF.csh for systems using the Load Sharing Facility (LSF)

Once the grid maps are ready (if needed), one of the following actions can be taken, depending on the user's choice:

Submit the allowed number of docking jobs to run on different CPUs. The script creates a directory runcount where each job creates a file when it starts and deletes it when it ends. On a shared-memory system the number of files in runcount is also used to control the submission of subsequent jobs, while runs on the PBS, SGE or LSF queuing systems use the result of a qstat statement. Note that the number of CPUs to be used can be modified during the run; see the instructions printed in the sample run below.
For Autodock runs, the script can be asked to run only Autogrid and the various scripts preparing the input for the docking runs (.pdbqt and .dpf files for each ligand) without running the docking jobs. In that case, the user has the option to
- Copy the directories prepared (after tar and compress) to a different system (e.g., a supercomputer) and just run it there. That system only has to have Autodock installed, but not the script libraries.
- Prepare the input file (list of run commands) for the TACC launcher utility or the Mount Sinai selfscheduler. The former needs to run subsequently the script Launcher.sge on TACC/Lonestar while the latter needs to execute the script selfsched.csh.
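
The runcount bookkeeping used on shared-memory systems can be sketched as follows. This Python fragment only illustrates the mechanism (one marker file per running job, with submission held back while the number of markers is at the limit); it is not taken from the C-shell scripts. The names RunMarker, running_jobs and wait_for_slot are hypothetical, and for simplicity the marker is removed whenever the job ends, whether or not it succeeded.

```python
import time
from pathlib import Path


def running_jobs(runcount):
    """Number of marker files, i.e. of jobs currently running."""
    return sum(1 for p in Path(runcount).iterdir() if p.is_file())


def wait_for_slot(runcount, max_jobs, poll_seconds=10.0, sleep=time.sleep):
    """Block until fewer than max_jobs marker files are present."""
    while running_jobs(runcount) >= max_jobs:
        sleep(poll_seconds)


class RunMarker:
    """Create <job_name>.run in runcount on entry and delete it on exit."""

    def __init__(self, runcount, job_name):
        self.path = Path(runcount) / f"{job_name}.run"

    def __enter__(self):
        self.path.touch()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.path.exists():
            self.path.unlink()
        return False  # do not suppress errors raised by the job
```

A docking job would then be wrapped as `with RunMarker("runcount", "ligand_1"): ...`, with `wait_for_slot("runcount", ncpu)` called before each submission.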

When an Autodock-4 run is requested, the script also asks if screening by Vina and by eHiTS is requested as well. When an Autodock-Vina run is requested, the script also asks if screening by eHiTS is requested as well.

A sample run looks like this:

 % fullscreen.csh
                Automated screening using Autodock
             Written by Mihaly Mezei - version 06/02/2010

Select docking software:
Autodock-4   : 4
Autodock-Vina: v
eHiTS:         e 4
Docking with Autodock-4 selected
Will the grid have more than 128 points in any direction (y/n)? n
Host is unrecognized. Select your queing system from the list below
SUN grid engine:       g
PBS queuing system     p
TACC Launcher:         l
Single CPU (no que):   1
Multiple CPU (no que): s
None of the above:     n
Your choice (g/p/l/1/s/n): g
Path to the Python executable (pythonsh)=/share/apps/MGLTools-1.5.4/bin/pythonsh
Path to the Python utilities (pythonutil)=/share/apps/MGLTools-1.5.4/MGLToolsPckgs/AutoDockTools/Utilities24
Path to the directory where the Autodock executables are=/share/apps/autodock4mezei/autodocksuite-4.0.1/bin/i86Linux2
Maximum number of CPU's to use allowed=200
Name of the que to submit the jobs=orte
NOTE: you can replace the placeholder NEWHOST in this script with your host
and make the assignments within the script (instead of entering it interactively)
Name of the macromolecule file (without the .pdbqs or .pdbqt or .pdb)=target
target.pdbqt found
Do you have flexible residues (y/n) [n]? n
This script requires the following files to be present in this directory:

     The executable program filtermol2
     The executable program checkresnum
     The sample structure file with all atom types all.pdbqt
     The awk script mod_gpf.awk
     The csh script add_dpf.csh
     The c-shell script screenlist_loop_4.csh
     The c-shell script dockit_gridengine.csh

Checking checkresnum
Checking filtermol2
Checking mod_gpf.awk
Checking mod_dpf.awk
Checking screenlist_loop_4.csh
Checking dockit_gridengine.csh
 Checking residue numbers in file target.pdbqt
 Residue number check - Version 09/23/08
Number of CPUs to use [20]=100
Do you want to adjust the number of CPUs automatically (y/n)? y
Minimum number of CPUs to use [30]=
NOTE: The automatic adjustment can be turned off via a file target_A.NEWNCPU
First molecule to use [1]=
Last molecule to use [99999999]=
Name of the directory containing the ligand files=../CHEMBR_am1_opt_mol2
Number of dockings per ligand (50 or more recommended) [50]=100
RMSD tolerance for clustering (in A) [1.0]=
rmstol set to 1.0
Maximum number of GA energy evaluations (0 torsions) [250000]=
Maximum number of GA energy evaluations (1-2 torsions) [500000]=
Maximum number of GA energy evaluations (3-5 torsions) [1000000]=
Maximum number of GA energy evaluations (6-10 torsions) [2000000]=
Maximum number of GA energy evaluations (11-  torsions) [3000000]=
Maximum number of GA generations [27000]=
Maximum number of GA populations [200]=
Ligand charge option
     Add Gasteiger charges : g
     Add Kollman   charges : k
     Keep input    charges : i [g]

X[Grid center] (in A) [0]=10
Y[Grid center] (in A) [0]=3
Z[Grid center] (in A) [0]=1
Grid spacing (in A) [0.375]=
gridspace set to 0.375
Number of gridpoints in the X direction (even number)=100
Number of gridpoints in the Y direction (even number) [100]=
Number of gridpoints in the Z direction (even number) [100]=
Do you want to ignore symmetry for ligand pose clustering (y/n) [n]?
Do you want to print all members of the clusters (y/n) [n]?
Do you want to drop files with names of the form H.M.T
       where only the M part differs (y/n) [n]?
Do you want to partition the number of dockings for files  of the form H.M.T
       where only the M part differs (y/n) [y]?
Minimum number of docking attempts per copy [20]=
Are the ligand files already in .pdbqt form (instead of .mol2) (y/n) [n]
Do you want to only run the setup but skip docking for now (y/n) [n]?
Do you want to remove all ligand .mol2, .pdbqt, .dpf  files (y/n) [n]? y

All ligand .mol2, .pdbqt, .dpf files will be removed
Docking ligands in directory ../CHEMBR_am1_opt_mol2 to macromolecule target.pdbqt
RMSD tolerance for clustering: 1.0 A
Files with names of the form H.M.T where only the M part differs will
share the number of docking attempts
Minimum number of docking attempts per copy= 20
Number of dockings/ligand: 100
Maximum number of GA energy evaluations (0 torsions)=250000
Maximum number of GA energy evaluations (1-2 torsions)=500000
Maximum number of GA energy evaluations (3-5 torsions)=1000000
Maximum number of GA energy evaluations (6-10 torsions)=2000000
Maximum number of GA energy evaluations (11- torsions)=3000000
Maximum number of GA generations: 27000
Maximum number of GA populations: 200
Grid center at < 10 , 3 , 1 >
Number of gridpoints in the x, y, and z direction: 100, 100, and 100
Grid spacing = 0.375 A
Gasteiger charges will be added to the ligand
Grids will be generated for atom types (based on all.pdbqt):
1 C
2 OA
3 A
4 N
5 HD
6 Cl
7 NA
8 SA
9 P
10 F
11 Br
12 S
13 I

Number of CPUs to be used: 100
The number of CPUs to use will be adjusted automatically every hour
Calculations will be logged on file target_4.log
The rm command (to delete a file) is aliased to
For proper functioning of this script, the alias will be removed
thus files will be removed without confirmation.
OK to submit the run (y/n)? y
[1] 23784
 %

In this example, the script did not recognize our host and prompted for the system-dependent information. There is a template in the script to enter this data for a particular host; this way the user will not be prompted for that information.

The screening with eHiTS is also initiated with the fullscreen.csh script. A sample run looks like this:

 % fullscreen.csh
                Automated screening using Autodock
             Written by Mihaly Mezei - version 06/02/2010

Select docking software:
Autodock-4: 4
eHiTS:      e e
Docking with eHiTS selected
Host is unrecognized. Select your queing system from the list below
SUN grid engine:       g
PBS queuing system     p
TACC Launcher:         l
Condor pool:           c
Single CPU (no que):   1
Multiple CPU (no que): s
None of the above:     n
Your choice (g/p/l/c/1/s/n): g
Path to the directory where the script ehits.sh is=../eHiTS
Maximum number of CPU's to use allowed=200
Name of the que to submit the jobs=orte
NOTE: you can replace the placeholder NEWHOST in this script with your host
and make the assignments within the script (instead of entering it
interactively)
Name of the macromolecule file (without the .pdb)=target
target.pdb found
New directory target_ehits has been created

Checking checkresnum
 Checking residue numbers in file target.pdb
 Residue number check - Version 02/03/10
Number of CPUs to use [20]=100
Docking accuracy (1-6)[6]=
Do you want to create a clipfile from ATD box info (y/n) [y]?
X[Grid center] (in A) [0]=3
Y[Grid center] (in A) [0]=-1
Z[Grid center] (in A) [0]=
Grid size (in A) [0.375]=
Number of gridpoints in the X direction=88
Number of gridpoints in the Y direction [88]=90
Number of gridpoints in the Z direction [88]=70
Clip file target_clip.pdb has been successfully created
Name of the file (with FULL PATH) containing the ligand files=/scratch/1/Users/mezeim01/autodock4/FDA_2k_agg.mol2
Ligand files in file /scratch/1/Users/mezeim01/autodock4/FDA_2k_agg.mol2 will be used
Ligands are expected in mol2 format
Best pose for each ligand (that was docked successfully) will be in the file
/scratch/1/Users/mezeim01/autodock4/test/target/results.sdf
For analysis with Dockres, run splitmol to create individual .sdf files
Working directory for docking: /scratch/1/Users/mezeim01/autodock4/test/target_work
Clip file: target_clip.pdb
OK to submit the run (y/n)? y
[1] 23784
 %

The screening with Glide is also initiated with the fullscreen.csh script. The script optionally calls the prepwizard utility if the target is not in .mae format and the ligprep utility if the ligands are not available in .mae format. The Glide run setup allows the user to select the docking mode, the precision, the number of poses per ligand to save and the range of ligand numbers to use.

A sample run looks like this:

% fullscreen.csh
       Automated screening using Autodock, Vina, Glide, PLANTS or eHiTS
         Written by Mihaly Mezei - Version 09/01/2015

NOTE: selecting Autodock-4 gives you the option to run Vina and eHiTS as well
      selecting Autodock-Vina gives you the option to run eHiTS as well

Select docking software:
Autodock-4   :   4
Autodock-Vina:   v
Glide        :   g
PLANTS       :   p
eHiTS:           e g
Docking with Schrodinger Glide selected
Please be sure to have a host file 'schrodinger.hosts'
in ~/.schrodinger or job's working directory.
https://hpc.mssm.edu/about/schrodinger

$SCHRODINGER : SCHRODINGER home directory
$eMolDB      : eMolecules DB directory
Name of the macromolecule file (without the .mae, .maegz or .pdb)=TSHR_TMD
Log file TSHR_TMD_glide_setup.log is present - do you want to overwrite it (y/n)? y
Removing TSHR_TMD_glide_setup.log
TSHR_TMD.mae found
Server is Minerva - distributed memory system using the LSF queuing system
Name of the file containing the ligands (without the .sd, .sdf, or .mae)=dyrka1_l
dyrka1_l.mae found
File dyrka1_l.mae contains 18 ligands
Grid file TSHR_TMD.grd is found do you want to use it (y/n) [y]?

Name of the account to run the dockings on=acc_sbdd
Name of the queue to run the dockings [alloc]?
Number of CPUs to use [20]=10
First molecule to use [1]=
Last molecule to use [99999999]=
Existing TSHR_TMD.grd file will be used
Partition to run Glide (manda/mothra/bode) (a/o/b)? a
Time limit (in whole hours) [24]=12
Docking method (flexible/rigid ligand) (f/r)[f]?
Precision (extra/standard/high thougput) (x/s/h)[s]? x
RMSD (in A) for pose clustering [0.5]? 1.0
Number of poses/ligand to write out [1]? 5
Do you want the output poses compressed (y/n)[n]?
cd $LS_SUBCWD run_glide.csh
TSHR_TMD.glide.in:
RECEP_FILE TSHR_TMD.mae
GRIDFILE TSHR_TMD.grd
LIGANDFILE dyrka1_l.mae
DOCKING_METHOD confgen
PRECISION XP
POSE_RMSD 1.0
POSES_PER_LIG 5
COMPRESS_POSES FALSE
Job submission file run_glide.csh is ready in the directory TSHR_TMD.glide
If you want to add/change options then select stop, edit run_glide.csh
and submit it with the command
bsub -J TSHR_TMD.G < run_glide.csh
Do you want to submit the Glide job (y/n) [y]? y
Glide run files will be in the directory TSHR_TMD.glide:
total 33280
-rw-rw-r--. 1 mezeim01 sbdd      157 Sep  1 12:46 TSHR_TMD.glide.in
-rw-rw-r--. 1 mezeim01 sbdd 32381620 Sep  1 12:46 TSHR_TMD.grd
-rw-rw-r--. 1 mezeim01 sbdd   743676 Sep  1 12:46 TSHR_TMD.mae
-rw-rw-r--. 1 mezeim01 sbdd   743979 Sep  1 12:46 TSHR_TMD_recep.mae
-rwxrwxr-x. 1 mezeim01 sbdd      184 Sep  1 12:46 run_glide.csh
%

This will create the Glide input file $macro.glide.in and the job submission file run_glide.csh in the directory $macro.glide. Additional docking options can be added manually to the file $macro.glide.in. Note also that run_glide.csh is written for the LSF queuing system. For other queuing systems it has to be changed manually.
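The keyword file listed in the sample run can be sketched as a simple mapping from the interactive answers to Glide keyword/value lines. This is only an illustration, not the code of the setup script: the function name glide_input and the two lookup tables are hypothetical, and of the keyword values only confgen and XP are confirmed by the sample run. The values rigid, SP and HTVS are assumptions.

```python
# Hypothetical helper: maps the interactive answers of the sample run to
# Glide keyword/value lines, formatted as in TSHR_TMD.glide.in.

# Assumed mappings; only "confgen" and "XP" appear in the sample run.
DOCKING_METHOD = {"f": "confgen", "r": "rigid"}
PRECISION = {"x": "XP", "s": "SP", "h": "HTVS"}


def glide_input(macro, ligand_file, method="f", precision="s",
                pose_rmsd=0.5, poses_per_lig=1, compress=False):
    """Return the text of a Glide input file, one 'KEYWORD value' per line."""
    pairs = [
        ("RECEP_FILE", macro + ".mae"),
        ("GRIDFILE", macro + ".grd"),
        ("LIGANDFILE", ligand_file),
        ("DOCKING_METHOD", DOCKING_METHOD[method]),
        ("PRECISION", PRECISION[precision]),
        ("POSE_RMSD", str(pose_rmsd)),
        ("POSES_PER_LIG", str(poses_per_lig)),
        ("COMPRESS_POSES", "TRUE" if compress else "FALSE"),
    ]
    return "\n".join(key + " " + value for key, value in pairs) + "\n"


if __name__ == "__main__":
    # Reproduces the answers given in the sample run above.
    print(glide_input("TSHR_TMD", "dyrka1_l.mae", method="f", precision="x",
                      pose_rmsd=1.0, poses_per_lig=5), end="")
```

Keeping the keywords in an ordered list makes it easy to append further options by hand, in the same way as editing $macro.glide.in directly.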

III.2. Tracking the jobs

The scripts screenlist_loop_*.csh keep messages from the running jobs in the directory runcount. If a docking fails to complete, the file (with extension run) will be copied into the directory where the docking log files are kept. If the script screenlist_loop_*.csh aborts, the file corresponding to this job (which would be deleted when the job exits normally) may be left behind. The script cleanruncount.csh checks for such files and offers to delete them.

III.3. Controlling the number of CPUs during the run

Except for eHiTS, for screenings that don't use the selfscheduler or the launcher, the number of CPUs used by a job (i.e., the number of ligands docked simultaneously on different CPUs) can be controlled by creating a file called macro_<sw>.NEWNCPU in the directory running the job, where <sw> is A, V, or P for Autodock-4, Autodock-Vina, or PLANTS, respectively. This file should have one line, containing

a number - the new number of CPUs. However, if the automatic adjustment is turned on, the number will be ignored. If reduction is requested the change is executed by attrition: the next job will only be submitted when the currently running jobs number less than the new limit.
optionally, one of the words STOP, YES or NO.
- If STOP is present, the script exits, i.e., no more docking will be submitted, but the ones running will continue until completed.
- If YES is present then the automatic adjustment of CPU numbers to use will be turned on (if it was off)
- If NO is present then the automatic adjustment of CPU numbers to use will be turned off (if it was on)
optionally, another number: the minimum number of jobs to run (when the automatic adjustment is on)
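The control line described above can be parsed as in the following minimal sketch. It assumes that the items are separated by white space and that the keyword is written in upper case, neither of which is stated explicitly. The names parse_newncpu and may_submit are hypothetical and are not part of the package.

```python
def parse_newncpu(line):
    """Parse the single line of a macro_<sw>.NEWNCPU control file.

    Returns a dict with the keys 'ncpu', 'command' and 'min_jobs';
    'command' and 'min_jobs' are None when absent from the line.
    """
    tokens = line.split()  # assumption: items are separated by white space
    if not tokens:
        raise ValueError("empty control line")
    result = {"ncpu": int(tokens[0]), "command": None, "min_jobs": None}
    for token in tokens[1:]:
        if token in ("STOP", "YES", "NO"):
            result["command"] = token
        else:
            # Any other item is taken as the minimum number of jobs.
            result["min_jobs"] = int(token)
    return result


def may_submit(running, limit):
    """Attrition rule: submit the next docking only while fewer than
    `limit` jobs are running."""
    return running < limit
```

For example, parse_newncpu("12 YES 4") requests 12 CPUs, turns the automatic adjustment on and sets the minimum number of jobs to 4. With a reduced limit of 10, may_submit(10, 10) is false, so no new docking starts until one of the running jobs finishes.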

IV. Post-processing

These functions include the extracting, sorting, filtering and analyzing of the results, as well as some housecleaning.

IV.1. Gather a list of docking log files

The script getdir.csh looks into the directory of docking log files and extracts the name of the grid parameter file and the names of all the log files. This information is written into a file macro_<sw>.dir and used by the program Dockres. macro_<sw>.dir is a simple text file; its format is described in the documentation of the program Dockres.

A sample run looks like this:

% getdir_new.csh
Select docking software:
Autodock-4   : 4
Autodock-Vina: v
eHiTS:         e
GOLD:          g
Docking with Autodock-4 selected
Name of the macromolecule file (without the .pdbq*)=cbx_h
cbx_h.pdbqt found
Existing cbx_h_A.dir file is removed
Directory of .dlg files: cbx_h_dock - OK (y/n)? y
Number of files found in cbx_h_dock : 2114
Number of cbx_h .dlg files found in cbx_h_dock : 208
%

The run above read a directory that was prepared by the fullscreen.csh script. For directories that were obtained in a different manner the user has to specify the directory name and possibly the name and the location of the grid-parameter file (.gpf).
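The counting step reported in the sample run can be sketched as follows. This is an illustration only: the function list_logs is hypothetical, and it assumes that the log files of a target start with the name of the macromolecule and end in .dlg. It does not write the macro_<sw>.dir file, whose format is given in the Dockres documentation.

```python
import os


def list_logs(directory, macro, extension=".dlg"):
    """Return (total number of files, sorted log-file names for `macro`).

    os.scandir yields the entries one at a time, so no shell wildcard
    has to be expanded, however many files the directory holds.
    """
    total = 0
    logs = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            total += 1
            # Assumed naming convention: <macro>...<extension>
            if entry.name.startswith(macro) and entry.name.endswith(extension):
                logs.append(entry.name)
    return total, sorted(logs)
```

Calling list_logs("cbx_h_dock", "cbx_h") would return the two numbers printed by the script (all files, and the .dlg files of the target) in the form of a count and a list.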

For runs with eHiTS there are two options. Either the single file results.sdf, which contains the top-scoring pose for each ligand, can be split into single-ligand pose files with the program splitmol, or the script getdir.csh can be asked to gather a list of the .sdf files containing all the poses saved by eHiTS.

IV.2. Extract, sort, filter and analyze docked poses

The program Dockres gathers the top binders and displays a variety of statistics, both on the ligand set and on the top binding poses. It writes the result in a file with extension .res and, if requested, extracts docked poses into .pdb files. Since it can be run independently of these utilities, it has separate documentation.
If Dockres extracted a PDB file with the target and a number of ligands, the program makepml.f can make a Pymol log script that defines each extracted ligand pose as a separate object, making it easy to show/hide them.
The c-shell script cantotan.csh creates a similarity matrix from a list of top-scoring ligands generated by Dockres. Simulaid can read this matrix and run clustering on it. The script asks for the list of canonical SMILES of the ligands in the library scanned and calculates the similarity matrix with the pySIML software.
The program Compligset can run various operations on a set of top-scoring ligands (e.g., average, overlap, difference).
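To illustrate what a similarity matrix over canonical SMILES looks like, the sketch below computes a Tanimoto coefficient over the overlapping four-character substrings (LINGOs) of each SMILES string. It is a plain-Python stand-in written under that assumption, not the pySIML interface and not the code of cantotan.csh. The preprocessing of SMILES strings that LINGO methods usually apply is omitted, and all function names are hypothetical.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Count the overlapping q-character substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def tanimoto(a, b):
    """Multiset Tanimoto coefficient between two substring counters."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    # Strings shorter than q have no substrings; report 0.0 in that case.
    return shared / union if union else 0.0


def similarity_matrix(smiles_list):
    """Return the symmetric matrix of pairwise similarities as nested lists."""
    counts = [lingos(s) for s in smiles_list]
    return [[tanimoto(x, y) for y in counts] for x in counts]
```

Identical SMILES strings give a similarity of 1 and strings with no common substring give 0, so the resulting matrix can be converted to distances (1 - similarity) before clustering.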

IV.3. 'House-cleaning' of the docking log directory

When a directory contains a large number of files the ls command in the C shell is known to fail. To circumvent this, the following scripts and programs clean, copy, compress and uncompress files in a directory, irrespective of the number of files there:

compressdir.csh: Compresses or uncompresses all files in a directory, using either gzip or compress.
clean_dock_dir.csh: Removes all files with extension mol2, new, pdbq, pdbqt, or dpf. These are the extensions of files that the screening script fullscreen.csh creates. The removal will also take place if the files are gzip-ed (i.e., have an additional .gz extension).
clean_ehits.csh: Removes most log files and empty directories from the work directory tree.
del_ext.csh: Deletes all files with a user-defined extension and/or header in a directory. The removal will also take place if the files are gzip-ed (i.e., have an additional .gz extension).
copy_ext.csh: Copies all files with a user-defined extension from one directory to an other.
CPU.csh: Extracts the last line containing the CPU time used for the run from each .dlg file.
avgtime: Fortran program calculating the average CPU time of a screening run. It assumes that a file CPU contains the last line from the .dlg files (prepared, e.g., with the script CPU.csh).

clean_dock_dir.csh and compressdir.csh accept the directory name as an argument, e.g.,
% compressdir x_dock
will process all files in the directory x_dock. If no argument is given, each script will prompt for the directory name to clean, compress or uncompress.
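The idea of processing a directory file by file, without expanding a wildcard, can be sketched as follows. This is a minimal illustration in Python rather than the c-shell code of compressdir.csh. The function compress_dir is hypothetical and covers only compression with gzip.

```python
import gzip
import os
import shutil


def compress_dir(directory):
    """gzip every regular file in `directory` that is not yet compressed.

    Returns the number of files compressed.
    """
    # Collect the names first so the directory is not modified while scanned.
    with os.scandir(directory) as entries:
        names = [e.name for e in entries
                 if e.is_file() and not e.name.endswith(".gz")]
    for name in names:
        path = os.path.join(directory, name)
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)  # keep only the compressed copy, as gzip does
    return len(names)
```

Because the file names are read one entry at a time, the number of files in the directory does not limit the operation.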