Running DOCK on Minerva

Instructions for running DOCK on Minerva

Running DOCK involves three major steps:

Preparation of the ligand conformational database(s), called flexibase
Running the docking
Extracting results, post processing

1. Preparation of the ligand conformational database

NOTE: This step is not needed if the ligands are from the ZINC database since one can download flexibase versions of their ligand sets.

Copy all files from the directory /sc/hydra/projects/~mezeim01/DOCK_FB to your work directory
Create a single file containing all your ligands in .sdf format.
- Files in other format can be converted with OpenBabel. The script run_babel.csh will convert files in a directory to .sdf-formatted files and write it in an other directory.
- The script aggregatesdf.csh will combine .sdf files into a single file
- There is also a script add_mol2name.csh that adds to the molecule name record of a .mol2 file the ligand file name (without the .mol2 extension).
Once the file containing the ligand .sdf files is at hand, run the script subprepG2_L.csh. It will ask for the aggregated sdf file of your ligands and create databases in the form DOCK expects them, with names 1_** where ** is just two characters. The script processes ca two ligands per minute.

2. Preparation of the target protein and running the docking

This phase also consists of several steps. The programs and scripts needed are in the directory /sc/hydra/projects/mezeim01b/DOCK_screen.

Define the library paths by typing
> module load dock
Copy all files from the directory /sc/hydra/projects/mezeim01b/DOCK_screen to your work directory
Make/obtain pdb files for receptor and a ligand at the target site. Save them as rec.pdb and xtal-lig.pdb (keep these names). If the files contain chain IDs, remove them.
Take care of the hydrogens and charges with AutoDockTools (ADT).
- Read the file rec.pdb
- If there are non-polar hydrogens merge them (Edit->Hydrogens)
- Set the default protonation state (ADT sets all histidines to be protonated on both nitrogens (+1 charge) as the default!). via File->Preferences->Set. This will open a "Set User Preferences" widget; select the AutoDockTools tab and set the HIS protonation state as needed (usual default: ND1).
- Alternatively, protonation states can be also set individually, as follows:
  - If "Edit Histidine Hydrogens" option is missing from the "Edit"->Hydrogens" menu then Load Pmv/repairCommands module:
    - File->Browse Commands; click on Pmv under "Select a package"
    - Scroll down until you see repairCommands and click on it
    - click on "Load Module" and "DISMISS"
    - Now "Edit Histidine Hydrogens" is available in the menu displayed by clicking on Edit->Hydrogens.
  - Clicking on Edit->Hydrogens->Edit Histidine Hydrogens opens a widget listing each histidine present in the current selection (or in the viewer if there is nothing currently selected) followed by 3 radiobuttons in columns labelled "0,HD1", "0, HE2" and "+1".
  - You set the pattern of protonation as you wish and then click on "Apply". The most common choice is HE2 but the environment may affect it.
  This option has the added advantage that it provides a list of the histidines needed in the hislist file (see below).
- Add polar hydrogens (Edit->Hydrogens)
- Add charges (Edit->Charges-> Calculate Gasteiger Charges)
- Save as a .PDBQT file
- Change (if needed) the histidine residue names to HID (proton on ND1), HIE (proton on NE2), HIP (both ND1 and NE2 are protonated) in the file rec.pdb (Amber convention).
- Check for incomplete residues. When the total charge added is not an integral number then it is likely that some residues are incomplete or broken. To find out which are these residues, run a Clean operation with Simulaid - it will list residues with non-integral charge sums. If the non-integral fraction is of the order of 0.01e or less then allow Simulaid to fix it; otherwise fix manually with an editor as needed.
Make sure that there is no chain ID (i.e., col 22 is blank) in rec.pdb and xtal-lig.pdb.
Update/create the file hislist with the receptor histidine residues. The file has one line for each histidine residue; each line contains the residue name and the residue number (HIS nnnnnn).
Make sure that there is no blank line in it. One way to do this is executing
grep HIS rec.pdb | grep CA | awk '{print $4 " " $5}' > hislist
where you have to replace HIS with the residue name corresponding to the protonation state used. Also, a space has to be inserted between theresidue name and number in the file hislist. If needed, the Clean operation of Simulaid can remove the chain ID (answer yes when you are asked if you want to override/remove the segmentid).
Cysteines that do not form hydrogen bonds need to have hydrogens (HG) added to them. If they are not present then the following procedure can add them:
- > module load openbabel (make sure it is version 2.3.2 or higher)
- > obabel -ipdb rec.pdb -xr -p 5.0 -opdbqt > HS_rec.pdb
- > grep "HG CYS" HS_rec.pdb
Rename H atoms in rec.pdb and xtal-lig.pdb:
>./rename.atoms.4dock.sh rec.pdb hislist
If there are cysteines forming disulfide bond(s) (i.e., no HSG atoms), change the corresponding residue names from CYS to CYX
Make sure the last record of the pdb file is not TER
Make a copy of the file rec.pdb as rec.crg:
>cp rec.pdb rec.crg
Execute the Makefile (type make).
> make
Create folders to run the lead-like database:
> ./create_dockrun.pk.sh database_path database_dir 1 1
where database_path/database_dir is the full path to the directory containing the (gzipped) database file(s)
Submit the docking run by executing the interactive script run_dock.csh

3. Extracting results, post processing

The docking results will be scattered in the directories that the create_dockrun.pk.sh script created. The scripts MUD (Michaels Utilities for Docking) are designed to deal with this situation as follows.

Run the script extract.csh It will execute the following three steps (than can be executed separately as well):
- Remove directories without result files by running /sc/hydra/projects/mezeim01b/DOCK_screen/remove_empty.csh.
- Generate a combined score list
  > module load mud
  > module load python
  > python $mud/combine.py -i docking_dir/ --box ../grids/box --done
  where $mud is the path to the utilities MUD and docking_dir is the full path to the directory the docking was run
- Create a PDB file with the top-scoring ligand poses:
  > python $mud/topdock.py -i docking_dir/ -f docking_dir/combine.scores -o top1000.pdb -t 1000
  - replace the 1000 with the number of poses you need
The program dockres. (in /sc/hydra/projects/mezeim01b/fullscreen) can create PDB files with the target and ligand poses; filter the poses by various criteria; and provide detailed descriptions of the poses.
For screening the extracted ligands with Autodock, dockres can also write separate PDB files of the extracted ligands into a directory.
Alternatively, the program can split the aggregate filer into individual ligand .pdb files.
The top-scoring ligand set can also be clustered by running the script ~/sc/hydra/projects/mezeim01b/simulaid/cluster_top.csh. The script will
- Create canonical SMILES strings of the ligands withe OpenBabel
- Calculate a matrix of Tanimoto coefficients with the utility cSMILEStoMatrices from https://simtk.org in a form the program Simulaid can read
- Run Simulaid to cluster the ligands based on the similarity matrix
Two sets of ligand SMILES strings can be compared with the script /sc/hydra/projects/mezeim01b/DOCK_screen/compare_smiles.csh

4. How does DOCK work?

45 matching spheres were used. Matching spheres were based on the atoms of the crystallographic ligand.
The spheres were labeled for chemical matching based on the local receptor environment.
The degree of ligand sampling is determined by the bin size, bin size overlap, and distance tolerance. These three parameters were set to 0.2, 0.1 and 1.2 A respectively, for both the binding site matching spheres and the docked molecule. (default parameters)
Ligand conformations passed and initial steric filter, a physic-based scoring function is used to evaluate the fit to the receptor binding site.
For the best scoring conformation of each docked molecule, 100 steps of rigid-body minimization are carried out.
The score for each conformation is calculated as the sum of the receptor-ligand electrostatic and van der Waals interaction energy, corrected for ligand desolvation. These three terms are evaluated form precalculated grids.
The electrostatic potential in the binding site was prepared using the program Delphi. Partial charges from the united atom AMBER force field (Mg2+ and Ca2+ ions partial charges were set to 1.4).
The program CHEMGRID was used to generate a van der Waals grid, which is based on an united atom version of the AMBER ff.
The desolvation penalty for a ligand conformation is estimated from a precalculated transfer free energy of the molecule between solvents of dielectric 78 and 2. The desolvation energy is obtained by weighting the transfer free energy with a scaling factor that reflects the degree of burial of the ligand in the receptor binding site.
The partial charges are computed by AMSOL 6 using the SM5.42R solvation model in organic solvent. The AMSOL input parameters look like the following: CHARGE=$netchg AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R & SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93 & DIELEC=2.06 FACARB=0.00 FEHALO=0.00 We only calculate the charges on one ligand conformation, but we use them for the whole ligand conformational ensemble. (thanks to Michael Mysinger for this phrasing)

Last modified: 06/09/2020