Instructions for running DOCK on Minerva
Running DOCK
involves three major steps:
- Preparation of the ligand conformational database(s), called flexibase
- Running the docking
- Extracting results, post processing
1. Preparation of the ligand conformational database
NOTE: This step is not needed if the ligands are from the ZINC database
since one can download flexibase versions of their ligand sets.
- Copy all files from the directory /sc/hydra/projects/~mezeim01/DOCK_FB
to your work directory
- Create a single file containing all your ligands in .sdf format.
- Files in other format can be converted with OpenBabel.
The script run_babel.csh will convert files in a directory to
.sdf-formatted files and write it in an other directory.
- The script aggregatesdf.csh will combine
.sdf files into a single file
- There is also a script add_mol2name.csh that
adds to the molecule name record of a .mol2 file the ligand file name
(without the .mol2 extension).
- Once the file containing the ligand .sdf files is at hand,
run the script subprepG2_L.csh. It will ask for the aggregated sdf file
of your ligands and create databases in the form DOCK expects them,
with names 1_** where ** is just two characters.
The script processes ca two ligands per minute.
2. Preparation of the target protein and running the docking
This phase also consists of several steps.
The programs and scripts needed are in the directory
/sc/hydra/projects/mezeim01b/DOCK_screen.
- Define the library paths by typing
> module load dock
- Copy all files from the directory /sc/hydra/projects/mezeim01b/DOCK_screen
to your work directory
- Make/obtain pdb files for receptor and a ligand at the target site.
Save them as rec.pdb and xtal-lig.pdb (keep these names).
If the files contain chain IDs, remove them.
- Take care of the hydrogens and charges
with AutoDockTools
(ADT).
- Read the file rec.pdb
- If there are non-polar hydrogens merge them (Edit->Hydrogens)
- Set the default protonation state
(ADT sets all histidines to be protonated on both nitrogens
(+1 charge) as the default!).
via File->Preferences->Set. This will open a
"Set User Preferences" widget; select the AutoDockTools tab and set the HIS
protonation state as needed (usual default: ND1).
- Alternatively, protonation states can be also set individually, as follows:
- If "Edit Histidine Hydrogens" option is missing from the
"Edit"->Hydrogens" menu then Load Pmv/repairCommands module:
- File->Browse Commands; click on Pmv under "Select a package"
- Scroll down until you see repairCommands and click on it
- click on "Load Module" and "DISMISS"
- Now "Edit Histidine Hydrogens" is available in the menu
displayed by clicking on Edit->Hydrogens.
- Clicking on Edit->Hydrogens->Edit Histidine Hydrogens opens a
widget listing each histidine present in the current selection (or in
the viewer if there is nothing currently selected) followed by 3
radiobuttons in columns labelled "0,HD1", "0, HE2" and "+1".
- You set the
pattern of protonation as you wish and then click on "Apply".
The most common choice is HE2 but the environment may affect it.
This option has the added advantage that it provides a list of the histidines
needed in the hislist file (see below).
- Add polar hydrogens (Edit->Hydrogens)
- Add charges (Edit->Charges-> Calculate Gasteiger Charges)
- Save as a .PDBQT file
- Change (if needed) the histidine residue names to HID (proton on ND1),
HIE (proton on NE2), HIP (both ND1 and NE2 are protonated) in the file
rec.pdb (Amber convention).
- Check for incomplete residues. When the total charge added is not
an integral number then it is likely that some residues are incomplete
or broken. To find out which are these residues, run a Clean operation
with Simulaid - it will list residues
with non-integral charge sums.
If the non-integral fraction is of the order of 0.01e or less then
allow Simulaid to fix it; otherwise fix manually with an editor as needed.
- Make sure that there is no chain ID (i.e., col 22 is blank) in
rec.pdb and xtal-lig.pdb.
- Update/create the file hislist with the receptor histidine residues.
The file has one line for each histidine residue; each line contains the
residue name and the residue number (HIS nnnnnn).
Make sure that there is no blank line in it.
One way to do this is executing
grep HIS rec.pdb | grep CA | awk '{print $4 " " $5}' > hislist
where you have to replace HIS with the residue name corresponding to the
protonation state used.
Also, a space has to be inserted between theresidue name and number
in the file hislist.
If needed, the Clean operation of Simulaid can remove the chain ID (answer yes when you are asked if you want to override/remove the
segmentid).
- Cysteines that do not form hydrogen bonds need to have hydrogens (HG)
added to them.
If they are not present then the following procedure can add them:
- > module load openbabel (make sure it is version 2.3.2 or higher)
- > obabel -ipdb rec.pdb -xr -p 5.0 -opdbqt > HS_rec.pdb
- > grep "HG CYS" HS_rec.pdb
NOTE: there are exactly two spaces between HG and CYS.
The lines extracted by this grep command have to be manually inserted after
the respective SG atoms because the atoms in the HS_rec.pdb file are
hopelessly scrambled.
- Rename H atoms in rec.pdb and xtal-lig.pdb:
>./rename.atoms.4dock.sh rec.pdb hislist
- If there are cysteines forming disulfide bond(s) (i.e., no HSG atoms),
change the corresponding residue names from CYS to CYX
- Make sure the last record of the pdb file is not TER
- Make a copy of the file rec.pdb as rec.crg:
>cp rec.pdb rec.crg
- Execute the Makefile (type make).
> make
If the receptor or ligand pdb files were edited on a Windows system, the
make may abort with some cryptic message.
If that happens, process the files as follows:
> /sc/hydra/projects/mezeim01b/simulaid/cleanM < rec.pdb > rrr.pdb
> mv rrr.pdb rec.pdb
> cp rec.pdb rec.crg
> /sc/hydra/projects/mezeim01b/simulaid/cleanM < xtal-lig.pdb > xxx.pdb
> mv xxx.pdb xtal-lig.pdb
After that, copy to rec.crg and repeat the make.
- Create folders to run the lead-like database:
> ./create_dockrun.pk.sh database_path database_dir 1 1
where database_path/database_dir is the full path to the directory
containing the (gzipped) database file(s)
- Submit the docking run by executing the interactive script run_dock.csh
3. Extracting results, post processing
The docking results will be scattered in the directories that the
create_dockrun.pk.sh script created. The scripts MUD
(Michaels Utilities for Docking)
are designed to deal with this situation as follows.
- Run the script extract.csh
It will execute the following three steps
(than can be executed separately as well):
- Remove directories without result files by running
/sc/hydra/projects/mezeim01b/DOCK_screen/remove_empty.csh.
- Generate a combined score list
> module load mud
> module load python
> python $mud/combine.py -i docking_dir/ --box ../grids/box --done
where $mud is the path to the utilities MUD and
docking_dir is the full path to the directory the docking was run
- Create a PDB file with the top-scoring ligand poses:
> python $mud/topdock.py -i docking_dir/ -f docking_dir/combine.scores -o top1000.pdb -t 1000
- replace the 1000 with the number of poses you need
- The program
dockres.
(in /sc/hydra/projects/mezeim01b/fullscreen)
can create PDB files with the target and ligand poses;
filter the poses by various criteria; and provide detailed descriptions of
the poses.
- For screening the extracted ligands with Autodock,
dockres can also write separate PDB
files of the extracted ligands into a directory.
- Alternatively, the program can split the aggregate filer
into individual ligand .pdb files.
- The top-scoring ligand set can also be clustered by running the script
~/sc/hydra/projects/mezeim01b/simulaid/cluster_top.csh. The script will
- Create canonical SMILES strings of the ligands withe OpenBabel
- Calculate a matrix of Tanimoto coefficients
with the utility cSMILEStoMatrices from
https://simtk.org
in a form the program Simulaid can read
- Run Simulaid
to cluster the ligands based on the similarity matrix
- Two sets of ligand SMILES strings can be compared with
the script /sc/hydra/projects/mezeim01b/DOCK_screen/compare_smiles.csh
4. How does DOCK work?
- 45 matching spheres were used.
Matching spheres were based on the atoms of the crystallographic ligand.
- The spheres were labeled for chemical matching
based on the local receptor environment.
- The degree of ligand sampling is determined by the bin size,
bin size overlap, and distance tolerance.
These three parameters were set to 0.2, 0.1 and 1.2 A respectively,
for both the binding site matching spheres and the docked molecule.
(default parameters)
- Ligand conformations passed and initial steric filter,
a physic-based scoring function is used to evaluate the fit to the
receptor binding site.
- For the best scoring conformation of each docked molecule,
100 steps of rigid-body minimization are carried out.
- The score for each conformation is calculated as the sum of the
receptor-ligand electrostatic and van der Waals interaction energy,
corrected for ligand desolvation.
These three terms are evaluated form precalculated grids.
- The electrostatic potential in the binding site was prepared
using the program Delphi.
Partial charges from the united atom AMBER force field
(Mg2+ and Ca2+ ions partial charges were set to 1.4).
- The program CHEMGRID was used to generate a van der Waals grid,
which is based on an united atom version of the AMBER ff.
- The desolvation penalty for a ligand conformation is estimated
from a precalculated transfer free energy of the molecule
between solvents of dielectric 78 and 2.
The desolvation energy is obtained by weighting the transfer free energy
with a scaling factor that reflects the degree of burial
of the ligand in the receptor binding site.
- The partial charges are computed by AMSOL 6 using the
SM5.42R solvation model in organic solvent.
The AMSOL input parameters look like the following:
CHARGE=$netchg AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R & SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93 & DIELEC=2.06 FACARB=0.00 FEHALO=0.00
We only calculate the charges on one ligand conformation,
but we use them for the whole ligand conformational ensemble.
(thanks to Michael Mysinger for this phrasing)
Last modified: 06/09/2020