Circular variance for macromolecular topography

Mihaly Mezei

Published in the Journal of Molecular Graphics and Modelling, 21, 463-472 (2003).
DOI:10.1016/S1093-3263(02)00203-6


The basic idea is that, given a set of points and a query point (R_o below), the sum of the vectors from the query point to the points of the set approaches zero when the query point is in the 'middle' of the set, while its magnitude approaches the sum of the vector lengths when the query point is far outside the set.

Circular variance, used to characterize the spread in a set of angles, is defined as
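
For n angles θ_1, ..., θ_n, the standard definition from directional statistics can be written as follows (the symbols n and θ_i are notation chosen here for illustration):

```latex
CV = 1 - \frac{1}{n}\sqrt{\left(\sum_{i=1}^{n}\cos\theta_i\right)^{2} + \left(\sum_{i=1}^{n}\sin\theta_i\right)^{2}}
```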

This can be written in the more general form of
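
A form consistent with the description in the opening paragraph, given here as an assumption about the intended formula, divides the length of the summed vectors r_i by the sum of their individual lengths:

```latex
CV = 1 - \frac{\left|\sum_{i=1}^{n}\mathbf{r}_i\right|}{\sum_{i=1}^{n}\left|\mathbf{r}_i\right|}
```

For unit vectors r_i = (cos θ_i, sin θ_i) the denominator equals n, and the expression reduces to the usual circular variance of the angles θ_i.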

This form is valid in three dimensions as well (in that case it is also referred to as spherical variance). Applying it to the vectors from the query point to the points of the set gives a smooth 0-1 scale for the extent of burial of the query point in the set: values near 1 indicate a deeply buried point, values near 0 a point far outside. In the example below, a 6 Å slice of bacteriorhodopsin simulated in water is shown, where each atom is color-coded by its circular variance with respect to the atoms of the protein.
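
The burial measure can be illustrated with a minimal Python sketch. It assumes the form 1 - |sum of r_i| / sum of |r_i| suggested by the opening paragraph; the function name `circular_variance` and the cube test set are hypothetical and are not taken from Simulaid or MMC.

```python
import numpy as np

def circular_variance(query, points):
    """CV of `query` with respect to `points`, assuming 1 - |sum r_i| / sum |r_i|."""
    vectors = np.asarray(points, dtype=float) - np.asarray(query, dtype=float)
    total_length = np.linalg.norm(vectors, axis=1).sum()
    if total_length == 0.0:
        raise ValueError("all points coincide with the query point")
    # Length of the summed vectors, normalized by the sum of their lengths.
    return 1.0 - np.linalg.norm(vectors.sum(axis=0)) / total_length

# Toy set: the eight corners of a cube centred on the origin.
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                dtype=float)

cv_center = circular_variance([0.0, 0.0, 0.0], cube)   # query in the middle of the set
cv_far = circular_variance([100.0, 0.0, 0.0], cube)    # query far outside the set
print(f"center: {cv_center:.3f}  far: {cv_far:.5f}")
```

The query at the centre gives a CV of 1 because the vectors cancel exactly, while the distant query gives a value close to 0 because all vectors point in nearly the same direction.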

Detecting pocket regions (steps 1-2)
1. Overlay a grid on the macromolecule and remove the gridpoints that are covered by an atom of the macromolecule (see also Stahl et al., Prot. Engng., 13, 218 (2001)).
2. Find the connected clusters of the remaining gridpoints. One of these clusters will include all the gridpoints external to the macromolecule while the rest will delineate various cavities.
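
Steps 1-2 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the MMC implementation: the helper names `free_grid_mask` and `label_clusters`, the per-atom radii, the grid spacing, the margin, and the use of face (6-neighbour) connectivity are all hypothetical choices.

```python
from collections import deque
import numpy as np

def free_grid_mask(atoms, radii, spacing, margin):
    """Step 1: overlay a grid and flag the gridpoints not covered by any atom."""
    atoms = np.asarray(atoms, dtype=float)
    lo = atoms.min(axis=0) - margin
    hi = atoms.max(axis=0) + margin
    axes = [np.arange(lo[d], hi[d] + spacing / 2, spacing) for d in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (nx, ny, nz, 3)
    free = np.ones(grid.shape[:3], dtype=bool)
    for center, radius in zip(atoms, radii):
        # A gridpoint stays free only if it lies outside this atom's sphere.
        free &= np.linalg.norm(grid - center, axis=-1) > radius
    return grid, free

def label_clusters(mask):
    """Step 2: label face-connected clusters of True cells by breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_clusters = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already assigned to an earlier cluster
        n_clusters += 1
        labels[start] = n_clusters
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in steps:
                nb = (i + di, j + dj, k + dk)
                inside = all(0 <= nb[d] < mask.shape[d] for d in range(3))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n_clusters
                    queue.append(nb)
    return labels, n_clusters

# Toy mask: a closed shell of covered gridpoints enclosing one free gridpoint.
mask = np.ones((5, 5, 5), dtype=bool)
mask[1:4, 1:4, 1:4] = False   # covered block ...
mask[2, 2, 2] = True          # ... with a one-point cavity at its centre
labels, n_clusters = label_clusters(mask)
sizes = np.bincount(labels.ravel())[1:]
print(n_clusters, sizes)
```

In the toy mask, the cluster that touches the grid boundary plays the role of the external region and the remaining cluster is the enclosed cavity.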

Detecting pocket regions (steps 3-5)
3. Calculate the CV for the remaining gridpoints with respect to the atoms of the macromolecule.
4. Eliminate all gridpoints where the value of CV is below a threshold value, CV_max, so that only sufficiently buried gridpoints remain.
5. Each connected cluster of the surviving gridpoints represents one pocket. The more gridpoints a cluster contains, the larger the pocket.
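
Steps 3-5 can be sketched in the same spirit. The sketch assumes the CV form 1 - |sum of r_i| / sum of |r_i|, treats every gridpoint of a small integer grid as already free, and uses hypothetical names (`cv_at_points`, `clusters_of`) and a hypothetical threshold value; it is not the MMC implementation.

```python
from itertools import product
import numpy as np

def cv_at_points(queries, atoms):
    """Step 3: CV of every query point w.r.t. the atoms (assumed form, see lead-in)."""
    vectors = atoms[None, :, :] - queries[:, None, :]        # (n_queries, n_atoms, 3)
    resultant = np.linalg.norm(vectors.sum(axis=1), axis=1)  # |sum of r_i|
    total_length = np.linalg.norm(vectors, axis=2).sum(axis=1)
    return 1.0 - resultant / total_length

def clusters_of(cells):
    """Step 5: group integer grid cells into face-connected clusters, largest first."""
    remaining = set(cells)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = []
        while stack:
            i, j, k = stack.pop()
            cluster.append((i, j, k))
            for nb in ((i + 1, j, k), (i - 1, j, k), (i, j + 1, k),
                       (i, j - 1, k), (i, j, k + 1), (i, j, k - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

# Toy "macromolecule": eight atoms at the corners of a cube of half-width 2.5.
atoms = np.array(list(product((-2.5, 2.5), repeat=3)))
# Gridpoints assumed to have survived steps 1-2 (hypothetical: the full integer grid).
cells = list(product(range(-5, 6), repeat=3))
cv = cv_at_points(np.array(cells, dtype=float), atoms)

cv_max = 0.73  # hypothetical threshold, named CV_max in step 4
kept = [cell for cell, value in zip(cells, cv) if value >= cv_max]  # step 4
pockets = clusters_of(kept)
print(len(pockets), [len(p) for p in pockets])
```

Only the gridpoints nearest the centre of the cube survive the threshold, and they form a single connected cluster; raising or lowering `cv_max` shrinks or grows that cluster.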

Example: pockets of a bromodomain. One of the pockets is the binding site.

Detecting domain-separation

Domain separation can be detected by examining the circular variance map, determined by the representative atoms of each residue (e.g., the alpha carbons for proteins):

where r_ij is the vector from the representative atom of residue i to that of residue j. Since a domain-separating segment lies outside both of the domains it separates, a low-CV swath in this CV map is diagnostic of a domain-separating region.

Example: CV map of bacteriorhodopsin. Loops connecting the transmembrane helices are domain-separators. Black bars below the map show the positions of the (transmembrane) helices. The loop between helices 3 and 4 is buried.

The calculations were performed with the programs Simulaid (insideness labeling and domain-separation map) and MMC (pocket regions), available at this website.


Last modified: 11/15/2004 (MM)