Description of the program PFP for the generation of protein fingerprints

Mihaly Mezei

Department of Pharmacological Sciences,

Icahn School of Medicine at Mount Sinai

New York, NY 10029

Mihaly.Mezei@mssm.edu

July 25, 2003.

Theory

(M. Mezei, Prot. Engng., 16, 713-715 (2003))

For a protein of n residues, the fingerprint is defined by up to three n × n binary matrices. The elements of the primary fingerprint matrix FP0 are determined by the angle that the line connecting the carbonyl carbons of residues i and j forms with the direction of the C=O bond of residue i. Each element records whether this angle is smaller or larger than 90°, i.e., the sign of the corresponding scalar product:

$$\mathrm{FP0}_{ij} = \operatorname{sign}\{[\mathbf{r}(O_i)-\mathbf{r}(C_i)]\cdot[\mathbf{r}(C_j)-\mathbf{r}(C_i)]\}.$$
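The definition of FP0 translates into a few lines of code. The Python sketch below is only an illustration, not the PFP implementation (which is written in Fortran-77). It assumes a hypothetical residue representation (a dict with keys "C" and "O" holding (x, y, z) tuples) and stores a positive sign as 1 and a zero or negative sign as 0, so the diagonal (where r(Cj) - r(Ci) is the zero vector) comes out as 0; both conventions are assumptions.

```python
# Sketch of the FP0 definition (not the PFP Fortran code).
# Assumption: each residue is a dict with 'C' and 'O' coordinates as
# (x, y, z) tuples; a positive sign is stored as 1, zero or negative as 0.

def sub(a, b):
    """Vector difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def fp0(residues):
    """Return FP0 as an n x n list of 0/1 rows."""
    n = len(residues)
    mat = [[0] * n for _ in range(n)]
    for i, ri in enumerate(residues):
        co = sub(ri["O"], ri["C"])  # C=O bond direction on residue i
        for j, rj in enumerate(residues):
            cc = sub(rj["C"], ri["C"])  # C(i) -> C(j); zero vector when i == j
            mat[i][j] = 1 if dot(co, cc) > 0 else 0
    return mat


# Toy "strand" (not realistic geometry): carbonyl carbons on the x axis,
# C=O directions alternating between +x and -x.
strand = [
    {"C": (0.0, 0.0, 0.0), "O": (0.4, 0.0, 0.0)},
    {"C": (1.0, 0.0, 0.0), "O": (0.6, 0.0, 0.0)},
    {"C": (2.0, 0.0, 0.0), "O": (2.4, 0.0, 0.0)},
]
print(fp0(strand))  # [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
```

In real use the coordinates would come from the ATOM records of a PDB file; the alternating C=O directions of the toy input already produce the alternating pattern expected for sheet regions.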

The secondary fingerprint matrices FP1 and FP2 are defined to allow differentiation between parallel and antiparallel sheets and between different helix packings, respectively. FP1 is defined by the angle between the line connecting the carbonyl carbons of residues i and j and the line connecting the C and N atoms of residue i:

$$\mathrm{FP1}_{ij} = \operatorname{sign}\{[\mathbf{r}(N_i)-\mathbf{r}(C_i)]\cdot[\mathbf{r}(C_j)-\mathbf{r}(C_i)]\},$$

while FP2 is defined by the angle between the line connecting the carbonyl carbons of residues i and j and the normal to the plane formed by the C and O atoms of residue i and the N atom of residue i+1:

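$$\mathrm{FP2}_{ij} = \operatorname{sign}\{([\mathbf{r}(N_{i+1})-\mathbf{r}(C_i)]\times[\mathbf{r}(O_i)-\mathbf{r}(C_i)])\cdot[\mathbf{r}(C_j)-\mathbf{r}(C_i)]\}.$$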
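The definitions of FP1 and FP2 differ from FP0 only in the reference vector attached to residue i, so both can be sketched with one shared helper. As before, this Python code is an illustration rather than the PFP implementation; the dict keys "N", "C", "O", the helper name sign_matrix, and the 1/0 storage of positive versus zero-or-negative signs are assumptions. Because N(i+1) does not exist for the last residue, the sketch simply leaves the last row of FP2 at 0; how PFP treats that row is not stated here.

```python
# Sketch of the FP1 and FP2 definitions (not the PFP Fortran code).
# Assumptions: residues are dicts with 'N', 'C', 'O' as (x, y, z) tuples;
# positive sign -> 1, zero or negative -> 0; the last row of FP2 stays 0.

def sub(a, b):
    """Vector difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sign_matrix(res, ref_vec, rows):
    """For each i in rows, set element (i, j) to 1 if ref_vec(i) . [C(j) - C(i)] > 0."""
    n = len(res)
    mat = [[0] * n for _ in range(n)]
    for i in rows:
        v = ref_vec(i)
        for j in range(n):
            mat[i][j] = 1 if dot(v, sub(res[j]["C"], res[i]["C"])) > 0 else 0
    return mat


def fp1(res):
    def c_to_n(i):
        # Line from C(i) to N(i)
        return sub(res[i]["N"], res[i]["C"])
    return sign_matrix(res, c_to_n, range(len(res)))


def fp2(res):
    def normal(i):
        # Normal to the plane of C(i), O(i) and N(i+1)
        return cross(sub(res[i + 1]["N"], res[i]["C"]), sub(res[i]["O"], res[i]["C"]))
    return sign_matrix(res, normal, range(len(res) - 1))


toy = [  # toy coordinates, not a real backbone
    {"N": (-1.0, 0.0, 0.0), "C": (0.0, 0.0, 0.0), "O": (0.0, 1.0, 0.0)},
    {"N": (0.5, 0.0, 0.0), "C": (1.0, 0.0, 0.0), "O": (1.0, -1.0, 0.0)},
    {"N": (1.5, 0.0, 0.5), "C": (2.0, 0.0, 1.0), "O": (2.0, 1.0, 1.0)},
]
print(fp1(toy))  # [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
print(fp2(toy))  # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
```

Since r(Ni) - r(Ci) points back along the chain, FP1 of this toy chain is filled below the diagonal, which is the kind of directional information that separates parallel from antiparallel strands.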

Since the C=O directions essentially alternate by 180° in sheets, FP0 is dominated by alternating white and black bars in such regions. In helices, on the other hand, the C=O directions are essentially parallel, resulting in black isosceles right triangles located above the diagonal. FP1 encodes the direction of the backbone path, making it possible to distinguish parallel from antiparallel sheets. FP2 encodes the relative position of the backbone segments, making it possible to distinguish differently packed helices. Generally, FP0 contains the most information, and in several cases it can characterize the fold by itself. Combining FP0 with FP1 or FP2 results in a matrix whose elements can take four values, since each element pairs two binary values.
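One possible way to merge two binary maps into a single four-valued matrix is to encode each element pair as 2·FP0 + FPk (k = 1 or 2). This encoding, and the helper names combine and split, are assumptions made for illustration; the values PFP itself assigns to the four cases are not given in this description.

```python
# Hypothetical encoding of a composite map: value = 2 * FP0 + FPk (k = 1 or 2),
# so each element takes one of the four values 0, 1, 2, 3.

def combine(fp0, fpk):
    """Merge two equally sized 0/1 matrices into one four-valued matrix."""
    return [[2 * a + b for a, b in zip(row0, rowk)]
            for row0, rowk in zip(fp0, fpk)]


def split(value):
    """Recover the (FP0, FPk) pair from a combined element."""
    return divmod(value, 2)


print(combine([[0, 1], [1, 0]], [[1, 1], [0, 0]]))  # [[1, 3], [2, 0]]
print(split(3))  # (1, 1)
```

Keeping FP0 in the high bit means the two component maps can always be recovered, which is what allows matches in FP0 and FPk to be counted separately.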

Running the program

The program is run interactively. First, the user is asked to specify the type of map (FP0 based on C=O or N-H vectors, FP1, FP2, FP0 and FP1 combined, or FP0 and FP2 combined). Next, the user is asked whether only a plot is required or two fingerprints are to be compared. When composite fingerprints are compared, the user can choose between counting exact matches only and counting the matches in FP0 and FPi separately. Finally, the input file name(s) have to be specified. The program expects PDB files (files in other formats can be converted to PDB format, for example, with the program Simulaid).
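The two counting modes for composite fingerprints can be sketched as follows. The sketch assumes (hypothetically) that each composite element is encoded as 2·FP0 + FPk, so that FP0 is the high bit and FPk the low bit; the function name count_matches and its exact flag are illustrative and do not correspond to PFP's interactive prompts.

```python
# Sketch of the two counting modes for composite fingerprints.
# Assumption: elements are encoded as 2 * FP0 + FPk (values 0..3).

def count_matches(a, b, exact=True):
    """Compare two equally sized composite matrices a and b.

    exact=True  -> number of elements where both FP0 and FPk agree
    exact=False -> (FP0 matches, FPk matches), counted separately
    """
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    if exact:
        return sum(1 for x, y in pairs if x == y)
    fp0_hits = sum(1 for x, y in pairs if x // 2 == y // 2)  # compare high bits
    fpk_hits = sum(1 for x, y in pairs if x % 2 == y % 2)    # compare low bits
    return fp0_hits, fpk_hits


a = [[1, 3], [2, 0]]
b = [[1, 2], [3, 0]]
print(count_matches(a, b))               # 2
print(count_matches(a, b, exact=False))  # (4, 2)
```

Here the two matrices agree completely in FP0 but differ in two FPk bits, so the exact count is lower than either separate count.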

Output of the program

The fingerprint matrices and the results of the comparison are written to ASCII-formatted files. The fingerprints are also plotted in PostScript format and, optionally (vide infra), on an SGI screen with Iris-GL calls.

When two proteins are compared, the fingerprint of the smaller one is compared with the fingerprint of the larger one in all possible positions. Both the difference between the fingerprints and the RMSD of the overlaid backbone segments are printed on the output file and plotted on the PostScript file. Furthermore, the overlaid backbones of the best matches are also plotted (in stereo).
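The scan over all possible positions can be sketched as below. The sketch assumes that a position is an offset k along the diagonal, so that residues i and j of the smaller protein are matched with residues k+i and k+j of the larger one, and that the fingerprint difference is the number of differing matrix elements; both are assumptions, and the name slide_compare is hypothetical. The RMSD of the overlaid backbone segments, which requires an optimal superposition, is not computed here.

```python
# Sketch of scanning a smaller fingerprint over a larger one.
# Assumptions: a "position" is an offset k along the diagonal, and the
# difference is the count of differing elements; RMSD is not computed here.

def slide_compare(small, large):
    """Return a list of (offset, mismatches) for every placement of small."""
    m, n = len(small), len(large)
    if m > n:
        raise ValueError("the first fingerprint must be the smaller one")
    results = []
    for k in range(n - m + 1):
        diff = sum(1
                   for i in range(m)
                   for j in range(m)
                   if small[i][j] != large[k + i][k + j])
        results.append((k, diff))
    return results


small = [[0, 1], [1, 0]]
large = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
scores = slide_compare(small, large)
print(scores)                           # [(0, 0), (1, 2)]
print(min(scores, key=lambda s: s[1]))  # (0, 0), the best match
```

Ranking placements by the mismatch count gives the candidates whose backbones would then be overlaid and compared by RMSD.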

Compiling

The program is written in Fortran-77 and has been tested on several platforms (including Linux). To include the Iris-GL calls (when compiling on an SGI graphics workstation), the C@GL markers, which turn the graphics calls into comment lines, have to be removed from the source file, and the graphics libraries have to be linked:

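# Activate the Iris-GL calls by deleting the C@GL markers
sed 's/C@GL//' pfp.f > pfp_tmp.f
# Compile and link against the Iris-GL libraries
f77 pfp_tmp.f -o pfp.bin -lfgl -lgl
rm pfp_tmp.f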