The PsuGeometry library is designed to make producing geometric and data reports of protein structures simple, powerful and beautiful.
Protein structures and electron density matrices are downloaded from the Protein Data Bank in Europe (Velankar et al., 2009). Alternatively, manually created or edited pdb files can be used by placing them in the same directory as the other pdb/electron density files.
Seven plot types have been designed on top of matplotlib and seaborn, so you only need to decide which geometric measures to correlate, which colour (hue) to use, and which data to plot.
You can define any geometric measures you like based on the standard atom naming of protein structures: these will be distances (e.g. N:CA), angles (e.g. N:CA:C) or dihedrals (e.g. N:CA:C:N+1).
You are not limited to standard geometric measures: you can correlate N:C-3 against CA:CB+1 if you so wish
(the distance between the N of the reference residue and the C three residues back, against the distance between the CA of the reference residue and the CB one residue forward).
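The measure notation above can be illustrated with a minimal sketch. The helpers `parse_measure` and `measure_value` below are hypothetical and written only for this illustration; they are not part of the PsuGeometry API, and the library may parse and calculate measures differently. The sketch assumes that a measure is a colon-separated list of atom names, each with an optional residue offset such as +1 or -3, and that two, three and four atoms mean a distance, an angle and a dihedral respectively.

```python
import math
import re

# Hypothetical helpers for illustration only; NOT PsuGeometry functions.
# An atom token is an atom name plus an optional signed residue offset.
_ATOM = re.compile(r"^([A-Za-z][A-Za-z0-9']*)([+-]\d+)?$")
_KINDS = {2: "distance", 3: "angle", 4: "dihedral"}


def parse_measure(measure):
    """Split e.g. 'N:CA:C:N+1' into a kind and (atom, residue offset) pairs."""
    atoms = []
    for token in measure.split(":"):
        match = _ATOM.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse atom {token!r} in {measure!r}")
        # A missing offset means the reference residue itself (offset 0).
        atoms.append((match.group(1), int(match.group(2) or 0)))
    if len(atoms) not in _KINDS:
        raise ValueError(f"expected 2, 3 or 4 atoms in {measure!r}")
    return _KINDS[len(atoms)], atoms


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def measure_value(points):
    """Distance (same units as the input) or angle/dihedral (degrees)
    for 2, 3 or 4 points given as (x, y, z) tuples."""
    if len(points) == 2:
        return _norm(_sub(points[1], points[0]))
    if len(points) == 3:
        # Angle at the middle atom between the two bonds leaving it.
        u = _sub(points[0], points[1])
        v = _sub(points[2], points[1])
        cos = _dot(u, v) / (_norm(u) * _norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    if len(points) == 4:
        # Signed torsion angle about the bond between points 2 and 3.
        b1 = _sub(points[1], points[0])
        b2 = _sub(points[2], points[1])
        b3 = _sub(points[3], points[2])
        y = _norm(b2) * _dot(b1, _cross(b2, b3))
        x = _dot(_cross(b1, b2), _cross(b2, b3))
        return math.degrees(math.atan2(y, x))
    raise ValueError("expected 2, 3 or 4 points")


kind, atoms = parse_measure("N:C-3")
print(kind, atoms)
```

In this sketch the residue offsets are only parsed; looking up the coordinates of, say, the C atom three residues back is left to whatever structure parser is in use, after which `measure_value` can be applied to the resulting coordinates.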
The library lets you colour these measures by a wide variety of hues: another geometric measure for an extra dimension, bfactor, amino acid and, most distinctively, electron density.
There is also the ability to look directly at the electron density (e.g. x,y,z or c,r,s coordinates against bfactor or 2FoFc) and to explore data in the pdb structures.
For example, you can correlate the electron density of the atoms against the number of electrons in the atoms, or the bfactor against the secondary structure.
The code for running reports is given for the Level 1 reports and provided in the link for Levels 2 and 3. All the code is preceded by these 4 lines:
from PsuGeometry import GeoReport as psu
pdbDataPath = 'ProteinDataFiles/pdb_data/' #This is whatever directory you keep your pdb files in (ent format) - will download if missing
edDataPath = 'ProteinDataFiles/ccp4_data/' #This is whatever directory you keep your electron density files in (ccp4 format) - will download if missing
printPath = 'ProteinDataFiles/results_psu/' #This is where you want the html report to be written
Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., … De Hoon, M. J. L. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163
Hamelryck, T., & Manderick, B. (2003). PDB file parser and structure class implemented in Python. Bioinformatics, 19(17), 2308–2310. https://doi.org/10.1093/bioinformatics/btg299
Joosten, R. P., Te Beek, T. A. H., Krieger, E., Hekkelman, M. L., Hooft, R. W. W., Schneider, R., Vriend, G. (2011). A series of PDB related databases for everyday needs. Nucleic Acids Research, 39(SUPPL. 1), 411–419. https://doi.org/10.1093/nar/gkq1105
Kabsch, W., & Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers, 22, 2577–2637.
Velankar, S., Best, C., Beuth, B., Boutselakis, C. H., Cobley, N., Sousa da Silva, A. W., … Kleywegt, G. J. (2009). PDBe: Protein Data Bank in Europe. Nucleic Acids Research, 38(SUPPL.1), 308–317. https://doi.org/10.1093/nar/gkp916
Yao, S., & Moseley, H. N. B. (2019). A chemical interpretation of protein electron density maps in the worldwide protein data bank. Software and full results available at: https://www.biorxiv.org/content/10.1101/613109v1
Contact Rachel by email.