Rachel Alcraft PsuGeometry

The PsuGeometry library is designed to make producing geometric and data reports of protein structures simple, powerful and beautiful.

Protein structures and electron density matrices are downloaded from the Protein Data Bank in Europe (Velankar et al, 2009).
Alternatively, manually created or edited pdb files can be used by placing them in the same directory as the downloaded pdb and electron density files.

Seven plot types have been designed, based on matplotlib and seaborn, so that you only need to decide which geometric measures to correlate, which colour palette to use, and which property the colour represents. You can define any geometric measures you like based on the standard atom naming of protein structures: these will be distances (e.g. N:CA), angles (e.g. N:CA:C) or dihedrals (e.g. N:CA:C:N+1). You are not limited to standard geometric measures: you can correlate N:C-3 against CA:CB+1 if you wish (the distance between N of the reference residue and C three residues back, against the distance between CA of the reference residue and CB one residue forward).
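To make the naming scheme concrete, here is a minimal sketch, independent of PsuGeometry, of how a measure string such as N:CA:C:N+1 could be split into atoms and residue offsets and then evaluated as a distance, angle or dihedral. The helper names (parse_measure, evaluate and so on), the residue layout and the coordinates are all hypothetical; the library's own parsing and sign conventions may differ.

```python
import math
import re

# Hypothetical helpers for illustration; PsuGeometry's own parser may differ.
TOKEN = re.compile(r"^([A-Z][A-Z0-9]*?)([+-]\d+)?$")

def parse_measure(measure):
    """Split e.g. 'N:CA:C:N+1' into [(atom_name, residue_offset), ...]."""
    atoms = []
    for token in measure.split(":"):
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised atom token: {token!r}")
        name, offset = match.groups()
        atoms.append((name, int(offset) if offset else 0))
    return atoms

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def angle(p, q, r):
    """Angle at q between p and r, in degrees."""
    u, v = sub(p, q), sub(r, q)
    cos_theta = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def dihedral(p0, p1, p2, p3):
    """Signed torsion about the p1-p2 bond, in degrees."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    length = math.hypot(*b1)
    b1 = tuple(x / length for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def evaluate(measure, residues, index):
    """residues: list of {atom_name: (x, y, z)}; index: reference residue."""
    points = []
    for name, offset in parse_measure(measure):
        position = index + offset
        if not 0 <= position < len(residues):
            raise IndexError(f"no residue at offset {offset:+d}")
        points.append(residues[position][name])
    if len(points) == 2:
        return math.dist(*points)
    if len(points) == 3:
        return angle(*points)
    if len(points) == 4:
        return dihedral(*points)
    raise ValueError("a measure needs 2, 3 or 4 atoms")

# Made-up coordinates for two consecutive residues (not from a real structure).
residues = [
    {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0)},
    {"N": (3.3, 1.5, 0.5), "CA": (3.9, 2.8, 0.6), "C": (5.4, 2.7, 0.9)},
]
for measure in ("N:CA", "N:CA:C", "N:CA:C:N+1"):
    print(measure, round(evaluate(measure, residues, 0), 2))
```

The number of atoms in the string decides the kind of measure, which is why the same notation covers distances, angles and dihedrals.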

The library can colour these measures by a wide variety of hues: other geometric measures (for an extra dimension), bfactors, amino acid and, most distinctively, electron density.

There is also the ability to look directly at the electron density (e.g. x,y,z or c,r,s coordinates against bfactor or 2FoFc) and to explore data in the pdb structures. For example, you can correlate the electron density of the atoms against the number of electrons in the atoms, or the bfactor against the secondary structure.

Each Plot Type

1. Scatter Plot

addScatter(geoX='',geoY='',data=None,title='',ghost=False,operation='',splitKey='',hue='bfactor',palette='viridis_r',centre=False,vmin=0,vmax=0,categorical=False,sort='ASC',restrictions={},exclusions={})

2. Probability Density Plot

addProbability(geoX='',geoY='',data=None,title='',ghost=False,operation='',splitKey='',hue='bfactor',palette='viridis_r',centre=False,vmin=0,vmax=0,categorical=False,restrictions={},exclusions={})

3. Histogram

addHistogram(geoX='',data=None,title='',ghost=False,operation='',splitKey='',hue='',palette='crimson',count=False,restrictions={},exclusions={})

count	mean	std	min	25%	50%	75%	max
525.0	1.48	0.01	1.44	1.47	1.48	1.48	1.51
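The summary table above reports the same statistics as pandas' describe() method. As an illustration only, the sketch below computes those statistics with Python's standard library for a made-up list of N:CA distances. The describe helper and the values are assumptions, not output from the library.

```python
import statistics

def describe(values):
    """Return count, mean, std, min, quartiles and max for a list of numbers."""
    ordered = sorted(values)
    # 'inclusive' interpolates linearly between data points, min and max included.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": float(len(ordered)),
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }

# Hypothetical N:CA distances in angstroms, for illustration only.
n_ca = [1.44, 1.47, 1.48, 1.48, 1.51]
summary = describe(n_ca)
print("\t".join(summary))
print("\t".join(f"{value:.2f}" for value in summary.values()))
```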

4. Pdb Data Report

addDataView(pdbCode, geoX, geoY, palette='viridis', hue='2FoFc', categorical=False, title='',centre=False,sort='ASC')

5. Electron Density Report

addDensityView(pdbCode, geoX, geoY, peaks=True,divisor=10, palette='viridis', hue='2FoFc', categorical=False, title='')

6. Contact Map

addCloseContact(pdbCode,atomA,atomB,distanceLimit=8,ridLimit=2,palette='viridis',hue='distance',categorical=False,title='')
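As a rough illustration of what a contact map filters for, the sketch below lists atom pairs that are close in space but separated in sequence. It is not the library's implementation: the close_contacts helper is hypothetical, and reading distanceLimit as a maximum distance in angstroms and ridLimit as a minimum residue-number separation is an assumption based only on the parameter names.

```python
import math
from itertools import product

def close_contacts(atoms_a, atoms_b, distance_limit=8.0, rid_limit=2):
    """Return (rid_a, rid_b, distance) for pairs within distance_limit.

    atoms_a, atoms_b: lists of (residue_number, (x, y, z)).
    Assumption: pairs fewer than rid_limit residues apart are skipped.
    """
    contacts = []
    for (rid_a, xyz_a), (rid_b, xyz_b) in product(atoms_a, atoms_b):
        if abs(rid_a - rid_b) < rid_limit:
            continue
        distance = math.dist(xyz_a, xyz_b)
        if distance <= distance_limit:
            contacts.append((rid_a, rid_b, round(distance, 2)))
    return contacts

# Made-up CA atoms spaced 3.8 angstroms apart along a straight line.
ca_atoms = [(rid, (3.8 * (rid - 1), 0.0, 0.0)) for rid in range(1, 6)]
contacts = close_contacts(ca_atoms, ca_atoms)
for rid_a, rid_b, distance in contacts:
    print(rid_a, rid_b, distance)
```

Comparing an atom type against itself yields each pair in both orders, which gives the symmetric layout usual for a contact map.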

7. Probability Density Difference Report

addDifference(dataA=None,dataB=None,geoX='',geoY='',restrictionsA={},restrictionsB = {},exclusionsA={},exclusionsB={},title='',palette='seismic')
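One way to picture a probability density difference is to bin two data sets on the same grid, normalise each to fractions, and subtract cell by cell. The sketch below does this with plain Python. It is an assumption about the general idea only: the helper names and the simple binning are hypothetical, and the library may estimate densities differently.

```python
from collections import Counter

def binned_density(points, x_range, y_range, bins=4):
    """Fraction of points falling in each (ix, iy) cell of a bins x bins grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Points are assumed to lie inside the ranges; the top edge joins the last bin.
        ix = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        iy = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(ix, iy)] += 1
    return {cell: n / len(points) for cell, n in counts.items()}

def density_difference(points_a, points_b, x_range, y_range, bins=4):
    """Per-cell density of A minus density of B."""
    dens_a = binned_density(points_a, x_range, y_range, bins)
    dens_b = binned_density(points_b, x_range, y_range, bins)
    cells = set(dens_a) | set(dens_b)
    return {cell: dens_a.get(cell, 0.0) - dens_b.get(cell, 0.0) for cell in cells}

# Made-up (PHI, PSI) pairs in degrees for two hypothetical data sets.
set_a = [(-60, -45), (-65, -40), (-120, 130)]
set_b = [(-120, 130), (-115, 135), (-60, -45)]
diff = density_difference(set_a, set_b, (-180, 180), (-180, 180))
for cell in sorted(diff):
    print(cell, round(diff[cell], 3))
```

Positive cells are where set A is denser and negative cells where set B is denser, which is why a diverging palette such as 'seismic' suits this plot.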

Installation

The library relies on these bioinformatics libraries:

BioPython for protein structure coordinates. (Cock et al, 2009; Hamelryck & Manderick, 2003)
PDB EDA for loading the electron density. (Yao & Moseley, 2019)
DSSP for assigning secondary structure. (Joosten et al, 2011; Kabsch & Sander, 1983)

There are also the common dependencies of matplotlib, seaborn, numpy and pandas.

Once it is released, installation will be a simple pip install.

The code can be found on GitHub

Running Reports

The code for running reports is given for the Level 1 reports and provided in the link for Levels 2 and 3. All the code is preceded by these 4 lines:

A simple correlation report, with example


  pdbList = ['1i1w']
  georep = psu.GeoReport(pdbList,pdbDataPath,edDataPath,printPath,ed=True,dssp=True)
  georep.addScatter(geoX='PHI',geoY='PSI',title='Ramachandran Plot')
  georep.addScatter(geoX='N:O',geoY='CB:O',title='NO-CBO')
  georep.addScatter(geoX='PSI',geoY='N:O',title='PSI-NO')
  georep.addScatter(geoX='PSI',geoY='CB:O',title='PSI-CBO')
  georep.printToHtml('Simple Correlations',2,'SimpleCorr')

A predefined report, with example


  pdbList = ['1us0','1ejg','2cnq']
  georep = psu.GeoReport(pdbList,pdbDataPath,edDataPath,printPath)
  georep.printReport('RachelsChoice', 'rachel')

Report Description	Report Code	Report Example
Using density 2Fo-Fc as a hue in a correlation report	Code	Example
Using geometric measures as a hue in a geometric correlation	Code	Example
Analysing a pdb structure in more detail	Code	Example

Report Description	Report Code	Report Example
Serialising to file and back for memory - 1000 structures	Code	Example

Examples

Code documentation

GeoReport.py This is the only class you will interact with in most ordinary uses.

References

Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., … De Hoon, M. J. L. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163

Hamelryck, T., & Manderick, B. (2003). PDB file parser and structure class implemented in Python. Bioinformatics, 19(17), 2308–2310. https://doi.org/10.1093/bioinformatics/btg299

Joosten, R. P., Te Beek, T. A. H., Krieger, E., Hekkelman, M. L., Hooft, R. W. W., Schneider, R., … Vriend, G. (2011). A series of PDB related databases for everyday needs. Nucleic Acids Research, 39(SUPPL. 1), 411–419. https://doi.org/10.1093/nar/gkq1105

Kabsch, W., & Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers, 22, 2577–2637.

Velankar, S., Best, C., Beuth, B., Boutselakis, C. H., Cobley, N., Sousa da Silva, A. W., … Kleywegt, G. J. (2009). PDBe: Protein Data Bank in Europe. Nucleic Acids Research, 38(SUPPL.1), 308–317. https://doi.org/10.1093/nar/gkp916

Yao, S., & Moseley, H. N. B. (2019). A chemical interpretation of protein electron density maps in the worldwide protein data bank. bioRxiv. https://www.biorxiv.org/content/10.1101/613109v1


--- Rachel Alcraft, Bioinformatics ---