--- Rachel Alcraft, Bioinformatics ---



PSU Geometry


Library Description

The PsuGeometry library is designed to make producing geometric and data reports of protein structures simple, powerful and beautiful.

Protein structures and electron density matrices are downloaded from the Protein Data Bank in Europe (Velankar et al., 2009).
Alternatively, manually created or edited pdb files can be used by placing them in the same directory as the downloaded pdb/electron density files.

Seven plot types have been designed, based on matplotlib and seaborn, allowing you to decide simply: which geometric measures to correlate; which colour palette to use; and which value (hue) to colour by. You can define any geometric measures you like based on the standard atom naming of protein structures: these will be either distances (e.g. N:CA), angles (e.g. N:CA:C) or dihedrals (e.g. N:CA:C:N+1). You are not limited to standard geometric measures: you can correlate N:C-3 against CA:CB+1 if you so wish (the distance between the N of the reference residue and the C three residues back, against the distance between the CA of the reference residue and the CB one residue forward).
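As a rough illustration of the measure notation described above, the sketch below splits a measure string into atom names and residue offsets, then computes a distance, angle or dihedral from coordinates. It is not PsuGeometry's own code: the function names (`parse_measure`, `distance`, `angle`, `dihedral`) are hypothetical, and coordinates are assumed to be plain (x, y, z) tuples.

```python
import math
import re

# One atom token: an atom name optionally followed by a signed residue offset,
# e.g. "CA", "N+1" or "C-3". Hypothetical parser, not the library's own.
_TOKEN = re.compile(r"([A-Za-z0-9]+)([+-]\d+)?")
_KINDS = {2: "distance", 3: "angle", 4: "dihedral"}


def parse_measure(measure):
    """Split e.g. 'N:CA:C:N+1' into a kind and a list of (atom, offset)."""
    atoms = []
    for token in measure.split(":"):
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse atom token {token!r}")
        atoms.append((match.group(1), int(match.group(2) or 0)))
    if len(atoms) not in _KINDS:
        raise ValueError("a measure needs 2, 3 or 4 atoms")
    return _KINDS[len(atoms)], atoms


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    """Euclidean distance between two points."""
    return math.dist(p, q)


def angle(p, q, r):
    """Angle at q, in degrees, between the directions q->p and q->r."""
    u, v = _sub(p, q), _sub(r, q)
    cos_theta = _dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.hypot(*b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Here `parse_measure("N:C-3")` yields a distance between N at offset 0 and C at offset -3, which matches the reading of that notation given in the paragraph above.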

The library can colour these measures by a wide variety of hues: other geometric measures (for an extra dimension), bfactors, amino acid and, most distinctively, electron density.

There is also the ability to look directly at the electron density (e.g. x,y,z or c,r,s coordinates against bfactor or 2FoFc) and to explore data in the pdb structures. For example, you can correlate the electron density of the atoms against the number of electrons in the atoms, or the bfactor against the secondary structure.


Each Plot Type

1. Scatter Plot   2. Probability Density Plot   3. Histogram

addScatter(geoX='',geoY='',data=None,title='',ghost=False,operation='',splitKey='',hue='bfactor',palette='viridis_r',centre=False,vmin=0,vmax=0,categorical=False,sort='ASC',restrictions={},exclusions={})

addProbability(geoX='',geoY='',data=None,title='',ghost=False,operation='',splitKey='',hue='bfactor',palette='viridis_r',centre=False,vmin=0,vmax=0,categorical=False,restrictions={},exclusions={})

addHistogram(geoX='',data=None,title='',ghost=False,operation='',splitKey='',hue='',palette='crimson',count=False,restrictions={},exclusions={})

count   mean   std    min    25%    50%    75%    max
525.0   1.48   0.01   1.44   1.47   1.48   1.48   1.51
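The summary row above uses the fields of a conventional descriptive summary: count, mean, standard deviation, minimum, quartiles and maximum. A minimal sketch of how such a row could be computed with the Python standard library follows. The `summarise` function is hypothetical, and the choice of sample standard deviation and linearly interpolated quartiles is an assumption rather than something the library documents.

```python
import statistics


def summarise(values):
    """Return count, mean, std, min, quartiles and max for a list of numbers.

    Assumptions: sample standard deviation (n - 1 denominator) and
    linearly interpolated quartiles. Needs at least two values.
    """
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": float(len(ordered)),
        "mean": statistics.fmean(ordered),
        "std": statistics.stdev(ordered),
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }


# Made-up sample values, for illustration only.
for name, value in summarise([1.44, 1.47, 1.48, 1.48, 1.51]).items():
    print(f"{name:>5}  {value:.2f}")
```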
4. Pdb Data Report   5. Electron Density Report   6. Contact Map

addDataView(pdbCode, geoX, geoY, palette='viridis', hue='2FoFc', categorical=False, title='',centre=False,sort='ASC')

addDensityView(pdbCode, geoX, geoY, peaks=True,divisor=10, palette='viridis', hue='2FoFc', categorical=False, title='')

addCloseContact(pdbCode,atomA,atomB,distanceLimit=8,ridLimit=2,palette='viridis',hue='distance',categorical=False,title='')
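To give a feel for what a close-contact search involves, here is a minimal sketch that pairs up atoms lying within a distance limit of each other. It is not the library's implementation: `close_contacts` is a hypothetical function, and the meaning given to `rid_limit` (a minimum separation in residue number, so that sequence neighbours are skipped) is an assumption about the `ridLimit` parameter, not documented behaviour.

```python
import math


def close_contacts(atoms_a, atoms_b, distance_limit=8.0, rid_limit=2):
    """Return (rid_a, rid_b, distance) for every pair within distance_limit.

    Each atom is a (residue_id, (x, y, z)) tuple. Pairs whose residue ids
    differ by less than rid_limit are skipped (assumed meaning of ridLimit).
    """
    contacts = []
    for rid_a, xyz_a in atoms_a:
        for rid_b, xyz_b in atoms_b:
            if abs(rid_a - rid_b) < rid_limit:
                continue
            d = math.dist(xyz_a, xyz_b)
            if d <= distance_limit:
                contacts.append((rid_a, rid_b, d))
    return contacts


# Made-up CA positions along a line, for illustration only.
ca_atoms = [(1, (0.0, 0.0, 0.0)), (2, (3.8, 0.0, 0.0)),
            (3, (7.6, 0.0, 0.0)), (10, (20.0, 0.0, 0.0))]
for rid_a, rid_b, d in close_contacts(ca_atoms, ca_atoms):
    print(rid_a, rid_b, round(d, 2))
```

The double loop is quadratic in the number of atoms, which is fine for a single structure but would want a spatial index for larger searches.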

7. Probability Density Difference Report

addDifference(dataA=None,dataB=None,geoX='',geoY='',restrictionsA={},restrictionsB = {},exclusionsA={},exclusionsB={},title='',palette='seismic')
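The idea of a probability density difference can be sketched as follows: bin two sets of (x, y) points onto the same grid, normalise each grid to sum to 1, and subtract. This is a simple 2D-histogram approximation chosen for illustration, and the library may estimate densities differently. The names `histogram2d` and `density_difference` are hypothetical.

```python
def histogram2d(points, bins, x_range, y_range):
    """Normalised 2D histogram (rows = y bins, columns = x bins)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0.0] * bins for _ in range(bins)]
    kept = 0
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted window
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        grid[row][col] += 1.0
        kept += 1
    if kept:
        grid = [[cell / kept for cell in row] for row in grid]
    return grid


def density_difference(points_a, points_b, bins, x_range, y_range):
    """Cell-by-cell difference A - B of the two normalised histograms."""
    grid_a = histogram2d(points_a, bins, x_range, y_range)
    grid_b = histogram2d(points_b, bins, x_range, y_range)
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]
```

Positive cells mark regions where data set A is denser and negative cells where B is denser, which is why a diverging palette such as matplotlib's `seismic` suits this kind of plot.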


Installation

The library relies on these Bioinformatics libraries:

When it is released it will be a simple pip install.

The code can be found on GitHub.


Running Reports

The code for running reports is given for the Level 1 reports and provided in the link for Levels 2 and 3. All the code is preceded by these 4 lines:


from PsuGeometry import GeoReport as psu
pdbDataPath = 'ProteinDataFiles/pdb_data/'  # Whatever directory you keep your pdb files in (ent format) - will download if missing
edDataPath = 'ProteinDataFiles/ccp4_data/'  # Whatever directory you keep your electron density files in (ccp4 format) - will download if missing
printPath = 'ProteinDataFiles/results_psu/' # Where you want the html report to be written

Report Description                                              Report Code   Report Example
Using density 2Fo-Fc as a hue in a correlation report           Code          Example
Using geometric measures as a hue in a geometric correlation    Code          Example
Analysing a pdb structure in more detail                        Code          Example

Report Description                                              Report Code   Report Example
Serialising to file and back for memory - 1000 structures       Code          Example


Examples


Code documentation

  • GeoReport.py - This is the only class you will interact with in most ordinary uses.
  • CloseContact.py
  • GeoAtom.py
  • GeoCalcs.py
  • GeoCsvReport.py
  • GeoDensity.py
  • GeoPdb.py
  • GeoPdbReport.py

References

    Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., … De Hoon, M. J. L. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163

    Hamelryck, T., & Manderick, B. (2003). PDB file parser and structure class implemented in Python. Bioinformatics, 19(17), 2308–2310. https://doi.org/10.1093/bioinformatics/btg299

    Joosten, R. P., Te Beek, T. A. H., Krieger, E., Hekkelman, M. L., Hooft, R. W. W., Schneider, R., … Vriend, G. (2011). A series of PDB related databases for everyday needs. Nucleic Acids Research, 39(Suppl. 1), 411–419. https://doi.org/10.1093/nar/gkq1105

    Kabsch, W., & Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers, 22, 2577–2637.

    Velankar, S., Best, C., Beuth, B., Boutselakis, C. H., Cobley, N., Sousa da Silva, A. W., … Kleywegt, G. J. (2009). PDBe: Protein Data Bank in Europe. Nucleic Acids Research, 38(Suppl. 1), 308–317. https://doi.org/10.1093/nar/gkp916

    Yao, S., & Moseley, H. N. B. (2019). A chemical interpretation of protein electron density maps in the worldwide protein data bank. Software and full results available at: https://www.biorxiv.org/content/10.1101/613109v1

