

LeucipPy : A Protein Geometry Correlation Python Library


Summary

LeucipPy is a Python library that interfaces with Biopython protein structures to make geometric correlations easier to compute and explore.
The interface is small: there are a few helper functions (e.g. ones that return links to the EBI or to the Engh & Huber statistics tables).
There are also 2 primary classes:

  • GeoDataFrame returns a pandas dataframe, which you can analyse however you like (matplotlib, export to R, etc.).
  • GeoHTML takes a dataframe or a numpy matrix and exports it to preformatted HTML reports (the data does not have to come from GeoDataFrame).

    The preformatted HTML reports reflect my own preferences for analysis and include shortcuts for the kinds of things I look at, including probability comparisons and overlaying surfaces.

    Installation

    Currently it can be installed from Test PyPI:

    pip install -i https://test.pypi.org/simple/ LeucipPy-pkg-RachelAlcraft
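After installing, a quick check that the package can be imported may save a confusing error later. The sketch below uses only the standard library. The helper name `is_installed` is hypothetical, and the import name `LeucipPy` is taken from the import statements in the code documentation on this page.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module of this name can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("LeucipPy"):
        print("LeucipPy is available")
    else:
        print("LeucipPy not found; try the pip command again")
```

The check only looks the module up; it does not import it, so it is safe to run before the dependencies are in place.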

    The library relies on these bioinformatics libraries:


    Each Plot Type

    Not all of these are implemented yet while the library transitions to PyPI, and there are some others, e.g. hexbins.

    1. Scatter Plot
    2. Probability Density Plot
    3. Histogram

    [Images: scatter, probability, histogram]

    count   mean   std    min    25%    50%    75%    max
    525.0   1.48   0.01   1.44   1.47   1.48   1.48   1.51

    4. Pdb Data Report
    5. Electron Density Report
    6. Contact Map

    [Images: DataView, DensityView (not currently implemented), Contacts]

    7. Probability Density Difference Report

    [Image: Difference]
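The summary row shown with the plots (count, mean, std, min, 25%, 50%, 75%, max) has the same columns as the output of pandas `describe()`. As a rough illustration of what those eight numbers mean, here is a standard-library sketch. The function `describe` and the bond lengths are made up for the example and are not part of LeucipPy.

```python
import statistics


def describe(values):
    """Return the eight summary statistics listed in the table, as a dict."""
    data = sorted(values)
    # 'inclusive' interpolates linearly between the sorted data points
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": float(len(data)),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1 divisor)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


# Illustrative bond lengths in Angstroms (made-up values, not from a real structure)
lengths = [1.44, 1.47, 1.48, 1.48, 1.51]
stats = describe(lengths)
for name, value in stats.items():
    print(f"{name:>5}  {value:.2f}")
```

A small std with min and max close to the mean, as in the table, indicates a tightly constrained measure.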


    Running Reports

    Examples of how to run these reports and correlations are given in a Google Colaboratory script.

    Google Colab example script


    Examples


    Code documentation

    from LeucipPy import LeucipPy as leu

    You first need to create a list of Biopython structure objects. The code below downloads the PDB files from the server for you if they are not already on disk.
    (NB this part is ordinary Biopython code for creating structures.)

    import os
    from urllib.request import urlretrieve
    import Bio.PDB as bio
    from LeucipPy import LeucipPy as leu
    ########
    pdb_codes = ['1egj', '3nir']  # A list of whatever pdb codes you wish to look at
    ########
    parser = bio.PDBParser()
    strucs = []
    for pdb_code in pdb_codes:
        # Helper gives the local file name and the download link for this code
        pdb_file, pdb_html_loc = leu.getPdbLink(pdb_code)
        print(pdb_file, pdb_html_loc)
        # Download only if the file is not already on disk
        if not os.path.exists(pdb_file):
            urlretrieve(pdb_html_loc, pdb_file)
        struc = parser.get_structure(pdb_code, pdb_file)
        strucs.append(struc)
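The loop above only downloads a file when it is not already on disk. That pattern can be pulled out into a small function, sketched below under the assumption that a plain existence check is enough (it does not detect partial or stale files). The name `fetch_if_missing`, the fake downloader and the example URL are hypothetical and exist only so the sketch can run without a network.

```python
import os
import tempfile
from urllib.request import urlretrieve


def fetch_if_missing(url, local_path, retrieve=urlretrieve):
    """Download url to local_path unless the file already exists.

    Returns True if a download happened, False if the existing file was reused.
    The retrieve argument lets the function be exercised without a network.
    """
    if os.path.exists(local_path):
        return False
    retrieve(url, local_path)
    return True


def fake_retrieve(url, local_path):
    """Stand-in downloader that writes a placeholder file instead of fetching."""
    with open(local_path, "w") as handle:
        handle.write("HEADER    placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.ent")
    # Hypothetical URL; it is never contacted because fake_retrieve is used
    first = fetch_if_missing("https://example.org/example.ent", target, fake_retrieve)
    second = fetch_if_missing("https://example.org/example.ent", target, fake_retrieve)
    print(first, second)
```

In real use the `retrieve` argument would be left at its default, with the file name and link coming from `leu.getPdbLink`.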
     

    Use LeucipPy to generate a dataframe of geometric correlations

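    from LeucipPy import GeoDataFrame as gdf

    ########
    # Geometric measures to calculate ('C:N+1' is included because the report example plots it)
    geos = ['C:O', 'N:CA', 'C:N+1']
    # Properties to include alongside, e.g. for colouring plots
    hues = ['bfactor', 'aa']
    #######
    geo = gdf.GeoDataFrame(strucs)
    df = geo.calculateGeometry(geos, hues)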
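To make the geo strings more concrete, here is a minimal sketch of what a two-atom measure such as 'C:O' could compute. It assumes (this page does not confirm it) that a geo is a colon-separated list of atom names and that a two-atom geo denotes the distance between them. The helper `geo_distance` and the coordinates are made up for illustration and are not part of LeucipPy.

```python
import math


def geo_distance(geo, coords):
    """Distance in Angstroms between the two atoms named in a geo such as 'C:O'.

    coords maps atom name -> (x, y, z). Only two-atom geos are handled here.
    """
    names = geo.split(":")
    if len(names) != 2:
        raise ValueError(f"this sketch only handles two-atom geos, got {geo!r}")
    first, second = (coords[name] for name in names)
    return math.dist(first, second)


# Made-up coordinates for one residue (illustrative only)
residue = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.46, 0.0, 0.0),
    "C": (2.0, 1.4, 0.0),
    "O": (2.0, 1.4, 1.23),
}
c_o = geo_distance("C:O", residue)
n_ca = geo_distance("N:CA", residue)
print(round(c_o, 2), round(n_ca, 2))
```

With Biopython atoms the same distance is available directly by subtracting one atom from another, e.g. `residue['C'] - residue['O']`.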
    

    GeoHTML is a shortcut for formatting reports, which are saved as HTML. The transition to PyPI is underway, so not all reports are currently available. Currently:

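    from LeucipPy import GeoHTML as ghm

    # Report title and output file name
    rep = ghm.GeoHTML('LeucipPy Colab Report', 'ColabLeucippy.html')
    # 2d plot of two columns of df, coloured by a third
    rep.addPlot2d(df, 'scatter', 'C:O', 'C:N+1', 'bfactor')
    # NB df_data is not created above; it needs to be a dataframe with atom_no, bfactor and element columns
    rep.addPlot2d(df_data, 'seaborn', 'atom_no', 'bfactor', 'element')
    rep.addLineComment('1 dimensional data below')
    rep.addPlot1d(df, 'histogram', 'C:O')
    rep.addDataFrame(df['C:O'].describe(), 'C:O')
    rep.addBoxComment('Produced by LeucipPy')
    rep.printReport()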
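The report is built up call by call and only written out at the end. As a rough picture of that accumulate-then-render idea, here is a toy builder using only the standard library. `MiniReport` and its methods are hypothetical and much simpler than GeoHTML; this is not how GeoHTML is implemented.

```python
import html


class MiniReport:
    """A toy HTML report builder (hypothetical; not GeoHTML's implementation)."""

    def __init__(self, title):
        self.title = title
        self.parts = []  # HTML fragments, kept in the order they were added

    def add_comment(self, text):
        self.parts.append(f"<p>{html.escape(text)}</p>")

    def add_table(self, header, rows):
        head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in rows
        )
        self.parts.append(f"<table><tr>{head}</tr>{body}</table>")

    def render(self):
        title = html.escape(self.title)
        content = "".join(self.parts)
        return (
            f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1>{content}</body></html>"
        )


report = MiniReport("Sketch report")
report.add_comment("1 dimensional data below")
report.add_table(["stat", "value"], [["count", 525.0], ["mean", 1.48]])
page = report.render()
print(page)
```

Keeping the fragments in a list until the final render means sections can be added in any order of computation while the page still comes out in the order of the calls.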
      


    References

    Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., … De Hoon, M. J. L. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163

    Hamelryck, T., & Manderick, B. (2003). PDB file parser and structure class implemented in Python. Bioinformatics, 19(17), 2308–2310. https://doi.org/10.1093/bioinformatics/btg299

    Joosten, R. P., Te Beek, T. A. H., Krieger, E., Hekkelman, M. L., Hooft, R. W. W., Schneider, R., Vriend, G. (2011). A series of PDB related databases for everyday needs. Nucleic Acids Research, 39(SUPPL. 1), 411–419. https://doi.org/10.1093/nar/gkq1105

    Kabsch, W., & Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers, 22, 2577–2637.

    Velankar, S., Best, C., Beuth, B., Boutselakis, C. H., Cobley, N., Sousa da Silva, A. W., … Kleywegt, G. J. (2009). PDBe: Protein Data Bank in Europe. Nucleic Acids Research, 38(SUPPL. 1), 308–317. https://doi.org/10.1093/nar/gkp916


    Created by: Rachel Alcraft ~ Home page: Leucippus ~ Supervisor: Mark A. Williams ~ Birkbeck, University of London (2021)