A current perspective on using R and Bioconductor for proteomics data analysis

Sebastian Gibb1, 2, Lisa M Breckels1, 3, Thomas Lin Pedersen4, Vladislav A Petyuk5, Kathryn S Lilley3 and Laurent Gatto1, 3

1 Introduction and objectives

With the continuous increase in data throughput and the growing complexity of experimental designs, the processing, analysis and interpretation of proteomics data have become a major bottleneck, one that can be tackled by the appropriate use of statistical and computational tools. The R language, and in particular the Bioconductor project, have had a major impact on other fields in high-throughput biology and have benefited, in recent years, from substantial contributions from computational proteomics developers.

2 Methods

We summarise some of the latest R and Bioconductor developments in the field of proteomics, including support for open, community-driven formats for raw data and identification results, packages for peptide-spectrum matching, methods for quantitative proteomics, mass spectrometry and quantitation data processing, visualisation and interpretation.

3 Results and Discussion

We provide figures on the number of new package submissions and downloads over recent Bioconductor releases to illustrate the growing interest of the proteomics community in the Bioconductor project. While the command line interface (CLI) is a considerable novelty for many life scientists, extensive documentation and numerous tutorials are available, and an increasing number of tools also provide graphical user interfaces in addition to the CLI. We also discuss current needs and anticipated developments in the light of recent progress.

4 Conclusions

The R/Bioconductor environment addresses some important issues in computational proteomics and offers a unique combination of interdisciplinary expertise, capabilities and flexibility within the existing proteomics software ecosystem. Also noteworthy is the introduction of tools and techniques for R development and usage that permit open and reproducible computational research and data analysis, an area of increasing importance in the current data-intensive era.

5 Resources

Footnotes:

1 Computational Proteomics Unit, Department of Biochemistry, University of Cambridge, Cambridge, UK

2 Department of Anesthesiology and Intensive Care, Medical Faculty Carl Gustav Carus, Technical University Dresden, Fetscherstr. 74, 01307 Dresden

3 Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK

4 Chr. Hansen A/S, Hørsholm, Denmark / Technical University of Denmark, Kgs. Lyngby, Denmark

5 Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA

Author: Laurent Gatto

Created: 2014-10-03 Fri 16:09
