Computational Proteomics Unit


The Computational Proteomics Unit was set up in August 2013 and is part of the Cambridge Centre for Proteomics. Its main activities centre around the sound analysis of proteomics data and integration of different sources of heterogeneous data. We work in close collaboration with biologists to tackle biologically challenging questions using statistics and machine learning to understand the data and uncover biologically relevant patterns. The development and publication of scientific software (1, 2) is an integral part of our work and is reflected by our contributions to the Bioconductor project.

Keywords – data analysis, experimental design, statistics, programming, R, scientific software, machine learning, reproducible research, proteomics.


Dr Laurent Gatto, head of the unit. Laurent moved to Cambridge in January 2010 to work in the Cambridge Centre for Proteomics on various aspects of quantitative and spatial proteomics, developing methodological advances and implementing computational tools with a strong emphasis on rigorous and reproducible data analysis. He is also a visiting scientist in the PRIDE team at the European Bioinformatics Institute, affiliate teaching staff at the Cambridge Computational Biology Institute, a Software Sustainability Institute fellow and a Software/Data Carpentry instructor.

Dr Lisa M. Breckels, Post-Doctoral Research Associate. Lisa joined the Cambridge Centre for Proteomics in November 2010 to work on the application of machine learning techniques to the sub-cellular localisation of proteins using quantitative experimental organelle proteomics. She joined the CPU in August 2013.


Prof. Kathryn Lilley, Cambridge Centre for Proteomics, University of Cambridge.

Dr Sean Holden, Computer Science, University of Cambridge.

Dr Thomas Burger, CEA Grenoble, France.

Dr Christophe Dessimoz, University College London, UK.

Professor Alberto Paccanaro, Department of Computer Science, Royal Holloway University of London.

Dr Sebastian Gibb, Department of Anesthesiology and Intensive Care, University Medicine Greifswald.

Dr Johannes Rainer, Center for Biomedicine, EURAC, Bolzano, Italy.

Former members

Mr Thomas Naake, undergraduate student. Thomas visited the group in Spring 2014 as an ERASMUS student affiliated to the University of Freiburg. He developed pRolocGUI, an interactive visualisation tool for organelle proteomics data.

Mrs Victoria Butt, Part III student in Systems Biology, applying graph-based methods to study protein sub-cellular localisation.

Selected publications

Breckels LM., Holden S., Wojnar D., Mulvey CM., Christoforou A., Groen AJ., Kohlbacher O., Lilley KS., Gatto L. Learning from heterogeneous data sources: an application in spatial proteomics. PLoS Comput Biol. 2016 May 13;12(5):e1004920. doi:10.1371/journal.pcbi.1004920.

Gatto L, Breckels LM, Naake T and Gibb S. Visualisation of proteomics data using R and Bioconductor. Proteomics. 2015 Feb 18. doi:10.1002/pmic.201400392. (PubMed, Publisher).

Huber W et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015 Jan 29;12(2):115-21 (PubMed, Publisher).

Gatto L., Breckels L.M., Burger T, Nightingale D.J.H., Groen A.J., Campbell C., Mulvey C.M., Christoforou A., Ferro M., Lilley K.S. A foundation for reliable spatial proteomics data analysis, Mol Cell Proteomics. 2014 May 20. (publisher, PubMed, software)

Vizcaino J.A. et al. ProteomeXchange: globally co-ordinated proteomics data submission and dissemination, Nature Biotechnology 2014, 32, 223 - 226. (PubMed, publisher)

Gatto L., Breckels L.M, Burger T, Wieczorek S. and Lilley K.S. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata, Bioinformatics, 2014 (software, PubMed, publisher).

Groen A., Sancho-Andrés G., Breckels LM., Gatto L., Aniento F. and Lilley K.S. Identification of Trans Golgi Network proteins in Arabidopsis thaliana root tissue, Journal of Proteome Research, 2013 (PubMed, publisher).

Gatto L. and Christoforou A. Using R and Bioconductor for proteomics data analysis, Biochim Biophys Acta - Proteins and Proteomics, 2013. (PubMed, pre-print, software: Bioconductor - github)

Bond N.J., Shliaha P.V, Lilley K.S. and Gatto L. Improving qualitative and quantitative performance for MSE-based label free proteomics, J. Proteome Res., 2013 (PubMed, publisher, software).

Breckels L.M., Gatto L., Christoforou A., Groen A.J., Lilley K.S. and Trotter M.W.B. The Effect of Organelle Discovery upon Sub-Cellular Protein Localisation, Journal of Proteomics, 2013 (PubMed, software).

Gatto L. and Lilley K.S. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualisation, processing and quantitation, Bioinformatics, 28(2), 288-289, 2012 (PubMed - pdf - software).

Lilley K.S., Deery M.J. and Gatto L. Challenges for Proteomics Core Facilities, Proteomics, 11: 1017-1025, 2011 (PubMed - pdf).

Gatto L., Vizcaíno J.A., Hermjakob H., Huber W. and Lilley K.S. Organelle proteomics experimental designs and analysis, Proteomics, 10:22, 3957-3969, 2010 (PubMed - publisher - pdf).

Selected posters

Laurent Gatto, Lisa M. Breckels, Thomas Naake, Samuel Wieczorek, Thomas Burger and Kathryn S. Lilley. A state-of-the-art machine learning pipeline for the analysis of spatial proteomics data. 5 - 8 October 2014, Madrid, HUPO meeting.

Sebastian Gibb, Lisa M Breckels, Thomas Lin Pedersen, Vladislav A Petyuk, Kathryn S Lilley and Laurent Gatto. A current perspective on using R and Bioconductor for proteomics data analysis. 5 - 8 October 2014, Madrid, HUPO meeting.

Lisa Breckels, Sean Holden, Kathryn Lilley and Laurent Gatto. A transfer learning framework for organelle proteomics data. European Conference on Computational Biology 2014, 7 - 10 Sep 2014.


The CPU acknowledges support from the following funding bodies:


Wellcome Trust
Prime-XS project FP7


Computational Proteomics Unit
Cambridge Centre for Proteomics
Cambridge Systems Biology Centre
University of Cambridge
Tennis Court Road
Cambridge, CB2 1GA, UK

Laurent Gatto
email lg390
phone +44 (0) 1223 760253