PLoS Computational Biology (PLoS Comput Biol; ISSN 1553-734X, 1553-7358). Public Library of Science, San Francisco, USA. Manuscript PCOMPBIOL-D-11-00788; doi:10.1371/journal.pcbi.1002277. Subject areas: Education; Biology; Proteomics; Biochemistry.
Computational Mass Spectrometry–Based Proteomics
Lukas Käll 1, Olga Vitek 2,*
1 Science for Life Laboratory, Royal Institute of Technology, Stockholm, Sweden
2 Department of Statistics, Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
Editor: Fran Lewitter, Whitehead Institute, United States of America
* E-mail: ovitek@stat.purdue.edu
The authors have declared that no competing interests exist.
Published: 1 December 2011. PLoS Comput Biol 7(12): e1002277.
Copyright: © 2011 Käll, Vitek. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Funding was provided by NSF CAREER grant DBI-1054826 to OV http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1054826 and the Swedish Research Council. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This is an original PLoS Computational Biology tutorial.
Goals and Challenges of Proteomics
Proteomics is defined as the system-wide characterization of all the proteins in an organism in terms of their sequence, localization, abundance, post-translational modifications, and biomolecular interactions. Modern proteomic investigations are increasingly quantitative and comprehensive [1]. Examples include the relative quantification of over 4,000 proteins in haploid and diploid yeast, which identified the pheromone signaling pathway as enriched in differential abundance [2]; determination of site- and time-specific dynamics of more than 6,000 phosphorylation sites of HeLa cells stimulated with epidermal growth factor [3]; and characterization of 232 multiprotein complexes in Saccharomyces cerevisiae, which proposed new cellular roles for 344 proteins [4]. Such investigations are now successfully utilized in functional biology [5], [6], genomics [7], [8], and biomedical research [9].
Challenges of proteomic studies stem from the complexity of the proteome and from its broad dynamic range. For example, the human genome contains around 20,000 protein coding genes. Their translation, combined with splicing or proteolysis, yields an estimated 50,000–500,000 proteins, and over 10 million different protein forms can be derived by somatic DNA rearrangements and post-translational modifications [10]. The abundance of protein species in human plasma spans more than 10 orders of magnitude [11]. Unlike oligonucleotides, proteins cannot be amplified, and therefore the objectives of proteomics depend on sensitive and scalable technologies for identifying and quantifying proteins [12]. The overall mass spectrometry–based proteomic workflow is summarized in Figure 1.
Figure 1. Quantitative mass spectrometry–based proteomic workflow. doi:10.1371/journal.pcbi.1002277.g001
The workflow requires a tight integration of biological and experimental (red) and computational and statistical (yellow) analysis steps.
Experimental Design
Quantitative proteomic investigations are conducted in the context of biological variation [13], technical variation due to sample processing and spectral acquisition, and ambiguities of spectral interpretation. Statistical experimental design [14], [15] accounts for these sources of variation. The first goal of experimental design is to avoid biases [16], [17] (i.e., systematic errors in interpretation) by clearly defining the populations of interest, matching the individuals with respect to the confounding factors, randomizing the selection of matched individuals from the population, and randomizing sample allocation to the processing steps. The second goal is to ensure efficiency (i.e., minimal random variation and uncertainty for a given cost) by choosing an appropriate number of biological and technical replicates, and by allocating the replicates to experimental resources in balanced blocks. The steps of the statistical experimental design are summarized in Figure 2.
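Randomization and balanced blocking can be combined in a few lines of code. The Python sketch below is a minimal illustration, not a prescribed procedure: the function `randomized_block_design` and its inputs are hypothetical, and it assumes equal numbers of biological replicates per condition and blocks (for example, days of instrument time or sets of isotope labels) that each hold exactly one sample from every condition.

```python
import random


def randomized_block_design(samples_by_condition, seed=None):
    """Allocate biological replicates to balanced blocks, one sample per
    condition per block, with random allocation and random run order.

    samples_by_condition: dict mapping condition -> list of sample IDs.
    Every condition must have the same number of replicates (hypothetical setup).
    """
    rng = random.Random(seed)
    conditions = sorted(samples_by_condition)
    replicate_counts = {len(v) for v in samples_by_condition.values()}
    if len(replicate_counts) != 1:
        raise ValueError("unbalanced design: unequal numbers of replicates")
    n_blocks = replicate_counts.pop()
    # Randomly assign the replicates of each condition to blocks
    shuffled = {c: rng.sample(samples_by_condition[c], n_blocks) for c in conditions}
    blocks = []
    for b in range(n_blocks):
        block = [shuffled[c][b] for c in conditions]
        rng.shuffle(block)  # randomize acquisition order within the block
        blocks.append(block)
    return blocks


design = {"control": ["c1", "c2", "c3"], "disease": ["d1", "d2", "d3"]}
for i, block in enumerate(randomized_block_design(design, seed=7), 1):
    print(f"block {i}: {block}")
```

Because every block contains one sample of each condition, block-to-block technical differences affect all conditions equally and can be removed in the statistical model.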
Figure 2. Experimental design. doi:10.1371/journal.pcbi.1002277.g002
Statistical experimental design consists of (a) defining the populations of interest, (b) randomly selecting biological replicates from the population and (optionally) matching confounding factors, (c) randomly allocating biological samples to spectral acquisition and (optionally) grouping the samples in balanced blocks for joint profiling, and (d) (optionally) acquiring technical replicate measurements on the biological samples. Replication, randomization, and blocking are necessary to avoid biases and maximize the efficiency of the experiment.
Mass Spectrometry–Based Measurements
Global Label-Free LC-MS/MS Workflow
Mass spectrometry is currently the only technology for protein identification and quantification that is both high-accuracy and high-throughput [18]–[20]. Although many alternatives exist, shotgun liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS; overview in Figure 3) is most frequently used. Because mass spectrometry is better suited to characterizing peptides than intact proteins, LC-MS/MS starts by enzymatically digesting proteins into a peptide mixture. Next, liquid chromatography (LC) separates the peptides, and the separated peptides are ionized and further separated by the mass spectrometer according to their mass-to-charge ratio in a mass spectrum (MS). The mass spectra obtained from the same sample at different elution times form an LC-MS run, and the intensities of MS peaks are related to peptide abundance. For identification, the mass spectrometer isolates the ions of selected MS peaks, fragments them by collision or another fragmentation technique, and separates the resulting fragments in a secondary (MS/MS) mass spectrum. The distances between the MS/MS peaks are used to infer the amino acid sequence of the peptide that produced the parent MS peak. Since abundant MS peaks are more likely to be selected for fragmentation, relative peptide quantification can also be achieved by counting the number of identified MS/MS spectra.
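The idea that distances between MS/MS peaks spell out the sequence can be illustrated with a toy example. The sketch below assumes an unmodified peptide, a complete and noise-free series of singly charged b ions, and standard monoisotopic residue masses; the helper names are hypothetical. Real spectra contain missing peaks, several ion types and charge states, which is why identification relies on the database and de novo methods discussed later.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)


def b_ion_ladder(peptide):
    """m/z of singly charged b ions b1..b(n-1) of an unmodified peptide."""
    ladder, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ladder.append(total + PROTON)
    return ladder


def residues_matching(delta, tol=0.02):
    """Residues whose mass is within tol Da of a peak-to-peak distance.
    Leucine and isoleucine have identical masses and are reported together."""
    return sorted(aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol)


def read_ladder(peaks, tol=0.02):
    """Read candidate residues from consecutive peaks of one ion series."""
    peaks = sorted(peaks)
    return [residues_matching(hi - lo, tol) for lo, hi in zip(peaks, peaks[1:])]


print(read_ladder(b_ion_ladder("PEPTIDE")))  # → [['E'], ['P'], ['T'], ['I', 'L'], ['D']]
```

The tolerance of 0.02 Da is an assumption chosen so that glutamine and lysine (which differ by about 0.036 Da) remain distinguishable; leucine and isoleucine cannot be separated by mass alone.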
Figure 3. Mass spectrometry–based measurements. doi:10.1371/journal.pcbi.1002277.g003
(a) Sample processing. Label-free quantification requires minimal sample manipulation, and acquires spectra from each sample in a separate mass spectrometry run. Label-based quantification varies in the timing and type of the labeling steps, but always simultaneously profiles two or more biological samples within a run. (b) Global label-free workflows achieve relative quantification by comparing counts of MS/MS spectra, or intensities of MS peaks between runs. Global label-based workflows compare intensities of reporter MS/MS fragments (iTRAQ) or MS peaks (SILAC, synthetic peptides). (c) Targeted workflows are an alternative to global quantification. They are the most sensitive, but require a priori knowledge of the proteins of interest and of the technological characteristics of their peptides. Label-free targeted experiments compare intensities of transitions between runs, and label-based targeted experiments compare them within a run.
An LC-MS/MS experiment can identify and quantify thousands of proteins in complex mixtures. It requires minimal manipulation of the sample, and minimal prior information regarding its composition. However, the workflow has a number of deficiencies. Enzymatic digestion increases the complexity of the mixture. For example, a proteome comprising 5,000 proteins is expected to yield over 250,000 tryptic peptides, and minor cleavage and fragmentation products of abundant proteins can obscure major events in low-abundance proteins, complicating the interpretation [21]. The dynamic range of mass spectrometers is limited to 3–4 orders of magnitude, and direct LC-MS/MS analysis is biased towards the most abundant peptides [22]. Technical variation can further undermine the identification and the quantification steps. A variety of extensions to this basic workflow have therefore been proposed.
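The growth in complexity caused by digestion is easy to see with an in silico digest. The sketch below is a simplified model of trypsin specificity, assuming cleavage C-terminal to K or R; whether to skip sites followed by proline is exposed as the hypothetical `proline_rule` flag, since the strictness of that convention is debated (compare [49]). Missed cleavages further multiply the number of candidate peptides.

```python
import re


def tryptic_digest(protein, missed_cleavages=0, proline_rule=True):
    """In silico digestion with trypsin: cleave C-terminal to K or R.

    proline_rule=True applies the common convention of not cleaving before P;
    peptides containing up to `missed_cleavages` uncleaved sites are included.
    """
    site = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    # Zero-width split at every cleavage site; drop the empty trailing piece
    fragments = [f for f in re.split(site, protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.add("".join(fragments[i:j + 1]))
    return sorted(peptides)


print(tryptic_digest("MKPEPTIDERAGK"))                      # → ['AGK', 'MKPEPTIDER']
print(tryptic_digest("MKPEPTIDERAGK", proline_rule=False))  # → ['AGK', 'MK', 'PEPTIDER']
print(tryptic_digest("MKPEPTIDERAGK", missed_cleavages=1))
```

Splitting on zero-width matches requires Python 3.7 or later.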
The LC-MS/MS workflow is enhanced by labeling samples from different conditions metabolically (e.g., with SILAC [23], where stable isotopes are included in the growth medium of an organism), or chemically (e.g., with iTRAQ [24] or TMT [25], where reactive chemical labels are applied during sample processing). Samples with different labels are combined and analyzed by a mass spectrometer within a single LC-MS run. Peaks from the samples are subsequently recognized by label-induced mass shifts in MS (SILAC) or MS/MS (iTRAQ, TMT) spectra, and used for relative quantification. Labeling enables within-run comparisons of protein abundance, and improves the precision of quantification. Experimental design can gain further efficiency through optimal allocation of samples to the labels, e.g., in reciprocal or reference designs [26], or by using labeled synthetic peptides as references. However, labeling requires extra sample manipulation and increases the complexity of the sample.
Overcoming Limits of Dynamic Range: Targeted Workflows
The complexity of a biological mixture can be overcome by fractionation [27]; however, this severely reduces the throughput. A valuable alternative is selected reaction monitoring (SRM) (also referred to as multiple reaction monitoring, MRM), a targeted workflow where the mass spectrometer isolates a set of pre-defined peptides and their fragments during mass analysis [28]–[31]. The resulting peptide-fragment pairs (called transitions) are used for quantification. Since the isolation is highly specific, SRM enables the most sensitive mass spectrometry–based quantification currently available. For example, proteins expressed with fewer than 50 copies/cell were quantified in total yeast lysates [32]. As shown in Figure 3, SRM can be conducted in conjunction with both label-free and label-based workflows. The drawbacks of targeted workflows are that they only quantify a priori known proteins, require optimized experimental protocols, and limit the number of measurements per run to a few hundred. Further technological developments [33] and optimal experimental designs [34] will help alleviate these drawbacks.
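A transition is simply a pair of m/z values: the precursor peptide ion and one of its fragment ions. The sketch below computes candidate transitions under illustrative assumptions: a doubly charged precursor, singly charged y ions from y3 upward, unmodified residues, and monoisotopic masses. The function names and defaults are hypothetical; in practice, transitions are selected and refined empirically for each assay.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da
WATER = 18.010565  # Da, added once for the peptide termini


def peptide_mz(peptide, charge):
    """m/z of the intact peptide ion [M + zH]^z+."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def candidate_transitions(peptide, precursor_charge=2, min_y=3):
    """(precursor m/z, fragment m/z, label) for singly charged y ions
    y_min_y .. y_(n-1); the C-terminal suffix of length k forms y_k."""
    q1 = peptide_mz(peptide, precursor_charge)
    transitions = []
    for k in range(min_y, len(peptide)):
        y = sum(RESIDUE_MASS[aa] for aa in peptide[-k:]) + WATER + PROTON
        transitions.append((round(q1, 4), round(y, 4), f"y{k}"))
    return transitions


for t in candidate_transitions("PEPTIDE"):
    print(t)
```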
Computation and Statistics
Identification of Peptides and Proteins
The computational and statistical analyses of the acquired spectra are illustrated in Figure 4. With the shotgun LC-MS/MS workflow, the first step is to identify sequences of amino acids that correspond to the MS/MS spectra. This has received much attention from both algorithmic and statistical viewpoints [35]–[37]. A predominant approach is the database search, which compares each observed spectrum to the theoretical spectra predicted from a genomic sequence database (or to the previously identified experimental spectra in a library [38]), and reports the best-scoring peptide-spectrum match (PSM). Emerging alternatives are de novo identifications and hybrid searches [39], [40].
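The core of a database search can be sketched as "predict, compare, keep the best". The example below uses the simplest possible score, the number of theoretical b and y fragments matched by an observed peak (a shared-peak count); this scoring function, the 0.5 Da tolerance, and the helper names are illustrative assumptions, not the scoring of any particular search engine. Real engines also restrict candidates by precursor mass and use far more discriminative scores.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.007276, 18.010565  # Da


def theoretical_fragments(peptide):
    """Singly charged b and y ion m/z values of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def shared_peak_count(observed, theoretical, tol=0.5):
    """Number of theoretical fragments matched by at least one observed peak."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_psm(observed, candidates, tol=0.5):
    """Return (peptide, score) of the best-scoring peptide-spectrum match."""
    scored = [(shared_peak_count(observed, theoretical_fragments(p), tol), p)
              for p in candidates]
    score, peptide = max(scored)
    return peptide, score


observed = theoretical_fragments("PEPTIDE")[:8]  # a toy, partially observed spectrum
print(best_psm(observed, ["PEPTIDE", "EDITPEP", "GGGGGGG"]))  # → ('PEPTIDE', 8)
```

Note that the best-scoring match is reported even when it is wrong; this is exactly why the statistical assessment of PSMs is needed.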
Figure 4. Computation and statistics. doi:10.1371/journal.pcbi.1002277.g004
Analysis of the acquired spectra includes (a, b) signal processing, (c, d) significance analysis, and (e–h) downstream analysis. Methods in (a–d) must reflect the technological properties of the workflows. Methods in (e–h) are technology-independent and are similar to the analysis of gene expression microarrays, but their use is affected by uncertainty in protein identities and the incomplete sampling of the proteome.
Due to the stochastic nature of the MS/MS spectra [41], and to deficiencies of scoring functions and databases, the best-scoring PSMs are not necessarily correct. Statistical characterization of the identifications is necessary, and is now required by most journals [42]. This problem is frequently formalized as controlling the false discovery rate (FDR) in the list of reported PSMs [43], [44]. Representative methods for controlling FDR are two-group models, which view the reported PSMs as a mixture of correct and incorrect identifications [45], and methods utilizing decoy databases [46]. Typically, only around 30% of MS/MS spectra are confidently identified, and developing improved methods is an active area of research.
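A minimal sketch of decoy-based FDR estimation is shown below. It assumes the spectra were searched separately against a target database and a decoy database of equal size, so that the number of decoy PSMs above a score threshold estimates the number of incorrect target PSMs above it. This is one simple variant of the idea, not necessarily the exact procedure of [46]; ties in scores are ignored, and the function name is hypothetical.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy); higher scores are better.

    FDR at threshold t is estimated as (#decoys >= t) / (#targets >= t).
    A q-value is the smallest estimated FDR at which a PSM is accepted.
    Returns (score, q_value) for target PSMs, best first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append((score, is_decoy, decoys / max(targets, 1)))
    # Walk from the worst score upwards, keeping the running minimum FDR
    result, running_min = [], float("inf")
    for score, is_decoy, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        if not is_decoy:
            result.append((score, running_min))
    return result[::-1]


psms = [(10, False), (9, False), (8, True), (7, False), (6, True), (5, False)]
for score, q in target_decoy_qvalues(psms):
    print(score, round(q, 3))
```

Reporting all target PSMs with q-values at or below a chosen level (for example 0.01) yields a list whose estimated FDR is controlled at that level.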
The task of identification extends to inferring peptides and proteins in the sample from the identified MS/MS spectra. This is challenging due to the “many-to-many” mapping of peptides to proteins, and of MS/MS spectra to peptides. Inference should yield parsimonious results while maintaining sensitivity and characterizing the confidence in the identifications. The problem of protein inference is not entirely solved. For example, arguments exist in favor of [47] and against [48] reporting single-peptide protein identifications, and in favor of [49] and against [50] the exclusive use of protease-specific peptides.
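One common way to obtain a parsimonious protein list is to treat inference as a set-cover problem: find a small set of proteins that explains every identified peptide. The sketch below uses a greedy approximation; it is an illustrative assumption rather than a method endorsed by the references above, and it ignores peptide confidence entirely, which probabilistic inference methods take into account.

```python
def parsimonious_proteins(peptide_to_proteins):
    """Greedy approximation of the smallest protein set explaining all peptides.

    peptide_to_proteins: dict mapping peptide -> set of protein accessions.
    Ties are broken alphabetically, so indistinguishable proteins (same
    peptide sets) are represented by one arbitrary member.
    """
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)
    unexplained = set(peptide_to_proteins)
    selected = []
    while unexplained and protein_to_peptides:
        # Pick the protein that explains the most still-unexplained peptides
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        gained = protein_to_peptides[best] & unexplained
        if not gained:
            break  # remaining peptides map to no protein
        selected.append(best)
        unexplained -= gained
    return selected


mapping = {"pepA": {"P1"}, "pepB": {"P1", "P2"}, "pepC": {"P2", "P3"}, "pepD": {"P3"}}
print(parsimonious_proteins(mapping))  # → ['P1', 'P3']
```

Protein P2 is dropped because all of its peptides are shared; whether such a protein is truly absent cannot be decided from these data, which illustrates why protein inference remains open.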
A typical experiment generates hundreds of thousands of MS/MS spectra, and open-source and commercial pipelines such as the Trans-Proteomic Pipeline [51] streamline spectral handling and interpretation through common infrastructure.
Quantification of Spectral Features
The next step in quantitative label-free LC-MS/MS experiments is to locate and quantify MS peaks, annotate them with peptide and protein identities, and establish the correspondence of peaks between runs [52]. Label-based workflows with MS quantification (e.g., SILAC) search for pairs of peaks with known mass shifts that correspond to the same peptide. Workflows with MS/MS quantification (e.g., iTRAQ) locate and quantify reporter MS/MS fragments. All these tasks can be made difficult by irregular, overlapped, and missing peaks, chromatographic variations between runs, and incomplete and incorrect identifications. As a result, only a subset of the identified proteins is typically quantified [53]. A variety of signal processing software tools are reviewed in [54], and representative ones are OpenMS [55] for label-free quantification and MaxQuant [56] for quantification with SILAC.
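The search for SILAC peak pairs can be sketched as follows. The example assumes centroided MS peaks at a known charge state, peptides carrying exactly one labeled residue, and the commonly used heavy lysine (13C6 15N2, +8.0142 Da) and heavy arginine (13C6 15N4, +10.0083 Da) labels; the function name and the 10 ppm tolerance are hypothetical choices. Tools such as MaxQuant additionally use isotope patterns, elution profiles, and identifications.

```python
LYS8_SHIFT = 8.014199    # Da, 13C6 15N2-lysine
ARG10_SHIFT = 10.008269  # Da, 13C6 15N4-arginine


def find_silac_pairs(peaks, charge, shifts=(LYS8_SHIFT, ARG10_SHIFT), tol_ppm=10.0):
    """peaks: list of (mz, intensity) observed at a known charge state.

    Returns (light_mz, heavy_mz, heavy/light intensity ratio) for every pair of
    peaks separated by one expected label-induced shift (shift / charge in m/z).
    """
    pairs = []
    for light_mz, light_int in peaks:
        for shift in shifts:
            target = light_mz + shift / charge
            for heavy_mz, heavy_int in peaks:
                if abs(heavy_mz - target) <= target * tol_ppm * 1e-6:
                    pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs


peaks = [(500.0, 1000.0), (500.0 + LYS8_SHIFT / 2, 2000.0), (700.0, 50.0)]
print(find_silac_pairs(peaks, charge=2))  # one pair with heavy/light ratio 2.0
```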
Targeted SRM experiments sidestep the need for identifying and aligning peaks, and signal processing focuses on peak detection, quantification, and annotation. However, difficulties can arise with overlapped or suppressed signals or incorrectly calibrated transitions, and computational methods can help filter out poor quality transitions [57], [58]. Pipelines such as Skyline [59], [60] and ATAQS [61] streamline these tasks.
Frequently, sample handling induces differences in the quantitative signals between runs, and global between-run normalization is necessary to distinguish true biological changes from these artifacts. Two common approaches to global normalization are sample-based and control-based. Sample-based normalization, e.g., quantile normalization or normalization based on the total ion current, makes the best use of the data, but assumes that the majority of features do not change in abundance [62]. Control-based normalization is preferred in experiments with few measurements or many biological changes.
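Two simple forms of sample-based normalization are sketched below on log-intensities: median centering, and quantile normalization, which forces all runs to share the same intensity distribution. Both rely on the assumption stated above that most features do not change. The sketch assumes every run reports the same features in the same order with no missing values; the helper names are hypothetical.

```python
import statistics


def median_normalize(runs):
    """Shift each run's log-intensities so that all runs share the same median.
    runs: list of runs, each a list of log-intensities for the same features."""
    medians = [statistics.median(run) for run in runs]
    target = statistics.median(medians)
    return [[x - m + target for x in run] for run, m in zip(runs, medians)]


def quantile_normalize(runs):
    """Give every run the same distribution: each value is replaced by the mean,
    across runs, of the values with the same rank (ties ranked arbitrarily)."""
    n = len(runs[0])
    orders = [sorted(range(n), key=run.__getitem__) for run in runs]
    rank_means = [sum(run[order[k]] for run, order in zip(runs, orders)) / len(runs)
                  for k in range(n)]
    normalized = []
    for order in orders:
        values = [0.0] * n
        for k, i in enumerate(order):
            values[i] = rank_means[k]
        normalized.append(values)
    return normalized


runs = [[5.0, 1.0, 3.0], [2.0, 4.0, 9.0]]
print(median_normalize(runs))    # → [[5.5, 1.5, 3.5], [1.5, 3.5, 8.5]]
print(quantile_normalize(runs))  # → [[7.0, 1.5, 3.5], [1.5, 3.5, 7.0]]
```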
Finding Differentially Abundant Proteins
Typical statistical goals of quantitative proteomics are protein quantification, i.e., estimation of protein concentration in a sample on a relative or absolute scale, and class comparison, i.e., determination of proteins that change in average abundance between conditions. To achieve this, it is often necessary to summarize the quantitative information across all the features that pertain to a protein. One such approach is spectral counting [63], which is based on the insight that in global LC-MS/MS, peaks from abundant proteins are more frequently selected for fragmentation, and uses the number of identified MS/MS spectra as a proxy for the abundance. The approach involves minimal signal processing; however, it requires specialized statistical modeling, is limited to finding large changes among abundant proteins, and is most successful with mixtures of low complexity, e.g., for determination of protein complexes [64].
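Spectral counts themselves are straightforward to compute once each confidently identified spectrum is assigned to a protein. The sketch below also shows one widely used length correction, the normalized spectral abundance factor (NSAF), which is swapped in here for illustration and is not discussed in the text above. It assumes shared peptides have already been resolved so that each PSM maps to a single protein; these descriptive values are not a substitute for the specialized statistical models needed for comparisons.

```python
from collections import Counter


def spectral_counts(psm_proteins):
    """Count identified MS/MS spectra (PSMs) per protein.
    psm_proteins: one protein accession per confidently identified spectrum."""
    return Counter(psm_proteins)


def nsaf(counts, lengths):
    """Normalized spectral abundance factor: (count / protein length),
    rescaled so that the values of all proteins sum to 1."""
    saf = {protein: c / lengths[protein] for protein, c in counts.items()}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}


counts = spectral_counts(["P1", "P1", "P1", "P2"])
print(counts)                                # raw counts: P1 has 3, P2 has 1
print(nsaf(counts, {"P1": 300, "P2": 100}))  # both about 0.5 after length correction
```

Longer proteins yield more peptides and hence more spectra, so without such a correction raw counts favor long proteins.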
Alternative approaches are based on summarizing signals from quantified spectral peaks. With other technologies such as gene expression microarrays, similar summarization is performed by some form of averaging, e.g., with Robust Multiarray Averaging (RMA) [65]. Unfortunately, averaging fails to produce accurate results in mass spectrometry–based proteomics. Length, charge, and other chemical properties of peptides greatly affect the quality of the signals, and averaging obscures these differences in information content.
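The problem with naive averaging is easy to reproduce. RMA summarizes probe-level data with Tukey's median polish, which fits an additive model (overall level + run effect + feature effect) to log-intensities. The sketch below applies that idea to a hypothetical protein with one intense and one weak peptide, where the intense peptide is missing in the second run. It illustrates why feature effects matter; it is not one of the probabilistic models described next. It assumes every run and every feature has at least one observed value.

```python
import statistics


def median_polish(table, n_iter=10):
    """Tukey median polish of a runs x features table of log-intensities,
    fitting overall + run effect + feature effect; None marks a missing value.
    Returns one summary per run (overall + run effect)."""
    n_runs, n_features = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall, run_eff, feat_eff = 0.0, [0.0] * n_runs, [0.0] * n_features
    for _ in range(n_iter):
        for i in range(n_runs):  # sweep out run (row) medians
            m = statistics.median(v for v in resid[i] if v is not None)
            run_eff[i] += m
            resid[i] = [None if v is None else v - m for v in resid[i]]
        m = statistics.median(run_eff)
        overall, run_eff = overall + m, [r - m for r in run_eff]
        for j in range(n_features):  # sweep out feature (column) medians
            m = statistics.median(resid[i][j] for i in range(n_runs)
                                  if resid[i][j] is not None)
            feat_eff[j] += m
            for i in range(n_runs):
                if resid[i][j] is not None:
                    resid[i][j] -= m
        m = statistics.median(feat_eff)
        overall, feat_eff = overall + m, [f - m for f in feat_eff]
    return [overall + r for r in run_eff]


table = [[20.0, 10.0],   # run A: intense and weak peptide both observed
         [None, 10.0]]   # run B: intense peptide missing
naive = [statistics.mean(v for v in row if v is not None) for row in table]
print(naive)                 # → [15.0, 10.0]: an artificial drop in run B
print(median_polish(table))  # both runs summarized at nearly the same value
```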
A more successful summarization requires probabilistic modeling, which represents all features of a protein and characterizes their variation. A diverse range of such models has been proposed, and there is no single generally accepted procedure. The models differ in using raw or log-transformed intensities, comparing groups in terms of ratios or differences, and using general-purpose [66] or specialized [67] classes of statistical models. Important aspects are accurate representation of the experimental design and of within-run groupings of peaks in label-based workflows, treatment of missing data (e.g., using specialized [68] or general-purpose [69], [70] techniques), incorporating confidence in feature identifications [71], expanding the scope of conclusions to the underlying populations or restricting it to the selected samples [66], and controlling the FDR in the list of differentially abundant proteins. In some cases, e.g., in samples enriched in post-translational modifications, changes in peak intensities can be due to both differential abundance and differential modifications. Comparisons at the feature level are then more appropriate; however, they should be adjusted for the overall changes in protein abundance [72].
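Whatever model produces the per-protein p-values, the final list of differentially abundant proteins is commonly filtered by controlling the FDR. The sketch below implements the Benjamini–Hochberg adjustment, one standard procedure for this purpose; it assumes the p-values were already obtained from a per-protein model, and the example values are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    adjusted p_(r) = min over ranks r' >= r of  p_(r') * m / r'  (capped at 1),
    where p_(1) <= ... <= p_(m) are the sorted p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # one p-value per protein (hypothetical)
adjusted = benjamini_hochberg(pvals)
print([round(a, 4) for a in adjusted])
print([i for i, a in enumerate(adjusted) if a <= 0.05])  # proteins declared at 5% FDR
```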
Given the diversity of experimental designs and analysis steps, all these tasks can rarely be performed in a fully automated fashion, and consultations with statisticians are highly recommended.
Downstream Analysis
The high-throughput nature of proteomic data is similar to that of gene expression microarrays, and many downstream analysis methods can also be applied in proteomics [73]. In particular, all analyses benefit from data visualization [74]. Unsupervised class discovery helps find functionally related proteins, or biological samples homogeneous with respect to the quantitative protein profiles. Supervised class prediction, e.g., prediction of the disease status of a patient based on his or her protein abundance [75], and its thorough validation [76], are the required steps for discovery of biomarkers of disease.
Enrichment analysis tests whether pre-specified sets of proteins, e.g., those sharing a function, change in abundance more systematically than expected by chance. This is referred to as pathway analysis when the protein set forms a pathway. The analysis investigates hypotheses that are more directly relevant to the biological function, and can help detect small but consistent changes in abundance within the set. Many enrichment analysis methods exist and are systematically reviewed in [77], [78], and representative examples are the hypergeometric (equivalently, Fisher's exact) test and Gene Set Enrichment Analysis (GSEA) [79]. A particular challenge in proteomics is to map the protein identifiers to gene-centric knowledge bases. The tools for this task are reviewed in [80], and a representative one is DAVID [81].
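The hypergeometric test asks how surprising the overlap between a protein set and the list of differentially abundant proteins is, if the list had been drawn at random from all quantified proteins. A minimal sketch with the standard library (the function name and arguments are hypothetical; the one-sided Fisher's exact test gives the same p-value):

```python
from math import comb


def hypergeometric_enrichment_p(n_total, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_total, n_in_set, n_selected).

    n_total:    proteins quantified in the experiment (the background)
    n_in_set:   background proteins that belong to the protein set
    n_selected: proteins called differentially abundant
    n_overlap:  differentially abundant proteins that belong to the set
    """
    upper = min(n_in_set, n_selected)
    hits = sum(comb(n_in_set, k) * comb(n_total - n_in_set, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_total, n_selected)


# Hypothetical numbers: 4,000 quantified proteins, a 50-protein pathway,
# 200 differentially abundant proteins, 10 of them in the pathway
print(hypergeometric_enrichment_p(4000, 50, 200, 10))
```

The choice of background matters: using all quantified proteins, rather than the whole genome, avoids inflating significance for sets of abundant, easily detected proteins.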
A frequently asked question is the correlation between the expression of protein-coding genes and the abundances of the corresponding proteins [82]–[84]. Many studies have reported that in bacteria and unicellular eukaryotes, proteins and mRNA exhibit moderate correlation in a steady state (Pearson correlation of the order of 0.4), but that it improves to the order of 0.6–0.7 for proteins that are directly affected by a relevant condition or a stress [2]. An even lower correlation has been historically reported for multicellular eukaryotes; however, technological improvements now also point to a steady state correlation in human samples of the order of 0.4 [85].
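Because both mRNA and protein abundances span several orders of magnitude, such correlations are often computed on log-transformed values; the sketch below assumes this choice, which the text does not specify. It computes the Pearson correlation of log10 abundances for matched genes and skips pairs with a non-positive value (e.g., undetected proteins); the function name is hypothetical.

```python
import math


def log_pearson(mrna, protein):
    """Pearson correlation of log10 mRNA and log10 protein abundances.
    Genes with a non-positive value in either measurement are skipped."""
    pairs = [(math.log10(m), math.log10(p))
             for m, p in zip(mrna, protein) if m > 0 and p > 0]
    x = [a for a, _ in pairs]
    y = [b for _, b in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(log_pearson([1, 10, 100, 1000], [10, 100, 1000, 10000]))  # perfectly correlated
```

Skipping undetected proteins restricts the comparison to the detected part of the proteome, which is one reason reported correlations depend on the sensitivity of the technology.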
The moderate correlation of transcript and protein abundance indicates a major role of post-translational regulation in the activity of the cell. Therefore, the best functional insight can be obtained by combining measurements across technologies, and searching for broader groups of genes, proteins, and metabolites forming regulatory relationships [86], [87]. Such integrative studies are increasingly appearing [88], [89]. They remain challenging, however, due to the complexity of the underlying processes, incomplete sampling of the proteome, uncertainty in protein identities and difficulties of resolving multiple proteomic, genomic, and technological identifiers across platforms. New specialized methods and algorithms are needed to address these challenges.
Outlook
Despite the challenges, mass spectrometry–based proteomics continues to hold great promise for basic science and clinical research [90]. Several studies recently demonstrated that with appropriate care and training, it is now possible to accurately and reproducibly identify and quantify proteins across laboratories and instrument platforms [91]–[93]. In shotgun proteomics, the most repeatable peptide identifications corresponded to enzyme-specific cleavage sites, intense MS peaks, and proteins that generated many distinct peptides. Targeted quantification could reproducibly detect low µg/ml protein concentrations in unfractionated plasma.
To date, only 65% of all predicted human proteins have been reliably observed by mass spectrometry [90]. Therefore, future experimental developments will focus on improving the sensitivity, reproducibility, and comprehensiveness of protein identifications, and the sensitivity and accuracy of quantification. All studies consistently emphasize the key role of computation [94]. Future computational efforts will involve the development of proteome-centric knowledge bases such as neXtProt (http://www.nextprot.org/), repositories of experimental data, and the development of methods for optimal experimental design and data interpretation. Venues such as RECOMB Satellite Conference on Computational Proteomics [95] aim at closing the communication gap between biologists, chemists, and statisticians, and enable integrative and collaborative research.
This material was first presented as a tutorial at ISMB 2010 and 2011. We thank the organizers for the opportunity to present the tutorial. We thank O'Reilly Science Art (http://www.oreillyscienceart.com/) for help preparing the figures.
References
1. Beck M, Claassen M, Aebersold R (2011) Comprehensive proteomics. 22: 3–8.
2. de Godoy LMF, Olsen JV, Cox J, Nielsen ML, Hubner NC (2008) Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. 455: 1251.
3. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. 127: 635–648.
4. Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. 415: 141–147.
5. Cox J, Mann M (2011) Quantitative, high-resolution proteomics for data-driven systems biology. 80: 273–299.
6. Gstaiger M, Aebersold R (2009) Applying mass spectrometry-based proteomics to genetics, genomics and network biology. 10: 617–627.
7. Castellana N, Bafna V (2010) Proteogenomics to discover the full coding content of genomes: a computational perspective. 73: 2124–2135.
8. Ansong C, Purvine S, Adkins J, Lipton M, Smith R (2008) Proteogenomics: needs and roles to be filled by proteomics in genome annotation. 7: 50–62.
9. Hanash S, Taguchi A (2010) The grand challenge to decipher the cancer proteome. 10: 652–660.
10. Uhlen M, Ponten F (2005) Antibody-based proteomics for human tissue profiling. 4: 384.
11. Anderson NL, Anderson NG (2002) The human plasma proteome: history, character, and diagnostic prospects. 1: 845.
12. Ahrens CH, Brunner E, Qeli E, Basler K, Aebersold R (2010) Generating and navigating proteome maps using mass spectrometry. 11: 789–801.
13. Corzett TH, Fodor IK, Choi MW, Walsworth VL, Turteltaub KW (2010) Statistical analysis of variation in the human plasma proteome. 2010: 258494.
14. Oberg AL, Vitek O (2009) Statistical design of quantitative mass spectrometry-based proteomic experiments. 8: 2144–2156.
15. Valledor L, Jorrín J (2010) Back to the basics: maximizing the information obtained by quantitative two dimensional gel electrophoresis analyses by an appropriate experimental design and statistical analyses. 74: 1–18.
16. Ransohoff DF (2005) Bias as a threat to the validity of cancer molecular-marker research. 5: 142.
17. Hu J, Coombes KR, Morris JS, Baggerly KA (2005) The importance of experimental design in proteomic mass spectrometry experiments: some cautionary tales. 3: 322–331.
18. Mallick P, Kuster B (2010) Proteomics: a pragmatic perspective. 28: 695–709.
19. Walther TC, Mann M (2010) Mass spectrometry-based proteomics in cell biology. 190: 491.
20. Domon B, Aebersold R (2010) Options and considerations when selecting a quantitative proteomics strategy. 28: 710–721.
21. Duncan MW, Aebersold R, Caprioli RM (2010) The pros and cons of peptide-centric proteomics. 28: 659–664.
22. Mann M, Michalski A, Cox J (2011) More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data dependent LC MS/MS. 10: 1785–1793.
23. Ong SE, Mann M (2006) A practical recipe for stable isotope labeling by amino acids in cell culture (SILAC). 1: 2650–2660.
24. Ross PL, Huang YN, Marchese JN, Williamson B, Parker K (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. 3: 1154–1169.
25. Thompson A, Schäfer J, Kuhn K, Kienle S, Schwarz J (2003) Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. 75: 1895–1904.
26. Geiger T, Wisniewski JR, Cox J, Zanivan S, Kruger M (2011) Use of stable isotope labeling by amino acids in cell culture as a spike-in standard in quantitative proteomics. 6: 147–157.
27. Rifai N, Gillette MA, Carr SA (2006) Protein biomarker discovery and validation: the long and uncertain path to clinical utility. 24: 971–983.
28. Yocum AK, Chinnaiyan AM (2009) Current affairs in quantitative targeted proteomics: multiple reaction monitoring-mass spectrometry. 8: 145–157.
29. Kitteringham NR, Jenkins RE, Lane CS, Elliott VL, Park BK (2009) Multiple reaction monitoring for quantitative biomarker analysis in proteomics and metabolomics. 877: 1229–1239.
30. Pan S, Aebersold R, Chen R, Rush J, Goodlett DR (2009) Mass spectrometry based targeted protein quantification: methods and applications. 8: 787–797.
31. Lange V, Picotti P, Domon B, Aebersold R (2008) Selected reaction monitoring for quantitative proteomics: a tutorial. 4: 1–14.
32. Picotti P, Bodenmiller B, Mueller LN, Domon B, Aebersold R (2009) Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. 138: 795–806.
33. Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T (2010) High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. 7: 43–46.
34. Bertsch A, Jung S, Zerck A, Pfeifer N, Nahnsen S (2010) Optimal de novo design of MRM experiments for rapid assay development in targeted proteomics. 9: 2696–2704.
35. Granholm V, Käll L (2011) Quality assessments of peptide–spectrum matches in shotgun proteomics. 11: 1086–1093.
36. Nesvizhskii AI (2010) A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. 73: 2092–2123.
37. Nesvizhskii AI, Vitek O, Aebersold R (2007) Analysis and validation of proteomic data generated by tandem mass spectrometry. 4: 787–797.
38. Lam H, Aebersold R (2011) Building and searching tandem mass (MS/MS) spectral libraries for peptide identification in proteomics. 54: 424–431.
39. Jeong K, Kim S, Bandeira N, Pevzner PA (2011) Gapped spectral dictionaries and their applications for database searches of tandem mass spectra. 10: M110.002220.
40. Dasari S, Chambers M, Slebos R, Zimmerman L, Ham A (2010) TagRecon: high-throughput mutation identification through sequence tagging. 9: 1716–1726.
41. Venable JD, Yates JR (2004) Impact of ion trap tandem mass spectra variability on the identification of peptides. 76: 2928–2937.
42. Carr S, Aebersold R, Baldwin M, Burlingame A, Clauser K (2004) The need for guidelines in publication of peptide and protein identification data. 3: 531.
43. Käll L, Storey J, MacCoss M, Noble W (2008) Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. 7: 29–34.
44. Choi H, Nesvizhskii AI (2008) False discovery rates and related statistical concepts in mass spectrometry-based proteomics. 7: 47–50.
45. Keller A, Nesvizhskii AI, Kolker E, Aebersold R (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. 74: 5383–5392.
46. Moore R, Young M, Lee T (2002) Qscore: an algorithm for evaluating SEQUEST database search results. 13: 378–386.
47. Gupta N, Pevzner PA (2009) False discovery rates of protein identifications: a strike against the two-peptide rule. 8: 4173–4181.
48. Reiter L, Claassen M, Schrimpf S, Jovanovic M, Schmidt A (2009) Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. 8: 2405.
49. Olsen JV, Ong SE, Mann M (2004) Trypsin cleaves exclusively C-terminal to arginine and lysine residues. 3: 608–614.
50. Gupta N, Hixson KK, Culley DE, Smith RD, Pevzner PA (2010) Analyzing protease specificity and detecting in vivo proteolytic events using tandem mass spectrometry. 10: 2833–2844.
51. Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H (2010) A guided tour of the Trans-Proteomic Pipeline. 10: 1150–1159.
52. America AHP, Cordewener JHG (2008) Comparative LC-MS: a landscape of peaks and valleys. 8: 731–749.
53. Schulze WX, Usadel B (2010) Quantitation in mass-spectrometry-based proteomics. 61: 491–516.
54. Mueller LN, Brusniak MY, Mani DR, Aebersold R (2008) An assessment of software solutions for the analysis of mass spectrometry based quantitative proteomics data. 7: 51–61.
55. Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R (2008) OpenMS – An open-source software framework for mass spectrometry. 9: 1–11.
56. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. 26: 1367–1372.
57. Abbatiello S, Mani DR, Keshishian H, Carr S (2010) Automated detection of inaccurate and imprecise transitions in peptide quantification by multiple reaction monitoring mass spectrometry. 56: 291.
58. Reiter L, Rinner O, Picotti P, Hüttenhain R, Beck M (2011) mProphet: automated data processing and statistical validation for large-scale SRM experiments. 8: 430.
59. MacLean B, Tomazela D, Shulman N, Chambers M, Finney G (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. 26: 966.
60. Cham Mead JA, Bianco L, Bessant C (2010) Free computational resources for designing selected reaction monitoring transitions. 10: 1106–1126.
61. Brusniak MYK, Kwok ST, Christiansen M, Campbell D, Reiter L (2011) ATAQS: a computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry. 12: 78–93.
62. Callister SJ, Barry RC, Adkins JN, Johnson ET, Qian WJ (2006) Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. 5: 277–286.
63. Lundgren DH, Hwang S, Wu L, Han DK (2010) Role of spectral counting in quantitative proteomics. 7: 39–53.
64. Choi H, Larsen B, Lin ZY, Breitkreutz A, Mellacheruvu D (2010) SAINT: probabilistic scoring of affinity purification-mass spectrometry data. 8: 70.
65. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. 4: 249–264.
66. Clough T, Key M, Ott I, Ragg S, Schadow G (2009) Protein quantification in label-free LC-MS experiments. 8: 5275–5284.
67. Griffin NM, Yu J, Long F, Oh P, Shore S (2010) Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. 28: 83–89.
68. Karpievitch Y, Stanley J, Taverner T, Huang J, Adkins JN (2009) A statistical framework for protein quantitation in bottom-up MS-based proteomics. 25: 2028–2034.
69. Liew AW, Law NF, Yan H (2010) Missing value imputation for gene expression data: computational techniques to recover missing data from available information. 12: 498–513.
70. Aittokallio T (2010) Dealing with missing values in large-scale studies: microarray data imputation and beyond. 11: 253–264.
71. Li YF, Arnold RJ, Tang H, Radivojac P (2010) The importance of peptide detectability for protein identification, quantification, and experiment design in MS/MS proteomics. 9: 6288–6297.
72. Wu R, Dephoure N, Haas W, Huttlin EL, Zhai B (2011) Correct interpretation of comprehensive phosphorylation dynamics requires normalization by protein expression changes. 10: M111.009654.
73. Kumar C, Mann M (2009) Bioinformatics analysis of mass spectrometry-based proteomics data sets. 583: 1703–1721.
74. Gehlenborg N, O'Donoghue SI, Baliga NS, Goesmann A, Hibbs MA (2010) Visualization of omics data for systems biology. 7: S56.
75. Clarke R, Ressom HW, Wang A, Xuan J, Liu MC (2008) The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. 8: 37–49.
76. Boulesteix AL, Sauerbrei W (2011) Added predictive value of high-throughput molecular data to clinical data and its validation. 12: 215–229.
77. Emmert-Streib F, Glazko GV (2011) Pathway analysis of expression data: deciphering functional building blocks of complex diseases. 7: e1002053. doi:10.1371/journal.pcbi.1002053
78. Ackermann M, Strimmer K (2009) A general modular framework for gene set enrichment analysis. 10: 47.
79. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. 102: 15545–15550.
80. Huang D, Sherman BT, Lempicki R (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. 37: 1–13.
81. Huang DW, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. 4: 44–57.
82. de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C (2009) Global signatures of protein and mRNA expression levels. 5: 1512–1526.
83. Maier T, Guell M, Serrano L (2009) Correlation of mRNA and protein in complex biological samples. 583: 3966–3973.
84. Nie L, Wu G, Culley DE, Scholten JCM, Zhang W (2007) Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications. 27: 63–75.
85. Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J (2011) Global quantification of mammalian gene expression control. 473: 337–342.
86. Joyce AR, Palsson BØ (2006) The model organism as a system: integrating ‘omics’ data sets. 7: 198–210.
87. Sharan R, Ulitsky I, Shamir R (2007) Network-based prediction of protein function. 3: 88–101.
88. Nibbe RK, Koyuturk M, Chance MR (2010) An integrative -omics approach to identify functional sub-networks in human colorectal cancer. 6: e1000639. doi:10.1371/journal.pcbi.1000639
89. Huang SS, Fraenkel E (2010) Integration of proteomic, transcriptional, and interactome data reveals hidden signaling components. 2: ra40.
90. Nilsson T, Mann M, Aebersold R, Yates JR III, Bairoch A (2010) Mass spectrometry in high-throughput proteomics: ready for the big time. 7: 681–685.
91. Tabb DL, Vega-Montoto L, Rudnick P, Variyath A, Ham A (2010) Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. 9: 761–776.
92. Bell A, Deutsch E, Au C, Kearney R, Beavis R (2009) A HUPO test sample study reveals common problems in mass spectrometry–based proteomics. 6: 423–430.
93. Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR (2009) Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. 27: 633–864.
94. Aebersold R (2009) A stress test for mass spectrometry-based proteomics. 6: 411–412.
95. Bandeira N, Nesvizhskii A, McIntosh M (2011) Advancing next-generation proteomics through computational research. 10: 2895.