THE PRE-PROTEOMICS ERA
- Genaro Pimienta
- Jun 2, 2024
Mass spectrometry-based proteomics (MS-proteomics), as we know it, was not born in 1995, the year the term proteome first appeared in the literature. The sample preparation methods and sequencing principles we use in MS-proteomics date back to the 1950s.
What changed? Since the early 2000s, MS-proteomics has relied on high-resolution mass spectrometers, instead of an EDMAN sequencer, to sequence peptides and proteins.
In this blog you will learn the following:
Many of the sample preparation approaches developed initially for EDMAN-based protein sequencing eventually became core components of MS-proteomics workflows.
The availability of genome sequence databases, and of the bioinformatics tools developed to exploit them, has had a transformational impact on MS-proteomics data analysis.
For better understanding, the text below is divided into four sections:
Pre-history of MS-proteomics
Legacy MS-proteomics workflows
MS-proteomics in the genomic era
MS-proteomics in the post-genomic era
1. PRE-HISTORY OF MS-PROTEOMICS
We travel back to the early 1950s, when Fred Sanger and Pehr Edman independently developed methods for protein amino acid sequencing.
Each of these inventions went on to shape early efforts to sequence genes and proteins, in different ways.
Sanger’s work on protein sequencing laid the groundwork for the so-called “Sanger DNA sequencing” technology, used to determine the nucleotide sequence of genes.
Edman’s approach, on the other hand, was eventually automated and became the workhorse of protein sequencing for years to come.
EDMAN-based protein sequencing has two major limitations:
Low throughput: only one highly purified protein can be sequenced at a time.
Indirect protein inference: as in MS-proteomics, protein identity is inferred indirectly. Proteins must be cut into positively charged peptides ~40 amino acids long, and multiple overlapping peptides must be sequenced to obtain a protein’s full sequence.
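The overlap principle can be illustrated with a minimal Python sketch. It assumes error-free peptide reads that are already ordered from the N- to the C-terminus, with each consecutive pair overlapping by at least three residues; the function names and the toy sequences are hypothetical. In practice, EDMAN-era overlaps were typically generated by cleaving the same protein with enzymes or reagents of different specificity.

```python
def merge_pair(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two reads using the longest suffix of `left` that equals a prefix of `right`."""
    max_k = min(len(left), len(right))
    # Try the longest possible overlap first, down to the minimum allowed length.
    for k in range(max_k, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError(
        f"no overlap of at least {min_overlap} residues between {left!r} and {right!r}"
    )


def assemble_overlapping(peptides: list[str], min_overlap: int = 3) -> str:
    """Assemble ordered, overlapping peptide reads into one protein sequence."""
    sequence = peptides[0]
    for peptide in peptides[1:]:
        sequence = merge_pair(sequence, peptide, min_overlap)
    return sequence


if __name__ == "__main__":
    # Hypothetical toy reads, not real sequencing data.
    print(assemble_overlapping(["MKTAYIAK", "IAKQRQISF", "ISFVKSHF"]))
    # MKTAYIAKQRQISFVKSHF
```

The `min_overlap` threshold exists because very short overlaps (one or two residues) can match by chance, which is exactly why sufficiently long overlapping peptides were needed.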
2. LEGACY MS-PROTEOMICS WORKFLOWS
There are two main approaches in MS-proteomics. Until very recently, the bottom-up approach, in which proteins are digested into peptides before analysis, has been the most common workflow in high-throughput MS-proteomics. The top-down approach, in which intact proteins are analyzed, has traditionally been limited by the mass range and isotope-level resolving power of mass spectrometers.
Most of the protein/peptide separation and chemical modification approaches currently used in MS-proteomics are adaptations of workflows developed in the EDMAN era. What has changed is the instrumentation: modern mass spectrometers have replaced the EDMAN sequencer.
Reasons:
In both EDMAN sequencing and bottom-up MS-proteomics, positively charged peptides ~40 amino acids long are used as surrogates for protein identification.
In the case of top-down proteomics, the painstaking protein separation principles worked out during the EDMAN era have been adapted to MS-proteomics.

Figure 1. Legacy proteomics workflows
The use of two-dimensional gel electrophoresis for protein separation, followed by protein spot excision, cysteine reduction/alkylation, and digestion with trypsin and/or Lys-C, is a workflow originally developed for EDMAN-based protein sequencing. The same workflow was later adapted to bottom-up MS-proteomics.
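The digestion step is what produces positively charged peptides: trypsin cleaves on the C-terminal side of lysine (K) and arginine (R), generally not when the next residue is proline (P), so tryptic peptides end in a basic residue. A minimal in-silico digest in Python, assuming that simplified rule, no missed cleavages, and a hypothetical function name `tryptic_digest`, could look like this:

```python
import re

# Simplified trypsin specificity (an assumption for illustration): cleave after
# lysine (K) or arginine (R), except when the next residue is proline (P).
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein: str, min_length: int = 1) -> list[str]:
    """Return the fully cleaved tryptic peptides of `protein` (no missed cleavages)."""
    # Splitting on a zero-width pattern keeps every residue. When the protein ends
    # in K or R, an empty string appears at the end; min_length >= 1 drops it.
    peptides = TRYPSIN_SITE.split(protein)
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(tryptic_digest("MAKPLSRGKDEVRPALKR"))
    # ['MAKPLSR', 'GK', 'DEVRPALK', 'R']
```

In this toy example, the K in "MAKPLSR" and the R in "DEVRPALK" are not cut because each is followed by P. Real digests also contain missed cleavages, which this sketch ignores.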
3. MS-PROTEOMICS IN THE GENOMIC ERA
The genomic era can be said to have started in the late 1970s, when a team led by Fred Sanger published the DNA sequence of bacteriophage phi X174.
High-throughput gene sequencing technology continued to evolve, and by the mid-1990s several small genomes were reported.
Haemophilus influenzae (1995)
Saccharomyces cerevisiae (1996)
Escherichia coli (1997)
The MS-proteomics field benefited from the growing availability of gene sequence databases, which enabled the development of the first computational tools for the automated analysis of MS-proteomics data:
FRAGFIT (1993)
PeptideSearch (1994)
SEQUEST (1994)
MS-proteomics data could now be analyzed in an automated fashion, provided that a reference database of protein sequences, translated from their gene counterparts, was available.
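The core idea of database-dependent analysis can be sketched with a simplified form of peptide mass fingerprinting: digest every database protein in silico, compute the theoretical peptide masses, and count how many measured masses each protein explains. This is not SEQUEST's algorithm, which scores tandem mass spectra by cross-correlation; it only illustrates why a sequence database makes automated identification possible. The protein names, sequences, observed masses, and the 0.02 Da tolerance below are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (unmodified residues; no fixed
# cysteine alkylation is assumed in this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
# Simplified trypsin rule: cleave after K or R unless followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")

# Hypothetical two-entry "database" of translated gene sequences.
DATABASE = {
    "protein_A": "MAKPLSRGKDEVRPALKR",
    "protein_B": "GAVLIKWERST",
}


def peptide_mh(peptide: str) -> float:
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def score_proteins(observed: list[float], database: dict[str, str],
                   tol: float = 0.02) -> dict[str, int]:
    """Count observed masses matching any tryptic peptide of each protein within tol (Da)."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mh(p) for p in TRYPSIN_SITE.split(sequence) if p]
        scores[name] = sum(
            any(abs(m - t) <= tol for t in theoretical) for m in observed
        )
    return scores


if __name__ == "__main__":
    # Hypothetical measured [M+H]+ values, not real data.
    print(score_proteins([802.461, 927.520, 1500.0], DATABASE))
    # {'protein_A': 2, 'protein_B': 0}
```

The protein that explains the most measured masses is the best candidate. Without a database of predicted sequences, there would be nothing to compare the measurements against.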
4. MS-PROTEOMICS IN THE POST-GENOMIC ERA
The turn of the century was marked by the completion of the genomes of important model organisms:
Arabidopsis thaliana (2000)
Drosophila melanogaster (2000)
Mus musculus (2002)
Homo sapiens (2003)
Genome browsers and other genomics bioinformatics tools developed at the time catalyzed a boom in the development of peptide search engines for database-dependent MS-proteomics data analysis.
SEQUEST, the first search engine to match uninterpreted tandem mass spectra against a protein sequence database, was optimized in the following years and went on to become the prototype for most peptide search engines developed over the last two decades (2000 to 2020).
In the following blogs, we will dive deeper into two important subjects:
Quantitative MS-proteomics
MS-proteomics data analysis
Stay tuned!
GPR
Disclosure: At BioTech Writing and Consulting, we believe in the use of AI in Data Science, but we do not use AI to generate text or images.