Mass Spectrometry Advances Fossilomics

March 1, 2011

Special Issues


Fossilomics uses MS to extract amino acid sequence information from subpicomole quantities of protein and peptide fragments that remain in certain fossil samples. The sequences are compared to databases and validated with search statistics and high-confidence sequences. The validated sequences can then be used to place the fossils on the evolutionary tree.

Cutting-edge mass spectrometry technology provides further evidence that it is possible to identify collagen in 80-million-year-old fossils.

Advances in mass spectrometry (MS) have the potential to revolutionize the understanding of how ancient and extinct species evolved over millions of years. Today, analyses of genetic material, such as DNA and RNA, are used to place organisms on the evolutionary tree. Unfortunately, DNA and RNA are very fragile and are present in very low amounts compared with other biological material. Obtaining DNA or RNA from samples of ancient and extinct species can be extremely difficult, if not impossible.

Liquid chromatography–mass spectrometry (LC–MS) is used extensively for protein identification. It is often used to identify and validate drug targets and biomarkers of disease that could prove helpful in early detection and treatment, which is one of the projects of the laboratory at the Beth Israel Deaconess Medical Center (BIDMC, Boston, Massachusetts). Common to the study of both ancient fossils and human samples is that the proteins of interest can be present in minuscule amounts, which is a significant obstacle to determining their sequence. As fossils age, protein concentrations fall below the limits of detection of all but the most advanced mass spectrometers. The ability to sequence degraded protein correctly and more completely also depends on new mass spectrometer designs with enhanced mass accuracy.

Recently, a new MS-based approach to placing ancient organisms on the evolutionary tree was developed: "fossilomics," which builds on proteomics research aimed at developing tests for cancer and other diseases. Fossilomics uses cutting-edge MS approaches to extract amino acid sequence information from subpicomole quantities of protein and peptide fragments that, given the right conditions, remain in certain fossil samples. The sequences are compared to databases of existing sequences and validated with search statistics and high-confidence sequences. The validated sequences can then be used to place the fossils on the phylogenetic tree, that is, the tree of evolution.

This approach was first demonstrated when collagen was identified in the bones of a 68-million-year-old Tyrannosaurus rex (T. rex), a result that linked dinosaurs more closely with birds than with reptiles (1–3). The results were met with significant skepticism. Current thinking is that soft tissue, DNA, proteins, and other biological materials are completely replaced by minerals in fossils older than 1 million years. Respected researchers expressed concern that the T. rex samples were contaminated with protein from other sources. Others raised concerns that the statistical evidence for the peptide sequence matches was weak.

New, More Stringent Study Addresses Controversy

To address these criticisms, the experiments were repeated on an older dinosaur fossil found in a completely different location. More-rigorous controls were applied to prevent and detect contamination, and an advanced MS approach was used to make better sequence calls. Additional statistical analysis and validation of the data were also undertaken. In this study, published in the May 1, 2009, issue of Science, collagen was identified in 80-million-year-old fossilized Brachylophosaurus canadensis (B. canadensis) bone (4). The results complemented the first study, providing supporting evidence that dinosaurs and birds are evolutionarily related. More exciting, they provide further evidence that proteinaceous material can be preserved in very ancient fossilized bone and tissue.

To address contamination concerns, more stringent handling protocols were implemented. The fossil and its surrounding sediments were wrapped in plaster, which was sawed off only at the laboratory so that the sample could be prepared in a controlled environment (Figure 1). Field-emission scanning electron microscopy (FESEM) revealed possible vestiges of bone cells, blood cells, and vessels entwined within a fibrous structure that looked like collagen. Antibody analyses confirmed that collagen and other proteins were present. Control samples, including buffer and sediment, did not contain collagen.

Figure 1: B. canadensis fossil wrapped with surrounding sediment in plaster to protect it and avoid contamination before going to the laboratory. (Photo courtesy of M.H. Schweitzer.)

Before MS analysis, whole bone extracts were enzymatically digested, purified, and then concentrated using micro-reversed-phase chromatography. Next, the samples were analyzed by reversed-phase microcapillary liquid chromatography–tandem mass spectrometry (LC–MS-MS) using linear ion trap and orbitrap hybrid mass spectrometer systems (LTQ and LTQ Orbitrap XL MS systems, respectively, Thermo Fisher Scientific, San Jose, California). The resulting mass spectra were searched against the reversed Swiss-Prot database of protein sequences using the SEQUEST (University of Washington, Seattle, Washington) and Mascot (Matrix Science Ltd., Boston, Massachusetts) search algorithms, accessed using Proteome Discoverer software (Thermo Fisher Scientific). The software's multiple-database search capability enabled the application of multiple search algorithms and the combination of their outputs to cross-validate results.
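As a rough illustration of what a reversed database search involves, the sketch below builds decoy entries by reversing each target sequence. It assumes that "reversed" means each protein sequence is read backward to create a decoy, which is common practice but is not spelled out here. The function name, the `REV_` prefix, and the single example entry (the unmodified form of one peptide from Table I) are hypothetical choices for illustration, not part of any search engine's interface.

```python
def add_reversed_decoys(targets, prefix="REV_"):
    """Return a database holding every target plus one reversed decoy per target.

    `targets` maps an entry name to its amino acid sequence.
    The `REV_` prefix is a hypothetical naming convention for decoys.
    """
    combined = dict(targets)
    for name, sequence in targets.items():
        # A reversed sequence keeps length and composition but should not
        # correspond to any real protein, so hits to it are chance matches.
        combined[prefix + name] = sequence[::-1]
    return combined


# Illustrative entry only; a real search would load all of Swiss-Prot.
targets = {"example_collagen_peptide": "GLPGESGAVGPAGPPGSR"}
database = add_reversed_decoys(targets)

for name, sequence in database.items():
    print(name, sequence)
```

Because decoys share the length and amino acid composition of the real entries, the rate at which spectra match them gives an estimate of how often a real entry would be matched by chance.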

Eight collagen peptide sequences were identified, totaling 149 amino acids, almost twice the amount extracted from the T. rex fossil, but still less than 10% of the size of a full-length collagen sequence. The sequences were validated using scoring statistics, decoy databases, manual inspection of spectra, spectral comparisons to high-confidence spectra from synthetic peptides, and peptides of identical sequences from existing organisms.

The eight peptides identified as collagen were the top ranked matches (Table I) other than common laboratory contaminants such as keratins. All validation and calculation of false discovery rates and expectation values were based on the reversed Swiss-Prot database searches using SEQUEST and Mascot. All of the sequences are high-confidence and their spectra produced Mascot expectation values of less than one. The expectation value is the number of times one would expect to get that match score or better by chance. A completely random match has an expectation value of one or more, so the better the match the smaller the expectation value.
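A false discovery rate from a decoy search can be estimated with very little arithmetic. The sketch below is a minimal version of the usual target-decoy estimate, assuming each peptide-spectrum match carries a score and a flag saying whether it hit a reversed (decoy) entry. The scores are made-up numbers for illustration, not values from this study, and the function is not the calculation performed by SEQUEST, Mascot, or Proteome Discoverer.

```python
def decoy_fdr(matches, threshold):
    """Estimate FDR as (decoy hits) / (target hits) at or above a score threshold.

    `matches` is a list of (score, is_decoy) tuples.
    """
    target_hits = sum(1 for score, is_decoy in matches
                      if score >= threshold and not is_decoy)
    decoy_hits = sum(1 for score, is_decoy in matches
                     if score >= threshold and is_decoy)
    if target_hits == 0:
        # No accepted target matches: nothing to estimate an FDR for.
        return 0.0 if decoy_hits == 0 else float("inf")
    return decoy_hits / target_hits


# Hypothetical match scores; True marks a hit to a reversed (decoy) entry.
matches = [
    (62.0, False), (55.1, False), (48.3, False),
    (41.0, True), (39.5, False), (22.0, True), (20.4, False),
]

for threshold in (20, 40, 45):
    print(threshold, round(decoy_fdr(matches, threshold), 3))
```

Raising the score threshold removes decoy hits faster than target hits, which is why a stricter cutoff lowers the estimated false discovery rate at the cost of accepting fewer sequences.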

Table I: Collagen sequences acquired by linear ion trap and hybrid mass spectrometers, identified via spectral database search. Sequence validation method is also shown.

Four of the B. canadensis spectra yielded very high-confidence matches to existing protein databases, and statistical validation was sufficient to identify them. The four lowest-scoring MS-MS spectra (by Mascot score) are shown in Figure 2. These were subjected to additional validation with the MS Search 2.0 spectral comparison algorithm from the National Institute of Standards and Technology (NIST, Gaithersburg, Maryland) against a database of more than 200,000 random peptide fragmentation spectra from various taxa. Included in this search were high-confidence versions of the MS-MS spectra for the four collagen sequences, acquired from ostrich and from synthetic peptides, which were added to the NIST spectral database. The mass-to-charge ratios and relative intensities of the fragment ions of the high-confidence versions of the same sequences were the best matches to the four spectra from the B. canadensis sequences (Figure 2).

Figure 2: Collision-induced dissociation (CID) product ion spectra acquired using the hybrid mass spectrometer were validated with high-confidence versions of the same sequences using the spectral comparison tools of MS Search 2.0 software from NIST. The mass-to-charge ratios and relative intensities of the fragment ions of the high-confidence versions of the sequences were the best matches to the mass spectra of the B. canadensis sequences.
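Spectral comparison of this kind scores how well the fragment ion m/z values and relative intensities of two spectra agree. The sketch below uses a plain cosine (normalized dot product) score with greedy peak pairing inside an m/z tolerance. This is a generic stand-in chosen for illustration and is not the scoring used by NIST MS Search; the peak lists and the 0.5 m/z tolerance are hypothetical.

```python
import math


def cosine_score(spectrum_a, spectrum_b, tolerance=0.5):
    """Cosine similarity between two peak lists of (mz, intensity) tuples.

    Each peak in spectrum_a is paired with the closest unused peak in
    spectrum_b whose m/z lies within `tolerance`; unpaired peaks add nothing
    to the dot product but still count in the normalization.
    """
    used = set()
    dot = 0.0
    for mz_a, intensity_a in spectrum_a:
        best = None
        for j, (mz_b, _) in enumerate(spectrum_b):
            if j in used or abs(mz_a - mz_b) > tolerance:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - spectrum_b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += intensity_a * spectrum_b[best][1]
    norm_a = math.sqrt(sum(i * i for _, i in spectrum_a))
    norm_b = math.sqrt(sum(i * i for _, i in spectrum_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: a query spectrum, a close match, and an unrelated one.
query = [(175.1, 30.0), (272.2, 100.0), (329.2, 55.0)]
reference = [(175.1, 28.0), (272.3, 100.0), (329.1, 60.0)]
unrelated = [(147.1, 80.0), (401.3, 100.0)]

print(cosine_score(query, reference))
print(cosine_score(query, unrelated))
```

A score near 1 means the two spectra share the same fragment ions at similar relative intensities, while a score near 0 means they share almost none. Ranking a query spectrum against a large library by such a score is the general idea behind a best-match search.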

As shown in Table I, sequence identities were associated with species using basic local alignment search tool (BLAST) searches of the all-species National Center for Biotechnology Information (NCBI, Bethesda, Maryland) protein database. Later phylogenetic analysis of the sequences was performed based on Bayesian algorithms (BayesPhylogenies) to place B. canadensis on the tree of life.

Advantages of LC–MS Systems

The linear ion trap mass spectrometer produced all the sequences for the previous T. rex study and approximately half for the new study of B. canadensis. Though the linear ion trap system used was very sensitive, the ultrahigh mass resolution and sub-2-ppm mass accuracy of the hybrid mass spectrometer used here enabled more sequences to be produced with confidence. The "instrument rank" column in Table I shows the instrument that produced the sequence that was the best match (a top hit with the reversed Swiss-Prot database search). In some cases, the mass accuracy of the hybrid mass spectrometer showed that the top-ranked sequence generated by the linear ion trap mass spectrometer was not correct, and that the rank 2 sequence was the correct sequence.

Subtle differences in mass can make sequence calls difficult. Hydroxyproline (Hyp), a major component of collagen that plays a role in its stability, is produced by hydroxylation of the amino acid proline by the enzyme prolyl hydroxylase following protein synthesis. A Hyp residue differs in mass from an isoleucine/leucine (Ile/Leu) residue by only about 0.036 Da, which corresponds to roughly 23 ppm for a peptide of the size discussed here. The hybrid mass spectrometer's high mass accuracy enabled improved interpretation of the sequence GLPGESGAVGPAGPP(OH)GSR, shown at the bottom of Table I. At sequence position 15, hydroxyproline is a more accurate identification than isoleucine/leucine by 0.0364 Da. As shown in Figure 3, the hybrid mass spectrometer resolved a similarly difficult sequence call. The correct sequence, GETGPAGPAGPP(OH)GPAGAR, is also shown in Table I.
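The Hyp versus Ile/Leu comparison can be checked from standard monoisotopic masses. The sketch below computes the neutral mass of GLPGESGAVGPAGPP(OH)GSR with hydroxyproline at position 15, repeats the calculation with leucine at that position, and expresses the difference in ppm of the peptide mass. The residue masses are standard reference values rounded to five decimals; treating Hyp as proline plus one oxygen atom and ignoring charge state are simplifying assumptions of this example.

```python
# Standard monoisotopic residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "E": 129.04259, "R": 156.10111,
}
OXYGEN = 15.99491  # added to proline on hydroxylation
WATER = 18.01056   # added once per peptide for the free termini


def peptide_mass(residue_masses):
    """Neutral monoisotopic peptide mass: sum of residues plus one water."""
    return sum(residue_masses) + WATER


sequence = "GLPGESGAVGPAGPPGSR"  # unmodified backbone of the Table I peptide
position = 15                    # 1-based position of the disputed residue

base = [RESIDUE_MASS[aa] for aa in sequence]

with_hyp = list(base)
with_hyp[position - 1] = RESIDUE_MASS["P"] + OXYGEN  # hydroxyproline

with_leu = list(base)
with_leu[position - 1] = RESIDUE_MASS["L"]           # same mass as isoleucine

delta_da = peptide_mass(with_leu) - peptide_mass(with_hyp)
delta_ppm = delta_da / peptide_mass(with_hyp) * 1e6

# The difference is roughly 0.036 Da, or about 23 ppm of the peptide mass.
print(f"Hyp peptide mass: {peptide_mass(with_hyp):.4f} Da")
print(f"Difference: {delta_da:.4f} Da ({delta_ppm:.1f} ppm)")
```

A difference of this size is far larger than a sub-2-ppm mass error, so an instrument with that accuracy can separate the two candidates from the precursor mass alone, whereas a lower-accuracy instrument cannot.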

Figure 3: Analysis of a hadrosaur collagen peptide using a hybrid mass spectrometer.

The multiple collision modes available on the hybrid mass spectrometer, such as collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD), provided structural information that helped resolve close sequence calls and increased confidence in sequence identifications. As shown in Figure 4, we took advantage of HCD to distinguish between hydroxyproline and isoleucine/leucine to identify the correct peptide sequence.

Figure 4: Analysis of a collagen peptide using a hybrid mass spectrometer with higher-energy collisional dissociation to distinguish between hydroxyproline and isoleucine/leucine residues.

Looking to the Future

These results open the door to further research on many other fossil types. Studies of fossils from the Miocene epoch (fossils 1–20 million years old) are planned next to bridge the gap between younger, 160,000- to 600,000-year-old fossils that present many collagen sequences, and ancient, multimillion-year-old dinosaurs that show few sequences. In addition, MS-based sequencing will be used to phylogenetically place unidentified fossil fragments that are too old for DNA sequence analysis. Advanced hybrid and linear ion trap MS will play a key role in these studies. As MS instrumentation continues to advance, the technique is expected to generate much longer sequences that will reveal more about dinosaurs and other extinct species. Ultimately, it may become possible to classify all dinosaur species phylogenetically, something that today can be done only by analyzing bone morphology. It is truly exciting to participate in the dawn of fossilomics.

John M. Asara, PhD, is the Director of the mass spectrometry core facility at the Beth Israel Deaconess Medical Center (BIDMC) and Assistant Professor of Medicine, Harvard Medical School, Boston, Massachusetts.


(1) J.M. Asara, M.H. Schweitzer, L.M. Freimark, M. Phillips, and L.C. Cantley, Science 316(5822), 280–285 (2007).

(2) M.H. Schweitzer, Z. Suo, R. Avci, J.M. Asara, M.A. Allen, F.T. Arce, and J.R. Horner, Science 316(5822), 277–280 (2007).

(3) C.L. Organ, M.H. Schweitzer, W. Zheng, L.M. Freimark, L.C. Cantley, and J.M. Asara, Science 320(5875), 499 (2008).

(4) M.H. Schweitzer, W. Zheng, C.L. Organ, R. Avci, Z. Suo, L.M. Freimark, V.S. Lebleu, M.B. Duncan, M.G. Vander Heiden, J.M. Neveu, W.S. Lane, J.S. Cottrell, J.R. Horner, L.C. Cantley, R. Kalluri, and J.M. Asara, Science 324(5927), 626–631 (2009).