The identification of nontargeted species in environmental and commercial samples by mass spectrometry can be very difficult.
In this article, authors from Eastman Chemical Company describe their systematic approach for the identification of nontargeted
species using nominal and accurate mass data, searching both mass spectral and "spectra-less" databases.
Organic mass spectrometry (MS) has witnessed an extraordinary increase in capabilities this past decade because of major advances
in ionization sources, analyzers, detectors, chromatography, and computer technology. Many of these technological advances
focus on biological applications, a fact plainly evident to attendees of the American Society for Mass Spectrometry's (ASMS)
annual conferences. Yet the significance of this increasingly sophisticated technology has not been lost on industrial, environmental,
and forensic mass spectrometrists, whose work involves characterizing commercial chemical products.
Eastman Chemical Company is a global manufacturer of polymers, fibers, coatings, additives, solvents, adhesives, and many
other products. Gas chromatography–mass spectrometry (GC–MS) and liquid chromatography–mass spectrometry (LC–MS) have proven
to be essential for characterizing our company's products and those of other companies. With reasonable effort, we routinely
and reliably obtain mass spectral data from these highly sensitive and yet robust techniques. However, unless the data can
be converted into structural information, they are of little use in resolving the analytical problem at hand.
Over the last 34 years, we have developed and refined a systematic process (1,2) for the identification of nontargeted species using
GC–MS and LC–MS analyses. We refer to these types of species as "known unknowns" — that is, species known in the chemical
literature or MS reference databases, but unknown to the investigator. The essence of the process is finding candidate structures
by searching mass spectral databases, Chemical Abstracts Service (CAS) databases, and ChemSpider databases. Figure 1 presents a
simplified flowchart of the overall process; the subsequent sections discuss individual steps and illustrate the process with
three examples of identifying known unknowns.
Figure 1: Simplified flowchart for identifying "known unknowns." MF = molecular formula and MW = molecular weight.
Computer-Searchable Mass Spectral Databases
The first step in the process is computer searching of spectra against mass spectral databases. This approach (3) is very
powerful and efficient for the identification of unknowns, typically requiring 3–5 s for each component in a mixture. Electron
ionization (EI) databases are used for identifying compounds in GC–MS analyses, and collision-induced dissociation (CID) databases
are used for LC–MS analyses. The databases are purchased from commercial sources or are created from compounds characterized
at our company (see Table I).
Table I: Spectra with associated structures searched with NIST search software
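To make the idea of a computer search against a spectral database concrete, the following is a minimal sketch in Python. It ranks library entries by plain cosine similarity between peak lists. This is an illustration only, not the scoring used by the NIST search software, and the function names (cosine_match, search_library), the library layout, and the peak values are all assumptions made up for the example rather than real reference spectra.

```python
import math


def cosine_match(query, reference):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Assumption: peaks are binned to nominal (integer) m/z, so matching
    peaks share the same dictionary key.
    """
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def search_library(query, library, top_n=3):
    """Return the top_n (name, score) pairs, best match first."""
    scored = [(name, cosine_match(query, spec)) for name, spec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy library with invented peak lists (hypothetical, not real compounds).
TOY_LIBRARY = {
    "compound_A": {43: 100.0, 58: 30.0, 71: 10.0},
    "compound_B": {77: 100.0, 105: 80.0, 182: 40.0},
    "compound_C": {43: 20.0, 57: 100.0, 85: 35.0},
}

# Invented spectrum of an "unknown" component, close to compound_A.
unknown = {43: 95.0, 58: 33.0, 71: 8.0}

hits = search_library(unknown, TOY_LIBRARY)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this sketch the best hit is simply the entry with the highest score. In practice the analyst still reviews the top candidates, because a high match score alone does not confirm an identification.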
EI mass spectral searches are normally more successful than CID searches for two reasons. First, the number
of entries in EI databases for GC–MS is approximately 10 times larger than that for CID databases for LC–MS. Second, 70-eV
EI spectra are much more reproducible than CID spectra, which can vary significantly depending on instrument design and user-specified