What are the steps to take once an outlier is discovered? There are several options.
This column is the continuation of our previous installments dealing with the question of outliers. Here we consider what to do about an outlier, once one is detected.
You have developed a set of data and have a reading that is suspected to be an outlier. You have applied one or more of the tests described in our previous column (1) and confirmed that the reading is, indeed, an outlier. Now what do you do?
There are three actions you might consider taking: deleting the discordant reading, transforming the data, or accommodating the outlier.
These possible actions could be considered the statistical approaches to dealing with the outlier. Other approaches can also be considered, and, depending on the circumstances, might be preferred. For example, if the outlier arises when calibrating a spectrometer for quantitative analysis (using chemometrics, of course, which is the default activity for all our columns), then the origin of the outlier could be either in the reference laboratory values or among the instrumental values. In either case, an alternative (nonstatistical) approach would be to identify the source of the discordant value (instrument or laboratory), and then engage in investigation of the chemistry, physics, and background of the readings to find the fundamental cause. In a pharmaceutical context, for example, this is what would be called root cause analysis.
A variant of this approach is to concentrate all of one's attention on just the outliers and ignore the rest of the data, with a view toward learning some new fundamental science, the rationale being that some new and unexpected effect has created the discordancy in the values. An outlier can be an indicator of a scientific accident, of the sort that has led to discoveries such as quinine, the smallpox vaccine, X-rays, insulin, penicillin, Teflon, and the cosmic microwave background (2–4). (Don't hold your breath for this, but it does happen!) After all, to a lesser degree, we saw previously (5) that the presence of a set of outliers that persisted through changes of data transformations and other manipulations of the data was indicative of a previously unsuspected systematic effect influencing the data. The discovery of this effect gave us much information about the meaning of calibration transferability, and how to achieve it.
General scientific principles come into play here. Is the effect reproducible? Can it be created or avoided at will? Does it lead to predictions of new phenomena? Is there a causal link between the discordant data and fundamental physics, chemistry, biology, or mathematics? Can you do a controlled experiment (or, more generally, what is sometimes loosely called a statistical experimental design)? Is there a theory (from another science, such as chemistry, physics, or biology) that can explain the findings? Do other scientists get similar results, or at least have you had someone check your work?
Here we will avoid departing from our mission of describing the chemometric and statistical effects on the data, and so we will not pursue those other topics. However, the reader should keep these alternate considerations in mind; after all, not everything has a statistical explanation! We will now consider the possibilities listed earlier.
In the context of using chemometric calibration to perform chemical analysis, the most common action taken when an outlier is identified is to delete the discordant reading. As we saw in the discussion of the nature of outliers (6), however, it seems likely that samples summarily rejected this way are sometimes (perhaps even often) misidentified as outliers, when in actuality, they are simply the extreme samples of the distribution.
Alternatively, as described above, the distribution could be other than expected. If an outlier is identified based on an expectation of a normal (or at least a symmetric) distribution but the actual distribution is exponential, or χ², or some other asymmetric distribution, then large values are to be expected and summary deletion is obviously an incorrect action. Another incorrect action would be to deal with the data as though the assumption of normality applied.
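A short sketch illustrates the point. The data and the 3-sigma cutoff here are our own constructions for illustration, not taken from any real calibration; the sketch shows how a rule that presumes normality flags perfectly legitimate readings when the data are actually exponentially distributed:

```python
import math
import random

# Simulated illustration (our own construction): draw readings from an
# exponential distribution, then apply a 3-sigma cutoff that implicitly
# assumes normality.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(1000)]

m = sum(data) / len(data)
sd = math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))

# Readings more than 3 standard deviations above the mean get flagged,
# even though large values are expected for an exponential distribution.
flagged = [x for x in data if (x - m) / sd > 3.0]
print(f"{len(flagged)} of {len(data)} legitimate readings flagged as outliers")
```

For an exponential distribution with unit mean, a few percent of the readings lie above the 3-sigma cutoff, so dozens of legitimate samples in a set of 1000 would be summarily deleted by such a rule.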
A consequence of deleting outliers is that the data no longer represent a random sample. Given that the discordant observation is, almost by definition, extreme in one characteristic or another, deleting the outliers is a form of censorship of the data. In extreme cases, where multiple outliers are deleted from a data set, deleting the outliers is tantamount to fitting the data to the calibration model, rather than a model to the data. If more than two outliers appear to need deletion, it is more appropriate to reconsider whether they are actually outliers at all, or evidence that samples from a different population have been mixed in with the majority of the data.
A careful statistician will never simply delete observations and leave it at that, even when there is good reason to remove them. When some observations must be deleted from a data set, a careful worker will keep a record of the deleted samples: which ones they were, their values, and the rationale for deleting them. This record will allow the purported outlier to be added back into the data set, if it turns out that the act of deleting it was erroneous. A set of good recommendations is available in short articles by Tom Fearn (7,8), describing common sense methods of evaluating suspected outliers and making good decisions about removing them from a data set.
For example, manually entered data is notorious for being rife with data entry and data transcription errors. Modern spectroscopic instruments, even low-end ones, generally transfer data to the analysis computer electronically, so this sort of error should not occur. Reference laboratory values, on the other hand, may well be subject to being manually copied, or entered into a database, or transferred to the analysis computer by manual entry, possibly through multiple steps involving manual handling. Typos and other types of errors can occur under these conditions.
Data transformations are common in spectroscopic calibration practice. This is often done in an ad hoc manner, using the shotgun method of trying every available or conceivable data transform until one of them works in some manner. While we can't completely denigrate this approach (if for no other reason than that we've used it ourselves on occasion in the past), we do wish to discourage it, and thereby introduce more science into what is currently much more of an art of calibration. Therefore when we discuss a technique, or a method, or an algorithm, we do so with the expectation (or at least the hope!) that it will be used only under the specific conditions for which it is suited, to achieve a specific predefined result.
We noted in a prior column (9) that one reason to transform data is to convert data having a log-normal distribution into data having a normal distribution, by taking the logarithm of the data. In chemometric practice, this would typically correspond to the reference laboratory values. Another scenario warranting a transformation is when the error of the reference laboratory measurement is not constant across the range of the analyte, but increases in proportion to the analyte value; that is, the "relative error" (rather than the absolute error) of the reference method is constant, as happens in some reference laboratory methods. In this case, also, the reference values should be replaced with their logarithms; the errors of those logarithmic values will then be constant. When the calibration calculations are performed, the logarithms are used as the dependent variables, instead of the original reference values.
There is nothing wrong with transforming the reference laboratory values to their logarithms, performing the calibrations for these new (reference laboratory) values, and creating a calibration model for those new reference laboratory values. In using the model for analyzing future unknown samples, the computation will result in the transformed values being predicted by the model. The user must then remember to back-transform the result into the original domain of the data. The user must also take care to ensure that the data transform did not introduce an unwanted nonlinearity into the data.
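This workflow can be sketched as follows. The reference values and instrument signal below are fabricated for illustration (the signal is contrived to track the logarithm of the analyte value), and the model is a simple univariate least-squares line rather than a full chemometric calibration:

```python
import math

# Hypothetical data: reference analyte values whose relative error is
# constant, and an instrument signal contrived to track log(analyte).
reference = [2.0, 4.0, 8.0, 16.0, 32.0]
signal = [1.1, 2.05, 2.95, 4.1, 4.98]

# calibrate against the logarithms of the reference values
y = [math.log(v) for v in reference]

n = len(signal)
mx = sum(signal) / n
my = sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(signal, y))
         / sum((xi - mx) ** 2 for xi in signal))
intercept = my - slope * mx

# predicting an unknown gives a value in the log domain;
# remember to back-transform the prediction with exp()
unknown_signal = 3.5
predicted = math.exp(intercept + slope * unknown_signal)
print(f"predicted analyte value: {predicted:.2f}")
```

The essential point is the last step: the model predicts log(analyte), and forgetting the `exp()` back-transform would report results in the wrong domain entirely.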
Another approach extends standard good statistical practice: collect data from multiple aliquots (at least two, although three is preferred, and may be required in some cases, as will be described). If one of the multiple readings is found to be an outlier, then it should be compared with the other readings. If the readings differ by an amount comparable to the magnitude of the outlier, then it seems clear that one of the readings is incorrect, although it may not be obvious which one is wrong. If data from three or more replicates were collected, however, then a "vote" can be taken: the two that agree are most likely correct, and the one that disagrees, especially if it is the discordant reading, is probably wrong. It may not be clear why that reading is different, but this may be a case where a type of root cause analysis, as described above, is warranted. If none of the three (or more) readings differs noticeably from the others, then it might be appropriate to average all the like readings together before performing the calibration calculations. The data may also be "winsorized" by averaging the highest and lowest values with their respective nearest neighbors, as a way to reduce the influence of a possible outlier. This may be ineffective if the data are multivariate, because the outlier may not correspond to any individual extreme reading.
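The three-replicate "vote" can be sketched as below. The tolerance and readings are hypothetical; in practice, the tolerance would be based on the known repeatability of the measurement:

```python
def vote(replicates, tol):
    """Hypothetical three-replicate 'vote': if one reading disagrees with
    the two that agree (to within tol), discard it and average the
    agreeing pair; otherwise average all three readings."""
    a, b, c = sorted(replicates)
    if (b - a) <= tol < (c - b):
        return (a + b) / 2        # highest reading is the discordant one
    if (c - b) <= tol < (b - a):
        return (b + c) / 2        # lowest reading is the discordant one
    return (a + b + c) / 3        # no clear disagreement: average all

print(round(vote([10.1, 10.2, 14.7], tol=0.5), 3))  # -> 10.15
print(round(vote([10.1, 10.2, 10.3], tol=0.5), 3))  # -> 10.2
```

In the first call the discordant high reading is outvoted by the agreeing pair; in the second, no reading disagrees, so all three are averaged.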
Accommodation of an outlier is not necessarily a way to identify it. It is often a technique for reducing or eliminating the effect of a possible outlier on the results of the calculations, whether any outliers are present or not.
There are many methods of accommodating outliers. One that is rarely used but probably should be used more commonly, is to use what are called robust methods of analysis. These robust methods replace our usual methods of calculation with methods that produce essentially the same results, but are resistant to the effects of outliers. The simplest example of a robust method is the use of the median instead of the mean, as the measure of the central value of a set of data. Given that the value of the median depends almost entirely on the numbers at or near the center of the data, the values at the extremes are virtually immaterial. The largest and smallest values can be correct or off by a large amount, but that difference has no influence at all on the value of the median. Contrast that with the mean, where the influence of the extreme values increases as the square of the difference between the mean and the values themselves, so that the incorrect extreme value has undue influence on the results. There are robust equivalents to most of the common chemometric methods that are popular, including multiple linear regression (MLR) and principal component regression (PCR).
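A toy demonstration of this resistance, with invented readings, shows the contrast directly: one wildly wrong extreme value drags the mean far from the center of the data but leaves the median untouched.

```python
from statistics import mean, median

# Invented readings: the corrupted set has its last value off by a
# factor of ten, as from a misplaced decimal point.
readings  = [9.8, 10.0, 10.1, 10.2, 10.4]
corrupted = [9.8, 10.0, 10.1, 10.2, 104.0]

# the median is unchanged, while the mean is dragged far off-center
print(mean(readings), median(readings))
print(mean(corrupted), median(corrupted))
```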
A somewhat less extreme way of accommodating the (potential) outliers is to use weighted calculations. This approach permits the use of the standard algorithms and methods of analysis. The key here is the fact that outliers are, in general, the extreme observations of some characteristic of the data. This being the case, the weighted variations of the standard algorithms are used, with the weights corresponding to the degree of extremeness of the pertinent characteristic: the more extreme the value of the characteristic, the lower the weight assigned to the sample containing that extreme value. This approach has the advantages of retaining and using all the data, while still minimizing the effect of the outlier on the results. Equations for weighted as well as unweighted calibration algorithms for implementing MLR, PCR, and partial least squares (PLS) algorithms are described in ASTM standard E1655-05 (10).
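The general idea can be sketched as below. The weighting scheme is a simple illustrative choice of ours, not the ASTM E1655-05 formulation: samples whose residuals from a preliminary unweighted fit are extreme receive lower weights, and the model is then refit with those weights.

```python
# Illustrative weighted least-squares sketch (our own weighting scheme,
# not the ASTM E1655-05 equations), on hypothetical univariate data.
def fit(x, y, w):
    """Weighted least-squares slope and intercept for the line y = b0 + b1*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = num / den
    return b1, my - b1 * mx

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.1, 2.9, 4.2, 5.0, 12.0]   # hypothetical data; last point discordant

b1, b0 = fit(x, y, [1.0] * len(x))    # preliminary, unweighted fit
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5

# the more extreme the residual, the lower the weight assigned to the sample
w = [1.0 / (1.0 + (r / s) ** 2) for r in residuals]
b1w, b0w = fit(x, y, w)
print(f"unweighted slope: {b1:.2f}, weighted slope: {b1w:.2f}")
```

All six samples remain in the calculation, but the discordant point's pull on the slope is reduced rather than eliminated outright, which is exactly the advantage the text describes.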
As noted at the beginning of this column, deletion is the way most spectroscopists know to deal with an outlier. As we have seen, however, a number of alternative methods are available for taking outlier readings into account in spectroscopic analysis. In much calibration work, an excess of enthusiasm is shown in deleting purported or merely suspected outliers; we hope that having these alternatives available will temper that overenthusiasm and help preserve data sets.
(1) H. Mark and J. Workman, Spectroscopy 33(2), 24–38 (2018).
(2) L. Krook, Nova, February 27, 2001 (http://www.pbs.org/wgbh/nova/body/accidental-discoveries.html, accessed September 14, 2018).
(3) "The History of Teflon," https://www.chemours.com/Teflon/en_US/products/history.html.
(4) APS News, 11(7), July 2002. https://www.aps.org/publications/apsnews/200207/history.cfm.
(5) H. Mark and J. Workman, Spectroscopy 32(6), 22–25 (2017).
(6) H. Mark and J. Workman, Spectroscopy 28(2), 24–37 (2013).
(7) T. Fearn, NIR News 27(5), 25 (2016).
(8) T. Fearn, NIR News 27(6), 24–25 (2016).
(9) H. Mark and J. Workman, Spectroscopy 33(6), 22–26 (2018).
(10) ASTM E1655-05, Standard Practices for Infrared Multivariate Quantitative Analysis (American Society for Testing and Materials, West Conshohocken, Pennsylvania, 2005).
Jerome Workman Jr. serves on the Editorial Advisory Board of Spectroscopy and is the Senior Technical Editor of LCGC and Spectroscopy. He is also an adjunct professor at National University in La Jolla, California. He has previously held a number of senior R&D positions within instrumentation and biotechnology companies.
Howard Mark serves on the Editorial Advisory Board of Spectroscopy and runs a consulting service, Mark Electronics, in Suffern, New York. Direct correspondence to: SpectroscopyEdit@UBM.com