In our last few columns (1–4), we discussed shortcomings of current methods used to assess the presence of non-linearity in
data, and presented a new method that addresses those shortcomings. This new method is statistically sound, provides an objective
means to determine if non-linearity is present in the relationship between two sets of data, and is inherently suitable for
implementation as a computer program.
Jerome Workman Jr. & Howard Mark
The method shares one shortcoming with virtually all statistical tests: although it provides a means of unambiguously and
objectively determining whether non-linearity is present, it does not address the question of how much non-linearity is
present. This column, therefore, presents results from some computer experiments designed to assess a method of quantifying
the amount of non-linearity present in a data set, assuming that the test for the presence of non-linearity has already been
applied and has found that a measurable, statistically significant degree of non-linearity exists.
The spectroscopic community, and indeed, the chemical community at large, is not the only group of scientists concerned with
these issues. Other scientific disciplines also are concerned with ways to evaluate methods of chemical analysis. Notable
among them are the pharmaceutical and clinical chemistry communities. In those communities, considerations of the sort we
are addressing are even more important, for at least two reasons:
- These disciplines are regulated by governmental agencies, especially the Food and Drug Administration. In fact, it was considerations
of the requirements of a regulatory agency that created the impetus for this series of columns in the first place (1).
- The second reason is what drives the whole effort of ensuring that everything that is done is done "right": an error in an
analytical result can, quite literally, cause illness or even death.
Thus, the clinical chemistry community also has investigated issues such as the linearity of the relationship between test
results and actual chemical composition, and an interesting article provides the impetus for creating a method of assessing
the degree of non-linearity present in the relationship between two sets of data (5).
Degree of Non-linearity
The basis for this calculation of the amount of non-linearity is illustrated in Figure 1. Figure 1a shows a set of data that
is subject to both random error and non-linearity in the relationship between the test results and the actual values, along
with the different ways a straight line and a quadratic polynomial fit those data. If both functions are fit to the data,
then the difference between the values they predict gives a measure of the amount of non-linearity. As shown in Figure 1a,
at any given X-value this difference is simply the difference between the Y-values that the two fitted functions give for
that X-value.
Figure 1b shows that, irrespective of the random error of the data, the difference between the two functions depends only upon
the nature of the functions and can be calculated from the difference between the Y-values corresponding to each X-value.
If there is no non-linearity at all, then the two functions will coincide, and all the differences will be zero. Increasing
amounts of non-linearity will cause increasingly large differences between the values of the two functions corresponding to
each X-value, and these differences can be used to calculate the amount of non-linearity.
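
The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used for the computer experiments in this column: the use of NumPy, the function name fit_differences, and the synthetic data (slope, curvature, noise level, random seed) are all assumptions made for the example. The passage above does not yet say how the differences are combined into a single figure, so the sketch stops at the differences themselves.

```python
import numpy as np


def fit_differences(x, y):
    """Fit a straight line and a quadratic to (x, y) by least squares.

    Returns the quadratic fit minus the linear fit, evaluated at each x.
    (Hypothetical helper name, chosen for this example only.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    linear = np.polynomial.Polynomial.fit(x, y, deg=1)
    quadratic = np.polynomial.Polynomial.fit(x, y, deg=2)
    # Once both functions are fitted, their difference depends only on the
    # two functions, not on the scatter of individual points around them.
    return quadratic(x) - linear(x)


# Synthetic data, assumed purely for illustration.
x = np.linspace(0.0, 10.0, 21)
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.05, size=x.size)

straight = 2.0 * x + 1.0             # perfectly linear relationship
curved = straight - 0.05 * x**2      # same relationship with added curvature

diff_straight = fit_differences(x, straight)
diff_curved = fit_differences(x, curved + noise)

print("largest difference, linear data:", np.max(np.abs(diff_straight)))
print("largest difference, curved data:", np.max(np.abs(diff_curved)))
```

For the perfectly linear data the two fitted functions coincide, so the differences are zero to within floating-point rounding; for the curved data the differences are clearly non-zero and grow as the assumed curvature term is made larger.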