Howard Mark

Howard Mark serves on the Editorial Advisory Board of Spectroscopy and runs a consulting service, Mark Electronics, which provides assistance, training, and consultation in near-IR spectroscopy, as well as custom hardware and software design and development.

This column addresses the issue of degrees of freedom (df) for regression models. Using the larger df (e.g., n or n-1) underestimates the standard error, while the smaller df (e.g., n-k-1) may overestimate it. It seems one should use the same df for both the standard error of estimate (SEE) and the standard error of cross-validation (SECV), but what is a clear statistical explanation for selecting the appropriate df? It is a good time to raise this question once again, because there appears to be some confusion, even among experts, about the appropriate df for the various calibration and prediction situations. The standard error parameters should be comparable, and they depend on the number of independent samples, the number of data channels containing information (i.e., wavelengths or wavenumbers), and the number of factors or terms in the regression. By convention, everyone could simply agree on a definition, but is there a more correct one that should be verified and discussed for each case? The real problem with this subject is computing the standard deviation with different df, without a rigorous justification, and then overemphasizing the particular numbers obtained for SEE and SECV rather than relying on properly computed confidence intervals. Note that confidence limit computations for the standard error have been discussed previously and are routinely derived in standard statistical texts (4).
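As a minimal sketch (not from the column itself), the following Python snippet illustrates the point numerically: for the same set of residuals, dividing by a larger df (n or n-1) yields a smaller standard error than dividing by n-k-1. The residuals and the sample sizes here are hypothetical, chosen only for illustration.

```python
import numpy as np

def standard_error(residuals, df):
    """Square root of the residual sum of squares divided by df."""
    return np.sqrt(np.sum(np.asarray(residuals) ** 2) / df)

# Hypothetical residuals from a calibration with n = 10 samples, k = 3 terms
rng = np.random.default_rng(0)
residuals = rng.normal(scale=0.5, size=10)
n, k = len(residuals), 3

# The same residuals give different standard errors under each df convention
for label, df in [("n", n), ("n-1", n - 1), ("n-k-1", n - k - 1)]:
    print(f"df = {label:6s} ({df:2d}): SE = {standard_error(residuals, df):.4f}")
```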

We present the first of a short set of columns dealing with the subject of statistics. This series is organized as a “top-down” view of the subject, as opposed to the usual approach in the literature (and in our own previous columns) of giving a “bottom-up” description of the multitude of equations that are encountered. We hope this different approach will give our readers a more coherent view of the subject, as well as persuade them to undertake further study of the field.

The archnemesis of calibration modeling, and of the routine use of multivariate models for quantitative analysis in spectroscopy, is the confounded bias or slope adjustment that must be continually implemented to maintain calibration prediction accuracy over time. A perfectly developed calibration model that predicted well on day one suddenly has to be bias-adjusted on a regular basis to pass a simple bias test when predicted values are compared to reference values at a later date. Why does this problem continue to plague researchers and users of chemometrics and spectroscopy?
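As a hypothetical illustration (not the column's own method), the sketch below shows one common form of this correction in practice: regressing reference values on predicted values and using the fitted slope and intercept to adjust subsequent predictions. The monitoring data are invented for the example.

```python
import numpy as np

def slope_bias_correction(predicted, reference):
    """Fit reference = slope * predicted + bias by least squares and
    return a function that applies the correction to new predictions."""
    slope, bias = np.polyfit(predicted, reference, 1)
    return lambda y_hat: slope * np.asarray(y_hat) + bias

# Hypothetical monitoring data: predictions have drifted from the reference
predicted = np.array([10.2, 11.1, 12.3, 13.0, 14.2])
reference = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

correct = slope_bias_correction(predicted, reference)
print(correct(predicted))  # corrected predictions, closer to the reference values
```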