May 20, 2026

Development of an Algorithm That Minimizes/Eliminates Sampling Repack Variation


Key Takeaways

  • Repack/particle-size effects create systematic reflectance changes between nominally identical sub-samples, even after grinding, confounding NIR quantitative models and limiting repeatability for heterogeneous natural products.
  • Common “fixes” like derivatives may reduce SEP yet cannot be assumed to reduce repack variance, because SEP conflates reference error, noise, environment, optics, and sample presentation effects.

In their milestone 250th column, Howard Mark and Jerome Workman, Jr. describe a mathematically rigorous algorithm that minimizes or eliminates sampling repack variation in near-infrared spectroscopy. The method separates systematic spectral changes caused by sample rearrangement from true compositional information, enabling more robust calibration models and significantly improving analytical repeatability for powdered and heterogeneous solid samples.

In this year of America's 250th anniversary, we publish our 250th column, after more than 40 years of writing on statistics, chemometrics, and artificial intelligence (AI). As an analytical method, near-infrared (NIR) spectroscopy has suffered from the inherently random nature of powdered solid samples. This randomness gives rise to differences in the optical readings when multiple sub-samples are measured, which in turn results in differences in the analytical answers produced by the measurements. Neither physical theories of fundamental optics nor optical theories based on the diffusion of light have been able to adequately explain the phenomenon, although ad hoc methods have enabled the technology to enjoy commercial success. Using a well-founded mathematical approach, we have developed an algorithm for creating calibration models that include mathematically justified corrections for the systematic spectral variations caused by repacking.

Introduction - Our 250th Column!

Ever since Karl Norris invented the analytical technology that we now know as “NIR analysis” (the exact dates are unknown, but are contemporaneous with or precede some of the referenced documents), the technique has been plagued by an effect variously known as the “repack effect” or the “particle size effect.” These labels describe the phenomenon wherein powdered solid samples do not present the same surfaces to the measurement device when a sample is removed from the sample holder and then replaced, resulting in systematic yet uncontrollable changes in the reflected radiation.1-5 Figure 1 illustrates the phenomenon using spectra of two repacks from each of 31 ground samples of hard red wheat. Certainly, by 1976, when Howard joined the NIR development group at Technicon Instrument Corp., the phenomenon was already considered a serious roadblock: it was not understood how the effect changed an instrument's ability to repeat a set of optical measurements, and thereby obtain the same analytical answer for the composition of a sample, when the same sample is measured multiple times.

In fact, at Technicon Instrument Corp. (one of the first companies to commercialize Karl’s technology), the problem posed by this effect was described, along with a summary of the then-current methods available to try to mitigate it: averaging multiple repacks, spectral derivatives (dA/dλ), and others. These discussions ended with a wish list: "Ideally we would like to have a calibration algorithm that would be insensitive to the spectral changes caused by repack effects but still would be sensitive to the changes in the spectrum due to real changes in the sample composition." At that time, and to this day, no such algorithm was known, despite attempts to apply more complex calibration methods such as principal component regression (PCR) or partial least squares (PLS).

The phenomenon of diffuse reflection had been studied for a long time in other contexts (paints, inks, dyes), with references traceable back to Schuster’s work in 1905.2,3 Several books have been written attempting to relate the optical, chemical, and physical properties of a material to the reflectance of a powdered sample,6,7 and, most recently, a good review of the previous work has been compiled.8

Simultaneously, the science of Statistics morphed into a more advanced version of itself, now called Chemometrics,9,10 which tried to address the problem in another way: by empirically using the measured optical properties of a sample to ascertain the composition of that sample. Both approaches enjoyed empirical success in relating the optical properties of powdered (or otherwise inhomogeneous) samples to their chemical composition. Several compendia of empirical studies are available,11-13 and this empirical success made the underlying technology commercially successful. That success, however, is only partial, because the science behind it all contains many gaps, leaving our knowledge highly incomplete.

One of the biggest gaps, and the one this article is concerned with, is the question of describing repack variability well enough to enable a user to remove the effect, so as to reliably calibrate an instrument to measure subsamples, or even to re-measure the same sample multiple times and get the same final answer. The references already mentioned describe several approaches that can be traced back to the pioneering theoretical work of Schuster.3 These include theories that treat the samples as a continuum (with various properties), theories that treat the samples as a collection of discrete particles (also with various properties), as points, or as one-dimensional or two-dimensional particles, and innumerable other variations. All of them “work” to some degree for the sample type they were designed for, but none seems to generalize to describe any randomly chosen type of sample. The most recent attempt to develop a method on this basis, called the “Representative Layer” theory by the father-and-son team who developed it, is described in their book14 as well as in several refereed publications.15-24 Additionally, another scientist had unknowingly been working in parallel on a similar, although different, approach.25 Thomas Meyerhofer was attempting to relate fundamental physical properties (refractive index, polarizability) of clear solutions to the measurable optical properties (mainly absorbance) that have had enormous commercial impact.

Independently, although more recently than those studies, following Karl Norris’ development of multi-wavelength NIR measurements to determine important constituents of wheat and other natural products, there has been a surge of interest in combining optical measurements with sophisticated mathematical algorithms to unravel the complexities of natural products and relate their underlying properties to their composition. The Chemometrics community, and the discipline it fostered, developed independently of the problems it sought to solve; like its sister disciplines (for example, archaeometrics, biometrics, econometrics, sabermetrics, sensometrics, stylometrics, sociometrics, and technometrics), it learned from and applied similar algorithms to different sets of problems. Each discipline, of course, found similar algorithms to be useful for its own problem types.

An important (arguably the most important) application of chemometrics is the measurement of the composition and other properties of natural products, especially foodstuffs (more people eat than do anything else!). Many of these products require the sample (virtually always a solid) to exist, or be prepared for measurement, largely in the form of a powder (or otherwise comminuted). Such materials include foodstuffs (grains, rice, meat, eggs, fruits), non-food agricultural products (for example, cotton, tobacco, wood), and minerals, soils, and other substances of geological interest. Not being liquids, these materials are for the most part heterogeneous, which makes them very different from the types of samples that classical chemistry (and chemists) are used to dealing with. Indeed, prior to the advent of spectroscopic instrumentation, classical chemical analysis consisted of treating the sample in a way that prepared the component of interest as (or in) a liquid, which could then be easily and reliably sub-sampled. To deal with the inherent heterogeneity of the materials, an early (if not the first) step in preparing the sample for analysis is to grind it. Grinding makes the material more homogeneous, so that sub-samples that are nearly all the same can easily be selected.

Figure 1. Ground wheat repacks—spectra from 31 calibration samples, two repacks per sample, of ground hard red wheat.

Grinding

The advantages of grinding a sample also apply when optical methods of measurement (for example, spectroscopic methods) are used, and additional advantages accrue to grinding solid samples for spectroscopic measurements:

  • A (presumably representative) part of a large original sample can be fit into a small sample cell
  • The grinding process helps to distribute the various components of the sample uniformly throughout the sub-sample that is presented to the instrument
  • Proportions of components of the original sample are retained and distributed throughout the subsample
  • The ground sub-sample can be spread out evenly for the optical measurement
  • We expect that all parts of the ground sample are “the same” so that all subsamples are “the same”
  • Grinding samples to a consistent particle size will help reduce the reading-to-reading variability of the measured spectrum.

Even with all those benefits, however, a difficulty arises: when subsamples are taken from the ground sample and their spectra are measured, the spectra of the subsamples do not duplicate each other. Figure 1 shows an example of this phenomenon. For other examples in the literature, see the figure on page 295 of reference 26, page 211 of reference 27, or page 285 of reference 13. Each of these examples shows multiple spectra of various sub-samples of a nominally well-mixed, powdered material, and the differences between readings. Indeed, even re-scanning the same subsample generally gives a spectrum different from the one first measured. Visual inspection of spectral plots of multiple samples, such as those in Figure 1, has led to the belief that repacking shifts the spectrum along the Y-axis (ordinate axis) and also changes the amplitude of the spectrum (again along the Y-axis).
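The shift-plus-amplitude picture just described can be written as a simple model, A' = a + m·A, with an additive offset a and a multiplicative gain m applied at every wavelength. The sketch below is a hypothetical illustration of that model only (the function names, the offset and gain values, and the absorbances are ours, not measured data); it shows that under this model the repack difference ΔA is not constant across wavelengths, because the gain term scales with the absorbance at each wavelength.

```python
def simulate_repack(spectrum, offset, gain):
    """Hypothetical repack model: A' = offset + gain * A at every wavelength."""
    return [offset + gain * a for a in spectrum]


def repack_difference(first, second):
    """Wavelength-by-wavelength difference between two readings (second - first)."""
    return [s - f for f, s in zip(first, second)]


# Made-up absorbances at four wavelengths (illustration only).
spectrum = [0.20, 0.35, 0.50, 0.30]
repacked = simulate_repack(spectrum, offset=0.02, gain=1.10)
delta = repack_difference(spectrum, repacked)

# Under this model delta[i] = 0.02 + 0.10 * spectrum[i]: a pure offset would give
# the same delta everywhere, but the gain term makes delta track the absorbance.
print([round(d, 3) for d in delta])
```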

The effect is sometimes called the “particle size effect” because larger variations in the measured spectra occur in samples ground to larger particle sizes; however, the variation is seen even when a sample is ground to a given nominal particle size and the spectra of multiple repacks are measured.10,13

In the context of attempting to calibrate spectrometers to analyze samples for constituents of interest, these extraneous variations of the signal due to repack are enormously distracting; they prevent the analyst from knowing what phenomenon is creating errors, and more importantly, what is needed to improve the analytical results.

Since studies based on fundamental physics failed to indicate how to improve the analytical process, and since chemometric methods were central to the concepts on which the technology was based, the community turned to empirical approaches based on chemometric calculations. Of necessity, the scientists of those days attacked the problem empirically; the mantra “math is easier (and cheaper!) than physics” (especially when the math can be implemented on a computer) led those well-meaning workers into something of a trap.

The Appendix to this paper presents a summary by ChatGPT of the current state of knowledge for methods to deal with the variability due to the repack effect.

Repack

Interestingly, while we were compiling this paper, differences came to light in how the scientists performing NIR analysis handled their samples. This brought the realization that "repack" is not a well-defined term scientifically. Everyone has a vague notion of what it means, but implementations can vary widely. Here we describe some of the methods used to create "repacks." (Presumably the "sample," i.e., the material presented to the spectrometer for measurement of its spectrum, has been ground, well mixed, and loaded into the sample holder, often referred to as a "cup," for insertion into the instrument.) While the techniques differed between scientists, each person was consistent in handling their own samples. We don't expect these variations to appreciably affect the results we obtained, although they might serve as a good starting point for future investigations into the meaning of "repack," and for the community to form a consensus on the proper way to create and define "repacks."

Repack Variations:

  1. After measuring the spectrum of a powdered sample, the material is removed from the sample holder and then the same material is reloaded into the sample holder (cup), possibly with a little extra material to make up for potential losses in handling. Often the cup is designed so that when closed it compresses the sample by a reproducible amount.
  2. Similar to point 1), except that when the sample is removed from the cup it is poured into a larger container where it is mixed with a larger amount of the same ground sample material. The material for the "repack" is then taken from the larger container. While some material will be re-used, this procedure is tantamount to measuring multiple aliquots, rather than multiple repacks.
  3. Either procedure 1) or 2) above may be used with more than one sample cup. This procedure might add error due to variations between cups (within manufacturing tolerances) to the actual repack variations.
  4. To expedite the measurement procedure, a scientist might pack several sample cups (with the same or different sample material) at one time, then measure their spectra in quick succession. This procedure might also add differences between cups to the actual repack variations.
  5. Other variations of the procedure are possible, particularly when the nature of the sample material affects how it can be handled.

What not to do

The most common methodology reported in the literature was to perform a standard calibration/validation exercise consistent with ASTM recommendations,28 using two or more repacks of each sample, and to apply a data transform that was expected to reduce the effect of repack on the predicted values of the analyte, as judged by the standard error of prediction (SEP). Typically, the transform was a first or second derivative (dA/dλ or d²A/dλ²) of the spectral data, and those values were used as the independent variables of the calibration calculations in place of the original absorbance values. Often a methodology like this does in fact reduce the SEP of the validation. This is a desirable result for an application of NIR analysis, so as a practical matter we can conclude that those data transforms made an important contribution to the success of NIR in determining the desired analytical result.
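What derivative pretreatment does to an additive offset, a sloping baseline, and a change of scale can be sketched in a few lines. This is a minimal illustration under stated assumptions: simple finite differences with unit wavelength spacing stand in for dA/dλ and d²A/dλ² (practical implementations more often use gap or Savitzky-Golay derivatives), and the spectrum and distortions are made up.

```python
def first_difference(spectrum):
    """Finite-difference stand-in for dA/dλ, assuming unit wavelength spacing."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def second_difference(spectrum):
    """Finite-difference stand-in for d2A/dλ2, assuming unit wavelength spacing."""
    return first_difference(first_difference(spectrum))


def max_abs_change(x, y):
    """Largest absolute element-wise difference between two transformed spectra."""
    return max(abs(a - b) for a, b in zip(x, y))


spectrum = [0.20, 0.35, 0.50, 0.30, 0.25]                         # made-up absorbances
offset = [a + 0.05 for a in spectrum]                             # constant shift
sloped = [a + 0.05 + 0.01 * i for i, a in enumerate(spectrum)]    # linear baseline
scaled = [1.10 * a for a in spectrum]                             # change of scale

# First derivative: a constant offset disappears (change is essentially zero).
print(max_abs_change(first_difference(spectrum), first_difference(offset)))
# Second derivative: a linear baseline also disappears (change is essentially zero).
print(max_abs_change(second_difference(spectrum), second_difference(sloped)))
# Neither derivative removes a change of scale (change is clearly nonzero).
print(max_abs_change(first_difference(spectrum), first_difference(scaled)))
```

A constant offset vanishes after the first derivative and a linear baseline after the second, but a change of scale survives both, which is one reason a lower SEP after derivative pretreatment does not by itself show that repack variation has been removed.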

However, as a scientific principle, we must recognize that there are many contributions to the SEP (i.e., to the total disagreement between the reference values and the results from the spectroscopic measurement): errors in the reference values themselves; noise in the instrument signal; variations in the instrument readings due to repack (to be sure! and these may differ depending on how the samples are handled); optical effects depending on the instrument, the sample cup, and the type of measurement (reflection or transmission); and extraneous effects of the environment (for example, temperature and/or humidity fluctuations, and even acoustic noise).

Therefore, despite a plethora of papers in the literature assuming an equivalence, the overall SEP of an analysis is NOT the same as the effect of repack; repack is only one contributor to the total error. Attributing a reduction of SEP to a reduction of the repack effect without any external justification or confirmatory evidence is an exercise in faulty logic. The standard deviation of repack cannot simply be replaced by the SEP, or the SEP used instead of the standard deviation of repack; they measure different things and are sensitive to different error sources. Indeed, to determine whether the repack effect is in fact reduced by any change to a physical factor or any manipulation of the data, it is necessary, at the very least, to directly measure and calculate the contribution of repack variation to the total error. This requires both that the experimental design include the necessary repack measurements and that the corresponding calculations be performed correctly.
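A direct measurement of the repack contribution can be sketched as follows. This is a minimal example under our own assumptions: every validation sample is measured as two repacks, each repack is predicted by the same calibration model, SEP is taken as the root-mean-square difference from the reference values (without bias correction; other conventions exist), and the repack standard deviation is estimated from the duplicate differences d as s = √(Σd²/2n). The function names and numbers are hypothetical.

```python
import math


def sep(predicted, reference):
    """Root-mean-square difference between predicted and reference values.

    Assumption: no bias correction; some conventions subtract the mean bias
    and divide by n - 1 instead.
    """
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def repack_sd(first_pack, second_pack):
    """Standard deviation due to repack, estimated from duplicate packs.

    Uses the duplicate-difference formula s = sqrt(sum(d**2) / (2 * n)),
    which involves no reference values at all.
    """
    diffs = [a - b for a, b in zip(first_pack, second_pack)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


# Hypothetical constituent predictions (%) for four samples, two repacks each.
reference = [12.1, 13.4, 11.8, 14.0]
pack_1 = [12.4, 13.1, 12.0, 14.3]
pack_2 = [12.2, 13.3, 11.7, 14.1]

print("SEP (all readings):", round(sep(pack_1 + pack_2, reference * 2), 3))
print("Repack SD:", round(repack_sd(pack_1, pack_2), 3))
```

The repack estimate uses no reference values at all, while the SEP mixes reference error, noise, and every other contribution listed above; a change that lowers one need not lower the other.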

Another approach to reducing the effect of repack on NIR analysis, besides empirical and fundamental physical studies, is to develop and apply a calibration algorithm that generates a model that is immune, or at least resistant, to the effect of repack. For a long time, however, no such algorithm was known. Some of the data transformations applied in the routine use of NIR analysis have been intended to confer that property on the resulting calibration model. As discussed above, however, until now nobody has been able to demonstrate a data transform or an algorithm that directly addresses that criterion. Recently, we discovered (or invented; we are not sure which term applies to this unique application of previously known mathematics) an algorithm with that property. It turns out that mathematicians developed the underlying method more than two hundred years ago; it was just never connected to our current problem, whether by mathematicians, chemometricians, chemists, or the NIR community. That method is described in the Theory section of this paper, below.

THEORY - Development of a calibration algorithm to minimize or eliminate systematic repack variation.

The property we would like our model to have is that two readings of a given sample produce the same analytical value, even though the absorbance values change when the sample is repacked. Our calibration theory with this goal is based on a variation of the multiple linear regression (MLR) algorithm. At the time of this writing, it is unknown whether a similar variation can be applied to calibrations based on the PCR or PLS algorithms; here is a good project for the Chemometrics community to tackle!

Our statement above that "no such algorithm was known" is, strictly speaking, not quite correct. A more accurate statement would be "no such algorithm was known to the NIR or Chemometrics communities." Mathematicians have known about a relevant algorithm for more than 200 years, but nobody thought to apply it to this situation.

Our algorithm is based on the work of the mathematician Joseph Louis Lagrange (January 25, 1736 – April 10, 1813), who was already considered one of the finest mathematicians of his day. Arguably his best-known work is on the three-body problem of astronomy, which is commemorated to this day by the “Lagrange points”: positions of equilibrium at which a small body, affected only by gravitational attraction, can remain fixed relative to two larger bodies orbiting each other.

Lagrange is also noted for developing what is now called the Lagrange Method of Undetermined Multipliers, which we apply here; among its other properties, it allows a “constraint” to be imposed on the solution of a mathematical problem. Here we use Lagrange’s method to find the constrained least-squares solution to the problem of relating the measured absorbance values to the independently determined values of a constituent in the samples of interest, according to the standard MLR method.

Constraints

Taylor describes and explains many applications of Lagrange's mathematics.29 Of interest to us here is the application to constrained functions, that is, where the solution to a given problem is required to meet some auxiliary condition, the constraint. As Taylor describes the situation mathematically: if F(x, y, z) describes the function of interest, and the constraint is expressed as G(x, y, z) = 0, then the problem to be solved can be written as:

$$u = F(x, y, z) + \lambda\, G(x, y, z) \qquad [1]$$

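where λ represents an (initially unknown) constant, which should not be confused with the use of λ as the symbol for optical wavelength. The constrained solution is found by setting the partial derivatives of u with respect to x, y, z, and λ equal to zero; the derivative with respect to λ simply reproduces the constraint G(x, y, z) = 0.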
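As a small worked illustration of equation 1 (our own example, not one taken from Taylor), consider minimizing F(x, y) = x² + y² subject to G(x, y) = x + y − 1 = 0. Setting the partial derivatives of u to zero gives three equations in x, y, and λ:

```latex
u = x^2 + y^2 + \lambda\,(x + y - 1)

\frac{\partial u}{\partial x} = 2x + \lambda = 0, \qquad
\frac{\partial u}{\partial y} = 2y + \lambda = 0, \qquad
\frac{\partial u}{\partial \lambda} = x + y - 1 = 0

\Rightarrow\; x = y = -\frac{\lambda}{2}, \qquad x + y = -\lambda = 1
\;\Rightarrow\; \lambda = -1, \qquad x = y = \tfrac{1}{2}
```

The λ equation returns the constraint, and the value of the multiplier (here −1) is found along with the unknowns; the calibration derivation follows the same pattern, with the regression coefficients in place of x and y.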

In the interest of keeping the expressions simple and preventing them from becoming excessively unwieldy when more wavelengths are under consideration, we limit the number of wavelengths we consider for our presentation here to three. In actual situations, of course, any number of wavelengths may be required, in which case the equations can be expanded in an obvious way. Thus, for sets of two spectra from each given sample (with repack between them) we have the following equations from standard MLR considerations. Mathematically, we start by expressing the fundamental relationship this way:

First reading: $$C = b_0 + b_1A_1 + b_2A_2 + b_3A_3 + \cdots \qquad [2]$$

Second reading: $$C' = b_0 + b_1A'_1 + b_2A'_2 + b_3A'_3 + \cdots \qquad [3]$$

where the primed vs unprimed variables represent the readings from two different repacks of the same sample; the coefficients bi are the same in both equations, since they belong to a single calibration model. Since we want the two concentration readings to be the same, we want the following condition to hold (this is the basis of the Lagrangian constraint):

$$0 = C - C' = (b_0 + b_1A_1 + b_2A_2 + b_3A_3 + \cdots) - (b_0 + b_1A'_1 + b_2A'_2 + b_3A'_3 + \cdots) \qquad [4]$$

When C=C’ and since b0 is a constant, b0-b0=0 and the equation becomes:

$$0 = (b_1A_1 - b_1A'_1) + (b_2A_2 - b_2A'_2) + (b_3A_3 - b_3A'_3) + \cdots \qquad [5]$$

and finally:

$$0 = b_1\,\Delta A_1 + b_2\,\Delta A_2 + b_3\,\Delta A_3 + \cdots \qquad [6]$$

where each ΔAi = Ai − A'i represents the absorbance difference at wavelength i between two repacks. This development is like the one presented in Chapter 12, section 12.4.2.2, of reference (13) and in reference (30). The difference is that in those cases the development is subject to the special condition that the coefficients must sum to zero, which limits the applicability of that algorithm. Here we remove that limitation and consider the situation in its full generality.
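The cancellation of b0 in equations 4 through 6 can be checked numerically. In this small sketch (the coefficients and absorbances are made-up values, and `predict` is a hypothetical helper), C − C' is the same for every intercept and equals the weighted sum Σ biΔAi of equation 6.

```python
def predict(b0, coefficients, absorbances):
    """MLR prediction C = b0 + sum(b_i * A_i)."""
    return b0 + sum(b * a for b, a in zip(coefficients, absorbances))


b = [2.0, -3.0, 1.5]            # hypothetical b1..b3
first = [0.40, 0.55, 0.30]      # A1..A3 from the first repack
second = [0.44, 0.60, 0.32]     # A'1..A'3 from the second repack
delta_a = [a - a_prime for a, a_prime in zip(first, second)]   # ΔAi = Ai - A'i

weighted_sum = sum(bi * d for bi, d in zip(b, delta_a))        # right side of equation 6
for b0 in (0.0, 5.0, 100.0):
    difference = predict(b0, b, first) - predict(b0, b, second)   # C - C'
    # The intercept cancels, so C - C' equals the weighted sum for every b0.
    print(b0, round(difference, 10), round(weighted_sum, 10))
```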

For the purpose of measurement, we consider that the absolute difference |ΔA| can vary from zero to some maximum value in any given dataset. The sign of ΔA in any reading depends on the order in which the two repacks were measured. This, however, turns out to be immaterial: reversing the sign of every ΔAi leaves the condition of equation 6 unchanged, and wherever the differences are squared the result is positive in either case.

Equation 2 represents a standard MLR calibration equation. But now we wish to impose a condition (or “constraint,” in Lagrange’s terminology) on the solution: that the predicted values C for different repacks of the same sample be the same (i.e., ΔC = 0). This condition, expressed in equation 6, is the Lagrange constraint for our situation.

Equation 6 specifies the condition we want to impose on the solution to the problem of calibrating a spectrometer. Since the variations of the absorbance readings due to repack are presumably random and independent of the composition of the samples, and the repack variations at different wavelengths are presumably linear with respect to each other, there are several auxiliary conditions the spectral data need to satisfy:

Repack conditions:

Since multiple repacks of any given sample should not exhibit compositional differences, the variations at different wavelengths due to repack of any given sample are expected to be linear with respect to each other, so that the data should comply with the following auxiliary conditions:

  • For all wavelengths used in a calibration, the variations due to repack at any wavelength are in constant proportion to the corresponding variations at other wavelengths.
  • For multiple repacks of a sample, if the spectra are sorted in order of their absorbances at a given wavelength, that sorting order should be the same for the absorbances at every wavelength. For the purposes of performing chemometrics, we want the spectra to have these characteristics.
  • Differences in spectra from repacks of the same sample should represent only the differences due to the repack. There should be no spectral differences due to compositional differences (or to any extraneous source, such as temperature) between the two spectra. Therefore, we expect that any differences between spectra from a given sample should represent only repack and changes of scale. While some spectra might be larger or smaller than others (for example, because of differences in the compaction of the samples), one aspect of this condition is that, when plotted, the spectra should not cross each other.
  • Changes at one wavelength due to repack should be linearly related to corresponding changes at other wavelengths.
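These conditions can be checked on real repack data before calibration. The sketch below is one possible way to do so, under our own assumptions: the repack spectra of one sample are lists of absorbances at the same wavelengths, "no crossing" is tested as an identical ranking of the repacks at every wavelength, and "constant proportion" is tested by requiring each repack's deviation from the mean spectrum to be parallel (|cosine| within a tolerance of 1) to the largest deviation. The function names, the tolerance, and the data are hypothetical.

```python
import math


def same_order_everywhere(spectra):
    """True if ranking the repacks by absorbance gives the same order at every
    wavelength, i.e., the plotted spectra do not cross."""
    repacks = range(len(spectra))
    first_order = sorted(repacks, key=lambda r: spectra[r][0])
    for i in range(1, len(spectra[0])):
        if sorted(repacks, key=lambda r: spectra[r][i]) != first_order:
            return False
    return True


def _norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def proportional_deviations(spectra, tol=1e-6):
    """True if every repack's deviation from the mean spectrum is (within tol)
    a scalar multiple of one common wavelength profile."""
    n = len(spectra)
    mean = [sum(column) / n for column in zip(*spectra)]
    deviations = [[a - m for a, m in zip(s, mean)] for s in spectra]
    reference = max(deviations, key=_norm)   # largest deviation serves as the profile
    ref_norm = _norm(reference)
    for d in deviations:
        d_norm = _norm(d)
        if d_norm == 0.0:
            continue   # coincides with the mean spectrum; trivially proportional
        cosine = sum(x * y for x, y in zip(d, reference)) / (d_norm * ref_norm)
        if 1.0 - abs(cosine) > tol:
            return False
    return True


# Hypothetical sample: three repacks that differ only along one profile.
base = [0.20, 0.35, 0.50, 0.30]
profile = [0.03, 0.05, 0.08, 0.04]
repacks = [[a + k * p for a, p in zip(base, profile)] for k in (-1.0, 0.0, 1.5)]
print(same_order_everywhere(repacks), proportional_deviations(repacks))
```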

A previous variation of MLR30 used the constraint that Σbi = 0, based on an assumption that ΔA, the difference in absorbance between repacks of the same sample, was the same at all wavelengths, in which case:

$$\Delta C = b_1\,\Delta A_1 + b_2\,\Delta A_2 + b_3\,\Delta A_3 \qquad [7]$$

$$\Delta C = \sum_i b_i\,\Delta A_i \qquad [8]$$

If all the ΔAi share a common value ΔA, equation 8 reduces to ΔC = ΔA·Σbi, which vanishes when Σbi = 0. This is a very strong condition, and from Figure 1 we can see that the assumption is not true in general. Rather, Figure 1 shows that the changes ΔAi at the various wavelengths differ from one wavelength to another but are (at least roughly) proportional to each other.

Based on this relationship, the constraint for the calibration model should be an extension of the constraint for the previous case, in which the various ΔAi are not necessarily the same; setting ΔC = 0 in equation 8 and expanding it, we arrive at our constraint:

$$b_1\,\Delta A_1 + b_2\,\Delta A_2 + b_3\,\Delta A_3 = 0 \qquad [9]$$
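The difference between the sum-to-zero constraint and the constraint of equation 9 can be shown with three wavelengths. In this sketch (all coefficients and ΔA values are hypothetical), coefficients that sum to zero cancel a uniform shift but not a wavelength-dependent ΔA, whereas coefficients chosen to satisfy equation 9 for the actual ΔA profile cancel it.

```python
def delta_c(coefficients, delta_a):
    """Change in the predicted value between repacks, ΔC = Σ b_i ΔA_i (equation 8)."""
    return sum(b * d for b, d in zip(coefficients, delta_a))


uniform_shift = [0.05, 0.05, 0.05]   # same ΔA at every wavelength (the older assumption)
profile_shift = [0.03, 0.05, 0.08]   # ΔA that varies with wavelength

b_sum_zero = [1.0, -2.0, 1.0]        # hypothetical coefficients with Σ b_i = 0
b_profile = [5.0, -3.0, 0.0]         # hypothetical coefficients with Σ b_i ΔA_i = 0 for profile_shift

print(delta_c(b_sum_zero, uniform_shift))   # essentially zero: the uniform shift cancels
print(delta_c(b_sum_zero, profile_shift))   # nonzero (about 0.01): the profile does not
print(delta_c(b_profile, profile_shift))    # essentially zero: equation 9 targets the profile
```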

Having defined the constraint, we now begin our derivation with equation 10 (below), which, as before, is the standard model for an MLR equation; again we limit the model to three wavelengths for simplicity in writing out the equations, as we did when discussing the basic MLR model previously (30):

The true model is $C = b_0 + b_1A_1 + b_2A_2 + b_3A_3$ (equation [2] repeated).

and therefore, from the calibration, the computed model for the jth sample, and the error, are as follows (note that we use the subscript j to designate the sample and the subscript i to designate the wavelength in the calibration model, so that Aj,i is the absorbance of sample j at wavelength i):

$$C_j = b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3} + \text{error} \qquad [10]$$

We rearrange equation 10 and calculate the error, substituting Ej for the error, that is, the difference between the true value of the analyte determined by external means (for example, chemical analysis) and the value calculated from the model:

$$E_j = C_j - (b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3}) \qquad [11]$$

Then, as we did before,30 we form the sum of squared errors to be minimized, recalling that all summations are taken over the samples (indexed by j when explicit, and implied when not).

The sum-squared error is therefore:

$$\sum_j E_j^2 = \sum_j \left(C_j - (b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3})\right)^2 \qquad [12]$$

which, without any constraint, is the precursor to the development of what are termed the Normal Equations in MLR-speak.

Thus, in ordinary MLR, the change in the predicted value for sample j between two repacks is ΔCj = ΣbiΔAi (compare equation 8), where bi represents the calibration coefficient multiplying the absorbance at the ith wavelength.

Now we also need to add the constraint, which, as described above in equation 9 (equation 8 with ΔC = 0), is:

$$b_1\,\Delta A_1 + b_2\,\Delta A_2 + b_3\,\Delta A_3 = 0 \qquad [13]$$

(equation [9] repeated)

In accordance with Lagrange's specifications, we add the constraint of equation 13 to equation 12, after multiplying the constraint by λ, an undetermined multiplier.

$$\sum_j E_j^2 = \sum_j \left(C_j - (b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3})\right)^2 + \lambda\,(b_1\,\Delta A_1 + b_2\,\Delta A_2 + b_3\,\Delta A_3) \qquad [14]$$

The quantity that λ multiplies must equal zero, so that the additional term does not affect the value of the rest of the equation. In our case, the constraint states that the sum of the coefficients times the changes in the associated variables is already zero, so we need only add the constraint itself to the equations. Here is the equation with the Lagrange multiplier and the constraint (from equation 13, in compressed form) added:

$$\sum_j E_j^2 = \sum_j \left(C_j - (b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3})\right)^2 + \lambda\,[0] \qquad [15]$$

where ΔAi represents the absorbance change at the ith wavelength due to the effect of repack. For now, we simply observe from the repack conditions listed above that the values of ΔA at different wavelengths are in constant proportion to each other, so that any set of measured values of ΔA at the various wavelengths is representative and can be used in the calculations. Other characteristics of the spectra, however, especially the noise, do not necessarily show that behavior. Therefore, to minimize perturbation of the calculations by the noise content of the spectra, the ΔA values used for the calculations should be the set with the largest values among all the samples in the data set.
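Selecting that ΔA set can be sketched as below. The text does not say how "largest" is measured; this sketch assumes the Euclidean norm of each sample's ΔA vector, and the function name and data are hypothetical.

```python
import math


def largest_delta_a(pairs):
    """Return the ΔA vector (first repack minus second) with the largest
    Euclidean norm among all samples' repack pairs.

    Assumption: "largest" means largest Euclidean norm; other measures
    (for example, the maximum absolute ΔA) are equally possible.
    """
    best, best_norm = None, -1.0
    for first, second in pairs:
        delta = [a - b for a, b in zip(first, second)]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm > best_norm:
            best, best_norm = delta, norm
    return best


# Hypothetical repack pairs (absorbances at three wavelengths) for three samples.
pairs = [
    ([0.40, 0.55, 0.30], [0.41, 0.56, 0.30]),   # small repack change
    ([0.38, 0.52, 0.28], [0.44, 0.61, 0.32]),   # largest repack change
    ([0.45, 0.60, 0.35], [0.43, 0.58, 0.34]),
]
print([round(d, 3) for d in largest_delta_a(pairs)])
```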

For the time being we will leave the net value [0] of the constraint in the equation, to simplify the expressions. Now expanding the squared term (in multiple steps):

(first step of expansion, applying the rule that (a − b)² = a² − 2ab + b²):

$$\sum_j E_j^2 = \sum_j \left(C_j^2 - 2C_j\,(b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3}) + (b_0 + b_1A_{j,1} + b_2A_{j,2} + b_3A_{j,3})^2\right) + \lambda\,[0] \qquad [16]$$

(Second step of expansion, applying the rule for multiplying two multiple-term expressions to the squared term):

Now, inserting the expanded constraint from equation 14 for [0] instead of just its net value, and reminding ourselves that ΔAi represents the change of absorbance at the ith wavelength due to the effect of repack, we obtain:

and upon distributing the summations:

Collecting like terms:

The next step is to minimize the error by the standard procedure of taking derivatives and setting them to zero. So now we will take the derivatives of Equation 20 with respect to each of the bi and with respect to λ. This step is very intricate and generates many terms in each of the equations formed (many of which are zero, fortunately). Note that at this stage we retain all the zeros generated by taking the derivative of a constant or an uninvolved variable; this helps us to keep track of our location in the derivation. These zeros will be dealt with in a later step. (We sometimes wonder how the mathematicians of old kept track of all this without computers.)

Taking derivatives with respect to five different variables results in five parallel threads of the derivation, which we label with a letter in the equation number. Note that in the following subsets of equations (25a...25e, 26a...26e, etc.), each subset presents the development (i.e., expanding the parentheses, separating parenthesized terms, replacing the zeros, etc.) of one of the derivatives of equation 20. Each term can be traced back to the corresponding term in equation 20. The first step is to express the derivative with respect to each variable:

Upon deleting superfluous zeros:

Setting the derivatives equal to zero:

Divide each equation through by 2 (note that while 0/2 still equals 0 we retain the expanded form for consistency with the rest of the equations):

Add the constituent-containing term (ΣCjAj,i in the equation for bi, and ΣCj in the equation for b0) to both sides of the corresponding equation, thereby isolating the constituent terms:

The equations can now be rearranged thusly, to isolate the constituent terms:

We now have 5 equations in 5 unknowns: the four regression coefficients (b0 through b3) and the multiplier λ. If we compare these equations with the Normal Equations for ordinary MLR, we see that equations 26 are of the same form as the Normal Equations, with some revisions:

  1. The presence of a term including λ on the right-hand side (RHS) of each equation. λ is an additional variable to solve for. This is necessary: adding the constraint to the Normal equations created an extra equation, so we need an extra variable in order to have a set of equations that is solvable algebraically.
  2. An extra equation specifying the constraint. Because all five equations must be satisfied simultaneously, this added equation ensures that the coefficients not only minimize the sum of squared errors but also have the property that their weighted sum (weighted by the variability of the corresponding spectral data) equals zero.
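The two revisions can also be written in compact matrix form, again under illustrative assumptions. Let X be the n × 4 matrix whose first column is all ones and whose other columns hold the absorbances at the three wavelengths, and let c be the vector of reference values. Let d = (0, ΔA1, ΔA2, ΔA3)T, where the leading zero reflects the assumption that the intercept does not enter the constraint. The revisions then amount to bordering the ordinary Normal-equation matrix with one extra row and one extra column:

$$
\begin{bmatrix} \mathbf{X}^{\mathrm{T}}\mathbf{c} \\ 0 \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}^{\mathrm{T}}\mathbf{X} & \mathbf{d} \\ \mathbf{d}^{\mathrm{T}} & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{b} \\ \lambda \end{bmatrix}
$$

The upper block row is the ordinary Normal equations with the added λd term. The bottom row is the constraint dTb = 0. If X has full column rank and d is not all zeros, this bordered matrix is nonsingular, so the system has a unique solution.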

We're now ready to invoke the magic of a matrix algebra approach to multivariate analysis. This was Lagrange's genius: to develop a method that takes the desired external constraint and brings it inside the matrix, as we will now see when we rewrite equations 26a through 26e in matrix notation:

Factor the RHS of equation 27:

Equation 28 is the full written-out form of the matrix equation we need to solve for b. In shorthand matrix notation, equation 28 becomes:

$$\mathbf{A}^{\mathrm{T}}\mathbf{C} = [\mathbf{A}^{\mathrm{T}}\mathbf{A}]\,[\mathbf{b}\,|\,\lambda] \qquad [29]$$

where [A] is the first matrix on the RHS of equation 28.

In shorthand matrix notation, therefore, equation 29 has the same form as the Normal Equations of MLR, except that the matrices now contain more than the absorbances that the usual Normal Equations contain: they also include λ, along with an extra row and column. Unsurprisingly, therefore, the solution (expressed in shorthand matrix notation) is the same as for ordinary MLR:

$$[\mathbf{b}\,|\,\lambda] = [\mathbf{A}^{\mathrm{T}}\mathbf{A}]^{-1}\,(\mathbf{A}^{\mathrm{T}}\mathbf{C}) \qquad [30]$$

The matrix calculations specified here are the same as with ordinary regression; the differences lie in the nature of the matrices that are used. Now, however, the vector of least-squares coefficients [b] additionally has the property that the resulting calibration model is resistant to the effect of repack variation on the absorbances at the various wavelengths.
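Equation 30 is straightforward to evaluate numerically. The following Python/NumPy sketch makes several assumptions of its own. It assumes a model with an intercept plus absorbances at each wavelength. It treats the repack difference spectrum `delta_A` as already known. It assumes the bordered-matrix form in which the intercept is excluded from the constraint. The function name `constrained_mlr` and the synthetic data are hypothetical, for illustration only. It solves the bordered system with `numpy.linalg.solve` rather than forming the explicit inverse, which gives the same answer with better numerical behavior.

```python
import numpy as np


def constrained_mlr(A, c, delta_A):
    """Least-squares MLR subject to sum_i b_i * delta_A_i = 0, via a Lagrange multiplier.

    A       : (n, p) absorbances at p wavelengths for n calibration samples
    c       : (n,) reference values of the constituent
    delta_A : (p,) assumed repack difference at each wavelength (hypothetical input)
    Returns (b, lam): b[0] is the intercept, b[1:] the wavelength coefficients.
    """
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    n, p = A.shape
    X = np.column_stack([np.ones(n), A])             # intercept column + absorbances
    d = np.concatenate([[0.0], delta_A])             # intercept does not enter the constraint

    # Bordered Normal-equation matrix [[X^T X, d], [d^T, 0]]
    M = np.zeros((p + 2, p + 2))
    M[:p + 1, :p + 1] = X.T @ X
    M[:p + 1, p + 1] = d
    M[p + 1, :p + 1] = d

    # Constituent side: X^T c, with 0 for the constraint row
    rhs = np.concatenate([X.T @ c, [0.0]])

    sol = np.linalg.solve(M, rhs)                    # equivalent to inv(M) @ rhs
    return sol[:p + 1], sol[p + 1]


# Hypothetical synthetic data: 30 samples, 3 wavelengths
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3))
c = 3.0 + A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
delta_A = np.array([0.02, 0.015, 0.01])              # hypothetical repack difference spectrum

b, lam = constrained_mlr(A, c, delta_A)
print("coefficients:", b)
print("constraint sum b_i*dA_i:", b[1:] @ delta_A)   # zero to within rounding error
```

The price of the constraint is a residual sum of squares that can be no smaller than that of the unconstrained fit. In exchange, the coefficients no longer respond to the assumed repack difference. How `delta_A` should be estimated in practice lies outside what this sketch assumes.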

From these equations we can see that the difference between the values predicted from two repacks of the same sample can be calculated from the expression

$$D = \sum_i b_i\,\Delta A_i \qquad [31]$$

Ideally this should equal zero.
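Equation 31 is the change in the predicted value when the same sample is repacked. For predictions of the form b0 + Σ biAi, the intercept cancels in the difference, leaving Σ biΔAi. The small NumPy sketch below checks this. It also shows that coefficients satisfying the constraint give D = 0 for that repack difference. The absorbances, coefficients, and the function name `repack_difference` are all hypothetical.

```python
import numpy as np


def repack_difference(b, A_pack1, A_pack2):
    """Equation 31: D = sum_i b_i * dA_i, where dA = A_pack2 - A_pack1.

    b excludes the intercept, which cancels when two predictions are subtracted.
    """
    delta_A = np.asarray(A_pack2, dtype=float) - np.asarray(A_pack1, dtype=float)
    return float(np.dot(b, delta_A))


# Hypothetical absorbances of one sample at three wavelengths, packed twice
A1 = np.array([0.400, 0.550, 0.300])
A2 = np.array([0.420, 0.565, 0.310])          # repack: dA = (0.020, 0.015, 0.010)

b0 = 3.0
b_plain = np.array([1.0, -2.0, 0.5])          # hypothetical unconstrained coefficients
b_constrained = np.array([1.0, 2.0, -5.0])    # hypothetical coefficients with sum b_i*dA_i = 0

D_plain = repack_difference(b_plain, A1, A2)
pred_change = (b0 + b_plain @ A2) - (b0 + b_plain @ A1)   # intercept cancels
D_constrained = repack_difference(b_constrained, A1, A2)

print("D, unconstrained:", D_plain)           # approximately -0.005
print("change in prediction:", pred_change)   # same value as D_plain
print("D, constrained:", D_constrained)       # zero to within rounding error
```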

Historically, investigations into the optical behavior of particulate materials were performed without regard to the relationships between measurements at different wavelengths. This was undoubtedly due largely to the lack of instruments with sufficient stability, precision, and resolution to evaluate the differences between similar setups. With modern instrumentation giving us the ability to collect better data, we expected to be able to determine the causes of some of the differences that arise. We failed to identify any cause beyond a fairly crude assignment of the spectral effects to the overall sample treatment. This leads to the conclusion that the behavior is indeed random, and that the historical approaches of studying the fundamental physics, as Mayerhöfer and coworkers have done (25), are indeed correct.

Future extensions of these studies:

  1. Investigation of how differences in the interpretation of the meaning of "repack" affect the results, with recommendations for a standardized definition and a procedure for determining that characteristic.
  2. Investigation of whether the variations in the implementation of the repack procedure, described above in the "Repack" section of this paper, are promising approaches.
  3. Investigation to confirm that varying concentrations of cinnamon oil in cinnamon sticks are the source of the absorbance bands at ~1600–1800 nm.
  4. Confirmation of the effectiveness of our theory on actual data.
References
  1. Murray, I. Near Infrared Diffuse Reflectance/Transmittance Spectroscopy. In Proceedings of the International NIR/NIT Conference; Akadémiai Kiadó: Budapest, Hungary, 1986; pp 26–27.
  2. Schuster, A. XXII. The Influence of Radiation on the Transmission of Heat. Philos. Mag. 1903, 5, 243–250. https://doi.org/10.1080/14786440309462919
  3. Schuster, A. Radiation Through a Foggy Atmosphere. Astrophys. J. 1905, 21, 1–22. https://doi.org/10.1086/141186
  4. Stokes, G. G. Proc. R. Soc. London 1862, 11, 545–556.
  5. Melamed, N. T. J. Appl. Phys. 1963, 34, 560–570. https://doi.org/10.1063/1.1729309
  6. Kortüm, G. Reflectance Spectroscopy: Principles, Methods, Applications; Springer-Verlag: New York, 1969.
  7. Wendlandt, W. W.; Hecht, H. G. Reflectance Spectroscopy; Interscience Publishers, John Wiley & Sons: New York, 1966.
  8. Dahm, D. J.; Dahm, K. D. Interpreting Diffuse Reflectance and Transmittance: A Theoretical Introduction to Absorption Spectroscopy of Scattering Materials; IM Publications: Chichester, West Sussex, U.K., 2007.
  9. Mark, H. Comparative Study of Calibration Methods for Near-Infrared Reflectance Analysis Using a Nested Experimental Design. Anal. Chem. 1986, 58 (13), 2814–2819. https://doi.org/10.1021/ac00126a051
  10. Mark, H.; Workman, J., Jr. Effect of Repack on Calibrations Produced for Near-Infrared Reflectance Analysis. Anal. Chem. 1986, 58 (7), 1454–1459. https://doi.org/10.1021/ac00298a041
  11. Saranwong, S.; Kasemsumran, S.; Thanapase, W.; Williams, P. Near Infrared Spectroscopy. In Proceedings of the 14th International Conference on Near Infrared Spectroscopy; IM Publications: Chichester, West Sussex, U.K., 2010; p 1208.
  12. Ciurczak, E. W.; Igne, B.; Workman, J., Jr.; Burns, D. A., Eds. Handbook of Near-Infrared Analysis, 4th ed.; CRC Press: Boca Raton, FL, 2021.
  13. Burns, D. A.; Ciurczak, E. W., Eds. Handbook of Near-Infrared Analysis, 2nd ed.; Practical Spectroscopy Series; CRC Press: Boca Raton, FL, 2001.
  14. Dahm, K. D.; Dahm, D. J. Principles of Diffuse Reflectance Spectroscopy. In Handbook of Near-Infrared Analysis, 4th ed.; Ciurczak, E. W.; Igne, B.; Workman, J., Jr.; Burns, D. A., Eds.; CRC Press: Boca Raton, FL, 2021; Chapter 2, pp 17–40.
  15. Dahm, D. J.; Dahm, K. D. J. Near Infrared Spectrosc. 1995, 3, 53–59.
  16. Dahm, D. J.; Dahm, K. D. Appl. Spectrosc. 1999, 53, 647–654.
  17. Dahm, D. J.; Dahm, K. D. J. Near Infrared Spectrosc. 1999, 7, 47–53.
  18. Dahm, D. J.; Dahm, K. D.; Norris, K. H. J. Near Infrared Spectrosc. 2000, 8, 171–181.
  19. Dahm, D. J.; Dahm, K. D. The Physics of Near-Infrared Scattering. In Near-Infrared Technology in the Agricultural and Food Industries; Williams, P.; Norris, K. H., Eds.; AACC International: St. Paul, MN, 2001; pp 1–17.
  20. Dahm, D. J.; Dahm, K. D.; Norris, K. H. J. Near Infrared Spectrosc. 2002, 10, 53–59.
  21. Dahm, D. J.; Dahm, K. D. Discontinuum Theory of Diffuse Reflection. In Handbook of Vibrational Spectroscopy; Chalmers, J. M.; Griffiths, P. R., Eds.; John Wiley & Sons: Chichester, U.K., 2002; Vol. 2, pp 1149–1153.
  22. Dahm, D. J. NIR News 2004, 15 (4), 6–10.
  23. Dahm, D. J. NIR News 2006, 17 (1), 4–7.
  24. Dahm, D. J. NIR News 2006, 17 (2), 12–15.
  25. Mayerhöfer, T. G.; Ilchenko, O.; Kutsyk, A.; Popp, J. Beyond Beer’s Law: Quasi-Ideal Binary Liquid Mixtures. Appl. Spectrosc. 2022, 76 (1), 92–104. https://doi.org/10.1177/00037028211056293
  26. Hildrum, K. I.; Isaksson, T.; Næs, T.; Tandberg, A. Near Infrared Spectroscopy: Bridging the Gap Between Data Analysis and NIR Applications; Ellis Horwood: New York, 1992.
  27. Burns, D. A.; Ciurczak, E. W., Eds. Handbook of Near-Infrared Analysis, 3rd ed.; Practical Spectroscopy Series, Vol. 35; CRC Press: Boca Raton, FL, 2008.
  28. ASTM E1655-05, Standard Practices for Infrared Multivariate Quantitative Analysis; ASTM International: West Conshohocken, PA, 2005.
  29. Taylor, A. E. Advanced Calculus; Ginn and Company: New York, 1955.
  30. Mark, H.; Workman, J., Jr. Spectroscopy 2023, 38 (10), 14–18.

APPENDIX

Ask AI: The conventional wisdom for eliminating the repack effect in NIR, according to ChatGPT

Reference:

https://chat.openai.com/c/41e6889a-1dd5-4e7c-954b-d4bd50b155dd

"User:

How can I eliminate repack effect in near-infrared analysis?

ChatGPT:

Eliminating the repack effect in near-infrared (NIR) analysis can be a challenging task, but there are several strategies you can employ to minimize or mitigate its impact. The repack effect is an undesirable phenomenon in NIR analysis where the distribution of particles in a sample is not uniform, leading to variations in the spectra obtained. Here are some methods to address this issue.

Homogenize the Sample:

Ensure that your sample is well-mixed and homogeneous. Inhomogeneous samples can lead to variations in the NIR spectra. Using mechanical mixing, blending, or other sample preparation techniques can help achieve better homogeneity.

Sample Presentation:

Pay attention to how the sample is presented or loaded into the NIR instrument. Ensure that the sample is uniformly distributed in the sample cup or holder.

Optimal Sample Cup Design:

Select the appropriate sample cup or holder design. Some sample cup designs are better at maintaining sample uniformity than others. Choose one that minimizes the repack effect.

Sample Size:

Consider the size of the sample. Smaller samples may be more prone to the repack effect. Using a larger sample size can help reduce this effect.

Sample Compression:

Some samples can be compressed in the sample cup to reduce the repack effect. However, this method may not be suitable for all sample types, so be cautious when using it.

Multiple Measurements

Take multiple measurements at different locations within the sample to account for potential inhomogeneity. The average of these measurements can provide a more representative spectrum.

Calibration Models

Use robust calibration models. A well-developed calibration model can help correct for variations in the spectra due to sample inhomogeneity. Building a calibration model on a representative dataset is essential.

Data Preprocessing

Employ data preprocessing techniques, such as baseline correction and scatter correction, to remove or reduce unwanted variations in the spectra.

Reference Methods

If possible, use reference methods to validate the accuracy of your NIR analysis. Reference methods can provide a reliable way to cross-check NIR results.

Sample Handling Procedures

Implement standard operating procedures for sample handling to ensure consistency in sample preparation and presentation.

Proper Instrumentation

Ensure that your NIR instrument is well-maintained and properly calibrated. Regular maintenance and calibration can help minimize measurement errors.

Expert Guidance

If the repack effect is a significant concern for your specific application, consider seeking expert guidance or consulting with NIR spectroscopy specialists.

It's important to recognize that the severity of the repack effect can vary depending on the nature of your samples and the specific NIR instrument being used. Experimentation and optimization may be required to find the best approach for your particular application."

We would say that's pretty good for a computer! On the other hand, while ChatGPT did a surprisingly good job of summarizing the existing knowledge regarding improvement of NIR calibrations, its answer illustrates some key limitations of artificial intelligence (AI): it contains no information as to how to implement any of those methods that it summarizes, nor does it contain any hint of the possibility of extending that knowledge with new approaches to the problem (for example, a new calibration algorithm!). Furthermore, its response really addresses the question of optimizing the performance of a calibration (as conventionally measured by SEP) rather than the more focused question of minimizing the effect of repack, which was the question asked. But then, lots of people, when faced with a question they can't answer, will also change the question to one that they CAN answer.