Paper clip calibration in Youth Court
It is sometimes difficult to explain the meaning of "calibration" and "calibration curve" during forensic evidence or argument in Youth Court. I like to use paper clips to explain the multi-step approach to digital instrument calibration or re-calibration.
As you can see in the image above, a "measurement" is a "comparison". In this example we are comparing a pencil - the thing to be measured (the "measurand") - against paper clips. We discover experimentally that 1 pencil equals 9 paper clips.
Think of the paper clips as units of an electrical signal. The measuring system responds to the one pencil in the sample chamber by producing 9 paper clip units of electricity.
Another experimenter might get a result of 8 or 10 paper clips. The original experimenter might get a result of 8, 9, or 10 paper clips running the same experiment 30 seconds or 6 months later.
To improve the accuracy of the measurement we perform many experiments and take an average.
To check precision we run a number of experiments and calculate the standard deviation: we take each result's distance from the average, and the standard deviation summarizes the typical size of those deviations (technically, the square root of the average of the squared deviations).
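To make the averaging and the spread concrete, here is a minimal Python sketch using the standard library's `statistics` module. The readings are invented for illustration, not from any real instrument:

```python
# Hedged sketch: hypothetical repeat "paper clip" readings for the same pencil.
import statistics

readings = [9, 8, 10, 9, 9, 8, 10, 9]  # invented repeat experiments

mean = statistics.mean(readings)    # averaging many runs improves accuracy
stdev = statistics.stdev(readings)  # precision: typical spread around the mean

print(f"average: {mean} paper clips")
print(f"standard deviation: {stdev:.2f} paper clips")
```

A small standard deviation means the repeat results cluster tightly (good precision); by itself it says nothing about whether the average is close to the true value (accuracy).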
To check reliability we run the experiment with different experimenters, and in different locations, and at different times.
Here's a link to a fun video on YouTube that helps me understand the important differences among accuracy, precision, and reliability. Please note that I respectfully disagree, somewhat, with the author of the video with respect to "reliability". The meaning of "reliability", as I see it applied outside the social sciences, is not so much repeatability, but rather repeatability by different experimenters, in different locations, and at different times. It is important for Youth Court lawyers who deal with forensic evidence in criminal cases to understand the differences among these terms and be ready to cross-examine government experts on them. Judges and lawyers often confuse them.
That's just the first step. We have experimentally measured the mass of the pencil in units. However, the units we have chosen, namely paper clips, are not acceptable in the international system of units. Paper clips vary in size, colour, and mass. Paper clips of the same size and colour don't necessarily have the same mass. The paper clips used in Ottawa may not have the same mass as the paper clips used in Toronto. We are violating paragraph 35 of Magna Carta. We are not using a standard unit of measurement. Our measuring system is definitely not suitable for trade in Canada (see Weights and Measures Act, section 4(1)) and should not be considered suitable for forensic purposes either.
Measurement results must be "traceable" to the International System of Units. We accomplish "traceability" of a measurement result (an issue that Youth Court lawyers should always cross-examine police about) by "calibrating" our measuring system.
Please note that "calibration" is NOT the same thing as a "control check", "cal. check", or "accuracy check". Calibration is done at the factory or at a factory authorized service depot. Calibration for a "quantitative analysis" requires multiple calibrators (Motherisk Inquiry Report).
I suggested earlier that paper clips are like the electrical signals that are the instrument's response to the sample in the sample chamber. Whether the instrument is at the factory or whether it is being used to measure in the field, both the sample in the sample chamber and the paper clips have "unknown" mass. They are both "unknowns". We need to tie those "unknowns" to "knowns". If we don't, the measurement is meaningless. A "cal. check" does not tie the measurement result to a "known" - it does not adjust the measurement result - it is only a control test.
"Calibration" always uses "standard reference materials". To calibrate our measuring system we might obtain such reference materials from the National Research Council in Canada or NIST in the United States. Those reference materials are, in turn, directly traceable to the standard kilogram mass that sits in a glass jar at the offices of the BIPM in France.
In "calibration" we need to establish a mathematical relationship between the sample in the sample chamber and the "indication" on the instrument's display. To do that, on older instruments, we use a screwdriver to adjust a variable resistor (a pot) which controls the relationship between the paper-clip electrical signal and the indication. On newer instruments we use software to adjust the electronics to fix the relationship between the paper-clip electrical signal and the indication. That relationship stays fixed in the instrument's hardware or software until the instrument is "re-calibrated". It DOES NOT change every time the police officer runs a control test or a subject test sequence. (The situation may be different with some measuring systems, such as gas chromatography in a forensic lab, where a new calibration curve may be calculated every day it is used.)
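A minimal Python sketch of that idea, with invented numbers: the gain and offset below stand in for the pot setting or software coefficients that are fixed at calibration time and not touched again until re-calibration.

```python
# Hedged sketch, not any instrument's real firmware. Coefficients are invented.
GAIN = 0.5    # grams per paper-clip unit, set once with the "screwdriver"
OFFSET = 0.0  # zero adjustment, also set at calibration

def indication(paper_clip_units):
    """Convert the raw electrical signal to the displayed mass.
    This relationship stays fixed; a control test does NOT change it."""
    return GAIN * paper_clip_units + OFFSET

print(indication(9))  # 9 clip units -> 4.5 g on the display
```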
What happens if "over time" the relationship between unknowns in the sample chamber and the electrical signal (the paper clips) changes due to a failing light bulb, dirt in the sample chamber, or a failing IR light filter? Maybe "over time" 1 pencil starts to equal (on average) 7 or 11 paper clips? The instrument still has a fixed mathematical relationship in its hardware or software between paper clips (the electrical signal) and the digital indication on the display. The calibration relationship, the "calibration curve", hasn't changed, but the instrument's response to a sample in the sample chamber has changed.
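The same point in a short Python sketch (all numbers invented): the stored relationship is unchanged, but the drifted response now produces a wrong indication.

```python
# Hedged illustration of drift. The stored curve is untouched; the response shifts.
GAIN = 0.5  # grams per clip unit, fixed at the last calibration

def displayed_mass(clip_units):
    return GAIN * clip_units  # the relationship "carved in stone"

true_mass = 4.5              # the same one pencil, in grams
response_at_calibration = 9  # clip units the day it was calibrated
response_after_drift = 7     # a dirty chamber now yields fewer clip units

print(displayed_mass(response_at_calibration))  # 4.5 g: matches the true mass
print(displayed_mass(response_after_drift))     # 3.5 g: wrong, yet the curve never changed
```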
However, if you compare the old calibration curve fixed in the machine, with what the calibration curve ought to be, you can think of the curve as shifting up, down, rotating about any axis, or stretching. In that sense the "calibration curve", the correct calibration curve for the instrument if it is to function properly, has changed.
Please note that I am referring to the relationship between the paper clips and the indication as a "calibration curve". It is a "curve" NOT a "line". When you look at the following images it's tempting to think that all relationships in our measuring system are linear as we increase the number of paper clips (i.e. each paper clip corresponds to exactly 0.5 g):
Wow! These images look like linearity.
In reality, when a "calibration curve" is created by tweaking pots with a screwdriver or semi-automatically during an auto calibration sequence, at the factory or authorized maintenance depot, the relationship is much more complex than just a 2:1 relationship. On a graph it is probably a parabola, a quadratic expression.
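To see why the distinction matters, here is a hypothetical comparison in Python between the tempting linear assumption and a quadratic calibration curve. The curvature coefficient is invented purely for illustration:

```python
# Hedged sketch: linear assumption vs. a hypothetical quadratic calibration curve.
def linear(clips):
    return 0.5 * clips  # the tempting "each clip = exactly 0.5 g" assumption

def quadratic(clips):
    # small curvature term: negligible at low signal, material at high signal
    return 0.5 * clips + 0.004 * clips ** 2

for clips in (2, 9, 30):
    print(clips, linear(clips), quadratic(clips))
```

At low signal the two curves are nearly indistinguishable; over the full measuring interval the quadratic term matters.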
1. We have to remember that ultimately we are not measuring paper clips. We are measuring pencils - in units of mass.
2. We have to remember that we are not just measuring the mass of one pencil but rather the
a) mass of many pencils or fractions of pencils (the "measuring interval")
b) over the intended "calibration interval" of the instrument.
The instrument is going to be out in the field for a period of time before it is brought in for re-calibration.
If the relationship between the paper clips (the electrical signal) and the indication is not linear, then most certainly the relationship between the true mass of pencils in the sample chamber and the indication is NOT LINEAR.
It would be so much simpler if we could just assume linear relationships. We can't, even if it is tempting to do so. Don't fall into the trap of assuming linearity. Ask the proponent of the digital evidence to prove it empirically! To do this you will need to see maintenance records and control tests.
Even if the instrument contains software that attempts to create a calibration curve that approximates linearity, you should not assume linearity. The relationship between the paper clips and the indication may be fixed in the software memory, but it may be erroneous as the response of the instrument (the relationship between the sample in the sample chamber and the paper clips) changes "over time". That's long-term drift. Significant drift causes unreliability. (See the Hodgson definition of "reliability" in his article referred to by the SCC in R v St-Onge Lamoureux, cited below.)
Even if a representative instrument was evaluated during "type approval" and written up in a peer-reviewed article that says its measurement results have been found to be linear:
1. YOU CANNOT ASSUME LINEARITY IN ALL INSTRUMENTS OF THE SAME TYPE - each one has a different calibration curve.
2. YOU CANNOT ASSUME LINEARITY OF A PARTICULAR INSTRUMENT OVER TIME - instrument response changes over time, and so the "correct" calibration curve - the one that best approximates linearity - changes over time.
NASA says that there is a phenomenon called "uncertainty growth": uncertainty grows with time since last calibration, and reliability diminishes with time since last calibration. Maybe the ON CA got R. v. Jackson right, on its facts, because it was a new instrument, and maybe the AB CA got R. v. Vallentgoed right because the particular instrument, ON THE EVIDENCE OF RE-CALIBRATIONS DISCLOSED, had received "RE-CALIBRATION" at regular, short intervals. In both cases the instruments were recently calibrated or re-calibrated by the factory or an independent factory-authorized service centre. Hopefully the SCC on the appeal of Vallentgoed will take a look at what NASA has to say about "uncertainty growth" and consider Hodgson's concept of "over time" (see the Hodgson definition of "reliability" in relation to "over time" in the article referred to by the SCC in St-Onge: "The Validity of Evidential Breath Alcohol Testing" (2008), 41 Can. Soc. Forensic Sci. J. 83).
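One common way to model uncertainty growth is to combine the as-calibrated uncertainty with a drift term that grows with time since calibration. This Python sketch uses invented numbers and a quadrature-sum model; both are assumptions for illustration, not NASA's actual figures or method for any particular instrument:

```python
# Hedged sketch of "uncertainty growth" with time since last calibration.
import math

def uncertainty(days_since_calibration, u0=0.05, drift_rate=0.001):
    """u0: uncertainty (g) right after calibration; drift_rate: g per day.
    Combines the two contributions in quadrature (an assumed model)."""
    return math.sqrt(u0 ** 2 + (drift_rate * days_since_calibration) ** 2)

for days in (0, 30, 180, 365):
    print(days, round(uncertainty(days), 3))
```

Whatever the exact model, the direction is the same: the longer the interval since the last calibration, the larger the uncertainty attached to each result.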
The relationship between the paper clips and the indication on the display may be fixed - carved in stone - but the relationship (the instrument's response) between the unknown sample and the paper clips CHANGES OVER TIME. That means that the relationship between the unknown sample and the indication on the digital display CHANGES OVER TIME. If you don't re-calibrate at regular intervals the measuring system becomes unreliable.
Note: In this blog entry I have attempted to use terminology consistent with the VIM: International Vocabulary of Metrology. Please correct me if I have erred. I suggest lawyers and jurists would be wise to refer to this document in any writings about forensic metrology in Canada.