Boolean versus continuous variables as tools of measurement
Jonathan Hippisley
Graduate School of Education
University of Western Australia
Instruments yielding only a Boolean result are seldom used in natural science. A botanist, interested in the growth rate of a certain plant, does not venture out into the garden armed only with an unmarked measuring rod (Figure 1). If he did so, his contribution to science might be rather limited. Suppose he recorded the result that the plant was shorter than the measuring rod with a 0 and the result that the plant was longer than the measuring rod with a 1. His record of plant height over time might look like Table 1 (below).
Table 1: Plant height expressed as a Boolean variable against time
Nor does he trek out with a bundle of unmarked rods, in an attempt to convert a collection of Boolean scores into an arbitrary quantum score. In Figure 2, the plant is shorter than six rods, but taller than three, so we could award the plant a score of 3 out of 9, or approximately 33%. Such a score on its own would be quite without meaning, because nothing is said about the length of any of the rods with respect to each other or anything else. All that can be deduced from the score is that six rods are taller than the plant. Tomorrow, if the plant is taller than five of the rods, has it grown a lot or a little? Maybe two of the rods were just a little taller than the plant on the previous day. Maybe these two rods are themselves identical in length.
Natural scientists usually design their instruments to yield results on a scale which might not quite be continuous, but whose graduations are both regular and small enough to meet the needs of their study. A botanist will not use an unmarked rod to measure the height of his plants on a Boolean scale; nor will he grab a collection of unmarked rods to measure height on an arbitrary and undefined quantum scale. He may mark one of his rods with a regular scale, and because he is lucky enough to be measuring in a dimension which has been the subject of study for thousands of years, he may even choose to mark his rod with a standard scale, shared by other scientists, so that he can share his results with them.
In Figure 3, two children are measured on a Boolean scale by an unmarked rod. Both children are shorter than the rod, yielding a score of zero on the Boolean scale. The children are not identical in height, but they yield an identical height result on the Boolean scale.
In Figure 4 a collection of unmarked rods has been gathered together to create an arbitrary quantum scale. The scale is arbitrary because no attempt has been made to standardise the incremental difference in length between the rods in the collection. There is a big length gap between some of the rods, a small length gap between other rods, and no length gap between others. As it happens, the heights of both these children fall between the same two rods. So although they are not identical in height, on this arbitrary quantum scale, both children score six marks out of a possible nine.
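The behaviour of the arbitrary quantum scale can be illustrated with a minimal sketch. The rod lengths and child heights below are hypothetical values chosen only for illustration (they are not read from Figure 4), and `quantum_score` is a name invented here. The score is simply the number of rods that the child is taller than.

```python
from bisect import bisect_left

def quantum_score(height, rod_lengths):
    """Return the number of rods strictly shorter than the subject."""
    rods = sorted(rod_lengths)
    # bisect_left gives the count of rods whose length is below `height`
    return bisect_left(rods, height)

# Hypothetical rod lengths in cm: big gaps, small gaps, and two identical rods
rods = [60, 75, 80, 95, 100, 118, 140, 140, 170]

# Two hypothetical children of clearly different height
child_a = 120
child_b = 138

print(quantum_score(child_a, rods), "out of", len(rods))
print(quantum_score(child_b, rods), "out of", len(rods))
```

With these assumed values both children fall between the same pair of rods, so an 18 cm difference in height leaves the score unchanged.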
Real-life examiners try to spread the length of their measuring rods over a range broad enough to cover the students in the study. Items with zero or perfect scores are usually excluded from the analysis (Ebel, 1979), and in some (rare) cases attempts are made to place the length of the rods, or more literally, the difficulty of the items in a test, on a standard scale (Rasch, 1960).
Nevertheless, the fact remains that the entire examination, comprising ten, fifty or a hundred items, is reduced to a single measure. Children, unlike plants, do not stand still to be measured on a constant base: they run around and bend over and jump up and down, so any single measure, even of something as easy to define as their height, may not be entirely accurate. In any measurement, even in natural science, there will be errors, and it can be shown mathematically (Sijtsma, 1993) that the scale of the errors is reduced as the number of measurements is increased.
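One common way to express this reduction in error is the standard error of a mean. The sketch below assumes that each of the n measurements equals the true value h plus an independent error with zero mean and the same variance; the symbols are introduced here for illustration, and this is the textbook result rather than necessarily the form given in the cited source.

```latex
x_i = h + e_i, \qquad \mathbb{E}[e_i] = 0, \quad \operatorname{Var}(e_i) = \sigma^2, \quad i = 1, \dots, n

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n}, \qquad
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```

Under these assumptions, quadrupling the number of measurements halves the standard error of the averaged result.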
Using an entire examination to produce a single measure is therefore not a particularly efficient method of measurement. Rather than using an array of Boolean instruments to produce a single score, the botanist has the right idea when he uses a graduated instrument to produce a measure on a continuous scale (or one sufficiently near to continuous for the purpose of his study). He might then repeat each measure after an hour for increased accuracy, or ask an assistant to repeat each measure for him. If he has several assistants he might ask each one of them to record their own measurements of the plant each day.
How would it be in education if we could construct a device to measure each item in a test on a continuous scale? Every item on a test would then be like a whole exam in its own right. A test with ten items would be like ten exams. In theory we should expect a reduction of errors through averaging, or to use a term used widely in psychometrics, we should expect an increase in the reliability of the test.
Figure 6 shows the same conceptual idea applied to a child. The rods are unmarked, and their lengths are unknown, so for the three rods which are taller than the child, the old Boolean score of zero has to be recorded; but for the other six rods, a variable score from the tapes can be read. If the same set of rods is used to measure a collection of children, not one, but many variable scores will be recorded for each child, thereby increasing the reliability with which the heights of the children may be compared.
Figure 7 shows the effect of appending a variable scale to the Boolean items used to construct Figure 5. The tape measure appended to the unmarked rod, or item, was time. For graphical convenience this was converted into a rate by dividing the old Boolean result by the time spent on the item. That yields a zero for incorrect answers, so for this composite measure items well within the ability range of the students are preferred. Figure 7 was constructed by dividing the quantum score for the test by the total time taken. It is shown to contrast the two distribution curves. Figure 5 shows that most of the items are within the ability range of most of the children. Figure 7 shows that while the students are getting most items right, some do so much more quickly than others. Figure 7 distinguishes between Simon, and David and Neil in the example given above; Figure 5 does not.
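The rate calculation described here can be sketched as follows. The function names, the students and the timings are all hypothetical and are not data from the study; times are assumed to be in seconds.

```python
def item_rate(correct, seconds):
    """Boolean item result (0 or 1) divided by the time spent on the item."""
    if seconds <= 0:
        raise ValueError("time spent must be positive")
    return correct / seconds

def overall_rate(results, times):
    """Quantum score for the test divided by the total time taken."""
    return sum(results) / sum(times)

# Hypothetical data: both students answer 4 of 5 items correctly
results = [1, 1, 0, 1, 1]
fast_times = [8, 12, 10, 9, 11]    # 50 seconds in total
slow_times = [20, 25, 15, 18, 22]  # 100 seconds in total

# An incorrect answer gives a rate of zero however long it took
print(item_rate(0, 10))

# Identical quantum scores, different scoring rates
print(sum(results), overall_rate(results, fast_times))
print(sum(results), overall_rate(results, slow_times))
```

In this made-up case the quantum score cannot separate the two students, while the scoring rate of the faster student is twice that of the slower one.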
By combining time with the Boolean item result, a variable measure is derived from every item answered correctly. Since many items have been answered correctly, the statistical rules of averaging should lead us to predict relatively lower errors, or conversely higher reliability. A calculation of Cronbach's alpha confirms this to be the case. Cronbach's alpha for scoring rate for the same children using the same test as used in the previous calculation was 0.96. This is exceptionally high by the standards of most written tests.
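For readers unfamiliar with the statistic, a minimal sketch of Cronbach's alpha follows. It uses the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), with sample variances. The score matrix is hypothetical and is not intended to reproduce the 0.96 reported above; `cronbach_alpha` is a name chosen here.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of item scores per student."""
    k = len(scores[0])                    # number of items
    items = list(zip(*scores))            # transpose: one column per item
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Hypothetical scoring rates (items answered per minute) for 5 students on 4 items
rates = [
    [2.0, 1.8, 2.2, 2.1],
    [1.0, 1.1, 0.9, 1.2],
    [3.1, 2.9, 3.0, 3.3],
    [1.5, 1.4, 1.7, 1.6],
    [2.5, 2.6, 2.4, 2.7],
]

print(round(cronbach_alpha(rates), 3))
```

The same function accepts Boolean item scores (0 or 1), so both kinds of scoring can be compared on the same set of students and items.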
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. University of Chicago Press, Chicago.
Sijtsma, K. (1993). Current trends in theories and assessment of intelligence. In Hamers, J. H. M., Sijtsma, K. and Ruijssenaars, J. M. (Eds), Learning Potential Assessment: Theoretical, methodological and practical issues. Swets & Zeitlinger, Amsterdam N.L.
Please cite as: Hippisley, J. (1997). Boolean versus continuous variables as tools of measurement. Proceedings Western Australian Institute for Educational Research Forum 1997. http://www.waier.org.au/forums/1997/hippisley.html