A comparison of the labeled magnitude (LAM) scale, an 11-point category scale and the traditional 9-point hedonic scale
Introduction
The labeled affective magnitude scale (LAM, Fig. 1) was developed by Schutz and Cardello (2001) as an alternative to the commonly used 9-point category scale for measuring food acceptability (Jones et al., 1955; Peryam & Girardot, 1952; Peryam & Pilgrim, 1957). The LAM scale was an extension of the labeled magnitude scale (LMS) for psychophysical intensity scaling developed by Green, Shaffer, and Gilmore (1993), which in turn built on earlier work by Borg (1982) on a so-called “category–ratio scale”. The LAM scale has recently been used to evaluate consumer liking for teas (Chung & Vickers, 2007a, 2007b), to study genetic factors in sweet taste perception (Keskitalo et al., 2007), and to compare younger and older persons’ liking for different orange juices (Forde & Delahunty, 2004). Jaeger and Cardello (2009) compared the LAM to best–worst scaling and found approximate parity in discrimination. The LMS and LAM scales may have the following properties:
- (1) Because the scales were based on magnitude estimates (ratio scaling instructions) of the meanings of the verbal anchor words, some consider the resulting scale values to represent ratio scale data (Stevens, 1971). If true, one could make valid statements such as “this product was liked twice as much as that one”. The LMS and LAM produce data similar to those from magnitude estimation (Green et al., 1993; Schutz & Cardello, 2001).
- (2) Because the scales had a high end anchor of “strongest imaginable” for the LMS or “greatest imaginable like” (or “dislike”) for the LAM, subjects in these scaling studies might share a similar idea of the intensity of experience suggested by those phrases, and would thus be placed on the same subjective scale. This idea came from Borg (1982), who, in studying perceived effort or exertion, argued that exerting oneself maximally, i.e., to the point of exhaustion, should be a similar experience for different people. This assumption was later challenged (Bartoshuk et al., 2002).
- (3) Because the scales have commonly understood labels (weak, moderate, strong), the data can be interpreted in light of these labels. Magnitude estimation, by contrast, produces scale data based on proportions (e.g., one stimulus is twice as strong as another) but with no absolute anchor for whether these experiences are weak or strong (one could be twice the other, yet both could be weak).
Regarding food acceptability testing, the question arises whether the LAM scale provides any advantages over the commonly used 9-point hedonic scale. An important criterion is whether one scale is better at finding differences among products (see, for example, Lawless & Malone, 1986). In the original set of studies (Schutz & Cardello, 2001), the LAM and 9-point scales performed very similarly in this regard. Two direct comparisons were conducted, one involving 51 food names and one involving five foods that were actually tasted. Correlations between the mean values obtained on the two scales were +0.99 for the 51 food names and +0.98 for the tasted foods. Statistical differentiation was almost equivalent in both cases. Analysis of variance for the tasted foods showed that food differences accounted for 27.6% of the variance on the 9-point scale and 26.6% on the LAM. For the food names, 467 pairs of means (out of 1275 possible comparisons) were significantly different on the LAM scale and 459 on the hedonic scale, a difference that was not significant by a binomial test on proportions. The only appreciable difference emerged when the analysis was restricted to foods scoring above the overall mean across products, i.e., well-liked foods, and to those pairings in which one scale showed a difference but the other did not (87 such pairs, suggesting that about 43 would involve foods above the mean). In these specific cases, the LAM scale was responsible for 86% of the differences (37 out of 43). For foods below the grand mean, the split was about even. The upper end of the scale range was used more frequently with the LAM scale, consistent with the idea that it might be valuable for differentiating well-liked foods.
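The counts reported for the food-name comparison can be checked with a short calculation. The sketch below assumes a pooled two-proportion z-test as a stand-in for the unspecified “binomial test on proportions”; because both scales were applied to the same 1275 pairs, treating the two proportions as independent is a simplification. Only the counts come from the text; the function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1: int, x2: int, n: int) -> tuple[float, float]:
    """Pooled two-proportion z-test with equal denominators n.

    Assumption: the two proportions are treated as independent samples,
    which simplifies the paired structure of the original comparison.
    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n, x2 / n
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, written via erf
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - phi)

n_pairs = math.comb(51, 2)                 # 51 food names give 1275 possible pairs
z, p = two_proportion_z(467, 459, n_pairs)  # LAM vs. 9-point significant pairs
share_lam = 37 / 43                         # LAM share of the above-mean discordant pairs
print(n_pairs, round(z, 2), round(p, 2), round(100 * share_lam))
```

With z near 0.33, the 467 versus 459 difference is far from significance, while 37 of 43 confirms the 86% figure.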
The performance of the two scales has been evaluated in several other direct comparisons. Greene, Bratka, Drake, and Sanders (2006) examined consumers’ reactions to peanuts with fruity-fermented flavor defects. The 9-point hedonic scale uncovered only one significant pairwise difference, whereas the LAM scale showed four (out of 12 possible pairs). Hein, Jaeger, Carr, and Delahunty (2008) compared the 9-point scale, the LAM, a line scale, ranking, and best–worst scaling in a replicated test of breakfast bars with large groups of consumers. Best–worst scaling is a variation of choice/ranking whose analysis yields scale values. Among the three true scaling methods, the first replicate showed similar discrimination (similar F-ratios) for the LAM, the line scale, and the 9-point scale, but the 9-point scale had a much higher F-ratio in the second replicate and showed more significant pairwise differences among means. El Dine and Olabi (2009) found similar performance of the LAM and 9-point scales in differentiating a set of familiar and novel foods, with the LAM scale differentiating better among the three highest-rated items, a finding in line with the original observation of Schutz and Cardello (2001).
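The F-ratio used as a discrimination index in these comparisons can be illustrated with a minimal sketch. It assumes that each consumer in one scale group rates every product once (a consumers × products layout without replication) and that the product effect is tested against the consumer × product residual. The function name and the simulated ratings are hypothetical. The ICC(3,k) line, Shrout and Fleiss’s consistency coefficient for the mean rating, is one common reliability index and not necessarily the one the cited authors reported.

```python
import numpy as np

def product_f_and_icc(ratings: np.ndarray) -> tuple[float, float]:
    """Two-way ANOVA without replication on a consumers x products matrix.

    Returns the F-ratio for the product effect (tested against the
    consumer x product residual) and ICC(3,k), the consistency of the
    product means across consumers, which equals 1 - 1/F.
    """
    n_cons, n_prod = ratings.shape
    grand = ratings.mean()
    ss_prod = n_cons * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_cons = n_prod * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ss_error = ss_total - ss_prod - ss_cons      # consumer x product residual
    ms_prod = ss_prod / (n_prod - 1)
    ms_error = ss_error / ((n_prod - 1) * (n_cons - 1))
    f_ratio = ms_prod / ms_error
    icc_3k = (ms_prod - ms_error) / ms_prod
    return float(f_ratio), float(icc_3k)

# Hypothetical 9-point ratings: 50 consumers x 3 products with true means 4, 6, 7,
# a per-consumer offset (scale-use bias), and rating noise, rounded and clipped to 1..9.
rng = np.random.default_rng(0)
true_means = np.array([4.0, 6.0, 7.0])
offsets = rng.normal(0.0, 1.0, size=(50, 1))
noise = rng.normal(0.0, 1.0, size=(50, 3))
ratings = np.clip(np.rint(true_means + offsets + noise), 1, 9)

f_ratio, icc = product_f_and_icc(ratings)
print(f"F = {f_ratio:.1f}, ICC(3,k) = {icc:.3f}")
```

Removing the consumer sum of squares keeps individual scale-use bias out of the error term. Comparing F-ratios across scale groups is most meaningful when group sizes and products are matched, as in a between-groups design.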
Another criterion for comparing scales is their ability to detect different patterns of preference among consumer segments. Villanueva and Da Silva (2009) compared the traditional 9-point hedonic scale with a hedonic line scale, called a “hybrid” scale, that this group had studied previously (Villanueva, Petenate, & Da Silva, 2005). The authors introduced a potentially important but rarely used criterion for comparing the effectiveness of hedonic scales: the segmentation of consumers as revealed by internal preference mapping. They concluded that the hybrid scale was superior in its ability to uncover consumer segments in an evaluation of eight red wines. No such comparison has been made between the 9-point scale and the LAM scale. A further useful criterion is the relationship between acceptance ratings and consumer behavior in the real world. Although there are many reasons why stated usage might not correspond to liking ratings (for example, I might prefer a certain style of potato chip, but its cost might discourage me from buying it often), one would expect at least a moderate correlation across a group of individuals from different preference segments.
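Internal preference mapping is commonly carried out as a principal component analysis of a products × consumers liking matrix, with consumers treated as variables. The sketch below is a minimal version under stated assumptions: it uses NumPy’s SVD, centers each consumer’s ratings (some analysts also standardize them), and applies it to made-up data with two opposed segments. It is not the procedure of Villanueva and Da Silva (2009); the function name is hypothetical.

```python
import numpy as np

def internal_preference_map(liking: np.ndarray, n_components: int = 2):
    """Minimal internal preference mapping via PCA.

    liking: products x consumers matrix of hedonic ratings.
    Returns (product_scores, consumer_loadings, explained_variance_ratio).
    """
    # Center each consumer (column) so the map reflects relative preferences,
    # not differences in how high or low each person uses the scale
    centered = liking - liking.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # product coordinates
    loadings = vt[:n_components].T                     # one row per consumer
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Hypothetical data: 4 products (rows), 6 consumers (columns) in two segments;
# consumers 0-2 prefer the first two products, consumers 3-5 the last two
liking = np.array([
    [8, 9, 8, 3, 2, 3],
    [7, 8, 7, 4, 3, 4],
    [3, 4, 3, 8, 7, 8],
    [2, 3, 2, 9, 8, 9],
], dtype=float)

scores, loadings, explained = internal_preference_map(liking)
pc1 = loadings[:, 0]
print(np.sign(pc1), explained.round(2))
```

Consumers whose first-component loadings share a sign point toward the same region of the product map, which is how segments become visible. The overall sign of a principal component is arbitrary, so only relative signs are meaningful.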
Given the relatively few direct comparisons of the LAM scale with the 9-point scale, sensory professionals might be cautious about substituting the newer LAM for the 9-point hedonic scale, an industry standard. However, there are some hints in the literature that an expanded 11-point scale could be useful for measuring product acceptability. Peryam (1989), in his reflections on the early days of sensory science, offered the following observation:
“Why does the hedonic scale have nine categories, rather than more or less? Economy perhaps? Preliminary investigation had shown that discrimination between foods and reliability tended to increase up to eleven categories, but we encountered, in addition to the dearth of appropriate adverbs, a mechanical problem due to equipment limitations. Official government paper was only 8″ wide and we found that typing eleven categories horizontally was not possible. So we sacrificed a theoretical modicum of precision for a real improvement in efficiency at the moment.” p. 23.
He further pointed to a potential advantage of having more room for positive evaluations on a scale: “An 8-point unbalanced scale with more ‘like’ than ‘dislike’ categories was shown to be somewhat better than the standard 9-point one, but only when dealing with relatively well-liked foods.” p. 24.
Thus there is a need for further head-to-head comparisons of the LAM and 9-point scales across different products and conditions. The LAM scale has potential advantages, including more room for extreme responses than the 9-point scale. It is not clear whether the added verbal phrases are key or whether the added line length also matters. To see whether the line itself made any difference, we included an 11-point category scale that used only the LAM’s verbal phrases, similar to the original presentation of the 9-point hedonic scale (Peryam & Girardot, 1952). The main objective of the study was to compare the scales using four criteria: (1) the ability to differentiate products, (2) the ability to differentiate consumer segments, (3) the relation of acceptance scores to consumption choices, and (4) reliability. Representative products from four categories (chips, cookies, orange juice, and ice cream) were used in this study.
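For the third criterion, one simple way to relate acceptance to consumption is a correlation across consumers between each person’s liking rating for a product and a self-reported usage measure. The sketch below assumes a Pearson correlation and a hypothetical “uses per month” variable with made-up data; with ordinal frequency categories, a rank correlation such as Spearman’s may be more appropriate.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for one product and six consumers
liking = [2, 4, 5, 7, 8, 9]            # hedonic ratings
uses_per_month = [0, 1, 1, 3, 4, 6]    # self-reported consumption (assumed measure)
print(round(pearson_r(liking, uses_per_month), 2))  # prints 0.96
```

In practice the correlation would be computed separately for each scale group, so that the scales can be compared on how well their ratings track stated usage.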
Participants
A total of 302 consumers completed the study: 99 using the LAM scale, 103 using the 11-point category scale, and 100 using the 9-point category scale. Consumers were recruited in four cities from Peryam and Kroll (Chicago, IL) databases of persons available for testing. They represented a range of ages (37% aged 18 to 35, 43% aged 36 to 55, and 20% aged 56 to 65) and both genders about equally (50.5% male and 49.5% female). None of the three groups deviated from these overall percentages
Product discrimination
The overall pattern of results was that the scales worked about equally well. All three methods differentiated the chips, cookies, and orange juice products with a high degree of statistical significance. None of the scales was able to differentiate the ice creams, which had roughly equal and high acceptability. It is possible that this failure was due to the ice creams being presented last, after some fatigue had set in. Table 1 shows the F-ratios, intraclass correlations,
Discussion
To our knowledge, this is the first large-scale consumer study comparing the LAM and 9-point hedonic scales in several different product systems. A further advantage of this study is its between-groups comparison, a design also adopted by Greene et al. (2006), Hein et al. (2008), and El Dine and Olabi (2009). Thus each participant used only one type of scale and was not influenced by recent experience with another scale type. Early studies of the LAM used within-subjects comparisons but people received
Acknowledgement
The authors thank Terry Mongoven for assistance in supervising the field study.
References (30)
- Chung and Vickers (2007). Long-term acceptability and choice of teas differing in sweetness. Food Quality and Preference.
- Chung and Vickers (2007). Influence of sweetness on the sensory-specific satiety and long-term acceptability of tea. Food Quality and Preference.
- Forde and Delahunty (2004). Understanding the role cross-modal sensory interactions play in food acceptability in younger and older consumers. Food Quality and Preference.
- Hein, Jaeger, Carr, and Delahunty (2008). Comparison of five common acceptance and preference methods. Food Quality and Preference.
- Jaeger and Cardello (2009). Direct and indirect hedonic scaling methods: A comparison of the labeled affective magnitude (LAM) scale and best–worst scaling. Food Quality and Preference.
- Keskitalo et al. (2007). Sweet taste preferences are partly genetically determined: Identification of a trait locus on chromosome 16. American Journal of Clinical Nutrition.
- Villanueva and Da Silva (2009). Performance of the nine-point hedonic, hybrid and self-adjusting scales in the generation of internal preference maps. Food Quality and Preference.
- Villanueva, Petenate, and Da Silva (2005). Comparative performance of the hybrid hedonic scale as compared to the traditional hedonic, self-adjusting and ranking scales. Food Quality and Preference.
- et al. (1968). Measurement of specific anosmia. Perceptual and Motor Skills.
- et al. (1978). Fundamentals of scaling and psychophysics.