A comparison of the labeled magnitude (LAM) scale, an 11-point category scale and the traditional 9-point hedonic scale

https://doi.org/10.1016/j.foodqual.2009.06.009

Abstract

Schutz and Cardello [Schutz, H. G. & Cardello, A. V. (2001). A labeled affective magnitude (LAM) scale for assessing food liking/disliking. Journal of Sensory Studies, 16, 117–159] proposed the labeled magnitude (LAM) scale for measuring food acceptance. The LAM is a line scale anchored at its end points with the phrases “greatest imaginable like” and “greatest imaginable dislike” and uses as intermediate anchors the nine phrases of the traditional hedonic scale. In this study, three hedonic scales were compared, including the widely-used 9-point hedonic scale, the LAM scale, and an 11-point category scale using the LAM’s verbal anchors as category labels. Three groups of consumers (N = about 100 each) used one of the three scales to evaluate the acceptability of highly liked foods (orange juices, potato chips, cookies, and ice cream, with four samples of each). Scales were evaluated primarily on their ability to show differences in acceptability, the correspondence of acceptance ratings to preference ranking and the correspondence of stated product usage (e.g., purchase of pulp vs. non-pulp orange juice) to the product scoring highest. All three scales performed equally well, with no one scale showing a consistent superiority over another. All three scales were able to differentiate acceptability of the orange juices, chips and cookies. No scale differentiated among the ice creams, which had equal and high acceptability. All scales showed a strong correspondence between liking and preference rankings and also between the product rated highest and the type of product usually consumed, within each of the product categories.

Introduction

The labeled affective magnitude scale (LAM, Fig. 1) was developed by Schutz and Cardello (2001) as an alternative to the commonly used 9-point category scale for measuring food acceptability (Jones et al., 1955; Peryam and Girardot, 1952; Peryam and Pilgrim, 1957). The LAM scale was an extension of the labeled magnitude scale (LMS) for psychophysical intensity scaling developed by Green, Shaffer, and Gilmore (1993), based on earlier work by Borg (1982) on a so-called “category–ratio scale”. The LAM scale has been used recently to evaluate consumer liking for teas (Chung and Vickers, 2007a, 2007b), to study genetic factors in sweet taste perception (Keskitalo et al., 2007) and to compare younger and older persons’ liking for different orange juices (Forde & Delahunty, 2004). Recently, Jaeger and Cardello (2009) compared the LAM to best–worst scaling and found approximate parity in discrimination. The LMS and LAM scales may have the following properties:

  • (1) Because the scales were based on magnitude estimates (ratio scaling instructions) of the verbal anchor word meanings, the resulting scale values are thought by some to represent ratio scale data (Stevens, 1971). If true, one could make valid statements such as “this product was liked twice as much as that one”. The LMS and LAM produce data similar to that from magnitude estimation (Green et al., 1993; Schutz and Cardello, 2001).

  • (2) Because the scales had a high end anchor of “strongest imaginable” for the LMS or “greatest imaginable like” (or dislike) for the LAM, subjects in these scaling studies might have a similar idea of the intensity of the experience suggested by those phrases, and their ratings might thus be placed on the same subjective scale. This was based on an argument by Borg (1982), who, for example, in studying perceived effort or exertion, thought that exerting oneself maximally, i.e., to the point of exhaustion, should be a similar experience among different people. This assumption was later challenged (Bartoshuk et al., 2002).

  • (3) Because the scales have commonly understood labels (weak, moderate, strong), the data could be interpreted in light of these labels, unlike magnitude estimation, which produced scale data based on proportions (i.e., one stimulus was twice as strong as another), but with no absolute anchor for whether these experiences were weak or strong (one could be twice the other but both could be weak).

Regarding food acceptability testing, the question arises as to whether the LAM scale provides any advantages over the commonly used 9-point hedonic scale. An important criterion is whether one scale is better at finding differences among products (see, for example, Lawless & Malone, 1986). In the original set of studies (Schutz & Cardello, 2001), the performance of the LAM scale and the 9-point scale was very similar in this regard. Two direct comparisons were conducted, one involving 51 food names and one involving five foods that were actually tasted. Correlations between the mean values obtained on the two scales were +0.99 for the 51 food names and +0.98 for the tasted foods. Statistical differentiation was almost equivalent in both cases. Analysis of variance for the tasted foods showed that food differences accounted for 27.6% of the variance with the 9-point scale and 26.6% with the LAM. For the food names, 467 pairs of means (out of 1275 possible comparisons) were significantly different for the LAM scale and 459 for the hedonic scale (not significantly different by binomial test on proportions). The only appreciable difference emerged in an examination of foods that scored above the overall mean across products, i.e., well-liked foods, considering only those pairings in which one scale showed a difference but the other did not (87 possible pairs, suggesting that about 43 would be above the mean). In these specific cases, the LAM scale was responsible for 86% of the differences (37 out of 43). For foods below the grand mean the split was about even. The higher end of the scale range was used more frequently with the LAM scale, consistent with the idea that it might be valuable for differentiating well-liked foods.
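The counts quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below recomputes the number of possible pairs among 51 food names and the 37-of-43 proportion. The exact two-sided sign test on that split is an added illustration, not an analysis reported in the source, and the helper name `sign_test_p` is hypothetical.

```python
from math import comb

# Figures quoted in the text (from Schutz & Cardello, 2001).
N_FOOD_NAMES = 51
LAM_SIG_PAIRS = 467
HEDONIC_SIG_PAIRS = 459
LAM_ONLY_ABOVE_MEAN = 37
DISCORDANT_ABOVE_MEAN = 43

# Number of distinct pairs that can be formed from 51 food names.
total_pairs = comb(N_FOOD_NAMES, 2)

# Share of all possible pairs that each scale separated significantly.
lam_share = LAM_SIG_PAIRS / total_pairs
hedonic_share = HEDONIC_SIG_PAIRS / total_pairs

# Share of the above-mean discordant pairs in which only the LAM found a difference.
lam_only_share = LAM_ONLY_ABOVE_MEAN / DISCORDANT_ABOVE_MEAN


def sign_test_p(successes: int, n: int) -> float:
    """Exact two-sided binomial p-value against a 50/50 split.

    Illustrative helper (an assumption of this sketch, not the authors' test).
    """
    extreme = max(successes, n - successes)
    tail = sum(comb(n, k) for k in range(extreme, n + 1))
    return min(1.0, 2 * tail / 2 ** n)


print(f"possible pairs among {N_FOOD_NAMES} names: {total_pairs}")
print(f"LAM share of significant pairs: {lam_share:.3f}")
print(f"hedonic share of significant pairs: {hedonic_share:.3f}")
print(f"LAM-only share above the mean: {lam_only_share:.1%}")
print(f"sign test p for 37 of 43: {sign_test_p(37, 43):.2g}")
```

Under these assumptions, an even split would be the expectation if neither scale had an advantage among well-liked foods, which is why a one-sided tally such as 37 of 43 stands out.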

The performance of the two scales has been evaluated in several other direct comparisons. Greene, Bratka, Drake, and Sanders (2006) examined consumers’ reactions to peanuts with fruity-fermented flavor defects. The 9-point hedonic scale uncovered only one significantly different pair, whereas the LAM scale showed four significantly different pairs (out of 12 possible). Hein, Jaeger, Carr, and Delahunty (2008) compared the 9-point scale, the LAM, a line scale, ranking and best–worst scaling in a replicated test of breakfast bars with large groups of consumers. Best–worst is a variation of choice/ranking whose analyses yield scale values. Among the other three true scaling methods, the first replication showed similar discrimination (similar F-ratios) for the LAM, line scale and 9-point scale, but the 9-point scale had a much higher F-ratio on the second replicate and showed more paired differences among means. El Dine and Olabi (2009) found similar performance of the LAM and 9-point scale in differentiating a set of both familiar and novel foods, with the LAM scale differentiating better among the three highest rated items, a finding in line with the original observation of Schutz and Cardello (2001).
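The F-ratios used in these comparisons index how strongly a scale separates product means relative to residual noise. The following is a minimal sketch, assuming a design in which every consumer rates every product (products as the treatment factor, consumers as blocks). The function name `product_f_ratio` and the ratings are hypothetical illustrations and are not data from any of the cited studies, whose exact ANOVA models may differ.

```python
from typing import Sequence


def product_f_ratio(ratings: Sequence[Sequence[float]]) -> float:
    """F-ratio for the product effect in a consumers-by-products table.

    ratings[i][j] is consumer i's rating of product j. Consumer differences
    are removed as a block effect before the error term is formed.
    """
    n = len(ratings)        # number of consumers
    k = len(ratings[0])     # number of products
    grand = sum(sum(row) for row in ratings) / (n * k)

    product_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    consumer_means = [sum(row) / k for row in ratings]

    ss_product = n * sum((m - grand) ** 2 for m in product_means)
    ss_consumer = k * sum((m - grand) ** 2 for m in consumer_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_product - ss_consumer

    df_product = k - 1
    df_error = (k - 1) * (n - 1)
    return (ss_product / df_product) / (ss_error / df_error)


# Made-up 9-point ratings: 5 consumers (rows) by 4 products (columns).
toy_ratings = [
    [7, 5, 6, 3],
    [8, 6, 6, 4],
    [6, 5, 7, 2],
    [7, 4, 5, 3],
    [9, 6, 7, 5],
]
print(f"product F-ratio: {product_f_ratio(toy_ratings):.2f}")
```

When two scales are applied to the same products by comparable groups, the scale yielding the larger product F-ratio is, on this criterion, the more discriminating one.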

Another criterion for comparing scales concerns the ability of the scale to detect different patterns of preference in consumer segments. Recently, Villanueva and Da Silva (2009) compared the traditional 9-point hedonic scale to a hedonic line scale, which was called a “hybrid” scale, previously studied by this group (Villanueva, Petenate, & Da Silva, 2005). The authors introduced a potentially important criterion for comparing the effectiveness of hedonic scales that has rarely been used: the segmentation of consumers as shown by internal preference mapping. They concluded that the hybrid scale has superior properties in terms of its ability to uncover segments of consumers in an evaluation of eight red wines. Such a comparison has not been made between the 9-point scale and the LAM scale. Another useful relationship is between acceptance ratings and consumer preferences in the real world. Although there are many reasons why stated usage might not correspond to liking ratings (for example, I might prefer a certain style of potato chip but its cost might discourage me from frequent purchases), one would expect at least a moderate correlation across a group of individuals from different preference segments.

Given the relatively few direct comparisons of the LAM scale to the 9-point scale, sensory professionals might be cautious in substituting the newer LAM for the 9-point hedonic scale, an industry standard. However, there are some hints in the literature that an expanded 11-point scale could be useful for measuring product acceptability. Peryam (1989), in his reflections on the early days of sensory science, offered the following observation:

“Why does the hedonic scale have nine categories, rather than more or less? Economy perhaps? Preliminary investigation had shown that discrimination between foods and reliability tended to increase up to eleven categories, but we encountered, in addition to the dearth of appropriate adverbs, a mechanical problem due to equipment limitations. Official government paper was only 8″ wide and we found that typing eleven categories horizontally was not possible. So we sacrificed a theoretical modicum of precision for a real improvement in efficiency at the moment.” p. 23.

He further pointed to a potential advantage to having more room for positive evaluations on a scale as follows: “An 8-point unbalanced scale with more “like” than “dislike” categories was shown to be somewhat better than the standard 9-point one, but only when dealing with relatively well-liked foods.” p. 24.

Thus there is a need for further evaluation of the performance of the LAM and 9-point scales in head-to-head comparisons over different products and conditions. The LAM scale has potential advantages, with greater room for more extreme responses than the 9-point scale. It is not clear whether the added phrases are key or whether the added line length is important as well. To see whether the line itself made any difference, we included a scale with the LAM’s verbal phrases only, similar to the original portrayal of the 9-point hedonic scale (Peryam & Girardot, 1952). The main objective of the study was to compare the scales using four criteria: (1) the ability to differentiate products, (2) the ability to differentiate consumer segments, (3) relation of acceptance scores to consumption choices, and (4) reliability. Representative products of four product categories were used in this study.
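The relationship between the 9-point scale and the 11-point category scale can be made concrete as a small data structure: the nine traditional hedonic phrases, extended at each end by the LAM's extreme anchors. This is a minimal sketch. The integer coding (1 to 9 and 1 to 11) is an assumption for illustration, since the text here does not state how categories were scored, and the names `HEDONIC_9`, `CATEGORY_11`, `code_table` and `score` are hypothetical.

```python
from typing import Dict, List, Sequence

# The nine phrases of the traditional hedonic scale, least to most liked.
HEDONIC_9: List[str] = [
    "dislike extremely",
    "dislike very much",
    "dislike moderately",
    "dislike slightly",
    "neither like nor dislike",
    "like slightly",
    "like moderately",
    "like very much",
    "like extremely",
]

# The 11-point category scale: the same phrases plus the LAM end anchors.
CATEGORY_11: List[str] = [
    "greatest imaginable dislike",
    *HEDONIC_9,
    "greatest imaginable like",
]


def code_table(labels: Sequence[str]) -> Dict[str, int]:
    """Map each label to an integer code starting at 1 (assumed coding)."""
    return {label: code for code, label in enumerate(labels, start=1)}


def score(responses: Sequence[str], labels: Sequence[str]) -> List[int]:
    """Convert checked category labels to integer scores."""
    table = code_table(labels)
    return [table[r.strip().lower()] for r in responses]


# The same verbal response lands on a different code under each scale.
print(score(["like extremely"], HEDONIC_9))
print(score(["like extremely"], CATEGORY_11))
```

Because "like extremely" is no longer the top category on the 11-point version, respondents have one further step available for well-liked products, which is the extra headroom the LAM's end anchors are meant to provide.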

Section snippets

Participants

A total of 302 consumers completed the study: 99 using the LAM scale, 103 using the 11-point category scale and 100 using the 9-point category scale. Consumers were recruited in four cities from Peryam and Kroll (Chicago, IL) databases of persons available for testing, and represented a range of ages (37% from 18 to 35, 43% from 36 to 55, and 20% from 56 to 65 years of age) and both genders about equally (50.5% male and 49.5% female). None of the three groups deviated from these overall percentages

Product discrimination

The overall pattern of results was that the scales worked about equally well. All three methods were able to differentiate the chips, cookies and orange juice products with a high degree of statistical significance. None of the scales were able to differentiate the ice creams, which had roughly equal and high acceptability. It is possible that this failure was due to the ice creams being presented last and that some fatigue had set in. Table 1 shows the F-ratios, intraclass correlations,

Discussion

To our knowledge, this is the first large-scale consumer study comparing the LAM and 9-point hedonic scales in several different product systems. A further advantage of this study is the between-groups comparison, a design also adopted by Greene et al. (2006), Hein et al. (2008), and El Dine and Olabi (2009). Thus each participant used only one type of scale and was not influenced by recent experience with another scale type. Early studies of the LAM used within-subjects comparisons but people received

Acknowledgement

The authors thank Terry Mongoven for assistance in supervising the field study.

References (30)

  • L.M. Bartoshuk et al. Labeled scales (e.g. category, Likert, VAS) and invalid across-group comparisons: What we have learned from genetic variation in taste. Food Quality and Preference (2002)

  • A.W. Bendig et al. Effect of amount of verbal anchoring and number of rating scale categories upon transmitted information. Journal of Experimental Psychology (1953)

  • G. Borg. A category scale with ratio properties for intermodal and interindividual comparisons

  • A. Cardello et al. Effects of extreme anchors and interior label spacing on labeled magnitude scales. Food Quality and Preference (2008)

  • A.V. Cardello et al. Research note. Numerical scale-point locations for constructing the LAM (labeled affective magnitude) scale. Journal of Sensory Studies (2004)