The Leadership Quarterly

Volume 23, Issue 1, February 2012, Pages 132-145

Impact of rater personality on transformational and transactional leadership ratings

https://doi.org/10.1016/j.leaqua.2011.11.011

Abstract

This study addresses the role of rater personality in ratings of transformational and transactional leadership. In a naturalistic field study, we found that rater personality (i.e., agreeableness, openness, extraversion, and conscientiousness) was positively associated with ratings of transformational leadership, but significant rater personality effects were not found in an experimental study where leadership behavior was invariant. These results suggest that disagreements among raters about leaders' behaviors are not due solely to random error and may instead reflect true differences either in (a) the behaviors leaders exhibit toward individual followers or (b) personality-related differences between followers in attention to and recall of leadership behaviors. We also found that personality (of subordinates and peers) was not randomly distributed across leaders, though clustering effects were generally small. Practically, our results suggest that (a) individual reports of leadership may be better at predicting leadership outcomes than aggregated group reports – especially those related to individual attitudes and behaviors – though they are rarely used in the literature; (b) aggregation is complicated because rater personality is associated with leadership ratings and is not randomly distributed across leaders; and (c) corrections for measurement error based on inter-rater agreement may not be appropriate due to non-random unique rater variance.

Introduction

Across the organizational sciences, low agreement has been found when multiple individuals rate characteristics or behaviors of another individual. Only modest agreement is found among ratings of personality (Funder, 2001) and job performance (Murphy, 2008, Ones et al., 2008). Scullen, Mount, and Goff (2000) showed that more variance in multi-source ratings of managerial job performance could be attributed to rater idiosyncrasies than to either the individuals being rated or their organizational roles (e.g., direct report, peer, supervisor). Atwater and Yammarino (1992) reported low agreement on leadership ratings between managers and their bosses and subordinates. Murphy and DeShon (2000) identified this unexplained rater variance as an important issue for researchers to address.

In the leadership domain, the focus of agreement research has been primarily on two issues. First, more than 30 years ago, Graen and colleagues (i.e., leader–member exchange [LMX]; Dansereau et al., 1973, Dansereau et al., 1975, Liden and Graen, 1980) provided convincing evidence that subordinates differed in their perceptions of leadership and that unique subordinate perceptions represented valid variance, providing information about the quality of the relationship (LMX) between managers and subordinates. Subsequently, there has been some research aimed at understanding the antecedents of LMX, much of it focused on similarity between leader and follower (e.g., Phillips & Bedeian, 1994). Second, there is a well-developed literature on self-other agreement in leadership ratings. Most of this literature has focused on the effects of self-other agreement on leadership effectiveness (e.g., Atwater et al., 1998, Atwater and Yammarino, 1992), but Atwater and her colleagues have also examined the antecedents of agreement (and disagreement). Ostroff, Atwater, and Feinberg (2004) examined rater and ratee characteristics that predict agreement (i.e., being female, Caucasian, younger, having less experience, and having higher education) and Atwater, Wang, Smither, and Fleenor (2009) found that cultural characteristics such as assertiveness and power distance also account for the extent to which leaders and others agree in ratings of leadership behavior.

Both of these lines of research have made valuable contributions to our understanding of leadership ratings, but neither provides us with adequate insight into the source of unique rater variance. Without clarification about the nature of unique variance in ratings, disagreement among raters has plagued, and will continue to plague, researchers in the leadership domain (e.g., Yammarino, Spangler, & Dubinsky, 1998). Leadership scholars make choices about whether to aggregate leadership ratings obtained from multiple observers, and about how and when to correct observed correlations for measurement error, a common practice in meta-analyses (e.g., Bono and Judge, 2004, Judge and Piccolo, 2004). Such decisions are based, at least in part, on assumptions a researcher makes about the source of disagreements between raters because each source of rater disagreement (e.g., random error, systematic rater biases, or real differences in leader behavior) carries a different normative implication for aggregation and correction decisions. If rater disagreements are predominantly random measurement error, it is sensible to aggregate multiple ratings because a group mean would represent the best estimate of a leader's true behavior. Even if rater disagreements are not random, but the source of the unique variance (e.g., rater personality) were randomly distributed across leaders, research based on aggregated leadership ratings would still provide the greatest degree of generalizability, though predictive validity might be improved by using individual ratings. Moreover, aggregated ratings can be useful in developing an understanding of leader characteristics generally associated with certain leadership behaviors or with leadership effectiveness (Bono and Judge, 2004, Judge et al., 2002). Problems arise, however, if a substantial proportion of variance in rater disagreement is systematic or if the source of non-random variance is not randomly distributed across leaders.
Then aggregation and correction decisions become more complex (Murphy, 2008, Murphy and DeShon, 2000, Schmidt et al., 2000), and corrections based on inter-rater agreement may lead to substantial overestimates of associations between leader characteristics and behaviors (e.g., Bono & Judge, 2004) and associations between leader behavior and outcomes (e.g., Judge & Piccolo, 2004). Furthermore, if disagreements among raters reflect true differences in leader behavior, rating variance that has typically been treated as random measurement error may actually have incremental predictive validity, as has been found in the LMX literature. Even if differences among raters represent only perceptions and not true differences in leader behavior, individual ratings may have incremental predictive validity because perceptions of a leader's behavior are expected to directly influence rater attitudes and behavior. In such cases, true associations between leadership and outcomes would be underestimated with aggregation. Several statistical techniques (e.g., rwg, WABA, ICC; see Bliese, 2000) aid researchers in determining the magnitude of agreement (or disagreement) among raters, but there has been little systematic research focused on the source and nature of such disagreement, including whether it is random or systematic, whether it is linked to rater characteristics, or whether it has predictive validity. We aim to fill this gap in the literature by directly addressing whether rater variance is random and whether it is randomly distributed across leaders.
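The agreement indices mentioned above can be made concrete with a short sketch. This is an illustrative implementation under stated assumptions (a single item, the uniform null distribution for rwg, equal group sizes for ICC(1)) with made-up ratings; it is not code from the study:

```python
# Hypothetical illustration of two agreement indices: r_wg for a single item
# (James, Demaree, & Wolf) and ICC(1) from one-way ANOVA (see Bliese, 2000).
from statistics import mean, variance

def rwg(ratings, scale_points=5):
    """r_wg = 1 - s_x^2 / sigma_E^2, where sigma_E^2 = (A^2 - 1) / 12 is the
    variance expected if raters answered uniformly at random on an A-point scale."""
    sigma_e2 = (scale_points ** 2 - 1) / 12
    return 1 - min(variance(ratings), sigma_e2) / sigma_e2  # floored at 0

def icc1(groups):
    """ICC(1): share of rating variance attributable to the target (leader),
    computed from one-way ANOVA mean squares. Assumes equal group size k."""
    k = len(groups[0])
    grand = mean(x for g in groups for x in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = mean(variance(g) for g in groups)
    return (msb - msw) / (msb + (k - 1) * msw)

# Three leaders, four raters each, on a 5-point scale (made-up numbers):
leaders = [[4, 4, 5, 4], [2, 3, 2, 2], [3, 4, 3, 3]]
print(rwg(leaders[0]))  # 0.875: high within-group agreement
print(icc1(leaders))    # ~0.79: here most variance lies between leaders
```

High rwg or ICC values describe the magnitude of agreement but, as argued above, say nothing about whether the remaining within-leader variance is random error or systematic rater effects.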

Accordingly, the purpose of our studies is to examine rater personality as a potential source of non-random variance in transformational leadership ratings. A key contribution of these studies is their explicit focus on non-random sources of variance across lab and field settings. Ones et al. (2008) call for more field studies in rating research, noting that most rating research is implemented in lab contexts. If systematic, predictable rater effects are found, our results have important implications for decisions about aggregating transformational leadership ratings from multiple individuals to form a single score for each leader, and for decisions about whether to correct observed correlations for measurement error based on inter-rater reliability. Furthermore, if we find that the source of idiosyncratic ratings (e.g., rater personality) is not randomly distributed across leaders, our results present a new concern for those who wish to aggregate individual ratings for greater generalizability. Finally, if unique rater variance represents either true differences in leader behavior across followers or followers' perceptions of behavior, then individual ratings have the potential to improve prediction of outcomes associated with transformational leadership, though this possibility is rarely considered in the existing research literature. Indeed, of 23 primary studies that measured transformational leadership and were published in Journal of Applied Psychology in the last decade (2000–2010), only two did not aggregate to the group level, and only one examined both individual and aggregated ratings (Liao & Chuang, 2007). Few explicitly considered whether individual or group reports of transformational leadership were most appropriate for the research question being addressed.

To examine the link between rater personality and transformational leadership ratings, we use the five-factor model of personality (i.e., agreeableness, extraversion, openness, neuroticism, and conscientiousness) as a comprehensive taxonomy of normal adult personality (Costa & McCrae, 1989). With respect to leadership, we focused on transformational and transactional leadership both because this has been the dominant paradigm for leadership research in recent years (see Bass, 1985, and Judge & Piccolo, 2004, for reviews) and because it has been a common practice in this literature to aggregate ratings. Selection of leadership dimensions for examination in the current study was based upon two criteria: 1) relevance of the leadership dimension to important outcome variables (e.g., job satisfaction and performance) and 2) factor structure of the leadership dimensions. With respect to the first criterion, we drew on a meta-analysis (Judge & Piccolo, 2004) that revealed criterion-related validity only for the transformational, contingent reward, and laissez-faire dimensions of the transformational–transactional leadership model. With respect to the second criterion, we drew on studies that support collapsing the four types of transformational behaviors into a single transformational leadership dimension (Awamleh and Gardner, 1999, Bono and Judge, 2003, Carless, 1998, Lim and Ployhart, 2004) and collapsing two transactional dimensions (management by exception-passive and laissez-faire) into a single passive leadership dimension (Avolio et al., 1999, Bono and Judge, 2004). Accordingly, we focus on three broad dimensions of leadership behavior: transformational leadership, contingent reward, and passive leadership.

Section snippets

Sources of variance in ratings

Wherry and Bartlett (1982) discuss three general factors that can affect ratings: 1) true ratee (e.g., leader) behaviors, 2) rater (e.g., subordinate) biases, and 3) random measurement error. With respect to true differences in ratee behaviors across subordinates, it is plausible that leader behavior varies across raters if leaders adjust their behavior in response to individual employees (Hoyt, 2000), though we do not directly test this notion in the current study. With respect to rater

Agreeableness

The trait of agreeableness includes the tendency to be cooperative, trusting, compliant, and kind (Costa & McCrae, 1989). Consistent with past research, we expect agreeableness to have a systematic impact on ratings via a leniency bias; Bernardin et al. (2000) found that rater agreeableness was associated with elevated ratings (r = 0.33) of others' academic performance in a class exercise. We expect that agreeableness and leniency will operate similarly in leadership ratings, such that more

Participants and procedure

Participants were drawn from a pool of 192 leaders enrolled in leadership development programs linked to a large public university that were held at various locations throughout the U.S. There were participants from both small businesses and Fortune 500 corporations; private and public organizations; industries including manufacturing, technology, service, and government; and management jobs in areas ranging from sales to accounting and engineering. As part of the development program,

Results

Table 1 presents means, standard deviations, and intercorrelations among the variables. Results reveal numerous associations among the five personality traits and leadership behaviors (14 of 15 associations are significant). However, caution is advised in interpreting these correlations relative to our hypotheses because they represent the association between rater personality and leadership ratings across all leaders, but raters are nested within leaders. For this reason we used random
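The nesting problem noted above can be illustrated with a small simulation. The parameters below are made up, and the group-mean-centered slope is a simple stand-in for the random coefficient models the authors describe, chosen only to show how a rater-level personality effect can be separated from between-leader differences:

```python
# Hypothetical simulation (not the study's data): each leader has a true
# behavior level, and each rater's score is shifted by that rater's
# agreeableness (slope 0.3) plus random error. Raters are nested in leaders.
import random

random.seed(11)
n_leaders, n_raters = 100, 5
rows = []  # (leader_id, rater_agreeableness, rating)
for j in range(n_leaders):
    leader_true = random.gauss(0, 1)   # true leader behavior
    for _ in range(n_raters):
        a = random.gauss(0, 1)         # rater agreeableness
        rating = leader_true + 0.3 * a + random.gauss(0, 0.5)
        rows.append((j, a, rating))

def slope(xy):
    """Ordinary least-squares slope of y on x."""
    xs, ys = zip(*xy)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in xy)
            / sum((x - mx) ** 2 for x in xs))

# Group-mean centering removes each leader's intercept, so the remaining
# association reflects the rater-level (within-leader) personality effect.
centered = []
for j in range(n_leaders):
    grp = [(a, r) for (lj, a, r) in rows if lj == j]
    ma = sum(a for a, _ in grp) / len(grp)
    mr = sum(r for _, r in grp) / len(grp)
    centered += [(a - ma, r - mr) for a, r in grp]

print(slope(centered))  # recovers a value near the simulated 0.3
```

A pooled correlation across all leaders would mix this within-leader effect with between-leader differences, which is why the raw correlations in Table 1 must be interpreted cautiously.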

Participants and procedure

Two hundred fourteen undergraduate students in three sections of an introductory I–O psychology course taught by the first or second author (100% of the students present that day and 81% of the students enrolled in the course) participated in this study as part of normal course activities. Students in the course were 51% male and, on average, 21 years old. Early in the course, students were asked to complete a personality inventory, also as part of normal course activities. Later in the

Results

Means, standard deviations, correlations and scale reliabilities are reported in Table 4. As expected, the mean rating for transformational leadership is higher in Study 2 (M = 3.77 in Study 1 and M = 4.17 in Study 2 for the 10 matched items). More to the point of the study, we found reduced variability in ratings of transformational leadership in our lab study data (SD = 0.43) as compared to what is commonly found in field studies of transformational leadership using the full MLQ (e.g., SD = 0.81 in

Discussion

An ongoing issue in leadership research is the lack of agreement among individual raters about leader behavior. Our primary concern in this research was to better understand possible non-random sources of unique rater variance, for the purpose of aiding researchers in deciding (1) when to aggregate, (2) what sorts of generalizations are appropriate from aggregated ratings, and (3) whether or not aggregated ratings should be corrected for measurement error using inter-rater

Future research

What these studies cannot tell us is whether leaders differ in their behaviors with subordinates, based on personality, in a naturalistic work setting, as that would require extensive examination of leader behavior across multiple followers over time. Indeed, even such invasive research may not provide an unbiased answer to the question of whether, or how, rater personality affects leader behavior because the process of observation itself may influence leader behavior. Nonetheless, by demonstrating

Limitations and strengths

This study makes a unique contribution by demonstrating the non-random nature of leadership rating disagreements, and as such, provides important practical implications for researchers who must make decisions about aggregation and correction for measurement error. Like all studies, it is limited in several ways. First, we were unable to determine with certainty why (e.g., leniency, ratee behaviors, or differential recall) rater traits are linked to leadership ratings. We note several possible

Conclusion

Our findings highlight the importance of considering non-random, trait-linked rater effects when making decisions about whether to use individual or aggregated reports of leadership behavior, and whether and how to correct observed correlations for measurement unreliability. Moreover, our findings lay the groundwork for future research aimed at determining both the mechanisms by which rater personality affects leadership ratings, and the mechanism by which rater personality comes to be

References (71)

  • F.J. Yammarino et al. (1998). Transformational and contingent reward leadership: Individual, dyad, and group levels of analysis. The Leadership Quarterly.
  • G. Yukl (1999). An evaluation of conceptual weaknesses in transformational and charismatic leadership theories. The Leadership Quarterly.
  • L.E. Atwater et al. (1998). Self-other agreement: Does it really matter? Personnel Psychology.
  • L. Atwater et al. (2009). Are cultural characteristics associated with the relationship between self and others' ratings of leadership? Journal of Applied Psychology.
  • L.E. Atwater et al. (1992). Does self-other agreement on leadership perceptions moderate the validity of leadership and performance predictions? Personnel Psychology.
  • B.J. Avolio et al. (1999). Re-examining the components of transformational and transactional leadership using the Multifactor Leadership Questionnaire. Journal of Occupational and Organizational Psychology.
  • J. Barling et al. (2002). Development and test of a model linking safety-specific transformational leadership and occupational safety. Journal of Applied Psychology.
  • M.R. Barrick et al. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology.
  • L.K. Bartels et al. (1997). Assessing the assessor: The relationship of assessor personality to leniency in assessment center ratings. Journal of Social Behavior & Personality.
  • H.J. Bernardin et al. (2000). Conscientiousness and agreeableness as predictors of rating leniency. Journal of Applied Psychology.
  • P.D. Bliese (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis.
  • J.E. Bono et al. (2003). Self-concordance at work: Toward understanding the motivational effects of transformational leaders. Academy of Management Journal.
  • J.E. Bono et al. (2004). Personality and transformational and transactional leadership: A meta-analysis. Journal of Applied Psychology.
  • W.C. Borman (1979). Individual differences correlates of accuracy in evaluating others' performance effectiveness. Applied Psychological Measurement.
  • W.C. Borman et al. (1991). Observation accuracy for assessors of work–sample performance: Consistency across task and individual-differences correlates. Journal of Applied Psychology.
  • M.D. Botwin. Review of the revised NEO Personality Inventory.
  • S.A. Carless (1998). Assessing the discriminant validity of transformational leader behavior as measured by the MLQ. Journal of Occupational and Organizational Psychology.
  • W.F. Cascio et al. (1977). Behaviorally anchored rating scales: Effects of education and job experience of raters and ratees. Journal of Applied Psychology.
  • P.T. Costa et al. (1989). The NEO-PI/NEO-FFI manual supplement.
  • L.J. Cronbach et al. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles.
  • D.V. Day et al. (2002). Self-monitoring personality at work: A meta-analytic investigation of construct validity. Journal of Applied Psychology.
  • P.B. Elmore et al. (1975). Effect of teacher sex, student sex, and teacher warmth on the evaluation of college instructors. Journal of Educational Psychology.
  • D.C. Funder (2001). Personality. Annual Review of Psychology.
  • L.R. Goldberg. A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models.
  • W.K. Hofstee et al. (1992). Integration of the Big Five and circumplex approaches to trait structure. Journal of Personality and Social Psychology.