Impact of rater personality on transformational and transactional leadership ratings
Introduction
Across the organizational sciences, low agreement has been found when multiple individuals rate characteristics or behaviors of another individual. Only modest agreement is found among ratings of personality (Funder, 2001) and job performance (Murphy, 2008, Ones et al., 2008). Scullen, Mount, and Goff (2000) showed that more variance in multi-source ratings of managerial job performance could be attributed to rater idiosyncrasies than to either the individuals being rated or their organizational roles (e.g., direct report, peer, supervisor). Atwater and Yammarino (1992) reported low agreement on leadership ratings between managers and their bosses and subordinates. Murphy and DeShon (2000) identified this unexplained rater variance as an important issue for researchers to address.
In the leadership domain, agreement research has focused primarily on two issues. First, more than 30 years ago, Graen and colleagues (i.e., leader–member exchange [LMX]; Dansereau et al., 1973, Dansereau et al., 1975, Liden and Graen, 1980) provided convincing evidence that subordinates differed in their perceptions of leadership and that unique subordinate perceptions represented valid variance, providing information about the quality of the relationship (LMX) between managers and subordinates. Subsequently, there has been some research aimed at understanding the antecedents of LMX, much of it focused on similarity between leader and follower (e.g., Phillips & Bedeian, 1994). Second, there is a well-developed literature on self-other agreement in leadership ratings. Most of this literature has focused on the effects of self-other agreement on leadership effectiveness (e.g., Atwater et al., 1998, Atwater and Yammarino, 1992), but Atwater and her colleagues have also examined the antecedents of agreement (and disagreement). Ostroff, Atwater, and Feinberg (2004) examined rater and ratee characteristics that promote agreement (i.e., being female, Caucasian, younger, having less experience, and having higher education), and Atwater, Wang, Smither, and Fleenor (2009) found that cultural characteristics such as assertiveness and power distance also account for the extent to which leaders and others agree in ratings of leadership behavior.
Both of these lines of research have made valuable contributions to our understanding of leadership ratings, but neither provides us with adequate insight into the source of unique rater variance. Without clarification about the nature of unique variance in ratings, disagreement among raters has plagued and will continue to plague researchers in the leadership domain (e.g., Yammarino, Spangler, & Dubinsky, 1998). Leadership scholars make choices about whether to aggregate leadership ratings obtained from multiple observers, and about how and when to correct observed correlations for measurement error, a common practice in meta-analyses (e.g., Bono and Judge, 2004, Judge and Piccolo, 2004). Such decisions are based, at least in part, on assumptions a researcher makes about the source of disagreements between raters because each source of rater disagreement (e.g., random error, systematic rater biases, or real differences in leader behavior) carries a different normative implication for aggregation and correction decisions. If rater disagreements are predominantly random measurement error, it is sensible to aggregate multiple ratings because a group mean would represent the best estimate of a leader's true behavior. Even if rater disagreements are not random, but the source of the unique variance (e.g., rater personality) were randomly distributed across leaders, then research based on aggregated leadership ratings would provide the greatest degree of generalizability, though predictive validity might be improved by using individual ratings. Moreover, aggregated ratings can be useful in developing an understanding of leader characteristics generally associated with certain leadership behaviors or with leadership effectiveness (Bono and Judge, 2004, Judge et al., 2002). Problems arise, however, if a substantial proportion of variance in rater disagreement is systematic or if the source of non-random variance is not randomly distributed across leaders.
In such cases, aggregation and correction decisions become more complex (Murphy, 2008, Murphy and DeShon, 2000, Schmidt et al., 2000), and corrections based on inter-rater agreement may lead to substantial overestimates of associations between leader characteristics and behaviors (e.g., Bono & Judge, 2004) and associations between leader behavior and outcomes (e.g., Judge & Piccolo, 2004). Furthermore, if disagreements among raters reflect true differences in leader behavior, rating variance that has typically been treated as random measurement error may actually have incremental predictive validity, as has been found in the LMX literature. Even if differences among raters represent only perceptions and not true differences in leader behavior, individual ratings may have incremental predictive validity because perceptions of a leader's behavior are expected to directly influence rater attitudes and behavior. In such cases, true associations between leadership and outcomes would be underestimated with aggregation. Several statistical techniques (e.g., rwg, WABA, ICC; see Bliese, 2000) aid researchers in determining the magnitude of agreement (or disagreement) among raters, but there has been little systematic research focused on the source and nature of such disagreement, including whether it is random or systematic, whether it is linked to rater characteristics, and whether it has predictive validity. We aim to fill this gap in the literature by directly examining whether rater variance is random and whether it is randomly distributed across leaders.
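The agreement indices mentioned above can be computed directly. The following is a minimal sketch of two of them, single-item rwg (observed variance compared against a uniform-null variance) and ICC(1) from a one-way ANOVA; the ratings are hypothetical, not data from these studies.

```python
import numpy as np

def rwg(ratings, n_options):
    """Within-group agreement for a single item: 1 minus the ratio of the
    observed rating variance to the variance expected if raters responded
    uniformly at random across the scale options."""
    s2 = np.var(ratings, ddof=1)              # observed variance
    sigma_eu2 = (n_options**2 - 1) / 12.0     # uniform-null variance
    return 1.0 - s2 / sigma_eu2

def icc1(groups):
    """ICC(1): proportion of rating variance attributable to the target
    (leader) rather than the rater, from one-way ANOVA mean squares.
    `groups` is a list of arrays, one array of ratings per leader."""
    k = np.mean([len(g) for g in groups])     # (mean) raters per leader
    grand = np.mean(np.concatenate(groups))
    msb = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = sum(np.sum((g - np.mean(g)) ** 2) for g in groups) / sum(len(g) - 1 for g in groups)
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical 5-point-scale ratings of three leaders by four raters each
leaders = [np.array([4, 4, 5, 4]), np.array([2, 3, 2, 2]), np.array([3, 5, 1, 4])]
print(rwg(leaders[0], n_options=5))   # high agreement -> rwg near 1 (here 0.875)
print(icc1(leaders))                  # share of variance due to the leader
```

Note that a high ICC(1) alone cannot say whether the remaining rater-specific variance is random error or systematic, which is precisely the question the studies address.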
Accordingly, the purpose of our studies is to examine rater personality as a potential source of non-random variance in transformational leadership ratings. A key contribution of these studies is their explicit focus on non-random sources of variance across lab and field settings. Ones et al. (2008) call for more field studies in rating research, noting that most rating research is implemented in lab contexts. If systematic, predictable rater effects are found, our results have important implications for decisions about aggregating transformational leadership ratings from multiple individuals to form a single score for each leader, and for decisions about whether to correct observed correlations for measurement error based on inter-rater reliability. Furthermore, if we find that the source of idiosyncratic ratings (e.g., rater personality) is not randomly distributed across leaders, our results present a new concern for those who wish to aggregate individual ratings for greater generalizability. Finally, if unique rater variance represents either true differences in leader behavior across followers or followers' perceptions of behavior, then individual ratings have the potential to improve prediction of outcomes associated with transformational leadership, though this possibility is rarely considered in the existing research literature. Indeed, of 23 primary studies that measured transformational leadership and were published in Journal of Applied Psychology in the last decade (2000–2010), only two did not aggregate to the group level, and only one examined both individual and aggregated ratings (Liao & Chuang, 2007). Few explicitly considered whether individual or group reports of transformational leadership were most appropriate for the research question being addressed.
To examine the link between rater personality and transformational leadership ratings, we use the five-factor model of personality (i.e., agreeableness, extraversion, openness, neuroticism, and conscientiousness) as a comprehensive taxonomy of normal adult personality (Costa & McCrae, 1989). With respect to leadership, we focused on transformational and transactional leadership both because this has been the dominant paradigm for leadership research in recent years (see Bass, 1985; Judge & Piccolo, 2004, for reviews) and because it has been a common practice in this literature to aggregate ratings. Selection of leadership dimensions for examination in the current study was based on two criteria: 1) relevance of the leadership dimension to important outcome variables (e.g., job satisfaction and performance) and 2) factor structure of the leadership dimensions. With respect to the first criterion, we drew on a meta-analysis (Judge & Piccolo, 2004) that revealed criterion-related validity only for the transformational, contingent reward, and laissez-faire dimensions of the transformational–transactional leadership model. With respect to the second criterion, we drew on studies that support collapsing the four types of transformational behaviors into a single transformational leadership dimension (Awamleh and Gardner, 1999, Bono and Judge, 2003, Carless, 1998, Lim and Ployhart, 2004) and collapsing two transactional dimensions (management by exception-passive and laissez-faire) into a single passive leadership dimension (Avolio et al., 1999, Bono and Judge, 2004). Accordingly, we focus on three broad dimensions of leadership behavior: transformational leadership, contingent reward, and passive leadership.
Section snippets
Sources of variance in ratings
Wherry and Bartlett (1982) discuss three general factors that can affect ratings: 1) true ratee (e.g., leader) behaviors, 2) rater (e.g., subordinate) biases, and 3) random measurement error. With respect to true differences in ratee behaviors across subordinates, it is plausible that leader behavior varies across raters if leaders adjust their behavior in response to individual employees (Hoyt, 2000), though we do not directly test this notion in the current study. With respect to rater
Agreeableness
The trait of agreeableness includes the tendency to be cooperative, trusting, compliant, and kind (Costa & McCrae, 1989). Consistent with past research, we expect agreeableness to have a systematic impact on ratings via a leniency bias; Bernardin et al. (2000) found that rater agreeableness was associated with elevated ratings (r = 0.33) of others' academic performance in a class exercise. We expect that agreeableness and leniency will operate similarly in leadership ratings, such that more
Participants and procedure
Participants were drawn from a pool of 192 leaders enrolled in leadership development programs linked to a large public university that were held at various locations throughout the U.S. There were participants from both small businesses and Fortune 500 corporations; private and public organizations; industries ranging from manufacturing, technology, service, and government; and management jobs in areas ranging from sales to accounting and engineering. As part of the development program,
Results
Table 1 presents means, standard deviations, and intercorrelations among the variables. Results reveal numerous associations among the five personality traits and leadership behaviors (14 of 15 associations are significant). However, caution is advised in interpreting these correlations relative to our hypotheses because they represent the association between rater personality and leadership ratings across all leaders, but raters are nested within leaders. For this reason we used random
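Because raters are nested within leaders, a random coefficient (multilevel) model is the appropriate way to relate rater traits to ratings. The following is an illustrative sketch only, using simulated data and hypothetical variable names (not the studies' actual variables), with a random intercept per leader via statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated data: 40 leaders rated by 6 raters each. The trait name and
# effect sizes are illustrative assumptions, not the studies' estimates.
n_leaders, n_raters = 40, 6
leader = np.repeat(np.arange(n_leaders), n_raters)
agreeableness = rng.normal(0, 1, n_leaders * n_raters)  # rater trait
leader_effect = rng.normal(0, 0.5, n_leaders)[leader]   # true leader differences
tfl = 3.5 + 0.2 * agreeableness + leader_effect + rng.normal(0, 0.4, n_leaders * n_raters)

df = pd.DataFrame({"leader": leader, "agreeableness": agreeableness, "tfl": tfl})

# Random-intercept model: does rater agreeableness predict transformational
# leadership ratings once the nesting of raters within leaders is modeled?
model = smf.mixedlm("tfl ~ agreeableness", df, groups=df["leader"]).fit()
print(model.summary())
```

A simple pooled correlation across all raters would conflate between-leader and within-leader variance; the random intercept separates the leader-level variance from the rater-level association of interest.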
Participants and procedure
Two hundred fourteen undergraduate students in three sections of an introductory I–O psychology course taught by either the first or the second author (100% of the students present that day and 81% of the students enrolled in the course) participated in this study as part of normal course activities. Students in the course were 51% male and, on average, 21 years old. Early in the course, students were asked to complete a personality inventory, also as part of normal course activities. Later in the
Results
Means, standard deviations, correlations, and scale reliabilities are reported in Table 4. As expected, the mean rating for transformational leadership is higher in Study 2 (M = 3.77 in Study 1 and M = 4.17 in Study 2 for the 10 matched items). More to the point of the study, we found reduced variability in ratings of transformational leadership in our lab study data (SD = 0.43) as compared to what is commonly found in field studies of transformational leadership using the full MLQ (e.g., SD = 0.81 in
Discussion
An ongoing issue in leadership research is the lack of agreement among individual raters about leader behavior. Our primary concern in this research was to better understand possible non-random sources of unique rater variance, for the purpose of helping researchers decide (1) when to aggregate, (2) what generalizations are appropriate from aggregated ratings, and (3) whether aggregated ratings should be corrected for measurement error using inter-rater
Future research
What these studies cannot tell us is whether leaders differ in their behaviors with subordinates, based on personality, in a naturalistic work setting, as that would require extensive examination of leader behavior across multiple followers over time. Indeed, even such invasive research may not provide an unbiased answer to the question of if, or how, rater personality affects leader behavior because the process of observation itself may influence leader behavior. Nonetheless, by demonstrating
Limitations and strengths
This study makes a unique contribution by demonstrating the non-random nature of leadership rating disagreements, and as such, provides important practical implications for researchers who must make decisions about aggregation and correction for measurement error. Like all studies, it is limited in several ways. First, we were unable to determine with certainty why (e.g., leniency, ratee behaviors, or differential recall) rater traits are linked to leadership ratings. We note several possible
Conclusion
Our findings highlight the importance of considering non-random, trait-linked rater effects when making decisions about whether to use individual or aggregated reports of leadership behavior, and whether and how to correct observed correlations for measurement unreliability. Moreover, our findings lay the groundwork for future research aimed at determining both the mechanisms by which rater personality affects leadership ratings, and the mechanism by which rater personality comes to be
References (71)
- Individual differences in optimism predict the recall of personally relevant information. Personality and Individual Differences (2007).
- Perceptions of leader charisma and effectiveness: The effects of vision content, delivery, and organizational performance. The Leadership Quarterly (1999).
- Leadership: Good, better, best. Organizational Dynamics (1985).
- Instrumentality theory and equity theory as complementary approaches in predicting the relationship of leadership and turnover among managers. Organizational Behavior and Human Performance (1973).
- A vertical dyad linkage approach to leadership within formal organizations: A longitudinal investigation of the role making process. Organizational Behavior and Human Performance (1975).
- Follower developmental characteristics as predicting transformational leadership: A longitudinal field study. The Leadership Quarterly (2003).
- Images of the familiar: Individual differences and implicit leadership theories. The Leadership Quarterly (1999).
- Effectiveness correlates of transformational and transactional leadership: A meta-analytic review of the MLQ literature. The Leadership Quarterly (1996).
- Generalizability theory.
- The sensitivity to punishment and sensitivity to reward questionnaire (SPSRQ) as a measure of Gray's anxiety and impulsivity dimensions. Personality and Individual Differences (2001).
- Transformational and contingent reward leadership: Individual, dyad, and group levels of analysis. The Leadership Quarterly.
- An evaluation of conceptual weaknesses in transformational and charismatic leadership theories. The Leadership Quarterly.
- Self-other agreement: Does it really matter? Personnel Psychology.
- Are cultural characteristics associated with the relationship between self and others' ratings of leadership? Journal of Applied Psychology.
- Does self-other agreement on leadership perceptions moderate the validity of leadership and performance predictions? Personnel Psychology.
- Re-examining the components of transformational and transactional leadership using the Multifactor Leadership Questionnaire. Journal of Occupational and Organizational Psychology.
- Development and test of a model linking safety-specific transformational leadership and occupational safety. Journal of Applied Psychology.
- The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology.
- Assessing the assessor: The relationship of assessor personality to leniency in assessment center ratings. Journal of Social Behavior & Personality.
- Conscientiousness and agreeableness as predictors of rating leniency. Journal of Applied Psychology.
- Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis.
- Self-concordance at work: Toward understanding the motivational effects of transformational leaders. Academy of Management Journal.
- Personality and transformational and transactional leadership: A meta-analysis. Journal of Applied Psychology.
- Individual differences correlates of accuracy in evaluating others' performance effectiveness. Applied Psychological Measurement.
- Observation accuracy for assessors of work–sample performance: Consistency across task and individual-differences correlates. Journal of Applied Psychology.
- Review of the revised NEO Personality Inventory.
- Assessing the discriminant validity of transformational leader behavior as measured by the MLQ. Journal of Occupational and Organizational Psychology.
- Behaviorally anchored rating scores: Effects of education and job experience of raters and ratees. Journal of Applied Psychology.
- The NEO-PI/NEO-FFI manual supplement.
- The dependability of behavioral measurements: Theory of generalizability scores and profiles.
- Self-monitoring personality at work: A meta-analytic investigation of construct validity. Journal of Applied Psychology.
- Effect of teacher sex, student sex, and teacher warmth on the evaluation of college instructors. Journal of Educational Psychology.
- Personality. Annual Review of Psychology.
- A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models.
- Integration of the Big Five and circumplex approaches to trait structure. Journal of Personality and Social Psychology.