research-article

Beyond the Words: Predicting User Personality from Heterogeneous Information

Authors:
Honghao Wei (Tsinghua University, Beijing, China)
Fuzheng Zhang (Microsoft Research, Beijing, China)
Nicholas Jing Yuan (Microsoft, Beijing, China)
Chuan Cao (Microsoft, Beijing, China)
Hao Fu (Microsoft, Beijing, China)
Xing Xie (Microsoft Research, Beijing, China)
Yong Rui (Microsoft Research, Beijing, China)
Wei-Ying Ma (Microsoft Research, Beijing, China)

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, February 2017, Pages 305–314. https://doi.org/10.1145/3018661.3018717

Published: 02 February 2017

ABSTRACT

An incisive understanding of user personality is not only essential to many scientific disciplines but also has a profound business impact on practical applications such as digital marketing, personalized recommendation, mental health diagnosis, and human resources management. Previous studies have demonstrated that language usage on social media is effective for personality prediction. However, a less researched direction is how to go beyond language features alone and leverage the heterogeneous information on social media to better understand user personality. In this paper, we propose a Heterogeneous Information Ensemble framework, called HIE, that predicts users' personality traits by integrating heterogeneous information including self-language usage, avatar, emoticon, and responsive patterns. To improve prediction performance, we design different strategies for extracting semantic representations so as to fully leverage the heterogeneous information on social media. We evaluate our methods with extensive experiments on a real-world dataset covering both personality survey results and social media usage from thousands of volunteers. The results show that our approaches significantly outperform several widely adopted state-of-the-art baseline methods. To examine the utility of HIE in a real-world interactive setting, we also present DiPsy, a personalized chatbot that predicts user personality from heterogeneous information in digital traces and conversation logs.
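
The idea of combining per-modality predictions into one personality score can be illustrated with a small sketch. This is not the HIE framework itself, whose details the abstract does not give; it swaps in a simple late-fusion scheme (inverse-error weighted averaging) purely for illustration. The function names, the toy scores, and the choice of weighting rule are all assumptions; only the four modality names come from the abstract.

```python
from statistics import mean


def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return mean((p - t) ** 2 for p, t in zip(pred, truth))


def fit_fusion_weights(val_preds, val_truth, eps=1e-9):
    """Weight each modality by the inverse of its held-out error.

    val_preds maps a modality name to a list of per-user predictions
    for a single trait. The returned weights sum to 1.
    (Hypothetical helper, not part of the paper.)
    """
    inv = {m: 1.0 / (mse(p, val_truth) + eps) for m, p in val_preds.items()}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}


def fuse(weights, preds):
    """Combine one user's per-modality predictions into a single score."""
    return sum(weights[m] * preds[m] for m in weights)


# Toy held-out data (made up): survey scores for one trait, plus the
# predictions that four assumed per-modality base models produced.
val_truth = [3.0, 4.0, 2.0, 5.0]
val_preds = {
    "language":   [3.1, 3.8, 2.2, 4.7],
    "avatar":     [2.0, 4.5, 3.0, 4.0],
    "emoticon":   [3.5, 3.5, 2.5, 4.5],
    "responsive": [3.0, 3.0, 3.0, 3.0],
}

weights = fit_fusion_weights(val_preds, val_truth)

# Per-modality predictions for a new user, fused into one trait score.
new_user = {"language": 4.2, "avatar": 3.0, "emoticon": 4.0, "responsive": 3.5}
fused_score = fuse(weights, new_user)
```

Because the weights are non-negative and sum to 1, the fused score always lies between the lowest and highest per-modality prediction, and modalities that predicted the held-out survey scores more accurately contribute more.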

Index Terms

Beyond the Words: Predicting User Personality from Heterogeneous Information
1. Applied computing
  1. Law, social and behavioral sciences
    1. Psychology
2. Information systems
  1. World Wide Web
    1. Web mining

Published in
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN:9781450346757
DOI:10.1145/3018661
General Chairs: Maarten de Rijke (University of Amsterdam), Milad Shokouhi (Microsoft)
Program Chairs: Andrew Tomkins (Google), Min Zhang (Tsinghua University)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
big five
heterogeneous information
user personality
Qualifiers
- research-article
Acceptance Rates
WSDM '17 Paper Acceptance Rate: 80 of 505 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
Article Metrics
- Total Citations: 56
- Total Downloads: 2,405
- Downloads (Last 12 months): 113
- Downloads (Last 6 weeks): 19