Detection of self-reported experiences with corruption on Twitter using unsupervised machine learning

https://doi.org/10.1016/j.ssaho.2020.100060
Under a Creative Commons license
open access

Abstract

Background

Corruption is a significant challenge to the future of human development, economic progress, and population health in the post-millennium era. Corruption, in its different forms of bribery, fraud, waste, collusion, and illicit financial flows, not only squanders resources but can also erode trust in government and public systems. Corruption is also complex and globalized, with different forms occurring across different countries and multiple industries. One critical tool to leverage in the fight against corruption is the use of innovative technologies such as machine learning.

Methods

In this study, we deployed an unsupervised machine learning methodology using natural language processing (NLP) to collect and analyze data from the popular social media platform Twitter, with the aim of detecting self-reported experiences with corruption, including in the health sector. We collected data from the Twitter public API for keywords associated with corruption and used the biterm topic model to extract themes from the entire corpus of tweets in order to detect user-generated messages reporting or discussing experiences with corruption.
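
The two preprocessing steps named above, keyword filtering and biterm extraction, can be illustrated with a minimal sketch. It is not the authors' pipeline: the keyword list, the stopword list, and the function names are hypothetical, and the sketch stops at counting biterms (unordered word pairs that co-occur in one short text), which is the input a biterm topic model works from. The topic inference step itself is not shown, and pairing only distinct words is a simplifying assumption.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword list; the study's actual search terms are not given here.
KEYWORDS = {"bribe", "bribery", "corrupt", "corruption"}

# Small illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"a", "an", "the", "to", "i", "me", "my", "for", "of",
             "at", "in", "was", "is", "and", "had"}


def tokenize(text):
    """Lowercase a tweet and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def matches_keywords(text):
    """True if any token of the tweet is one of the tracked keywords."""
    return any(tok in KEYWORDS for tok in tokenize(text))


def biterms(tokens):
    """Return every unordered pair of distinct words in one short text."""
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def count_biterms(tweets):
    """Count biterms over all tweets that pass the keyword filter."""
    counts = Counter()
    for tweet in tweets:
        if matches_keywords(tweet):
            counts.update(biterms(tokenize(tweet)))
    return counts


if __name__ == "__main__":
    sample = [
        "The officer asked me for a bribe at the checkpoint",
        "Lovely weather today",
        "Hospital staff demanded a bribe",
    ]
    for pair, n in count_biterms(sample).most_common(5):
        print(pair, n)
```

Because tweets are short, word pairs pooled over the whole corpus give a denser co-occurrence signal than per-document word counts, which is the usual motivation for choosing a biterm model for this kind of text.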

Results

We analyzed 22,180,425 tweets filtered for corruption-related keywords from January to May 2019. Using a combination of NLP and manual annotation, we detected 2383 tweets from 1556 users that included self-reporting of corruption for two dominant themes: police bribery and healthcare corruption. Overall, we found a small number of users actively reporting experiences with corruption, identified users located in countries that are perceived as having higher levels of corruption by their citizens, and found that the majority of messages included reports of users’ own experiences and/or documentation of corruption.

Conclusion

Though technology is not a “silver bullet” that can entirely address the multifaceted nature of global corruption, this study demonstrates its potential utility as a force for good to enable better detection, characterize forms of corruption in different sectors, and hopefully inform future anti-corruption efforts. Additionally, the UN Sustainable Development Goals, with shared goals of fighting corruption, improving population health, encouraging technology adoption, and fostering multistakeholder partnerships, may serve as a critical governance space to catalyze technology-driven anti-corruption approaches.

Keywords

Corruption
Big data
Natural language processing
Machine learning
Healthcare
Bribery
Sustainable development goals
