ABSTRACT
Because news spreads easily on social media and the Web, disinformation about important political issues such as elections has risen sharply in recent years. Computational solutions that automatically detect bias and sensationalism in news articles can have tremendous impact if used responsibly. News is an ever-shifting domain, however, so concept drift must be addressed in any real-world news classification system that relies on engineered features and trained machine learning models. Yet an empirical study of concept drift in such systems, especially popular systems recently released as open source and used within organizations, has been lacking. This short paper reports results from an empirical study specifically designed to assess concept drift, using a popular open-source computational news classification system on real news data crawled from the Web. We find that even a two-year gap (2017 vs. 2019) can lead to significant concept drift, a far narrower gap than observed in traditional machine learning domains, raising ethical concerns about deploying pre-trained or openly available news classification models.
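The evaluation design implied here (train a classifier on older articles, evaluate it on newer ones, and treat the accuracy drop as evidence of concept drift) can be sketched roughly as follows. This is not the paper's system: the corpora, labels, and scikit-learn pipeline below are placeholder assumptions for illustration only.

```python
# Minimal sketch (assumed setup, not the authors' code): estimate temporal
# concept drift by training a text classifier on one year's articles and
# comparing in-year vs. later-year accuracy. The data below is a toy stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def drift_gap(texts_old, labels_old, texts_new, labels_new, seed=0):
    """Return (held-out accuracy on the old period, accuracy on the new period)
    for a model trained only on the older articles."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts_old, labels_old, test_size=0.2, random_state=seed)
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    acc_old = accuracy_score(y_te, model.predict(X_te))
    acc_new = accuracy_score(labels_new, model.predict(texts_new))
    return acc_old, acc_new


if __name__ == "__main__":
    # Hypothetical placeholder corpora; a real study would use articles
    # crawled from 2017 and 2019 with bias/sensationalism labels.
    articles_2017 = ["shocking scandal rocks campaign", "senate passes budget bill",
                     "you won't believe this claim", "report details policy changes"] * 5
    labels_2017 = ["sensational", "neutral", "sensational", "neutral"] * 5
    articles_2019 = ["viral outrage over debate moment", "committee releases findings"] * 5
    labels_2019 = ["sensational", "neutral"] * 5
    acc_2017, acc_2019 = drift_gap(articles_2017, labels_2017,
                                   articles_2019, labels_2019)
    print(f"2017 held-out accuracy: {acc_2017:.2f}, 2019 accuracy: {acc_2019:.2f}")
```

A large drop from the first number to the second on real data would indicate that the feature distribution or label relationship has shifted between the two years.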