research-article

Automatically Temporal Labeled Data Generation Using Positional Lexicon Expansion for Focus Time Estimation of News Articles

Published: 10 May 2024

Abstract

Many facts change over time, which is a fundamental aspect of our physical environment. For articles about a pandemic, for example, users are interested not in the document's creation date but in the facts and the cause of the most recent pandemic. Identifying the temporal focus of a document can also help combat fake news. Currently, neither the sequence of events nor the temporal focus is considered when retrieving news documents. Because the available datasets contain few temporal annotations, it is difficult to test and evaluate a model's temporal inferences. The goal of this work is to develop a co-training-based model for retrieving news articles by their temporal focus, advancing research in semi-supervised learning. The dataset is mapped using (1) the evolving focus time of news articles and (2) a semi-supervised method that learns low-dimensional continuous vectors from co-occurrence contexts, training neural contrast embedding models that generate focus-time-based queries over sequential news articles to support temporal understanding. The effectiveness of the proposed method is evaluated on a diverse dataset of news articles. With semi-supervised learning and lexicon expansion, the developed model achieves 89%, outperforming previous baselines and traditional machine learning models by 12.65% and 4.7%, respectively.
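The abstract does not spell out how lexicon expansion and focus-time labeling are computed, so the following Python sketch illustrates one plausible reading under explicit assumptions; it is not the authors' method. Terms that share co-occurrence contexts with a seed term inherit that term's year, using plain count-vector cosine similarity in place of the paper's learned neural embeddings. A document's focus year is then the candidate year with the highest position-weighted score, where a mention at token position p is assumed to contribute 1 / (1 + p) so that earlier mentions count more. All function names, the seed lexicon, the window size, the threshold, and the weighting rule are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical pattern for explicit year mentions such as "2009" or "2021".
YEAR_RE = re.compile(r"^(19|20)\d{2}$")


def cooccurrence_vectors(docs, window=2):
    """Map each token to a Counter of tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_lexicon(seed, vectors, threshold=0.5):
    """Assign each non-seed term the year of its most similar seed term,
    provided that similarity reaches `threshold` (hypothetical rule)."""
    expanded = dict(seed)
    if not seed:
        return expanded
    for term, vec in vectors.items():
        if term in seed or YEAR_RE.match(term):
            continue  # seed terms and explicit years are already labeled
        best = max(seed, key=lambda s: cosine(vec, vectors.get(s, Counter())))
        if cosine(vec, vectors.get(best, Counter())) >= threshold:
            expanded[term] = seed[best]
    return expanded


def focus_year(tokens, lexicon):
    """Return the year with the highest position-weighted score, or None.
    A mention at position p contributes 1 / (1 + p), so earlier mentions
    weigh more (an assumed positional weighting)."""
    scores = Counter()
    for pos, tok in enumerate(tokens):
        year = int(tok) if YEAR_RE.match(tok) else lexicon.get(tok)
        if year is not None:
            scores[year] += 1.0 / (1 + pos)
    return scores.most_common(1)[0][0] if scores else None


if __name__ == "__main__":
    docs = [
        ["covid-19", "lockdown", "vaccine", "rollout"],
        ["coronavirus", "lockdown", "vaccine", "rollout"],
        ["h1n1", "swine", "flu", "outbreak"],
    ]
    seed = {"covid-19": 2020, "h1n1": 2009}  # hypothetical seed lexicon
    lexicon = expand_lexicon(seed, cooccurrence_vectors(docs, window=1), threshold=0.9)
    print(lexicon)  # {'covid-19': 2020, 'h1n1': 2009, 'coronavirus': 2020}
    print(focus_year(["coronavirus", "cases", "rose", "in", "2021"], lexicon))  # 2020
```

With `window=1`, "coronavirus" has exactly the same context as the seed "covid-19" and inherits 2020. In the query, the early lexicon hit (weight 1.0) outweighs the later explicit mention of 2021 (weight 0.2) under the assumed weighting. In practice, learned dense embeddings and a threshold tuned on labeled data would likely replace the raw counts and the fixed cut-off.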

                • Published in

                  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 5
                  May 2024
                  297 pages
                  ISSN: 2375-4699
                  EISSN: 2375-4702
                  DOI: 10.1145/3613584
                  • Editor: Imed Zitouni


                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 10 May 2024
                  • Online AM: 19 October 2022
                  • Accepted: 11 October 2022
                  • Revised: 3 September 2022
                  • Received: 3 June 2022
