research-article

Automatically Temporal Labeled Data Generation Using Positional Lexicon Expansion for Focus Time Estimation of News Articles

Authors:
Usman Ahmed, Western Norway University of Applied Sciences, Bergen, Norway (ORCID 0000-0002-3933-4273)
Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Bergen, Norway (ORCID 0000-0001-8768-9709)
Vicente Garcia Diaz, University of Oviedo, Oviedo, Spain (ORCID 0000-0003-2037-8548)

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 5, Article No. 64, pp. 1–20. https://doi.org/10.1145/3568164
Published: 10 May 2024

Abstract

Many facts change over time, which is a fundamental aspect of our physical environment. For articles about a pandemic, for example, a user is interested not in the document's creation date but in the facts and causes of the most recent pandemic. Knowing a document's temporal focus can also help combat fake news. Currently, however, neither the sequence of events nor the temporal focus is considered when retrieving news documents. Because the available datasets capture few temporal aspects, it is also difficult to test and evaluate a model's temporal conclusions. The goal of this work is to develop a co-training-based model for retrieving news articles by their temporal focus, advancing research in semi-supervised learning. The dataset is mapped using (1) the evolving focus time of news articles and (2) a semi-supervised method that learns low-dimensional continuous vectors from co-occurrence contexts; these vectors are used to learn neural contrast embedding models that generate focus-time-based queries over sequential news articles, facilitating temporal understanding. A diverse dataset of news articles is used to evaluate the effectiveness of the proposed method. With semi-supervised learning and lexicon expansion, the developed model achieves 89%. It outperforms previous baselines and traditional machine learning models by 12.65% and 4.7%, respectively.
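The abstract does not spell out how the lexicon is expanded or how a focus time is assigned to an article. The sketch below is a toy illustration under loudly stated assumptions, not the authors' co-training model. It assumes a small seed lexicon of words already labeled with a year; pre-trained word vectors (here hand-made 3-dimensional stand-ins for learned low-dimensional embeddings); a hypothetical cosine-similarity threshold of 0.9 for adding neighboring words to the lexicon; and a hypothetical inverted-pyramid weight of 1/(position + 1), so that earlier sentences count more. The names `expand_lexicon` and `estimate_focus_year` are illustrative only.

```python
from collections import defaultdict
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seed, vectors, threshold=0.9):
    """Hypothetical expansion rule: each unlabeled vocabulary word inherits the year
    of its most similar seed word, provided the similarity reaches `threshold`."""
    expanded = dict(seed)
    for word, vec in vectors.items():
        if word in expanded:
            continue
        best_year, best_sim = None, threshold
        for seed_word, year in seed.items():
            if seed_word not in vectors:
                continue
            sim = cosine(vec, vectors[seed_word])
            if sim >= best_sim:
                best_year, best_sim = year, sim
        if best_year is not None:
            expanded[word] = best_year
    return expanded


def estimate_focus_year(sentences, lexicon):
    """Vote for a year per lexicon hit, weighting sentence i by 1 / (i + 1)
    (an assumed inverted-pyramid weighting). Uses naive whitespace tokenization."""
    scores = defaultdict(float)
    for position, sentence in enumerate(sentences):
        weight = 1.0 / (position + 1)
        for token in sentence.lower().split():
            if token in lexicon:
                scores[lexicon[token]] += weight
    return max(scores, key=scores.get) if scores else None


# Toy data: the seed labels and vectors are invented for illustration.
seed = {"pandemic": 2020, "lockdown": 2020, "olympics": 2021}
vectors = {
    "pandemic": [0.9, 0.1, 0.0],
    "lockdown": [0.8, 0.2, 0.0],
    "quarantine": [0.85, 0.15, 0.05],
    "olympics": [0.0, 0.1, 0.9],
    "stadium": [0.05, 0.1, 0.95],
}
lexicon = expand_lexicon(seed, vectors)
article = [
    "Quarantine rules were extended across the region",
    "Officials compared the pandemic response",
    "The stadium for the olympics remained closed",
]
print(estimate_focus_year(article, lexicon))  # prints 2020
```

In this toy run, "quarantine" inherits 2020 from its closest seed word and "stadium" inherits 2021; the 2020 evidence in the leading sentences (score 1.5) outweighs the 2021 terms in the last sentence (score about 0.67). A real system would replace the hand-made vectors with learned embeddings and would likely tune the threshold and position weights on labeled data.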

Index Terms

1. Applied computing
  1. Computer forensics
    1. Evidence collection, storage and analysis
  2. Law, social and behavioral sciences
    1. Sociology
2. Information systems
  1. Information retrieval

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 23, Issue 5
May 2024
297 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3613584
Editor:
Imed Zitouni
Google, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 May 2024
- Online AM: 19 October 2022
- Accepted: 11 October 2022
- Revised: 3 September 2022
- Received: 3 June 2022
Author Tags
Information retrieval
temporal information retrieval
focus time
inverted pyramid
news retrieval

Article Metrics
- Total Citations: 1
- Total Downloads: 188
- Downloads (Last 12 months): 104
- Downloads (Last 6 weeks): 19