Abstract
Word representation has long been an important research area in natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore a range of word representation models and their expressive power, from classical approaches to modern state-of-the-art language models (LMs). We describe the variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations that capture the underlying semantic information. Such representations can in turn be used by various machine learning (ML) algorithms for a variety of NLP tasks. Finally, the survey briefly discusses commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.
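The pipeline the abstract describes — turning raw text into vector representations whose geometry reflects semantic similarity — can be illustrated with a minimal sketch. This is an illustrative toy bag-of-words model with cosine similarity, not any specific model from the survey; the function and variable names are ours:

```python
import math
from collections import Counter

def bow_vector(text, vocab):
    """Map a text to a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["the movie was great", "the film was great", "stock prices fell sharply"]
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [bow_vector(d, vocab) for d in docs]

# Documents sharing words score higher than unrelated ones.
assert cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2])
```

Dense embeddings such as Word2vec or BERT replace these sparse count vectors with learned, low-dimensional ones, but the downstream use — feeding vectors into ML classifiers and comparing them geometrically — follows the same pattern.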
- Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca J. Passonneau. 2011. Sentiment analysis of Twitter data. Retrieved from https://www.aclweb.org/anthology/W11-0705.pdf.
- Charu C. Aggarwal and Chandan K. Reddy. 2013. Data Clustering: Algorithms and Applications. CRC Press.
- Charu C. Aggarwal and ChengXiang Zhai. 2012. A survey of text classification algorithms. In Mining Text Data. Springer, Boston, MA, 163–222.
- Edgar Altszyler, Mariano Sigman, and Diego Fernández Slezak. 2016. Comparative study of LSA vs Word2vec embeddings in small corpora: A case study in dreams database. arXiv:1610.01520.
- Alexandra Balahur. 2013. Sentiment analysis in social media texts. In WASSA@NAACL-HLT.
- Jorge A. Balazs and Juan D. Velásquez. 2016. Opinion mining and information fusion: A survey. Inf. Fus. 27 (2016), 95–110.
- Himani Bansal, Gulshan Shrivastava, Nguyen Nhu, and L. M. Stanciu (Eds.). 2018. Social Network Analytics for Contemporary Business Organizations. IGI Global. DOI:https://doi.org/10.4018/978-1-5225-5097-6
- Yanwei Bao, Changqin Quan, Lijuan Wang, and Fuji Ren. 2014. The role of pre-processing in Twitter sentiment analysis. In Intelligent Computing Methodologies, De-Shuang Huang, Kang-Hyun Jo, and Ling Wang (Eds.). Springer International Publishing, Cham, 615–624.
- Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language model for scientific text. arXiv:1903.10676.
- Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8 (2013), 1798–1828.
- Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137–1155. Retrieved from http://dl.acm.org/citation.cfm?id=944919.944966.
- Y. Bengio, P. Simard, and P. Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. Trans. Neur. Netw. 5, 2 (Mar. 1994), 157–166. DOI:https://doi.org/10.1109/72.279181
- Adam Bermingham and Alan Smeaton. 2011. On using Twitter to monitor political sentiment and predict election results. In Proceedings of the Workshop on Sentiment Analysis Where AI Meets Psychology. Asian Federation of Natural Language Processing, 2–10. Retrieved from https://www.aclweb.org/anthology/W11-3702.
- Marina Boia, Boi Faltings, Claudiu Cristian Musat, and Pearl Pu. 2013. A :) is worth a thousand words: How people attach sentiment to emoticons and words in tweets. In International Conference on Social Computing. 345–350.
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv:1607.04606.
- Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Conference on Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 4349–4357. Retrieved from http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf.
- Jose Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artif. Intell. 240 (2016), 36–64. DOI:https://doi.org/10.1016/j.artint.2016.07.005
- Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In Association for the Advancement of Artificial Intelligence Conference.
- Xavier Carreras and Lluís Màrquez. 2001. Boosting trees for anti-spam email filtering. CoRR cs.CL/0109015 (2001).
- Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2015. Acquiring a large scale polarity lexicon through unsupervised distributional methods. In Natural Language Processing and Information Systems, Chris Biemann, Siegfried Handschuh, André Freitas, Farid Meziane, and Elisabeth Métais (Eds.). Springer International Publishing, Cham, 73–86.
- Arda Celebi and Arzucan Ozgur. 2016. Segmenting hashtags using automatically created training data. Retrieved from https://www.aclweb.org/anthology/L16-1476.pdf.
- Wei James Chen, Xiaoshen Xie, Jiale Wang, Biswajeet Pradhan, Haoyuan Hong, Dieu Tien Bui, Zhao Duan, and Jianquan Ma. 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Retrieved from https://www.sciencedirect.com/science/article/pii/S0341816216305136.
- Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP’14). Association for Computational Linguistics, 1724–1734. DOI:https://doi.org/10.3115/v1/D14-1179
- Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555 (2014).
- Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv:2003.10555.
- Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In 25th International Conference on Machine Learning (ICML’08). ACM, New York, NY, 160–167. DOI:https://doi.org/10.1145/1390156.1390177
- Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. CoRR abs/1103.0398 (2011).
- Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. CoRR abs/1703.04009 (2017).
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018).
- Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W. Cohen. 2017. A comparative study of word embeddings for reading comprehension. CoRR abs/1703.00993 (2017).
- Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Conference on Advances in Neural Information Processing Systems. 13063–13075.
- Cícero Nogueira dos Santos and Maíra A. de C. Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In International Conference on Computational Linguistics.
- Ábel Elekes, Adrian Englhardt, Martin Schäler, and Klemens Böhm. 2018. Toward meaningful notions of similarity in NLP embedding models. Int. J. Dig. Libr. (Apr. 2018). DOI:https://doi.org/10.1007/s00799-018-0237-y
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 9 (June 2008), 1871–1874. Retrieved from http://dl.acm.org/citation.cfm?id=1390681.1442794.
- Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard H. Hovy, and Noah A. Smith. 2014. Retrofitting word vectors to semantic lexicons. CoRR abs/1411.4166 (2014).
- Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan, and Josef van Genabith. 2011. From news to comment: Resources and benchmarks for parsing the language of Web 2.0. In Proceedings of 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 893–901. Retrieved from https://www.aclweb.org/anthology/I11-1100.
- Xianghua Fu, Wangwang Liu, Yingying Xu, and Laizhong Cui. 2017. Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis. Neurocomputing 241 (2017), 18–27.
- Alexander Genkin, David D. Lewis, and David Madigan. 2007. Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 3 (2007), 291–304. DOI:https://doi.org/10.1198/004017007000000245
- Anastasia Giachanou, Julio Gonzalo, Ida Mele, and Fabio Crestani. 2017. Sentiment propagation for predicting reputation polarity. DOI:https://doi.org/10.1007/978-3-319-56608-5_18
- Kevin Gimpel, Nathan Schneider, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-speech tagging for Twitter: Annotation, features, and experiments. Retrieved from https://www.aclweb.org/anthology/P11-2008.pdf.
- Christian Giovanelli, Xin U. Liu, Seppo Antero Sierla, Valeriy Vyatkin, and Ryutaro Ichise. 2017. Towards an aggregator that exploits big data to bid on frequency containment reserve market. In 43rd Conference of the IEEE Industrial Electronics Society. 7514–7519.
- Edel Greevy. 2004. Automatic text categorisation of racist webpages. Retrieved from http://doras.dcu.ie/17275/1/edel_greevy_20120702122736.pdf.
- Vishal Gupta and Gurpreet Lehal. 2009. A survey of text mining techniques and applications. J. Emerg. Technol. Web Intell. 1, 1 (Aug. 2009), 60–76. DOI:https://doi.org/10.4304/jetwi.1.1.60-76
- Emma Haddi, Xiaohui Liu, and Yong Shi. 2013. The role of text pre-processing in sentiment analysis. In International Conference on Information Technology and Quantitative Management.
- Khaled M. Hammouda and Mohamed S. Kamel. 2004. Efficient phrase-based document indexing for web document clustering. IEEE Trans. Knowl. Data Eng. 16, 10 (Oct. 2004), 1279–1296. DOI:https://doi.org/10.1109/TKDE.2004.58
- Yulan He, Chenghua Lin, and Harith Alani. 2011. Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In 49th Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 123–131. Retrieved from https://www.aclweb.org/anthology/P11-1013.
- Aurélie Herbelot and Marco Baroni. 2017. High-risk learning: Acquiring new word vectors from tiny data. CoRR abs/1707.06556 (2017).
- Bruce M. Hill. 1968. Posterior distribution of percentiles: Bayes’ theorem for sampling from a population. J. Amer. Statist. Assoc. 63, 322 (1968), 677–691. Retrieved from http://www.jstor.org/stable/2284038.
- Tin Kam Ho. 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 8 (Aug. 1998), 832–844. DOI:https://doi.org/10.1109/34.709601
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (Nov. 1997), 1735–1780. DOI:https://doi.org/10.1162/neco.1997.9.8.1735
- Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In 56th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 328–339. DOI:https://doi.org/10.18653/v1/P18-1031
- Xia Hu and Huan Liu. 2012. Text Analytics in Social Media. Springer US, Boston, MA, 385–414. DOI:https://doi.org/10.1007/978-1-4614-3223-4_12
- Ah-Hwee Tan. 1999. Text mining: The state of the art and the challenges. In Workshop on Knowledge Discovery from Advanced Databases. 65–70.
- Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning sense embeddings for word and relational similarity. In Meeting of the Association for Computational Linguistics.
- Suzana Ilic, Edison Marrese-Taylor, Jorge A. Balazs, and Yutaka Matsuo. 2018. Deep contextualized word representations for detecting sarcasm and irony. CoRR abs/1809.09795 (2018).
- Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Reading text in the wild with convolutional neural networks. CoRR abs/1412.1842 (2014).
- Zhao Jianqiang. 2015. Pre-processing boosting Twitter sentiment analysis? DOI:https://doi.org/10.1109/SmartCity.2015.158
- Zhao Jianqiang and Gui Xiaolin. 2017. Comparison research on text pre-processing methods on Twitter sentiment analysis. IEEE Access 5 (2017), 2870–2879.
- Zhao Jianqiang and Gui Xiaolin. 2018. Deep convolution neural networks for Twitter sentiment analysis. IEEE Access 6 (2018). DOI:https://doi.org/10.1109/ACCESS.2017.2776930
- Rie Johnson and Tong Zhang. 2014. Effective use of word order for text categorization with convolutional neural networks. CoRR abs/1412.1058 (2014).
- Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Ling. 8 (2020), 64–77.
- Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. CoRR abs/1607.01759 (2016).
- Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. arXiv:1909.05858.
- Farhan Hassan Khan, Saba Bashir, and Usman Qamar. 2014. TOM: Twitter opinion mining framework using hybrid classification scheme. Decis. Support Syst. 57 (Jan. 2014), 245–257. DOI:https://doi.org/10.1016/j.dss.2013.09.004
- Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014).
- Vandana Korde and C. Namrata Mahender. 2012. Text classification and classifiers: A survey. Retrieved from http://www.airccse.org/journal/ijaia/papers/3212ijaia08.pdf.
- Efthymios Kouloumpis, Theresa Wilson, and Johanna D. Moore. 2011. Twitter sentiment analysis: The good the bad and the OMG! In International AAAI Conference on Web and Social Media.
- Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, and Donald E. Brown. 2019. Text classification algorithms: A survey. CoRR abs/1904.08067 (2019).
- Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In 27th AAAI Conference on Artificial Intelligence (AAAI’13). AAAI Press, 1621–1622. Retrieved from http://dl.acm.org/citation.cfm?id=2891460.2891697.
- Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019).
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942.
- Ray R. Larson. 2010. Introduction to information retrieval. J. Amer. Soc. Inf. Sci. Technol. 61, 4 (Apr. 2010), 852–853. DOI:https://doi.org/10.1002/asi.v61:4
- Paula Lauren, Guangzhi Qu, Feng Zhang, and Amaury Lendasse. 2018. Discriminant document embeddings with an extreme learning machine for classifying clinical narratives. Neurocomputing 277 (2018), 129–138. DOI:https://doi.org/10.1016/j.neucom.2017.01.117
- Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. CoRR abs/1405.4053 (2014).
- Yann LeCun, Y. Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521 (May 2015), 436–444. DOI:https://doi.org/10.1038/nature14539
- Yann Lecun, Leon Bottou, Y. Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (Nov. 1998), 2278–2324. DOI:https://doi.org/10.1109/5.726791
- Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, and Jason Weston. 2017. StarSpace: Embed all the things! arXiv:1709.03856.
- Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. arXiv:1901.08746.
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019).
- Liang-Chih Yu, Jin Wang, K. Robert Lai, and Xuejie Zhang. 2018. Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Trans. Audio, Speech Lang. Proc. 26, 3 (Mar. 2018), 671–681. DOI:https://doi.org/10.1109/TASLP.2017.2788182
- Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In 18th ACM Conference on Information and Knowledge Management (CIKM’09). ACM, New York, NY, 375–384. DOI:https://doi.org/10.1145/1645953.1646003
- Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2015. Learning context-sensitive word embeddings with neural tensor skip-gram model. In 24th International Conference on Artificial Intelligence (IJCAI’15). AAAI Press, 1284–1290. Retrieved from http://dl.acm.org/citation.cfm?id=2832415.2832428.
- Shuhua Liu and Thomas Forss. 2014. Combining N-gram based similarity analysis with sentiment analysis in web content classification. In International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1 (IC3K’14). SCITEPRESS - Science and Technology Publications, Lda, 530–537. DOI:https://doi.org/10.5220/0005170305300537
- Yuanchao Liu, Bingquan Liu, Lili Shan, and Xin Wang. 2018. Modelling context with neural networks for recommending idioms in essay writing. Neurocomputing 275 (2018), 2287–2293. DOI:https://doi.org/10.1016/j.neucom.2017.11.005
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019).
- David M. Magerman. 1995. Statistical decision-tree models for parsing. In 33rd Annual Meeting of the Association for Computational Linguistics (ACL’95). Association for Computational Linguistics, 276–283. DOI:https://doi.org/10.3115/981658.981695
- Danilo P. Mandic and Jonathon Chambers. 2001. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. John Wiley & Sons, Inc., New York, NY.
- Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Meeting of the Association for Computational Linguistics (System Demonstrations). 55–60.
- Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. CoRR abs/1708.00107 (2017).
- Yelena Mejova and Padmini Srinivasan. 2011. Exploring feature definition and selection for sentiment classifiers.
- Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, 51–61. DOI:https://doi.org/10.18653/v1/K16-1006
- Prem Melville, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment analysis of blogs by combining lexical knowledge with text classification. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 1275–1284. DOI:https://doi.org/10.1145/1557019.1557156
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Conference on Advances in Neural Information Processing Systems. 3111–3119.
- Saif Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Association for Computational Linguistics, 321–327. Retrieved from http://aclweb.org/anthology/S13-2053.
- James N. Morgan and John A. Sonquist. 1963. Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc. 58, 302 (1963), 415–434. Retrieved from http://www.jstor.org/stable/2283276.
- Nikola Mrksic, Ivan Vulic, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gasic, Anna Korhonen, and Steve J. Young. 2017. Semantic specialisation of distributional word vector spaces using monolingual and cross-lingual constraints. CoRR abs/1706.00374 (2017).
- T. Mullen and R. Malouf. 2006. A preliminary investigation into sentiment analysis of informal political discourse. AAAI Spring Symposium - Technical Report SS-06-03 (2006), 159–162. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-33747172751&partnerID=40&md5=6b12793b70eae006102989ed6d398fcb.
- Martin Müller, Marcel Salathé, and Per E. Kummervold. 2020. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503 (2020).
- Marwa Naili, Anja Habacha Chaibi, and Henda Hajjami Ben Ghezala. 2017. Comparative study of word embedding methods in topic segmentation. Procedia Comput. Sci. 112 (2017), 340–349.
- Vivek Narayanan, Ishan Arora, and Arjun Bhatia. 2013. Fast and accurate sentiment classification using an enhanced Naive Bayes model. CoRR abs/1305.6143 (2013).
- Usman Naseem. 2020. Hybrid Words Representation for the Classification of Low Quality Text. Ph.D. Dissertation. University of Technology Sydney, Australia.
- U. Naseem, S. K. Khan, M. Farasat, and F. Ali. 2019. Abusive language detection: A comprehensive review. Indian J. Sci. Technol. 12, 45 (2019), 1–13.
- U. Naseem, I. Razzak, and P. W. Eklund. 2020. A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimedia Tools Appl. (2020), 1–28.
- Usman Naseem, Shah Khalid Khan, Imran Razzak, and Ibrahim A. Hameed. 2019. Hybrid words representation for airlines sentiment analysis. In AI 2019: Advances in Artificial Intelligence, Jixue Liu and James Bailey (Eds.). Springer International Publishing, Cham, 381–392.
- Usman Naseem, Matloob Khushi, Shah Khalid Khan, Nazar Waheed, Adnan Mir, Atika Qazi, Bandar Alshammari, and Simon K. Poon. 2020. Diabetic retinopathy detection using multi-layer neural networks and split attention with focal loss. In International Conference on Neural Information Processing. Springer, 1–12.
- Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, and Jinman Kim. 2020. BioALBERT: A simple and effective pre-trained language model for biomedical named entity recognition. arXiv preprint arXiv:2009.09223 (2020).
- Usman Naseem and Katarzyna Musial. 2019. DICE: Deep intelligent contextual embedding for Twitter sentiment analysis. In International Conference on Document Analysis and Recognition (ICDAR’19). IEEE, 953–958.
- U. Naseem, I. Razzak, M. Khushi, P. W. Eklund, and J. Kim. 2021. COVIDSenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE Trans. Comput. Soc. Syst. (2021).
- Usman Naseem, Katarzyna Musial, Peter Eklund, and Mukesh Prasad. 2020. Biomedical named-entity recognition by hierarchically fusing BioBERT representations and deep contextual-level word-embedding. In International Joint Conference on Neural Networks (IJCNN’20). IEEE, 1–8.
- Usman Naseem, Imran Razzak, Peter Eklund, and Katarzyna Musial. 2020. Towards improved deep contextual embedding for the identification of irony and sarcasm. In International Joint Conference on Neural Networks (IJCNN’20). IEEE, 1–7.
- Usman Naseem, Imran Razzak, and Ibrahim A. Hameed. 2019. Deep context-aware embedding for abusive and hate speech detection on Twitter. Aust. J. Intell. Inf. Process. Syst. 15, 3 (2019), 69–76.
- Usman Naseem, Imran Razzak, Katarzyna Musial, and Muhammad Imran. 2020. Transformer based deep intelligent contextual embedding for Twitter sentiment analysis. Fut. Gen. Comput. Syst. 113 (2020), 58–69.
- Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2015. Efficient non-parametric estimation of multiple embeddings per word in vector space. CoRR abs/1504.06654 (2015).
- Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. arXiv preprint arXiv:2005.10200 (2020).
- Thomas Niebler, Martin Becker, Christian Pölitz, and Andreas Hotho. 2017. Learning semantic relatedness from human feedback using metric learning. CoRR abs/1705.07425 (2017).
- Alexander Pak and Patrick Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In International Conference on Language Resources and Evaluation.
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10 (EMNLP’02). Association for Computational Linguistics, 79–86. DOI:https://doi.org/10.3115/1118693.1118704
- U. Naseem, M. Khushi, S. K. Khan, K. Shaukat, and M. A. Moni. 2021. A comparative analysis of active learning for biomedical text mining. Appl. Syst. Innov. 4, 1 (2021), 23.
- Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML’13). JMLR.org, III–1310–III–1318. Retrieved from http://dl.acm.org/citation.cfm?id=3042817.3043083.
- C. V. Patriche, R. Pirnau, A. Grozavu, and B. Rosca. 2016. A comparative analysis of binary logistic regression and analytical hierarchy process for landslide susceptibility assessment in the Dobrov River Basin, Romania. Pedosphere 26, 3 (2016), 335–350. DOI:https://doi.org/10.1016/S1002-0160(15)60047-9
- Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2227–2237. DOI:https://doi.org/10.18653/v1/N18-1202
- Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. 2017. Mimicking word embeddings using subword RNNs. CoRR abs/1707.06961 (2017).Google Scholar
- Pengda Qin, Weiran Xu, and Jun Guo. 2016. An empirical convolutional neural network approach for semantic relation classification. Neurocomputing 190, C (May 2016), 1–9. DOI:DOI:https://doi.org/10.1016/j.neucom.2015.12.091 Google ScholarDigital Library
- Zhaowei Qu, Xiaomin Song, Shuqiang Zheng, Xiaoru Wang, Xiaohui Song, and Zuquan Li. 2018. Improved Bayes method based on TF-IDF feature and grade factor feature for Chinese information classification. In IEEE International Conference on Big Data and Smart Computing (BigComp’18). 677–680.Google ScholarCross Ref
- J. R. Quinlan. 1987. Simplifying decision trees. Int. J. Man Mach. Stud. 27, 3 (1987), 221–234. DOI:DOI:https://doi.org/10.1016/S0020-7373(87)80053-6 Google ScholarDigital Library
- J. R. Quinlan. 1986. Induction of decision trees. Mach. Learn. 1, 1 (Mar. 1986), 81–106. DOI:DOI:https://doi.org/10.1023/A:1022643204877 Google ScholarDigital Library
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Retrieved from https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019).
- Arshia Rehman, Saeeda Naz, Usman Naseem, Imran Razzak, and Ibrahim A. Hameed. 2019. Deep autoencoder-decoder framework for semantic segmentation of brain tumor. Aust. J. Intell. Inf. Process. Syst. 15, 3 (2019), 53–60.
- Yafeng Ren, Yue Zhang, Meishan Zhang, and Donghong Ji. 2016. Context-sensitive Twitter sentiment classification using neural network. In 30th AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 215–221. Retrieved from http://dl.acm.org/citation.cfm?id=3015812.3015844.
- Jack Reuter, Jhonata Pereira-Martins, and Jugal Kalita. 2016. Segmenting Twitter hashtags. Int. J. Nat. Lang. Comput. 5 (Aug. 2016), 23–36. DOI:https://doi.org/10.5121/ijnlc.2016.5402
- M. Jaggi, P. Mandal, S. Narang, U. Naseem, and M. Khushi. 2021. Text mining of StockTwits data for predicting stock prices. Appl. Syst. Innov. 4, 1 (2021), 13.
- Seyed Mahdi Rezaeinia, Ali Ghodsi, and Rouhollah Rahmani. 2017. Improving the accuracy of pre-trained word embeddings for sentiment analysis. CoRR abs/1711.08609 (2017).
- Hassan Saif, Marta Fernandez Andres, Yulan He, and Harith Alani. 2013. Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold. In International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI.
- Mohammad Arshi Saloot, Norisma Idris, Nor Liyana Mohd Shuib, Ram Gopal Raj, and AiTi Aw. 2015. Toward tweets normalization using maximum entropy. In Workshop on Noisy User-generated Text, NUT@IJCNLP.
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).
- Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. In Artificial Neural Networks – ICANN 2010, Konstantinos Diamantaras, Wlodek Duch, and Lazaros S. Iliadis (Eds.). Springer Berlin, 92–101.
- Seungil You, David Ding, Kevin Canini, Jan Pfeifer, and Maya Gupta. 2017. Deep lattice networks and partial monotonic functions. arxiv:stat.ML/1709.06680.
- Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’15). ACM, New York, NY, 959–962. DOI:https://doi.org/10.1145/2766462.2767830
- Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arxiv:cs.CL/1909.08053.
- Tajinder Singh and Madhu Kumari. 2016. Role of text pre-processing in Twitter sentiment analysis. https://www.sciencedirect.com/science/article/pii/S1877050916311607.
- R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. 1631–1642.
- Saeid Soheily-Khah, Pierre-François Marteau, and Nicolas Béchet. 2017. Intrusion detection in network systems through hybrid supervised and unsupervised mining process: A detailed case study on the ISCX benchmark dataset.
- Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450 (2019).
- Karen Sparck Jones. 1988. A statistical interpretation of term specificity and its application in retrieval. In Document Retrieval Systems. Taylor Graham Publishing, London, UK, 132–142. Retrieved from http://dl.acm.org/citation.cfm?id=106765.106782.
- Robyn Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An open multilingual graph of general knowledge. CoRR abs/1612.03975 (2016).
- Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 (2019).
- Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In AAAI Conference on Artificial Intelligence. 8968–8975.
- Ilya Sutskever, James Martens, and Geoffrey Hinton. 2011. Generating text with recurrent neural networks. In 28th International Conference on Machine Learning (ICML’11). Omnipress, 1017–1024. Retrieved from http://dl.acm.org/citation.cfm?id=3104482.3104610.
- Jared Suttles and Nancy Ide. 2013. Distant supervision for emotion classification with discrete binary values. In Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 2 (CICLing’13). Springer-Verlag, Berlin, 121–136. DOI:https://doi.org/10.1007/978-3-642-37256-8_11
- Symeon Symeonidis, Dimitrios Effrosynidis, and Avi Arampatzis. 2018. A comparative evaluation of pre-processing techniques and their interactions for Twitter sentiment analysis. Exp. Syst. Applic. 110 (2018), 298–310. DOI:https://doi.org/10.1016/j.eswa.2018.06.022
- Duyu Tang, Bing Qin, Furu Wei, Li Dong, Ting Liu, and Ming Zhou. 2015. A joint segmentation and classification framework for sentence level sentiment classification. IEEE/ACM Trans. Audio, Speech Lang. Process. 23, 11 (2015), 1750–1761.
- Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Trans. Knowl. Data Eng. 28, 2 (Feb. 2016), 496–509. DOI:https://doi.org/10.1109/TKDE.2015.2489653
- Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, 1555–1565. DOI:https://doi.org/10.3115/v1/P14-1146
- Alper Kursat Uysal and Serkan Günal. 2014. The impact of preprocessing on text classification. Inf. Process. Manag. 50 (2014), 104–112.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., 6000–6010. Retrieved from http://dl.acm.org/citation.cfm?id=3295222.3295349.
- Ye Zhang and Byron Wallace. 2017. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing, 253–263.
- Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, and Luo Si. 2019. StructBERT: Incorporating language structures into pre-training for deep language understanding. arxiv:cs.CL/1908.04577.
- Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, and Amit P. Sheth. 2012. Harnessing Twitter “big data” for automatic emotion identification. In International Conference on Privacy, Security, Risk and Trust and International Conference on Social Computing. 587–592.
- Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for aspect-level sentiment classification. In Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 606–615. DOI:https://doi.org/10.18653/v1/D16-1058
- Yuyang Wang, Roni Khardon, and Pavlos Protopapas. 2012. Nonparametric Bayesian estimation of periodic light curves. Astrophys. J. 756, 1 (Aug. 2012), 67. DOI:https://doi.org/10.1088/0004-637x/756/1/67
- Ikuya Yamada, Hideaki Takeda, and Yoshiyasu Takefuji. 2015. Enhancing named entity recognition in Twitter messages using entity linking. In Workshop on Noisy User-generated Text, NUT@IJCNLP.
- Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. CoRR abs/1906.08237 (2019).
- Yukun Zhu, Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. CoRR abs/1506.06724 (2015).