skip to main content
research-article

A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models

Authors Info & Claims
Published:30 June 2021Publication History
Skip Abstract Section

Abstract

Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and its power of expression, from the classical to modern-day state-of-the-art word representation language models (LMS). We describe a variety of text representation methods, and model designs have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations capturing the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. In the end, this survey briefly discusses the commonly used ML- and DL-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.

References

  1. Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca J. Passonneau. 2011. Sentiment analysis of Twitter data. https://www.aclweb.org/anthology/W11-0705.pdf.Google ScholarGoogle Scholar
  2. Charu C. Aggarwal and Chandan K. Reddy. 2013. Data Clustering: Algorithms and Applications. CRC Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Charu C. Aggarwal and ChengXiang Zhai. 2012. A survey of text classification algorithms. In Mining Text Data. Springer, Boston, MA, 163–222.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Edgar Altszyler, Mariano Sigman, and Diego Fernández Slezak. 2016. Comparative study of LSA vs Word2vec embeddings in small corpora: A case study in dreams database. (2016). arxiv:abs/1610.01520.Google ScholarGoogle Scholar
  5. Alexandra Balahur. 2013. Sentiment analysis in social media texts. In WASSA@NAACL-HLT.Google ScholarGoogle Scholar
  6. Jorge A. Balazs and Juan D. Velásquez. 2016. Opinion mining and information fusion: A survey. Inf. Fus. 27 (2016), 95–110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Himani Bansal, Gulshan Shrivastava, Nguyen Nhu, and L. M. Stanciu (Eds.). 2018. Social Network Analytics for Contemporary Business Organizations. IGI Global. DOI:https://doi.org/10.4018/978-1-5225-5097-6 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Yanwei Bao, Changqin Quan, Lijuan Wang, and Fuji Ren. 2014. The role of pre-processing in Twitter sentiment analysis. In Intelligent Computing Methodologies, De-Shuang Huang, Kang-Hyun Jo, and Ling Wang (Eds.). Springer International Publishing, Cham, 615–624.Google ScholarGoogle Scholar
  9. Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language model for scientific text. arxiv:cs.CL/1903.10676.Google ScholarGoogle Scholar
  10. Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8 (2013), 1798–1828. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137–1155. Retrieved from http://dl.acm.org/citation.cfm?id=944919.944966. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Y. Bengio, P. Simard, and P. Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. Trans. Neur. Netw. 5, 2 (Mar. 1994), 157–166. DOI:https://doi.org/10.1109/72.279181 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Adam Bermingham and Alan Smeaton. 2011. On using Twitter to monitor political sentiment and predict election results. In Proceedings of the Workshop on Sentiment Analysis Where AI Meets Psychology (’11). Asian Federation of Natural Language Processing, 2–10. Retrieved from https://www.aclweb.org/anthology/W11-3702.Google ScholarGoogle Scholar
  14. Marina Boia, Boi Faltings, Claudiu Cristian Musat, and Pearl Pu. 2013. A :) Is worth a thousand words: How people attach sentiment to emoticons and words in tweets. In International Conference on Social Computing. 345–350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016).Google ScholarGoogle Scholar
  16. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. CoRR abs/1607.04606 (2016).Google ScholarGoogle Scholar
  17. Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Conference on Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 4349–4357. Retrieved from http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Jose Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artif. Intell. 240 (2016), 36–64. DOI:https://doi.org/10.1016/j.artint.2016.07.005Google ScholarGoogle ScholarCross RefCross Ref
  19. Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In Association for the Advancement of Artificial Intelligence Conference.Google ScholarGoogle Scholar
  20. Xavier Carreras and Lluís Màrquez. 2001. Boosting trees for anti-spam email filtering. CoRR cs.CL/0109015 (2001).Google ScholarGoogle Scholar
  21. Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2015. Acquiring a large scale polarity lexicon through unsupervised distributional methods. In Natural Language Processing and Information Systems, Chris Biemann, Siegfried Handschuh, André Freitas, Farid Meziane, and Elisabeth Métais (Eds.). Springer International Publishing, Cham, 73–86.Google ScholarGoogle Scholar
  22. Arda Celebi and Arzucan Ozgur. 2016. Segmenting hashtags using automatically created training data. https://www.aclweb.org/anthology/L16-1476.pdf.Google ScholarGoogle Scholar
  23. Wei James Chen, Xiaoshen Xie, Jiale Wang, Biswajeet Pradhan, Haoyuan Hong, Dieu Tien Bui, Zhao Duan, and Jianquan Ma. 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. https://www.sciencedirect.com/science/article/pii/S0341816216305136.Google ScholarGoogle Scholar
  24. Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP’14). Association for Computational Linguistics, 1724–1734. DOI:https://doi.org/10.3115/v1/D14-1179Google ScholarGoogle ScholarCross RefCross Ref
  25. Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555 (2014).Google ScholarGoogle Scholar
  26. Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020).Google ScholarGoogle Scholar
  27. Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In 25th International Conference on Machine Learning (ICML’08). ACM, New York, NY, 160–167. DOI:https://doi.org/10.1145/1390156.1390177 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. CoRR abs/1103.0398 (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. CoRR abs/1703.04009 (2017).Google ScholarGoogle Scholar
  30. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018).Google ScholarGoogle Scholar
  31. Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W. Cohen. 2017. A comparative study of word embeddings for reading comprehension. CoRR abs/1703.00993 (2017).Google ScholarGoogle Scholar
  32. Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Conference on Advances in Neural Information Processing Systems. 13063–13075. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Cícero Nogueira dos Santos and Maíra A. de C. Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In International Conference on Computational Linguistics.Google ScholarGoogle Scholar
  34. Ábel Elekes, Adrian Englhardt, Schäler, and Klemens Böhm. 2018. Toward meaningful notions of similarity in NLP embedding models. Int. J. Dig. Libr. (04 2018). DOI:https://doi.org/10.1007/s00799-018-0237-yGoogle ScholarGoogle ScholarCross RefCross Ref
  35. Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 9 (June 2008), 1871–1874. Retrieved from http://dl.acm.org/citation.cfm?id=1390681.1442794. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard H. Hovy, and Noah A. Smith. 2014. Retrofitting word vectors to semantic lexicons. CoRR abs/1411.4166 (2014).Google ScholarGoogle Scholar
  37. Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan, and Josef van Genabith. 2011. From news to comment: Resources and benchmarks for parsing the language of Web 2.0. In Proceedings of 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 893–901. Retrieved from https://www.aclweb.org/anthology/I11-1100.Google ScholarGoogle Scholar
  38. Xianghua Fu, Wangwang Liu, Yingying Xu, and Laizhong Cui. 2017. Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis. Neurocomputing 241 (2017), 18–27.Google ScholarGoogle ScholarCross RefCross Ref
  39. Alexander Genkin, David D. Lewis, and David Madigan. 2007. Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 3 (2007), 291–304. DOI:https://doi.org/10.1198/004017007000000245Google ScholarGoogle ScholarCross RefCross Ref
  40. Anastasia Giachanou, Julio Gonzalo, Ida Mele, and Fabio Crestani. 2017. Sentiment propagation for predicting reputation polarity. DOI:https://doi.org/10.1007/978-3-319-56608-5_18Google ScholarGoogle Scholar
  41. Kevin Gimpel, Nathan Schneider, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. [n.d.]. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. https://www.aclweb.org/anthology/P11-2008.pdf.Google ScholarGoogle Scholar
  42. Christian Giovanelli, Xin U. Liu, Seppo Antero Sierla, Valeriy Vyatkin, and Ryutaro Ichise. 2017. Towards an aggregator that exploits big data to bid on frequency containment reserve market. In 43rd Conference of the IEEE Industrial Electronics Society. 7514–7519.Google ScholarGoogle ScholarCross RefCross Ref
  43. Edel Greevy. 2004. Automatic text categorisation of racist webpages. http://doras.dcu.ie/17275/1/edel_greevy_20120702122736.pdf.Google ScholarGoogle Scholar
  44. Vishal Gupta and Gurpreet Lehal. 2009. A survey of text mining techniques and applications. J. Emerg. Technol. Web Intell. 1 (08 2009). DOI:DOI:https://doi.org/10.4304/jetwi.1.1.60-76Google ScholarGoogle Scholar
  45. Emma Haddi, Xiaohui Liu, and Yong Shi. 2013. The role of text pre-processing in sentiment analysis. In International Conference on Information Technology and Quantitative Management.Google ScholarGoogle ScholarCross RefCross Ref
  46. Khaled M. Hammouda and Mohamed S. Kamel. 2004. Efficient phrase-based document indexing for web document clustering. IEEE Trans. Knowl. Data Eng. 16, 10 (Oct. 2004), 1279–1296. DOI:DOI:https://doi.org/10.1109/TKDE.2004.58 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Yulan He, Chenghua Lin, and Harith Alani. 2011. Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In 49th Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 123–131. Retrieved from https://www.aclweb.org/anthology/P11-1013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Aurélie Herbelot and Marco Baroni. 2017. High-risk learning: Acquiring new word vectors from tiny data. CoRR abs/1707.06556 (2017).Google ScholarGoogle Scholar
  49. Bruce M. Hill. 1968. Posterior distribution of percentiles: Bayes’ theorem for sampling from a population. J. Amer. Statist. Assoc. 63, 322 (1968), 677–691. Retrieved from http://www.jstor.org/stable/2284038.Google ScholarGoogle Scholar
  50. Tin Kam Ho. 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 8 (Aug. 1998), 832–844. DOI:DOI:https://doi.org/10.1109/34.709601 Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (Nov. 1997), 1735–1780. DOI:DOI:https://doi.org/10.1162/neco.1997.9.8.1735 Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In 56th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 328–339. DOI:DOI:https://doi.org/10.18653/v1/P18-1031Google ScholarGoogle ScholarCross RefCross Ref
  53. Xia Hu and Huan Liu. 2012. Text Analytics in Social Media. Springer US, Boston, MA, 385–414. DOI:DOI:https://doi.org/10.1007/978-1-4614-3223-4_12Google ScholarGoogle Scholar
  54. Ah hwee Tan. 1999. Text mining: The state of the art and the challenges. In Workshop on Knowledge Discovery from Advanced Databases. 65–70.Google ScholarGoogle Scholar
  55. Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning sense embeddings for word and relational similarity. In Meeting of the Association for Computational Linguistics.Google ScholarGoogle ScholarCross RefCross Ref
  56. Suzana Ilic, Edison Marrese-Taylor, Jorge A. Balazs, and Yutaka Matsuo. 2018. Deep contextualized word representations for detecting sarcasm and irony. CoRR abs/1809.09795 (2018).Google ScholarGoogle Scholar
  57. Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Reading text in the wild with convolutional neural networks. CoRR abs/1412.1842 (2014). Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Zhao Jianqiang. 2015. Pre-processing boosting Twitter sentiment analysis?DOI:DOI:https://doi.org/10.1109/SmartCity.2015.158Google ScholarGoogle Scholar
  59. Zhao Jianqiang and Gui Xiaolin. 2017. Comparison research on text pre-processing methods on Twitter sentiment analysis. IEEE Access 5 (2017), 2870–2879.Google ScholarGoogle ScholarCross RefCross Ref
  60. Zhao Jianqiang and Gui Xiaolin. 2018. Deep convolution neural networks for Twitter sentiment analysis. IEEE Access PP (01 2018), 1–1. DOI:DOI:https://doi.org/10.1109/ACCESS.2017.2776930Google ScholarGoogle ScholarCross RefCross Ref
  61. Rie Johnson and Tong Zhang. 2014. Effective use of word order for text categorization with convolutional neural networks. CoRR abs/1412.1058 (2014).Google ScholarGoogle Scholar
  62. Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Ling. 8 (2020), 64–77.Google ScholarGoogle ScholarCross RefCross Ref
  63. Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. CoRR abs/1607.01759 (2016).Google ScholarGoogle Scholar
  64. Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. arxiv:cs.CL/1909.05858.Google ScholarGoogle Scholar
  65. Farhan Hassan Khan, Saba Bashir, and Usman Qamar. 2014. TOM: Twitter opinion mining framework using hybrid classification scheme. Decis. Support Syst. 57 (Jan. 2014), 245–257. DOI:DOI:https://doi.org/10.1016/j.dss.2013.09.004 Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014).Google ScholarGoogle Scholar
  67. Vandana Korde and C. Namrata Mahender. 2012. Text classification and classifiers: A survey. http://www.airccse.org/journal/ijaia/papers/3212ijaia08.pdfhttps://www.researchgate.net/publication/276196340_Text_Classification_and_ClassifiersA_Survey.Google ScholarGoogle Scholar
  68. Efthymios Kouloumpis, Theresa Wilson, and Johanna D. Moore. 2011. Twitter sentiment analysis: The good the bad and the OMG! In International AAAI Conference on Web and Social Media.Google ScholarGoogle Scholar
  69. Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, and Donald E. Brown. 2019. Text classification algorithms: A survey. CoRR abs/1904.08067 (2019).Google ScholarGoogle Scholar
  70. Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In 27th AAAI Conference on Artificial Intelligence (AAAI’13). AAAI Press, 1621–1622. Retrieved from http://dl.acm.org/citation.cfm?id=2891460.2891697. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019).Google ScholarGoogle Scholar
  72. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arxiv:cs.CL/1909.11942.Google ScholarGoogle Scholar
  73. Ray R. Larson. 2010. Introduction to information retrieval. J. Amer. Soc. Inf. Sci. Technol. 61, 4 (Apr. 2010), 852–853. DOI:DOI:https://doi.org/10.1002/asi.v61:4 Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Paula Lauren, Guangzhi Qu, Feng Zhang, and Amaury Lendasse. 2018. Discriminant document embeddings with an extreme learning machine for classifying clinical narratives. Neurocomputing 277 (2018), 129–138. DOI:DOI:https://doi.org/10.1016/j.neucom.2017.01.117Google ScholarGoogle ScholarCross RefCross Ref
  75. Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. CoRR abs/1405.4053 (2014). Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Yann LeCun, Y. Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521 (05 2015), 436–44. DOI:DOI:https://doi.org/10.1038/nature14539Google ScholarGoogle Scholar
  77. Yann Lecun, Leon Bottou, Y. Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86 (12 1998), 2278–2324. DOI:DOI:https://doi.org/10.1109/5.726791Google ScholarGoogle ScholarCross RefCross Ref
  78. Ledell, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, and Jason Weston. 2017. StarSpace: Embed all the things!arxiv:cs.CL/1709.03856.Google ScholarGoogle Scholar
  79. Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. arxiv:cs.CL/1901.08746.Google ScholarGoogle Scholar
  80. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019).Google ScholarGoogle Scholar
  81. Liang-Chih, Jin Wang, K. Robert Lai, and Xuejie Zhang. 2018. Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Trans. Audio, Speech Lang. Proc. 26, 3 (Mar. 2018), 671–681. DOI:DOI:https://doi.org/10.1109/TASLP.2017.2788182 Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In 18th ACM Conference on Information and Knowledge Management (CIKM’09). ACM, New York, NY, 375–384. DOI:DOI:https://doi.org/10.1145/1645953.1646003 Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2015. Learning context-sensitive word embeddings with neural tensor skip-gram model. In 24th International Conference on Artificial Intelligence (IJCAI’15). AAAI Press, 1284–1290. Retrieved from http://dl.acm.org/citation.cfm?id=2832415.2832428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Shuhua Liu and Thomas Forss. 2014. Combining N-gram based similarity analysis with sentiment analysis in web content classification. In International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1 (IC3K’14). SCITEPRESS - Science and Technology Publications, Lda, 530–537. DOI:DOI:https://doi.org/10.5220/0005170305300537 Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Yuanchao Liu, Bingquan Liu, Lili Shan, and Xin Wang. 2018. Modelling context with neural networks for recommending idioms in essay writing. Neurocomputing 275 (2018), 2287–2293. DOI:DOI:https://doi.org/10.1016/j.neucom.2017.11.005Google ScholarGoogle ScholarCross RefCross Ref
  86. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019).Google ScholarGoogle Scholar
  87. David M. Magerman. 1995. Statistical decision-tree models for parsing. In 33rd Meeting on Association for Computational Linguistics (ACL’95). Association for Computational Linguistics, 276–283. DOI:DOI:https://doi.org/10.3115/981658.981695 Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Danilo P. Mandic and Jonathon Chambers. 2001. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. John Wiley & Sons, Inc., New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Meeting of the Association for Computational Linguistics (System Demonstrations). 55–60.Google ScholarGoogle ScholarCross RefCross Ref
  90. Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. CoRR abs/1708.00107 (2017). Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Conference on Advances in Neural Information Processing Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Yelena Mejova and Padmini Srinivasan. 2011. Exploring feature definition and selection for sentiment classifiers.Google ScholarGoogle Scholar
  93. Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, 51–61. DOI:DOI:https://doi.org/10.18653/v1/K16-1006Google ScholarGoogle ScholarCross RefCross Ref
  94. Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Conference on Computational Natural Language Learning.Google ScholarGoogle ScholarCross RefCross Ref
  95. Prem Melville, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment analysis of blogs by combining lexical knowledge with text classification. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 1275–1284. DOI:DOI:https://doi.org/10.1145/1557019.1557156 Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Conference on Advances in Neural Information Processing Systems. 3111–9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Saif Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Association for Computational Linguistics, 321–327. Retrieved from http://aclweb.org/anthology/S13-2053.Google ScholarGoogle Scholar
  98. James N. Morgan and John A. Sonquist. 1963. Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc. 58, 302 (1963), 415–434. Retrieved from http://www.jstor.org/stable/2283276.Google ScholarGoogle ScholarCross RefCross Ref
  99. Nikola Mrksic, Ivan Vulic, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gasic, Anna Korhonen, and Steve J. Young. 2017. Semantic specialisation of distributional word vector spaces using monolingual and cross-lingual constraints. CoRR abs/1706.00374 (2017).Google ScholarGoogle Scholar
  100. T. Mullen and R. Malouf. 2006. A preliminary investigation into sentiment analysis of informal political discourse. AAAI Spring Symposium - Technical Report SS-06-03 (2006), 159–162. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-33747172751&partnerID=40&md5=6b12793b70eae006102989ed6d398fcb.Google ScholarGoogle Scholar
  101. Martin Müller, Marcel Salathé, and Per E. Kummervold. 2020. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503 (2020).Google ScholarGoogle Scholar
  102. Marwa Naili, Anja Habacha Chaibi, and Henda Hajjami Ben Ghezala. 2017. Comparative study of word embedding methods in topic segmentation. Procedia Comput. Sci. 112 (2017), 340–349. Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Vivek Narayanan, Ishan Arora, and Arjun Bhatia. 2013. Fast and accurate sentiment classification using an enhanced Naive Bayes model. CoRR abs/1305.6143 (2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. Usman Naseem. 2020. Hybrid Words Representation for the Classification of Low Quality Text. Ph.D. Dissertation. University of Technology Sydney, Australia.Google ScholarGoogle Scholar
  105. U. Naseem, S. K. Khan, M. Farasat, and F. Ali. 2019. Abusive language detection: A comprehensive review. Indian J. Sci. Technol. 12, 45 (2019), 1–13.Google ScholarGoogle Scholar
  106. U. Naseem, I. Razzak, and P. W. Eklund. 2020. A survey of pre-processing techniques to improve short-text quality: a case study on hate speech detection on twitter. Multimedia Tools and Applications, 1–28.Google ScholarGoogle Scholar
  107. Usman Naseem, Shah Khalid Khan, Imran Razzak, and Ibrahim A. Hameed. 2019. Hybrid words representation for airlines sentiment analysis. In AI 2019: Advances in Artificial Intelligence, Jixue Liu and James Bailey (Eds.). Springer International Publishing, Cham, 381–392.Google ScholarGoogle Scholar
  108. Usman Naseem, Matloob Khushi, Shah Khalid Khan, Nazar Waheed, Adnan Mir, Atika Qazi, Bandar Alshammari, and Simon K. Poon. 2020. Diabetic retinopathy detection using multi-layer neural networks and split attention with focal loss. In International Conference on Neural Information Processing. Springer, 1–12.Google ScholarGoogle Scholar
  109. Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, and Jinman Kim. 2020. BioALBERT: A simple and effective pre-trained language model for biomedical named entity recognition. arXiv preprint arXiv:2009.09223 (2020).Google ScholarGoogle Scholar
  110. Usman Naseem and Katarzyna Musial. 2019. DICE: Deep intelligent contextual embedding for Twitter sentiment analysis. In International Conference on Document Analysis and Recognition (ICDAR’19). IEEE, 953–958.Google ScholarGoogle ScholarCross RefCross Ref
  111. U. Naseem, I. Razzak, M. Khushi, P. W. Eklund, and J. Kim. 2021. COVIDSenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE Transactions on Computational Social Systems.Google ScholarGoogle ScholarCross RefCross Ref
  112. Usman Naseem, Katarzyna Musial, Peter Eklund, and Mukesh Prasad. 2020. Biomedical named-entity recognition by hierarchically fusing BioBERT representations and deep contextual-level word-embedding. In International Joint Conference on Neural Networks (IJCNN’20). IEEE, 1–8.Google ScholarGoogle ScholarCross RefCross Ref
  113. Usman Naseem, Imran Razzak, Peter Eklund, and Katarzyna Musial. 2020. Towards improved deep contextual embedding for the identification of irony and sarcasm. In International Joint Conference on Neural Networks (IJCNN’20). IEEE, 1–7.Google ScholarGoogle ScholarCross RefCross Ref
  114. Usman Naseem, Imran Razzak, and Ibrahim A. Hameed. 2019. Deep context-aware embedding for abusive and hate speech detection on Twitter. Aust. J. Intell. Inf. Process. Syst. 15, 3 (2019), 69–76.Google ScholarGoogle Scholar
  115. Usman Naseem, Imran Razzak, Katarzyna Musial, and Muhammad Imran. 2020. Transformer based deep intelligent contextual embedding for Twitter sentiment analysis. Fut. Gen. Comput. Syst. 113 (2020), 58–69.Google ScholarGoogle ScholarCross RefCross Ref
  116. Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2015. Efficient non-parametric estimation of multiple embeddings per word in vector space. CoRR abs/1504.06654 (2015).Google ScholarGoogle Scholar
  117. Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. arXiv preprint arXiv:2005.10200 (2020).Google ScholarGoogle Scholar
  118. Thomas Niebler, Martin Becker, Christian Pölitz, and Andreas Hotho. 2017. Learning semantic relatedness from human feedback using metric learning. CoRR abs/1705.07425 (2017).Google ScholarGoogle Scholar
  119. Alexander Pak and Patrick Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In International Conference on Language Resources and Evaluation.Google ScholarGoogle Scholar
  120. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10 (EMNLP’02). Association for Computational Linguistics, 79–86. DOI:DOI:https://doi.org/10.3115/1118693.1118704 Google ScholarGoogle ScholarDigital LibraryDigital Library
  121. U. Naseem, M. Khushi, S. K. Khan, K. Shaukat, and M. A. Moni. 2021. A comparative analysis of active learning for biomedical text mining. Applied System Innovation 4, 1 (2021), 23.Google ScholarGoogle ScholarCross RefCross Ref
  122. Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML’13). JMLR.org, III–1310–III–1318. Retrieved from http://dl.acm.org/citation.cfm?id=3042817.3043083. Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. Cristian Patriche, Pîrn?u Gabriel, Adrian Grozavu, and Bogdan Ro?ca. 2016. A comparative analysis of binary logistic regression and analytical hierarchy process for landslide susceptibility assessment in the Dobrov River Basin, Romania. Pedosphere 26 (06 2016), 335–350. DOI:DOI:https://doi.org/10.1016/S1002-0160(15)60047-9Google ScholarGoogle ScholarCross RefCross Ref
  124. C. V. Patriche, R. Pirnau, A. Grozavu, and B. Rosca. 2016. A comparative analysis of binary logistic regression and analytical hierarchy process for landslide susceptibility assessment in the Dobrov River Basin, Romania. Pedosphere 26, 3 (2016), 335–350.Google ScholarGoogle ScholarCross RefCross Ref
  125. Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2227–2237. DOI:DOI:https://doi.org/10.18653/v1/N18-1202Google ScholarGoogle ScholarCross RefCross Ref
  126. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. CoRR abs/1802.05365 (2018).Google ScholarGoogle Scholar
  127. Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. 2017. Mimicking word embeddings using subword RNNs. CoRR abs/1707.06961 (2017).Google ScholarGoogle Scholar
  128. Pengda Qin, Weiran Xu, and Jun Guo. 2016. An empirical convolutional neural network approach for semantic relation classification. Neurocomputing 190, C (May 2016), 1–9. DOI:DOI:https://doi.org/10.1016/j.neucom.2015.12.091 Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. Zhaowei Qu, Xiaomin Song, Shuqiang Zheng, Xiaoru Wang, Xiaohui Song, and Zuquan Li. 2018. Improved Bayes method based on TF-IDF feature and grade factor feature for Chinese information classification. In IEEE International Conference on Big Data and Smart Computing (BigComp’18). 677–680.Google ScholarGoogle ScholarCross RefCross Ref
  130. J. R. Quinlan. 1987. Simplifying decision trees. Int. J. Man Mach. Stud. 27, 3 (1987), 221–234. DOI:DOI:https://doi.org/10.1016/S0020-7373(87)80053-6 Google ScholarGoogle ScholarDigital LibraryDigital Library
  131. J. R. Quinlan. 1986. Induction of decision trees. Mach. Learn. 1, 1 (Mar. 1986), 81–106. DOI:DOI:https://doi.org/10.1023/A:1022643204877 Google ScholarGoogle ScholarDigital LibraryDigital Library
  132. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2018. Language models are unsupervised multitask learners. Retrieved from https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.Google ScholarGoogle Scholar
  133. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019).Google ScholarGoogle Scholar
  134. Arshia Rehman, Saeeda Naz, Usman Naseem, Imran Razzak, and Ibrahim A. Hameed. 2019. Deep AutoEncoder-Decoder framework for semantic segmentation of brain tumor. Aust. J. Intell. Inf. Process. Syst. 15, 3 (2019), 53–60.Google ScholarGoogle Scholar
  135. Yafeng Ren, Yue Zhang, Meishan Zhang, and Donghong Ji. 2016. Context-sensitive Twitter sentiment classification using neural network. In 30th AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 215–221. Retrieved from http://dl.acm.org/citation.cfm?id=3015812.3015844. Google ScholarGoogle ScholarDigital LibraryDigital Library
  136. Jack Reuter, Jhonata Pereira-Martins, and Jugal Kalita. 2016. Segmenting Twitter hashtags. Int. J. Nat. Lang. Comput. 5 (08 2016), 23–36. DOI:DOI:https://doi.org/10.5121/ijnlc.2016.5402Google ScholarGoogle ScholarCross RefCross Ref
  137. M. Jaggi, P. Mandal, S. Narang, U. Naseem, and M. Khushi. 2021. Text mining of stocktwits data for predicting stock prices. Applied System Innovation 4, 1 (2021), 13.Google ScholarGoogle ScholarCross RefCross Ref
  138. Seyed Mahdi Rezaeinia, Ali Ghodsi, and Rouhollah Rahmani. 2017. Improving the accuracy of pre-trained word embeddings for sentiment analysis. CoRR abs/1711.08609 (2017).Google ScholarGoogle Scholar
  139. Hassan Saif, Marta Fernandez Andres, Yulan He, and Harith Alani. 2013. Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold. In International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI.Google ScholarGoogle Scholar
  140. Mohammad Arshi Saloot, Norisma Idris, Nor Liyana Mohd Shuib, Ram Gopal Raj, and AiTi Aw. 2015. Toward tweets normalization using maximum entropy. In Workshop on Noisy User-generated Text, NUT@IJCNLP.Google ScholarGoogle ScholarCross RefCross Ref
  141. Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).Google ScholarGoogle Scholar
  142. Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. In Artificial Neural Networks – ICANN 2010, Konstantinos Diamantaras, Wlodek Duch, and Lazaros S. Iliadis (Eds.). Springer Berlin, 92–101. Google ScholarGoogle ScholarDigital LibraryDigital Library
  143. Seungil, David Ding, Kevin Canini, Jan Pfeifer, and Maya Gupta. 2017. Deep lattice networks and partial monotonic functions. arxiv:stat.ML/1709.06680.Google ScholarGoogle Scholar
  144. Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’15). ACM, New York, NY, 959–962. DOI:DOI:https://doi.org/10.1145/2766462.2767830 Google ScholarGoogle ScholarDigital LibraryDigital Library
  145. Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arxiv:cs.CL/1909.08053.Google ScholarGoogle Scholar
  146. Tajinder Singh and Madhu Kumari. 2016. Role of text pre-processing in Twitter sentiment analysis. https://www.sciencedirect.com/science/article/pii/S1877050916311607.Google ScholarGoogle Scholar
  147. R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. 1631–1642.Google ScholarGoogle Scholar
  148. Saeid Soheily-Khah, Pierre-François Marteau, and Nicolas Béchet. 2017. Intrusion detection in network systems through hybrid supervised and unsupervised mining process- a detailed case study on the ISCX benchmark dataset -.Google ScholarGoogle Scholar
  149. Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450 (2019).Google ScholarGoogle Scholar
  150. Karen Sparck Jones. 1988. A statistical interpretation of term specificity and its application in retrieval. In Document Retrieval Systems. Taylor Graham Publishing, London, UK, 132–142. Retrieved from http://dl.acm.org/citation.cfm?id=106765.106782. Google ScholarGoogle ScholarDigital LibraryDigital Library
  151. Robyn Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An open multilingual graph of general knowledge. CoRR abs/1612.03975 (2016).Google ScholarGoogle Scholar
  152. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 (2019).Google ScholarGoogle Scholar
  153. Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In AAAI Conference on Artificial Intelligence. 8968–8975.Google ScholarGoogle ScholarCross RefCross Ref
  154. Ilya Sutskever, James Martens, and Geoffrey Hinton. 2011. Generating text with recurrent neural networks. In 28th International Conference on International Conference on Machine Learning (ICML’11). Omnipress, 1017–1024. Retrieved from http://dl.acm.org/citation.cfm?id=3104482.3104610. Google ScholarGoogle ScholarDigital LibraryDigital Library
  155. Jared Suttles and Nancy Ide. 2013. Distant supervision for emotion classification with discrete binary values. In Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 2 (CICLing’13). Springer-Verlag, Berlin, 121–136. DOI:DOI:https://doi.org/10.1007/978-3-642-37256-8_11 Google ScholarGoogle ScholarDigital LibraryDigital Library
  156. Symeon Symeonidis, Dimitrios Effrosynidis, and Avi Arampatzis. 2018. A comparative evaluation of pre-processing techniques and their interactions for Twitter sentiment analysis. Exp. Syst. Applic. 110 (2018), 298–310. DOI:DOI:https://doi.org/10.1016/j.eswa.2018.06.022Google ScholarGoogle ScholarCross RefCross Ref
  157. Duyu Tang, Bing Qin, Furu Wei, Li Dong, Ting Liu, and Ming Zhou. 2015. A joint segmentation and classification framework for sentence level sentiment classification. IEEE/ACM Trans. Audio, Speech Lang. Process. 23, 11 (2015), 1750–61. Google ScholarGoogle ScholarDigital LibraryDigital Library
  158. Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Trans. Knowl. Data Eng. 28, 2 (2016), 496–509. Google ScholarGoogle ScholarDigital LibraryDigital Library
  159. Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Trans. Knowl. Data Eng. 28, 2 (Feb. 2016), 496–509. DOI:DOI:https://doi.org/10.1109/TKDE.2015.2489653 Google ScholarGoogle ScholarDigital LibraryDigital Library
  160. Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, 1555–1565. DOI:DOI:https://doi.org/10.3115/v1/P14-1146Google ScholarGoogle ScholarCross RefCross Ref
  161. Alper Kursat Uysal and Serkan Günal. 2014. The impact of preprocessing on text classification. Inf. Process. Manag. 50 (2014), 104–112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., 6000–6010. Retrieved from http://dl.acm.org/citation.cfm?id=3295222.3295349. Google ScholarGoogle ScholarDigital LibraryDigital Library
  163. Byron Wallace. 2017. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing, 253–263.Google ScholarGoogle Scholar
  164. Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, and Luo Si. 2019. StructBERT: Incorporating language structures into pre-training for deep language understanding. arxiv:cs.CL/1908.04577.Google ScholarGoogle Scholar
  165. Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, and Amit P. Sheth. 2012. Harnessing Twitter “big data” for automatic emotion identification. In International Conference on Privacy, Security, Risk and Trust and International Confernece on Social Computing. 587–592. Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for aspect-level sentiment classification. In Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 606–615. DOI:DOI:https://doi.org/10.18653/v1/D16-1058 Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. Yuyang Wang, Roni Khardon, and Pavlos Protopapas. 2012. Nonparametric Bayesian estimation of periodic light curves. Astrophy. J. 756, 1 (Aug. 2012), 67. DOI:DOI:https://doi.org/10.1088/0004-637x/756/1/67Google ScholarGoogle Scholar
  168. Ikuya Yamada, Hideaki Takeda, and Yoshiyasu Takefuji. 2015. Enhancing named entity recognition in Twitter messages using entity linking. In Workshop on Noisy User-generated Text, NUT@IJCNLP.Google ScholarGoogle ScholarCross RefCross Ref
  169. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. CoRR abs/1906.08237 (2019). Google ScholarGoogle ScholarDigital LibraryDigital Library
  170. Yukun Zhu, Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. CoRR abs/1506.06724 (2015). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Asian and Low-Resource Language Information Processing
        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 20, Issue 5
        September 2021
        320 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3467024
        Issue’s Table of Contents

        Copyright © 2021 Association for Computing Machinery.

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 June 2021
        • Accepted: 1 November 2020
        • Revised: 1 October 2020
        • Received: 1 May 2020
        Published in tallip Volume 20, Issue 5

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format