Abstract
Word representation has long been an important research area in natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore a range of word representation models and their expressive power, from classical approaches to modern state-of-the-art language models (LMs). We describe the variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations that capture the underlying semantic information. Such representations can in turn be used by various machine learning (ML) algorithms for a variety of NLP tasks. Finally, the survey briefly discusses commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.
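The pipeline the abstract describes — turning raw text into vector representations whose geometry reflects semantic similarity — can be illustrated with a minimal sketch. This is an illustrative toy bag-of-words model with cosine similarity, not any specific model from the survey; the function and variable names are ours:

```python
import math
from collections import Counter

def bow_vector(text, vocab):
    """Map a text to a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["the movie was great", "the film was great", "stock prices fell sharply"]
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [bow_vector(d, vocab) for d in docs]

# Documents sharing words score higher than unrelated ones.
assert cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2])
```

Dense embeddings such as Word2vec or BERT replace these sparse count vectors with learned, low-dimensional ones, but the downstream use — feeding vectors into ML classifiers and comparing them geometrically — follows the same pattern.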
- Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca J. Passonneau. 2011. Sentiment analysis of Twitter data. Retrieved from https://www.aclweb.org/anthology/W11-0705.pdf.
- Charu C. Aggarwal and Chandan K. Reddy. 2013. Data Clustering: Algorithms and Applications. CRC Press.
- Charu C. Aggarwal and ChengXiang Zhai. 2012. A survey of text classification algorithms. In Mining Text Data. Springer, Boston, MA, 163–222.
- Edgar Altszyler, Mariano Sigman, and Diego Fernández Slezak. 2016. Comparative study of LSA vs Word2vec embeddings in small corpora: A case study in dreams database. arXiv:1610.01520.
- Alexandra Balahur. 2013. Sentiment analysis in social media texts. In WASSA@NAACL-HLT.
- Jorge A. Balazs and Juan D. Velásquez. 2016. Opinion mining and information fusion: A survey. Inf. Fus. 27 (2016), 95–110.
- Himani Bansal, Gulshan Shrivastava, Nguyen Nhu, and L. M. Stanciu (Eds.). 2018. Social Network Analytics for Contemporary Business Organizations. IGI Global. DOI:https://doi.org/10.4018/978-1-5225-5097-6
- Yanwei Bao, Changqin Quan, Lijuan Wang, and Fuji Ren. 2014. The role of pre-processing in Twitter sentiment analysis. In Intelligent Computing Methodologies, De-Shuang Huang, Kang-Hyun Jo, and Ling Wang (Eds.). Springer International Publishing, Cham, 615–624.
- Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language model for scientific text. arXiv:1903.10676.
- Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8 (2013), 1798–1828.
- Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137–1155. Retrieved from http://dl.acm.org/citation.cfm?id=944919.944966.
- Y. Bengio, P. Simard, and P. Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. Trans. Neur. Netw. 5, 2 (Mar. 1994), 157–166. DOI:https://doi.org/10.1109/72.279181
- Adam Bermingham and Alan Smeaton. 2011. On using Twitter to monitor political sentiment and predict election results. In Proceedings of the Workshop on Sentiment Analysis Where AI Meets Psychology. Asian Federation of Natural Language Processing, 2–10. Retrieved from https://www.aclweb.org/anthology/W11-3702.
- Marina Boia, Boi Faltings, Claudiu Cristian Musat, and Pearl Pu. 2013. A :) is worth a thousand words: How people attach sentiment to emoticons and words in tweets. In International Conference on Social Computing. 345–350.
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv:1607.04606.
- Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Conference on Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 4349–4357. Retrieved from http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf.
- Jose Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artif. Intell. 240 (2016), 36–64. DOI:https://doi.org/10.1016/j.artint.2016.07.005
- Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In Association for the Advancement of Artificial Intelligence Conference.
- Xavier Carreras and Lluís Màrquez. 2001. Boosting trees for anti-spam email filtering. CoRR cs.CL/0109015 (2001).
- Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2015. Acquiring a large scale polarity lexicon through unsupervised distributional methods. In Natural Language Processing and Information Systems, Chris Biemann, Siegfried Handschuh, André Freitas, Farid Meziane, and Elisabeth Métais (Eds.). Springer International Publishing, Cham, 73–86.
- Arda Celebi and Arzucan Ozgur. 2016. Segmenting hashtags using automatically created training data. Retrieved from https://www.aclweb.org/anthology/L16-1476.pdf.
- Wei James Chen, Xiaoshen Xie, Jiale Wang, Biswajeet Pradhan, Haoyuan Hong, Dieu Tien Bui, Zhao Duan, and Jianquan Ma. 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Retrieved from https://www.sciencedirect.com/science/article/pii/S0341816216305136.
- Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP’14). Association for Computational Linguistics, 1724–1734. DOI:https://doi.org/10.3115/v1/D14-1179
- Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555 (2014).
- Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv:2003.10555.
- Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In 25th International Conference on Machine Learning (ICML’08). ACM, New York, NY, 160–167. DOI:https://doi.org/10.1145/1390156.1390177
- Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. CoRR abs/1103.0398 (2011).
- Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. CoRR abs/1703.04009 (2017).
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018).
- Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W. Cohen. 2017. A comparative study of word embeddings for reading comprehension. CoRR abs/1703.00993 (2017).
- Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Conference on Advances in Neural Information Processing Systems. 13063–13075.
- Cícero Nogueira dos Santos and Maíra A. de C. Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In International Conference on Computational Linguistics.
- Ábel Elekes, Adrian Englhardt, Martin Schäler, and Klemens Böhm. 2018. Toward meaningful notions of similarity in NLP embedding models. Int. J. Dig. Libr. (Apr. 2018). DOI:https://doi.org/10.1007/s00799-018-0237-y
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 9 (June 2008), 1871–1874. Retrieved from http://dl.acm.org/citation.cfm?id=1390681.1442794.
- Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard H. Hovy, and Noah A. Smith. 2014. Retrofitting word vectors to semantic lexicons. CoRR abs/1411.4166 (2014).
- Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan, and Josef van Genabith. 2011. From news to comment: Resources and benchmarks for parsing the language of Web 2.0. In Proceedings of 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 893–901. Retrieved from https://www.aclweb.org/anthology/I11-1100.
- Xianghua Fu, Wangwang Liu, Yingying Xu, and Laizhong Cui. 2017. Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis. Neurocomputing 241 (2017), 18–27.
- Alexander Genkin, David D. Lewis, and David Madigan. 2007. Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 3 (2007), 291–304. DOI:https://doi.org/10.1198/004017007000000245
- Anastasia Giachanou, Julio Gonzalo, Ida Mele, and Fabio Crestani. 2017. Sentiment propagation for predicting reputation polarity. DOI:https://doi.org/10.1007/978-3-319-56608-5_18
- Kevin Gimpel, Nathan Schneider, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-speech tagging for Twitter: Annotation, features, and experiments. Retrieved from https://www.aclweb.org/anthology/P11-2008.pdf.
- Christian Giovanelli, Xin U. Liu, Seppo Antero Sierla, Valeriy Vyatkin, and Ryutaro Ichise. 2017. Towards an aggregator that exploits big data to bid on frequency containment reserve market. In 43rd Conference of the IEEE Industrial Electronics Society. 7514–7519.
- Edel Greevy. 2004. Automatic text categorisation of racist webpages. Retrieved from http://doras.dcu.ie/17275/1/edel_greevy_20120702122736.pdf.
- Vishal Gupta and Gurpreet Lehal. 2009. A survey of text mining techniques and applications. J. Emerg. Technol. Web Intell. 1, 1 (Aug. 2009), 60–76. DOI:https://doi.org/10.4304/jetwi.1.1.60-76
- Emma Haddi, Xiaohui Liu, and Yong Shi. 2013. The role of text pre-processing in sentiment analysis. In International Conference on Information Technology and Quantitative Management.
- Khaled M. Hammouda and Mohamed S. Kamel. 2004. Efficient phrase-based document indexing for web document clustering. IEEE Trans. Knowl. Data Eng. 16, 10 (Oct. 2004), 1279–1296. DOI:https://doi.org/10.1109/TKDE.2004.58
- Yulan He, Chenghua Lin, and Harith Alani. 2011. Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In 49th Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 123–131. Retrieved from https://www.aclweb.org/anthology/P11-1013.
- Aurélie Herbelot and Marco Baroni. 2017. High-risk learning: Acquiring new word vectors from tiny data. CoRR abs/1707.06556 (2017).
- Bruce M. Hill. 1968. Posterior distribution of percentiles: Bayes’ theorem for sampling from a population. J. Amer. Statist. Assoc. 63, 322 (1968), 677–691. Retrieved from http://www.jstor.org/stable/2284038.
- Tin Kam Ho. 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 8 (Aug. 1998), 832–844. DOI:https://doi.org/10.1109/34.709601
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (Nov. 1997), 1735–1780. DOI:https://doi.org/10.1162/neco.1997.9.8.1735
- Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In 56th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 328–339. DOI:https://doi.org/10.18653/v1/P18-1031
- Xia Hu and Huan Liu. 2012. Text Analytics in Social Media. Springer US, Boston, MA, 385–414. DOI:https://doi.org/10.1007/978-1-4614-3223-4_12
- Ah-Hwee Tan. 1999. Text mining: The state of the art and the challenges. In Workshop on Knowledge Discovery from Advanced Databases. 65–70.
- Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. SensEmbed: Learning sense embeddings for word and relational similarity. In Meeting of the Association for Computational Linguistics.
- Suzana Ilic, Edison Marrese-Taylor, Jorge A. Balazs, and Yutaka Matsuo. 2018. Deep contextualized word representations for detecting sarcasm and irony. CoRR abs/1809.09795 (2018).
- Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Reading text in the wild with convolutional neural networks. CoRR abs/1412.1842 (2014).
- Zhao Jianqiang. 2015. Pre-processing boosting Twitter sentiment analysis? DOI:https://doi.org/10.1109/SmartCity.2015.158
- Zhao Jianqiang and Gui Xiaolin. 2017. Comparison research on text pre-processing methods on Twitter sentiment analysis. IEEE Access 5 (2017), 2870–2879.
- Zhao Jianqiang and Gui Xiaolin. 2018. Deep convolution neural networks for Twitter sentiment analysis. IEEE Access 6 (2018). DOI:https://doi.org/10.1109/ACCESS.2017.2776930
- Rie Johnson and Tong Zhang. 2014. Effective use of word order for text categorization with convolutional neural networks. CoRR abs/1412.1058 (2014).
- Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Ling. 8 (2020), 64–77.
- Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. CoRR abs/1607.01759 (2016).
- Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. arXiv:1909.05858.
- Farhan Hassan Khan, Saba Bashir, and Usman Qamar. 2014. TOM: Twitter opinion mining framework using hybrid classification scheme. Decis. Support Syst. 57 (Jan. 2014), 245–257. DOI:https://doi.org/10.1016/j.dss.2013.09.004
- Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014).
- Vandana Korde and C. Namrata Mahender. 2012. Text classification and classifiers: A survey. Retrieved from http://www.airccse.org/journal/ijaia/papers/3212ijaia08.pdf.
- Efthymios Kouloumpis, Theresa Wilson, and Johanna D. Moore. 2011. Twitter sentiment analysis: The good the bad and the OMG! In International AAAI Conference on Web and Social Media.
- Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, and Donald E. Brown. 2019. Text classification algorithms: A survey. CoRR abs/1904.08067 (2019).
- Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In 27th AAAI Conference on Artificial Intelligence (AAAI’13). AAAI Press, 1621–1622. Retrieved from http://dl.acm.org/citation.cfm?id=2891460.2891697.
- Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019).
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942.
- Ray R. Larson. 2010. Introduction to information retrieval. J. Amer. Soc. Inf. Sci. Technol. 61, 4 (Apr. 2010), 852–853. DOI:https://doi.org/10.1002/asi.v61:4
- Paula Lauren, Guangzhi Qu, Feng Zhang, and Amaury Lendasse. 2018. Discriminant document embeddings with an extreme learning machine for classifying clinical narratives. Neurocomputing 277 (2018), 129–138. DOI:https://doi.org/10.1016/j.neucom.2017.01.117
- Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. CoRR abs/1405.4053 (2014).
- Yann LeCun, Y. Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521 (May 2015), 436–444. DOI:https://doi.org/10.1038/nature14539
- Yann Lecun, Leon Bottou, Y. Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (Nov. 1998), 2278–2324. DOI:https://doi.org/10.1109/5.726791
- Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, and Jason Weston. 2017. StarSpace: Embed all the things! arXiv:1709.03856.
- Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. arXiv:1901.08746.
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019).
- Liang-Chih Yu, Jin Wang, K. Robert Lai, and Xuejie Zhang. 2018. Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Trans. Audio, Speech Lang. Proc. 26, 3 (Mar. 2018), 671–681. DOI:https://doi.org/10.1109/TASLP.2017.2788182
- Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In 18th ACM Conference on Information and Knowledge Management (CIKM’09). ACM, New York, NY, 375–384. DOI:https://doi.org/10.1145/1645953.1646003
- Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2015. Learning context-sensitive word embeddings with neural tensor skip-gram model. In 24th International Conference on Artificial Intelligence (IJCAI’15). AAAI Press, 1284–1290. Retrieved from http://dl.acm.org/citation.cfm?id=2832415.2832428.
- Shuhua Liu and Thomas Forss. 2014. Combining N-gram based similarity analysis with sentiment analysis in web content classification. In International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1 (IC3K’14). SCITEPRESS - Science and Technology Publications, Lda, 530–537. DOI:https://doi.org/10.5220/0005170305300537
- Yuanchao Liu, Bingquan Liu, Lili Shan, and Xin Wang. 2018. Modelling context with neural networks for recommending idioms in essay writing. Neurocomputing 275 (2018), 2287–2293. DOI:https://doi.org/10.1016/j.neucom.2017.11.005
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019).
- David M. Magerman. 1995. Statistical decision-tree models for parsing. In 33rd Annual Meeting of the Association for Computational Linguistics (ACL’95). Association for Computational Linguistics, 276–283. DOI:https://doi.org/10.3115/981658.981695
- Danilo P. Mandic and Jonathon Chambers. 2001. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. John Wiley & Sons, Inc., New York, NY.
- Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Meeting of the Association for Computational Linguistics (System Demonstrations). 55–60.
- Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. CoRR abs/1708.00107 (2017).
- Yelena Mejova and Padmini Srinivasan. 2011. Exploring feature definition and selection for sentiment classifiers.
- Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, 51–61. DOI:https://doi.org/10.18653/v1/K16-1006
- Prem Melville, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment analysis of blogs by combining lexical knowledge with text classification. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 1275–1284. DOI:https://doi.org/10.1145/1557019.1557156
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Conference on Advances in Neural Information Processing Systems. 3111–3119.
- Saif Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Association for Computational Linguistics, 321–327. Retrieved from http://aclweb.org/anthology/S13-2053.
- James N. Morgan and John A. Sonquist. 1963. Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc. 58, 302 (1963), 415–434. Retrieved from http://www.jstor.org/stable/2283276.
- Nikola Mrksic, Ivan Vulic, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gasic, Anna Korhonen, and Steve J. Young. 2017. Semantic specialisation of distributional word vector spaces using monolingual and cross-lingual constraints. CoRR abs/1706.00374 (2017).
- T. Mullen and R. Malouf. 2006. A preliminary investigation into sentiment analysis of informal political discourse. AAAI Spring Symposium - Technical Report SS-06-03 (2006), 159–162. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-33747172751&partnerID=40&md5=6b12793b70eae006102989ed6d398fcb.
- Martin Müller, Marcel Salathé, and Per E. Kummervold. 2020. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503 (2020).
- Marwa Naili, Anja Habacha Chaibi, and Henda Hajjami Ben Ghezala. 2017. Comparative study of word embedding methods in topic segmentation. Procedia Comput. Sci. 112 (2017), 340–349.
- Vivek Narayanan, Ishan Arora, and Arjun Bhatia. 2013. Fast and accurate sentiment classification using an enhanced Naive Bayes model. CoRR abs/1305.6143 (2013).
- Usman Naseem. 2020. Hybrid Words Representation for the Classification of Low Quality Text. Ph.D. Dissertation. University of Technology Sydney, Australia.
- U. Naseem, S. K. Khan, M. Farasat, and F. Ali. 2019. Abusive language detection: A comprehensive review. Indian J. Sci. Technol. 12, 45 (2019), 1–13.
- U. Naseem, I. Razzak, and P. W. Eklund. 2020. A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimedia Tools Appl. (2020), 1–28.
- Usman Naseem, Shah Khalid Khan, Imran Razzak, and Ibrahim A. Hameed. 2019. Hybrid words representation for airlines sentiment analysis. In AI 2019: Advances in Artificial Intelligence, Jixue Liu and James Bailey (Eds.). Springer International Publishing, Cham, 381–392.
- Usman Naseem, Matloob Khushi, Shah Khalid Khan, Nazar Waheed, Adnan Mir, Atika Qazi, Bandar Alshammari, and Simon K. Poon. 2020. Diabetic retinopathy detection using multi-layer neural networks and split attention with focal loss. In International Conference on Neural Information Processing. Springer, 1–12.
- Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, and Jinman Kim. 2020. BioALBERT: A simple and effective pre-trained language model for biomedical named entity recognition. arXiv preprint arXiv:2009.09223 (2020).
- Usman Naseem and Katarzyna Musial. 2019. DICE: Deep intelligent contextual embedding for Twitter sentiment analysis. In International Conference on Document Analysis and Recognition (ICDAR’19). IEEE, 953–958.
- U. Naseem, I. Razzak, M. Khushi, P. W. Eklund, and J. Kim. 2021. COVIDSenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis. IEEE Trans. Comput. Soc. Syst. (2021).
- Usman Naseem, Katarzyna Musial, Peter Eklund, and Mukesh Prasad. 2020. Biomedical named-entity recognition by hierarchically fusing BioBERT representations and deep contextual-level word-embedding. In International Joint Conference on Neural Networks (IJCNN’20). IEEE, 1–8.
- Usman Naseem, Imran Razzak, Peter Eklund, and Katarzyna Musial. 2020. Towards improved deep contextual embedding for the identification of irony and sarcasm. In International Joint Conference on Neural Networks (IJCNN’20). IEEE, 1–7.
- Usman Naseem, Imran Razzak, and Ibrahim A. Hameed. 2019. Deep context-aware embedding for abusive and hate speech detection on Twitter. Aust. J. Intell. Inf. Process. Syst. 15, 3 (2019), 69–76.
- Usman Naseem, Imran Razzak, Katarzyna Musial, and Muhammad Imran. 2020. Transformer based deep intelligent contextual embedding for Twitter sentiment analysis. Fut. Gen. Comput. Syst. 113 (2020), 58–69.
- Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. 2015. Efficient non-parametric estimation of multiple embeddings per word in vector space. CoRR abs/1504.06654 (2015).
- Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. arXiv preprint arXiv:2005.10200 (2020).
- Thomas Niebler, Martin Becker, Christian Pölitz, and Andreas Hotho. 2017. Learning semantic relatedness from human feedback using metric learning. CoRR abs/1705.07425 (2017).
- Alexander Pak and Patrick Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In International Conference on Language Resources and Evaluation.
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10 (EMNLP’02). Association for Computational Linguistics, 79–86. DOI:https://doi.org/10.3115/1118693.1118704
- U. Naseem, M. Khushi, S. K. Khan, K. Shaukat, and M. A. Moni. 2021. A comparative analysis of active learning for biomedical text mining. Appl. Syst. Innov. 4, 1 (2021), 23.
- Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML’13). JMLR.org, III–1310–III–1318. Retrieved from http://dl.acm.org/citation.cfm?id=3042817.3043083.
- C. V. Patriche, R. Pirnau, A. Grozavu, and B. Rosca. 2016. A comparative analysis of binary logistic regression and analytical hierarchy process for landslide susceptibility assessment in the Dobrov River Basin, Romania. Pedosphere 26, 3 (2016), 335–350. DOI:https://doi.org/10.1016/S1002-0160(15)60047-9
- Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2227–2237. DOI:https://doi.org/10.18653/v1/N18-1202
- Yuval Pinter, Robert Guthrie, and Jacob Eisenstein. 2017. Mimicking word embeddings using subword RNNs. CoRR abs/1707.06961 (2017).Google Scholar
- Pengda Qin, Weiran Xu, and Jun Guo. 2016. An empirical convolutional neural network approach for semantic relation classification. Neurocomputing 190, C (May 2016), 1–9. DOI:DOI:https://doi.org/10.1016/j.neucom.2015.12.091 Google ScholarDigital Library
- Zhaowei Qu, Xiaomin Song, Shuqiang Zheng, Xiaoru Wang, Xiaohui Song, and Zuquan Li. 2018. Improved Bayes method based on TF-IDF feature and grade factor feature for Chinese information classification. In IEEE International Conference on Big Data and Smart Computing (BigComp’18). 677–680.Google ScholarCross Ref
- J. R. Quinlan. 1987. Simplifying decision trees. Int. J. Man Mach. Stud. 27, 3 (1987), 221–234. DOI:DOI:https://doi.org/10.1016/S0020-7373(87)80053-6 Google ScholarDigital Library
- J. R. Quinlan. 1986. Induction of decision trees. Mach. Learn. 1, 1 (Mar. 1986), 81–106. DOI:DOI:https://doi.org/10.1023/A:1022643204877 Google ScholarDigital Library
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Retrieved from https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf.
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019).
- Arshia Rehman, Saeeda Naz, Usman Naseem, Imran Razzak, and Ibrahim A. Hameed. 2019. Deep autoencoder-decoder framework for semantic segmentation of brain tumor. Aust. J. Intell. Inf. Process. Syst. 15, 3 (2019), 53–60.
- Yafeng Ren, Yue Zhang, Meishan Zhang, and Donghong Ji. 2016. Context-sensitive Twitter sentiment classification using neural network. In 30th AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 215–221. Retrieved from http://dl.acm.org/citation.cfm?id=3015812.3015844.
- Jack Reuter, Jhonata Pereira-Martins, and Jugal Kalita. 2016. Segmenting Twitter hashtags. Int. J. Nat. Lang. Comput. 5 (Aug. 2016), 23–36. DOI:https://doi.org/10.5121/ijnlc.2016.5402
- M. Jaggi, P. Mandal, S. Narang, U. Naseem, and M. Khushi. 2021. Text mining of StockTwits data for predicting stock prices. Appl. Syst. Innov. 4, 1 (2021), 13.
- Seyed Mahdi Rezaeinia, Ali Ghodsi, and Rouhollah Rahmani. 2017. Improving the accuracy of pre-trained word embeddings for sentiment analysis. CoRR abs/1711.08609 (2017).
- Hassan Saif, Marta Fernandez Andres, Yulan He, and Harith Alani. 2013. Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold. In International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI.
- Mohammad Arshi Saloot, Norisma Idris, Nor Liyana Mohd Shuib, Ram Gopal Raj, and AiTi Aw. 2015. Toward tweets normalization using maximum entropy. In Workshop on Noisy User-generated Text, NUT@IJCNLP.
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).
- Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. In Artificial Neural Networks – ICANN 2010, Konstantinos Diamantaras, Wlodek Duch, and Lazaros S. Iliadis (Eds.). Springer Berlin, 92–101.
- Seungil You, David Ding, Kevin Canini, Jan Pfeifer, and Maya Gupta. 2017. Deep lattice networks and partial monotonic functions. arxiv:stat.ML/1709.06680.
- Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’15). ACM, New York, NY, 959–962. DOI:https://doi.org/10.1145/2766462.2767830
- Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arxiv:cs.CL/1909.08053.
- Tajinder Singh and Madhu Kumari. 2016. Role of text pre-processing in Twitter sentiment analysis. https://www.sciencedirect.com/science/article/pii/S1877050916311607.
- R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. 1631–1642.
- Saeid Soheily-Khah, Pierre-François Marteau, and Nicolas Béchet. 2017. Intrusion detection in network systems through hybrid supervised and unsupervised mining process: A detailed case study on the ISCX benchmark dataset.
- Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450 (2019).
- Karen Sparck Jones. 1988. A statistical interpretation of term specificity and its application in retrieval. In Document Retrieval Systems. Taylor Graham Publishing, London, UK, 132–142. Retrieved from http://dl.acm.org/citation.cfm?id=106765.106782.
- Robyn Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An open multilingual graph of general knowledge. CoRR abs/1612.03975 (2016).
- Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 (2019).
- Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In AAAI Conference on Artificial Intelligence. 8968–8975.
- Ilya Sutskever, James Martens, and Geoffrey Hinton. 2011. Generating text with recurrent neural networks. In 28th International Conference on Machine Learning (ICML’11). Omnipress, 1017–1024. Retrieved from http://dl.acm.org/citation.cfm?id=3104482.3104610.
- Jared Suttles and Nancy Ide. 2013. Distant supervision for emotion classification with discrete binary values. In Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 2 (CICLing’13). Springer-Verlag, Berlin, 121–136. DOI:https://doi.org/10.1007/978-3-642-37256-8_11
- Symeon Symeonidis, Dimitrios Effrosynidis, and Avi Arampatzis. 2018. A comparative evaluation of pre-processing techniques and their interactions for Twitter sentiment analysis. Exp. Syst. Applic. 110 (2018), 298–310. DOI:https://doi.org/10.1016/j.eswa.2018.06.022
- Duyu Tang, Bing Qin, Furu Wei, Li Dong, Ting Liu, and Ming Zhou. 2015. A joint segmentation and classification framework for sentence level sentiment classification. IEEE/ACM Trans. Audio, Speech Lang. Process. 23, 11 (2015), 1750–1761.
- Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Trans. Knowl. Data Eng. 28, 2 (Feb. 2016), 496–509. DOI:https://doi.org/10.1109/TKDE.2015.2489653
- Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, 1555–1565. DOI:https://doi.org/10.3115/v1/P14-1146
- Alper Kursat Uysal and Serkan Günal. 2014. The impact of preprocessing on text classification. Inf. Process. Manag. 50 (2014), 104–112.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., 6000–6010. Retrieved from http://dl.acm.org/citation.cfm?id=3295222.3295349.
- Ye Zhang and Byron Wallace. 2017. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing, 253–263.
- Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, and Luo Si. 2019. StructBERT: Incorporating language structures into pre-training for deep language understanding. arxiv:cs.CL/1908.04577.
- Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, and Amit P. Sheth. 2012. Harnessing Twitter “big data” for automatic emotion identification. In International Conference on Privacy, Security, Risk and Trust and International Conference on Social Computing. 587–592.
- Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for aspect-level sentiment classification. In Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 606–615. DOI:https://doi.org/10.18653/v1/D16-1058
- Yuyang Wang, Roni Khardon, and Pavlos Protopapas. 2012. Nonparametric Bayesian estimation of periodic light curves. Astrophys. J. 756, 1 (Aug. 2012), 67. DOI:https://doi.org/10.1088/0004-637x/756/1/67
- Ikuya Yamada, Hideaki Takeda, and Yoshiyasu Takefuji. 2015. Enhancing named entity recognition in Twitter messages using entity linking. In Workshop on Noisy User-generated Text, NUT@IJCNLP.
- Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. CoRR abs/1906.08237 (2019).
- Yukun Zhu, Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. CoRR abs/1506.06724 (2015).