Overview

Authors:

Anders Søgaard (University of Copenhagen, Denmark)
Ivan Vulić (University of Cambridge, UK)
Sebastian Ruder (DeepMind, USA)
Manaal Faruqui (Google Assistant, USA)

Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)

569 Accesses
11 Citations

Table of contents (12 chapters)

All chapters are by Anders Søgaard, Ivan Vulić, Sebastian Ruder, and Manaal Faruqui.

- Front Matter, pages i-xi
- Introduction, pages 1-7
- Monolingual Word Embedding Models, pages 9-12
- Cross-Lingual Word Embedding Models: Typology, pages 13-20
- A Brief History of Cross-Lingual Word Representations, pages 21-32
- Word-Level Alignment Models, pages 33-47
- Sentence-Level Alignment Methods, pages 49-53
- Document-Level Alignment Models, pages 55-57
- From Bilingual to Multilingual Training, pages 59-65
- Unsupervised Learning of Cross-Lingual Word Embeddings, pages 67-74
- Applications and Evaluation, pages 75-81
- Useful Data and Software, pages 83-88
- General Challenges and Future Directions, pages 89-91
- Back Matter, pages 93-120

About this book

The majority of natural language processing (NLP) is English language processing, and while there is good language technology support for (standard varieties of) English, support for Albanian, Burmese, or Cebuano, and for most other languages, remains limited. Bridging this digital divide is important for scientific and democratic reasons, and it also represents enormous growth potential. A key challenge in doing so is learning to align the basic meaning-bearing units of different languages.

In this book, the authors survey and discuss recent and historical work on supervised and unsupervised learning of such alignments. Specifically, the book focuses on so-called cross-lingual word embeddings. The survey is intended to be systematic: it uses consistent notation and puts the available methods in a comparable form, which makes it easy to compare wildly different approaches. In doing so, the authors establish previously unreported relations between these methods and are able to present a fast-growing literature in a very compact way. The authors also discuss how best to evaluate cross-lingual word embedding methods and survey the resources available to students and researchers interested in this topic.

Authors and Affiliations

University of Copenhagen, Denmark
Anders Søgaard

University of Cambridge, UK
Ivan Vulić

DeepMind, USA
Sebastian Ruder

Google Assistant, USA
Manaal Faruqui

About the authors

Anders Søgaard is a father of three and a published poet, as well as a Full Professor in Computer Science at the University of Copenhagen. He is currently funded by the Novo Nordisk Foundation, the Lundbeck Foundation, and the Innovation Fund Denmark; before that, he held an ERC Starting Grant and a Google Focused Research Award. He has won best paper awards at NAACL, EACL, and CoNLL, among others. He previously wrote Semi-Supervised Learning and Domain Adaptation in NLP (Morgan & Claypool, 2013) and Cross-Lingual Word Embeddings (Morgan & Claypool, 2019), the latter with co-authors Ivan Vulić, Sebastian Ruder, and Manaal Faruqui.
Ivan Vulić has been a Senior Research Associate in the Language Technology Lab at the University of Cambridge since 2015. He holds a Ph.D. in Computer Science from KU Leuven, awarded summa cum laude in 2014, on "Unsupervised Algorithms for Cross-lingual Text Analysis, Translation Mining, and Information Retrieval." He is interested in representation learning, human language understanding, distributional, lexical, and multi-modal semantics in monolingual and multilingual contexts, and transfer learning for enabling cross-lingual NLP applications. He has co-authored more than 60 peer-reviewed research papers published in top-tier journals and conference proceedings in NLP and IR. He co-lectured a tutorial on monolingual and multilingual topic models and applications at ECIR 2013 and WSDM 2014, a tutorial on word vector space specialisation at EACL 2017 and ESSLLI 2018, a tutorial on cross-lingual word representations at EMNLP 2017, and a tutorial on deep learning for conversational AI at NAACL 2018.
Sebastian Ruder is a Research Scientist at DeepMind. He obtained his Ph.D. in Natural Language Processing at the National University of Ireland, Galway in 2019. He is interested in transfer learning and cross-lingual learning and has published widely read reviews as well as more than ten peer-reviewed research papers in top-tier conference proceedings in NLP.
Manaal Faruqui is a Senior Research Scientist at Google, working on industrial-scale NLP and ML problems. He obtained his Ph.D. at the Language Technologies Institute at Carnegie Mellon University, where he worked on representation learning, multilingual learning, and distributional and lexical semantics. He received a best paper award at NAACL 2015 for his work on incorporating semantic knowledge in word vector representations. He serves on the editorial board of the Computational Linguistics journal and has been an area chair for several ACL conferences.

Bibliographic Information

Book Title: Cross-Lingual Word Embeddings
Authors: Anders Søgaard, Ivan Vulić, Sebastian Ruder, Manaal Faruqui
Series Title: Synthesis Lectures on Human Language Technologies
DOI: https://doi.org/10.1007/978-3-031-02171-8
Publisher: Springer Cham
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 9
Copyright Information: Springer Nature Switzerland AG 2019
Softcover ISBN: 978-3-031-01043-9 (published 05 June 2019)
eBook ISBN: 978-3-031-02171-8 (published 31 May 2022)
Series ISSN: 1947-4040
Series E-ISSN: 1947-4059
Edition Number: 1
Number of Pages: XI, 120
Topics: Artificial Intelligence, Natural Language Processing (NLP), Computational Linguistics
