Carnegie Mellon University

A Latent Variable Model for Geographic Lexical Variation

journal contribution
Posted on 2010-10-01. Authored by Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing

The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports” or “entertainment” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models.

History

Publisher Statement

© 2010 Association for Computational Linguistics

Date

2010-10-01
