figshare
cell_lang_counts.zip (1.9 MB)

Spatial distributions of languages extracted from Twitter

Version 3 2021-04-29, 10:53
Version 2 2021-03-30, 13:35
Version 1 2021-03-30, 13:33
dataset
posted on 2021-03-30, 13:33 authored by Thomas Louf
This is a collection of GeoJSON files containing the number of users of each local language group in every cell of a grid laid over several regions of interest. The cells are squares defined in a projected coordinate system adapted to each country, and their side length is specified in the file names (cell_size=Xm).
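As a minimal sketch of how one of these files might be inspected, the Python snippet below reads the cell side length from a file name and lists the property names found on the grid cells. It assumes only what is stated above (GeoJSON content and a cell_size=Xm fragment in the file name). The sample file name and the property names group_a and group_b are hypothetical placeholders, not the names used in the dataset.

```python
import json
import re


def cell_size_from_name(file_name):
    """Return the cell side length in metres encoded as 'cell_size=<X>m', or None."""
    match = re.search(r"cell_size=(\d+)m", file_name)
    return int(match.group(1)) if match else None


def summarize_cells(geojson_text):
    """Return (number of cells, sorted property names) for a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    features = collection.get("features", [])
    property_names = set()
    for feature in features:
        # "properties" may legally be null in GeoJSON, hence the fallback to {}.
        property_names.update((feature.get("properties") or {}).keys())
    return len(features), sorted(property_names)


# Hypothetical file name and content, for illustration only.
sample_name = "example_cell_size=10000m.geojson"
sample_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"group_a": 12, "group_b": 3}},
        {"type": "Feature", "geometry": None,
         "properties": {"group_a": 0, "group_b": 7}},
    ],
})

print(cell_size_from_name(sample_name))
print(summarize_cells(sample_text))
```

Listing the property names first is a cautious way to discover how the language groups are labelled in a given file before attempting any aggregation.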

These counts were obtained by processing geo-located tweets posted between 2015 and 2019 in these regions. The tweets were collected through Twitter's streaming API, specifically the "statuses/filter" endpoint (see Ref. 1), which provides a real-time sample of tweets matching the supplied filters. Bounding box filters were set to collect tweets from a set of countries of interest. Before reproducing this data collection method, bear in mind that the form and even the availability of this endpoint are subject to future changes by the Twitter developer team. The code used for this processing and for visualizing the data is available on GitHub (see Ref. 2).
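To illustrate what a bounding box filter looks like, the sketch below builds a filter string in the form the "statuses/filter" endpoint accepted in its v1.1 documentation: a comma-separated list of longitude,latitude pairs with the south-west corner of each box first. The parameter name "locations", the helper name locations_param and the example box are assumptions made for illustration; they are not taken from the author's code, and no request is sent.

```python
def locations_param(bounding_boxes):
    """Build a 'locations' filter string from (west, south, east, north) boxes.

    Each box is flattened to: sw_longitude, sw_latitude, ne_longitude, ne_latitude.
    """
    parts = []
    for west, south, east, north in bounding_boxes:
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            raise ValueError("longitudes must lie within [-180, 180]")
        if not (-90 <= south <= north <= 90):
            raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
        parts.extend([west, south, east, north])
    return ",".join(str(value) for value in parts)


# Arbitrary illustrative box, not one of the boxes used for this dataset.
example_boxes = [(-9.5, 36.0, 4.4, 43.8)]
print(locations_param(example_boxes))
```

Several boxes can be passed in one list, which matches the idea of covering a set of countries with one filter. Whether the endpoint still accepts this format should be checked against the current Twitter developer documentation.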

Funding

The authors acknowledge funding from the Spanish Ministry of Science and Innovation, the AEI and FEDER (EU) under the grant PACSS (RTI2018-093732-B-C22) and the Maria de Maeztu program for Units of Excellence in R&D (MDM-2017-0711).

History