Abstract
The information published by millions of public social network users is an important source of knowledge. It can be used in academic, socioeconomic or demographic studies (distribution of the male and female population, age, marital status, birth), in lifestyle analysis (interests, hobbies, social habits), or to study online behavior (time spent online, interaction with friends, or discussion of brands, products or politics). This work uses a database of about 27 million geolocated Portuguese tweets, produced in Portugal by 97.8K users over a 1-year period, to extract information about the behavior of the geolocated Portuguese Twitter community. We show that this information makes it possible to extract overall indicators such as: the daily periods of increased activity per region; a prediction of the regions where population concentration is higher or lower in certain periods of the year; how regional inhabitants feel about life; and what is talked about in each region. We also analyze the behavior of geolocated Portuguese Twitter users based on the tweeted contents, and find indications that their behavior differs in certain relevant aspects from that of other Twitter communities. We hypothesize that this is partly due to the abnormally high percentage of young teenagers in the community. Finally, we present a small case study on Portuguese tourism in the Algarve region. To the best of our knowledge, this is the first study of geolocated Portuguese users’ behavior on Twitter that focuses on regional geographic use.
Acknowledgments
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) under project PTDC/IVC-ESCT/4919/2012 (MISNIS) and funds with reference UID/CEC/50021/2013.
Cite this article
Brogueira, G., Batista, F. & Carvalho, J.P. Using geolocated tweets for characterization of Twitter in Portugal and the Portuguese administrative regions. Soc. Netw. Anal. Min. 6, 37 (2016). https://doi.org/10.1007/s13278-016-0347-8
DOI: https://doi.org/10.1007/s13278-016-0347-8