Published January 16, 2021 | Version v1
Conference paper Open

MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation

  • 1. Worcester Polytechnic Institute
  • 2. Arizona State University
  • 3. Illinois Institute of Technology

Description

This repository contains COVID-19 fake news content and the social engagements related to it.

There are three JSON files under this repository: 

  1. news_collection.json: the news veracity label, the fact-checking website, the news content URLs, the news content, and the raw news data. 
  2. news_tweet_relation.json: news-related tweets. We utilize the title and the URL of the news content as the search query to retrieve related discussions on Twitter. 
  3. tweet_tweet_relation.json: tweets' replies and retweets. 
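
The record does not document how the three files are laid out on disk, so the following is only a minimal sketch of a loader. It assumes each file is either a single JSON document or newline-delimited JSON. The function name `load_records` is hypothetical and not part of the dataset.

```python
import json
from pathlib import Path


def load_records(path):
    """Return the records stored in `path` as a list.

    Assumption: the file is either one JSON document (array or object)
    or newline-delimited JSON; the record page does not say which.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a top-level non-list value so callers always get a list.
    return data if isinstance(data, list) else [data]
```

For example, `load_records("news_collection.json")` would return the news entries as Python objects. The field names inside each entry should be checked against the downloaded file before relying on them.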

Note that, to comply with Twitter's Developer Agreement and Policy, we release only the tweet IDs. Users can use Twarc to hydrate the tweets from these IDs.
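
Twarc's hydrate command reads a plain-text file with one tweet ID per line. The sketch below shows one way such a file could be produced from already-parsed JSON records. The key name `tweet_id` and both function names are assumptions made for illustration. The actual key used in the relation files should be checked in the downloaded data.

```python
def collect_ids(node, key="tweet_id"):
    """Yield every value stored under `key`, at any nesting depth.

    Assumption: IDs are scalars or lists of scalars; `key` is a
    hypothetical field name, not one confirmed by the dataset.
    """
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                # Accept a single ID or a list of IDs.
                values = v if isinstance(v, list) else [v]
                for value in values:
                    yield str(value)
            else:
                yield from collect_ids(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from collect_ids(item, key)


def write_id_file(records, out_path, key="tweet_id"):
    """Write de-duplicated IDs, one per line; return how many were written."""
    seen = dict.fromkeys(collect_ids(records, key))  # keeps first-seen order
    with open(out_path, "w", encoding="utf-8") as fh:
        for tweet_id in seen:
            fh.write(tweet_id + "\n")
    return len(seen)
```

The resulting file can then be passed to Twarc, for instance `twarc hydrate ids.txt > tweets.jsonl` with twarc v1 or `twarc2 hydrate ids.txt tweets.jsonl` with twarc2. Both require Twitter API credentials to be configured first.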

 

Files

Files (49.8 MB)

Checksum Size
md5:7074898971d953db3d4c6fd9f2e5403c 5.8 MB
md5:422d6508513749a858c22de5da53173d 44.0 MB