To the Editor—The COVID-19 pandemic has thrust preprints into the spotlight, attracting attention from the media and the public, as well as from scientists. Preprints are articles that have not yet been published in a peer-reviewed journal; because they can still be revised before journal publication, they offer a unique opportunity to improve reporting. The Automated Screening Working Group (https://scicrunch.org/ASWG/about/COVIDPreprint) aims to provide rapid feedback that may help authors of COVID-19 preprints to improve their transparency and reproducibility.

One quarter of all COVID-19 papers have been shared as preprints, most on medRxiv and the remainder on bioRxiv or other servers1. Although preprints allow results to be posted rapidly, the absence of traditional peer review has raised concerns about their quality. Unfortunately, it has been impossible for scientists to keep pace with the thousands of COVID-19 preprints posted since February. Preprints are vetted before posting to confirm that they describe scientific studies and to prevent the posting of material that could damage public health; however, routinely assessing manuscript quality or flagging common reporting problems is not feasible at this scale.

Although automated screening is not a replacement for peer review, automated tools can identify common problems, including failure to state whether experiments were blinded or randomized2, failure to report the sex of participants2 and misuse of bar graphs to display continuous data3. We have been using six tools4,5,6,7,8 to screen all new medRxiv and bioRxiv COVID-19 preprints (Table 1), and new preprints are screened daily9. In this way, reports on more than 8,000 COVID-19 preprints have been shared through the web annotation tool hypothes.is (RRID:SCR_000430) and tweeted out via @SciScoreReports (https://hypothes.is/users/sciscore). Readers can access these reports in two ways: by following the link in the @SciScoreReports tweet, which appears in the preprint’s Twitter feed under the metrics tab, or by installing the hypothes.is bookmarklet. Readers and authors can also reply to the reports, which include information on possible solutions.
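Because the reports are posted as public hypothes.is annotations, they can also be retrieved programmatically. The minimal sketch below uses the public hypothes.is search API to list the screening reports attached to a given preprint URL; the account name ("acct:sciscore@hypothes.is") and the example preprint URL are assumptions made for illustration, not confirmed details of the workflow described above.

```python
# Minimal sketch: retrieve automated screening reports posted as public
# hypothes.is annotations on a preprint. Endpoint and parameters follow the
# public hypothes.is search API; the account name below is an assumption
# based on the user page linked in the text.
import requests

HYPOTHESIS_SEARCH = "https://api.hypothes.is/api/search"


def fetch_reports(preprint_url, user="acct:sciscore@hypothes.is", limit=20):
    """Return the annotation texts that the given account posted on a preprint."""
    params = {"uri": preprint_url, "user": user, "limit": limit}
    response = requests.get(HYPOTHESIS_SEARCH, params=params, timeout=30)
    response.raise_for_status()
    # Each row is one annotation; "text" holds the report body in Markdown.
    return [row["text"] for row in response.json().get("rows", [])]


if __name__ == "__main__":
    # Hypothetical medRxiv landing page used only to illustrate the call.
    example_url = "https://www.medrxiv.org/content/10.1101/2020.01.01.20000000v1"
    for report in fetch_reports(example_url):
        print(report[:200], "...")
```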

Table 1 Tools used to screen COVID-19 preprints

Screening of 6,570 medRxiv and bioRxiv COVID-19 preprints posted before 19 July revealed several interesting results. Open data were shared by 13.6% of preprints and open code by 14.3%, making it easier for others to reuse data or reproduce results. Approximately one third (34.4%) of COVID-19 preprints acknowledged at least one study limitation. Bar graphs of continuous data appeared in 7.3% of preprints; this is problematic because many different datasets can produce the same bar graph, and the actual data may suggest conclusions different from those implied by the summary statistics alone3. Authors should therefore use dot plots, box plots or violin plots instead3. Among papers with color maps, 7.6% used rainbow color maps, which are not colorblind safe and also create visual artifacts for viewers with normal color vision7. Rainbow color maps should be replaced with more-informative color maps that are perceptually uniform and colorblind accessible, such as viridis7 (see the sketch below). An ethics approval statement for human or animal research was present in 1,775 preprints (27%), which suggests that nearly three quarters of COVID-19 preprints are secondary or tertiary analyses, modeling studies or cell line studies that do not require ethics approval. Although there are known sex differences in COVID-1910, only 20% of all COVID-19 preprints, and 38% of preprints with an ethics approval statement, addressed sex as a biological variable. Statements regarding sample size calculations (1.4%), blinding (2.7%) and randomization (11.4%) were uncommon, even among studies that contained a human ethics statement (present in 2.4%, 5.4% and 12.6% of those studies, respectively). Many COVID-19 preprints are modeling studies, however, and hence these criteria are not always relevant. Nonhuman organisms, mainly mice, were used in 6.1% of preprints. Among the 552 preprints that included cell lines, 7% described how the cell lines were authenticated (e.g., short tandem repeat profiling) or kept free of contamination (e.g., mycoplasma detection tests).
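The sketch below illustrates the two plotting recommendations above, assuming matplotlib is available; the data are synthetic and purely illustrative, not taken from the screened preprints.

```python
# Minimal sketch of the two plotting recommendations, with synthetic data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = {"Control": rng.normal(5.0, 1.0, 30), "Treated": rng.normal(6.5, 1.5, 30)}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Left: show the raw continuous data instead of a bar of the group means.
ax1.violinplot(list(groups.values()), showmedians=True)
for i, values in enumerate(groups.values(), start=1):
    jitter = rng.uniform(-0.08, 0.08, len(values))  # spread overlapping points
    ax1.scatter(np.full(len(values), i) + jitter, values, s=12, color="black")
ax1.set_xticks([1, 2])
ax1.set_xticklabels(list(groups.keys()))
ax1.set_ylabel("Measurement (arbitrary units)")
ax1.set_title("Violin + dot plot instead of a bar graph")

# Right: a perceptually uniform, colorblind-accessible color map.
heatmap = rng.random((20, 20))
im = ax2.imshow(heatmap, cmap="viridis")  # rather than a rainbow map such as "jet"
fig.colorbar(im, ax=ax2, label="Value")
ax2.set_title("'viridis' instead of a rainbow color map")

plt.tight_layout()
plt.show()
```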

Our work shows that it is feasible to conduct large-scale automated screening of preprints and to provide rapid feedback to authors and readers. Automated tools are not perfect: they make mistakes, and they cannot always determine whether a problem is relevant to a given paper. Moreover, some problems are too complex for automated tools to detect. Despite these limitations, automated tools can quickly flag potential problems and may complement peer review. We hope that these reports will raise awareness of factors that affect transparency and reproducibility, while helping authors to improve their manuscripts. Further research is needed to determine whether automated tools improve reporting.