Published July 27, 2017 | Version v1
Dataset Open

Dataset for ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)

  • 1. PRHLT, Universitat Politècnica de València, Spain

Description

Train-A: Dataset of pages with manually revised baselines and their corresponding transcripts. This batch is small: 50 pages. Please keep in mind that only the baselines have been manually corrected. The polygons associated with each line have not been manually reviewed.

Train-B: Dataset of pages without any layout or text line information. The corresponding transcripts are provided at page level, with line breaks, in PAGE format. It has 10k pages, divided for convenience into two batches of 5k pages each.

Test-A: Dataset of pages with manually revised baselines. This batch has 65 pages. The polygons associated with each line have not been manually reviewed.

Test-B1: The same pages as Test-A, but annotated only with the geometry of regions. Text line information is not provided.

Test-B2: Dataset of page images annotated with the geometry of the regions in which text lines must be detected and recognized. It has 57 pages.

Baseline.tgz: Baseline system trained on the first 40 pages of Train-A. The system is based on Laia, a deep learning toolkit for transcribing handwritten text images.

More information at:

https://scriptnet.iit.demokritos.gr/competitions/~icdar2017htr/
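
Since the description mentions PAGE format, the following is a minimal sketch of how such a file might be read in Python. It assumes the usual PAGE XML layout (TextRegion and TextLine elements, a Baseline with a "points" attribute, and TextEquiv/Unicode for the text); the exact structure of the files in this dataset has not been verified here. The names read_page and SAMPLE, and the sample document itself, are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical PAGE-like document, used only to illustrate the parsing below.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Baseline points="10,20 90,22"/>
        <TextEquiv><Unicode>first line</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>first line
second line</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def local(tag):
    """Strip the XML namespace so the code does not depend on the schema version."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child of elem with the given local name, or None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def text_of(elem):
    """Return the TextEquiv/Unicode string directly under elem, or ''."""
    equiv = child(elem, "TextEquiv")
    if equiv is None:
        return ""
    uni = child(equiv, "Unicode")
    return (uni.text or "") if uni is not None else ""


def parse_points(points):
    """Turn a 'points' attribute ("x1,y1 x2,y2 ...") into a list of (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_page(xml_text):
    """Collect line-level baselines/text and region-level transcripts.

    Returns (lines, region_texts). lines is empty when the page carries no
    text line information; region_texts holds one list of transcript lines
    per region that has a transcript of its own.
    """
    root = ET.fromstring(xml_text)
    lines, region_texts = [], []
    for elem in root.iter():
        name = local(elem.tag)
        if name == "TextLine":
            baseline = child(elem, "Baseline")
            lines.append({
                "id": elem.get("id"),
                "baseline": parse_points(baseline.get("points"))
                if baseline is not None else [],
                "text": text_of(elem),
            })
        elif name == "TextRegion":
            text = text_of(elem)
            if text:
                # Page-level transcripts use line breaks to separate lines.
                region_texts.append(text.splitlines())
    return lines, region_texts
```

To use it on a real file, pass the file contents (read as bytes) to read_page. For pages without text line information, lines would be expected to come back empty, with the transcript available only through region_texts, if the files follow the assumed layout.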

 

Files

Files (4.0 GB)

Checksum                                Size
md5:5ef6d6d9a1be6785686559d6f8c9b67a    22.1 MB
md5:f989a3f056d1b830564594a576b4dc75    70.9 MB
md5:6bea580c2fdcae850041738bc03d8c1c    70.8 MB
md5:0bea41d3beab30431fdb3ad01f5929ab    48.0 MB
md5:e46c7019f8ac639b796ecb8d872fd481    21.4 MB
md5:e11b9d0cb97169d64069268a23e90ef2    1.9 GB
md5:93ea0b7285f65c8438155e9490c691ed    1.9 GB
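
The MD5 values listed above can be used to check that a download completed without corruption. The sketch below shows one way to do this with Python's standard hashlib module; the function names file_md5 and verify are hypothetical, and the file path is whatever name the archive was saved under locally.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return file_md5(path) == expected_hex
```

For example, verify("downloaded_archive.tgz", "md5:5ef6d6d9a1be6785686559d6f8c9b67a") would return True only if the local file (the name here is a placeholder) matches the first checksum in the list.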

Additional details

Funding

READ – Recognition and Enrichment of Archival Documents (grant 674943), European Commission