DOI: 10.3115/1072228.1072373

Building a large-scale annotated Chinese corpus

Published: 24 August 2002

ABSTRACT

In this paper we address issues related to building a large-scale Chinese corpus. We try to answer four questions: (i) how to speed up annotation, (ii) how to maintain high annotation quality, (iii) for what purposes the corpus is applicable, and finally (iv) what future work we anticipate.

Published in: COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1, August 2002, 1184 pages

Publisher: Association for Computational Linguistics, United States


Overall Acceptance Rate: 1,537 of 1,537 submissions (100%)
