
Automatic assessment of divergent thinking in Chinese language with TransDis: A transformer-based language model approach

  • Original Manuscript
  • Published:
Behavior Research Methods

Abstract

Language models have become increasingly popular for automatic creativity assessment, where they generate semantic distances that objectively measure the quality of creative ideas. However, no automatic assessment system currently exists for evaluating creative ideas in Chinese. To address this gap, we developed TransDis, a scoring system based on transformer-based language models that provides valid originality (novelty) and flexibility (variety) scores for Alternative Uses Task (AUT) responses in Chinese. In Study 1, a latent model-rated originality factor, composed of three transformer-based models, strongly predicted human originality ratings, and model-rated flexibility also correlated strongly with human flexibility ratings. Criterion validity analyses showed that model-rated originality and flexibility correlated positively with other creativity measures, with validity similar to that of human ratings. Studies 2 and 3 showed that TransDis effectively distinguished participants instructed to provide creative vs. common uses (Study 2) and participants instructed to generate ideas in a flexible vs. persistent way (Study 3). Our findings suggest that TransDis can be a reliable, low-cost tool for measuring idea originality and flexibility in Chinese, potentially paving the way for automatic creativity assessment in other languages. We offer an open platform to compute originality and flexibility for AUT responses in Chinese and more than 50 other languages (https://osf.io/59jv2/).
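The semantic-distance idea behind these scores can be illustrated with a minimal sketch. It assumes that the AUT prompt and each response have already been embedded as vectors (for instance, by one of the sentence-transformer models listed in the notes) and defines originality as 1 minus the cosine similarity between prompt and response. The flexibility definition used here (mean pairwise distance among one participant's responses) is a hypothetical stand-in, not necessarily the paper's formula, and the toy vectors are invented for illustration.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_distance(a: Vector, b: Vector) -> float:
    """Semantic distance defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def originality_scores(prompt: Vector, responses: list[Vector]) -> list[float]:
    """Distance of each response embedding from the AUT prompt embedding."""
    return [semantic_distance(prompt, r) for r in responses]


def flexibility_score(responses: list[Vector]) -> float:
    """Hypothetical flexibility: mean pairwise distance among one
    participant's responses (needs at least two responses)."""
    pairs = list(combinations(responses, 2))
    return sum(semantic_distance(a, b) for a, b in pairs) / len(pairs)


# Toy 3-dimensional "embeddings" (invented for illustration only).
prompt = [1.0, 0.0, 0.0]
responses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(originality_scores(prompt, responses))  # [0.0, 1.0, 1.0]
print(flexibility_score(responses))           # 1.0
```

Real sentence embeddings have hundreds of dimensions and rarely yield distances of exactly 0 or 1; the orthogonal toy vectors simply make the arithmetic easy to follow.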


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. According to Beaty & Johnson (2021), raters should familiarize themselves with all facets of creativity, such as uncommonness, remoteness, and cleverness, before making judgments. They should first quickly review all responses to get a sense of which ideas tend to be common and which are unique. The entire scale should be used so that scores are approximately normally distributed. After completing the initial ratings, raters are encouraged to revise them to ensure accuracy.

  2. Sung et al. (2022) constructed and validated a computerized creativity assessment system based on a figure association task, in which originality and flexibility scores were calculated with a Word2Vec language model. This system generated the originality score indirectly, by calculating the semantic distances between a noun extracted from the response and noncreative benchmark responses, an approach that requires word segmentation of multi-word Chinese responses. Therefore, it cannot directly calculate the semantic distance between a response and the AUT prompt, as most English AUT scoring systems do.

  3. An average-measures, two-way random-effects intraclass correlation coefficient (ICC2k) with an absolute-agreement definition was calculated. The two items with the lowest ICCs were subsequently excluded from the analysis.

  4. Word2Vec fastText-chinese (Bojanowski et al., 2016): https://fasttext.cc/docs/en/pretrained-vectors.html.

  5. BERT bert-base-Chinese (Devlin et al., 2018): https://huggingface.co/bert-base-chinese.

  6. SBERT paraphrase-multilingual-mpnet-base-v2 (Reimers & Gurevych, 2020): https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2.

  7. SBERT paraphrase-multilingual-MiniLM-L12-v2 (Reimers & Gurevych, 2020): https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.

  8. SimCSE simcse-chinese-roberta-wwm-ext (Gao et al., 2021): https://huggingface.co/cyclone/simcse-chinese-roberta-wwm-ext.

  9. https://dumps.wikimedia.org.

  10. The semantic distances calculated from BERT's [CLS] embeddings showed near-zero correlations with human ratings. Therefore, we included only the mean-pooling strategy for BERT.

  11. https://huggingface.co/hfl/chinese-roberta-wwm-ext.

  12. The results remained consistent when applying top-2 scoring instead of top-3 scoring (see Supplementary Table 2). SBERT_mpnet, SBERT_MiniLM, and SimCSE consistently performed best in scoring the originality of responses to the bedsheet and toothbrush AUT items.

  13. A sensitivity analysis with different prior information is presented in Supplementary Table 1. The correlations between latent human-rated originality and latent model-rated originality remained positive.

  14. As Supplementary Fig. 1 shows, the correlation between latent semantic-distance originality and human-rated originality remained nearly the same when using top-2 instead of top-3 scoring (r = .87, p < .001).

  15. Word2Vec word count was determined after the removal of stop words.
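The ICC2k mentioned in note 3 is conventionally computed from a two-way ANOVA decomposition of an n targets × k raters matrix as ICC(2,k) = (MS_R - MS_E) / (MS_R + (MS_C - MS_E) / n), where MS_R, MS_C, and MS_E are the between-targets, between-raters, and residual mean squares. The plain-Python sketch below implements that standard formula under the assumption of complete data (no missing ratings); it is an illustration, not the authors' analysis code, and the example ratings are invented.

```python
def icc2k(ratings: list[list[float]]) -> float:
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    ratings[i][j] is rater j's score for target i (n targets x k raters).
    Assumes a complete matrix with n >= 2 and k >= 2.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets (rows), raters (columns), total, and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-targets mean square
    ms_cols = ss_cols / (k - 1)                # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Rater 2 is always exactly one point higher than rater 1.
print(icc2k([[1, 2], [2, 3], [3, 4]]))  # 0.8
```

Because the definition is absolute agreement, the constant offset between the two raters lowers the coefficient below 1 even though their rankings agree perfectly; a consistency-type ICC would ignore that offset.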
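Note 10 contrasts two ways of turning BERT's token-level hidden states into one vector per response: taking the vector of the [CLS] token, or mean pooling over all non-padding tokens. The minimal sketch below shows both, assuming the hidden states and attention mask have already been obtained (for example, from the `last_hidden_state` of a Hugging Face `transformers` model) and are passed in as plain lists; the toy numbers are invented.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors, ignoring padding positions (mask == 0).

    Assumes at least one position has mask == 1.
    """
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cls_vector(token_embeddings: list[list[float]]) -> list[float]:
    """[CLS] strategy: use only the first token's vector."""
    return list(token_embeddings[0])


# Toy hidden states for "[CLS] tok1 tok2 [PAD]" with 2-dimensional vectors.
tokens = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [1.3333333333333333, 2.0]
print(cls_vector(tokens))       # [0.0, 0.0]
```

Mean pooling lets every content token contribute to the response vector, whereas the [CLS] vector of a model not fine-tuned for sentence similarity carries little usable semantic information, which is consistent with the near-zero correlations reported in note 10.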
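Notes 12 and 14 compare top-3 and top-2 scoring. This sketch assumes the common reading of top-k scoring, in which a participant's score is the mean of their k highest-scoring responses. How TransDis selects those responses, and how it treats participants with fewer than k responses (here, by averaging whatever is available), are assumptions of the sketch; the `top_k_score` helper and the example scores are hypothetical.

```python
def top_k_score(scores: list[float], k: int = 3) -> float:
    """Hypothetical helper: mean of a participant's k highest response scores.

    If the participant gave fewer than k responses, all of them are averaged.
    """
    if not scores:
        raise ValueError("participant has no scored responses")
    best = sorted(scores, reverse=True)[:k]
    return sum(best) / len(best)


# Invented originality scores for one participant's five responses.
scores = [0.62, 0.81, 0.74, 0.55, 0.90]
print(top_k_score(scores, k=3))  # mean of 0.90, 0.81, 0.74
print(top_k_score(scores, k=2))  # mean of 0.90, 0.81
```

Top-k scoring keeps fluency (the number of responses) from inflating the originality score, since only a fixed number of best responses counts toward it.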

References


Author note

This research was supported by a grant from the Chinese National Natural Science Foundation (32271125) awarded to Yubo Hou. We have no conflicts of interest to disclose.

Author information

Correspondence to Yubo Hou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(DOCX 152 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, T., Zhang, Q., Sun, Z. et al. Automatic assessment of divergent thinking in Chinese language with TransDis: A transformer-based language model approach. Behav Res (2023). https://doi.org/10.3758/s13428-023-02313-z

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.3758/s13428-023-02313-z

Keywords
