Introduction

ChatGPT is a conversational agent released by the US company OpenAI in November 2022. It is a generative AI tool based on large language models and pre-trained on a massive amount of data from the internet. The current version of ChatGPT (as of August 2023) makes use of GPT-4, a language model with reportedly more than a trillion parameters adjusted through machine learning. Soon after its release, ChatGPT received worldwide attention because its results are often impressive, and it is free and easy to use. ChatGPT is the most prominent of several generative AI tools that harvest masses of internet data and provide easily usable interfaces, allowing for language as both input and output. With these tools, users can ask any kind of question, and the tools provide written answers. Users can then re-formulate their questions, specify their expectations for answers, and build on former utterances.

These AI tools not only learn from their massive textual foundation but also from users’ reactions, so users can have an almost human-like conversation with the chatbot. Generative AI tools, however, do not possess any conceptual knowledge and have no conscious understanding of the world. They may produce text that looks adequate at first glance but lacks truth and validity or refers to entities that do not exist. With their own utterances, users need to provide relevant concepts as input, which the AI can use as a starting point for its output. These prompts influence the quality of the output and its fit to users’ needs and expectations.
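To illustrate this conversational loop concretely, the following minimal sketch shows how a question and a subsequent re-formulation build on former utterances via the accumulated message history. It assumes the OpenAI Python client and access to a GPT-4 model; the model name and the prompts are purely illustrative, not prescriptive.

```python
# Minimal sketch of a multi-turn exchange with a generative AI tool.
# Assumes the OpenAI Python client and a valid API key in the environment;
# the model name and the prompts are illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI()

# The running message list is what allows later prompts to build on former utterances.
messages = [{"role": "user", "content": "Explain how planes can fly."}]
first = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# The user re-formulates the question and specifies expectations for the answer.
messages.append({"role": "user",
                 "content": "Please shorten that explanation to three sentences for a lay audience."})
second = client.chat.completions.create(model="gpt-4", messages=messages)
print(second.choices[0].message.content)
```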

Current generative AI tools, however, offer only a first glimpse of what AI-based conversational agents will probably become in the future. What do these new developments mean for learning, education, and knowledge-related processes? Do they ultimately impede human learning, or promote it? In the corresponding debate, positions range from demands for general bans to calls for the ubiquitous use of such tools in education (for an overview see Kasneci et al., 2023; see also Cooper, 2023). We are convinced that CSCL research can make important contributions to this debate by asking the right questions and reflecting on what is already known. The general question that arises is: What is the human part of communication and text production, and what could be the AI part?

From a CSCL perspective, the interaction of multiple agents is relevant to communication, knowledge construction, and written composition. Communication in this context refers to an information-exchange process in which meaning is constituted and in which new knowledge can emerge in a group, which we refer to as knowledge construction. The concept of knowledge used here is therefore not the purely individual one that has a long tradition in philosophy and psychology, but a social concept of knowledge (for an overview see Oeberst et al., 2016). If AI in the form of large language models communicates so much like humans, is it not also an agent (i.e., an acting entity) that can interact with human agents in a way that lets new knowledge emerge? Such an interaction would encompass true collaboration, that is, an interaction in which the actors involved share processes, goals, and outcomes (cf. Jeong et al., 2017)—even though the AI has no conscious mind of its own. What does this interplay between people and AI look like, and how can it be described? How can the potentially complementary capabilities of humans and AI be leveraged (Dellermann et al., 2019)?

We want to contribute to this discussion from a CSCL and Learning Sciences perspective. We aim to stimulate the CSCL community to harvest decades of CSCL research and apply its insights to conversational AI tools. Which findings from CSCL research are applicable, what are potential pitfalls, and what new research questions will CSCL have to deal with in the future? Our reflections mainly borrow from a cognitive-constructivist perspective, but selectively also touch on socio-cultural, technological, and ethical considerations.

Writing and co-construction of knowledge

Key CSCL approaches consider writing not just as a product of a knowledgeable writer, but as a means for learning and knowledge communication. Scardamalia and Bereiter (1987) distinguish knowledge telling from knowledge transforming. After receiving a writing assignment, novice writers engage in knowledge telling: they search their long-term memory for content that fits the appropriate genre and then write down what they know in a relatively linear way. The writing of experts tends to encompass knowledge transforming, which is more complex. Here, the assignment creates two problem spaces, one regarding content, the other regarding rhetoric. While writing, writers repeatedly analyze the problem by considering content and rhetoric, which leads to continuous revisions and updated memory searches. This may create not only a coherent text but also new knowledge in the individual writers. It transforms their knowledge into complex relational argumentation structures and can eventually induce conceptual change (Andriessen et al., 2013; Kimmerle et al., 2021). Research on self-explanation shows that the development of argument structures can be supported by prompts that enable learners to use typical components of argumentation, such as statements of theory and evidence, statements of alternative theories, and rebuttals and counterarguments against the original theory (Schworm & Renkl, 2007; Toulmin, 1958). In CSCL environments, scripts that include such prompts can be implemented (Stegmann et al., 2007).

In their knowledge building approach, Scardamalia and Bereiter (2006) use these argumentation processes to support knowledge development and increase understanding in groups of learners. Learners exchange individual observations, hypotheses, or explanations for a given problem, such as the issue of how planes can fly. Prompts help them to classify their contributions and create an information structure in which the group proceeds to build knowledge by fading out hypotheses that are not well supported and strengthening those that have more validity. Here argumentation is not just a means for individual learning, but for collaborative knowledge progression, in which the whole group reaches improved understanding over time. Like other CSCL theories, knowledge building stresses that knowledge construction does not take place only in individual learners’ minds. Stahl (2006) refers to similar knowledge progression in the group cognition approach. Through the interaction of their members, small groups can reach new understandings. Signs of such emergent processes are instances in which people take up ideas from others (Suthers et al., 2010) or stimulate new and shared insights, for instance by using analogies (Roschelle & Teasley, 1995). Going a step further, Trausan-Matu (2009) considers learning and understanding a social and interindividual process in which multiple voices come into contact. Utterances of different contributors produce resonance and create a polyphony of “different voices singing variously on a single theme” (Bakhtin, 1986, p. 42; see also Koschmann, 1999). Following this idea, Trausan-Matu et al. (2014) advocated learning settings in which this polyphony is supported through divergent and convergent phases of knowledge construction. This can be particularly productive when the participants begin to pull in the same direction in terms of argumentation.

The co-evolution model describes knowledge construction as a structural coupling of meaning-based cognitive and social systems (Cress & Kimmerle, 2008; Kimmerle et al., 2015). An individual’s cognitions constitute an autopoietic cognitive system (Maturana & Varela, 1991; see also Varela et al., 1974): Cognition and knowledge-related processes stimulate further cognitions, perceptions, and interpretations. Social systems are formed by communication, in which utterances lead to further utterances, also constituting autopoietic systems (Luhmann, 1986). Cognition and communication each have their own logic, but both use language as a medium and can influence each other. Cognitions (within cognitive systems) stimulate utterances (among people), and utterances, in turn, stimulate cognitions. This can be particularly the case when cognitions and utterances do not simply match, that is, when expectations are not entirely fulfilled but slightly violated. In this case, cognitive and social systems can irritate each other. This corresponds to the system-theoretical view of what Piaget (1977) calls socio-cognitive conflicts (see also Mugny & Doise, 1978). These mutual irritations may lead to individual learning and collective knowledge construction. Irritations and productive friction are processes that trigger knowledge development (Holtz et al., 2018).

Implications for collaborative learning with generative AI tools

To what extent do the considerations presented so far apply to communication and knowledge construction with generative AI models? How is cognition distributed when one of the interacting agents acts like a human individual but is a non-conscious entity (for an elaborated approach to distributed cognition in terms of information processing dispersed across humans, technologies, and their environment, see Hollan et al., 2000; Hutchins, 1995)? Does the polyphony of different voices still hold when some of these voices are not even human? Is it even communication if a language-producing system makes contributions but is not a cognitive system?

The preceding overview of some key insights from CSCL research raises several issues that need to be considered when reflecting on ChatGPT and similar tools for knowledge construction purposes. We argue that processes that previous research attributed exclusively to human–human interaction can, to some extent, be carried over to human-AI interaction: Based on their own knowledge and information needs, humans phrase questions for an AI agent. They aim to use (supposedly) appropriate prompts that stimulate output from the agent and thereby enter into a conversation. But since AI agents do not represent meaning-based cognitive systems, they are not meaning-constituting partners; they simply rely on word patterns and frequencies.

Thus, generative AI models can be used as answering tools that provide content in response to user questions, but they differ from search engines, which deliver concrete references to links, texts, or pictures from the internet that users can scrutinize for reliability and validity. Instead, generative AI tools are agents with vast amounts of associations that construct new output for each interaction. Users’ questions and statements trigger this text production. Unlike humans, AI agents do not conceptually understand this input or derive meaning from it. They provide answers to virtually every prompt, but the validity of these answers is not guaranteed. With these strengths and weaknesses, they can indeed serve as interaction partners. This partnership, however, requires that users not only ask questions and receive information from an AI agent, but also aim for a dialog in which both partners exchange information and stimulate each other’s contributions and reflections. These considerations are in line with the ICAP framework (Chi & Wylie, 2014), which describes different modes of learners’ engagement. The framework indicates that learners who become increasingly engaged with learning tools or material move from a passive to an active, then a constructive, and finally an interactive mode, accompanied by increasing learning gains. Transferring these modes to dealing with generative AI should be relatively straightforward, as experienced users of generative AI may reach an increasingly interactive mode over time.

For such a dialog between human users and AI agents, users must be aware that communication and text creation refer not only to content knowledge but also to rhetorical knowledge. Writers have conceptions of different genres that affect their writing. It makes a difference whether the goal of a discourse is drafting an argumentative essay or an election speech. The two differ in the kinds of arguments used, the choice of language, the complexity of sentences, and the length of the text. However, humans are not the only ones who possess and make use of rhetorical knowledge; AI tools have rhetorical capabilities as well. If asked for different genres, they likewise produce diverse types of text. To optimally support joint processes of knowledge transforming (not just knowledge telling), users need to be able to translate between a content problem space and a rhetorical problem space. Users need prompts for each of the problem spaces, for example, “Imagine you are an expert and have to explain to 10-year-old students how planes can fly” for the rhetorical problem space, or “Are there any facts that speak against this explanation of how planes fly?” for the content problem space.
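As a hedged sketch of how users might alternate between these two problem spaces within one conversation, the following Python snippet sends the two example prompts above in sequence, so that the content prompt operates on the explanation produced by the rhetorical prompt. The OpenAI client and GPT-4 model name are assumptions, and the ask() helper is a hypothetical convenience, not part of any library.

```python
# Sketch: alternating prompts for the rhetorical and the content problem space.
# Assumes the OpenAI Python client; the ask() helper is a hypothetical convenience.
from openai import OpenAI

client = OpenAI()
history = []

def ask(prompt: str) -> str:
    """Send a prompt within the ongoing conversation and store the reply."""
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# Rhetorical problem space: shape genre and audience.
ask("Imagine you are an expert and have to explain to 10-year-old students how planes can fly.")

# Content problem space: probe the validity of that explanation.
print(ask("Are there any facts that speak against this explanation of how planes fly?"))
```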

Previous research on argumentation and knowledge building may provide a basis for identifying optimal prompts that support human-AI communication. Conceptual and empirical research on prompting processes in human-AI communication is urgently needed. In particular, a better understanding is required of how prompts for content and prompts for rhetorical issues influence the AI’s output and the resulting knowledge construction. The CSCL community needs to develop procedures for converting the prevalent knowledge telling activities into knowledge transforming, in which both partners take up information from each other and reciprocally link and integrate it. For this purpose, ethnomethodological approaches (Garfinkel, 1967), as they have been applied to observing knowledge construction in human groups (see Stahl, 2012), could be further developed and adjusted to human-AI interaction. Such an approach could look at individual or multiple users in their actual interactions with ChatGPT, embedded in their everyday lives, to understand how they accomplish these interactions. The assessment of the situational practices observed in this process could be complemented, for example, by interviews and questionnaires to provide a more complete understanding of these practices.

For optimal interactions, humans need awareness of their AI partner: What text base was used to train the tool? What kind of data and information does the AI rely on, and how does it create its output? Research has described the relevance of group awareness (the perception of certain characteristics of a group and its members) for interaction among humans (e.g., Engelmann et al., 2009). In this new format of human-AI interactions, the awareness concept needs to be refined to allow for more asymmetrical relationships. Humans may know about an AI’s competencies and shortcomings in advance as part of their digital literacy. But during collaboration there may also be indicators that both partners must identify and interpret. It is an open question whether only the human partner or also the AI partner can develop awareness of the other—depending in part on whether one assumes that awareness requires consciousness. Research needs to consider how communication should unfold to allow the AI partner to develop a kind of theory of mind (Holterman & van Deemter, 2023; see also Trott et al., 2023), meaning that it would be able to recognize the mental states of its interaction partners (including thoughts, feelings, expectations, and motivation), allowing for perspective taking. This would enable communication that moves beyond knowledge telling and the exchange of information in the direction of knowledge transforming.

As the AI partner simply relies on associations and does not derive meaning, the question arises whether knowledge construction occurs solely with respect to the human agents, with the AI merely providing new information and inducing irritations, or whether knowledge construction happens between both agents. While previous research used ethnographic methods to describe moments of shared understanding among human agents, future research should adapt these methods to human-AI collaborations. Can instances be identified in which an AI, with the help of human partners, comes to new insights? Can shared knowledge construction be induced? Or is human-AI collaboration still a situation in which both partners stimulate and irritate each other in such a way that knowledge development happens only within but not between agents?

Addressing these far-reaching questions requires the full methodological breadth that CSCL and the Learning Sciences have to offer. Beyond ethnographic and ethnomethodological research and discourse processing, experimental studies with human-AI dyads or groups could compare the efficacy of particular prompts and communication strategies for human learning and knowledge construction. In doing so, concrete borrowings can also be made from human–computer interaction research. Moreover, cognitive modeling can simulate people’s thinking and their mental information processing while interacting with generative AI in increasingly sophisticated computerized models. Large-scale studies with simulated human participants that use particular prompts could also test how AI partners react, and how and to what extent they are able to adapt to people’s knowledge.
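As a sketch of what such a study with simulated participants might look like, the following snippet systematically varies the prompts that scripted “participants” send and logs a simple feature of the AI’s replies for later analysis. The prompt variants, the logged measure, and the file name are hypothetical placeholders rather than a validated research design; the OpenAI Python client is assumed.

```python
# Sketch of a simulated-participant study: scripted prompt variants are sent to
# the AI partner and a simple feature of each reply is logged for later analysis.
# Prompt variants, the logged measure, and the output file are hypothetical.
import csv
from openai import OpenAI

client = OpenAI()

PROMPT_VARIANTS = {
    "novice": "I know nothing about physics. How can planes fly?",
    "expert": "As an aerodynamicist, summarize the main lift mechanisms for me.",
}

with open("simulated_runs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["variant", "run", "reply_words"])
    for variant, prompt in PROMPT_VARIANTS.items():
        for run in range(3):  # repeated runs per variant to capture variability
            reply = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
            ).choices[0].message.content
            writer.writerow([variant, run, len(reply.split())])
```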

This vast array of potentially helpful methods can be cast into concrete research programs. As a first step, people (individuals but also small groups) should be systematically and extensively observed in their actual engagement with language-based AI tools. In such exploratory studies, it can be observed and analyzed how people interact with ChatGPT and other tools and what they believe these tools can and cannot do. For these examinations, conversation analysis can be used, as well as targeted analysis of uptake events (Suthers et al., 2010). In addition, participants can be asked to think aloud while collaboratively writing with the tools. In-depth interviews with participants could be a good way to get them to comment on and reflect on their behavior. Finally, users could be examined to determine whether they actually developed an understanding (e.g., of how planes can fly) or whether completely new ideas were developed in the interaction.
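Purely as an illustration of how uptake events might be operationalized computationally, the following sketch flags turns in a human-AI transcript that re-use content words from the partner’s immediately preceding turn. This naive lexical-overlap heuristic is our own simplification for illustration and is not the uptake analysis proposed by Suthers et al. (2010); the example transcript and thresholds are hypothetical.

```python
# Naive sketch: flag possible uptake events in a human-AI transcript as turns
# that re-use content words from the partner's immediately preceding turn.
# A simplification for illustration only, not the method of Suthers et al. (2010).

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "are", "so", "that", "what"}

def content_words(turn: str) -> set[str]:
    """Lowercase the turn, strip simple punctuation, and drop stopwords."""
    return {w.strip(".,?!").lower() for w in turn.split()} - STOPWORDS

def uptake_events(transcript, min_overlap=2):
    """Yield (turn_index, shared_words) for turns that take up words from the prior turn."""
    for i in range(1, len(transcript)):
        shared = content_words(transcript[i][1]) & content_words(transcript[i - 1][1])
        if len(shared) >= min_overlap:
            yield i, shared

transcript = [
    ("human", "How can planes fly despite their weight?"),
    ("ai", "Planes fly because their wings generate lift that exceeds their weight."),
    ("human", "So the lift from the wings is what counters the weight?"),
]
for index, shared in uptake_events(transcript):
    print(f"Possible uptake in turn {index}: shared words {sorted(shared)}")
```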

Subsequently, experimental research programs are needed. Here, a large number of independent variables could be varied. For example, what people know (or believe they know) about the content being addressed could be varied. In addition, their knowledge about and expectations of generative AI tools could be manipulated. Varying the tools’ (perceived) trustworthiness is also feasible; it could then be recorded to what extent people recognize this trustworthiness. Moreover, experimental studies could vary the alleged capabilities of the tools (e.g., in terms of supposed consciousness) and capture in what way humans would ascribe a theory of mind or other typically human abilities to the AI tool and how this, in turn, affects human-AI collaboration and knowledge construction.

Overall, there are many unanswered questions, but they would certainly be worth answering. CSCL research has the theoretical equipment and the methodological instruments to pursue these important questions. Only by doing so can CSCL research and practice keep pace with the currently rapid technological developments and be prepared for a future in which such tools are an integral part of everyday life.