Introduction

Quantum mechanics is a topical theme of physics in general (cf. Acín et al., 2018), and of physics education research in particular (cf. Bitzenbauer, 2021b). With today’s technological advancements, students may not only come into contact with quantum physics in formal learning settings, e.g. in undergraduate university courses (cf. Galvez et al., 2005; Marshman & Singh, 2019; Passante, Emigh, & Shaffer, 2015; Pearson & Jackson, 2010; Singh, 2001; Zhu & Singh, 2012a; Zhu & Singh, 2012b), but also in the informal context: For example, interested learners can access the quantum world via multiple digital resources, such as smart-phone/tablet apps (e.g. Oss & Rosi, 2015), AR/VR applications (e.g. Dorland et al., 2019; Suprapto, Nandyansah, & Mubarok, 2020), games (e.g. Chiofalo, Foti, Michelini, Santi, & Stefanel, 2022; Seskir et al., 2022), or explanatory videos (e.g. Bitzenbauer, 2021a).

Explanatory videos are brief videos, typically no longer than 10 min, aimed at introducing and explaining a certain topic of interest (cf. Wolf & Kratzer, 2015). They have increasingly been discussed in science education research in recent years (e.g. Kulgemeyer & Wittwer, 2022; Pekdag & Le Marechal, 2010; Schroeder & Traxler, 2017), both in the context of formal and informal learning environments, in particular on YouTube (e.g. Beautemps & Bresges, 2021; Kulgemeyer & Peters, 2016; Pattier, 2021). The literature has identified factors that seem conducive to the success and popularity of explanatory YouTube videos on scientific topics (Beautemps & Bresges, 2021; Welbourne & Grant, 2016), e.g. regarding the structure of a video (Beautemps & Bresges, 2021). While it is desirable to reach as many people as possible, the main goal associated with the development of explanatory videos, of course, is to support student learning. Besides ensuring the success of the video, creators thus have to ensure the quality of the explanations offered in their explanatory videos.

From the physics education research perspective, it is crucial to assist learners, teachers, and university lecturers in selecting videos with high explanation quality from the plethora of (online) resources. In the case of YouTube explanatory videos, their popularity is publicly shown by means of different surface features, such as the number of views, the ratings of the video (e.g. the number of likes), or via the comment section. However, it remains open as to whether or not these surface features indeed correlate with the explanatory video’s explaining quality, and hence, may serve as some kind of quality indicator in this respect. In other words: Can teachers and students rely on them?

This question has already been posed by Kulgemeyer and Peters (2016). The authors presented a measure of explaining quality to investigate it in the context of YouTube explanatory videos on two topics from classical mechanics, namely Newton’s third law of motion and Kepler’s laws (Kulgemeyer & Peters, 2016). In their exploratory study, the number of content-related comments given by users below a specific video turned out to be the only variable that was statistically significantly correlated with the explaining quality of explanatory videos; neither the number of views nor the number of likes or dislikes showed statistically significant correlations with explaining quality (Kulgemeyer & Peters, 2016). Kulgemeyer and Peters (2016) see the need for further studies on the relationship between surface features provided by YouTube and explaining quality, in particular regarding other topics. They developed a hypothesis on this relationship that requires further evidence and, furthermore, suggest that YouTube metrics are influenced not only by the explaining quality but also by the media design of the videos. Against this backdrop, videos on topics from quantum physics add a valuable perspective, since the media design is even more crucial than in mechanics in order to make the invisible accessible to students: Quantum physics differs fundamentally from classical mechanics, especially since its concepts are not directly visible to the naked eye. Thus, quantum physics topics arguably require particularly varied explanations. As a result, the question arises as to whether the metrics of YouTube explanatory videos about quantum concepts show similar correlations with an established measure of explaining quality as have previously been revealed by Kulgemeyer and Peters (2016) for explanatory videos on classical mechanics topics.

This is where this research project comes in: We investigate the explaining quality of YouTube explanatory videos on two genuine quantum physics topics without classical analogies, namely quantum entanglement and quantum tunnelling. To this end, we adopted the research methods used by Kulgemeyer and Peters (2016). The objective of the research project presented in this article is to expand on the results of Kulgemeyer and Peters (2016) by exploring correlations between the YouTube surface metrics (e.g. likes, dislikes, views, number of days since release, number of relevant comments) of explanatory videos on these two quantum topics and the explaining quality of these videos.

Research Questions

The present study addresses the following research questions:

  1. How is the explaining quality of YouTube explanatory videos on quantum entanglement and quantum tunnelling correlated with the videos’ metrics such as the number of views, the number of likes, or the number of dislikes?

  2. How is the number of content-related comments correlated with the explaining quality of YouTube explanatory videos on quantum entanglement and quantum tunnelling?

Research Background

Explaining Physics

Instructional explanations are “designed with the specific purpose of teaching a student or group of students” (Leinhardt & Steele, 2005, p. 90). Hence, instructional explanations need to be distinguished from scientific explanations (Treagust & Harrison, 1999). Wittwer and Renkl (2008) uncovered four factors that lead to effective instructional explanations. According to these factors, such explanations should...

  1. ...“be adapted to the learner’s knowledge prerequisites” (Wittwer & Renkl, 2008, p. 51),

  2. ...“focus on concepts and principles” (Wittwer & Renkl, 2008, p. 53),

  3. ...“should be integrated into the learners’ ongoing cognitive activities” (Wittwer & Renkl, 2008, p. 55), and

  4. ...“should not replace learners’ knowledge-construction activities” (Wittwer & Renkl, 2008, p. 56).

These factors have been expanded to a total of nine factors in a 2019 review addressing instructional explanations in science teaching (Kulgemeyer, 2019, p. 90). An important criterion for effective instructional explanations is the adaptation to the explainee, because this criterion reflects that explaining is to be regarded as a constructivist process (Kulgemeyer & Peters, 2016).

The constructivist nature of explanations is reflected in the communication model for explaining physics presented by Kulgemeyer and Schecker (2013). This model consists of four pillars, namely the explainer, the explanation itself, the explainee, and the explainee’s feedback. The fact that a good explanation requires

  1. Constant evaluation of the explainee’s feedback, and

  2. Prompt adaptation of the explanation based on that feedback,

is at the heart of this model (Kulgemeyer & Schecker, 2013). According to the communication model for explaining physics, “the explainer can vary the explanation on four levels based on this feedback, ranging from the language code, the graphic representation form and the mathematic code, to using examples and analogies” (Kulgemeyer & Peters, 2016, p.3).

Design Principles for Explanatory Videos

Cognitive load theory (e.g. Sweller, 1988; 1994; Sweller, van Merrienboer, & Paas, 1998) assumes that working memory has a limited capacity, which is taxed by the cognitive load imposed on learners in learning environments. In its modern view (cf. Sweller, van Merrienboer, & Paas, 2019), this load is composed of

  • Intrinsic cognitive load which is dependent on the concrete learning task, the students’ prior knowledge, or the teaching materials used, and

  • Extraneous cognitive load stemming from irrelevant cognitive processes that tie up working memory capacities and thus hinder the learning process

According to Sweller et al. (2019), the Cognitive load theory “provides evidence-informed principles that can be applied to the design of instructional messages or relatively short instructional units, such as lessons, written materials consisting of text and pictures, and educational multimedia” (p. 274).

The Cognitive Theory of Multimedia Learning (cf. Mayer, 1999) builds upon the abovementioned Cognitive load theory. This theory is based on three fundamental assumptions that, taken together, describe how auditory-verbal or visual-imagery information is processed toward long-term memory:

  • The dual-channel assumption describes that “humans possess separate channels for processing visual and auditory information” (Mayer, 2009, p. 63)

  • The limited-capacity assumption describes that each of the abovementioned channels can only process a limited amount of “chunks” (Mayer, 2009, p. 67) of information simultaneously

  • The active-processing assumption describes that students’ active engagement is necessary for constructing knowledge (Mayer, 2009)

Both the Cognitive load theory and the Cognitive Theory of Multimedia Learning have been the basis for prior research on explanatory videos aimed at fostering student learning (cf. Kruger & Doherty, 2016; Noor, Aini, & Hamizan, 2014). In addition, different studies have derived design principles that may influence the effectiveness of explanatory videos against the backdrop of the abovementioned theories (e.g. Brame, 2016; Kay, 2014; Muller, 2008). For example, it has been indicated that the integration of interactive elements into explanatory videos (Delen, Liew, & Willson, 2014) or the use of a 1st-person perspective in explanatory videos (Fiorella, van Gog, Hoogerheide, & Mayer, 2017) might have a positive impact on students’ performance. Findeisen, Horn and Seifried (2019) reviewed and systematized studies dealing with potential effects of explanatory videos’ design principles on student learning, and derived guidelines for the development of explanatory videos based on the overall picture emerging from current empirical findings.

Explaining Quality of Explanatory Videos

In the previous sections, we reviewed both the current state of research on explaining physics and on design criteria for the development of explanatory videos. In this section, both perspectives are merged in order to shed more light on the state of research on the explanatory quality of explanatory videos.

Kulgemeyer (2020) presented a framework for effective explanation videos. This framework is

  • ...consistent with guidelines on the quality of explanatory videos published elsewhere in the literature (e.g. Brame, 2016; Findeisen et al., 2019), and

  • ...acknowledges research on multimedia learning (Kulgemeyer, 2020),

while building upon state-of-the-art research on instructional explanations (e.g. Geelan, 2012, Wittwer & Renkl, 2008). In this framework, seven factors comprising a total of 14 features are described to have an impact on the effectiveness of explanatory videos (Kulgemeyer, 2020, p. 2450). Examples are the use of summaries (factor: structure of the video), the use of an appropriate language-level (factor: tools for adaption), the avoidance of digressions (factor: minimal explanation), or the adaption to prior knowledge, misconceptions, and interest (factor: adaption). An overview of the whole framework for effective explanation videos is presented in Kulgemeyer (2020, p. 2450).

The abovementioned framework has been tested empirically in order to clarify whether an explanatory video developed according to the framework leads to higher student achievement than a video that has not strictly been developed according to it (Kulgemeyer, 2020). The results of this study revealed that students learning with an explanation video adhering strongly to this framework showed significantly more declarative knowledge in a post-test than students learning with a video that had not strictly been developed according to the framework (d = 0.42). However, no statistically significant difference in the post-test scores regarding conceptual knowledge was observed.

Evaluation of Explanatory Videos’ Explaining Quality

An online test which allows for the assessment of physics explanatory skills has been published by Bartels and Kulgemeyer (2019). This test has been developed both for its usage in teacher education and for self-assessment.

Moreover, based on the communication model for explaining physics (Kulgemeyer & Schecker, 2013), Kulgemeyer and Tomczyszyn (2015, p. 121) developed a process-oriented and category-based measure for the assessment of explanation skills. Kulgemeyer and Peters (2016) adopted this category-based measure for the evaluation of explanatory videos’ explaining quality. The category system to evaluate explanatory videos’ explaining quality (cf. Appendix) consists of seven main categories (content, structure, use of language, contexts and examples, mathematics, interrogation, non-verbal elements) comprising a total of 31 subcategories. Each of these subcategories is either assigned to a certain explanatory video (= 1 point) or not (= 0 points). Four out of the 31 subcategories ((1) scientific mistake, (2) ignoring students’ comment, (3) leaving new technical term uncommented, (4) without context) are related to a decrease of explaining quality, and hence, a negative point (= − 1 point) is allocated to the video for their occurrence.

Within the scope of evaluating the explaining quality of explanatory videos (i.e. in the course of categorization), each category is considered uniformly and there is no counting of a successive occurrence of the same category, “since repetitions of the same wording or the repeated use of a similar explaining aid without any variation are not considered a rich and varied explanation” (Kulgemeyer & Peters, 2016, p. 6). By summing up the points received on the basis of the categories assigned, a specific number of “category points” (Kulgemeyer & Peters, 2016, p. 6), referred to as CP, can be calculated for a given explanatory video (Kulgemeyer & Peters, 2016, p. 6):

$$ \text{CP} = \sum X_{+} - \sum X_{-}, $$

where X+ denotes the number of positive categories assigned to a video, and X− stands for the number of negative categories assigned to a video. The category points (with an upper limit of 28 CP) serve as a measure of an explanatory video’s explaining quality, as has been shown by Kulgemeyer and Peters (2016).
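The scoring rule described above can be illustrated with a minimal Python sketch. It assumes that every positive subcategory contributes one point, that each of the four negative subcategories subtracts one point, and that repeated occurrences of the same subcategory count only once. The category labels are hypothetical placeholders and not the labels of the category system in the Appendix.

```python
# Hypothetical labels for the four negative subcategories (placeholders only).
NEGATIVE_CATEGORIES = {
    "scientific_mistake",
    "ignoring_students_comment",
    "uncommented_technical_term",
    "without_context",
}


def category_points(coded_occurrences):
    """Return the category points (CP) for one video.

    coded_occurrences: iterable of category labels noted by a rater,
    possibly containing the same label several times.
    """
    # Each category is counted at most once, however often it occurs.
    assigned = set(coded_occurrences)
    positives = sum(1 for c in assigned if c not in NEGATIVE_CATEGORIES)
    negatives = sum(1 for c in assigned if c in NEGATIVE_CATEGORIES)
    return positives - negatives


# Toy coding of one video: "analogy" occurs twice but is counted once.
example = ["summary", "analogy", "analogy", "everyday_example", "scientific_mistake"]
cp = category_points(example)  # three positive categories, one negative category
```

In this toy example, three distinct positive categories and one negative category yield 2 CP.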

It is important to note that the category points assigned to a specific explanatory video may neither judge the video’s overall quality (e.g. a video’s technical design is not taken into account) nor do the CP help finding the best explanation of a specific topic under investigation among multiple explanatory videos. Instead, the rationale underlying this measure is “to distinguish between rich and varied explanations on the one hand and those with fewer variations on the other” because “those with fewer variations in their explanations may be less suitable for a wider range of viewers as some learners’ needs may not be considered” (Kulgemeyer & Peters, 2016, p.9).

Methods

In this section, we outline the methodology applied in our exploratory study to address the research questions. We aim at expanding on the study of Kulgemeyer and Peters (2016) according to which none of the correlations between the surface features provided for YouTube explanatory videos and their explaining quality was statistically significant, except for the number of content-related comments. In a further study, Kocyigit and Akaltun (2019) even conclude that the “number of views, likes, dislikes, and comments per day is not a predictor of high-quality videos on YouTube” (p. 1267).

Sample

Content Domain

We decided to analyze YouTube explanatory videos on two topics: (a) quantum entanglement and (b) quantum tunnelling. We analyzed videos addressing these topics because neither quantum entanglement nor quantum tunnelling has any classical analogy, and the quantum physics formalism does not enable a space-time description of these concepts (cf. Ubben & Bitzenbauer, 2022). In this way, our study is well suited to contrast with the previous findings of Kulgemeyer and Peters (2016), who analyzed explanatory videos on topics of classical mechanics.

Inclusion-Exclusion Criteria and Search Procedure

Following Kulgemeyer and Peters (2016), we found the videos to be included in our sample via YouTube’s search engine applying the search strings “quantum entanglement” and “quantum tunnelling”, respectively. We used the following inclusion-exclusion criteria for selecting videos appropriate for data analysis:

  • The video is published in the English language

  • The video exclusively covers one of the two topics, quantum entanglement or quantum tunnelling. Videos that covered both topics were excluded. Furthermore, videos addressing one of the topics under investigation plus at least one further (different) topic were excluded.

  • Video-recorded lectures (or excerpts thereof) were excluded, since recorded lectures “do not share the explainers’ intentional core of publishing a concise explanatory video” (Kulgemeyer & Peters, 2016, p.4)

  • The video has a maximum duration of 10 min

  • The video is scientifically sound (cf. Kulgemeyer & Peters, 2016)

The latter criterion was important because it only makes sense to compare “the explaining quality of scientifically correct explanations” (Kulgemeyer & Peters, 2016, p.5). Applying the abovementioned search strings, we found more than 100,000 videos on both topics. A title-caption screening of the search results led to the exclusion of the majority of these videos since they did not fulfill the inclusion criteria (in this stage most often due to a duration above 10 min, the coverage of topics beyond the ones under investigation, or representing recorded lectures). In a next step, we reviewed about 200 videos on each of the topics quantum entanglement or tunnelling in detail. Again, we excluded the videos that did not fulfill the inclusion criteria (in this stage most often due to serious scientific errors). Lastly, for our final sample, we (a) settled on videos with comparable run-times of around 5 min as has been done in the prior study conducted by Kulgemeyer and Peters (2016), and (b) aimed for a sample size comparable to the one of the earlier study in the classical mechanics context (Kulgemeyer & Peters, 2016). The final sample consists of 60 YouTube explanatory videos that were included for data analysis, 30 of which address the topic of quantum entanglement, and 30 of which focus on quantum tunnelling.

Description of the Sample

The mean duration of the selected videos is m = 4.97 min with a standard deviation of SD = 2.43 min. The explanatory videos on quantum entanglement (m = 4.74 min, SD = 2.38 min) were of similar length as those on quantum tunnelling (m = 5.20 min, SD = 2.48 min). Moreover, the videos in our sample are of similar length as the ones included in the prior study (cf. Kulgemeyer & Peters, 2016).
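As a quick plausibility check, and using the sub-sample sizes of 30 videos per topic stated above, the overall mean duration follows from the two sub-sample means:

$$ m = \frac{30 \cdot 4.74\,\text{min} + 30 \cdot 5.20\,\text{min}}{60} = 4.97\,\text{min}. $$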

Data Collection

The explanatory videos included in the final selection were analyzed in August and September 2021. For the exploration of our research questions, the data collection comprised three aspects: In a first step, we collected each video’s surface features, i.e. the number of likes and dislikes, the number of views, and the publication date to calculate the videos’ time online (in days). Additionally, we recorded the number of subscribers to the channels by which the videos were published. The average view duration was a further surface feature included in the study of Kulgemeyer and Peters (2016, p. 5) on explanatory videos on classical mechanics topics. However, at the time of our data collection, this feature was no longer publicly accessible, and hence, it is not included in our analysis. In addition, the number of dislikes has not been publicly available since the end of 2021. However, since our data collection was conducted in August and September 2021, we kept the number of dislikes for each video in our dataset and also included it in the data analysis. This allows for a more comprehensive comparison to the earlier results published by Kulgemeyer and Peters (2016) and may help to better understand the interaction of users with explanatory videos. For a description of all the abovementioned YouTube metrics, we refer the reader to the YouTube Analytics and Reporting APIs (2022).

In a second step, we categorized the comments given below the videos in order to receive the number of relevant comments for each video. We provide a proper description of (a) the term relevant comment and (b) the categorization procedure in the data analysis section. We explored relevant comments because they “provide by far the most intense communication channel between explainer and addressee” (Kulgemeyer & Peters, 2016, p. 5).

Lastly, following the data collection method from Kulgemeyer and Peters (2016), we used the category system described above (cf. Appendix) to assess the explaining quality of the explanatory videos included in our sample. The coding was performed by two independent raters. The inter-rater reliability expressed via Cohen’s kappa can be regarded as substantial (κ = 0.79) according to Cohen (1988). Against this backdrop, the category system used in this study allows for an objective assessment of the explaining quality of explanatory videos. Furthermore, the reliability of the measure has been found to be satisfactory (Cronbach’s α = 0.58; in the earlier study by Kulgemeyer and Peters (2016), a comparable value of α = 0.69 has been reported). Moreover, the category system used for this study allows for a valid measure of explanatory videos’ explaining quality, as has been justified by Kulgemeyer and Peters (2016).
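For readers unfamiliar with Cohen’s kappa, the following Python sketch shows how the statistic can be computed from two raters’ codings. It is a generic illustration with invented toy data (1 = category assigned, 0 = not assigned); it is neither the analysis script nor the data of this study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in labels) / n**2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy codings of ten decisions by two raters.
toy_rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
toy_rater_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(toy_rater_a, toy_rater_b)
```

With these toy codings, the raters agree on nine of ten decisions, chance agreement is 0.5, and kappa is about 0.8.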

As a last step of data collection, we calculated the category points CP for each explanatory video included in our sample. These category points were then used in the data analysis.

Data Analysis Carried Out to Answer Research Question 1

We report descriptive statistics (range, median Mdn, mean m, standard deviation SD) regarding the category points of the explanatory videos on quantum entanglement and quantum tunnelling, respectively. In addition, we introduce a new metric to investigate this research question: We assumed that the interaction with a specific explanatory video, i.e. giving a like or a dislike to a video, requires the user to be cognitively activated to some extent. We therefore introduced the variable interactions calculated via

$$ \text{interactions} = \sum \text{likes} + \sum \text{dislikes}, $$

to explore the relationship between explaining quality and the number of interactions. This variable will provide further insights into how users interact with explanatory videos depending on their explaining quality.

We conducted a correlation analysis in order to explore relationships between the videos’ explaining quality (in category points CP) on the one hand, and the surface features on the other hand. We report Pearson’s correlation coefficient r because the data are of metric scale. We interpret correlation coefficients according to Cohen (1988): weak correlation for 0.1 < |r| < 0.3, moderate correlation for 0.3 ≤ |r| < 0.5, and strong correlation for |r| ≥ 0.5. In addition, we report partial correlations to verify that observed relationships are not artefacts caused by

  • The videos’ time online, i.e. the time that has passed between the publication of a video and the data collection, and

  • The number of subscribers to the channels by which the videos were published

The latter control variable seems particularly important due to the fact that the YouTube algorithms promote videos published by popular channels which in turn leads to high numbers of views for these videos. This might influence the results, and hence, deserves special attention.
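The quantities used in this analysis can be sketched in a few lines of Python. The sketch below is a generic, standard-library illustration of Pearson’s r, of the first-order partial correlation that controls for one variable, and of the interpretation thresholds quoted above. The function names and the label "negligible" for |r| ≤ 0.1 are our assumptions, and the code is not the analysis script of this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """Correlation between x and y, controlled for a single variable z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


def cohen_label(r):
    """Verbal label for |r| following the thresholds quoted in the text."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size > 0.1:
        return "weak"
    return "negligible"  # assumed label; the text names no category here


# Invented toy data: z is uncorrelated with both x and y,
# so controlling for z leaves the correlation unchanged.
toy_x = [-2, -1, 0, 1, 2]
toy_y = [-4, -2, 0, 2, 4]
toy_z = [1, -1, 0, -1, 1]
r_controlled = partial_r(toy_x, toy_y, toy_z)
```

In the study’s setting, x would be the category points, y one surface feature (e.g. the number of likes), and z the control variable (time online or number of subscribers).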

Data Analysis Carried Out to Answer Research Question 2

The comments below each video included in our sample have been categorized. For the categorization, we used the category system presented by Kulgemeyer and Peters (2016) which consists of four categories:

  1. Comment on content: “further question or comment on notations” (Kulgemeyer & Peters, 2016, p. 8)

  2. Comment on explanation: “constructive criticisms and inquiries for more videos” (Kulgemeyer & Peters, 2016, p. 8)

  3. Comment on explainer’s style: “comments on the style including a reason” (Kulgemeyer & Peters, 2016, p. 8)

  4. Comment on use: description of “the viewer’s use of the video, e.g. revising, preparing a talk or learning for a test” (Kulgemeyer & Peters, 2016, p. 8)

All comments that could be assigned to at least one of these categories were considered as relevant comments. Comments that could not be assigned to any of these categories, conversely, were excluded from further analysis because they were not related specifically to the content presented in the respective video or to the explanation offered within. For the further analysis, we refrained from a deeper differentiation between the different categories as has been done by Kulgemeyer and Peters (2016) because research question 2 only addresses relevant comments in general.

The categorization of all comments underneath the N = 60 explanatory videos included in our sample led to a total of 1452 relevant comments. The number of relevant comments for each video was included in our data set as a metric variable and was used for correlation analysis. Again, we additionally calculated partial correlations to verify that observed relationships are not artefacts caused by the videos’ time online, or the number of subscribers to the channels by which the videos were published.
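The counting rule for relevant comments can be expressed as a short Python sketch. It assumes that each comment has already been coded manually with a (possibly empty) set of category labels. The short labels used here, including "off_topic", are hypothetical stand-ins for the four categories listed above.

```python
# Hypothetical short labels for the four categories of relevant comments.
RELEVANT_CATEGORIES = {"content", "explanation", "style", "use"}


def count_relevant_comments(coded_comments):
    """Count comments that were assigned to at least one relevant category.

    coded_comments: list with one set of category labels per comment.
    """
    return sum(1 for labels in coded_comments if set(labels) & RELEVANT_CATEGORIES)


# Invented toy coding of four comments below one video.
toy_comments = [
    {"content"},       # further question on the content
    set(),             # no category applies
    {"style", "use"},  # assigned to two categories, still counted once
    {"off_topic"},     # label outside the category system
]
n_relevant = count_relevant_comments(toy_comments)
```

Here, two of the four toy comments count as relevant, and a comment assigned to several categories is counted only once.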

Results

Descriptives

The median value of the explanatory videos’ explaining quality (measured in CP) was Mdn = 11 CP for the total sample, ranging from 2 CP (assigned to one video of the sample) to 18 CP (assigned to two videos of the sample). In Table 1, descriptive statistics on the category points assigned to the videos comprised in our sample are reported separately for the two subject areas under investigation, namely quantum entanglement and quantum tunnelling, respectively.

Table 1 Descriptive statistics on the measure of explaining quality of the videos included in our sample (expressed in category points CP)

Correlation Analysis

The correlation analysis results are summarized in Table 2. Within the total sample, we find statistically significant correlations between the videos’ explaining quality and the number of views (r = 0.27, p < 0.05) as well as the number of likes (r = 0.37, p < 0.01). The highest correlation is uncovered between the videos’ explaining quality and the number of relevant comments (r = 0.46, p < 0.01), whereas the correlation between the videos’ explaining quality and their time online does not differ from 0 with statistical significance.

Table 2 Pearson’s correlation coefficient r between the measure of explaining quality (in CP) and the surface features (incl. number of relevant comments) for the total sample, the videos on quantum entanglement, and the ones on quantum tunnelling, respectively. For all correlations, we report 95% confidence intervals (95%-CI)

A striking observation is the positive correlation between the number of dislikes and the measure of explaining quality, both in the total sample (r = 0.32, p < 0.05) and the two sub-samples including videos on quantum entanglement (r = 0.37, n.s.) and quantum tunnelling (r = 0.30, n.s.).

Furthermore, the total number of user interactions is correlated significantly with the videos’ explaining quality, regardless of whether these interactions result in a like or a dislike in the end (r = 0.39, p < 0.01, 95% CI [0.14; 0.59]). To explore this in more detail, it is necessary to control the correlations presented in Table 2 for the videos’ time online (in days) and the number of subscribers to the channels by which the videos were published. Therefore, we report partial correlations in the next subsection.
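The text does not state how the 95% confidence intervals were obtained. One common approach, assumed here purely for illustration, is the Fisher z-transformation, which the following Python sketch applies to the reported correlation between interactions and explaining quality (r = 0.39, N = 60).

```python
from math import atanh, sqrt, tanh


def fisher_confidence_interval(r, n, z_crit=1.959964):
    """Approximate confidence interval for Pearson's r via Fisher's z.

    z_crit is the standard normal quantile (default: two-sided 95%).
    """
    z = atanh(r)                     # transform r to the z scale
    standard_error = 1 / sqrt(n - 3)
    lower = tanh(z - z_crit * standard_error)
    upper = tanh(z + z_crit * standard_error)
    return lower, upper


# Reported values from the text: r = 0.39 for N = 60 videos.
lower, upper = fisher_confidence_interval(0.39, 60)
```

Under this assumption, the interval is roughly [0.15, 0.59], which is close to the reported one. Small deviations are to be expected because the published r is rounded and a different procedure may have been used.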

Partial Correlations

In this subsection, we report partial correlations which refer to the entire sample. This means that we do not distinguish between the sub-samples here for the sake of clarity.

Controlling the correlations between our explanatory videos’ explaining quality (measured in CPs) and the YouTube surface features for the videos’ times online (cf. Table 3), we observe that all metrics correlate significantly with the explaining quality, ranging from highly significant (relevant comments) to significant (views and dislikes). These partial correlations uncover similar relationships between YouTube’s surface metrics and the videos’ explaining quality as the ones presented earlier (cf. Table 2).

Table 3 Partial correlations (controlled for the time online) between the measure of explaining quality (in CP) and YouTube’s surface metrics as well as the number of interactions

In a next step, we controlled for the number of subscribers to the channels by which the videos were published. The corresponding partial correlations are shown in Table 4: Only three of the correlations remain statistically significant in this case, namely the ones between the explanatory videos’ explaining quality and the number of likes (r = 0.43, p < 0.01), the number of relevant comments (r = 0.47, p < 0.001), and the number of interactions (r = 0.43, p < 0.01). In contrast, both the correlations of the videos’ explaining quality to the number of views, and the number of dislikes are not statistically significant anymore. We will discuss these observations in the “Discussion” section.

Table 4 Partial correlations (controlled for the number of subscribers) between the measure of explaining quality (in CP) and YouTube’s surface metrics as well as the number of interactions

Discussion

In our exploratory study, we investigated how the explaining quality of YouTube explanatory videos on genuine quantum topics such as quantum entanglement and quantum tunnelling is correlated with the surface features provided by YouTube alongside each online video. In this section, we discuss the results of our study with regard to our research questions, and against the backdrop of prior research.

Kulgemeyer and Peters (2016) analyzed the relationship between the explaining quality of instructional videos on classical mechanics topics and YouTube’s surface features. However, while many scholars have investigated the quality of instructional videos in general, to the authors’ knowledge, no further studies exploring correlations between explaining quality and surface features have been published in the literature. Brame (2016) reviewed the literature on how to manage the cognitive load of educational videos and how to maximize student engagement with a video. Similarly, Findeisen et al. (2019) reviewed studies to investigate how didactical elements of explanatory videos should be designed in order to best facilitate student learning. On another note, Beautemps and Bresges (2021) conducted a questionnaire survey to uncover key elements of a successful educational YouTube video from the viewers’ perspective. Nonetheless, an in-depth investigation of YouTube explanatory videos’ surface metrics and their relationship with the explaining quality has only been carried out in the context of mechanics (Kulgemeyer & Peters, 2016), which we will expand upon in the following.

Discussion of Research Question 1

While we observe slight differences between the videos on entanglement and those on quantum tunnelling in terms of correlations between their explaining quality and YouTube surface features (cf. Table 2), the global tendencies are similar for the videos on both topics. In particular, for each metric, the correlations’ 95% confidence interval in the entanglement group overlaps with the corresponding one in the tunnelling group to such an extent that each correlation coefficient lies in the confidence interval of the other one. Hence, the small deviations between the observed correlations for videos on entanglement and quantum tunnelling might be traced back to the choice of a specific sample of N = 60 YouTube videos out of the cornucopia of videos in the depths of the internet for this study (cf. “Limitations” section). As such, we refrain from providing in-depth explanations for this observation; further research is needed to shed more light on this issue.
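The overlap criterion described above can be made concrete with a small sketch. It assumes that the confidence intervals are obtained via the common Fisher z-transformation; the article does not state which method was used, so this is one plausible reading rather than the procedure of the study. The function names are hypothetical, and the group sizes of 30 in the usage example are an assumption (only the total of N = 60 is given here).

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for Pearson's r via Fisher's z."""
    if n <= 3:
        raise ValueError("need n > 3 for the Fisher z standard error")
    z = math.atanh(r)                       # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)             # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def mutually_contained(r_a: float, n_a: int, r_b: float, n_b: int,
                       level: float = 0.95) -> bool:
    """True if each coefficient lies inside the other's interval."""
    lo_a, hi_a = fisher_ci(r_a, n_a, level)
    lo_b, hi_b = fisher_ci(r_b, n_b, level)
    return lo_a <= r_b <= hi_a and lo_b <= r_a <= hi_b


# Coefficients for relevant comments as reported in this article;
# the group sizes (30 and 30) are an ASSUMPTION for illustration only.
print(fisher_ci(0.59, 30))
print(fisher_ci(0.31, 30))
print(mutually_contained(0.59, 30, 0.31, 30))
```

With small groups the intervals are wide, which is why even visibly different coefficients can satisfy this criterion.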

In total, our results compare well with the findings reported earlier for the mechanics context (cf. Kulgemeyer & Peters, 2016): While the correlations presented in both studies seem different at first glance (cf. Table 5), we note that most of the correlations reported by Kulgemeyer and Peters (2016) fall within the 95% confidence intervals of our correlation coefficients (or vice versa).

Table 5 Pearson’s correlation coefficient r between the measure of explaining quality (in CP) and the surface features provided by YouTube. For the correlations calculated in our study, we report 95% confidence intervals (95%-CI)

In addition, our results also shed new light on the underlying relationships: Kulgemeyer and Peters (2016) found no statistically significant correlation between the videos’ explaining quality and the number of likes although the authors expected such a correlation due to the “illusion of understanding”: “Students do not realize the possible inconsistencies in their understanding and feel as if they have understood a topic” (Kulgemeyer & Peters, 2016, p. 11). This assumption is supported by empirical evidence from a recently published experimental study by Kulgemeyer and Wittwer (2022). For the explanatory videos on quantum topics included in our sample, we indeed uncovered a statistically significant correlation between the number of likes and the videos’ explaining quality (r = 0.37, p < 0.01).

Moreover, we find the number of dislikes (r = 0.32, p < 0.05) and the number of views (r = 0.27, p < 0.05) to have statistically significant correlations with the explaining quality of the videos on quantum entanglement and tunnelling. In contrast, Kulgemeyer and Peters (2016) did not find the corresponding correlations to be statistically significant for the videos on classical mechanics topics. The analysis of partial correlations, though, puts these differences between the two studies into perspective: We controlled the correlations between the videos’ explaining quality and the surface features provided by YouTube for the number of subscribers to the channels by which the videos were published. As a result, the correlation between explaining quality and views (r = 0.23) loses its statistical significance. To explain this observation, we go along with Kulgemeyer and Peters (2016) who state that “the number of views is more influenced by [...] the popularity of the YouTube channel than the explaining quality” (p. 5). Likewise, the correlation between explaining quality and dislikes (r = 0.26) loses its statistical significance, though it remains moderate (cf. Table 4). One reason for this observation might be found in the way that the videos’ explaining quality is operationalized in this study: The category system used to assess a given video’s explaining quality does not comprise categories related to a viewer’s interests, conceptions, or level of background knowledge. Therefore, it seems possible that a viewer who (a) does not understand a given video, (b) feels bored when watching the video, or (c) feels academically overwhelmed might react with a “dislike”, independently of the explaining quality of the video.

Lastly, we newly introduced the number of interactions, i.e. the sum of likes and dislikes for a given YouTube explanatory video, into the analysis. The number of interactions correlates statistically significantly with the explaining quality of the explanatory videos on entanglement and tunnelling: r = 0.39, p < 0.01. When controlling for the number of subscribers of the channels by which the videos are published, the partial correlation was even higher (r = 0.43, p < 0.01).
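As a minimal sketch of how the number of interactions enters such an analysis, the following example builds the sum of likes and dislikes per video and correlates it with a quality score using Pearson's r. The records, field names, and values are invented for illustration and do not stem from the sample of this study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's correlation coefficient for two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy records: quality score in credit points (cp),
# plus like and dislike counts. NOT data from the study.
videos = [
    {"cp": 10, "likes": 120, "dislikes": 4},
    {"cp": 14, "likes": 900, "dislikes": 35},
    {"cp": 7, "likes": 60, "dislikes": 9},
    {"cp": 16, "likes": 450, "dislikes": 12},
    {"cp": 12, "likes": 300, "dislikes": 20},
]

# Number of interactions = likes + dislikes for each video.
interactions = [v["likes"] + v["dislikes"] for v in videos]
quality = [v["cp"] for v in videos]

r = pearson_r(quality, interactions)
print(interactions, r)
```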

Discussion of Research Question 2

Among the metrics considered, the number of relevant comments turned out to be most strongly correlated with the explaining quality of the explanatory videos, both for the total sample (r = 0.46, p < 0.01) and for the videos on quantum entanglement (r = 0.59, p < 0.01) and quantum tunnelling (r = 0.31, p < 0.1). Similarly, Kulgemeyer and Peters (2016, p. 10) report a correlation of r = 0.38 (p < 0.01) between explaining quality and the number of relevant comments for videos on Newton’s third law and Kepler’s laws, respectively.
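To give a feeling for how sample size and the magnitude of r translate into significance levels like those quoted above, here is a rough sketch. It assumes a two-sided test based on the Fisher z normal approximation; the article does not state which test was used, and an exact t-test would give slightly different values. The function name is hypothetical.

```python
import math


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z method)."""
    if n <= 3:
        raise ValueError("need n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)    # approx. standard normal under H0
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


# r = 0.46 with the total sample size N = 60 reported in this article.
print(approx_p_value(0.46, 60))
```

The same coefficient evaluated on a smaller subsample yields a larger p-value, which is one reason why group-wise correlations can reach weaker significance levels than the total-sample correlation.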

We controlled the correlations between the videos’ explaining quality and the number of relevant comments for the videos’ time online (in days). As a result, the partial correlation between explaining quality and number of relevant comments for the total sample increased (r = 0.55, p < 0.001). This result is comparable to the one reported for the mechanics context, where a partial correlation coefficient of r = 0.40, p < 0.01 was found (Kulgemeyer & Peters, 2016).

The medium to high correlation between the explanatory videos’ explaining quality and the number of relevant comments might be explained by the users’ cognitive activation: “Hence, videos that accumulate plenty of those relevant comments are more successful in catching viewers’ attention as these videos might use either a more stimulating explanation or the explanation delivered is considered as a starting point for further learning progress” (Kulgemeyer & Peters, 2016, p. 12).

Conclusion

Our results support the findings presented earlier for YouTube explanatory videos on mechanics (cf. Kulgemeyer & Peters, 2016), according to which

  • There is a statistically significant correlation between explaining quality and the number of content-related comments (r = 0.46, p < 0.001 in our study, cf. Table 2), and

  • YouTube’s surface metrics (e.g. likes) should be considered with caution when it comes to searching for high-quality videos: When partial correlations are calculated that control for the number of subscribers to the channels by which the videos were published, the correlations between the videos’ explaining quality and the number of views as well as the number of dislikes lose their statistical significance (cf. Table 4). Hence, YouTube’s surface features might not be fruitful indicators for the explaining quality of explanatory videos.

However, focusing on YouTube explanatory videos addressing quantum entanglement and tunnelling, our study contributes to extending previous results presented by Kulgemeyer and Peters (2016) in two respects:

  1. We find a statistically significant correlation between the number of likes and the explaining quality of explanatory videos on quantum topics (r = 0.37, p < 0.01, cf. Table 2). Although such a correlation has already been assumed in the previous study (cf. Kulgemeyer & Peters, 2016), it could not be found at that time in the context of explanatory videos on topics of classical mechanics.

  2. Our study hints that the number of interactions (i.e. the sum of likes and dislikes) might be an indicator for videos of high explaining quality (r = 0.39, p < 0.01). We argue that this result fits well with the number of relevant comments being statistically significantly correlated with the explaining quality of explanatory videos (cf. Table 2).

Against the backdrop of the abovementioned observations, it seems crucial for educators and students alike to become aware that a judgement of the explaining quality of a specific video based solely on YouTube’s surface features might be unreliable. In particular, it is noteworthy that the category system (cf. Table 6) provided in this article was developed for the assessment of the explaining quality of scientifically sound explanatory videos. Hence, when it comes to searching for high-quality videos for educational purposes, a video’s scientific quality is to be considered separately, e.g. taking into account the target group students’ prior knowledge. Consequently, the credit points (CP) assigned to the videos in the course of this study should not be used without further reflection since this score is no indicator of a video’s scientific quality.

Limitations

It is important to note that the results presented in this article should be interpreted with caution for the following reasons:

  1. We could only include a small number of N = 60 videos in our sample due to the huge amount of data and the great effort required for data analysis (e.g. categorization of all comments underneath each video).

  2. Classical correlations, as presented in this article, allow for the exploration of relationships between variables, but not for the identification of causal connections.

  3. The data analysis is largely based on the metrics provided by YouTube, which are not fully transparent to users (cf. Kulgemeyer & Peters, 2016).

  4. In this study, we only analyzed explanatory videos on the topics quantum entanglement and tunnelling, and hence, the correlations found are not generalizable to different topics.

Outlook

Despite the abovementioned limitations, our results may serve as a valuable starting point for future research, in particular with respect to teaching and learning quantum concepts: While in this study only scientifically sound explanatory videos have been included for the analysis, the internet is crowded with scientifically misleading or mystifying explanatory videos on quantum concepts, such as quantum entanglement and quantum tunnelling. Therefore, future educational research should (a) explore widespread misconceptions in explanatory videos on quantum concepts, and (b) make further efforts toward the derivation of evidence-based selection criteria that support both students and teachers/lecturers in detecting high-quality content out of the dark noise.