Introduction

Self-reports of offending (SRO) have come a long way since their early stages in the 1950s, when only a few, minor types of delinquent behavior were included (Thornberry & Krohn, 2000). Skepticism over the utility of these methods compelled criminologists to develop a large body of research on the validity and reliability of SRO (e.g., Farrington, 1973; Huizinga & Elliott, 1986; Jolliffe et al., 2003; Piquero et al., 2014), making self-reports one of the most widely used methods in the study of offending behavior (Gomes et al., 2018). Current knowledge about the prevalence and causes of offending, as well as risk and protective factors for juvenile delinquency, is almost exclusively reliant on the self-report methodology (Cops et al., 2016; Thornberry & Krohn, 2000). However, little is known about the impact of measurement biases, such as those caused by modes of administration and questionnaire format, on reported rates of offending and data quality.

In a recent systematic review of methodological experiments using SRO, Gomes et al. (2019) found 21 experiments that explored a total of 18 different manipulations of potential biases relating to modes of administration, procedures of data collection, and questionnaire design. In that review, contrary to the large body of research on sensitive questions (e.g., Gnambs & Kaspar, 2015; Richman et al., 1999; Tourangeau & Yan, 2007), the methodological experiments on SRO failed to show any evidence of the benefits of self-administration over face-to-face interviews. The lack of evidence for mode effects on SRO led influential studies on crime measurement to conclude that self-reports are valid and stable across different modes of administration (e.g., Thornberry & Krohn, 2000). However, offending behavior is a highly sensitive topic (Gomes et al., 2022), and unless some specific feature of criminal behavior makes it an exception, the disclosure of offending should be subject to mode effects, at least to the same extent as other types of sensitive behaviors.

If SRO are, in fact, affected by modes of administration, the failure to identify these mode effects will lead researchers to apply unstandardized measurement methods, resulting in biased outcomes and, ultimately, misleading conclusions about offending behavior. In the present study, we developed a methodological experiment carried out in Portugal with a 2 (modes of administration: interviewer-administered vs. self-administered surveys) × 2 (modes of data collection: paper-and-pencil vs. computer-assisted surveys) factorial design, in order to test whether or not SRO are affected by modes of administration.

Sensitive questions

Sensitive topics in survey research can be defined as intrusive, posing a threat of disclosure, and eliciting socially desirable answers (Tourangeau & Yan, 2007; Tourangeau et al., 2000). An intrusive question can be construed as an inappropriate invasion of privacy. In this sense, the question itself is intrusive, independently of the participant’s truthful response. The dimensions of threat of disclosure and social desirability, on the other hand, are a product of the participant’s past experience and the perceived likelihood of their answers becoming known to other parties. A question on bicycle theft, for example, is inconsequential for participants who have never committed such behavior, even if their answers were to become known to people outside the research study. Participants who have stolen a bicycle, on the other hand, may experience feelings of shame, guilt, or fear of criminal consequences and thus refrain from providing a truthful answer. As a result, respondents to sensitive questions tend to systematically underreport their socially undesirable behavior (Tourangeau et al., 2000).

Evidence for the tendency to underreport sensitive behavior is well documented in the literature. For example, Liber and Warner (2018) compared data from cigarette-tax collections and nationwide surveys and concluded that respondents consistently underreport cigarette consumption over time. Giguère et al. (2019) used biomarkers of recent semen exposure among female sex workers in early antiretroviral treatment and concluded that respondents often underreport unprotected sexual intercourse. Studies using biomarkers of substance use (obtained from blood, urine, saliva, or hair samples) show that respondents consistently underreport their consumption of alcohol (e.g., Kabashi et al., 2019; Littlefield et al., 2017; Vinikoor et al., 2018) and other drugs (e.g., Gerdtz et al., 2020; Palamar et al., 2021). Clark and Tifft (1966) used the polygraph as an external criterion for SRO and found evidence of underreporting of deviant behaviors. Further, studies using indirect measures consistently yield higher rates of reported sensitive behavior than direct questioning (Druckman et al., 2015; Kirtadze et al., 2018), including reports of offending behavior (e.g., Wolter & Laier, 2014). Because respondents to sensitive questions tend to underreport their socially undesirable behavior, survey researchers have explored methods to overcome the effects of question sensitivity. For example, measurement methods that provide anonymity and confidentiality to respondents consistently yield higher rates of reported sensitive behavior (Bradburn et al., 2004).

The systematic bias of reporting higher rates of sensitive behavior under less threatening measurement conditions, where the motivation to provide socially desirable answers is reduced, cannot be explained by chance, memory faults, or the usual reporting error in surveys (e.g., Schwarz, 1999). Rather, this evidence is consistent with the deliberate misreporting hypothesis (Bradburn et al., 1979; Tourangeau et al., 2000), according to which respondents to sensitive questions deliberately edit their answers in order to avoid the embarrassment or consequences of admitting such behaviors. As a consequence, survey researchers have adopted the “more is better” assumption, whereby measurement conditions that yield higher estimates of a socially undesirable behavior are assumed to be the most accurate (Tourangeau & Yan, 2007). This assumption is especially useful for behaviors for which there is no gold standard against which self-reported information can be compared, such as offending behavior.

Modes of administration

One key variable that has repeatedly been shown to affect participants’ disclosure of sensitive behavior is the mode of administration (Richman et al., 1999; Tourangeau & Yan, 2007). In particular, self-administration of a questionnaire, in contrast to interviewer-administered modes, markedly increases participants’ willingness to report sensitive behavior (Sudman & Bradburn, 1974). In face-to-face interviews, participants are requested to disclose their sensitive behavior to a third person (i.e., the interviewer). This is expected to affect participants’ perceptions of confidentiality and anonymity, as well as social desirability, causing the above-described mode effects (Schwarz et al., 1991). Methodological experiments have provided evidence that self-administration increases rates of reporting multiple types of sensitive behavior, such as undesirable academic attributes (Kreuter et al., 2008), disclosure of non-heterosexual identity (Robertson et al., 2018), number of sexual partners (Jobe et al., 1997), suicidal ideation (Lee et al., 2019), and drug use (e.g., Aquilino, 1994; Butler et al., 2009; Schober et al., 1992; Turner et al., 1992). Tourangeau and Yan (2007) reviewed the survey methodological research on sensitive topics and concluded that respondents are more likely to disclose socially undesirable behaviors under self-administered conditions. Further, Tourangeau and Yan (in press) found that self-administration, in comparison to face-to-face interviews, resulted in a 30% increase in reports of illicit drug use.

Survey research is increasingly transitioning from traditional paper-and-pencil questionnaires to computer-assisted modes of data collection. Computerized surveys are cheaper, eliminate the need for printed questionnaires, store data automatically in databases (thus reducing data-entry error), and allow for more complex branching questionnaires with skip patterns (Lucia et al., 2007). Additionally, authors have suggested that computer-assisted modes increase perceived anonymity (e.g., Trau et al., 2013), raising the question of whether computerized modes of data collection affect participants’ willingness to disclose sensitive behavior. The research on this particular question is fairly inconsistent. Some researchers have found no evidence of mode effects caused by modes of data collection (e.g., Bates & Cox, 2008; Knapp & Kirk, 2003). Further, the meta-analysis carried out by Dodou and de Winter (2014) found no differences in social desirability between paper-and-pencil and computer-assisted surveys.

On the other hand, some methodological experiments have found higher rates of disclosure in paper-and-pencil questionnaires (e.g., Beebe et al., 1998, 2006), while others have found results in the opposite direction, indicating higher reports of sensitive behavior in computer-assisted modes (e.g., Brener et al., 2006). Richman et al. (1999) carried out a meta-analysis of 61 experiments comparing results obtained with computer-assisted and paper-and-pencil questionnaires (a total of 673 effect sizes). They concluded that, within self-administered modes, computer-assisted surveys resulted in a higher prevalence of disclosed sensitive behavior. More recently, Gnambs and Kaspar (2015) focused on methodological experiments comparing self-administered disclosure in paper-and-pencil and computer-assisted modes of data collection (39 studies and 460 effect sizes). These authors found that computer-assisted surveys increased the odds of reporting sensitive behavior, especially for highly sensitive topics.

The impact of modes of administration on self-reports of offending

Criminal behavior is a highly sensitive topic. Offenders naturally try to conceal their illegal behavior, and they may feel ashamed of or regret their delinquent practices. The disclosure of offending behavior not only causes embarrassment and socially desirable answers, but offenders may also fear potential criminal consequences (Thornberry & Krohn, 2000). Gomes et al. (2022) developed an assessment of question sensitivity based on the three-dimensional definition proposed by Tourangeau and Yan (2007). These authors showed that most offending questions scored higher on topic sensitivity than a question about sexual behavior, especially the more serious and violent offenses, which participants rated as very highly sensitive (Gomes et al., 2022). For all these reasons, SRO are expected to be subject to reporting bias, at least to the same extent as other types of sensitive questions.

Unfortunately, methodological research on the response biases of SRO is very scarce. Gomes et al. (2019) systematically reviewed methodological experiments exploring potential response biases in the collection of SRO. In this review, the comparison between self-administered surveys using paper-and-pencil and computer-assisted modes of data collection was the most replicated manipulation within the SRO methodological literature (k = 10). Results were very inconsistent: five experiments found higher reports of offending in paper-and-pencil conditions, while the other five showed higher disclosure in computer-assisted modes. However, similar to previous reviews (Gnambs & Kaspar, 2015; Richman et al., 1999), the overall effect of modes of data collection on SRO indicated that computer-assisted modes resulted in higher rates of reported sensitive behavior, though this effect was only marginally significant.

As for the impact of modes of administration on SRO, Gomes et al. (2019) found three studies that carried out a total of four experimental comparisons testing the effect of self-administration on respondents’ disclosure of offending behavior. Three experiments compared face-to-face interviews with paper-and-pencil questionnaires, and one of these studies also included a comparison between face-to-face interviews and audio computer-assisted self-interviews (ACASI). Results showed no significant effect of self-administration on participants’ rates of reported offenses. These results are at odds with the general evidence on self-reports of sensitive behavior (e.g., Tourangeau & Yan, 2007). However, it is worth noting that two of these studies were carried out more than 40 years ago (i.e., Hindelang et al., 1981; Krohn et al., 1974), and the third was designed to test mode effects on reports of risky behavior and included only two types of offenses (i.e., carrying a weapon/gun and engaging in abusive/violent behavior after drinking) (Potdar & Koenig, 2005). These features may have limited the ability of these studies to find evidence of mode effects, and relying solely on these findings to conclude that SRO are not affected by modes of administration may be misleading. In sum, the question of how best to measure SRO is far from settled, and more methodological research using contemporary questionnaires of offending behavior is needed.

The present study

The aim of this study was to test whether SRO are affected by modes of administration and modes of data collection. The lack of evidence showing mode effects on SRO led influential reviews of crime measurement to conclude that modes of administration do not affect participants’ willingness to report offending behavior (e.g., Thornberry & Krohn, 2000). However, if the disclosure of criminal behavior is, indeed, affected by modes of administration, as is the disclosure of other sensitive topics, then using unstandardized modes of administration may have produced biased conclusions about criminal behavior. Further, with the progressive transition to computerized modes of data collection, it is important to test the extent to which computer-assisted modes affect participants’ reports of offending behavior in comparison to traditional paper-and-pencil questionnaires.

In order to assess the impact of modes of administration and modes of data collection on SRO, we conducted a methodological experiment. This experiment followed a 2 (modes of administration: interviewer-administered vs. self-administered surveys) × 2 (modes of data collection: paper-and-pencil vs. computer-assisted surveys) factorial design in which participants were randomly assigned to one of the experimental conditions. Based on the findings in the literature about sensitive topics, we predicted that participants in the self-administered modes would report higher rates of offending behavior than participants in face-to-face interviews (Hypothesis 1) and that participants in computer-assisted modes of data collection would report higher rates of offending compared to participants assigned to the paper-and-pencil modes (Hypothesis 2).

Methods

Participants

One hundred and eighty-one students from a large university in the north of Portugal participated in this experiment in exchange for course credits. Participants were mostly female (90.6%, n = 164) and aged between 18 and 50 years (M = 20.57, SD = 3.66).

Design

The present study followed a 2 (modes of administration: interviewer-administered vs. self-administered surveys) × 2 (modes of data collection: paper-and-pencil vs. computer-assisted surveys) experimental design. The crossing of these manipulations resulted in four experimental conditions: paper-and-pencil interviewer-administered interviews (PAPI); computer-assisted interviewer-administered interviews (CAPI); paper-and-pencil self-administered questionnaires (SAQ); and computer-assisted self-administered questionnaires (CASI). Participants were randomly assigned to one of these survey methods and completed the same questionnaire.
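For concreteness, the 2 × 2 crossing and the random assignment of participants can be expressed in a few lines of code. The sketch below is a minimal Python illustration, assuming simple (unblocked) random assignment with each factor drawn independently; the study does not describe its randomization procedure at this level of detail, so the function names and seed are purely illustrative.

```python
import random

# The 2 x 2 crossing of the two manipulated factors yields the four
# experimental conditions used in this study.
CONDITIONS = {
    ("interviewer-administered", "paper-and-pencil"): "PAPI",
    ("interviewer-administered", "computer-assisted"): "CAPI",
    ("self-administered", "paper-and-pencil"): "SAQ",
    ("self-administered", "computer-assisted"): "CASI",
}

def assign_condition(rng: random.Random) -> str:
    """Simple random assignment: draw each factor independently."""
    admin = rng.choice(["interviewer-administered", "self-administered"])
    collection = rng.choice(["paper-and-pencil", "computer-assisted"])
    return CONDITIONS[(admin, collection)]

rng = random.Random(2018)  # illustrative seed, for reproducibility only
print([assign_condition(rng) for _ in range(8)])
```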

Instruments

Participants in this study completed a questionnaire composed of three main sections. The first section collected socio-demographic information (e.g., sex, age, education, and income). In the second section, participants answered questions about multiple sensitive behaviors, including the offending behavior questionnaire. Both the socio-demographic section and the offending behavior questionnaire were drawn from the International Self-Report Delinquency 3 questionnaire (ISRD3; Enzmann et al., 2018; Portuguese version by Martins et al., 2015).

Behavioral questions followed the layout of the ISRD3 questionnaire, in which questions were asked about lifetime prevalence and, in case of a positive response, participants were directed to an open-ended follow-up question about past-year incidence. As in previous ISRD3 studies (e.g., Doelman et al., 2021), the offending indexes were based on 12 questions on different types of deviant behavior (i.e., vandalism, shoplifting, burglary, bicycle theft, car theft, stealing from a car, stealing from a person, carrying a weapon, robbery, group fight, assault, and drug sales). We used lifetime and past-year responses to create two SRO indexes based on the variety of offending (Sweeten, 2012). We also divided these offenses into lifetime and past-year composite variables based on two levels of offending seriousness, i.e., property offenses (vandalism, shoplifting, burglary, and stealing from someone or a vehicle) and violent offenses (group fights, carrying a weapon, robbery, and assault) (Doelman et al., 2021).
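As an illustration of how these indexes can be constructed, the sketch below computes the variety scores (counts of distinct offense types endorsed; Sweeten, 2012) and the property and violent composites from 0/1 prevalence indicators. This is a hypothetical reconstruction: the column names are not the ISRD3 variable names, and the composite groupings simply follow the lists given above.

```python
import pandas as pd

# Hypothetical 0/1 lifetime-prevalence indicators for the 12 offense items
# (analogous columns would exist for past-year reports).
OFFENSES = [
    "vandalism", "shoplifting", "burglary", "bicycle_theft", "car_theft",
    "steal_from_car", "steal_from_person", "carry_weapon", "robbery",
    "group_fight", "assault", "drug_sales",
]
# Composite groupings as listed in the text (Doelman et al., 2021).
PROPERTY = ["vandalism", "shoplifting", "burglary",
            "steal_from_person", "steal_from_car"]
VIOLENT = ["group_fight", "carry_weapon", "robbery", "assault"]

def offending_indexes(df: pd.DataFrame) -> pd.DataFrame:
    """Build prevalence and variety indexes from 0/1 offense indicators."""
    out = pd.DataFrame(index=df.index)
    out["variety"] = df[OFFENSES].sum(axis=1)             # 0-12 distinct types
    out["prevalence"] = (out["variety"] > 0).astype(int)  # any offense at all
    out["property_variety"] = df[PROPERTY].sum(axis=1)
    out["violent_variety"] = df[VIOLENT].sum(axis=1)
    return out
```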

In the third section of our questionnaire, we included measures of social desirability and participants’ perceptions of privacy and anonymity. Social desirability was assessed using the Socially Desirable Response Set 5 (SDRS-5; Hays et al., 1989; Portuguese version by Pechorro et al., 2016), a brief five-item questionnaire (e.g., “I am always courteous even to people who are disagreeable”). Participants’ perceptions of privacy and anonymity regarding their participation in this study were assessed using two ancillary questions (“I wish I could have taken the survey in a more private place” and “I am confident that the answers I gave in this survey will never be linked with my name”, respectively) developed by Denniston et al. (2010). Regardless of experimental condition, all participants completed this third section in a self-administered mode in order to reduce potential social desirability effects.

Procedure

Participants were recruited through the university platform that exchanges course credits for participation in psychological experiments. In addition, the researcher gave a presentation at the end of several classes to recruit further participants in exchange for course credits. Participants enrolled in the experiment through a Doodle calendar and met the researcher in a classroom. Students were randomly assigned to one of the four experimental conditions and completed the experiment individually in a classroom in the sole presence of the researcher. Ethical approval for this experiment was provided by the Portuguese university’s Institutional Review Board. Data collection was carried out from March 2018 to May 2019.

In the classroom, the researcher obtained informed consent from the participants and explained that we were interested in studying how people respond to questionnaires about sensitive topics, that they would be answering questions on personal experiences such as offending, and that their participation in this experiment would take about 30 min. The researcher also stated that students’ answers were anonymous and that their participation was confidential and voluntary. Respondents who were interested in participating signed the informed consent form, which was archived together with the others in order to ensure the anonymity of participants.

Students were then randomly assigned to one of the four experimental conditions (i.e., PAPI, CAPI, SAQ, and CASI). In the personal interview conditions, the interviewer read the questions, appearing either on the questionnaire (i.e., PAPI) or on a computer screen (i.e., CAPI), to the participants and ticked/entered the responses the participants provided. Interviews were carried out by five researchers (three female) who were randomly assigned to participants. In the self-administered conditions, after providing the instructions, the researcher stepped back and the exact same questions appeared either on a questionnaire (i.e., SAQ) or on a computer screen (i.e., CASI), and participants completed the survey on their own. The computer-assisted conditions used Qualtrics software with the same questions as the paper-and-pencil conditions.

Data analysis used descriptive statistics, logistic regression models to test the impact of modes of administration and modes of data collection on offending prevalence, and negative binomial regression models to test the impact of mode effects on offending variety. For logistic regression models, effect sizes are reported as odds ratios (OR: 1.68 = small, 3.47 = medium, 6.71 = large; Chen et al., 2010). For negative binomial regression models, effect sizes are reported as incidence rate ratios (IRR: 1.44 = small, 2.48 = medium, 4.27 = large; Borenstein et al., 2009; Roos et al., 2019). Because our hypotheses predicted effects in one specific direction (e.g., higher reports of offending in self-administered conditions), all statistical analyses were carried out using one-tailed tests. All statistical analyses were carried out in SPSS.
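To make the modeling strategy concrete, the sketch below shows how the two model families map onto the two outcome types. It uses Python’s statsmodels as a stand-in for the SPSS procedures actually used, and the variable names (self_admin, prevalence, variety) are hypothetical. Exponentiating the coefficient of the condition dummy yields the OR (logistic model) or the IRR (negative binomial model), and alpha = 0.10 produces the 90% confidence intervals consistent with one-tailed testing at the 5% level.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def mode_effects(df: pd.DataFrame) -> dict:
    """Estimate the effect of self-administration on offending reports.

    df holds one row per participant with:
      self_admin - 1 = self-administered, 0 = interviewer-administered
      prevalence - 0/1 offending indicator
      variety    - 0-12 variety score
    """
    X = sm.add_constant(df[["self_admin"]])

    # Logistic regression for prevalence; exp(beta) is the odds ratio.
    logit = sm.Logit(df["prevalence"], X).fit(disp=False)
    or_est = np.exp(logit.params["self_admin"])
    or_ci = np.exp(logit.conf_int(alpha=0.10).loc["self_admin"])  # 90% CI

    # Negative binomial regression for variety; exp(beta) is the IRR.
    # Note: this family fixes the dispersion parameter (alpha = 1 by
    # default); in practice the dispersion would be estimated.
    nb = sm.GLM(df["variety"], X,
                family=sm.families.NegativeBinomial()).fit()
    irr_est = np.exp(nb.params["self_admin"])
    irr_ci = np.exp(nb.conf_int(alpha=0.10).loc["self_admin"])

    # One-tailed p-value for the directional hypothesis (effect > 0).
    b = logit.params["self_admin"]
    p_one = (logit.pvalues["self_admin"] / 2 if b > 0
             else 1 - logit.pvalues["self_admin"] / 2)
    return {"OR": or_est, "OR_90CI": tuple(or_ci),
            "IRR": irr_est, "IRR_90CI": tuple(irr_ci),
            "p_one_tailed_logit": p_one}
```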

In order to provide an alternative to the classical significance tests, we have carried out Bayesian statistical analysis. Bayesian analysis provides a further examination of our results, indicating whether or not there is evidence of the absence of mode effects while circumventing eventual limitations of not detecting mode effects when they actually exist (e.g., an underpowered study to detect small effects) (Dienes, 2014). When comparing the support for the alternative hypothesis compared to the null hypothesis (BF10) and vice-versa (BF01), we have considered the following rules of thumb: < 1 no evidence; 1–3 anecdotal evidence; and 3–10 moderate evidence (Lee & Wagenmakers, 2014). Bayesian analyses were carried out using JASP software.
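The Bayes factors reported below were computed with JASP’s Bayesian ANOVA. As a rough illustration of the underlying logic only, a Bayes factor can be approximated from the BIC difference between two nested models (Wagenmakers, 2007); the sketch below applies this approximation with generic least-squares fits and hypothetical design matrices, and it will not reproduce JASP’s default-prior results.

```python
import math
import statsmodels.api as sm

def approximate_bf01(y, X_null, X_alt) -> float:
    """BIC approximation to the Bayes factor favoring the null model:
    BF01 ~ exp((BIC_alt - BIC_null) / 2) (Wagenmakers, 2007)."""
    bic_null = sm.OLS(y, X_null).fit().bic
    bic_alt = sm.OLS(y, X_alt).fit().bic
    return math.exp((bic_alt - bic_null) / 2)

# Usage sketch: X_null = intercept only; X_alt additionally includes the
# mode-of-administration dummy. BF01 > 1 favors the null model,
# BF01 < 1 favors the alternative.
```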

Results

Descriptive analysis

Participants in this study were randomly assigned either to a face-to-face interview or to a self-administered survey condition, as well as either to a paper-and-pencil or to a computer-assisted mode of data collection. As illustrated in Table 1, the random allocation of participants across these experimental manipulations resulted in similar demographic characteristics. No statistically significant differences were found between manipulations for participants’ age and sex, interviewers’ sex, economic status, or university class year. Further, neither the manipulation of modes of administration nor that of modes of data collection had a significant effect on social desirability (Table 1). As for the ancillary questions, although a larger proportion of respondents in computer-assisted modes wished they had taken the survey in a more private place (compared to paper-and-pencil modes), and a larger proportion of participants in face-to-face interviews reported being confident about the anonymity of the study (compared to self-administered modes), the manipulations in this experiment had no statistically significant effect on participants’ perceived privacy and anonymity.

Table 1 Demographic characteristics by experimental manipulations

Regarding general descriptives of offending, 32.6% of participants (n = 59) reported committing at least one type of offense over their life course, while 12.7% (n = 23) of our sample reported offending in the past year. Regarding offending variety, the present sample showed a mean number of types of lifetime offending of 0.57 (SD = 0.97, min = 0, max = 5). Male participants reported higher offending variety (M = 1.29, SD = 1.49) than females (M = 0.49, SD = 0.88), and this difference was statistically significant (t(17.16) = −2.18, p < 0.05).

Modes of administration (Interviewer-administered vs. self-administered surveys)

Table 2 illustrates the effect of modes of administration on participants’ reports of offending behavior. Results for overall lifetime offending prevalence show that 29% of participants in the face-to-face interview condition reported at least one type of offending behavior during their lifetime, compared to 37% in self-administered surveys. Despite this difference between the two groups, the result was not statistically significant (OR = 1.44, 90% CI [0.853, 2.432]). When considering types of offending separately, small effect sizes were detected for both property and violent offenses (i.e., OR > 1.68). Self-administration of the questionnaire resulted in a statistically significant increase in the prevalence of property offenses (OR = 1.78, 90% CI [1.013, 3.121]). Results for violent offenses followed a similar trend, with participants in self-administered modes (18.5%) reporting a higher prevalence of offending than participants in personal interview conditions (11.0%), though without reaching statistical significance (OR = 1.84, 90% CI [0.908, 3.723]).

Table 2 Prevalence and variety of offending by modes of administration (lifetime offending on the top; past-year offending below)

Despite the low prevalence of past-year offending, our results identified small effect sizes, suggesting that self-administration had a small effect on participants’ reports of overall (OR = 2.61, 90% CI [1.213, 5.630]), property (OR = 3.06, 90% CI [0.956, 9.787]), and violent (OR = 2.27, 90% CI [0.785, 6.565]) offending, though the regression models reached statistical significance only for overall offending. In the case of property offenses, participants in self-administered survey conditions had 3.06 times higher odds of reporting offending in the past year than respondents in face-to-face interviews.

As for offending variety, self-administration increased the likelihood of reports of lifetime offending (Table 2). The incidence rate of overall offending in self-administered modes was significantly higher than in interview modes (IRR = 1.66, 90% CI [1.099, 2.494]). Statistically significant results with small effect sizes were also found for both property (IRR = 1.70, 90% CI [1.024, 2.813]) and violent offenses (IRR = 1.99, 90% CI [1.062, 3.746]) over the life course. Results for past-year offending followed the same pattern of higher disclosure of offending behavior in self-administered conditions than in interviewer-administered conditions (Table 2). These differences were statistically significant for past-year overall offending (IRR = 2.74, 90% CI [1.356, 5.550]) and property offenses (IRR = 3.29, 90% CI [1.052, 10.298]). Results for the past-year variety of violent offenses followed the same pattern, detecting a small effect size, although offending scores were very low and the result did not reach statistical significance (IRR = 2.47, 90% CI [0.875, 6.964]).

Modes of data collection (paper-and-pencil vs. computer-assisted surveys)

The manipulation of modes of data collection showed no statistically significant impact on lifetime or past-year offending (Table 3). Regarding lifetime prevalence of offending, results detected a small effect size for the impact of modes of data collection on property offenses, with a larger proportion of participants reporting offending behavior in paper-and-pencil modes (30.9%) than in computer-assisted modes (20.7%), though this difference was not statistically significant (OR = 1.71, 90% CI [0.968, 3.023]). Results for lifetime prevalence of violent offenses, however, showed a different pattern, with slightly higher prevalence rates of offending under computer-assisted modes, again without statistical significance (OR = 0.91, 90% CI [0.455, 1.835]). As for past-year offending, despite slightly higher reports of overall, property, and violent offending under paper-and-pencil modes, differences were not statistically significant (Table 3). However, these analyses detected a small effect size for mode effects on respondents’ reports of past-year property offenses (OR = 2.25, 90% CI [0.704, 7.205]). Table 3 also illustrates the effects of modes of data collection on offending variety. Findings showed very similar scores of lifetime and past-year offending variety in paper-and-pencil and computer-assisted modes of data collection.

Table 3 Prevalence and variety of offending by modes of data collection (lifetime offending on the top; past-year offending below)

Bayesian analysis

To further analyze our results, we carried out a Bayesian ANOVA including the two main factors under study (i.e., modes of administration and modes of data collection) as predictors of lifetime and past-year offending variety. Regarding lifetime offending, Bayesian analysis showed that the model including the main effect of modes of administration was consistently the best-fitting model. The comparison analysis presented anecdotal evidence that the model including modes of administration explained our results better than the null model (BF10 = 1.03). On the other hand, this analysis showed moderate evidence that the null model explained our findings better than the remaining models, especially the model including the main effect of modes of data collection (BF01 = 6.05). Similar results were found for past-year offending, where the model including the main effect of modes of administration was the best model, though only anecdotally outperforming the null model (BF10 = 1.11), while the null model was moderately better at explaining our results than the model including modes of data collection (BF01 = 8.69).

Discussion

Self-reports are the most widely used measurement method in the study of offending behavior. Subject areas such as the study of the causes of delinquent behavior are heavily reliant on this methodology, making conclusions about delinquent behavior contingent on the measurement technique. However, the lack of methodological research on SRO generates doubt about the quality of self-report measures, as well as about the best ways to administer questions about offending behavior. This article provides evidence from a methodological experiment with undergraduate students from a Portuguese university. In this experiment, we tested the effects of modes of administration (i.e., face-to-face interviews vs. self-administered surveys) and modes of data collection (i.e., paper-and-pencil vs. computer-assisted surveys) on SRO. In this 2 × 2 factorial design experiment, participants were randomly assigned to one of the four experimental conditions and were asked to disclose whether they had committed offending behavior over their lifetime and in the past year.

Offending behavior is a highly sensitive topic that generates concern about socially desirable answers and poses the threat of responses being disclosed to other people outside of the study or even fear of legal repercussions. Therefore, taking into consideration the evidence available in the literature on sensitive questions, self-administration of offending questionnaires is expected to result in higher rates of self-disclosed offending behavior compared to interviewer-administered conditions, where participants are requested to disclose their offending practices to a third person. An experimental approach is required to clearly demonstrate the impact of modes of administration on collecting such sensitive information (Tourangeau & Yan, 2007). In the present experiment, we aimed to provide evidence regarding the best practices of administering questions on offending, in order to improve the quality of SRO data.

In line with our first hypothesis, the present results showed that participants in self-administered conditions were more likely to report offending behavior than participants in face-to-face interviews (see Fig. 1 for a summary of findings). Participants who completed the survey in a self-administered mode showed a 66% increase in the rate of disclosed lifetime offending compared to participants in interviewer-administered conditions. This mode effect was even stronger for past-year offending, where respondents in self-administered modes had 2.61 times higher odds of disclosing offending behavior than participants in interviewer-administered conditions. The evidence for mode effects found in this experiment is in line with the general literature on sensitive questions (e.g., Richman et al., 1999; Tourangeau & Yan, 2007). Requesting someone to disclose sensitive behavior to a third person, compared to completing a survey on their own, is expected to increase social desirability effects and thus influence participants’ willingness to disclose embarrassing and criminal behavior (Bradburn et al., 1979; Tourangeau et al., 2000).

Fig. 1 The effect of modes of administration (left) and modes of data collection (right) on lifetime and past-year offending variety (error bars are 90% confidence intervals)

However, contrary to the literature on modes of administration, the mode effects found in this experiment were statistically significant despite the absence of differences in participants’ social desirability or perceptions of privacy and anonymity. Social desirability was only slightly higher in face-to-face interviews, as was the wish to have taken the survey in a more private place (with no statistical significance). As for participants’ perception of anonymity, respondents in face-to-face conditions reported slightly higher confidence that their names would never be linked to their answers, again without statistical significance. This finding seems to contradict the deliberate misreporting hypothesis (Bradburn et al., 1979; Tourangeau et al., 2000), under which self-administration is expected to provide greater confidence in the study’s assurances of anonymity. One potential explanation may lie in our sample: university students may be used to completing surveys, aware of the ethical safeguards involved in research, and confident that the researcher will treat their answers carefully. Nevertheless, despite similar social desirability, anonymity, and privacy across the manipulated modes of administration, our results still detected mode effects, with participants in self-administered modes reporting higher rates of offending than participants in face-to-face interviews. Therefore, the benefits of self-administration in improving rates of disclosed offending behavior in this study seem to go beyond the factors of social desirability, anonymity, and privacy. More research is needed to understand the mechanism through which self-administration increases the reporting of sensitive behavior.

As for the manipulation of modes of data collection, results contrasted with our second hypothesis. Based on the literature, we hypothesized that computerized modes would elicit higher rates of reported offending behavior. However, modes of data collection showed generally no effect on reports of offending behavior, both over the life course and in the past year. Only in the case of property offenses, and despite the lack of statistical significance, did our findings detect small effect sizes favoring higher rates of property offending in paper-and-pencil conditions compared to computer-assisted modes. Bayesian analyses provided additional support for the null results, suggesting the absence of mode effects of computer-assisted vs. paper-and-pencil surveys on respondents’ willingness to disclose offending behavior. The present findings are somewhat contrary to the body of evidence from research on sensitive topics (e.g., Gnambs & Kaspar, 2015; Richman et al., 1999), as well as from studies including offending questions (Gomes et al., 2019), in which reports of sensitive behaviors are expected to be higher in computer-assisted modes. However, multiple studies have found behavioral reports to be unaffected by modes of data collection (e.g., Baier, 2017; Hamby et al., 2006; Knapp & Kirk, 2003; Lucia et al., 2007; Trapl et al., 2013). This adds to the already inconsistent body of knowledge regarding the effects of modes of data collection on participants’ willingness to provide truthful answers, and more research on the moderators of this relationship is needed.

Further, in the present experiment, participants reported very similar social desirability in paper-and-pencil and computer-assisted modes of data collection. Likewise, no statistically significant differences were found in participants’ perceptions of anonymity and privacy, although a somewhat higher proportion of participants in computer-assisted conditions wished they had completed the survey in a more private place. These results are inconsistent with Denniston et al. (2010), who found less perceived privacy in computer-assisted modes than in traditional paper-and-pencil modes, as well as with Trau et al. (2013), who suggested that computer-assisted modes increase participants’ confidence in the study’s anonymity. This demonstrates, once again, the inconsistency of findings in experiments comparing computer-assisted modes of data collection with traditional paper-and-pencil questionnaires.

Limitations

Some limitations of this study need to be discussed. First, the sample in this experiment consisted of university undergraduate students. The prevalence of offending among university students is expected to be low, especially for more serious types of offenses. This low prevalence may have limited our capacity to detect mode effects because, in many cases, participants had not committed these behaviors. Also, given that serious offenses, as well as the most recent offenses, are regarded as the most sensitive questions (Gomes et al., 2022), the low prevalence of serious and violent offenses may be a limitation of our study. Second, our sample was mostly composed of female participants, which may affect the generalizability of the present findings. Moreover, offending behavior is less prevalent among female participants, which may further limit our ability to detect mode effects. Third, the current sample size (N = 181) limits our ability to detect small mode effects, and larger samples would be preferable. However, the fact that we detected evidence for the beneficial effects of self-administration on reports of offending behavior in this experiment is a strong indication that SRO questionnaires are affected by mode effects. On the other hand, one might question whether this study failed to show the impact of modes of data collection due to data insensitivity (i.e., the inability to distinguish the null hypothesis from the alternative hypothesis; Dienes, 2014). In this regard, the Bayesian analysis added reliability to our findings because Bayes factors allowed us to determine that the present non-significant results are in support of the null hypothesis (Dienes, 2014). Further, given that support for the null model tends to increase with sample size, the Bayesian analysis gives us extra confidence that the absence of effects of modes of data collection is not due to study insensitivity (Dienes, 2014). Future studies should address these limitations by carrying out similar experiments with larger samples of younger participants from multiple backgrounds, in order to increase the variability of the offending variable. This would allow testing for mode effects on more recent and more serious types of offenses, as well as testing whether the benefits of self-administration increase with more sensitive offending questions.

Conclusions

Findings from this study showed that SRO are affected by modes of administration. Asking questions about offending behavior under self-administered conditions increases the odds of participants disclosing offending behavior compared to face-to-face interviews. Therefore, researchers using questionnaires to assess SRO should consider self-administered modes in order to increase measurement accuracy. To our knowledge, this is the first study to demonstrate the impact of self-administration on SRO. It provides important information for improving the accuracy of the self-report methodology for assessing offending behavior and reconciles the literature on SRO with general survey research by showing that SRO, like other types of sensitive questions, are subject to mode-of-administration effects (Gomes et al., 2019, 2022).

As for the effect of modes of data collection, results from this study show that asking questions via paper-and-pencil questionnaires or computer-assisted surveys produced largely similar results. Further, this experiment showed that participants in paper-and-pencil and computer-assisted conditions reported similar levels of perceived anonymity and privacy. More research on the impact of modes of data collection on SRO is needed, especially considering the gradual transition toward computerized methods and the advantages of computer-assisted modes in reducing costs and human resources and in overcoming limitations caused by illiteracy.