1 Introduction

Participants in economics experiments differ in the number of experiments they have taken part in. There are both inexperienced subjects who are participating for the first time and experienced subjects who have already taken part in several experiments. Lab experience is thus a background variable that may differ across subjects.

There are reasons to expect the number of lab visits to be correlated with subjects’ behavior (see Cleave et al. 2013; Guillén and Veszteg 2012; Matthey and Regner 2013, and our literature survey below). First, during their previous lab visits experienced subjects may have learned about the way people decide and may, therefore, behave differently. Second, there may be a selection bias: lab experiments might be particularly attractive to subjects with certain preferences. Since participants self-select into becoming repeat visitors, the preferences of experienced subjects would then differ from those of inexperienced subjects who do not return to the lab. In either case, the behavior of participants in economic experiments may well be sensitive to the number of previous lab visits.

Now, even if the number of participants’ previous lab visits is correlated with behavior, this would not be worrisome as long as the recruitment process is random. An implicit presumption underlying recruitment for economics experiments is that, even if a prognostic factor (say, participant gender) matters, this would be irrelevant because there is no reason to assume that participants of different genders are allocated non-randomly across sessions or treatments. Accordingly, there would be no need to control for background variables, even relevant ones, provided the experimenter randomly allocates subjects to treatments.

Experimental experience, however, is a factor that may well not cancel out in the recruitment process. Subjects typically participate only once in each experimental study, and subjects often compete for participation slots. Repeat visitors are familiar with this competition and may sign up more quickly than inexperienced subjects. After several sessions, the pool of experienced subjects may become depleted: experienced subjects are then disproportionately allocated to the first sessions or treatments, while inexperienced subjects appear relatively frequently in additional sessions and treatments conducted at a later phase of the research. Put simply: there may be a recruitment bias.

In this paper, we investigate the behavioral differences between experienced and inexperienced participants and we also analyze how subjects with different levels of experience register for different sessions in an online recruitment system. Our study analyzes the impact of subjects’ laboratory experience across a number of games. Specifically, we compare experienced and inexperienced participants for four one-shot, two-player games (trust game, beauty contest, ultimatum game, and traveler’s dilemma) and two individual decisions (lying task and risk preferences). We then analyze recruiting behavior in two unrelated experiments.

Our results are as follows. By and large, we find that the behavior of inexperienced and experienced subjects does not differ greatly. There are differences in the trust game: experienced subjects behave significantly more selfishly as second movers and they also trust less often. Further, experienced participants submit fewer non-monotonic strategies in the risk elicitation task. Importantly, no other differences can be found. We also find evidence for a potential recruitment bias: in the experiments we tracked, relatively few inexperienced subjects registered in the early recruitment waves. We conclude with several measures that experimenters can take to avoid this bias.

2 Related literature

A study closely related to the present paper is Matthey and Regner (2013). They gather data from existing dictator, ultimatum, and trust experiments (published elsewhere) and regress the observed decisions on lab experience. They find that repeated participation in experiments is negatively correlated with subjects’ giving decisions. The authors also analyze subjects’ answers in post-experimental questionnaires and conclude that the differences in subjects’ behavior are more likely to be driven by learning than by a selection bias. Our analysis differs from theirs in that we investigate multiple decisions with the same cohort of subjects, specifically invited for this purpose.

There is also some research on the decision to return to the lab. Guillén and Veszteg (2012) find that those who earned more in previous experiments are more likely to participate in future experiments. Similarly, Casari et al. (2007, p. 1295) find that successful players are more likely to return to the lab for a follow-up study. One implication of these studies is that experienced subjects may be less generous than subjects with fewer lab visits, since selfish behavior during experiments will often lead to higher monetary payoffs. While this implication is in line with Matthey and Regner (2013), this line of research seems to favor a selection bias over learning as an explanation for the results.

A third related literature analyzes whether the preferences of subjects voluntarily participating in experiments are representative of the population from which the participants are drawn. Cleave et al. (2013) conduct a large-scale classroom experiment and compare the preferences of those who eventually participate in lab experiments to those who do not. They find some moderate differences in pro-social behavior (those who sent less in a trust game were more likely to participate in a laboratory experiment), whereas overall risk preferences do not differ between participants and non-participants. Falk et al. (2013) show for a large population of undergraduate students that a non-lab donation decision does not explain whether students participate in at least one experiment. Slonim et al. (2013), by contrast, find a bias toward more pro-social participants: lab participants have a lower income, more leisure time, more interest in economics and lab activities, as well as more hours of volunteering.

3 Experimental design, procedures, and hypotheses

3.1 Design

Participants play four one-shot two-player games and two single-player tasks in a within-subjects design. Subjects play a trust game (both roles), a beauty contest, an ultimatum game (both roles), and a traveler’s dilemma. Two individual decision-making tasks were also included: a lying task and an elicitation of risk preferences. In total, there were eight decisions. Participants made their decisions in the order shown: trust game, beauty contest, ultimatum game, traveler’s dilemma, lying task, and risk preferences.

Only one of the games or tasks was randomly selected at the end of the experiment to be payoff-relevant. If one of the two-player games was chosen, two subjects were randomly matched and randomly assigned to the two roles. The payoffs were then determined according to the rules of the game.

  • Our two-player trust game (henceforth TG) is a simplified two-action version where the first mover can trust or not trust (we used neutral labels in the instructions). If the first mover does not trust, both players receive €5. If the first mover trusts and the second mover is trustworthy, they each get €7. If the first mover trusts and the second mover exploits, they earn €3.50 and €8.50, respectively. The game-theoretic prediction is (do not trust, exploit). Subjects played both roles in our experiment.Footnote 1

  • The beauty contest (BC) (Nagel 1995) was conducted as a two-player game (Grosskopf and Nagel 2008). Both players had to name an integer in [0, 100], and the player whose number was closest to two-thirds of the average of the two numbers won €10. In a two-player BC, the player with the lower number wins, choosing zero is the weakly dominant action, and (0, 0) is the unique Nash equilibrium.

  • The ultimatum game (UG) (Güth et al. 1982) had a €10 pie. The proposer makes an offer of €\(s\) to the responder, keeping €\(10-s\) for himself. If the responder rejects, both players earn zero; if she accepts, the players earn €\(10-s\) and €\(s\), as proposed. Responder decisions were elicited as (integer) minimum acceptable offers. The prediction for the UG is (€10, €0).Footnote 2 As in the TG, subjects decided both as proposers and responders.Footnote 3

  • Our traveler’s dilemma (TD) (Capra et al. 1999) is a two-player game where subjects claim an integer amount of eurocents between 500 and 1000. Let \(n_i\) indicate player i’s claim, \(i \in \left\{ 1,2\right\}\). The payoffs for players i and j in our TD are \(n_j-75\) and \(n_j+75\) if \(n_i>n_j\); \(n_i+75\) and \(n_i-75\) if \(n_i<n_j\); and \(n_i\) for both players if \(n_i=n_j\) (see the payoff sketch after this list). In the unique Nash equilibrium of the game, \(n_1=n_2=500\) cents.

  • The lying task (LT) is a single-player decision where each subject is equipped with a die (Fischbacher and Föllmi-Heusi 2013). Subjects are told to throw the die once and to keep in mind the number they threw. In our variant, subjects who reported a six earned €10; otherwise they earned €0. Subjects who do not face a cost of lying should accordingly report a six.Footnote 4

  • The risk elicitation task (RE) is the non-incentivized version of Holt and Laury (2002). Subjects have to choose between two lotteries, option A or option B (see the table in the online appendix). The switching point in the table indicates a subject’s degree of risk aversion: the further down the table subjects switch from option A to option B, the more risk averse they are. An expected utility maximizer has a single switching point, and risk-neutral subjects should choose the safe option four times.
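
To fix ideas, the following minimal sketch encodes the TG and TD payoff rules described above (the Python function names and the payoff-tuple convention are ours, not part of the experiment):

```python
# Payoff rules of the simplified trust game (amounts in euros) and the
# traveler's dilemma (amounts in eurocents), as described in the text.

def tg_payoffs(trust: bool, trustworthy: bool) -> tuple[float, float]:
    """Returns (first mover, second mover) payoffs in the TG."""
    if not trust:
        return 5.0, 5.0                      # outside option for both
    return (7.0, 7.0) if trustworthy else (3.5, 8.5)

def td_payoffs(n_i: int, n_j: int, r: int = 75) -> tuple[int, int]:
    """Returns (player i, player j) payoffs in the TD with bonus/penalty r."""
    assert 500 <= n_i <= 1000 and 500 <= n_j <= 1000
    low = min(n_i, n_j)
    if n_i == n_j:
        return low, low                      # equal claims: both paid in full
    # the lower claimant earns the bonus, the higher claimant pays the penalty
    return (low + r, low - r) if n_i < n_j else (low - r, low + r)

print(tg_payoffs(True, False))   # (3.5, 8.5): trust is exploited
print(td_payoffs(600, 800))      # (675, 525): the lower claim wins the bonus
```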

3.2 Procedures

We do not have treatment variants; instead, we are interested in differences in behavior between experienced and inexperienced participants. As a consequence, we decided to invite participants non-randomly. Without a targeted recruitment wave, inexperienced subjects might be rare in any experimental subject pool, so we made a specific appeal to inexperienced subjects. The inexperienced subjects had voluntarily registered for our laboratory before but had never participated in an experiment. For the experienced subjects, it is an open question which level of experience counts as significant lab experience (but see Sect. 4.2 below). We decided to invite subjects with 10 or more visits to the lab. These experienced subjects had substantial lab experience, but they had not participated in any of the experiments we use for this study. In other words, they had no game-specific experience.

Both experienced and inexperienced subjects were invited using ORSEE (Greiner 2015). Upon arrival, participants were randomly assigned to a closed cubicle and were given general instructions. Right before the decisions for a game were made, participants were provided with game-specific instructions.Footnote 5 To make sure that subjects understood the game, participants had to answer two questions correctly before they made their decisions in the UG and TD, respectively. Subjects were allowed to ask questions in the privacy of their cubicles. At the end of the experiment, participants were given feedback on their decisions. Furthermore, they were told which of the five incentivized games and tasks was relevant for their earnings. The experiment was implemented using the software z-Tree (Fischbacher 2007).

Sessions were conducted in the DICElab of the University of Düsseldorf between November 2013 and October 2014. We conducted a total of six sessions. Four sessions were mixed with both experienced and inexperienced subjects. Two sessions consisted exclusively of either experienced or inexperienced subjects. In total, 98 participants took part in the experiment, with 48 experienced and 50 inexperienced subjects. Sessions lasted about 45 minutes and earnings were, on average, €12.26 including a lump-sum payment of €7.

3.3 Power analysis

For each of our games, we determined the minimum effect size that can be detected given our sample sizes of 48 and 50 subjects, respectively. The calculation is based on a significance level of \(\alpha =0.05\) and power \(1-\beta =0.8\). For BC, UG, TD, and RE, we calculated Cohen’s d assuming a two-tailed test and normally distributed outcomes. The minimum effect size we can detect in these games is \(d=0.59\). For the binary decisions in TG and LT, we calculated odds ratios (OR), taking the proportions reported in Bolton et al. (2004) and Garbarino et al. (2016) as references. In the first stage of the TG the minimum detectable effect size is OR = 3.01, and in the second stage we also obtain OR = 3.01. In LT the minimum detectable effect size equals OR = 5.14.
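
For illustration, the Cohen's d part of this calculation can be reproduced with standard software; below is a minimal sketch using Python's statsmodels (our choice of tool, which the paper does not specify). Small discrepancies from the reported d = 0.59 can arise from the exact approximation used.

```python
# Minimum detectable effect size for a two-sided two-sample comparison
# with n1 = 48, n2 = 50, alpha = 0.05, and power = 0.8.
from statsmodels.stats.power import TTestIndPower

d_min = TTestIndPower().solve_power(effect_size=None, nobs1=48, ratio=50 / 48,
                                    alpha=0.05, power=0.8,
                                    alternative='two-sided')
print(f"minimum detectable Cohen's d: {d_min:.2f}")  # roughly 0.57-0.59
```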

How do our minimum effect sizes compare to the effect sizes found in the literature? Bolton et al. (2004) report an effect size of OR = 8.89 in the first and OR = 12.66 in the second stage of their TG. Grosskopf and Nagel (2008) report \(d=0.49\) for their BC. Andersen et al. (2011) obtain \(d=1.03\) for UG1 and \(d=0.72\) for UG2. Capra et al. (1999) obtain \(d=1.65\) for their TD. Garbarino et al. (2016) find OR = 3.41. We conclude that our sample size is sufficiently large to detect effect sizes typically found in the literature.Footnote 6

3.4 Hypotheses

From Guillén and Veszteg (2012) and Matthey and Regner (2013), we expect experienced subjects to be more selfish than inexperienced subjects. Put differently, experienced subjects are greedier or suffer less disutility from advantageous payoff inequality. This translates into several testable hypotheses in our setting. Experienced subjects will exploit more often as second movers in the TG (TG2) and will make weakly lower offers as proposers in the UG (UG1), provided they hold the same beliefs about second movers as inexperienced subjects. Fehr and Schmidt (1999) argue that aversion to advantageous payoff inequality and aversion to disadvantageous payoff inequality will be correlated. If so, experienced subjects will state lower minimum acceptable offers (maos) in the responder role (UG2). Finally, in the LT, the gain from lying will exceed its cost more often for greedier (and hence experienced) subjects. We hypothesize:

Hypothesis 1

Experienced subjects are greedier than inexperienced subjects: they (1) are less trustworthy in TG2, (2) offer less in UG1 and state lower minimum acceptable offers in UG2, and (3) report more sixes in the LT.

Guillén and Veszteg (2012) analyze the decision to return to the lab and find that better earners in previous experiments are more likely to participate in future experiments. Our experienced subjects should accordingly be those who earned more in previous experiments, whereas at least some of our inexperienced subjects, namely those who will not return, should be poor at making money in experiments. We thus hypothesize that experienced subjects make more decisions consistent with (ex-ante or ex-post) payoff maximization.

Hypothesis 2

Experienced subjects are better at picking payoff-maximizing actions: (1) they choose lower numbers in the BC, and (2) they are closer to the payoff-maximizing choice in TG1, UG1, and TD.Footnote 7

4 Results

4.1 Instrument check

How do our results compare to similar experiments? Table 1 summarizes the data from our experiments; see the “overall” row. Our participants trust more (51%) than trustors in Blanco et al.’s (2014) baseline treatment (27.5%) but just as much as in their Elicit Beliefs (55%) and True Distribution (57%) treatments. Our degree of trustworthiness (57%) is almost the same as in their treatments (53–55%). In Grosskopf and Nagel’s (2008) two-player BC, student participants choose 35.57 on average, close to our mean of 39.95. In the UG, we get an average offer of 4.16 and, in the second stage, an average minimum acceptable offer of 2.95. Oxoby and McLeish (2004) obtain slightly lower averages with a pie of $10: their mean offer is $3.84 and their mean minimum acceptable offer equals $2.78. As for the TD, our overall average (898.96) should roughly compare to Capra et al.’s (1999) first-period \(R=10\) or \(R=20\) data. When we scale their TD data to our action space, we obtain a near-perfect match for their \(R=10\) treatment (886.6). The comparison to their \(R=20\) treatment is not as close (712.1). From the 58% of our participants who reported a six in the LT, following Garbarino et al. (2016), we can calculate the rate of lying. An expected share of 49.68% of our participants were dishonest [with a 95% CI of (45.3, 53.95%)], which is high compared to the literature average of 25.56% reported in Garbarino et al. (2016).
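
The back-of-the-envelope version of this lying-rate calculation is simple: with a fair die, a six occurs with probability 1/6, so any excess of reported sixes must come from liars. Garbarino et al. (2016) use a refined estimator, which is why their point estimate and confidence interval differ slightly from the naive sketch below.

```python
# Naive lying-rate correction: p_obs = 1/6 + lying_rate * 5/6, since honest
# subjects report a six with probability 1/6 and liars report it for sure.
p_obs = 0.58                                  # share of reported sixes
lying_rate = (p_obs - 1 / 6) / (5 / 6)
print(f"estimated share of liars: {lying_rate:.2%}")  # close to the reported 49.68%
```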

Table 1 Results. We have 48 observations for experienced subjects and 50 for inexperienced subjects. Standard deviations of non-binary variables are reported in parentheses. TG1 indicates the share of first movers who trust and TG2 the frequency of second-mover trustworthiness. UG1 indicates the average proposer offer and UG2 the average minimum acceptable offer (“mao”) of the responder. LT indicates the share of participants who report a six. For RE, we count the number of safe choices; for risk-neutral subjects this number equals 4. We use exact tests (Mann-Whitney U tests for BC, UG1, UG2, TD, and RE; Fisher’s exact tests for TG1, TG2, and LT) and report two-sided p-values

4.2 Decisions

Our first result is a null result in that the behavior of experienced and inexperienced subjects often does not differ. Table 1 shows that there are no significant differences in BC, UG, TD, LT, and RE. In fact, there are no differences at all in LT and TD, where choices differ only marginally compared to the standard deviation. Experienced subjects choose somewhat lower numbers in the BC and request more in UG2, but both of these differences are insignificant. Results for UG1 are in line with Hypothesis 1 (2) but, even if we considered one-sided p-values, the test would be only borderline significant at the 10 percent level.

Table 1 also shows some significant differences. We obtain substantial differences for both stages of the trust game: inexperienced subjects trust more in TG1 and are more trustworthy in TG2 than experienced subjects. The differences are significant at \(p<0.01\) and \(p=0.01\), respectively (Fisher’s exact tests, two-sided p-values). The result on TG2 constitutes support for Hypothesis 1 (1).Footnote 8

As an aside, we find significant differences in the RE when it comes to monotonic decisions. As is well known, a substantial share of subjects do not switch exactly once in RE, and their decisions cannot easily be related to standard preferences. We find that while 96% of experienced subjects switch exactly once from option A to option B, only 80% of the inexperienced subjects do (Fisher’s exact test, \(p=0.03\), two-sided).
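
This test can be reproduced from the reported shares; the counts below are our reconstruction from the percentages (96% of 48 and 80% of 50), not the raw data.

```python
# Fisher's exact test on monotonic (single-switch) vs. non-monotonic choices.
from scipy.stats import fisher_exact

table = [[46,  2],   # experienced:   monotonic, non-monotonic
         [40, 10]]   # inexperienced: monotonic, non-monotonic
_, p = fisher_exact(table, alternative='two-sided')
print(f"p = {p:.3f}")  # should be close to the reported p = 0.03
```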

Result 1

(1) Experienced subjects trust significantly less and (2) are significantly less trustworthy. (3) They also submit monotonic strategies significantly more often. (4) There are no other significant differences.

Interestingly, within the cohort of experienced subjects, behavior is not correlated with the number of previous lab visits. Our experienced subjects had participated in an average of 14.85 experiments (with a range from 11 to 23). The correlation coefficients of lab visits and experimental decisions are insignificant (all Spearman’s \(|\rho | <0.14\), all \(p > 0.34\)). In other words, once subjects have gained a certain level of experience (10 experiments, in our case), no further selection effects or changes in behavior due to experience appear to occur.

4.3 Payoff-maximizing actions

We now turn to Hypothesis 2 (1) and (2), which state that experienced subjects are better at finding payoff-maximizing actions. In the BC, the payoff-maximizing action, zero (the weakly dominant action), was chosen by one inexperienced subject (2%) and by two experienced subjects (4.17%). The low rate of payoff-maximizing choices hardly suggests that experienced subjects play better, even though their rate is twice as high. A similar picture emerges in UG2: among the thirteen subjects who chose the payoff-maximizing mao of zero, slightly more than half (seven) were inexperienced.Footnote 9 (The frequencies of payoff-maximizing actions in TG2 and LT have already been reported above.)

For TG1, UG1, and TD, we can work out the ex-post payoff-maximizing actions (the optimal action given the choices of the other players). In TG1, it turns out that trusting is optimal for risk-neutral players: trusting pays whenever the trustworthiness rate \(q\) satisfies \(7q+3.5(1-q)\ge 5\), that is, \(q\ge 3/7\approx 0.43\), and our overall trustworthiness rate exceeds this cutoff. Since our inexperienced subjects trust more often than the experienced subjects, the TG1 data do not support Hypothesis 2 (2).Footnote 10 In UG1, offering the equal split (€5) is the ex-post payoff-maximizing strategy (as is the case in Blanco et al. 2011); see also Figure 5 in the online appendix. Again, inexperienced subjects make offers closer to the payoff-maximizing choice. Finally, for the TD, ex-post optimal payoffs are somewhat intricate to calculate, and expected payoffs are not monotone in claims. The best action turns out to be 994, which was not chosen by anyone (see again Figure 5 in the online appendix). Among the chosen bids, 989 was the best-performing action. Again, we do not find any impact of subject experience.
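
The TD part of this exercise amounts to evaluating every candidate claim against the empirical distribution of the other players' claims. A minimal sketch follows; the claims vector is hypothetical, as the actual data are shown in Figure 5 of the online appendix.

```python
# Ex-post expected payoff of each feasible TD claim against observed claims.
import numpy as np

def td_payoff(own: int, other: int, r: int = 75) -> int:
    low = min(own, other)
    if own == other:
        return low
    return low + r if own < other else low - r

observed_claims = [1000, 1000, 995, 980, 900, 850, 700, 500]  # hypothetical
expected = {c: np.mean([td_payoff(c, o) for o in observed_claims])
            for c in range(500, 1001)}
best = max(expected, key=expected.get)
print(best, expected[best])  # with the real data, the best claim is 994
```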

Result 2

Experienced subjects are not better at picking (ex-ante or ex-post) optimal actions.

4.4 Discussion

Why do experienced and inexperienced participants differ? Can we find causal factors that explain the—arguably few—behavioral differences?

Obviously, experienced subjects will be older than inexperienced subjects, and they will also be further along in their studies. So can age or year of study be a driving factor? Among our participants, experienced subjects are indeed 2.5 years older and started their university studies 2.3 years earlier than the inexperienced participants. Both differences are significant (MWU, both \(p<0.01\), two-sided). However, when we correlate the TG1, TG2, and RE_mon data with age, we find no significant results (rank-biserial correlation, all \(p>0.66\)). The year-of-study variable correlates significantly with RE_mon (\(p=0.01\)), but this becomes insignificant in regressions that also include experience.

More experience could also be a proxy for fewer opportunities to earn income from other sources. Experience could thus be correlated with lower income, which may in turn suggest that more experienced subjects behave differently. We believe this hypothesis may have explanatory power, and it would be consistent with the findings in Slonim et al. (2013). Nevertheless, we note that fewer opportunities or a lower income may themselves be only proxies for other unobserved variables.

The notion that “lab rats” may have fewer other sources of income brings us back to the hypothesis that better earners are more likely to return to the lab. Unsurprisingly, given the above results, we do not find that experienced subjects earned more money than inexperienced subjects in our experiments (€5.66 vs. €4.88, MWU, \(p=0.32\), two-sided).

Addressing, however, the question of who among the inexperienced subjects returns to the lab and who does not (Guillén and Veszteg 2012), we find some surprising results. We can compare (somewhat non-systematically) the behavior of “returning inexperienced subjects” and “non-returning inexperienced subjects” since 12 of our 50 inexperienced subjects never returned to the lab. Non-returning inexperienced subjects earned €1.92 (36%) less than returning inexperienced subjects. Although the difference is insignificant (MWU, \(p=0.14\), two-sided), we note that this is a large difference and is consistent with Guillén and Veszteg (2012). Importantly, there are a couple of instances in which their behavior also differs. First, in TG1, the non-returners trust even more often (92%) than those who return to the lab at least once (61%; Fisher’s exact test, \(p=0.07\), two-sided). Second, inexperienced subjects who never return are more risk averse (6.25 safe choices in RE) than those who do return (4.74) (MWU, \(p=0.02\), two-sided).Footnote 11

5 Recruitment bias?

To demonstrate that recruitment procedures may be biased when it comes to lab experience, we tracked the invitation procedure for two unrelated experiments at our lab. We document the share of experienced and inexperienced subjects (as defined above), plus subjects with an intermediate level of experience, who participated in these experiments. The first experiment was conducted in two recruitment waves; a total of 236 participants signed up for one of its 10 sessions. The second experiment comprised 12 sessions conducted in three recruitment waves; here, 294 participants signed up. Pooling the two experiments, the three waves had 140, 239, and 151 participants, respectively.

It turns out that only 1.4% (2 of 140) of participants in the first wave were inexperienced subjects, compared to 5.9% in the second wave and 13.2% in the third. The share of experienced subjects is, by contrast, roughly constant at 20.0, 17.6, and 18.5%, respectively. The rest are subjects with intermediate experience. A 3\(\times\)3 Fisher exact test suggests that these proportions differ significantly (\(p<0.01\)).
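
The underlying 3×3 table can be reconstructed from the reported shares and wave sizes; the counts below are therefore approximate. Since scipy's fisher_exact handles only 2×2 tables, the sketch uses the chi-squared test as a stand-in for the exact test.

```python
# Waves x experience contingency table, reconstructed from reported shares.
import numpy as np
from scipy.stats import chi2_contingency

#                inexperienced, intermediate, experienced
table = np.array([[ 2, 110, 28],    # wave 1 (n = 140)
                  [14, 183, 42],    # wave 2 (n = 239)
                  [20, 103, 28]])   # wave 3 (n = 151)
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.4f}")  # consistent with p < 0.01
```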

We note that the bias we report is small in magnitude. The share of inexperienced subjects may vary across recruitment waves, but their overall impact on the decision data in an experiment would not be substantial when their share is as low as it is in our group of participants.

Result 3

There is a moderate but significant recruitment bias in that disproportionately few inexperienced subjects sign up for sessions in the first recruitment waves.

We interpret Result 3 as showing that a recruitment bias is possible. Together with our results above, this implies that experimental research might be biased when experimenters do not control for lab experience.

Why can a bias occur even though the recruitment process is random (all subjects in the pool face an equal likelihood of receiving an invitation)? For a bias to occur, the composition of the subject pool must change over the course of the recruitment waves. This may concern the general subject pool or the pool available for a specific experiment (that is, those subjects who respond).

There are several channels through which the general subject pool composition may change. It obviously changes when a large number of new potential subjects sign up for the lab between waves. For our data, we can exclude this possibility. The composition will also change when the number of, say, experienced or intermediate subjects in the subject pool is limited. A bias will then occur because the pool of experienced or intermediate potential subjects becomes depleted over the course of the recruitment waves. Such depletion will occur when inexperienced potential subjects respond to lab invitations more slowly than experienced subjects.Footnote 12 (Since all our sessions were fully booked, there actually was competition for participation.)

Potential subjects may also respond differently across recruitment waves. For example, experienced and inexperienced potential subjects might have systematic differences in their lecture and exam schedules. However, for our study we are unaware of such systematic differences. More to the point, inexperienced subjects may learn to respond more swiftly after failing to register: in contrast to experienced subjects, they may initially be naively unaware that there is competition for slots.

To further explore the recruitment bias, one would need to control for the shares of experienced and inexperienced people in the subject pool at the time of each recruitment wave. With such detailed information about the composition of the subject pool, one could compare the share of invited inexperienced subjects who attend with the share of invited experienced subjects who attend. Such data may be more conclusive. It would also help to record the exact sequence in which participants sign up for sessions. We leave these explorations for future research.

6 Conclusion

We analyze whether lab experience correlates with subjects’ behavior. Departing from the usual random invitation procedures, we specifically target experienced subjects (those who had participated in at least ten other experiments) and inexperienced subjects (those who had never participated in an experiment but had voluntarily signed up for participation). We find that experienced subjects behave differently from inexperienced subjects in the trust game: experienced subjects trust less often and they are less trustworthy than inexperienced subjects. We also find that experienced subjects submit significantly fewer non-monotone decisions in a risk-elicitation task (Holt and Laury 2002), although the risk attitudes themselves do not differ.

The observed correlation of lab experience and behavior in the trust game would not be a problem if subject experience were evenly distributed across sessions and treatments. Our research, however, suggests that a bias in the composition of experimental sessions is possible when subjects’ experimental experience is not systematically controlled for during the recruitment process. Inexperienced subjects sign up relatively less often for early sessions and more often for sessions or treatments conducted during a later phase of the research.Footnote 13

We contribute to the literature in a number of ways. First, consistent with previous work by Matthey and Regner (2013), we document a correlation of lab experience and behavior in the trust game. Second, we extend the analysis to other games, including the traveler’s dilemma and the beauty contest, but find no differences in these cases. Taken together, this suggests that experiments on fairness and/or pro-social behavior may be more prone to this kind of bias than other experiments.

Our study is also in line with the work by Cleave et al. (2013), who compare the social and risk preferences of students who eventually participated in a laboratory experiment to those who did not. The authors find that participants who sent less in a trust game were more likely to participate in a laboratory experiment (the same applies to the amount returned, but this effect, although substantial, was not significant). Considering that our results in the trust game were particularly pronounced for inexperienced subjects who never returned to the lab, our results support Cleave et al. (2013).

Even though the recruitment bias and the behavioral differences between inexperienced and experienced subjects are not particularly strong, we believe the following measures deserve more debate among experimenters:

Recruitment procedures Experimenters may consider distributing their various treatments randomly or evenly across early and late recruitment waves. This effect can be strengthened when more than one treatment is conducted in a session (a downside of which is that the different sets of instructions cannot be read aloud) and when there are more, smaller sessions rather than fewer, larger sessions (with the downside that smaller sessions are less anonymous and not suitable for random matching). Smaller recruitment waves can avoid a bias when a time gap between the waves gives inexperienced subjects a chance to register.

Reporting recruitment procedures Experimenters may consider explicitly reporting lab experience, especially for experiments on fairness and trust. Reporting the magnitude and sequence of the recruitment waves may also be informative.

Classroom vs. lab Classroom experiments presumably involve a relatively large number of inexperienced subjects and should, therefore, probably not be compared to lab sessions (as suggested also by the results in Eckel and Grossman 2000).

Cap on lab experience Laboratories may consider imposing a cap on the maximum number of experiments participants may attend. We believe this is a useful measure, which we have also implemented, not least because with every additional experiment it becomes more likely that a participant will gain experience in a specific type of game (dilemma, fairness, coordination, etc.). Our data, however, suggest that this policy may miss the point when it comes to selection effects. These are likely to occur at rather early stages, namely, when subjects sign up for online recruitment systems in the first place (Cleave et al. 2013) and after they have completed their first experiment(s) (Guillén and Veszteg 2012). This is suggested by our data on non-returning inexperienced subjects and by the fact that, once subjects have gained a certain level of experience, behavior is no longer correlated with previous lab visits.