Introduction

Looking at the natural world, we see objects of various colors and shapes that are either stationary or moving in different directions. How the human visual system merges local inputs into a globally coherent perceptual scene is an important question in vision research. Numerous studies have investigated how local form features or local motion signals are combined to generate a coherent global motion or form percept (Allard & Arleo, 2021; Anstis & Kim, 2011; Cropper, 2001; Hoffman, 1980; Loffler et al., 2003; McKendrick et al., 2005). According to psychophysical studies, internal noise and sampling efficiency are two primary factors that influence the perception of global form and motion (Bogfjellmo et al., 2013; Falkenberg et al., 2002, 2014; Joshi et al., 2021; Simpson et al., 2003; Tibber et al., 2014). In the visual system, internal noise refers to the intrinsic variability that limits the detection of cues even in the absence of external noise. Sampling efficiency refers to the ability to combine individual elements into a global percept of the visual stimulus of interest (e.g., Dakin et al., 2005).

A method to quantify internal noise and sampling efficiency is the equivalent noise (EN) analysis (e.g., Baldwin et al., 2016; Dakin et al., 2005; Tibber et al., 2014). The EN analysis has its roots in engineering: to test the level of noise inherent in a system, one introduces a very small, imperceptible signal and gradually increases it. The level at which this signal becomes just perceptible is taken as equivalent to the internal noise of the system itself. The concept of equivalent noise analysis dates back to the work of Barlow in the 1950s on detection performance. Barlow (1957) argued that the ability of the visual system to detect near-threshold stimuli was influenced by factors such as the number of photons absorbed by the eye as well as “dark light,” that is, spontaneous neural activity. Later work established a tradition of using the “ideal observer” as a benchmark for estimating the efficiency of the visual system (Geisler, 1989; Geisler, 2003). The “ideal observer” is a model of the physiological limits of the visual system, including such factors as photon noise, chromatic aberrations, effects of the lens, and eye movements (Geisler, 2003), as well as spatial summation by retinal ganglion cells (Banks et al., 1991). When these “pre-neural” factors are accounted for (as the “ideal observer”), the difference between real human performance and that of the ideal observer must be the result of later neural processing. In addition, pixel noise can be used in conjunction with the ideal observer analysis (Geisler, 2003).

The ideal observer (or ideal discriminator) analysis has been applied to various visual processes, including detection (Barlow, 1957), blur (Watt & Morgan, 1984), and motion perception (e.g., Watamaniuk et al., 1989; Watamaniuk, 1993; Watamaniuk & Heinen, 1999; Watamaniuk et al., 2011). The efficiency of the human visual system is generally much lower than that of the ideal observer; however, in some cases (e.g., chromatic contrast sensitivity), efficiency is relatively high compared to the ideal observer (Geisler, 1989). Efficiency relative to the ideal observer is not the same as performance: if a task is intrinsically very difficult, the ideal observer also performs poorly, so a human observer may be quite efficient in comparison despite poor task performance.

In the case of form and motion perception, the estimate of internal noise depends on the ability of the observer to detect the local variance in orientation or motion direction of the single elements forming a visual pattern (Dakin et al., 2005; Joshi et al., 2021). To estimate sampling efficiency, the ability of the system to pool visual information is tested. In other words, for the direction integration of drifting dots, internal noise would affect the precision of estimating each dot’s direction, whereas sampling efficiency refers to the number of dots over which such estimates can be averaged (Dakin et al., 2005). Furthermore, the direction or orientation of the individual cues that form a visual pattern is drawn from a Gaussian distribution with a prescribed mean and standard deviation, so that each local cue is assigned an independent local direction or orientation scattered around the mean of the underlying distribution (Watamaniuk & Sekuler, 1992). Sampling efficiency is estimated by manipulating the variance of the motion directions or of the stimulus orientations (Dakin et al., 2005; Tibber et al., 2014), and relies on global processing. In this case, sampling efficiency is not a result of the observer’s ability to rule out noise; rather, it is the consequence of the best strategy for integrating the orientations/directions of a visual stimulus. It is important to note that the “best” strategy in human visual performance is often not efficient when compared to a model of the physiological limitations (Geisler, 1989, 2003; Watamaniuk, 1993). A linear amplifier model of the EN can be used to derive internal noise and sampling efficiency values from the participants' performance (Pelli & Farell, 1999).
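To make the pooling account concrete, the following Python sketch simulates a hypothetical observer who averages the perceived directions of `n_samples` dots, each perturbed by external noise (the stimulus standard deviation) and by independent internal noise, and compares the spread of the resulting global estimate with the linear amplifier prediction. That prediction has the same form as Eq. (1) in the Data analysis section, with the number of pooled samples playing the role of η. The function names, the independence of internal noise across dots, and the use of an unbounded (non-circular) Gaussian are simplifying assumptions for illustration, not the procedure used in this study.

```python
import numpy as np


def simulated_global_sd(sigma_int, sigma_ext, n_samples, n_trials=20000, seed=0):
    """Monte Carlo SD (deg) of a hypothetical observer's global direction estimate.

    On each trial the observer samples `n_samples` dots whose directions are
    drawn around a mean of 0 deg with SD `sigma_ext` (external noise), adds
    independent internal noise with SD `sigma_int` to each local estimate,
    and averages the noisy local estimates.
    """
    rng = np.random.default_rng(seed)
    local = rng.normal(0.0, sigma_ext, size=(n_trials, n_samples))
    perceived = local + rng.normal(0.0, sigma_int, size=(n_trials, n_samples))
    return perceived.mean(axis=1).std()


def lam_prediction(sigma_int, sigma_ext, n_samples):
    """Linear amplifier model: sqrt((sigma_int^2 + sigma_ext^2) / n_samples)."""
    return np.sqrt((sigma_int ** 2 + sigma_ext ** 2) / n_samples)


for sigma_ext in (0.0, 8.0, 32.0):
    sim = simulated_global_sd(3.0, sigma_ext, n_samples=8)
    pred = lam_prediction(3.0, sigma_ext, n_samples=8)
    print(f"sigma_ext={sigma_ext:5.1f}  simulated={sim:6.3f}  LAM={pred:6.3f}")
```

At low external noise the prediction is dominated by internal noise, whereas at high external noise it grows roughly in proportion to σext divided by the square root of the number of pooled samples.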

Internal noise estimates are thought to be relatively stable within an observer (Baker, 2013), but sampling efficiency varies between different tasks that rely on global form or global motion (Joshi et al., 2021). Two types of visual stimuli are useful for studying local versus global processing of motion/orientation and form-motion interactions: Glass patterns (GPs) (Glass, 1969) and Random Dot Kinematograms (RDKs) (Ghin et al., 2018; Joshi et al., 2016, 2020, 2021; O'Hare et al., 2021; Pavan et al., 2019). GPs can be either static or dynamic. Static GPs are composed of pairs of dots, called dipoles, and are used to estimate global form perception. The local orientations of the dipoles give rise to various global configurations, such as translational, radial, spiral, or circular. Dynamic GPs, in contrast, consist of a succession of unique static GPs in which each frame is independent and dipoles do not follow an exact trajectory across frames; only the global orientation is kept constant (Donato et al., 2021; Krekelberg et al., 2003, 2005; Nankoo et al., 2012; Pavan et al., 2021; Pavan, Bimson, et al., 2017a; Pavan, Ghin, et al., 2017b). Although dynamic GPs lack dipole-to-dipole correspondence across frames, they give the impression of motion along the dipoles’ orientation axis (Joshi et al., 2020; Ross et al., 2000). Some studies call the percept of motion generated by dynamic GPs implied motion (Joshi et al., 2020; Krekelberg et al., 2003, 2005). However, in psychophysical studies, the term implied motion usually refers to motion implied in still pictures or photographs (Friedman & Stevenson, 1975; Lorteije et al., 2006; Pavan et al., 2011; Yamamoto & Miura, 2012). To prevent misinterpretation, we refer to the motion elicited by dynamic GPs as non-directional motion (Donato et al., 2020, 2021; Pavan et al., 2021).
To estimate directional motion perception, as opposed to the non-directional motion evoked by form cues, RDKs can be used: they consist of an array of moving dots that, unlike GPs, are not arranged into dot pairs (dipoles) (Donato et al., 2020; Rajananda et al., 2018).

There are important distinctions between form and motion perception. Local detection in static GPs is thought to take place in early visual areas, such as V1 (Dakin, 1997), and global pooling is thought to take place in V4 (Wilson & Wilkinson, 1998). In global motion, local detection is thought to take place in V1, with integration of motion at later stages, such as V2, V3, and hMT+, for translational and complex motion (e.g., radial and expansion/contraction motion) (Beardsley & Vaina, 1998, 2001; Furlan & Smith, 2016; Morrone et al., 1995). However, motion and form perception are also interdependent, as in the case of dynamic GPs. Indeed, non-directional motion from dynamic GPs seems to engage both form and motion processes. Some studies have reported that local processing in dynamic GPs (i.e., of dipole orientation) takes place at early levels of the visual system (i.e., V1/V2) (Donato et al., 2020; Pavan et al., 2017a, b; Ross et al., 2000). There is also evidence that dynamic GPs activate higher visual areas, such as hMT+ (Pavan et al., 2017a, b). This is important because it suggests that the visual system processes dynamic GPs in a similar way to the directional motion evoked by RDKs (Krekelberg et al., 2003, 2005), which is probably responsible for the illusory directional motion that observers report when they look at a dynamic GP (Joshi et al., 2021; Krekelberg et al., 2005).

Form and motion processing can be explored through the EN paradigm (Joshi et al., 2021). A simplified version of the EN paradigm uses only two sampling points, one at zero stimulus variance and one at high stimulus variance. This simplified version has been used with children (Falkenberg et al., 2014; Manning et al., 2014), children with reading difficulties (Manning et al., 2022), children with autism (Manning et al., 2015, 2017), and clinical populations such as individuals with amblyopia (Joshi et al., 2016), migraine (O'Hare et al., 2021; Tibber et al., 2014), and schizophrenia (Tibber et al., 2015). The simplified version benefits these populations because it dramatically reduces the number of trials required of the participant. However, simplified EN paradigms using only high and low noise levels neglect intermediate points and therefore cannot reveal the full shape of the noise-dependence function. In the current study, we investigated whether the simplified and the multisampling EN paradigms produce comparable outcomes for static translational GPs, dynamic translational GPs, and RDKs. In the multiple-point EN paradigm, we used staircases at eight external noise levels, each tracking the minimum discriminable angle at that noise level by manipulating the mean orientation/direction. In the simplified method, we used two staircases as in previous work (e.g., Tibber et al., 2014): one that adaptively changes the mean orientation/direction to track the minimum discriminable offset in the absence of external noise, and another that manipulates the standard deviation to track the maximum tolerable noise level at a fixed mean orientation/direction. If the widely used simplified procedure is valid, the two methods should yield comparable estimates for the form, non-directional motion, and directional motion EN tasks.

Methods

Participants

Two of the authors (RD and SKY) and 11 naïve observers (seven females, mean age = 23.5 ± 1.9 years) took part in the experiment. All participants reported normal or corrected-to-normal visual acuity. The sample size was estimated a priori using G*Power (v3.1; Faul et al., 2009), based on a repeated-measures ANOVA (within-subjects factors), to detect possible differences between procedures and between stimulus types, as well as the interaction between these main effects. Correlation was factored into the effect size using the “as in SPSS” option for effect size specification. Based on Joshi et al. (2021), assuming a large effect size f of .4 (Cohen, 2013), a target power of 95% (at an alpha level of .05) with one group of participants, and sphericity, the power analysis suggested a sample of ten participants. This calculation was for 27 experimental levels in total: eight noise levels for the 8-point procedure plus one data point (the high-noise level) for the 2-point procedure (nine in total) × three stimulus patterns (RDK, static GP, and dynamic GP). We slightly extended the suggested sample size by including three additional participants, achieving a power of 99%. The final sample of 13 participants was also compatible with previous reports of EN analysis (Ghin et al., 2018). The study was conducted in accordance with the guidelines of the World Medical Association (2013) and approved by the Human Research Ethics Committee of Bilkent University (Ethics Protocol No.: 2021.10.04.03). Prior to their participation, all participants signed written informed consent. The naïve participants received monetary compensation (50 TL) for completing the experiment.

Apparatus

Stimuli were generated using MATLAB with the Psychophysics Toolbox (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997) and displayed on a 24.5-in. Dell Alienware AW2521HFL IPS monitor with a refresh rate of 60 Hz. The screen resolution was 1,280 × 1,024 pixels. Each pixel subtended 2.8 arcmin. The screen luminance was measured using a SpectroCAL photometer (Cambridge Research Systems, Rochester, Kent, UK). The stimuli were presented on a gray background (42.78 cd/m²). Observers sat in a dark room at a viewing distance of 57 cm from the screen. Viewing was binocular. Head movements were constrained by a chinrest.
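The geometric and timing values above determine how stimulus parameters specified in degrees and seconds map onto pixels and screen refreshes. The following Python sketch shows these conversions using the reported 2.8 arcmin per pixel, 57-cm viewing distance, and 60-Hz refresh rate, applied to values from the Stimuli section (0.2° dipole separation, 10 deg/s drift speed, five-refresh dot lifetime). The helper names are hypothetical, and the conversions are standard visual-angle arithmetic rather than code taken from the experiment.

```python
import math

ARCMIN_PER_PIXEL = 2.8   # pixel size reported in the Apparatus section
VIEW_DIST_CM = 57.0      # viewing distance
REFRESH_HZ = 60.0        # monitor refresh rate


def visual_angle_deg(size_cm, dist_cm=VIEW_DIST_CM):
    """Exact visual angle (deg) subtended by an object of `size_cm`."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * dist_cm)))


def deg_to_px(deg):
    """Convert degrees of visual angle to (fractional) pixels."""
    return deg * 60.0 / ARCMIN_PER_PIXEL


def deg_per_frame(speed_deg_per_s):
    """Displacement per screen refresh for a given drift speed."""
    return speed_deg_per_s / REFRESH_HZ


def frames_to_ms(n_frames):
    """Duration of `n_frames` screen refreshes in milliseconds."""
    return 1000.0 * n_frames / REFRESH_HZ


print(visual_angle_deg(1.0))   # ~1.005 deg: why 57 cm is a convenient distance
print(deg_to_px(0.2))          # 0.2-deg dipole separation in pixels
print(deg_per_frame(10.0))     # RDK displacement per refresh at 10 deg/s
print(frames_to_ms(5))         # lifetime of five refreshes in ms
```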

Stimuli

Stimuli were RDKs and static and dynamic translational GPs (see Fig. 1a). All three stimulus patterns were composed of 500 white dots (diameter: 0.083°, luminance: 247.80 cd/m², Weber contrast: 4.79). The ensemble of dots was presented at the center of the screen within a circular aperture with a diameter of 10.0° (density: 6.37 dots/deg²). A black fixation point (diameter: 0.3°, luminance: 0.31 cd/m²) and a black vertical line (length: 2.3°, width: 0.047°) were presented at the center of the screen. The vertical line served as a reference for judging whether the overall pattern drifted or was tilted clockwise or counter-clockwise from vertical. Stimulus duration was 500 ms. In the RDKs, the dots drifted at a speed of 10.0 deg/s. Each dot followed a trajectory for a limited lifetime of 83 ms (i.e., five screen refreshes), after which it was randomly relocated within the circular window to prevent covert attentional tracking of the motion direction. The static GPs were composed of 250 pairs of dots (i.e., dipoles). The dipoles were formed by a linear geometrical transformation, with a dipole distance of 0.2°. The dynamic GPs were generated by rapid sequential presentation of a set of independent static GPs (temporal frequency: 10 Hz). In the dynamic GPs, the spatial arrangement of the dipoles was renewed every six frames, while the overall orientation was kept constant, evoking the perception of non-directional apparent motion along this orientation (Donato et al., 2021; Nankoo et al., 2012; Pavan et al., 2021; Ross et al., 2000). Directional or orientational noise was added to these stimuli as described below.
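As an illustration of how a translational GP frame with orientation noise could be constructed, the Python sketch below places 250 dipoles (500 dots) with a 0.2° separation inside a 10°-diameter aperture and draws each dipole's orientation from a Gaussian around a global orientation; a dynamic GP is then formed by generating an independent frame every six refreshes. Uniform placement of dipole centres, shrinking the placement disc by half a dipole so that both dots fall inside the aperture, and the angle convention (90° = vertical, larger values = counter-clockwise, as in Fig. 1) are assumptions; the function names are hypothetical and rendering is omitted.

```python
import numpy as np


def translational_gp_frame(rng, n_dipoles=250, aperture_radius=5.0,
                           dipole_sep=0.2, mean_ori_deg=90.0, sigma_deg=0.0):
    """Return a (2 * n_dipoles, 2) array of dot positions in degrees.

    Rows i and i + n_dipoles are the two dots of dipole i.
    Orientations: 90 deg = vertical, larger values = counter-clockwise.
    """
    # Dipole centres uniform over a disc shrunk by half a dipole, so that
    # both dots of every dipole stay inside the circular aperture.
    r_max = aperture_radius - dipole_sep / 2.0
    r = r_max * np.sqrt(rng.random(n_dipoles))
    phi = rng.uniform(0.0, 2.0 * np.pi, n_dipoles)
    centres = np.column_stack((r * np.cos(phi), r * np.sin(phi)))
    # Local orientations drawn from a Gaussian around the global orientation.
    ori = np.deg2rad(rng.normal(mean_ori_deg, sigma_deg, n_dipoles))
    half = 0.5 * dipole_sep * np.column_stack((np.cos(ori), np.sin(ori)))
    return np.vstack((centres - half, centres + half))


def dynamic_gp_sequence(rng, duration_s=0.5, refresh_hz=60, frames_per_gp=6,
                        **gp_kwargs):
    """Per-refresh frames of a dynamic GP: an independent static GP is
    generated every `frames_per_gp` refreshes (60 / 6 = 10 Hz)."""
    n_gps = int(round(duration_s * refresh_hz / frames_per_gp))
    gps = [translational_gp_frame(rng, **gp_kwargs) for _ in range(n_gps)]
    return [gp for gp in gps for _ in range(frames_per_gp)]


rng = np.random.default_rng(1)
static_gp = translational_gp_frame(rng, mean_ori_deg=120.0, sigma_deg=16.0)
frames = dynamic_gp_sequence(rng, mean_ori_deg=120.0, sigma_deg=16.0)
print(static_gp.shape, len(frames))  # (500, 2) 30
```

A static GP, in this sketch, would simply show the same frame on every refresh of the 500-ms presentation.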

Fig. 1

Stimulus patterns and added noise in the equivalent noise paradigm. a Three representative frames for each stimulus type. For illustrative purposes, one dot was enlarged in the random dot kinematogram to indicate the upper leftward drift. In the static Glass pattern, one dipole was enlarged to indicate the counter-clockwise orientation and the position of the dipoles was constant throughout a given trial. In the dynamic Glass pattern, one dipole in each frame was enlarged to indicate the counter-clockwise orientation, and the position of the dipoles was changed every six frames throughout a given trial. Please note, in the actual experiment no dot was enlarged. For demonstrative purposes, for all the stimuli the direction/orientation was set to 30° counter-clockwise with respect to the vertical reference, and the noise level was set to 0°. b Angle histograms illustrate the distribution of the 8-point procedure's motion directions/orientations for each noise level. The mean of the distribution (i.e., the global direction/orientation) was fixed at 120° (i.e., 30° counter-clockwise from the vertical) for easier comparison between the standard deviation values.

Equivalent noise paradigm

Visual sensitivity to global directional (real) motion, form, and non-directional (apparent) motion was measured using the EN approach. Global motion and form perception have typically been studied using coherence tasks, in which random elements are introduced into the stimulus and the procedure aims to determine the proportion of randomly moving/oriented elements that can replace the coherent ones without disrupting reliable discrimination (Britten et al., 1992; Grossman & Blake, 1999; Snowden & Kavanagh, 2006). Coherence tasks thus draw a clear distinction between signal and noise elements. In contrast, in the EN paradigm all individual elements are defined as signal, as they all contribute to the global motion/form. This is accomplished by drawing the direction/orientation of each dot/dipole from a Gaussian distribution around a given mean value, and variability in direction/orientation is introduced by varying the standard deviation of this distribution (Watamaniuk, 1993; Watamaniuk et al., 1989).
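The difference between the two ways of introducing variability can be sketched in Python: in a coherence stimulus, a proportion of dots shares the signal direction while the rest move in uniformly random directions, whereas in an EN stimulus every dot is a signal dot whose direction is drawn from a Gaussian around the mean. The function names and the use of a wrapped (modulo 360°) normal distribution are illustrative assumptions rather than the stimulus code used here.

```python
import numpy as np


def coherence_directions(rng, n_dots, coherence, signal_dir_deg):
    """Coherence task: round(coherence * n_dots) signal dots, rest uniform noise."""
    n_signal = int(round(coherence * n_dots))
    signal = np.full(n_signal, float(signal_dir_deg))
    noise = rng.uniform(0.0, 360.0, n_dots - n_signal)
    return np.concatenate((signal, noise))


def equivalent_noise_directions(rng, n_dots, mean_dir_deg, sigma_deg):
    """EN task: every dot is a signal dot drawn from N(mean, sigma), wrapped."""
    return rng.normal(mean_dir_deg, sigma_deg, n_dots) % 360.0


def circular_mean_deg(directions_deg):
    """Mean direction of a set of angles, computed on the unit circle."""
    z = np.exp(1j * np.deg2rad(directions_deg)).mean()
    return np.rad2deg(np.angle(z)) % 360.0


rng = np.random.default_rng(2)
coh = coherence_directions(rng, 500, coherence=0.3, signal_dir_deg=120.0)
en = equivalent_noise_directions(rng, 500, mean_dir_deg=120.0, sigma_deg=24.0)
print(circular_mean_deg(coh), circular_mean_deg(en))
```

Both stimuli have the same global direction on average, but only in the EN version does every element carry information about it.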

In the current study, we implemented two variants of the EN paradigm to estimate discrimination thresholds and, accordingly, compute internal noise and sampling efficiency parameters for the three types of stimulus patterns. One of these methods sampled multiple points over a range of external noise levels (Joshi et al., 2021) and the other one simplified the procedure to a high- and a zero-noise level (Ghin et al., 2018; O'Hare et al., 2021; Pavan et al., 2019; Tibber et al., 2014).

In the multiple sampling procedure (i.e., the 8-point method), each individual element within a pattern was assigned an independent local direction/orientation drawn from a circular Gaussian distribution centered on the mean direction/orientation. The local directions or orientations were perturbed around the mean by varying the standard deviation of the underlying distribution. Specifically, the 8-point method employed staircases that manipulated the mean direction/orientation to track the minimum discriminable angle at a given external noise level (σ = 0, 2, 4, 8, 16, 24, 32, or 40°; see Fig. 1b). The simplified 2-point method, on the other hand, employed two independent staircases: one adaptively changed the mean orientation/direction to track the minimum discriminable angle in the absence of external noise (σ = 0°), and the other manipulated the standard deviation to track the maximum tolerable noise level at a fixed mean orientation/direction.

Procedure

Figure 2 shows the trial sequence used in the experiment. Each trial started with a fixation point and a vertical line crossing the fixation marker, presented for 1.0 s. The fixation screen was followed by the presentation of an RDK or a GP (either static or dynamic) for 0.5 s. The response screen was identical to the fixation screen and remained until a response was made, followed by an intertrial interval of 1.0 s. The observers performed a two-alternative forced-choice (2AFC) global motion direction or orientation discrimination task, indicating whether the moving dots/oriented dipoles drifted/were tilted clockwise or counter-clockwise from vertical. They were instructed to press the left arrow key if the overall pattern was perceived to drift or be tilted counter-clockwise with respect to the vertical reference, or the right arrow key if it was perceived to drift or be tilted clockwise from vertical.

Fig. 2

Schematic representation of a trial sequence. For demonstrative purposes, the stimulus is a static Glass pattern with zero noise (σ = 0°) and tilted counter-clockwise from vertical by 30° (μ = 120°). RDK Random Dot Kinematogram, SGP static Glass pattern, DGP dynamic Glass pattern, ITI intertrial interval

The observers completed four experimental sessions that were run on four non-consecutive days. Three sessions were allocated to the 8-point procedure, each session comprising one type of the stimulus patterns (i.e., RDKs, static GPs, or dynamic GPs). The remaining session was allocated to the 2-point procedure, including all types of the stimulus patterns. The order of sessions was randomized to avoid sequence effects.

The eight-point procedure used a staircase with a 1-up/3-down rule tracking the 79.4% correct discrimination threshold (Levitt, 1971; Wetherill & Levitt, 1965). In each block, corresponding to a specific noise level, two interleaved and randomized staircases were administered, one starting from 30° clockwise from vertical and the other from 30° counter-clockwise from vertical. Within each block, the external noise was kept constant at one of the eight levels, and the mean direction/orientation was adaptively changed to track the minimum angular offset from vertical that could be reliably discriminated. The step size was initially 15° and was reduced at successive reversals to one-half, one-quarter, and ultimately one-eighth of its initial value (i.e., 15.0, 7.5, 3.75, 1.875°). After the fourth reversal, the step size was fixed at 1.875°. This relatively large step size was chosen because it was of similar magnitude to that used by Joshi et al. (2021). Additionally, although previous work has shown thresholds of less than 2° in experienced psychophysical observers (Watamaniuk & Sekuler, 1992), the observers in the current study were largely inexperienced in psychophysics, so the larger step size was used to better match their ability. Individual thresholds for each stimulus pattern can be seen in the Online Supplementary Material (OSM; Table S1). Each staircase terminated after either 100 trials or 20 reversals. In the 8-point procedure, in which there were two interleaved staircases for each stimulus pattern, the threshold was calculated by pooling the reversals of both staircases, ordering them by trial, and averaging the last six.
This choice was motivated by the fact that some staircases (which were shorter than those of the 2-point procedure) exhibited only a limited number of reversals, which occurred early in the staircase and reduced the step size almost immediately. In these cases, the computation described above automatically discards these early reversals, avoiding possible systematic biases in the threshold estimate. It should be noted that the staircases of one participant (S10) proved to be unusable, with very high thresholds and uncertainties several orders of magnitude greater than those of the other participants in most conditions. Since this would have excluded the participant's contribution from the calculation of the statistics (infinite uncertainty corresponds to zero weight), the participant was excluded from the analyses (see Fig. S1 in the OSM). For the 2-point procedure, the threshold was calculated by averaging the last six reversals of the staircase. Each session in this procedure was preceded by a short practice block of 16 trials (μ = 30°, σ = 0°) to familiarize the participants with the relevant stimulus pattern and discrimination task. Each block within a session started with a 10-trial practice at the same noise level as in the relevant block. In the practice trials, feedback was provided for 0.5 s after the response by turning the fixation point green for correct responses and red for incorrect responses.
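The adaptive procedure of the 8-point method can be sketched in Python: two randomly interleaved 1-up/3-down staircases act on the angular offset from vertical, the step size is reduced from 15° to 7.5°, 3.75°, and 1.875° at successive reversals, each staircase stops after 100 trials or 20 reversals, and the threshold is the mean of the last six reversals pooled across both staircases in trial order. The simulated observer (a cumulative Gaussian with a hypothetical spread `sigma_obs_deg`), the exact reversal at which each step change occurs, and the restriction to offset magnitude (both staircases start at 30°) are assumptions for illustration, not the experiment code.

```python
import math
import random


def p_correct(offset_deg, sigma_obs_deg):
    """Hypothetical observer: P(correct CW/CCW judgement) = Phi(offset / sigma)."""
    return 0.5 * (1.0 + math.erf(offset_deg / (sigma_obs_deg * math.sqrt(2.0))))


class OffsetStaircase:
    """1-up/3-down staircase on the (unsigned) angular offset from vertical."""

    def __init__(self, start_deg=30.0, max_trials=100, max_reversals=20):
        self.offset = start_deg
        self.max_trials = max_trials
        self.max_reversals = max_reversals
        self.n_trials = 0
        self.n_correct_in_row = 0
        self.last_move = 0            # -1 = harder (smaller offset), +1 = easier
        self.reversals = []           # (global trial index, offset at reversal)

    @property
    def done(self):
        return (self.n_trials >= self.max_trials
                or len(self.reversals) >= self.max_reversals)

    def step_size(self):
        # 15 deg, then halved at each of the first three reversals (assumed timing).
        return 15.0 / 2 ** min(len(self.reversals), 3)

    def update(self, correct, global_trial):
        self.n_trials += 1
        move = 0
        if correct:
            self.n_correct_in_row += 1
            if self.n_correct_in_row == 3:          # three correct -> harder
                move, self.n_correct_in_row = -1, 0
        else:                                        # one error -> easier
            move, self.n_correct_in_row = +1, 0
        if move == 0:
            return
        if self.last_move != 0 and move != self.last_move:
            self.reversals.append((global_trial, self.offset))
        self.last_move = move
        self.offset = max(self.offset + move * self.step_size(), 0.0)


def run_block(sigma_obs_deg, seed=0):
    """Run two randomly interleaved staircases; return the pooled threshold."""
    rng = random.Random(seed)
    staircases = [OffsetStaircase(), OffsetStaircase()]
    trial = 0
    while not all(s.done for s in staircases):
        s = rng.choice([s for s in staircases if not s.done])
        s.update(rng.random() < p_correct(s.offset, sigma_obs_deg), trial)
        trial += 1
    # Pool reversals of both staircases in trial order and keep the last six.
    pooled = sorted(staircases[0].reversals + staircases[1].reversals)
    last_six = [offset for _, offset in pooled[-6:]]
    return sum(last_six) / len(last_six) if last_six else float("nan")


print(run_block(sigma_obs_deg=5.0, seed=1))
```

Because only the last six pooled reversals enter the average, reversals that occur early, while the step is still large, do not contribute to the estimate.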

In the two-point procedure, for the first point (i.e., zero noise level) we used the output of the first point from the eight-point procedure. For the second point, which included noise, a staircase with a 3-up/1-down rule was used to vary the standard deviation of the Gaussian distribution, while the mean direction/orientation remained at 45° clockwise or counter-clockwise across trials. The initial noise level was 0° and the noise level was increased/decreased with a fixed step size of 5°. The staircase terminated after either 200 trials or 20 reversals, and the threshold was calculated by averaging all the reversals. At the beginning of each block, there were 16 practice trials (μ = 45°) with a noise level randomly chosen between 8° and 64°, in steps of 8°.
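The noise staircase of the 2-point procedure can be sketched in Python under the same caveats: after three consecutive correct responses the standard deviation increases by 5° (harder), after one error it decreases by 5° (easier, never below 0°), the mean offset stays at 45°, and the staircase stops after 200 trials or 20 reversals, with the threshold taken as the mean of all reversals. The simulated observer, whose trial-wise spread follows the linear amplifier model with hypothetical values of `sigma_int_deg` and `eta`, and the function names are assumptions; the sketch also ignores the circular wrapping of very broad direction distributions.

```python
import math
import random


def p_correct(offset_deg, sigma_ext_deg, sigma_int_deg, eta):
    """Hypothetical observer whose spread follows the linear amplifier model."""
    sigma_obs = math.sqrt((sigma_int_deg ** 2 + sigma_ext_deg ** 2) / eta)
    return 0.5 * (1.0 + math.erf(offset_deg / (sigma_obs * math.sqrt(2.0))))


def noise_staircase(sigma_int_deg, eta, offset_deg=45.0, step_deg=5.0,
                    max_trials=200, max_reversals=20, seed=0):
    """Track the maximum tolerable external noise at a fixed mean offset."""
    rng = random.Random(seed)
    sigma_ext, run, last_move, reversals = 0.0, 0, 0, []
    for _ in range(max_trials):
        correct = rng.random() < p_correct(offset_deg, sigma_ext, sigma_int_deg, eta)
        move = 0
        if correct:
            run += 1
            if run == 3:              # three correct -> more noise (harder)
                move, run = +1, 0
        else:                         # one error -> less noise (easier)
            move, run = -1, 0
        if move == 0:
            continue
        if last_move != 0 and move != last_move:
            reversals.append(sigma_ext)
            if len(reversals) >= max_reversals:
                break
        last_move = move
        sigma_ext = max(sigma_ext + move * step_deg, 0.0)
    # Threshold = mean of all reversals.
    return sum(reversals) / len(reversals) if reversals else float("nan")


print(noise_staircase(sigma_int_deg=3.0, eta=2.0, seed=3))
```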

For both 8-point and 2-point procedures, 10 catch trials without external noise were randomly inserted in each block to make sure that the participants performed the task according to the instructions. The average accuracy in catch trials was 96% for both procedures. In addition, the mean accuracy scores were above 89% for each noise level within each stimulus type. The observers received no feedback on these trials. The order of blocks in each session was randomized and the direction/orientation (left or right) was randomly selected for each trial.

Data analysis

For each observer, discrimination thresholds estimated with the 2-point procedure were used to compute the internal noise (σint) and the sampling efficiency (η) estimates according to Ghin et al. (2018), where the EN parameterization:

$$ \sigma_{obs}=\sqrt{\frac{\sigma_{\mathrm{int}}^2+\sigma_{\mathrm{ext}}^2}{\eta}} $$
(1)

is constrained by two threshold values: a zero-noise (at fixed σext = 0) data point, which represents the minimum directional offset from vertical that can be discriminated with no external noise, and a high-noise (at fixed σobs) data point, which represents the maximum level of noise (i.e., the directional standard deviation of the normal distribution of directions) that can be tolerated for a large directional offset. The rationale behind such a choice is that two points with orthogonal uncertainties (fixed external noise and varying observed noise for the former, the opposite for the latter) are highly effective at constraining the two EN parameters, each of which dominates in the regime spanned by one of the data points. In fact, at zero noise, Eq. (1) becomes:

$$ {\sigma}_{obs,0}=\frac{\sigma_{\mathrm{int}}}{\sqrt{\eta }} $$
(2)

while at high noise, where σext,H ≫ σint, it becomes:

$$ \sigma_{obs,H}\simeq\frac{\sigma_{\mathrm{ext},H}}{\sqrt{\eta}} $$
(3)

and the system formed by Eqs. (2) and (3) can be solved for the EN parameters, giving:

$$ \eta=\frac{\sigma_{\mathrm{ext},H}^2}{\sigma_{obs,H}^2}\quad\mathrm{and}\quad\sigma_{\mathrm{int}}=\sigma_{obs,0}\sqrt{\eta} $$
(4)

The uncertainties associated with the retrieved parameters can be propagated from the measured uncertainties on the data points, δσobs,0 for the zero-noise point and δσext,H for the high-noise one. These propagations read:

$$ \delta\eta=\frac{2\sigma_{\mathrm{ext},H}}{\sigma_{obs,H}^2}\,\delta\sigma_{\mathrm{ext},H}\quad\mathrm{and}\quad\delta\sigma_{\mathrm{int}}=\sqrt{\eta\left(\delta\sigma_{obs,0}\right)^2+\frac{\sigma_{obs,0}^2}{4\eta}\left(\delta\eta\right)^2} $$
(5)
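A minimal Python sketch of the 2-point computation follows: it solves Eqs. (2) and (3) for η and σint as in Eq. (4) and propagates the uncertainties to first order from Eq. (4) itself, using ∂η/∂σext,H = 2σext,H/σobs,H², ∂σint/∂σobs,0 = √η, and ∂σint/∂η = σobs,0/(2√η). Treating the fixed 45° mean offset of the high-noise staircase as σobs,H is an assumption of this sketch, and the input values are hypothetical, chosen so that the arithmetic is easy to check.

```python
import math


def two_point_en(sigma_obs_0, d_sigma_obs_0, sigma_ext_h, d_sigma_ext_h, sigma_obs_h):
    """EN parameters and first-order uncertainties from the 2-point procedure.

    sigma_obs_0 : zero-noise threshold (deg), with uncertainty d_sigma_obs_0
    sigma_ext_h : maximum tolerable noise (deg), with uncertainty d_sigma_ext_h
    sigma_obs_h : observed threshold at the high-noise point (deg); assumed
                  here to equal the fixed mean offset of the noise staircase
    """
    eta = sigma_ext_h ** 2 / sigma_obs_h ** 2               # Eq. (4), left
    sigma_int = sigma_obs_0 * math.sqrt(eta)                # Eq. (4), right
    d_eta = 2.0 * sigma_ext_h / sigma_obs_h ** 2 * d_sigma_ext_h
    d_sigma_int = math.sqrt(eta * d_sigma_obs_0 ** 2
                            + sigma_obs_0 ** 2 / (4.0 * eta) * d_eta ** 2)
    return eta, d_eta, sigma_int, d_sigma_int


# Hypothetical inputs: 3 +/- 0.3 deg at zero noise; 60 +/- 5 deg tolerable noise at 45 deg.
eta, d_eta, s_int, d_s_int = two_point_en(3.0, 0.3, 60.0, 5.0, 45.0)
print(f"eta = {eta:.3f} +/- {d_eta:.3f}, sigma_int = {s_int:.3f} +/- {d_s_int:.3f} deg")
# eta = 1.778 +/- 0.296, sigma_int = 4.000 +/- 0.521 deg
```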

For the 8-point procedure, the EN parameters were instead obtained by fitting the parameterization function to the data points. Given the function's power-law behavior, the fit was performed on a log-log version of it:

$$ y=\ln\left(\sqrt{\frac{e^{2x}+\sigma_{\mathrm{int}}^2}{\eta}}\right) $$
(6)

which is equivalent to Eq. (1) if x = ln σext and y = ln σobs. This choice generally improved the quality of the fits, from which the EN parameters and their associated uncertainties were retrieved. Figure 3 shows an exemplary comparison between a 2-point and an 8-point procedure for a single participant.
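The log-log fit can be sketched in Python with SciPy's `curve_fit`. Here the model returns y = ln σobs computed directly from σext, which is algebraically the same as Eq. (6) but lets the σext = 0 point enter the fit (ln 0 is undefined). The noise-free synthetic thresholds, generated from hypothetical parameters (σint = 3°, η = 4) at the eight noise levels of the 8-point procedure, the starting values, and the positivity bounds are assumptions for illustration; this is not necessarily how the reported fits were computed.

```python
import numpy as np
from scipy.optimize import curve_fit


def log_en_model(sigma_ext, sigma_int, eta):
    """ln(sigma_obs) predicted by the EN model, as a function of sigma_ext."""
    return 0.5 * np.log((sigma_ext ** 2 + sigma_int ** 2) / eta)


noise_levels = np.array([0.0, 2.0, 4.0, 8.0, 16.0, 24.0, 32.0, 40.0])
true_sigma_int, true_eta = 3.0, 4.0          # hypothetical observer
thresholds = np.sqrt((noise_levels ** 2 + true_sigma_int ** 2) / true_eta)

# Residuals are computed in log space, i.e. on y = ln(sigma_obs).
params, cov = curve_fit(log_en_model, noise_levels, np.log(thresholds),
                        p0=[1.0, 1.0], bounds=([1e-6, 1e-6], [np.inf, np.inf]))
sigma_int_hat, eta_hat = params
sigma_int_err, eta_err = np.sqrt(np.diag(cov))   # 1-SD uncertainties from the fit
print(f"sigma_int = {sigma_int_hat:.3f}, eta = {eta_hat:.3f}")
```

Fitting in log space weights relative rather than absolute deviations, so the small zero-noise thresholds are not swamped by the large high-noise ones.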

Fig. 3

An exemplary representation of a 2-point procedure (red dots) and an 8-point procedure (blue dots), in terms of the logarithmic variables x = ln σext and y = ln σobs. The three panels represent a single participant's different stimuli (a: dynamic GPs; b: RDKs; c: static GPs). The single point to the left of the axis break represents the zero-noise condition, which is common to both procedures. The 2-point procedure data points display their associated measurement uncertainties, from which the uncertainties on the EN parameters are propagated. For the 8-point procedure, the log-log best fit with the associated 95% confidence interval is displayed alongside the data points. EN equivalent noise, RDK Random Dot Kinematogram, SGP static Glass pattern, DGP dynamic Glass pattern

We used a different method of analysis from that of Joshi et al. (2021). We compared the effects of procedure (2-point and 8-point method), stimulus type (static and dynamic Glass patterns, RDKs), and their interaction on the internal noise and sampling efficiency estimates by fitting generalized linear models (GLMs) (Fox, 2003) to individual data, rather than group data. Specifically, the EN parameters of each observer were analyzed using a GLM with the 'lme4' package (Bates et al., 2015). The analyses were performed using R (v4.2.1) (R Core Team, 2022). Input data to the model were weighted by the reciprocal of their standard deviation (1/σ). For internal noise (σint) and sampling efficiency (η), a Gamma distribution with an identity link function was used. We chose a Gamma distribution because the data were well approximated by it and almost all the internal noise (σint) values fell within the Gamma quantiles, allowing us to deal with outliers without removing or transforming them (Zuur et al., 2010). We created five different models that included only the main effect of stimulus (model 1), only the main effect of procedure (model 2), both main effects (model 3), the interaction term only (model 4), and the main effects plus the interaction term (model 5). The best fitting model was selected using AIC and AICc (i.e., the AIC with a correction for small sample sizes) as estimators of prediction error. For internal noise, the best fitting model included only the effect of procedure, with no effect of stimulus and no interaction. Therefore, the model output for internal noise contains only estimates for the two procedures (see Fig. 4a). For sampling efficiency (η), the best fitting model included the main effects of stimulus and procedure and their interaction (see Fig. 4b).
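Model selection by AIC and AICc can be illustrated with a short Python sketch, using AIC = 2k − 2 ln L and AICc = AIC + 2k(k + 1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations; the model with the lowest value is preferred. The log-likelihoods and parameter counts below are hypothetical placeholders, n = 72 is assumed (12 observers × 2 procedures × 3 stimulus types), and the analyses in this study were run in R, not Python; the actual model selection results are in Table S2 of the OSM.

```python
def aic(log_lik, k):
    """Akaike information criterion."""
    return 2.0 * k - 2.0 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction; requires n > k + 1."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical (log-likelihood, number of parameters) for two candidate models.
candidates = {
    "model 2 (procedure)": (-100.0, 3),
    "model 5 (stimulus * procedure)": (-99.5, 7),
}
n_obs = 72  # assumed: 12 observers x 2 procedures x 3 stimulus types

scores = {name: (aic(ll, k), aicc(ll, k, n_obs)) for name, (ll, k) in candidates.items()}
best_by_aicc = min(scores, key=lambda name: scores[name][1])
for name, (a, ac) in scores.items():
    print(f"{name}: AIC = {a:.2f}, AICc = {ac:.2f}")
print("lowest AICc:", best_by_aicc)
```

The correction term grows quickly as k approaches n, which is why AICc penalizes the richer interaction model more than AIC does in small samples.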
Outliers were identified using the median absolute deviation with a cut-off of 3 (Leys et al., 2013). The mean values and 95% confidence intervals correspond to the output of the GLMs. Predictions and partial residuals of the best fitting GLMs for internal noise and sampling efficiency are reported in the OSM (Fig. S2).
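Outlier flagging with the median absolute deviation (MAD) can be sketched in Python, assuming the usual definition MAD = 1.4826 × median(|x − median(x)|) and flagging values more than 3 MADs from the median; as in the analyses above, flagged values are only identified, not removed. The data are hypothetical.

```python
import statistics


def mad_outliers(values, cutoff=3.0, consistency=1.4826):
    """Return True for values more than `cutoff` MADs away from the median."""
    med = statistics.median(values)
    mad = consistency * statistics.median(abs(v - med) for v in values)
    return [abs(v - med) > cutoff * mad for v in values]


print(mad_outliers([2.1, 3.4, 2.8, 4.0, 3.1, 25.0]))
# [False, False, False, False, False, True]
```

Because both the centre and the spread are medians, a single extreme value such as 25.0 barely affects the cut-off it is compared against.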

Fig. 4

Results of the equivalent noise analysis (n = 12). The mean values and error bars correspond to the output of the generalized linear models, not the raw data itself. a For internal noise estimates, the best fitting model included only the effect of procedure, so the model output varies only across the procedures used (in degrees). b For sampling efficiency estimates, the best fitting model included stimulus type, procedure, and the interaction term; therefore, the model output contains estimates varying with both stimulus type and procedure. Error bars correspond to 95% confidence intervals

Results

Figure 4 shows the results of the Equivalent Noise analysis for both procedures. For internal noise (σint), a Shapiro-Wilk test showed that residuals were not normally distributed (W = .642, p < .0001), with a high positive skewness of 4.0 (SE = .282). We identified four outliers (σint > 19°), which were retained in the analysis. The best fitting model (with the lowest AIC and AICc) included only the main effect of procedure (model 2) (see Table S2 in the OSM for model selection). However, the regression analysis did not reveal a significant effect of procedure (χ² = 1.294, df = 1, p = .2553). The parameters of the best fitting model are reported in Table 1. Predicted internal noise values with partial residuals for the two procedures are reported in the OSM (see Fig. S2A).

Table 1 Estimated coefficients of the generalized linear model fitted on internal noise data with weights

For sampling efficiency (η), a Shapiro-Wilk test showed that residuals were not normally distributed (W = .706, p < .0001), with a positive skewness of 1.877 (SE = .282). We identified eight outliers (9.325 ≤ η ≤ 12.695), which were retained in the analysis. The best fitting model included the main effects (stimulus type and procedure) and the interaction term (model 5) (see Table S2 in the OSM for model selection). The regression analysis revealed a significant effect of stimulus type only (χ² = 29.324, df = 2, p < .0001), with no significant effect of procedure (χ² = .005, df = 1, p = .94) or of the stimulus type × procedure interaction (χ² = 1.838, df = 2, p = .398). Holm-corrected post hoc comparisons for stimulus type revealed a significant difference between RDKs and dynamic GPs (padj = .01) and between RDKs and static GPs (padj = .005), but not between the two GP types (padj > .05). The parameters of the best fitting regression model are reported in Table 2. Predicted sampling efficiency values with partial residuals for the stimulus patterns used are reported in the OSM (see Fig. S2B). Taken together, these results suggest that the two EN procedures produce comparable estimates. Internal noise and sampling efficiency values for each participant and experimental condition are reported in the OSM (see Table S3).

Table 2 Estimated coefficients of the weighted generalized linear model fitted to the sampling efficiency data

Discussion

The main aim of the current study was to compare the simplified (2-point) and multisampling (8-point) EN procedures for estimating internal noise and sampling efficiency in global form, motion, and non-directional motion processing. The 2-point EN paradigm is commonly used with populations that may not tolerate long experiments, such as children (e.g., Manning et al., 2014; Manning et al., 2022) and clinical populations (e.g., O'Hare et al., 2021; Tibber et al., 2014, 2015), because it is faster and therefore less fatiguing. The 2-point version rests on two assumptions: (1) at low levels of directional/orientational variability, performance is limited by internal noise, and (2) at high levels of directional/orientational variability, performance is limited by sampling efficiency. However, because only two points are sampled, the shape of the function between them is unknown. The EN paradigm also assumes a linear transducer with additive noise (the linear amplifier model, LAM), an assumption that may not always be met, as in the case of contrast (Baldwin et al., 2016). Noise is commonly added to RDKs by varying the proportion of signal to noise dots (e.g., Zanker, 1995). This introduces spurious pairings between dots, known as the correspondence problem (Dakin et al., 2005). In traditional RDKs, where some dots are signal dots (moving in a coherent direction) and others are noise dots (moving in random directions), false correspondences between dots make it difficult to determine what limits the estimation of local directions (Barlow & Tripathy, 1997). This problem is partly mitigated in the current experiment because all dots are signal dots, which is analogous to the “zero-dimensional noise” proposed by Baker (2013) for investigating EN with a contrast masking paradigm. 
However, the possibility of spurious matches across dot frames remains, although it can be minimized by spacing the dots relatively widely and displacing them by a small amount on each frame (Williams & Sekuler, 1984). Williams and Sekuler (1984) reported that at around 0.1 deg/s the likelihood of spurious matches was very low, even at the highest dot densities tested in their experiment. In the current experiment, the dot density was considerably higher than those tested by Williams and Sekuler (1984), although the speed was comparable at 0.167° per frame. Although the correspondence problem has been mitigated to some extent, other non-linearities cannot be ruled out; it is therefore beneficial to collect data across a range of sampling points to assess the shape of the overall function. However, this is not always feasible when working with certain populations, so it is important to validate the simplified procedure, as we have done here. We found good overall agreement between the simplified and multisampling versions in the same observers, showing that the simplified method is valid across a range of different patterns.
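The logic of the 2-point and 8-point procedures can be made concrete with the EN model in the form commonly attributed to Dakin et al. (2005), σobs = sqrt((σint² + σext²) / n), where σext is the external (directional) variability and n is read here as sampling efficiency η; this form is an assumption of the sketch below. With one threshold at zero noise and one at a single high noise level, the model can be solved in closed form. With more noise levels, a least-squares fit is possible; the fit shown exploits the fact that σobs² is linear in σext² (slope 1/n, intercept σint²/n). This linearised fit is an illustrative choice, and other choices, such as minimising error on log thresholds, are also possible. All function names are hypothetical.

```python
import math


def en_threshold(sigma_int, n, sigma_ext):
    """Assumed EN model: observed threshold = sqrt((sigma_int^2 + sigma_ext^2) / n)."""
    return math.sqrt((sigma_int ** 2 + sigma_ext ** 2) / n)


def en_two_point(thr_zero, thr_high, sigma_high):
    """Closed-form (sigma_int, n) from a zero-noise and a high-noise threshold.

    thr_zero^2 = sigma_int^2 / n and thr_high^2 = (sigma_int^2 + sigma_high^2) / n,
    so their difference gives n, and back-substitution gives sigma_int.
    """
    diff = thr_high ** 2 - thr_zero ** 2
    if diff <= 0:
        raise ValueError("high-noise threshold must exceed the zero-noise threshold")
    n = sigma_high ** 2 / diff
    sigma_int = thr_zero * math.sqrt(n)
    return sigma_int, n


def en_multi_point(sigma_ext, thresholds):
    """Least-squares fit of thr^2 = sigma_int^2 / n + sigma_ext^2 / n.

    Ordinary linear regression of thr^2 on sigma_ext^2: slope = 1 / n,
    intercept = sigma_int^2 / n.
    """
    x = [s ** 2 for s in sigma_ext]
    y = [t ** 2 for t in thresholds]
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    n = 1.0 / slope
    # Clamp a slightly negative intercept (possible with noisy data) to zero.
    sigma_int = math.sqrt(max(intercept, 0.0) * n)
    return sigma_int, n
```

With noiseless synthetic thresholds generated by `en_threshold`, both estimators recover the generating parameters; with real data, the 2-point estimates depend entirely on the two measured thresholds, whereas the multi-point fit can also expose departures from the assumed shape of the function.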

The first assumption of the EN paradigm is that under zero-noise conditions, performance will be limited by the internal noise of the system itself. There are several potential sources of internal noise, for example photon noise, chromatic aberrations, pupil diffraction, and eye movements (for reviews, see Geisler, 1989; Geisler, 2003). These pre-neural factors might be expected to be relatively stable across stimulus conditions in the current experiment, but their effects cannot be ruled out. Joshi et al. (2021) suggested that, for form and motion perception, one of the main sources of internal noise could be local signal detection, possibly in early visual cortex (V1/V2). The early visual system contains cells that respond to local orientation (Hubel & Wiesel, 1959) and local motion (Palmer & Davis, 1981). In the static GP, dipoles provide only an orientation signal, so noise in local orientation detectors would be the primary contributor to internal noise. For moving stimuli, local motion detectors are thought to be the first stage of processing (Heeger et al., 1996), so noise in these detectors should be the primary contributor for the RDKs in the current experiment. In contrast, for dynamic GPs there is evidence that the visual system uses both local orientation and local motion detectors (Edwards & Crane, 2007; Johnson & Wenderoth, 2011; Krekelberg et al., 2003; Pavan, Ghin, et al., 2017b). Specifically, for dynamic GPs, the dot lifetime of 100 ms is sufficient for the motion streak effect to occur (Geisler, 1999). Motion streaks are blurred lines left behind a rapidly moving object; they provide a form cue that affects direction judgments (Apthorp et al., 2009; Geisler, 1999). Geisler (1999) demonstrated that orientation- and motion-selective neurons in early visual cortex (V1) are both activated and interact to help the observer extract form information that guides motion discrimination decisions. 
Thus, in dynamic GPs, local form and motion detectors can be used together to detect the pattern’s ambiguous, illusory direction (Alais et al., 2010; Edwards & Crane, 2007; Johnson & Wenderoth, 2011; Krekelberg et al., 2003; Pavan, Ghin, et al., 2017b), so noise in both types of detector contributes to internal noise.

For internal noise, the best-fitting model included only procedure, although the effect of procedure was not significant. Although the 2-point procedure tended to yield higher internal noise estimates than the 8-point version, the two methods produced similar results. This result is in line with previous findings (Joshi et al., 2021) and suggests that, at the local level, motion and orientation detectors are likely to be affected by approximately the same amount of internal noise.

The second assumption of the EN procedure is that at high levels of directional variability, performance will be limited by sampling efficiency. In the current experiment, we focused on spatial sampling efficiency, as in previous work (e.g., Joshi et al., 2021), because this is the most relevant for the static GPs. However, particularly for motion tasks, sampling efficiency also has temporal aspects (Donato et al., 2021; Snowden & Braddick, 1991; Watamaniuk & Sekuler, 1992); for example, human observers average over varying speeds of RDK elements, indicating that temporal integration mechanisms are also important (Watamaniuk & Duchon, 1992). In a set of experiments comparing human efficiency with that of the ideal observer, Watamaniuk (1993) demonstrated that temporal integration for RDKs reaches an asymptote at nine frames. In the current experiment, although each dot had a limited lifetime of five frames, the RDK as a whole was displayed for 500 ms, which is considerably longer than the time needed to reach this asymptote. Because temporal aspects were not systematically manipulated in the current study, their investigation remains for future research. In the best-fitting model of our data, sampling efficiency estimates varied across stimulus types, in a pattern similar to that reported by Joshi et al. (2021). Specifically, sampling efficiency was higher for RDKs than for either type of GP. Sampling efficiency is thought to reflect global pooling (Dakin et al., 2005), so this result suggests that global pooling mechanisms differ between global form and motion processing. Other evidence supports this difference: for example, Glass and Switkes (1976) observed that dipoles composed of one black and one white dot destroy the global percept of static GPs, whereas dot polarity is irrelevant for global motion detection in RDKs (Edwards & Badcock, 1994). 
Other studies showed that RDKs are easier to perceive than dynamic and static GPs (Donato et al., 2020; Nankoo et al., 2012). This distinction applies not only to simple configurations, such as translational patterns, but also to more complex ones, such as radial, circular, and spiral patterns (Donato et al., 2021; Nankoo et al., 2012). Notably, Nankoo et al. (2012) demonstrated not only that RDKs and GPs are perceived differently but also that dynamic GPs are processed more similarly to static GPs than to RDKs. This finding could imply that dynamic GPs are processed first for their global form features and only subsequently for their global motion properties. Our results are in line with this evidence, indicating that global processing in RDKs and dynamic GPs is mediated by two distinct mechanisms: motion pooling on the one hand and form-motion integration on the other.
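The interpretation of sampling efficiency as global pooling can be illustrated with a small Monte Carlo simulation of an idealised observer who, on each trial, averages n local direction samples. Each sample is assumed to be the stimulus direction perturbed by external noise (SD σext) plus Gaussian local detector noise (SD σint); under these assumptions, the spread of the pooled estimate approaches sqrt((σint² + σext²) / n), so a larger n (higher efficiency) reduces variability. The observer model, function names, and parameter values are hypothetical, and directions are treated as linear rather than circular quantities, which is reasonable only for modest spreads.

```python
import random
import statistics


def pooled_sd(sigma_int, sigma_ext, n, trials=10000, seed=1):
    """Monte Carlo SD of a hypothetical observer's global direction estimate.

    On each trial the observer averages n local samples; each sample is
    external directional noise (SD sigma_ext) plus local detector noise
    (SD sigma_int), both Gaussian with mean 0 (the true global direction).
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    estimates = []
    for _ in range(trials):
        local = [rng.gauss(0.0, sigma_ext) + rng.gauss(0.0, sigma_int) for _ in range(n)]
        estimates.append(sum(local) / n)
    return statistics.pstdev(estimates)


def predicted_sd(sigma_int, sigma_ext, n):
    """Analytic SD of the mean of n independent samples under the same assumptions."""
    return ((sigma_int ** 2 + sigma_ext ** 2) / n) ** 0.5
```

Comparing `pooled_sd` with `predicted_sd` for increasing n shows the variability of the global estimate shrinking in proportion to 1/sqrt(n), which is the sense in which a higher sampling efficiency corresponds to more effective pooling.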

Although overall agreement is relatively good, sampling efficiency estimates also differ somewhat depending on the procedure used. For RDKs, sampling efficiency estimates were slightly higher with the 2-point method than with the 8-point method. Higher sampling efficiency suggests enhanced pooling of information (Manning et al., 2014; Watamaniuk, 1993). Allard and Cavanagh (2012) showed that greater pooling is a better strategy in noisier environments. Improvement in performance (i.e., perceptual learning) for RDKs has been shown to arise merely from exposure to the motion signal, without participants being aware of their improvement (Watanabe et al., 2001). We therefore hypothesize that participants could have developed better strategies for this class of stimuli through (implicit) learning. However, how internal noise and sampling efficiency for these classes of stimuli are modulated by visual perceptual learning and by the type of procedure remains to be investigated in future research.

Exact parameter values from Joshi et al. (2021) and the current study may not be directly comparable because of methodological differences, such as slight differences in equipment and differences in the number of participants and their experience with psychophysical tasks (2 of 6 vs. 2 of 12 experienced observers in Joshi et al., 2021, and the current study, respectively). However, the overall pattern of results was similar across the two studies, demonstrating the robustness of the findings.

In the current experiment, we used translational GPs and translational motion for the RDKs; translational GPs were chosen for equivalence with the RDKs. However, spatial summation for this type of GP is thought to differ from that for other spatial configurations, such as radial and concentric GPs (Wilson & Wilkinson, 1998). Likewise, in global motion perception, motion signals from translational motion are pooled differently from those in radial or circular directions (Freeman & Harris, 1992; Lee & Lu, 2010; Rampone & Makin, 2020; Seu & Ferrera, 2001), although speed of motion seems to influence this (Lee & Lu, 2010). Wilson and Wilkinson (1998) highlighted that these other configurations provide important cues for other perceptual tasks in the real world; for example, radial motion is related to optic flow patterns that indicate self-motion and is thought to be pooled over a larger area than parallel motion. It is therefore worth investigating these different spatial configurations in future work.

In conclusion, we found overall agreement between the 2-point and 8-point EN paradigms in the same set of observers, indicating that the simplified version is a valid measure. Our findings showed a pattern of results similar to that of Joshi et al. (2021), demonstrating the robustness of the EN paradigm. Furthermore, internal noise estimates were similar across stimuli evoking form, motion, and non-directional motion, indicating a common limiting process, thought to be local signal detectors in early visual cortex (V1/V2). We also found that sampling efficiency estimates varied across the three classes of visual stimuli, consistent with Joshi et al. (2021), indicating that global pooling processes differ for form, motion, and non-directional motion.