Article

Uniform Consistency for Functional Conditional U-Statistics Using Delta-Sequences

1 LMAC (Laboratory of Applied Mathematics of Compiègne), Université de Technologie de Compiègne, 57 Avenue de Landshut CS 60319, CEDEX, 60203 Compiègne, France
2 Laboratory MAEGE, FSJES Ain-Sebaa, University Hassan II, Casablanca 20000-20200, Morocco
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(1), 161; https://doi.org/10.3390/math11010161
Submission received: 28 November 2022 / Revised: 16 December 2022 / Accepted: 21 December 2022 / Published: 28 December 2022
(This article belongs to the Special Issue Statistics for Stochastic Processes)

Abstract: U-statistics are a fundamental class of statistics arising when modeling quantities of interest defined by responses from multiple subjects. They generalize the empirical mean of a random variable X to sums over every k-tuple of distinct observations of X. This paper examines a setting for nonparametric statistical curve estimation based on an infinite-dimensional covariate, including Stute's estimator as a special case. In this functional context, the class of "delta-sequence estimators" is defined and discussed; it includes both the orthogonal series method and the histogram method. We establish the uniform almost-complete convergence, with rates, of these estimators under broad conditions. Moreover, in the same context, we show the uniform almost-complete convergence of the nonparametric inverse probability of censoring weighted (I.P.C.W.) estimators of the regression function under random censorship, which is of interest in its own right. Potential applications include discrimination problems, metric learning and time series prediction from a continuous set of past values.

1. Introduction

The regression problem has been studied by statisticians and probability theorists for many years in response to the problems raised by technological and computational advances, resulting in many sophisticated techniques. Among the problems addressed are modeling, the application of estimation methods, and testing. In this paper, we are interested in nonparametric regression estimation. Unlike the parametric framework, where one must estimate a finite number of parameters of a structural model specified a priori, nonparametric estimation does not require any specific structure; instead, it allows the data to speak for themselves. However, as natural drawbacks, nonparametric procedures are more susceptible to estimation biases and losses in convergence rates than parametric methods. Since their introduction, kernel nonparametric function estimation approaches have attracted a significant amount of attention; for references to the research literature and statistical applications in this area, consult [1,2,3,4,5,6,7] and the references therein. Popular as they may be, they represent only one of the numerous methods for developing accurate function estimators. These include nearest-neighbor, spline, neural network, and wavelet approaches. In addition, these techniques have been applied to a vast array of data types. This article focuses on constructing consistent estimators of conditional U-statistics for functional data based on delta sequences. The theory of U-statistics and U-processes, initially introduced in the seminal work of [8], has attracted considerable interest over the last few decades owing to its diverse applications. U-statistics can be utilized to solve complex statistical problems; examples include nonparametric regression, density estimation, and goodness-of-fit tests.
Furthermore, U-statistics contribute to the study of estimators with various degrees of smoothness (including function estimators). [9], for instance, analyzes the product limit estimator for truncated data by applying a.s. uniform bounds for P-canonical U-processes. [10] introduces two new normality tests based on U-processes. Building on the findings of [11], refs. [12,13] provided new normality tests that used as test statistics weighted $L_1$-distances between the standard normal density and local U-statistics based on standardized observations; these tests were used to determine whether or not the data were normally distributed. Ref. [14] worked on the estimation of the mean of multivariate functions under the premise of possibly heavy-tailed distributions and presented the U-based median-of-means. U-processes are also necessary for a wide variety of statistical applications, such as the examination of qualitative aspects of functions in nonparametric statistics [15,16] as well as establishing limiting distributions of M-estimators (see, for example, [17,18,19]). In [20], the authors consider the problem of detecting distributional changes in a sequence of high-dimensional data by using weighted cumulative sums of U-statistics stemming from $L_p$-norms. In [21], the authors proposed tests based on U-statistics for testing the equality of marginal density functions. In the paper [22], the following problem is considered: is it possible, given a sample of independent, identically distributed random variables with finite variance, to build an estimator of the unknown mean that performs almost as well as if the data were normally distributed? The argument presented in that work rests on a new deviation inequality for U-statistics whose order is permitted to grow with the sample size; this inequality is the key component of the argument.
The first asymptotic results for the scenario in which the underlying random variables are assumed to be independent and identically distributed were presented by [23,24] and [8], among others. In contrast, asymptotic results under weak dependence assumptions were demonstrated in [25], in [26], more recently in [27], and in more general settings in [28,29,30,31]. The interested reader may refer to [17,32] for an excellent collection of references on U-statistics and U-processes. We also refer to [19] for a profound understanding of the theory of U-processes.
In the present work, we consider the conditional U-statistics introduced by [33], which can be considered as generalizations of the Nadaraya-Watson ([34,35]) regression function estimates. To be more precise, let us consider a sequence of independent and identically distributed random vectors $\{(X_i, Y_i), i \in \mathbb{N}^*\}$ with $X_i \in \mathbb{R}^d$ and $Y_i \in \mathbb{R}^{d'}$, $d, d' \geq 1$. Let $\varphi : \mathbb{R}^{d'k} \to \mathbb{R}$ denote a measurable function. Within the scope of this work, our primary focus is on the estimation of the conditional expectation, or regression function:
$$ r^{(k)}(\varphi,\mathbf{t}) = \mathbb{E}\big[\varphi(Y_1,\ldots,Y_k)\,\big|\,(X_1,\ldots,X_k)=\mathbf{t}\big], \quad \text{for } \mathbf{t}\in\mathbb{R}^{dk}, $$
whenever it exists, i.e., $\mathbb{E}\,|\varphi(Y_1,\ldots,Y_k)| < \infty$. Now, we introduce a kernel function $K : \mathbb{R}^d \to \mathbb{R}$ with support contained in $[-B,B]^d$, $B>0$, fulfilling:
$$ \sup_{x\in\mathbb{R}^d}|K(x)| =: \kappa < \infty \quad \text{and} \quad \int K(x)\,dx = 1. $$
Ref. [33] introduced a new class of estimators of $r^{(k)}(\varphi,\mathbf{t})$, called conditional U-statistics, defined for each $\mathbf{t}\in\mathbb{R}^{dk}$ by:
$$ \hat{r}_n^{(k)}(\varphi,\mathbf{t};h_n) = \frac{\displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} \varphi(Y_{i_1},\ldots,Y_{i_k})\, K\!\left(\frac{t_1 - X_{i_1}}{h_n}\right)\cdots K\!\left(\frac{t_k - X_{i_k}}{h_n}\right)}{\displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} K\!\left(\frac{t_1 - X_{i_1}}{h_n}\right)\cdots K\!\left(\frac{t_k - X_{i_k}}{h_n}\right)}, $$
where:
$$ I(k,n) = \big\{ \mathbf{i} = (i_1,\ldots,i_k) : 1 \leq i_j \leq n \ \text{and} \ i_j \neq i_r \ \text{if} \ j \neq r \big\}, $$
is the set of all k-tuples of distinct integers between 1 and n, and $\{h_n\}_{n\geq 1}$ is a sequence of positive constants converging to zero at a rate such that $n h_n^{dk} \to \infty$. In the particular case $k=1$, $r^{(k)}(\varphi,\mathbf{t})$ reduces to $r^{(1)}(\varphi,t) = \mathbb{E}(\varphi(Y)\,|\,X=t)$, and Stute's estimator reduces to the Nadaraya-Watson estimator of $r^{(1)}(\varphi,t)$, given by:
$$ \hat{r}_n^{(1)}(\varphi,t;h_n) = \sum_{i=1}^n \varphi(Y_i)\, K\!\left(\frac{X_i - t}{h_n}\right) \Big/ \sum_{i=1}^n K\!\left(\frac{X_i - t}{h_n}\right). $$
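As a quick illustration of the $k=1$ case, the Nadaraya-Watson estimator above can be sketched in a few lines; the boxcar kernel, bandwidth, and simulated data below are our own illustrative choices, not prescriptions from the paper:

```python
import numpy as np

def nadaraya_watson(x_obs, y_obs, t, h, K=None):
    """Nadaraya-Watson estimate of E[phi(Y) | X = t] with phi the identity.
    K defaults to the boxcar kernel 0.5 * 1{|u| <= 1}, which integrates to 1."""
    if K is None:
        K = lambda u: 0.5 * (np.abs(u) <= 1)
    w = K((x_obs - t) / h)
    denom = w.sum()
    if denom == 0:                # no observation near t
        return float(y_obs.mean())  # fall back to the unconditional mean
    return float((w * y_obs).sum() / denom)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 500)
# the true regression value at t = 0.25 is sin(pi/2) = 1
est = nadaraya_watson(x, y, t=0.25, h=0.05)
```

The fallback to the unconditional mean when no observation falls in the window mirrors the convention used later for the delta-sequence estimator.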
The work of [36] focused on the rate of uniform convergence in $\mathbf{t}$ of $\hat{r}_n^{(k)}(\varphi,\mathbf{t};h_n)$ to $r^{(k)}(\varphi,\mathbf{t})$. In [37], the limit distributions of $\hat{r}_n^{(k)}(\varphi,\mathbf{t};h_n)$ are analyzed and compared to those produced by Stute. Under suitable mixing conditions, ref. [38] extends the results of [33] to weakly dependent data and uses these findings to evaluate the Bayes risk consistency of the corresponding discrimination rules.
As alternatives to the standard kernel-type estimators, [39] presented symmetrized nearest-neighbor conditional U-statistics. A major advancement came with the contributions of [40], where a far stronger version of consistency is obtained, namely consistency uniform in $\mathbf{t}$ and in bandwidth (i.e., over $h_n \in [a_n, b_n]$ where $a_n < b_n \to 0$ at some specific rate) of $\hat{r}_n^{(k)}(\varphi,\mathbf{t};h_n)$. Additionally, uniform consistency is achieved over $\varphi \in \mathscr{F}$ for a suitably restricted class of functions $\mathscr{F}$; these results were extended in [41,42,43,44] and [45]. The key component of those findings is the local conditional U-process studied in [11].
The case of functional data is the primary focus of this research. We present an excerpt from [46]: "Functional data analysis (FDA) is a branch of statistics concerned with analyzing infinite-dimensional variables such as curves, sets, and images. It has undergone phenomenal growth over the past 20 years, stimulated partly by significant data collection technology advances that have brought about the “Big Data” revolution. Often perceived as a somewhat arcane area of research at the turn of the century, FDA is now one of the most active and relevant fields of investigation in data science”. For an introduction to FDA, we refer to the books of [47,48], which contain various case studies in economics, archaeology, criminology, and neurophysiology, as well as fundamental analysis techniques. It is important to note that the extension of probability theory to random variables with values in normed vector spaces (such as Banach and Hilbert spaces), together with extensions of certain classical asymptotic limit theorems, predates the recent literature on functional data, as the history of the subject shows (see, for instance, [49]). Ref. [50] investigated density and mode estimation for data with values in a normed vector space; at the same time, he drew attention to the curse of dimensionality affecting functional data and suggested potential remedies. In the context of regression estimation, ref. [48] considered nonparametric models. We may also refer to [51,52,53]. More recently, modern techniques have been applied to the treatment of functional data. For instance, ref. [54] provided consistency rates for several functionals of the conditional distribution, such as the regression function, the conditional cumulative distribution, and the conditional density, uniformly over a subset of the explanatory variable.
Ref. [55] also investigated consistency rates, uniformly in bandwidth (UIB consistency), for several functional nonparametric models, such as the regression function, the conditional distribution, the conditional density, and the conditional hazard function; these results were extended to the ergodic setting by [56]. In the paper [57], local linear estimation of the regression function when the regressor is functional was investigated, and the results established strong convergence (with rates) uniformly over bandwidth parameters. In the work of [58], the k-nearest neighbors (kNN) estimate of the nonparametric regression model for strongly mixing functional time series data was explored; under some mild conditions, a uniform almost-complete convergence rate of the kNN estimator was established. In [59], the authors offer a variety of limit laws for the conditional mode in the functional setting for ergodic data; for some current references, see [45,60,61,62,63,64,65].
We will consider a general method for functional estimation by using delta sequences. Delta sequences (also called "approximate identities" or "summability kernels") arise in a wide variety of subfields within mathematics. Still, the applications pertaining to the theory of generalized functions are likely the most significant ones; the regularization of generalized functions is the major application of delta sequences. The proposed methods generalize several nonparametric estimation methods, including the kernel estimators given in (2) of [33]. To be more precise, the broad class of delta-sequence estimators includes the histogram estimators, Chentsov's projection estimators [66], and nearest-neighbor estimators, among others. Certain types of these sequences were already studied by [67], who called them "δ-function sequences". They established, among other things, the asymptotic unbiasedness and the asymptotic variance of estimators based on them but did not consider convergence rates. Ref. [68] obtained the rate of strong consistency and the rate of asymptotic bias for estimators associated with delta sequences arising from the Fejér kernel of the Fourier series. The delta-sequence method of density estimation of [69] is extended to certain non-i.i.d. cases in [70], where it is assumed that the observations are taken from a stationary Markov process. Ref. [71] considered the delta-sequence estimator for the marginal distribution of a strictly stationary stochastic process satisfying some mixing conditions. In [72], the author investigated the local and global convergence rates of delta-sequence type estimators of the density function, its derivative, and its mode. Ref. [73] proved the uniform strong consistency of delta-sequence estimators. Ref. [74] partially generalized the usual nonparametric estimators of a regression function by using an estimator based on quasi-positive delta sequences. Ref. [75] considered a general nonparametric statistical curve estimation setting called the class of "fractional delta sequence estimators". Ref. [76] used the delta method to investigate the correlation model. [77] looked at the problem of estimating the density function of functional data with values in an infinite-dimensional separable Banach space using the method of delta sequences; for further information, we can also look into [78]. Copula estimation using delta-sequence methods is considered in [79]. The problem of nonparametric minimax estimation of a multivariate density at a given point by delta sequences was investigated in [80]. Ref. [81] used delta sequences to propose an essential application to the problem of classifying the value of a discrete random variable.
The goal of the current study is to present and investigate delta-sequence estimators of conditional U-statistics for functional data, more specifically for random elements taking values in an infinite-dimensional separable Banach space, such as the space of continuous functions on the interval [0, 1] endowed with the supremum norm. Examples of functional data living in such spaces include stochastic processes with continuous sample paths on a finite interval, associated with the supremum norm, and stochastic processes whose sample paths are square-integrable on the real line. The dimensionality problem must be addressed in nonparametric functional data analysis in two ways: first, the data themselves are infinite-dimensional; second, nonparametric modeling makes only broad assumptions about the probability distribution of infinite-dimensional variables. This twofold infinity of dimensions underlies all subsequent developments in the discipline. The present study extends our previous work [82], delivered in the multivariate setting. Although the concept behind our estimation approach is similar to that presented in [82] (containing the Stute estimator), we make allowances for the infinite dimensionality of the covariate. The asymptotic characteristics of the multivariate delta-sequence estimators were determined in [82]; those findings do not apply directly to the current situation, since we are working with an infinite-dimensional covariate. As a result, we must use different arguments in our proofs to deal with this broader framework.
These findings are of interest in their own right, but they are also necessary for the inquiry conducted in this work. Simply combining ideas from other publications would not suffice to solve the problem, as will be demonstrated in the following sections: dealing with delta-sequence U-statistic estimators for functional data requires intricate mathematical derivations. Compared with previous studies of delta-sequence estimators, the current paper considers the setting of an unbounded function φ, which adds significant complexity to the proofs. The general assumptions required to derive the asymptotic results for the conditional U-statistics delta-sequence estimators are presented in this study.
This article is structured as follows. Section 2 introduces the delta sequences and the definitions needed in our work, together with the new family of estimators. Section 3 gives the paper's main results concerning uniform convergence. In Section 4, we present a significant application to the censored-data context, which is of interest in its own right. In Section 5, we provide some applications, including the Kendall rank correlation coefficient in Section 5.1, discrimination in Section 5.2, metric learning in Section 5.3 and time series prediction from a continuous set of past values in Section 5.4. Some final observations and possible future developments are given in Section 6. To maintain a smooth flow of the presentation, all proofs are collected in Section 7. A selection of relevant technical results is presented in Appendix A.

2. Preliminaries and Estimation Procedure

Let $(\Omega,\mathcal{F},\mathbb{P})$ denote a probability space and $(\mathcal{X}, d(\cdot,\cdot))$ an infinite-dimensional separable Banach space equipped with a norm $\|\cdot\|$ such that $d(u,v) = \|u - v\|$, and let $\mathcal{B}$ be the $\sigma$-algebra of Borel subsets of $\mathcal{X}$. Let us consider a sequence $\{X_i, Y_i : i \geq 1\}$ of independent identically distributed random copies of the random element $(X,Y)$, where $X$ is a random element defined on $(\Omega,\mathcal{F},\mathbb{P})$ taking values in $(\mathcal{X},\mathcal{B})$ and $Y$ takes values in some abstract measurable space $\mathcal{Y}$. Ref. [83] introduced the functional conditional U-statistics, for $\mathbf{x} \in \mathcal{X}^m$ with $\mathcal{X}$ some semi-metric space, as a generalization of Stute's estimator:
$$ \hat{r}_n^{(m)}(\varphi,\mathbf{x};h_K) = \frac{\displaystyle\sum_{(i_1,\ldots,i_m)\in I(m,n)} \varphi(Y_{i_1},\ldots,Y_{i_m})\, K\!\left(\frac{d(x_1,X_{i_1})}{h_K}\right)\cdots K\!\left(\frac{d(x_m,X_{i_m})}{h_K}\right)}{\displaystyle\sum_{(i_1,\ldots,i_m)\in I(m,n)} K\!\left(\frac{d(x_1,X_{i_1})}{h_K}\right)\cdots K\!\left(\frac{d(x_m,X_{i_m})}{h_K}\right)}. $$
As we mentioned, the delta-sequence procedures can be considered a more general class including the kernel estimation techniques. Therefore, we can naturally obtain a more general class of functional conditional U-statistics by replacing the kernel $K(\cdot)$ in Equation (3) with positive delta sequences $\delta_m(\cdot,\cdot)$ (see Definition 1), which allows us to introduce the following conditional U-statistic, for each $\mathbf{x} = (x_1,\ldots,x_k) \in \mathcal{X}^k$ and each measurable function $\varphi : \mathcal{Y}^k \to \mathbb{R}$:
$$ \hat{r}_n^{(k)}(\varphi,\mathbf{x};m_n) = \begin{cases} \dfrac{\displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} \varphi(Y_{i_1},\ldots,Y_{i_k})\, \delta_{m_n}(x_1,X_{i_1})\cdots\delta_{m_n}(x_k,X_{i_k})}{\displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} \delta_{m_n}(x_1,X_{i_1})\cdots\delta_{m_n}(x_k,X_{i_k})}, & \text{if } \displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} \delta_{m_n}(x_1,X_{i_1})\cdots\delta_{m_n}(x_k,X_{i_k}) \neq 0, \\[2ex] \dfrac{(n-k)!\,k!}{n!} \displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} \varphi(Y_{i_1},\ldots,Y_{i_k}), & \text{if } \displaystyle\sum_{(i_1,\ldots,i_k)\in I(k,n)} \delta_{m_n}(x_1,X_{i_1})\cdots\delta_{m_n}(x_k,X_{i_k}) = 0, \end{cases} $$
which we use to estimate the regression function
$$ r^{(k)}(\varphi,\mathbf{x}) = \mathbb{E}\big[\varphi(Y_1,\ldots,Y_k)\,\big|\,(X_1,\ldots,X_k)=\mathbf{x}\big], \quad \text{for } \mathbf{x}\in\mathcal{X}^k, $$
whenever it exists, i.e., $\mathbb{E}\,|\varphi(Y_1,\ldots,Y_k)| < \infty$.
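For intuition, here is a small numerical sketch of this delta-sequence conditional U-statistic for $k = 2$ on the real line, using a ball-indicator delta sequence with Lebesgue measure; the kernel $\varphi(y_1,y_2) = y_1 y_2$, the seed, and the simulated data are illustrative choices only:

```python
import numpy as np
from itertools import permutations

def delta_ball(x, y, m):
    """Ball-indicator delta sequence on the real line with Lebesgue measure:
    (1 / mu(B(x, 1/m))) * 1_{B(x, 1/m)}(y) = (m/2) * 1{|x - y| <= 1/m}."""
    return (m / 2.0) * (abs(x - y) <= 1.0 / m)

def cond_u_stat(x_obs, y_obs, phi, x_eval, m, k=2):
    """Delta-sequence conditional U-statistic: weighted average of phi over
    k-tuples of distinct indices, falling back to the unconditional average
    of phi when every product of delta weights vanishes."""
    n = len(x_obs)
    num = den = 0.0
    vals = []
    for idx in permutations(range(n), k):
        w = 1.0
        for j, i in enumerate(idx):
            w *= delta_ball(x_eval[j], x_obs[i], m)
        v = phi(*(y_obs[i] for i in idx))
        num += v * w
        den += w
        vals.append(v)
    if den == 0.0:                    # no design points near x_eval
        return float(np.mean(vals))   # unconditional average over k-tuples
    return num / den

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 60)
y = x + rng.normal(0, 0.05, 60)
# true r^(2)(phi, (0.5, 0.5)) with phi(y1, y2) = y1 * y2 is about 0.25
est = cond_u_stat(x, y, lambda a, b: a * b, x_eval=(0.5, 0.5), m=10)
```

Evaluating at a point far from the data (e.g., `x_eval=(10.0, 10.0)`) exercises the second branch of the definition, the unconditional average over all pairs.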
Remark 1. 
It is worth noting that X may admit a probability density function f(·) with respect to a σ-finite measure μ on $(\mathcal{X},\mathcal{B})$, in the sense that (see, for instance, [77,78,84]):
$$ \mathbb{P}(X \in A) = \int_A f(x)\,\mu(dx), \quad \text{for every } A \in \mathcal{B} \text{ such that } 0 < \mu(A) < \infty. $$
The concept behind this remark is elaborated upon in [77] and its references. We denote by $(\Omega,\mathcal{F},\mathbb{P})$ a probability space and by $\{\mathcal{F}_t\}_{t\geq 0}$ a nondecreasing family of sub-$\sigma$-algebras of $\mathcal{F}$. Let $\{W(t)\}_{t\geq 0}$ denote a standard Wiener process defined on $(\Omega,\mathcal{F},\mathbb{P})$ such that $W_t$ is $\mathcal{F}_t$-measurable. We highlight that the probability measure $\mu_W$ on the space $C_0(0,T)$, equipped with the Borel $\sigma$-algebra generated by the supremum-norm topology, is the one induced by the standard Wiener process. Let $\{X(t)\}_{0\leq t\leq T}$ be a diffusion process defined by the stochastic differential equation:
$$ dX(t) = a(t,X(t))\,dt + b(t,X(t))\,dW(t), $$
where $X(0) = x_0$ for $0 \leq t \leq T$. By imposing some assumptions on the functions $a(\cdot,\cdot)$ and $b(\cdot,\cdot)$, we can establish that the probability measure $\mu_X$ on the space $C_0(0,T)$ induced by the process $X$ is absolutely continuous with respect to the probability measure $\mu_W$. In addition, Girsanov's theorem permits the computation of the Radon-Nikodym derivative of $\mu_X$ with respect to $\mu_W$; this $\mu_W$-derivative is the probability density of $X$ on the space $C_0(0,T)$, see for instance [85]. From this point of view, the main motivation leading to the analysis of functional data is the inference of stochastic processes; the reader is referred to [85,86]. For inference purposes, we assume that the entire process is observed. However, if the process can only be observed at discrete times, either on a fine grid or when the data are sparse, other approaches, such as parametric inference for discrete data, need to be devised; for diffusion processes, for instance, such methods are necessary (cf. [85,87,88]). Observe that if $A = B(x,\kappa)$ for $(x,\kappa) \in \mathcal{X} \times \mathbb{R}_+^*$, then (6) allows the small-ball probability to be considered.
Assume also that $S_{\mathcal{X}}$ is a pseudo-compact subset of $\mathcal{X}$ satisfying the following property: for any $\epsilon > 0$, there exist $t_\ell \in \mathcal{X}$, $1 \leq \ell \leq d_n$, such that
$$ S_{\mathcal{X}} \subseteq S_n := \bigcup_{\ell=1}^{d_n} B(t_\ell, \epsilon), $$
and there exists $\kappa > 0$ such that $d_n \epsilon^{\kappa} = C$ for some constant $C > 0$. Here, $B(t_\ell,\epsilon)$ denotes the open ball with center $t_\ell$ and radius $\epsilon$.
It is worth mentioning that hypothesis (7) is essential: it posits a geometrical link between the number $d_n$ of balls and their radius $\epsilon$. This condition is fulfilled in usual nonparametric problems when $\mathcal{X} = \mathbb{R}^p$ is endowed with the Euclidean metric (taking $\kappa = p$ suffices). However, this topological property does not hold in every abstract semi-metric space, as [89] explains. Before we can use the delta-sequence approach to estimate the regression operator $r^{(k)}(\cdot)$ in the model (5), we need the following definition.
Definition 1. 
A sequence of non-negative functions $\{\delta_{m_n}(\mathbf{x},\mathbf{y}) : m_n \geq 1\} = \{\delta_m(\mathbf{x},\mathbf{y}),\ m \geq 1\}$ defined on $\mathcal{X}^k \times \mathcal{X}^k$ is called a delta sequence with respect to the measure $\mu$ if the following properties are satisfied:
(C.1) 
For each $\gamma > 0$:
$$ \lim_{m\to+\infty} \sup_{\mathbf{x}\in S_{\mathcal{X}}^k} \left| \int_{B(\mathbf{x},\gamma)} \delta_m(\mathbf{x},\mathbf{y})\,\mu(d\mathbf{y}) - 1 \right| = 0, $$
where $B(\mathbf{x},\gamma) := \prod_{j=1}^k B(x_j,\gamma)$, for all $\mathbf{x} = (x_1,\ldots,x_k)$.
(C.2) 
There exists a positive constant $C_1$ such that
$$ \sup_{(\mathbf{x},\mathbf{y}) \in S_{\mathcal{X}}^k \times \mathcal{X}^k} \delta_m(\mathbf{x},\mathbf{y}) \leq C_1\, s_m < \infty, $$
where $0 < s_m \to \infty$ as $m \to \infty$ and $\lim_{m\to\infty} m / (s_m \log(m)) = \infty$.
(C.3) 
There exist $C_2 > 0$, $\beta_1 > 0$ and $\beta_2 > 0$ such that
$$ \big|\delta_m(\mathbf{x}_1,\mathbf{y}) - \delta_m(\mathbf{x}_2,\mathbf{y})\big| \leq C_2\, s_m^{\beta_2}\, d(\mathbf{x}_1,\mathbf{x}_2)^{\beta_1} \quad \text{for all } \mathbf{x}_1,\mathbf{x}_2,\mathbf{y} \in \mathcal{X}^k, $$
where
$$ d(\mathbf{x},\mathbf{y}) := \frac{1}{k}\, d(x_1,y_1) + \cdots + \frac{1}{k}\, d(x_k,y_k), $$
for all $\mathbf{x} = (x_1,\ldots,x_k)$ and $\mathbf{y} = (y_1,\ldots,y_k) \in \mathcal{X}^k$.
(C.4) 
For any γ > 0 :
$$ \lim_{m\to\infty} \sup_{\mathbf{x}\in S_{\mathcal{X}}^k}\ \sup_{\mathbf{y}\in \bar{B}(\mathbf{x},\gamma)} \delta_m(\mathbf{x},\mathbf{y})\, d(\mathbf{x},\mathbf{y}) = 0, $$
where $\bar{B}(\mathbf{x},\gamma)$ denotes the complement of the open ball $B(\mathbf{x},\gamma)$.
Notice that conditions (C.1)-(C.4) of Definition 1 are modelled after a similar set of conditions for kernel-type estimators. Condition (C.2) corresponds to the bound of $\delta_m$ over $S_{\mathcal{X}}^k \times \mathcal{X}^k$, whereas condition (C.3) pertains to the uniform Lipschitz property of $\delta_m(\mathbf{x},\mathbf{y})$. In contrast, condition (C.4) is not an assumption on the bound of $d(\mathbf{x},\mathbf{y})$ over $S_{\mathcal{X}}^k \times S_{\mathcal{X}}^k$ but an assumption on the limiting behaviour of $\delta_m(\mathbf{x},\mathbf{y})\, d(\mathbf{x},\mathbf{y})$ as $m \to +\infty$.
Proposition 1. 
Let $\delta_{m,1}(x_1,y_1), \ldots, \delta_{m,k}(x_k,y_k)$ each be a non-negative delta sequence with respect to the measure μ; then
$$ \delta_m(\mathbf{x},\mathbf{y}) := \prod_{j=1}^k \delta_{m,j}(x_j,y_j) $$
is also a non-negative delta sequence.
This proposition is analogous to Proposition 2.2 of [69] when $\mathcal{X} = \mathbb{R}^d$: the product of non-negative delta sequences is again a non-negative delta sequence. The proposition provides a flexible way to construct delta sequences in higher dimensions, much as in kernel-type estimation. Unless otherwise specified, we will set
$$ \delta_m(\mathbf{x},\mathbf{y}) := \prod_{j=1}^k \delta_{m_n}(x_j,y_j) $$
for all $\mathbf{x}$ and $\mathbf{y} \in \mathcal{X}^k$. This notation will lighten the statements of the forthcoming theorems.

2.1. Examples of Delta Sequence

In this section, following the notation of [78], we provide guidelines for constructing delta sequences and for recovering some well-known estimators in the literature.
Example 1. 
Kernel estimator
Let $\mathcal{X} = C_0(0,1)$ denote the space of real-valued continuous functions that vanish at 0. Suppose that $\mathcal{X}$ is equipped with the uniform topology induced by the supremum norm, i.e., if $x \in C_0(0,1)$ then $x$ is continuous on $(0,1)$ with $x(0) = 0$ and
$$ \|x\| = \sup_{t\in(0,1)} |x(t)|. $$
The Wiener measure on the space X induced by the standard Wiener process is denoted by μ. Let us define
$$ \delta_m(x,y) = \frac{1}{\mu\big(B(x,1/m)\big)}\, \mathbb{1}_{B(x,1/m)}(y), $$
where, as usual, $\mathbb{1}_A$ denotes the indicator function of the set $A$. Set, for all $\mathbf{x} = (x_1,\ldots,x_k)$ and $\mathbf{y} = (y_1,\ldots,y_k) \in \mathcal{X}^k$,
$$ \delta_m(\mathbf{x},\mathbf{y}) = \prod_{j=1}^k \delta_m(x_j,y_j); $$
then, by Proposition 1, $\delta_m(\mathbf{x},\mathbf{y})$ is a non-negative delta sequence, and the conditional U-statistic is defined in this case by
$$ \hat{r}_n^{(k)}(\varphi,\mathbf{x};m_n) = \frac{\displaystyle\sum_{\mathbf{i}\in I(k,n)} \varphi(Y_{i_1},\ldots,Y_{i_k}) \prod_{j=1}^k \mathbb{1}_{B(x_j,1/m)}(X_{i_j})}{\displaystyle\sum_{\mathbf{i}\in I(k,n)} \prod_{j=1}^k \mathbb{1}_{B(x_j,1/m)}(X_{i_j})} = \frac{\displaystyle\sum_{\mathbf{i}\in I(k,n)} \varphi(Y_{i_1},\ldots,Y_{i_k}) \prod_{j=1}^k \mathbb{1}_{\{d(x_j,X_{i_j}) \leq 1/m\}}}{\displaystyle\sum_{\mathbf{i}\in I(k,n)} \prod_{j=1}^k \mathbb{1}_{\{d(x_j,X_{i_j}) \leq 1/m\}}}, $$
which can be considered the naive kernel estimator of $r^{(k)}(\cdot)$. We can clearly observe that $\delta_m(\cdot,\cdot)$ in this example satisfies condition (C.1). In fact,
$$ \lim_{m\to+\infty} \sup_{\mathbf{x}\in S_{\mathcal{X}}^k} \left| \int_{B(\mathbf{x},\gamma)} \prod_{j=1}^k \frac{\mathbb{1}_{\{d(x_j,y_j)\leq 1/m\}}}{\mu\big(B(x_j,1/m)\big)}\, \mu(dy_1)\cdots\mu(dy_k) - 1 \right| = \lim_{m\to+\infty} \sup_{\mathbf{x}\in S_{\mathcal{X}}^k} \left| \prod_{j=1}^k \frac{\mu\big(B(x_j,\gamma)\cap B(x_j,1/m)\big)}{\mu\big(B(x_j,1/m)\big)} - 1 \right| = \lim_{m\to+\infty} \sup_{\mathbf{x}\in S_{\mathcal{X}}^k} \left| \prod_{j=1}^k \frac{\mu\big(B(x_j,\min(\gamma,1/m))\big)}{\mu\big(B(x_j,1/m)\big)} - 1 \right|, $$
and this quantity vanishes for $m$ large enough, since $\min(\gamma, 1/m) = 1/m$ as soon as $1/m \leq \gamma$.
For a bandwidth $h_n^x$, that is, a sequence of positive numbers, one may also define
$$ \delta_m(x,y) = \frac{1}{h_n^x}\, K_n\big(d(x,y)\big), $$
where $K_n(\cdot)$ is a sequence of functions fulfilling (C.1)-(C.4).
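A toy numerical version of this naive functional kernel estimator (for $k = 1$), with curves discretized on a grid and the supremum norm computed from grid values; the slope-indexed curves and all tuning choices are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 50)

# toy functional sample: curves X_i(t) = a_i * t, responses Y_i = a_i + noise
a = rng.uniform(0, 2, 200)
X = a[:, None] * grid[None, :]
Y = a + rng.normal(0, 0.05, 200)

def sup_dist(u, v):
    """Supremum-norm distance between two discretized curves."""
    return float(np.max(np.abs(u - v)))

def naive_functional_regression(X, Y, x0, radius):
    """Ball-indicator (naive kernel) estimate of E[Y | X = x0]: the average of
    Y_i over curves within sup-distance `radius` of the target curve x0."""
    mask = np.array([sup_dist(Xi, x0) <= radius for Xi in X])
    if not mask.any():            # empty ball: fall back to the global mean
        return float(Y.mean())
    return float(Y[mask].mean())

x0 = grid.copy()                  # target curve t -> t, so the true value is 1
est = naive_functional_regression(X, Y, x0, radius=0.1)
```

Here the sup-distance between two curves of the family reduces to $|a_i - a_j|$, so the ball of radius 0.1 around the target curve collects the curves with slope in $[0.9, 1.1]$.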
Example 2. 
Histogram estimator
Let $\mathcal{P}_n = \{A_{n,j},\ j \in J_n\}$ be a partition of the space $\mathcal{X}$ (cf. [90]) such that
$$ |J_n| = m_n, \qquad \max_{j\in J_n} \mu(A_{n,j}) \to 0 \quad \text{and} \quad n\, \min_{j\in J_n} \mu(A_{n,j}) \to \infty \quad \text{as } n \to \infty. $$
Denote
$$ \delta_m(x,y) = \sum_{j\in J_n} \frac{1}{\mu(A_{n,j})}\, \mathbb{1}_{A_{n,j}}(x)\, \mathbb{1}_{A_{n,j}}(y). $$
We can now construct the histogram and regressogram estimators in the conditional U-statistics framework by taking $\delta_m(\mathbf{x},\mathbf{y}) = \prod_{i=1}^k \delta_m(x_i,y_i)$.
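In the familiar special case $\mathcal{X} = [0,1)$ with $\mu$ the Lebesgue measure and the equal-width partition $A_{n,j} = [j/m, (j+1)/m)$ (our choice, for illustration), this histogram delta sequence and its normalization $\int \delta_m(x,y)\,\mu(dy) = 1$ can be checked directly:

```python
import numpy as np

def hist_delta(x, y, m):
    """Histogram delta sequence on [0, 1): sum_j (1/mu(A_j)) 1_{A_j}(x) 1_{A_j}(y)
    for the equal-width partition A_j = [j/m, (j+1)/m); here 1/mu(A_j) = m."""
    return m * (int(np.floor(x * m)) == int(np.floor(y * m)))

m, x = 8, 0.3
ys = np.arange(8000) / 8000.0             # uniform grid on [0, 1)
# Riemann-sum check that  integral delta_m(x, y) dy = 1  for every x
integral = float(np.mean([hist_delta(x, y, m) for y in ys]))
```

The weight is constant ($= m$) on the cell containing $x$ and zero elsewhere, which is exactly the regressogram weighting scheme.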
Example 3. 
Orthogonal series estimator
Let $\{e_p\}_{p\geq 1}$ be a complete orthonormal system of the space $\mathcal{X}$ (comprising, say, the eigenfunctions of a compact operator on the space $L^2(\mathcal{X})$ of square-integrable functions). Define
$$ \delta_m(x,y) = \sum_{p=1}^m e_p(x)\, e_p(y) \quad \text{for } x,y \in \mathcal{X}. $$
As stated in [91], the $\delta_m(\cdot,\cdot)$ in this case are delta sequences. Now, using Proposition 1, we observe that
$$ \delta_m(\mathbf{x},\mathbf{y}) = \prod_{i=1}^k \delta_m(x_i,y_i) $$
is also a delta sequence.
For more examples of delta sequences, we refer to [69,92].
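As a sanity check in the scalar case $\mathcal{X} = [0,1]$ with the cosine orthonormal basis $e_1 = 1$, $e_p(t) = \sqrt{2}\cos((p-1)\pi t)$ of $L^2[0,1]$ (an illustrative choice of basis, not one prescribed by the paper), the partial sums $\delta_m(x,\cdot)$ act as an approximate identity on smooth functions:

```python
import numpy as np

def e(p, t):
    """Orthonormal cosine basis of L^2[0,1]: e_1 = 1, e_p(t) = sqrt(2)*cos((p-1)*pi*t)."""
    t = np.asarray(t, dtype=float)
    return np.ones_like(t) if p == 1 else np.sqrt(2.0) * np.cos((p - 1) * np.pi * t)

def delta_series(m, x, t):
    """Orthogonal-series delta sequence: delta_m(x, t) = sum_{p=1}^m e_p(x) e_p(t)."""
    return sum(e(p, x) * e(p, t) for p in range(1, m + 1))

t = (np.arange(20000) + 0.5) / 20000.0   # midpoint quadrature grid on [0, 1]
x = 0.4
# the smoothed value  integral delta_m(x, t) t^2 dt  approaches x^2 = 0.16 as m grows
approx = float(np.mean(delta_series(40, x, t) * t ** 2))
```

Unlike the ball-indicator and histogram examples, this delta sequence need not be non-negative, which is why the positivity statement of Proposition 1 applies only to non-negative factors.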

2.2. Conditions and Comments

In order to study the consistency of the proposed estimator, let us first state the following conditions:
(C.5) 
We assume that $d_n = n^\zeta$ for some $\zeta > 0$ and
$$ \epsilon^{\beta_1}\, s_m^{\beta_2} < \sqrt{\frac{s_m \log(m)}{m}}. $$
(C.6) 
Suppose that $m \to \infty$ and that there exists $0 < \tau < 1$ such that $n^\tau \leq m_n$ for large $n$.
(C.7) 
We assume the following usual boundedness condition:
$$ \sup_{\mathbf{y}\in\mathcal{Y}^k} |\varphi(\mathbf{y})| = M < \infty. $$
(C.7’) 
The function φ is unbounded and fulfils, for some $q > 2$:
$$ \mu_q := \sup_{\mathbf{t}\in S_{\mathcal{X}}^k} \mathbb{E}\big[\,|\varphi(\mathbf{Y})|^q \,\big|\, \mathbf{X} = \mathbf{t}\,\big] < \infty. $$
(C.8) 
For every $\gamma > 0$:
$$ \sup_{\mathbf{x}\in S_{\mathcal{X}}^k} \left| \int_{B(\mathbf{x},\gamma)} \delta_m(\mathbf{x},\mathbf{y})\,\mu(d\mathbf{y}) - 1 \right| = O(D_m), $$
where
$$ D_m = \sup\big\{ d(\mathbf{x},\mathbf{y}) :\ \mathbf{x}\in S_{\mathcal{X}}^k,\ \mathbf{y}\in\mathcal{X}^k\ \text{such that}\ \delta_m(\mathbf{x},\mathbf{y}) > 0 \big\} = o(1) \quad \text{as } m\to+\infty. $$
(C.9) 
The regression operator $r^{(k)}(\varphi,\cdot)$ is Lipschitz in the following sense: there exists $C_3 > 0$ such that, for any $\mathbf{x}_1 \in S_{\mathcal{X}}^k$ and $\mathbf{x}_2 \in \mathcal{X}^k$, we have
$$ \big| r^{(k)}(\varphi,\mathbf{x}_1) - r^{(k)}(\varphi,\mathbf{x}_2) \big| \leq C_3\, d(\mathbf{x}_1,\mathbf{x}_2). $$

2.3. Comments on the Assumptions

Similar to conditions (C.1)-(C.4), assumption (C.5) is also modelled after kernel-type conditions, and it allows us to select $\beta_1$ and $\beta_2$ in condition (C.3). Due to the infinite-dimensional nature of the problem, additional constraints are required to achieve uniform consistency over the pseudo-compact set. Ref. [89] discussed assumption (7); this condition holds trivially for any finite-dimensional Euclidean space and remains valid for projection-based infinite-dimensional metric spaces. Condition (C.7), concerning the boundedness of the function $\varphi(\cdot)$, is essential to establish exponential bounds; coupled with the technical condition (C.6), it allows us to obtain almost-complete convergence later in the proofs. Note that we can replace condition (C.7) with a more general one, namely condition (C.7'), to obtain the results when the function $\varphi(\cdot)$ is unbounded. Finally, to establish precise rates of almost-complete convergence in the functional context, additional conditions related to the topological nature of the problem are required, mainly assumptions (C.8) and (C.9); the latter condition, concerning the Lipschitz property of the operator $r^{(k)}(\cdot)$, is standard in the study of uniform consistency.
Remark 2. 
Note that condition (C.7) can be replaced by broader hypotheses on certain moments of $\varphi(\mathbf{Y})$, as shown in [93]. That is:
(C.7’) 
We denote by $\{M(x) : x \geq 0\}$ a nonnegative continuous function, increasing on $[0,\infty)$, such that, for some $s > 2$, ultimately as $x \to \infty$:
$$ (i)\ x^{-s} M(x) \searrow; \qquad (ii)\ x^{-1} M(x) \nearrow. $$
For each $t \geq M(0)$, we define $M^{\mathrm{inv}}(t) \geq 0$ by $M(M^{\mathrm{inv}}(t)) = t$. We assume further that:
$$ \mathbb{E}\big[ M\big( |\varphi(\mathbf{Y})| \big) \big] < \infty. $$
The following choices of M ( · ) are of particular interest:
(i)
M ( x ) = x p for some p > 2 ;
(ii)
M ( x ) = exp ( s x ) for some s > 0 .
The boundedness assumption on φ(·) can be substituted by the finite moment assumption (C.7'), but doing so adds a significant amount of additional complexity to the proofs; see [42,45,62,94,95] for more details.

3. Some Asymptotic Results

In this section, we discuss the uniform consistency of the functional conditional U-statistic defined by (4). First, let us introduce some basic notation:
X : = ( X 1 , , X k ) X k , Y : = ( Y 1 , , Y k ) Y k , X i : = ( X i 1 , , X i k ) , Y i : = ( Y i 1 , , Y i k ) , G φ , x ( X , Y ) : = φ ( Y ) δ m ( x , X ) f o r x S X k , u n ( φ , x , m n ) = u n ( k ) ( G φ , x ) : = ( n k ) ! n ! i I ( k , n ) G φ , x ( X i , Y i ) .
It is clear that, for all x X k :
r ^ n ( k ) ( φ , x ; m n ) = u n ( φ , x , m n ) u n ( 1 , x , m n ) ,
and u_n(φ, x, m_n) is a classical U-statistic with the U-kernel G_{φ,x,m_n}(·, ·). Therefore, to establish the uniform consistency of r̂_n^(k)(φ, x; m_n) to r^(k)(φ, x), we need to study the uniform consistency of u_n(φ, x, m_n) to E u_n(φ, x, m_n). We will consider a suitable centering parameter, different from the expectation E r̂_n^(k)(φ, x; m_n); hence, we define:
E ^ r ^ n ( k ) ( φ , x ; m n ) = E u n ( φ , x , m n ) E u n ( 1 , x , m n ) .
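For intuition, the ratio form of the estimator can be sketched numerically. The sketch below is our own illustration (hypothetical helper names), using scalar covariates, k = 2, and a Gaussian kernel as a stand-in for the delta-sequence δ_{m_n}; the bandwidth-like parameter h plays the role of 1/m_n.

```python
import itertools
import math

def delta(u, v, h=0.5):
    # Gaussian kernel playing the role of the delta-sequence delta_{m_n}
    return math.exp(-((u - v) ** 2) / (2 * h * h))

def u_stat(phi, x, X, Y, h=0.5):
    # u_n(phi, x, m_n): weighted average of phi over all ordered k-tuples
    # of distinct indices i in I(k, n), with product delta weights.
    k, n = len(x), len(X)
    total = 0.0
    for idx in itertools.permutations(range(n), k):
        w = 1.0
        for j, i in enumerate(idx):
            w *= delta(x[j], X[i], h)
        total += phi(*(Y[i] for i in idx)) * w
    return total * math.factorial(n - k) / math.factorial(n)

def r_hat(phi, x, X, Y, h=0.5):
    # Stute-type ratio estimator: u_n(phi, x, m_n) / u_n(1, x, m_n)
    return u_stat(phi, x, X, Y, h) / u_stat(lambda *y: 1.0, x, X, Y, h)
```

As a sanity check on the ratio structure: if all responses equal a constant c, the weights cancel and the estimate of E(Y_1 Y_2 | X = x) is exactly c².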
The following notation and facts will be needed in the sequel. For a kernel L of k ≥ 1 variables, we define
U_n^(k)(L) = ((n − k)!/n!) Σ_{i∈I(k,n)} L(X_{i_1}, …, X_{i_k}).
Suppose that L is a function of ℓ ≥ 1 variables, symmetric in its entries. Then, the Hoeffding projections (see [8,19]) with respect to P, for 1 ≤ k ≤ ℓ, are defined as
π_{k,ℓ} L(x_1, …, x_k) = (Δ_{x_1} − P) × ⋯ × (Δ_{x_k} − P) × P^(ℓ−k) (L),
and
π_{0,ℓ} L = E L(X_1, …, X_ℓ).
For measures Q_i on S, we denote
Q_1 × ⋯ × Q_k L = ∫_{S^k} L(x_1, …, x_k) dQ_1(x_1) ⋯ dQ_k(x_k),
and Δ_x denotes the Dirac measure at the point x ∈ X. Then, the Hoeffding decomposition gives the following:
U_n^(ℓ)(L) − E L = Σ_{k=1}^{ℓ} (ℓ choose k) U_n^(k)(π_{k,ℓ} L),
which is easy to check. For L ∈ L_2(P^ℓ), this is an orthogonal decomposition, and E(π_{k,ℓ} L(x, X_2, …, X_k)) = 0 for k ≥ 1; that is, the kernels π_{k,ℓ} L are canonical for P. Moreover, the π_{k,ℓ}, ℓ ≥ k ≥ 1, are nested projections, that is, π_{k,ℓ} π_{k′,ℓ} = π_{k,ℓ} if k ≤ k′, and
E(π_{k,ℓ} L)² ≤ E(L − E L)² ≤ E L².
For example,
π_{1,ℓ} h(x) = E(h(X_1, …, X_ℓ) | X_1 = x) − E h(X_1, …, X_ℓ).
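This first-order projection, and the Hoeffding decomposition above, can be verified exactly on a toy case. For the simple symmetric kernel h(x_1, x_2) = x_1 + x_2 one has π_{1,2}h(x) = x − E X and the degenerate part vanishes, so U_n(h) − E h = 2 U_n^(1)(π_{1,2}h) holds as an algebraic identity. The check below is our own illustration (the value μ stands for the assumed E X, e.g. of a Uniform(0,1) law).

```python
import itertools

def u_stat(kernel, sample, k):
    # U_n^(k): average of the kernel over all ordered k-tuples of distinct indices
    n = len(sample)
    tuples = list(itertools.permutations(range(n), k))
    return sum(kernel(*(sample[i] for i in t)) for t in tuples) / len(tuples)

mu = 0.5                               # assumed E X (e.g. X ~ Uniform(0, 1))
h = lambda x1, x2: x1 + x2             # symmetric kernel of order 2
pi1 = lambda x: x - mu                 # pi_{1,2} h(x) = E(h | X_1 = x) - E h
Eh = 2 * mu                            # E h(X_1, X_2) = 2 E X
```

For this kernel the identity holds for any sample and any choice of μ, since both sides reduce to 2(mean(X) − μ).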
Remark 3. 
The function G_{φ,x,m_n} is not necessarily symmetric; when symmetrization is needed, we use:
G ¯ φ , x ( x , y ) : = 1 k ! σ I k k G φ , x , m n ( x σ , y σ ) = 1 k ! σ I k k φ ( y σ ) δ m n ( x σ , y σ ) ,
where x σ = ( x σ 1 , , x σ k ) and y σ = ( y σ 1 , , y σ k ) . After symmetrization, the expectation
E G ¯ φ , x , m n ( x , y ) = E G φ , x , m n ( x , y ) ,
and the U-statistic
u n ( k ) ( G φ , x , m n ) = u n ( k ) ( G ¯ φ , x , m n ) : = u n ( φ , x , m n )
do not change.
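The invariance noted in Remark 3 is easy to check numerically: averaging an asymmetric kernel over all ordered pairs of distinct indices gives the same U-statistic as averaging its symmetrized version. A small self-contained check of our own:

```python
import itertools

def u_stat(kernel, sample, k=2):
    # average over all ordered k-tuples of distinct indices (the set I(k, n))
    n = len(sample)
    tuples = list(itertools.permutations(range(n), k))
    return sum(kernel(*(sample[i] for i in t)) for t in tuples) / len(tuples)

g = lambda x, y: x * y ** 2                        # an asymmetric kernel
g_sym = lambda x, y: 0.5 * (g(x, y) + g(y, x))     # its symmetrization
```

Summing g over all ordered pairs counts each unordered pair in both orders, which is exactly what the symmetrized kernel averages; hence the two U-statistics coincide.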

3.1. Uniform Consistency of Functional Conditional U-Statistics

Let (z_n), n ∈ N, be a sequence of real random variables. We say that z_n converges almost-completely (a.co.) toward zero if, and only if, for all ϵ > 0,
Σ_{n=1}^{∞} P(|z_n| > ϵ) < ∞.
Moreover, we say that the rate of the almost-complete convergence of z_n toward zero is of order u_n (with u_n → 0), and we write z_n = O_{a.co.}(u_n) if, and only if, there exists ϵ > 0 such that
Σ_{n=1}^{∞} P(|z_n| > ϵ u_n) < ∞.
This kind of convergence implies both almost-sure convergence and convergence in probability. The following result concerns the uniform deviation of the estimate u_n(φ, x, m_n) with respect to E(u_n(φ, x, m_n)) when the function φ is bounded.
Theorem 1. 
Under the conditions (C.1)–(C.4), and if conditions (C.5), (C.7) are satisfied, then we have:
sup_{x∈S_X^k} |u_n(φ, x, m_n) − E u_n(φ, x, m_n)| = O_{a.co.}(√(s_m log(m)/m)).
We present a more general result concerning the case when the function φ is unbounded in the sense of the condition (C.7’). That being said, the preceding theorem constitutes an important step in the truncation method used in the proof of the following theorem.
Theorem 2. 
Under the conditions (C.1)–(C.4), and if conditions (C.5), (C.6) and (C.7’) are satisfied, then we have:
sup_{x∈S_X^k} |u_n(φ, x, m_n) − E u_n(φ, x, m_n)| = O_{a.co.}(√(s_m log(m)/m)).
The following result handles the uniform deviation of the estimator r̂_n^(k)(φ, x; m_n) with respect to Ê r̂_n^(k)(φ, x; m_n), whether the function φ is bounded or unbounded.
Theorem 3. 
Under the conditions (C.1)–(C.4), and if conditions (C.5), (C.6) and condition (C.7) (or (C.7')) are satisfied, then we have:
sup_{x∈S_X^k} |r̂_n^(k)(φ, x; m_n) − Ê r̂_n^(k)(φ, x; m_n)| = O_{a.co.}(√(s_m log(m)/m)),
where (s_m)_{m∈N*} is a sequence of positive real numbers such that m/(s_m log(m)) → ∞ as n → ∞.
Theorem 4. 
Under the conditions (C.1)–(C.4) and (C.9), we have:
sup_{x∈S_X^k} |Ê r̂_n^(k)(φ, x; m_n) − r^(k)(φ, x)| → 0.
The following corollary is more or less straightforward, given Theorems 3 and 4.
Corollary 1. 
Under the conditions of Theorems 3 and 4 it follows that, as m tends to infinity:
sup_{x∈S_X^k} |r̂_n^(k)(φ, x; m_n) − r^(k)(φ, x)| → 0, a.co.,
where (s_m)_{m∈N*} is a sequence of positive real numbers such that m/(s_m log(m)) → ∞ as n → ∞.

3.2. Uniform Strong Consistency with Rates

This section is devoted to the uniform version with the rate of Theorem 1’s result. More specifically, our objective is to obtain the uniform almost-complete convergence of r ^ n ( k ) ( · ) on some subset S X k of X k satisfying condition (7). In the following theorem, we establish the bias order.
Theorem 5. 
Under the conditions (C.1)–(C.4), and if conditions (C.8) and (C.9) are satisfied, then we have:
sup_{x∈S_X^k} |Ê r̂_n^(k)(φ, x; m_n) − r^(k)(φ, x)| = O(D_m).
The corollary that follows then gives the almost-complete convergence of r̂_n^(k)(·), together with its rate.
Corollary 2. 
Under the conditions of Theorems 3 and 5 it follows that:
sup_{x∈S_X^k} |r̂_n^(k)(φ, x; m_n) − r^(k)(φ, x)| = O(D_m) + O_{a.co.}(√(s_m log(m)/m)).

4. Conditional U -Statistics for Censored Data

Consider a triple (Y, C, X) of random variables defined in R × R × X. Here, Y is the variable of interest, C is a censoring variable, and X is a concomitant variable. Throughout, we use the notation of [96] and work with a sample {(Y_i, C_i, X_i), 1 ≤ i ≤ n} of independent and identically distributed replications of (Y, C, X), n ≥ 1. In the right censorship model, the pairs (Y_i, C_i), 1 ≤ i ≤ n, are not directly observed, and the corresponding information is given by Z_i := min{Y_i, C_i} and Δ_i := 1{Y_i ≤ C_i}, 1 ≤ i ≤ n. Accordingly, the observed sample is
D n = { ( Z i , Δ i , X i ) , i = 1 , , n } .
This type of censoring is commonly applied to the survival data collected during clinical trials as well as the failure time data collected during reliability studies, for example. To be more specific, the majority of statistical experiments end up producing incomplete samples, even when the conditions are carefully monitored. For instance, clinical data for surviving the majority of diseases are typically censored due to the presence of other competing risks to life that ultimately result in death. In the sequel, we impose the following assumptions upon the distribution of ( X , Y ) . Denote by I a given compact set in X with nonempty interior and set, for any α > 0 ,
I_α = {x : inf_{u∈I} d(x, u) ≤ α}.
For −∞ < t < ∞, set
F_Y(t) = P(Y ≤ t), G(t) = P(C ≤ t), and H(t) = P(Z ≤ t),
the right-continuous distribution functions of Y, C and Z, respectively. For any right-continuous distribution function L defined on R , denote by
T L = sup { t R : L ( t ) < 1 }
the upper point of the corresponding distribution. Now, consider a pointwise measurable class F of real measurable functions defined on R , and assume that F is of VC-type. We recall the regression function of ψ ( Y ) evaluated at X = x , for ψ F and x I α , given by
r ( 1 ) ( ψ , x ) = E ( ψ ( Y ) X = x ) ,
when Y is right-censored. To estimate r ( 1 ) ( ψ , · ) , we make use of the Inverse Probability of Censoring Weighted (I.P.C.W.) estimators that have recently gained popularity in the censored data literature (see [97,98]). The key idea of I.P.C.W. estimators is as follows. Introduce the real-valued function Φ ψ ( · , · ) defined on R 2 by
Φ_ψ(y, c) = 1{y ≤ c} ψ(y ∧ c) / (1 − G(y ∧ c)).
Assuming the function G ( · ) to be known, first note that
Φ ψ ( Y i , C i ) = Δ i ψ ( Z i ) / ( 1 G ( Z i ) )
is observed for every 1 i n . In addition, under Assumption ( I ) below
( I )
C and ( Y , X ) are independent.
We have
r^(1)(Φ_ψ, x) := E(Φ_ψ(Y, C) | X = x) = E( 1{Y ≤ C} ψ(Z) / (1 − G(Z)) | X = x ) = E( (ψ(Y)/(1 − G(Y))) E(1{Y ≤ C} | X, Y) | X = x ) = r^(1)(ψ, x).
Therefore, every estimate of r ( 1 ) ( Φ ψ , · ) that can be constructed using completely observed data is also an estimate of r ( 1 ) ( ψ , · ) . This characteristic permits the natural application of the majority of statistical procedures known to produce estimates of the regression function in the uncensored case to the censored case. Estimates of the kernel type, for instance, are exceptionally straightforward to build. Set, for x I , h 0 , 1 i n ,
ω ¯ n , K , h , i ( 1 ) ( x ) : = δ m n x , X i / j = 1 n δ m n x , X j .
Making use of the Equations (17)–(19), whenever G ( · ) is known, we define the kernel estimator of r ( 1 ) ( ψ , · ) by
r ˘ n ( 1 ) ( ψ , x ; h n ) = i = 1 n ω ¯ n , K , h , i ( 1 ) ( x ) Δ i ψ ( Z i ) 1 G ( Z i ) .
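When G(·) is known, the estimator above is a plain delta-weighted average of inverse-probability-weighted observed responses. A minimal sketch of our own (hypothetical helper names; scalar covariates, ψ(y) = y, and a Gaussian kernel standing in for δ_{m_n}):

```python
import math

def delta(u, v, h=0.5):
    # Gaussian kernel as a stand-in for the delta-sequence delta_{m_n}
    return math.exp(-((u - v) ** 2) / (2 * h * h))

def ipcw_estimate(x, X, Z, D, G, h=0.5):
    # r-breve_n^(1)(psi, x) with psi(y) = y and a KNOWN censoring d.f. G:
    # weighted average of D_i * Z_i / (1 - G(Z_i)) with delta weights.
    weights = [delta(x, xi, h) for xi in X]
    denom = sum(weights)
    num = sum(w * d * z / (1.0 - G(z)) for w, d, z in zip(weights, D, Z))
    return num / denom
```

In the absence of censoring (all Δ_i = 1, G ≡ 0) the IPCW correction disappears and the estimator reduces to the usual delta-weighted regression estimate.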
Since the function G(·) is unknown, it has to be estimated. Let G_n*(·) denote the Kaplan-Meier estimator of the function G(·) [99]. To be precise, adopting the conventions ∏_∅ = 1 and 0/0 = 1 and setting
N_n(u) = Σ_{i=1}^{n} 1{Z_i ≥ u},
we have
G_n*(u) = 1 − ∏_{i: Z_i ≤ u} ((N_n(Z_i) − 1)/N_n(Z_i))^(1−Δ_i), for u ∈ R.
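This product-limit formula transcribes directly into code. The sketch below is our own illustration and assumes distinct Z_i, so no tie convention is needed; only censored observations (Δ_i = 0) contribute a factor.

```python
def km_censoring(u, Z, Delta):
    # G_n*(u) = 1 - prod_{i: Z_i <= u} ((N_n(Z_i) - 1)/N_n(Z_i))^(1 - Delta_i),
    # with N_n(t) = #{j : Z_j >= t}, the number at risk at t.
    prod = 1.0
    for zi, di in zip(Z, Delta):
        if zi <= u and di == 0:                       # only censored points count
            at_risk = sum(1 for zj in Z if zj >= zi)  # N_n(Z_i)
            prod *= (at_risk - 1) / at_risk
    return 1.0 - prod
```

For Z = (1, 2, 3, 4) with censoring indicators Δ = (1, 0, 1, 0), the single censored point below u = 2.5 has 3 subjects at risk, so G_n*(2.5) = 1 − 2/3 = 1/3; with no censoring at all, G_n* ≡ 0.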
Given this notation, we will examine the next estimate of r ( 1 ) ( ψ , · )
r ˘ n ( 1 ) * ( ψ , x ; h n ) = i = 1 n ω ¯ n , K , h , i ( 1 ) ( x ) Δ i ψ ( Z i ) 1 G n * ( Z i ) ,
the reader is invited to see the papers [96,97]. The convention 0/0 = 0 is used; this quantity is well defined, since G_n*(Z_i) = 1 if and only if Z_i = Z_(n) and Δ_(n) = 0, where Z_(k) is the k-th order statistic associated with the sample (Z_1, …, Z_n) for k = 1, …, n, and Δ_(k) is the Δ_j corresponding to Z_(k) = Z_j. When the variable of interest is right-censored, it is often impossible to estimate the (conditional) law on the whole support (see [100]). Ref. [101] introduced a right-censored version of an unconditional U-statistic with a kernel of degree m ≥ 1, based on the notion of a mean-preserving reweighting technique. Ref. [102] demonstrated the almost sure convergence of multi-sample U-statistics under random censorship and presented an application by analyzing the consistency of a novel class of tests meant to evaluate distribution equality. Ref. [103] presented improvements to the traditional U-statistics to counteract potential biases caused by right-censoring of the outcomes and the existence of confounding factors. Ref. [104] suggested an alternative method for estimating the U-statistic by employing a substitution estimator of the conditional kernel given observed data. We also refer to [44,45,105]. To the best of our knowledge, estimating conditional U-statistics in the censored data setting is an open problem, and this gives the main motivation for the study in this section.
The function described by (17) has a natural expansion given by
Φ_ψ(y_1, …, y_k, c_1, …, c_k) = ∏_{i=1}^{k} 1{y_i ≤ c_i} ψ(y_1 ∧ c_1, …, y_k ∧ c_k) / ∏_{i=1}^{k} (1 − G(y_i ∧ c_i)).
We have an analogous relationship to (18) based on the formula:
E(Φ_ψ(Y_1, …, Y_k, C_1, …, C_k) | (X_1, …, X_k) = x)
= E( ∏_{i=1}^{k} 1{Y_i ≤ C_i} ψ(Y_1 ∧ C_1, …, Y_k ∧ C_k) / ∏_{i=1}^{k} (1 − G(Y_i ∧ C_i)) | (X_1, …, X_k) = x )
= E( (ψ(Y_1, …, Y_k)/∏_{i=1}^{k}(1 − G(Y_i))) E( ∏_{i=1}^{k} 1{Y_i ≤ C_i} | (Y_1, X_1), …, (Y_k, X_k) ) | (X_1, …, X_k) = x )
= E(ψ(Y_1, …, Y_k) | (X_1, …, X_k) = x).
An analogue estimator to (4) in the censored situation is given by
r̆_n^(k)(ψ, x; m_n) = Σ_{(i_1,…,i_k)∈I(k,n)} [ Δ_{i_1} ⋯ Δ_{i_k} ψ(Z_{i_1}, …, Z_{i_k}) / ((1 − G(Z_{i_1})) ⋯ (1 − G(Z_{i_k}))) ] ω̄_{n,δ,m_n,i}^(k)(x),
where, for i = ( i 1 , , i k ) I ( k , n ) ,
ω ¯ n , δ , m n , i ( k ) ( x ) = δ m n x 1 , X i 1 δ m n x k , X i k ( i 1 , , i k ) I ( k , n ) δ m n x 1 , X i 1 δ m n x k , X i k .
The estimator we shall examine is provided by
r̆_n^(k)*(ψ, x; m_n) = Σ_{(i_1,…,i_k)∈I(k,n)} [ Δ_{i_1} ⋯ Δ_{i_k} ψ(Z_{i_1}, …, Z_{i_k}) / ((1 − G_n*(Z_{i_1})) ⋯ (1 − G_n*(Z_{i_k}))) ] ω̄_{n,δ,m_n,i}^(k)(x).
In a similar way as in [44], we arrive at the following conclusion.
Corollary 3. 
Assume that the condition ( I ) and the assumptions of Theorems 3 and 5 are satisfied. Then, we have
sup_{x∈S_X^k} |r̆_n^(k)*(ψ, x; m_n) − r^(k)(ψ, x)| = O(D_m) + O_{a.co.}(√(s_m log(m)/m)).
This last result is a direct consequence of Corollary 2 and the law of the iterated logarithm for G_n*(·) obtained in [106], which gives
sup_{t≤τ} |G_n*(t) − G(t)| = O(√(log log n / n)) almost surely as n → ∞.
At this point, we may refer to [44,45,105].

5. Applications

5.1. Kendall Rank Correlation Coefficient

To test the independence of one-dimensional random variables Y^1 and Y^2, Ref. [107] proposed a method based on the U-statistic K_n with the kernel function:
φ(s_1, t_1, s_2, t_2) = 1{(s_2 − s_1)(t_2 − t_1) > 0} − 1{(s_2 − s_1)(t_2 − t_1) ≤ 0}.
Its rejection region is of the form √n K_n > γ. In this example, we consider a multivariate case. To test the conditional independence of ξ and η, where Y = (ξ, η), given X, we propose a method based on the conditional U-statistic:
r ^ n ( 2 ) ( φ , t ) = i j n φ Y i , Y j δ m ( t 1 , X i ) δ m ( t 2 , X j ) i j n δ m ( t 1 , X i ) δ m ( t 2 , X j ) ,
where t = (t_1, t_2) ∈ I ⊆ R² and φ(·) is Kendall's kernel (25). Suppose that ξ and η are d_1- and d_2-dimensional random vectors, respectively, with d_1 + d_2 = d. Furthermore, supposing that Y_1, …, Y_n are observations of (ξ, η), we are interested in testing:
H_0: ξ and η are conditionally independent given X vs. H_a: H_0 is not true.
Let a = (a_1, a_2) ∈ R^d be such that ‖a‖ = 1, with a_1 ∈ R^{d_1} and a_2 ∈ R^{d_2}, and let F(·), G(·) be the distribution functions of ξ and η, respectively. Suppose F_{a_1}(·) and G_{a_2}(·) are continuous for any unit vector a = (a_1, a_2), where F_{a_1}(t) = P(a_1^⊤ ξ < t), G_{a_2}(t) = P(a_2^⊤ η < t), and a_i^⊤ denotes the transpose of the vector a_i, 1 ≤ i ≤ 2. For n = 2, let Y^(1) = (ξ^(1), η^(1)) and Y^(2) = (ξ^(2), η^(2)) be such that ξ^(i) ∈ R^{d_1} and η^(i) ∈ R^{d_2} for i = 1, 2, and:
φ a Y ( 1 ) , Y ( 2 ) = φ a 1 ξ ( 1 ) , a 2 η ( 1 ) , a 1 ξ ( 2 ) , a 2 η ( 2 ) .
An application of Corollary 2 gives
sup_{x∈S_X^2} |r̂_n^(2)(φ_a, x; m_n) − r^(2)(φ_a, x)| = O(D_m) + O_{a.co.}(√(s_m log(m)/m)).
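A toy implementation of the conditional Kendall statistic for one-dimensional ξ and η may help fix ideas (our own sketch; a Gaussian kernel replaces δ_m). When η is a strictly increasing transform of ξ, every pair is concordant and the statistic equals 1 exactly.

```python
import itertools
import math

def delta(u, v, h=0.5):
    # Gaussian kernel standing in for the delta-sequence delta_m
    return math.exp(-((u - v) ** 2) / (2 * h * h))

def kendall_kernel(y1, y2):
    # phi(s1, t1, s2, t2) = 1{(s2-s1)(t2-t1) > 0} - 1{(s2-s1)(t2-t1) <= 0}
    (s1, t1), (s2, t2) = y1, y2
    return 1.0 if (s2 - s1) * (t2 - t1) > 0 else -1.0

def cond_kendall(t1, t2, X, Y, h=0.5):
    # r-hat_n^(2)(phi, t): delta-weighted average of Kendall's kernel over pairs
    num = den = 0.0
    for i, j in itertools.permutations(range(len(X)), 2):
        w = delta(t1, X[i], h) * delta(t2, X[j], h)
        num += kendall_kernel(Y[i], Y[j]) * w
        den += w
    return num / den
```

By construction the statistic lies in [−1, 1], with the two endpoints reached for perfectly monotone and perfectly anti-monotone dependence, respectively.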

5.2. Discrimination Problems

Now, we apply these findings to the discrimination problem outlined in Section 3 of [108]; see also [109]. We employ a similar setup and notation. Let φ(·) be any function taking at most finitely many values, say 1, …, M. The sets
A j = ( y 1 , , y k ) : φ ( y 1 , , y k ) = j , 1 j M
then produce a partition of the feature space. Predicting the value of φ(Y_1, …, Y_k) is equivalent to guessing which set of the partition (Y_1, …, Y_k) belongs to. For any discrimination rule g, we have
P(g(X) = φ(Y)) = Σ_{j=1}^{M} ∫_{{x̃ : g(x̃) = j}} M_j(x̃) dP(x̃) ≤ ∫ max_{1≤j≤M} M_j(x̃) dP(x̃),
where
M_j(x̃) = P(φ(Y) = j | X = x̃), x̃ ∈ R^d.
The inequality described above becomes equality if
g 0 ( x ˜ ) = arg max 1 j M M j ( x ˜ ) .
g 0 ( · ) is known as the Bayes rule, and the associated error probability
L* = 1 − P(g_0(X) = φ(Y)) = 1 − E( max_{1≤j≤M} M_j(X) )
is called the Bayes risk. Each of the unknown M j functions can be reliably estimated using one of the techniques described in the prior sections. Let, for 1 j M ,
M n j ( x ˜ ) = ( i 1 , , i k ) I ( k , n ) 𝟙 { φ ( Y i 1 , , Y i k ) = j } δ m n x 1 , X i 1 δ m n x k , X i k ( i 1 , , i k ) I ( k , n ) δ m n x 1 , X i 1 δ m n x k , X i k ,
Set
g 0 , n ( x ˜ ) = arg max 1 j M M n j ( x ˜ ) .
Let us define
L n * = P ( g 0 , n ( X ) φ ( Y ) ) .
The discrimination rule g_{0,n}(·) is asymptotically Bayes-risk consistent:
L_n* → L*.
This is a consequence of the relation
0 ≤ L_n* − L* ≤ 2 E( max_{1≤j≤M} |M_{nj}(X) − M_j(X)| ).
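For k = 1, the plug-in rule reduces to a delta-weighted vote among the class labels. The sketch below is our own illustration, with scalar covariates and a Gaussian kernel standing in for the delta-sequence:

```python
import math

def delta(u, v, h=0.3):
    # Gaussian kernel playing the role of the delta-sequence
    return math.exp(-((u - v) ** 2) / (2 * h * h))

def m_hat(j, x, X, labels, h=0.3):
    # M_{n j}(x): delta-weighted relative frequency of label j near x
    w = [delta(x, xi, h) for xi in X]
    return sum(wi for wi, lab in zip(w, labels) if lab == j) / sum(w)

def g0n(x, X, labels, h=0.3):
    # plug-in rule: argmax over labels of the estimated posteriors M_{n j}
    classes = sorted(set(labels))
    return max(classes, key=lambda j: m_hat(j, x, X, labels, h))
```

With two well-separated clusters, the rule recovers the obvious classification near each cluster center.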

5.3. Metric Learning

Metric learning seeks to adapt the metric to the data and has garnered a great deal of attention in recent years; see, for instance, [110,111] for an account of metric learning and its applications. It is driven by a wide variety of applications, ranging from computer vision to information retrieval and bioinformatics. To demonstrate the applicability of our results, we now discuss the metric learning problem for supervised classification as presented in [111]. Let us consider independent copies (X_1, Y_1), …, (X_n, Y_n) of a X × Y-valued random couple (X, Y), where X is some feature space and Y = {1, …, C}, with C ≥ 2 say, a finite set of labels. Let D be a set of distance measures D: X × X → R_+. The purpose of metric learning in this context is, intuitively speaking, to identify a metric under which pairs of points with the same label are close to each other, while those with different labels are far away from each other. A natural way to characterize the risk associated with a metric D is as follows:
R(D) = E[ ϕ( (1 − D(X, X′)) · (2·1{Y = Y′} − 1) ) ],
where ϕ(u) is a convex loss function upper bounding the indicator function 1{u ≤ 0}, for instance, the hinge loss ϕ(u) = max(0, 1 − u). To estimate R(D), we consider the usual empirical estimator
R_n(D) = (2/(n(n−1))) Σ_{1≤i<j≤n} ϕ( (1 − D(X_i, X_j)) · (2·1{Y_i = Y_j} − 1) ),
which is a one-sample U-statistic of degree two with kernel given by:
φ_D((x, y), (x′, y′)) = ϕ( (1 − D(x, x′)) · (2·1{y = y′} − 1) ).
The convergence to (29) of a minimizer of (30) has been studied in the frameworks of algorithmic stability ([112]), algorithmic robustness ([113]) and based on the theory of U-processes under appropriate regularization ([114]).
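To make the pairwise risk concrete, here is a sketch of R_n(D) for the Euclidean distance and the hinge loss, under the sign convention ỹ = 2·1{y = y′} − 1 assumed above (our reading of the display; the helper names are ours):

```python
import math

def hinge(u):
    # convex surrogate upper bounding 1{u <= 0}
    return max(0.0, 1.0 - u)

def empirical_risk(X, Y, dist):
    # R_n(D) = (2/(n(n-1))) sum_{i<j} phi((1 - D(x_i, x_j)) * (2*1{y_i=y_j} - 1))
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            ytilde = 1.0 if Y[i] == Y[j] else -1.0
            total += hinge((1.0 - dist(X[i], X[j])) * ytilde)
    return 2.0 * total / (n * (n - 1))

euclid = lambda a, b: math.dist(a, b)   # Euclidean distance on R^d
```

A same-label pair at distance 0 contributes hinge(1) = 0, while a same-label pair at distance 3 contributes hinge(−2) = 3, so the risk penalizes same-label points that the metric places far apart, as intended.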

5.4. Time Series Prediction from Continuous Set of Past Values

Let {Z(t), t ∈ R} denote a real-valued process, and let s denote a fixed positive real number. In this model, we suppose that the process is observed from t = 0 until t = t_max, and we assume, without loss of generality, that t_max = nT + s. The method consists of splitting the observed process into n fixed-length segments. Let us denote each piece of the process by
X_i = {Z(t), (i − 1)T ≤ t < iT}.
The response value is therefore Y_i = Z(iT + s), and the task can be formulated as a regression problem:
r^(k)(φ, (Z_1(t), …, Z_k(t))) = E[ φ(Z_1(τ + s), …, Z_k(τ + s)) | (Z_1(t), …, Z_k(t)) ], for τ − T ≤ t < τ,
provided that we assume that such a function r does not depend on i (which is satisfied if the process is stationary, for example). Thus, once we reach time τ, the following predictor, derived directly from our estimator, can be used to predict the value at time τ + s:
r ^ n ( k ) ( φ , z ; m n ) = ( i 1 , , i k ) I ( k , n ) φ ( Z i 1 ( τ + s ) , , Z i k ( τ + s ) ) δ m n z 1 , X i 1 δ m n z k , X i k ( i 1 , , i k ) I ( k , n ) δ m n z 1 , X i 1 δ m n z k , X i k
where z = (z_1, …, z_k) = {(Z_1(t), …, Z_k(t)), for τ − T ≤ t < τ}. Corollary 2 provides mathematical support for this nonparametric functional predictor and extends the previous results of [48,78] in numerous ways. Notice that this modelization encompasses a wide variety of practical applications, as the procedure allows for the consideration of a large number of past process values without being affected by the curse of dimensionality.
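For k = 1 the predictor is a delta-weighted average of the responses Y_i = Z(iT + s) over past segments. A deterministic toy check of our own (discretized segments, a Gaussian kernel on an L2-type distance): for a T-periodic process all segments coincide, so the prediction reproduces the true future value exactly.

```python
import math

T, s, n_seg, grid = 1.0, 0.25, 5, 20

def Z(t):
    # a T-periodic "observed" process
    return math.sin(2 * math.pi * t / T)

def segment(i):
    # discretized piece X_i = {Z(t), (i-1)T <= t < iT}
    return [Z((i - 1) * T + T * j / grid) for j in range(grid)]

def delta(u, v, h=0.1):
    # kernel on segments via a mean squared L2 distance
    d2 = sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
    return math.exp(-d2 / (2 * h * h))

def predict(z, h=0.1):
    # delta-weighted average of future values over the n_seg observed segments
    Xs = [segment(i) for i in range(1, n_seg + 1)]
    Ys = [Z(i * T + s) for i in range(1, n_seg + 1)]
    w = [delta(z, x, h) for x in Xs]
    return sum(wi * yi for wi, yi in zip(w, Ys)) / sum(w)
```

Here the whole past segment enters the prediction through the kernel on curves, which is precisely what avoids discretizing the past into a high-dimensional vector of lagged values.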

5.5. Example of U-Kernels

Example 4 (Hoeffding's D). From the symmetric kernel
h_D(z_1, …, z_5) := (1/16) Σ_{(i_1,…,i_5)∈P_5} [1(z_{i_1,1} ≤ z_{i_5,1}) − 1(z_{i_2,1} ≤ z_{i_5,1})][1(z_{i_3,1} ≤ z_{i_5,1}) − 1(z_{i_4,1} ≤ z_{i_5,1})] × [1(z_{i_1,2} ≤ z_{i_5,2}) − 1(z_{i_2,2} ≤ z_{i_5,2})][1(z_{i_3,2} ≤ z_{i_5,2}) − 1(z_{i_4,2} ≤ z_{i_5,2})],
we obtain Hoeffding's D statistic, which is a rank-based U-statistic of order 5.
Example 5 
(Blum-Kiefer-Rosenblatt’s R). The symmetric kernel
h R z 1 , , z 6 : = 1 32 i 1 , , i 6 P 6 1 z i 1 , 1 z i 5 , 1 1 z i 2 , 1 z i 5 , 1 1 z i 3 , 1 z i 5 , 1 1 z i 4 , 1 z i 5 , 1 × 1 z i 1 , 2 z i 6 , 2 1 z i 2 , 2 z i 6 , 2 1 z i 3 , 2 z i 6 , 2 1 z i 4 , 2 z i 6 , 2 ,
gives Blum-Kiefer-Rosenblatt’s R statistic (see [115]), which is a rank-based U-statistic of order 6, refer also to [116,117,118,119,120].
Example 6 (Bergsma-Dassios-Yanagimoto's τ*). Ref. [121] introduced a rank correlation statistic as a U-statistic of order 4 with the symmetric kernel
h_{τ*}(z_1, …, z_4) := (1/16) Σ_{(i_1,…,i_4)∈P_4} [1(z_{i_1,1}, z_{i_3,1} < z_{i_2,1}, z_{i_4,1}) + 1(z_{i_2,1}, z_{i_4,1} < z_{i_1,1}, z_{i_3,1}) − 1(z_{i_1,1}, z_{i_4,1} < z_{i_2,1}, z_{i_3,1}) − 1(z_{i_2,1}, z_{i_3,1} < z_{i_1,1}, z_{i_4,1})] × [1(z_{i_1,2}, z_{i_3,2} < z_{i_2,2}, z_{i_4,2}) + 1(z_{i_2,2}, z_{i_4,2} < z_{i_1,2}, z_{i_3,2}) − 1(z_{i_1,2}, z_{i_4,2} < z_{i_2,2}, z_{i_3,2}) − 1(z_{i_2,2}, z_{i_3,2} < z_{i_1,2}, z_{i_4,2})].
Here
1 y 1 , y 2 < y 3 , y 4 : = 1 y 1 < y 3 1 y 1 < y 4 1 y 2 < y 3 1 y 2 < y 4 .
Example 7. 
Two generic vectors y = (y_1, y_2) and z = (z_1, z_2) in R² are said to be concordant if (y_1 − z_1)(y_2 − z_2) > 0. For m, k = 1, …, p, define
τ_{mk} = (1/(n(n−1))) Σ_{1≤i≠j≤n} 1{(X_{im} − X_{jm})(X_{ik} − X_{jk}) > 0}.
Then, Kendall's tau rank correlation coefficient matrix T = (τ_{mk})_{m,k=1}^{p} is a matrix-valued U-statistic whose kernel is bounded. It is obvious that τ_{mk} quantifies the monotonic dependency between (X_{1m}, X_{1k}) and (X_{2m}, X_{2k}) and is an unbiased estimator of
P( (X_{1m} − X_{2m})(X_{1k} − X_{2k}) > 0 ),
that is, the probability that (X_{1m}, X_{1k}) and (X_{2m}, X_{2k}) are concordant.
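A direct implementation of this matrix-valued U-statistic (illustration only): for comonotone columns every entry equals 1, and reversing the monotonicity of one column drives the corresponding off-diagonal entry to 0.

```python
def kendall_tau_matrix(X):
    # X: list of n observations, each a row with p coordinates.
    # tau_{mk} = (1/(n(n-1))) * #{(i, j), i != j : pair (m, k) is concordant}
    n, p = len(X), len(X[0])
    T = [[0.0] * p for _ in range(p)]
    for m in range(p):
        for k in range(p):
            c = sum(1 for i in range(n) for j in range(n)
                    if i != j and (X[i][m] - X[j][m]) * (X[i][k] - X[j][k]) > 0)
            T[m][k] = c / (n * (n - 1))
    return T
```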
Example 8. 
The Gini mean difference. The Gini index provides another common measure of dispersion. It corresponds to the case where E ⊆ R and h(x, y) = |x − y|:
G_n = (2/(n(n−1))) Σ_{1≤i<j≤n} |X_i − X_j|.
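Computing G_n is a one-liner over all pairs; for the sample {1, 2, 3} the pairwise gaps are 1, 2, 1, so G_n = (2/(3·2))·4 = 4/3 (a toy check of our own):

```python
def gini_mean_difference(X):
    # G_n = (2/(n(n-1))) * sum_{i<j} |X_i - X_j|
    n = len(X)
    return 2.0 * sum(abs(X[i] - X[j])
                     for i in range(n) for j in range(i + 1, n)) / (n * (n - 1))
```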

6. Concluding Remarks

In this paper, the conditional U-statistics regression operator estimation methods for random elements taking values in an infinite-dimensional separable Banach space are generalized to delta-sequence techniques. The space of continuous functions on the interval (0,1) with the supremum norm is an example of such a separable Banach space. Notably, the method of delta-sequences unifies the kernel method of probability density function estimation, the histogram method, and a few other methods, including the method of orthogonal series for appropriate choices of orthonormal bases in the one-dimensional and finite-dimensional cases. We have obtained strong uniform consistency results in abstract settings under some conditions on the model. The general framework that we consider extends the existing methods to higher-order statistics; this has a significant impact from both a theoretical and a practical point of view. In a future investigation, it will be of interest to consider the limiting law of the conditional U-statistics regression estimators based on delta sequences. A natural extension of the present investigation is to consider serial-dependent settings such as mixing (see [61,62,122]) or ergodic processes (see [56,123]). In a future investigation of functional delta-sequence local linear estimators, it will be natural to consider the possibility of obtaining an alternative estimator that benefits from the advantages of both methods, the local linear method and the delta-sequence approach, since both have their own distinct advantages. Many methods have been developed and established to construct, in asymptotically optimal ways, bandwidth selection rules for nonparametric kernel estimators, particularly for the Nadaraya-Watson regression estimator; we quote several of these methods, including [44,45,124].
This parameter needs to be chosen appropriately to ensure satisfactory practical performance, both in the typical finite-dimensional situation and in the infinite-dimensional framework. On the other hand, to the best of our knowledge, no such studies have yet been conducted for generic functional conditional U-statistics. This exemplifies a potential new avenue for future research.

7. Mathematical Development

This section contains the proofs of our results. The preceding notation continues to be used in what follows. Keeping in mind the relation (7), we can conclude that, for each x = (x_1, …, x_k) ∈ S_X^k, there exists ℓ(x) = (ℓ(x_1), …, ℓ(x_k)), with 1 ≤ ℓ(x_i) ≤ d_n for 1 ≤ i ≤ k, such that
x ∈ ∏_{i=1}^{k} B(x_{ℓ(x_i)}, ε_n) and d(x_i, x_{ℓ(x_i)}) = min_{1≤ℓ≤d_n} d(x_i, x_ℓ).
We denote for each x = ( x 1 , , x k ) S X k and x ( x ) = ( x ( x 1 ) , , x ( x k ) ) :
i = 1 k B ( x ( x i ) , ε n ) = : B ( x ( x ) , ε n ) .
Hence, for each x ∈ S_X^k, we can bound the deviation of the U-statistic as
|u_n(φ, x; m_n) − E[u_n(φ, x; m_n)]| ≤ |u_n(φ, x; m_n) − u_n(φ, x_{ℓ(x)}; m_n)| + |E[u_n(φ, x_{ℓ(x)}; m_n)] − E[u_n(φ, x; m_n)]| + |u_n(φ, x_{ℓ(x)}; m_n) − E[u_n(φ, x_{ℓ(x)}; m_n)]|.
Proof of Theorem 1. 
We need to establish that there exists some η > 0 such that
Σ_{n≥1} P( sup_{x∈S_X^k} √(m/(s_m log(m))) |u_n(φ, x, m_n) − E u_n(φ, x, m_n)| ≥ η ) < ∞.
To do that, we need to obtain an exponential bound for
P( sup_{x∈S_X^k} |u_n(φ, x, m_n) − E u_n(φ, x, m_n)| > η √(s_m log(m)/m) ).
We first remark that we have
u_n(φ, x, m_n) − E u_n(φ, x, m_n)
= ((n−k)!/n!) Σ_{i∈I(k,n)} [ φ(Y_{i_1}, …, Y_{i_k}) ∏_{j=1}^{k} δ_{m_n}(x_j, X_{i_j}) − E( φ(Y_{i_1}, …, Y_{i_k}) ∏_{j=1}^{k} δ_{m_n}(x_j, X_{i_j}) ) ]
= ((n−k)!/n!) Σ_{i∈I(k,n)} [ G_{φ,x}(X_i, Y_i) − E G_{φ,x}(X_i, Y_i) ]
= ((n−k)!/n!) Σ_{i∈I(k,n)} H(X_i, Y_i),
where
H(X, Y) = G_{φ,x}(X, Y) − E G_{φ,x}(X, Y).
To obtain the desired result, we apply Lemma A1 to the function H(·, ·). Throughout the rest of the proof, we suppose the function G_{φ,x} is symmetric. Moreover, the function H(·, ·) is bounded by 2MC_1 s_m, by condition (C.2) and the fact that the function φ(·) is bounded by condition (C.7). We readily see that
θ = E [ H ( X , Y ) ] = 0
by design, and
σ² = Var(H(X, Y)) ≤ 2(MC_1 s_m)².
For any η > 0 and m large enough, we obtain that
P( |u_n(φ, x, m_n) − E u_n(φ, x, m_n)| > η √(s_m log(m)/m) ) ≤ 2 exp( − n (s_m log(m)/m) η² / (4(MC_1 s_m)² + (4/3) MC_1 s_m η √(s_m log(m)/m)) ).
We can write
P( sup_{x∈S_X^k} |u_n(φ, x; m_n) − E[u_n(φ, x; m_n)]| > 2η √(s_m log(m)/m) )
≤ P( sup_{x∈S_X^k} { |u_n(φ, x; m_n) − u_n(φ, x_{ℓ(x)}; m_n)| + |E[u_n(φ, x_{ℓ(x)}; m_n)] − E[u_n(φ, x; m_n)]| } > η √(s_m log(m)/m) )
+ P( sup_{x∈S_X^k} |u_n(φ, x_{ℓ(x)}; m_n) − E[u_n(φ, x_{ℓ(x)}; m_n)]| > η √(s_m log(m)/m) ).
Taking into account the condition (C.3), we have
|u_n(φ, x; m_n) − u_n(φ, x_{ℓ(x)}; m_n)|
≤ ((n−k)!/n!) Σ_{i∈I(k,n)} |φ(Y_{i_1}, …, Y_{i_k})| | ∏_{j=1}^{k} δ_{m_n}(x_j, X_{i_j}) − ∏_{j=1}^{k} δ_{m_n}(x_{ℓ(x_j)}, X_{i_j}) |
≤ M ((n−k)!/n!) Σ_{i∈I(k,n)} |δ_{m_n}(x, X_i) − δ_{m_n}(x_{ℓ(x)}, X_i)|
≤ M ((n−k)!/n!) Σ_{i∈I(k,n)} C_2 s_m^{β_2} d(x, x_{ℓ(x)})^{β_1}
≤ M C_2 s_m^{β_2} d(x, x_{ℓ(x)})^{β_1} ≤ M C_2 s_m^{β_2} ε_n^{β_1}.
Consequently, we obtain uniformly on x S X k :
sup_{x∈S_X^k} |u_n(φ, x; m_n) − u_n(φ, x_{ℓ(x)}; m_n)| ≤ O(s_m^{β_2} ε_n^{β_1}) = O(√(s_m log(m)/m)),
by condition (C.5). We deduce from (36) that:
|E[u_n(φ, x_{ℓ(x)}; m_n)] − E[u_n(φ, x; m_n)]| = |E( u_n(φ, x_{ℓ(x)}; m_n) − u_n(φ, x; m_n) )| ≤ E|u_n(φ, x_{ℓ(x)}; m_n) − u_n(φ, x; m_n)|.
The last inequality follows from Jensen's inequality together with elementary properties of the absolute value. Now, using the fact that the function φ(·) is bounded, that the function δ_m is Lipschitz, and that E[a] = a for any constant a, we directly conclude that
sup_{x∈S_X^k} |E[u_n(φ, x_{ℓ(x)}; m_n)] − E[u_n(φ, x; m_n)]| ≤ O(s_m^{β_2} ε_n^{β_1}) = O(√(s_m log(m)/m)).
For some η > 0 and for sufficiently large n and m, we have
P( sup_{x∈S_X^k} { |u_n(φ, x; m_n) − u_n(φ, x_{ℓ(x)}; m_n)| + |E[u_n(φ, x_{ℓ(x)}; m_n)] − E[u_n(φ, x; m_n)]| } > η √(s_m log(m)/m) ) = 0.
Continuing now with (35), and supposing that the kernel function G_{φ,x}(·) is symmetric, we decompose the U-statistic by making use of the Hoeffding decomposition [8]; we infer that
u_n(φ, x; m_n) − E[u_n(φ, x; m_n)] = Σ_{p=1}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x,m_n})) = k u_n^(1)(π_{1,k}(G_{φ,x})) + Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x})).
Let us first start with the linear term. We have
k u_n^(1)(π_{1,k}(G_{φ,x})) = (k/n) Σ_{i=1}^{n} π_{1,k}(G_{φ,x})(X_i, Y_i).
From Hoeffding’s projection, we have
π_{1,k}(G_{φ,x})(x, y) = E[ G_{φ,x}((x, X_2, …, X_k), (y, Y_2, …, Y_k)) ] − E[G_{φ,x}(X, Y)] = E[ G_{φ,x}(X, Y) | (X_1, Y_1) = (x, y) ] − E[G_{φ,x}(X, Y)].
Set
Z i = π 1 , k ( G φ , x ) ( X i , Y i ) .
We can see that the Z_i are independent and identically distributed random variables, bounded by 2kMC_1 s_m, with zero mean and
σ² ≤ (MC_1 s_m)².
An application of Bernstein’s inequality yields
P( sup_{x∈S_X^k} |u_n^(1)(π_{1,k}(G_{φ,x}))| > η √(s_m log(m)/m) )
≤ Σ_{ℓ=1}^{d_n} P( max_{1≤ℓ≤d_n} |u_n^(1)(π_{1,k}(G_{φ,x_ℓ}))| > η √(s_m log(m)/m) )
≤ 2 d_n exp( − n (s_m log(m)/m) η² / (4(MC_1 s_m)² + (4/3) MC_1 s_m η √(s_m log(m)/m)) )
≤ n^{α − τη²/C_4},
resulting from the facts that m ≤ n and log(m) ≥ τ log(n). This implies that
Σ_{n≥1} P( sup_{x∈S_X^k} |u_n^(1)(π_{1,k}(G_{φ,x}))| > η √(s_m log(m)/m) ) ≤ Σ_{n≥1} n^{α − τη²/C_4} < ∞.
Consequently, we obtain the following:
sup_{x∈S_X^k} |u_n^(1)(π_{1,k}(G_{φ,x}))| = O_{a.co.}(√(s_m log(m)/m)).
Moving to the nonlinear term, we want to prove that, for 2 ≤ p ≤ k:
sup_{x∈S_X^k} (k choose p) √(m/(s_m log(m))) |u_n^(p)(π_{p,k} G_{φ,x})| = O_{a.co.}(1),
which implies that, for 1 ≤ i ≤ k and ℓ = (ℓ_1, …, ℓ_k):
max_{1≤ℓ_i≤d_n} (k choose p) √(m/(s_m log(m))) |u_n^(p)(π_{p,k} G_{φ,x_ℓ})| = O_{a.co.}(1).
To prove the above-mentioned statement, we apply Proposition 1 of [125] (see Lemma A2). Since G_{φ,x} is bounded by MC_1 s_m, for η > 0 we have
P( n^{1/2} |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > η √(s_m log(m)/m) )
= P( |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > n^{−1/2} η √(s_m log(m)/m) )
= P( |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > ε_0 √(s_m log(m)/m) ),
where ε_0 = η/√n. Now, for t = η √(s_m log(m)/m), Lemma A2 gives us:
P( |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > ε_0 √(s_m log(m)/m) )
≤ 2 exp( − t (n − 1)^{1/2} / (2^{k+2} k^{k+1} MC_1 s_m) )
≤ 2 exp( − η √(s_m log(m)/m) (n − 1)^{1/2} / (2^{k+2} k^{k+1} MC_1 s_m) )
≤ 2 exp( − η √(log(m)/m) (n − 1)^{1/2} / (2^{k+2} k^{k+1} MC_1 √(s_m)) ).
By the facts that m ≤ n and log(m) ≥ τ log(n), it follows that there exists η > 0 such that
P( |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > ε_0 √(s_m log(m)/m) ) ≤ n^{−τ/(2C_5)},
where C_5 = C 2^{k+2} k^{k+1} MC_1 s_m with C > 0. Therefore, for each ε_0 > 0, 1 ≤ i ≤ k and ℓ = (ℓ_1, …, ℓ_k):
P( sup_{x∈S_X^k} |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > ε_0 √(s_m log(m)/m) )
≤ d_n^k max_{1≤ℓ_i≤d_n} P( |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x_ℓ}))| > ε_0 √(s_m log(m)/m) )
≤ n^{kα − τ/(2C_5)}.
Consequently, we have
Σ_{n≥1} P( sup_{x∈S_X^k} |Σ_{p=2}^{k} (k choose p) u_n^(p)(π_{p,k}(G_{φ,x}))| > ε_0 √(s_m log(m)/m) ) ≤ Σ_{n≥1} n^{kα − τ/(2C_5)} < ∞.
Hence, the proof is achieved. □
Proof of Theorem 2. 
We will need to truncate the conditional U-statistic to prove this theorem. Taking condition (C.7') into account, for each λ > 0 and
ξ_n := ξ_{m_n} = m_n / log(m_n) =: m / log(m),
we have
G φ , x ( x , y ) = G φ , x ( x , y ) 𝟙 { φ ( y ) λ ξ n 1 / q } + G φ , x ( x , y ) 𝟙 { φ ( y ) > λ ξ n 1 / q } = : G φ , x ( T ) ( x , y ) + G φ , x ( R ) ( x , y ) ,
which means that each function φ ( · ) is truncated as follows:
φ ( y ) = φ ( y ) 𝟙 φ ( y ) λ ξ n 1 / q + φ ( y ) 𝟙 φ ( y ) > λ ξ n 1 / q = φ ( T ) ( y ) + φ ( R ) ( y ) .
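The truncation device above can be sketched numerically (an illustrative sketch with hypothetical values; `threshold` plays the role of $\lambda\xi_{n}^{1/q}$, and the function names are ours, not the paper's):

```python
import numpy as np

def truncate(phi_vals, threshold):
    """Split phi(y) into a bounded part phi_T and a remainder phi_R
    so that phi = phi_T + phi_R pointwise (illustrative sketch)."""
    phi_vals = np.asarray(phi_vals, dtype=float)
    mask = np.abs(phi_vals) <= threshold      # |phi(y)| <= lambda * xi_n^(1/q)
    phi_T = np.where(mask, phi_vals, 0.0)     # truncated part
    phi_R = np.where(mask, 0.0, phi_vals)     # remainder part
    return phi_T, phi_R

# Hypothetical values: lambda = 1, xi_n = m / log(m) with m = 100, q = 2
m, q, lam = 100, 2, 1.0
xi_n = m / np.log(m)
threshold = lam * xi_n ** (1.0 / q)
phi = np.array([0.5, -3.0, 10.0, -1.2])
phi_T, phi_R = truncate(phi, threshold)
assert np.allclose(phi_T + phi_R, phi)        # exact decomposition
assert np.all(np.abs(phi_T) <= threshold)     # truncated part is bounded
```

The bounded part is then handled by exponential inequalities, while the remainder part is shown to be negligible.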
Notice that $G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{x},\mathbf{y})$ denotes the truncated part and $G_{\varphi,\mathbf{x}}^{(R)}(\mathbf{x},\mathbf{y})$ refers to the remainder part. It is possible to write the U-statistic in the following way:
$$u_{n}(\varphi,\mathbf{x},m_{n})=u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(T)}\big)+u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(R)}\big)=:u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})+u_{n}^{(R)}(\varphi,\mathbf{x},m_{n}).$$
The first term on the right-hand side, $u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})$, is, as usual, called the truncated part, and the second one, $u_{n}^{(R)}(\varphi,\mathbf{x},m_{n})$, is the remainder part. Let us investigate the term $u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})$.

7.1. Truncated Part

In a similar way as in the preceding proof, we infer
$$\begin{aligned}u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})-\mathbb{E}\,u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})&=\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}\left[\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\prod_{j=1}^{k}\delta_{m_{n}}(x_{j},X_{i_{j}})-\mathbb{E}\left(\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\prod_{j=1}^{k}\delta_{m_{n}}(x_{j},X_{i_{j}})\right)\right]\\&=\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}\left[G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}})-\mathbb{E}\,G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}})\right]=\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}H^{(T)}(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}}),\end{aligned}$$
where
$$H^{(T)}(\mathbf{X},\mathbf{Y})=G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X},\mathbf{Y})-\mathbb{E}\,G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X},\mathbf{Y}).$$
Similar to the proof of Theorem 1, we apply Lemma A1 to the function $H^{(T)}(\cdot,\cdot)$. Throughout the rest of the proof, we suppose that the function $G_{\varphi,\mathbf{x}}^{(T)}$ is symmetric. Moreover, by condition (C.2), the function $H^{(T)}(\cdot,\cdot)$ is bounded by $2\lambda\xi_{n}^{1/q}C_{1}s_{m}$. Clearly,
$$\theta=\mathbb{E}\big[H^{(T)}(\mathbf{X},\mathbf{Y})\big]=0$$
by construction, and
$$\sigma^{2}=\mathrm{Var}\big(H^{(T)}(\mathbf{X},\mathbf{Y})\big)\leq 2\big(\lambda\xi_{n}^{1/q}C_{1}s_{m}\big)^{2}.$$
For any $\eta>0$ and $m$ large enough, we obtain
$$\mathbb{P}\left\{\left|u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})-\mathbb{E}\,u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\leq 2\exp\left(\frac{-n\big((s_{m}\log(m))/m\big)\eta^{2}}{4\big(\lambda\xi_{n}^{1/q}C_{1}s_{m}\big)^{2}+\frac{4}{3}\lambda\xi_{n}^{1/q}C_{1}s_{m}\,\eta\sqrt{(s_{m}\log(m))/m}}\right).$$
We can write
$$\begin{aligned}&\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})-\mathbb{E}\,u_{n}^{(T)}(\varphi,\mathbf{x},m_{n})\right|>2\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\\&\quad\leq\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})-u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})+\mathbb{E}[u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})]-\mathbb{E}[u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})]\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\\&\qquad+\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})-\mathbb{E}[u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})]\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}.\end{aligned}$$
Notice that
$$\begin{aligned}\left|u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})-u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})\right|&\leq\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}\left|\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\right|\left|\prod_{j=1}^{k}\delta_{m_{n}}(x_{j},X_{i_{j}})-\prod_{j=1}^{k}\delta_{m_{n}}(x(x_{j}),X_{i_{j}})\right|\\&\leq\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}\left|\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\right|\left|\delta_{m_{n}}(\mathbf{x},\mathbf{X}_{\mathbf{i}})-\delta_{m_{n}}(\boldsymbol{x}(\mathbf{x}),\mathbf{X}_{\mathbf{i}})\right|\\&\leq\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}C_{2}s_{m}^{\beta_{2}}d(\mathbf{x},\boldsymbol{x}(\mathbf{x}))^{\beta_{1}}\left|\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\right|\\&\leq\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}C_{2}s_{m}^{\beta_{2}}\epsilon_{n}^{\beta_{1}}\left|\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\right|\leq\frac{(n-k)!}{n!}\sum_{\mathbf{i}\in I(k,n)}\frac{1}{n}\sum_{j=1}^{n}W_{j,T},\end{aligned}$$
where, for $1\leq j\leq n$,
$$W_{j,T}:=C_{2}s_{m}^{\beta_{2}}\epsilon_{n}^{\beta_{1}}\left|\varphi^{(T)}(Y_{1},\ldots,Y_{k})\right|,$$
and we can write
$$\mathbb{E}\,W_{j,T}=C_{2}s_{m}^{\beta_{2}}\epsilon_{n}^{\beta_{1}}\,\mathbb{E}\left|\varphi^{(T)}(Y_{1},\ldots,Y_{k})\right|=C_{2}s_{m}^{\beta_{2}}\epsilon_{n}^{\beta_{1}}\,\mathbb{E}\left[\mathbb{E}\left(\left|\varphi^{(T)}(Y_{1},\ldots,Y_{k})\right|\,\middle|\,\mathbf{X}=\mathbf{x}\right)\right],$$
which means that, for $2\leq\nu\leq q$:
$$\begin{aligned}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\mathbb{E}\left|W_{j,T}\right|^{\nu}&=\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\big(C_{2}s_{m}^{\beta_{2}}\epsilon_{n}^{\beta_{1}}\big)^{\nu}\,\mathbb{E}\left[\mathbb{E}\left(\left|\varphi^{(T)}(Y_{i_{1}},\ldots,Y_{i_{k}})\right|\,\middle|\,\mathbf{X}=\mathbf{x}\right)\right]^{\nu}\\&\leq\big(C_{2}s_{m}^{\beta_{2}}\epsilon_{n}^{\beta_{1}}\big)^{\nu}(\lambda\xi_{n})^{\nu/q}\mu_{q}^{\nu/q}\\&\leq C_{2}^{\nu}\left(\frac{s_{m}\log m}{m}\right)^{\nu/2}(\lambda\xi_{n})^{\nu/q}\mu_{q}^{\nu/q}=C_{2}^{\nu}\lambda^{\nu/q}\xi_{n}^{\nu/q-\nu/2}(s_{m})^{1-\nu/2}\mu_{q}^{\nu/q}(s_{m})^{\nu-1}\leq C\,s_{m}^{(\nu-1)}.\end{aligned}$$
The passage from (40) to (41) uses Jensen's inequality for the concave function $z^{a}$, $0<a\leq 1$, while (42) follows from condition (C.5). Then, for $\nu\geq 2$,
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\mathbb{E}\left|W_{j,T}\right|^{\nu}\leq C\,s_{m}^{(\nu-1)},$$
where $C>0$. An application of a classical inequality (see Corollary A.8-(ii) of [48]) with $Z_{i}:=W_{j,T}$ and $a_{n}^{2}=s_{m}$ gives
$$u_{m}=\frac{a_{n}^{2}\ln(m)}{m}=\frac{s_{m}\ln(m)}{m},$$
and it is clear that $u_{m}\to 0$ as $m\to\infty$ by condition (C.2). Consequently, we obtain:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})-u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})\right|=O_{a.co}\left(\sqrt{\frac{s_{m}\log(m)}{m}}\right).$$
Now, we obtain from (43) that:
$$\left|\mathbb{E}[u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})]-\mathbb{E}[u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})]\right|=\left|\mathbb{E}\left(u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})-u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})\right)\right|\leq\mathbb{E}\left|u_{n}^{(T)}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})-u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})\right|.$$
As in the bounded case, the transition from (44) to (45) follows from Jensen's inequality and elementary properties of the absolute value. Furthermore, using the fact that $\mathbb{E}[a]=a$ for any constant $a$, we can directly conclude that
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|\mathbb{E}[u_{n}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})]-\mathbb{E}[u_{n}(\varphi,\mathbf{x};m_{n})]\right|=O_{a.co}\left(\sqrt{\frac{s_{m}\log(m)}{m}}\right).$$
For sufficiently large $n$ and $m$, we infer that, for some $\eta>0$,
$$\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}(\varphi,\mathbf{x};m_{n})-u_{n}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})+\mathbb{E}[u_{n}(\varphi,\boldsymbol{x}(\mathbf{x});m_{n})]-\mathbb{E}[u_{n}(\varphi,\mathbf{x};m_{n})]\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}=0.$$
Continuing now with (39), since the kernel function $G_{\varphi,\mathbf{x}}^{(T)}(\cdot)$ is symmetric, the U-statistic can be decomposed according to Hoeffding's decomposition [8], that is,
$$u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})-\mathbb{E}[u_{n}^{(T)}(\varphi,\mathbf{x};m_{n})]=\sum_{p=1}^{k}\frac{k!}{(k-p)!}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x},m_{n}}^{(T)})\big)=k\,u_{n}^{(1)}\big(\pi_{1,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)+\sum_{p=2}^{k}\frac{k!}{(k-p)!}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big).$$
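The Hoeffding decomposition invoked here can be checked numerically in the simplest case $k=2$, where the projections are computable exactly on a finite sample space (an illustrative sketch; the kernel $h$ and the distribution are hypothetical choices, not taken from the paper):

```python
import itertools
import numpy as np

# Finite sample space, so theta and the projections are exact.
support = np.array([0.0, 1.0, 2.0])
probs = np.array([1/3, 1/3, 1/3])

def h(x, y):                      # a symmetric kernel (hypothetical choice)
    return (x - y) ** 2

theta = sum(probs[a] * probs[b] * h(support[a], support[b])
            for a in range(3) for b in range(3))

def g1(x):                        # first Hoeffding projection pi_{1,2}
    return sum(probs[b] * h(x, support[b]) for b in range(3)) - theta

def pi2(x, y):                    # degenerate second projection pi_{2,2}
    return h(x, y) - g1(x) - g1(y) - theta

rng = np.random.default_rng(0)
sample = rng.choice(support, size=50, p=probs)
n = len(sample)

u_n = np.mean([h(a, b) for a, b in itertools.permutations(sample, 2)])
linear = (2.0 / n) * sum(g1(x) for x in sample)
degenerate = np.mean([pi2(a, b) for a, b in itertools.permutations(sample, 2)])

# Hoeffding decomposition: u_n = theta + linear part + degenerate part
assert np.isclose(u_n, theta + linear + degenerate)
```

The linear part is a sum of i.i.d. bounded variables (handled by Bernstein-type bounds below), while the degenerate part is controlled via Lemma A2.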
Let us first start with the linear term. We have
$$k\,u_{n}^{(1)}\big(\pi_{1,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)=\frac{k}{n}\sum_{i=1}^{n}\pi_{1,k}\big(G_{\varphi,\mathbf{x}}^{(T)}\big)(X_{i},Y_{i}).$$
From Hoeffding's projection, we have
$$\pi_{1,k}\big(G_{\varphi,\mathbf{x}}^{(T)}\big)(x,y)=\mathbb{E}\left[G_{\varphi,\mathbf{x}}^{(T)}\big((x,X_{2},\ldots,X_{k}),(y,Y_{2},\ldots,Y_{k})\big)\right]-\mathbb{E}\big[G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X},\mathbf{Y})\big]=\mathbb{E}\big[G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X},\mathbf{Y})\,\big|\,(X_{1},Y_{1})=(x,y)\big]-\mathbb{E}\big[G_{\varphi,\mathbf{x}}^{(T)}(\mathbf{X},\mathbf{Y})\big].$$
Set
$$Z_{i}^{(T)}=\pi_{1,k}\big(G_{\varphi,\mathbf{x}}^{(T)}\big)(X_{i},Y_{i}).$$
The $Z_{i}^{(T)}$ are independent and identically distributed random variables bounded by $2k\lambda\xi_{n}^{1/q}C_{1}s_{m}$, with zero mean and
$$\sigma^{2}\leq\big(\lambda\xi_{n}^{1/q}C_{1}s_{m}\big)^{2}.$$
An application of Bernstein's inequality yields
$$\begin{aligned}\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(1)}\big(\pi_{1,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}&\leq\sum_{i=1}^{d_{n}}\mathbb{P}\left\{\max_{1\leq i\leq d_{n}}\left|u_{n}^{(1)}\big(\pi_{1,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\\&\leq 2d_{n}\exp\left(\frac{-n\big((s_{m}\log(m))/m\big)\eta^{2}}{4\big(\lambda\xi_{n}^{1/q}C_{1}s_{m}\big)^{2}+\frac{4}{3}\lambda\xi_{n}^{1/q}C_{1}s_{m}\,\eta\sqrt{(s_{m}\log(m))/m}}\right)\leq n^{\alpha-\tau\eta^{2}/C_{4}},\end{aligned}$$
for some positive constant $C_{4}$, resulting from the fact that $m\leq n$ and $\log(m)\leq\tau\log(n)$. This implies that
$$\sum_{n\geq 1}\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(1)}\big(\pi_{1,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\leq\sum_{n\geq 1}n^{\alpha-\tau\eta^{2}/C_{4}}<\infty.$$
Consequently, we obtain:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}^{(1)}\big(\pi_{1,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|=O_{a.co}\left(\sqrt{\frac{s_{m}\log(m)}{m}}\right).$$
Moving to the nonlinear term, we want to prove that, for $2\leq p\leq k$:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\binom{k}{p}\sqrt{\frac{m}{s_{m}\log(m)}}\left|u_{n}^{(p)}\big(\pi_{p,k}G_{\varphi,\boldsymbol{x}(\mathbf{x})}^{(T)}\big)\right|=O_{a.co}(1),$$
which implies that, for $1\leq i\leq k$ and $\boldsymbol{\ell}=(\ell_{1},\ldots,\ell_{k})$:
$$\max_{1\leq \ell_{i}\leq d_{n}}\binom{k}{p}\sqrt{\frac{m}{s_{m}\log(m)}}\left|u_{n}^{(p)}\big(\pi_{p,k}G_{\varphi,\boldsymbol{x}(\mathbf{x})}^{(T)}\big)\right|=O_{a.co}(1).$$
To prove this, we apply Proposition 1 of [125] (see Lemma A2). Observe that $G_{\varphi,\mathbf{x}}^{(T)}$ is bounded by $C_{1}s_{m}\lambda\xi_{n}^{1/q}$; hence, for $\eta>0$, we have
$$\begin{aligned}&\mathbb{P}\left\{n^{1/2}\left|\sum_{p=2}^{k}\frac{k!}{(k-p)!}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\\&\quad=\mathbb{P}\left\{\left|\sum_{p=2}^{k}\frac{k!}{(k-p)!}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>n^{-1/2}\eta\sqrt{\frac{s_{m}\log(m)}{m}}\right\}=\mathbb{P}\left\{\left|\sum_{p=2}^{k}\frac{k!}{(k-p)!}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\varepsilon_{0}\sqrt{\frac{s_{m}\log(m)}{m}}\right\},\end{aligned}$$
where $\varepsilon_{0}=\eta/\sqrt{n}$. Now, for $t=\eta\sqrt{s_{m}\log(m)/m}$, Lemma A2 gives us:
$$\begin{aligned}\mathbb{P}\left\{\left|\sum_{p=2}^{k}\frac{k!}{(k-p)!}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\varepsilon_{0}\sqrt{\frac{s_{m}\log(m)}{m}}\right\}&\leq 2\exp\left(\frac{-t\,(n-1)^{1/2}}{2^{k+2}k^{k+1}\lambda\xi_{n}^{1/q}C_{1}s_{m}}\right)\\&=2\exp\left(\frac{-\eta\sqrt{s_{m}\log(m)/m}\,(n-1)^{1/2}}{2^{k+2}k^{k+1}\lambda\xi_{n}^{1/q}C_{1}s_{m}}\right)\leq 2\exp\left(\frac{-\eta\sqrt{\log(m)/m}\,(n-1)^{1/2}}{2^{k+2}k^{k+1}\lambda\xi_{n}^{1/q}C_{1}\sqrt{s_{m}}}\right).\end{aligned}$$
Since $m\leq n$ and $\log(m)\leq\tau\log(n)$, there exists $\eta>0$ such that
$$\mathbb{P}\left\{\left|\sum_{p=2}^{k}\binom{k}{p}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\varepsilon_{0}\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\leq n^{-\tau/2C_{6}},$$
where
$$C_{6}=C2^{k+2}k^{k+1}\lambda\xi_{n}^{1/q}C_{1}\sqrt{s_{m}},$$
with $C>0$. Therefore, for each $\varepsilon_{0}>0$, $1\leq i\leq k$ and $\boldsymbol{\ell}=(\ell_{1},\ldots,\ell_{k})$:
$$\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|\sum_{p=2}^{k}\binom{k}{p}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\varepsilon_{0}\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\leq d_{n}\max_{1\leq \ell_{i}\leq d_{n}}\mathbb{P}\left\{\left|\sum_{p=2}^{k}\binom{k}{p}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\varepsilon_{0}\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\leq n^{\alpha k-\tau/2C_{6}}.$$
Consequently, we have
$$\sum_{n\geq 1}\mathbb{P}\left\{\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|\sum_{p=2}^{k}\binom{k}{p}u_{n}^{(p)}\big(\pi_{p,k}(G_{\varphi,\mathbf{x}}^{(T)})\big)\right|>\varepsilon_{0}\sqrt{\frac{s_{m}\log(m)}{m}}\right\}\leq\sum_{n\geq 1}n^{\alpha k-\tau/2C_{6}}<\infty,$$
which settles the truncated part.

7.2. Remainder Part

We now consider the remainder part, that is, the U-process $u_{n}^{(R)}(\varphi,\mathbf{x},m_{n})$ associated with the unbounded kernel given by:
$$G_{\varphi,\mathbf{x}}^{(R)}(\mathbf{x},\mathbf{y})=G_{\varphi,\mathbf{x}}(\mathbf{x},\mathbf{y})\mathbb{1}_{\{|\varphi(\mathbf{y})|>\lambda\xi_{n}^{1/q}\}}.$$
We have to establish that this process is negligible, meaning that
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\left|u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(R)}\big)-\mathbb{E}\,u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(R)}\big)\right|=o_{a.co}(1).$$
Observe that, for $\mathbf{x},\mathbf{y}\in\mathcal{X}^{k}$,
$$\left|G_{\varphi,\mathbf{x}}(\mathbf{x},\mathbf{y})\right|=\left|\varphi(\mathbf{y})\right|\left|\delta_{m}(\mathbf{x},\mathbf{y})\right|\leq C_{1}s_{m}\left|\varphi(\mathbf{y})\right|=:\tilde{F}(\mathbf{y}).$$
Taking into account that $\tilde{F}$ is symmetric, we have:
$$u_{n}^{(k)}\big(\big|G_{\varphi,\mathbf{x}}^{(R)}\big|\big)\leq u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{\tilde{F}>\lambda\xi_{n}^{1/q}\}}\big),$$
where $u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi|>\lambda\xi_{n}^{1/q}\}}\big)$ is a U-statistic based on the U-kernel $\tilde{F}\mathbb{1}_{\{|\varphi|>\lambda\xi_{n}^{1/q}\}}$:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\left|u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(R)}\big)\right|\leq\big(s_{m}^{-1}\xi_{n}\big)^{1/2}u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{\tilde{F}>\lambda\xi_{n}^{1/q}\}}\big)\leq C_{7}\sqrt{\xi_{n}}\,u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{\tilde{F}>\lambda\xi_{n}^{1/q}\}}\big),$$
and
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\,\mathbb{E}\,u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(R)}\big)\leq C_{7}\sqrt{\xi_{n}}\,\mathbb{E}\left[u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\big)\right]\leq C_{7}\,\mathbb{E}\left[\tilde{F}^{1+q}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\right].$$
Therefore, as $m\to\infty$ when $n\to\infty$, we have
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\,\mathbb{E}\,u_{n}^{(k)}\big(G_{\varphi,\mathbf{x}}^{(R)}\big)=o(1).$$
Hence, to achieve the proof, it remains to establish that:
$$u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{y})|>\lambda\xi_{n}^{1/q}\}}\big)=o_{a.co}\big(\big(s_{m}^{-1}\xi_{n}\big)^{-1/2}\big).$$
An application of Chebyshev's inequality, for any $\eta>0$, gives
$$\begin{aligned}&\mathbb{P}\left\{\left|u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\big)-\mathbb{E}\,u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\big)\right|\geq\eta\big(s_{m}^{-1}\xi_{n}\big)^{-1/2}\right\}\\&\quad\leq\eta^{-2}\big(s_{m}^{-1}\xi_{n}\big)\,\mathrm{Var}\left(u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\big)\right)\leq k\,\eta^{-2}\,\xi_{n}\,\mathbb{E}\left[\tilde{F}^{2}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\right]\\&\quad\leq\frac{k}{n^{2}}\,\eta^{-2}\,\xi_{n}^{q}\,\mathbb{E}\left[\tilde{F}^{2}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\right]\leq\eta^{-2}\,\mathbb{E}\left[\tilde{F}^{3}\mathbb{1}_{\{|\varphi(\mathbf{Y})|>\lambda\xi_{n}^{1/q}\}}\right]\frac{1}{n^{2}},\end{aligned}$$
so, by using the fact that
$$\eta^{-2}\,\mathbb{E}\left[\tilde{F}^{3}\mathbb{1}_{\{|\varphi(\mathbf{y})|>\lambda\xi_{n}^{1/q}\}}\right]\sum_{n\geq 1}\frac{1}{n^{2}}<\infty,$$
we deduce that
$$\sum_{n\geq 1}\mathbb{P}\left\{\left|u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{y})|>\lambda\xi_{n}^{1/q}\}}\big)-\mathbb{E}\,u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{y})|>\lambda\xi_{n}^{1/q}\}}\big)\right|\geq\eta\big(s_{m}^{-1}\xi_{n}\big)^{-1/2}\right\}<\infty.$$
Finally, note that (48) implies
$$\mathbb{E}\,u_{n}^{(k)}\big(\tilde{F}\mathbb{1}_{\{|\varphi(\mathbf{y})|>\lambda\xi_{n}^{1/q}\}}\big)=o\big(\big(s_{m}^{-1}\xi_{n}\big)^{-1/2}\big).$$
The preceding results, together with the arbitrary choice of $\lambda>0$, give that (51) holds, which, combined with (50) and (48), completes the proof of (47). We finally obtain
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|u_{n}(\varphi,\mathbf{x},m_{n})-\mathbb{E}\,u_{n}(\varphi,\mathbf{x},m_{n})\right|=O_{a.co}\left(\sqrt{\frac{s_{m}\log(m)}{m}}\right).$$
Hence, the proof is complete. □
Proof of Theorem 3. 
The conclusion of Theorem 3 can be obtained from the results of Theorems 1 and 2. We have
$$\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})-\widehat{\mathbb{E}}\,\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})=\frac{u_{n}(\varphi,\mathbf{x};m_{n})}{u_{n}(1,\mathbf{x};m_{n})}-\frac{\mathbb{E}\,u_{n}(\varphi,\mathbf{x};m_{n})}{\mathbb{E}\,u_{n}(1,\mathbf{x};m_{n})}=\frac{u_{n}(\varphi,\mathbf{x};m_{n})-\mathbb{E}\,u_{n}(\varphi,\mathbf{x};m_{n})}{u_{n}(1,\mathbf{x};m_{n})}+\frac{\mathbb{E}\,u_{n}(\varphi,\mathbf{x};m_{n})\,\big(\mathbb{E}\,u_{n}(1,\mathbf{x};m_{n})-u_{n}(1,\mathbf{x};m_{n})\big)}{u_{n}(1,\mathbf{x};m_{n})\,\mathbb{E}\,u_{n}(1,\mathbf{x};m_{n})}=:I_{1}+I_{2}.$$
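This splitting rests on the elementary identity $\frac{A}{B}-\frac{a}{b}=\frac{A-a}{B}+\frac{a(b-B)}{Bb}$, which can be verified exactly with rational arithmetic (a minimal sketch with arbitrary values):

```python
from fractions import Fraction

# Identity behind I_1 + I_2: A/B - a/b = (A - a)/B + a*(b - B)/(B*b),
# with A = u_n(phi, x), B = u_n(1, x), a = E u_n(phi, x), b = E u_n(1, x).
def decompose(A, B, a, b):
    I1 = (A - a) / B
    I2 = a * (b - B) / (B * b)
    return I1, I2

A, B, a, b = map(Fraction, (3, 2, 5, 7))
I1, I2 = decompose(A, B, a, b)
assert I1 + I2 == A / B - a / b   # exact, since we use rationals
```

Each of the two terms is then controlled by the uniform convergence of the numerator and denominator U-statistics.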
Notice that, given the imposed hypotheses and the previously obtained results, for some constants $c_{1},c_{2}>0$ we obtain:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}u_{n}(1,\mathbf{x},m_{n})=c_{1}\ \ a.co,\qquad\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\mathbb{E}\,u_{n}(1,\mathbf{x},m_{n})=c_{2},\qquad\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\mathbb{E}\,u_{n}(\varphi,\mathbf{x},m_{n})=O(1).$$
Hence, depending on whether the function $\varphi(\cdot)$ is bounded or unbounded, we can apply Theorem 1 or Theorem 2, respectively, to handle both $I_{1}$ and $I_{2}$, and obtain, for some $c>0$, with probability 1:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\left|\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})-\widehat{\mathbb{E}}\,\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})\right|\leq\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\left|I_{1}\right|+\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\sqrt{\frac{m}{s_{m}\log(m)}}\left|I_{2}\right|\leq c.$$
Hence, the proof is complete. □
Proof of Theorem 4. 
Let $\gamma>0$ and $\mathbf{x}\in S_{\mathcal{X}}^{k}$. We have
$$\widehat{\mathbb{E}}\,\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})-r^{(k)}(\varphi,\mathbf{x})=\frac{\mathbb{E}\,u_{n}(\varphi,\mathbf{x},m_{n})}{\mathbb{E}\,u_{n}(1,\mathbf{x},m_{n})}-r^{(k)}(\varphi,\mathbf{x}).$$
Notice that
$$\widehat{\mathbb{E}}\,\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\,\mathbb{E}\left[\varphi(Y_{1},\ldots,Y_{k})\prod_{j=1}^{k}\delta_{m_{n}}(x_{j},X_{j})\right]=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{\mathcal{X}^{k}}r^{(k)}(\varphi,\mathbf{t})\,\delta_{m}(\mathbf{x},\mathbf{t})\,\tilde{f}(\mathbf{t})\,\mu(d\mathbf{t}),$$
where, for $\mathbf{t}=(t_{1},\ldots,t_{k})$, we denote $\mu(d\mathbf{t}):=\mu(dt_{1})\cdots\mu(dt_{k})$, and
$$\tilde{f}(\mathbf{t}):=\prod_{j=1}^{k}f(t_{j}).$$
It follows that
$$\begin{aligned}\widehat{\mathbb{E}}\,\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})-r^{(k)}(\varphi,\mathbf{x})&=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\left[\int_{\mathcal{X}^{k}}r^{(k)}(\varphi,\mathbf{t})\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\int_{\mathcal{X}^{k}}\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t})\right]\\&=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{\mathcal{X}^{k}}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t})=:I_{1}(\mathbf{x})+I_{2}(\mathbf{x}),\end{aligned}$$
where
$$I_{1}(\mathbf{x}):=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{B(\mathbf{x},\gamma)}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t}),$$
and
$$I_{2}(\mathbf{x}):=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{\bar{B}(\mathbf{x},\gamma)}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t}).$$
Therefore, we need to study the asymptotic behavior of both sup x S X k I 1 ( x ) and sup x S X k I 2 ( x ) to obtain the desired result.
Let us start with the term $\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{1}(\mathbf{x})\right|$. We have
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{1}(\mathbf{x})\right|=\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{B(\mathbf{x},\gamma)}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t})\right|\leq\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{B(\mathbf{x},\gamma)}\left|r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right|\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\tilde{f}(\mathbf{t})\mu(d\mathbf{t}).$$
Taking into account that the density function $f(\cdot)$ is bounded, and using condition (C.9), we obtain:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{1}(\mathbf{x})\right|\leq\frac{C_{f}}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{B(\mathbf{x},\gamma)}C_{3}\,d(\mathbf{x},\mathbf{t})\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t})\leq\frac{C_{f}C_{3}\gamma}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{B(\mathbf{x},\gamma)}\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t});$$
hence, the term in (54) can be made smaller than $2\varepsilon$ as $m\to\infty$ by using condition (8).
To investigate the term $\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{2}(\mathbf{x})\right|$, notice that
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{2}(\mathbf{x})\right|=\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{\bar{B}(\mathbf{x},\gamma)}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t})\right|\leq\frac{C_{f}C_{3}}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{\bar{B}(\mathbf{x},\gamma)}d(\mathbf{x},\mathbf{t})\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t}).$$
By condition (11), we conclude that
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{2}(\mathbf{x})\right|\to 0\quad\text{as}\quad m\to\infty.$$
This concludes the proof of the Theorem. □
Proof of Theorem 5. 
Following the same steps as in the proof of Theorem 4, we can write directly:
$$\widehat{\mathbb{E}}\,\hat{r}_{n}^{(k)}(\varphi,\mathbf{x};m_{n})-r^{(k)}(\varphi,\mathbf{x})=\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{\mathcal{X}^{k}}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t}).$$
Taking condition (C.8) into account, we can easily deduce that
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|\frac{1}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\int_{\mathcal{X}^{k}}\left[r^{(k)}(\varphi,\mathbf{t})-r^{(k)}(\varphi,\mathbf{x})\right]\delta_{m}(\mathbf{x},\mathbf{t})\tilde{f}(\mathbf{t})\mu(d\mathbf{t})\right|\leq\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\big(\left|I_{1}(\mathbf{x})\right|+\left|I_{2}(\mathbf{x})\right|\big),$$
where $I_{1}(\mathbf{x})$ and $I_{2}(\mathbf{x})$ are defined in (52) and (53), respectively. Presently, Equation (54) gives us
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{1}(\mathbf{x})\right|\leq\frac{C_{f}}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{B(\mathbf{x},\gamma)}C_{3}\,d(\mathbf{x},\mathbf{t})\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t})\leq\frac{C_{f}C_{3}D_{m}}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{B(\mathbf{x},\gamma)}\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t})=O(D_{m}),$$
by conditions (C.8) and (C.1). On the other hand, Equation (55) gives us:
$$\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\left|I_{2}(\mathbf{x})\right|\leq\frac{C_{f}C_{3}}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{\bar{B}(\mathbf{x},\gamma)}d(\mathbf{x},\mathbf{t})\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t})\leq\frac{C_{f}C_{3}D_{m}}{\mathbb{E}\,\delta_{m}(\mathbf{x},\mathbf{X})}\sup_{\mathbf{x}\in S_{\mathcal{X}}^{k}}\int_{\bar{B}(\mathbf{x},\gamma)}\left|\delta_{m}(\mathbf{x},\mathbf{t})\right|\mu(d\mathbf{t})=O(D_{m}),$$
by condition (C.8). This completes the proof of the theorem. □

Author Contributions

S.B., A.N. and T.Z., conceptualization, methodology, investigation, writing—original draft, writing—review and editing. All authors contributed equally to the writing of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the three referees for a carefully and thoroughly compiled list of places where the presentation could be improved. The paper has benefited from those points.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Lemma A1 
(Theorem A, page 201, [126]). Let $f:\mathcal{X}^{k}\to\mathbb{R}$ denote a symmetric kernel fulfilling $\|f\|_{\infty}\leq c$,
$$\mathbb{E}\,f(X_{1},\ldots,X_{k})=\theta,$$
and
$$\sigma^{2}=\mathrm{Var}\,f(X_{1},\ldots,X_{k});$$
then, for $t>0$ and $n\geq k$, we infer:
$$\mathbb{P}\left\{\left|u_{n}^{(k)}(f)-\theta\right|\geq t\right\}\leq\exp\left(\frac{-[n/k]\,t^{2}}{2\sigma^{2}+\frac{2}{3}c\,t}\right).$$
Lemma A2 
(Proposition 1, [125]). If $G:S^{k}\to\mathbb{R}$ is a measurable symmetric function with $\|G\|_{\infty}=b$, then
$$\mathbb{P}\left\{n^{1/2}\left|\sum_{j=2}^{k}\binom{k}{j}u_{n}^{(j)}\big(\pi_{j,k}G\big)\right|\geq t\right\}\leq 2\exp\left(\frac{-t\,(n-1)^{1/2}}{2^{k+2}k^{k+1}b}\right).$$
Definition A1. 
A symmetric and $P^{k}$-integrable kernel $f:\mathcal{X}^{k}\to\mathbb{R}$ is P-degenerate of order $r-1$, notationally $f\in L_{2}^{r}(P^{k})$, if and only if
$$\int f(x_{1},\ldots,x_{k})\,dP^{k-r+1}(x_{r},\ldots,x_{k})=\int f\,dP^{k}$$
holds for any $x_{1},\ldots,x_{r-1}\in\mathcal{X}$, and
$$(x_{1},\ldots,x_{r})\mapsto\int f(x_{1},\ldots,x_{k})\,dP^{k-r}(x_{r+1},\ldots,x_{k})$$
is not a constant function.
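As a concrete illustration of this definition (an added example, not part of the original text), take $k=2$ and the product kernel $f(x_{1},x_{2})=x_{1}x_{2}$ under a distribution $P$ with $\int x\,dP(x)=0$ and $\int x^{2}\,dP(x)>0$; then $f$ is $P$-degenerate of order $1$ (i.e., $r=2$):

```latex
% First condition of Definition A1 with r = 2, k = 2:
\int f(x_1, x_2)\, dP(x_2)
  = x_1 \int x_2 \, dP(x_2)
  = 0
  = \int f \, dP^2
  \qquad \text{for every } x_1 \in \mathcal{X},
```

while the map $(x_{1},x_{2})\mapsto f(x_{1},x_{2})$ itself is not constant, so the second condition holds as well. This is precisely the type of degeneracy exploited in Lemma A2.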

References

  1. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1986; p. x+175. [Google Scholar] [CrossRef]
  2. Nadaraya, E.A. Nonparametric Estimation of Probability Densities and Regression Curves; Mathematics and Its Applications (Soviet Series); Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1989; Volume 20, p. x+213, Translated from the Russian by Samuel Kotz. [Google Scholar] [CrossRef]
  3. Härdle, W. Applied Nonparametric Regression; Econometric Society Monographs; Cambridge University Press: Cambridge, UK, 1990; Volume 19, p. xvi+333. [Google Scholar] [CrossRef]
  4. Wand, M.P.; Jones, M.C. Kernel Smoothing; Monographs on Statistics and Applied Probability; Chapman and Hall, Ltd.: London, UK, 1995; Volume 60, p. xii+212. [Google Scholar] [CrossRef]
  5. Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation. Density Estimation. Vol. I; Springer Series in Statistics; Springer: New York, NY, USA, 2001; p. xviii+510. [Google Scholar]
  6. Devroye, L.; Lugosi, G. Combinatorial Methods in Density Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2001; p. xii+208. [Google Scholar] [CrossRef]
  7. Jiang, J. Nonparametric Statistics. In Large Sample Techniques for Statistics; Springer International Publishing: Cham, Switzerland, 2022; pp. 379–415. [Google Scholar] [CrossRef]
  8. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 1948, 19, 293–325. [Google Scholar] [CrossRef]
  9. Stute, W. Almost sure representations of the product-limit estimator for truncated data. Ann. Statist. 1993, 21, 146–156. [Google Scholar] [CrossRef]
  10. Arcones, M.A.; Wang, Y. Some new tests for normality based on U-processes. Statist. Probab. Lett. 2006, 76, 69–82. [Google Scholar] [CrossRef]
  11. Giné, E.; Mason, D.M. Laws of the iterated logarithm for the local U-statistic process. J. Theoret. Probab. 2007, 20, 457–485. [Google Scholar] [CrossRef]
  12. Giné, E.; Mason, D.M. On local U-statistic processes and the estimation of densities of functions of several sample variables. Ann. Statist. 2007, 35, 1105–1145. [Google Scholar] [CrossRef] [Green Version]
  13. Schick, A.; Wang, Y.; Wefelmeyer, W. Tests for normality based on density estimators of convolutions. Statist. Probab. Lett. 2011, 81, 337–343. [Google Scholar] [CrossRef]
  14. Joly, E.; Lugosi, G. Robust estimation of U-statistics. Stoch. Process. Appl. 2016, 126, 3760–3773. [Google Scholar] [CrossRef]
  15. Lee, S.; Linton, O.; Whang, Y.J. Testing for stochastic monotonicity. Econometrica 2009, 77, 585–602. [Google Scholar] [CrossRef] [Green Version]
  16. Ghosal, S.; Sen, A.; van der Vaart, A.W. Testing monotonicity of regression. Ann. Statist. 2000, 28, 1054–1082. [Google Scholar] [CrossRef]
  17. Arcones, M.A.; Giné, E. Limit theorems for U-processes. Ann. Probab. 1993, 21, 1494–1542. [Google Scholar] [CrossRef]
  18. Sherman, R.P. Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Statist. 1994, 22, 439–459. [Google Scholar] [CrossRef]
  19. de la Peña, V.H.; Giné, E. Decoupling. From Dependence to Independence, Randomly Stopped Processes. U-Statistics and Processes. Martingales and beyond; Probability and its Applications (New York); Springer: New York, NY, USA, 1999; p. xvi+392. [Google Scholar] [CrossRef]
  20. Boniece, B.C.; Horváth, L.; Jacobs, P. Change point detection in high dimensional data with U-statistics. arXiv 2022, arXiv:2207.08933. [Google Scholar] [CrossRef]
  21. Kulathinal, S.; Dewan, I. Weighted U-statistics for likelihood-ratio ordering of bivariate data. Statist. Papers 2022. [Google Scholar] [CrossRef]
  22. Minsker, S. U-statistics of growing order and sub-Gaussian mean estimators with sharp constants. arXiv 2022, arXiv:2202.11842. [Google Scholar] [CrossRef]
  23. Halmos, P.R. The theory of unbiased estimation. Ann. Math. Stat. 1946, 17, 34–43. [Google Scholar] [CrossRef]
  24. von Mises, R. On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. 1947, 18, 309–348. [Google Scholar] [CrossRef]
  25. Borovkova, S.; Burton, R.; Dehling, H. Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Amer. Math. Soc. 2001, 353, 4261–4318. [Google Scholar] [CrossRef]
  26. Denker, M.; Keller, G. On U-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrsch. Verw. Geb. 1983, 64, 505–522. [Google Scholar] [CrossRef]
  27. Leucht, A. Degenerate U- and V-statistics under weak dependence: Asymptotic theory and bootstrap consistency. Bernoulli 2012, 18, 552–585. [Google Scholar] [CrossRef]
  28. Leucht, A.; Neumann, M.H. Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 2013, 65, 349–386. [Google Scholar] [CrossRef] [Green Version]
  29. Bouzebda, S.; Soukarieh, I. Renewal type bootstrap for U-process Markov chains. Markov Process. Relat. Fields 2023, 1–52. [Google Scholar]
  30. Soukarieh, I.; Bouzebda, S. Exchangeably Weighted Bootstraps of General Markov U-Process. Mathematics 2022, 10, 3745. [Google Scholar] [CrossRef]
  31. Soukarieh, I.; Bouzebda, S. Renewal type bootstrap for increasing degree U-Process A Markov Chain. J. Multivar. Anal. 2023, 195, 105143. [Google Scholar] [CrossRef]
  32. Lee, A.J. U-Statistics. Theory and Practice; Statistics: Textbooks and Monographs, Vol. 110; Marcel Dekker Inc.: New York, NY, USA, 1990; p. xii+302. [Google Scholar]
  33. Stute, W. Conditional U-statistics. Ann. Probab. 1991, 19, 812–825. [Google Scholar] [CrossRef]
  34. Nadaraja, E.A. On a regression estimate. Teor. Verojatnost. Primenen. 1964, 9, 157–159. [Google Scholar]
  35. Watson, G.S. Smooth regression analysis. Sankhyā Indian J. Stat. Ser. 1964, 26, 359–372. [Google Scholar]
  36. Sen, A. Uniform strong consistency rates for conditional U-statistics. Sankhyā Indian J. Stat. Ser. 1994, 56, 179–194. [Google Scholar]
  37. Prakasa Rao, B.L.S.; Sen, A. Limit distributions of conditional U-statistics. J. Theoret. Probab. 1995, 8, 261–301. [Google Scholar] [CrossRef]
  38. Harel, M.; Puri, M.L. Conditional U-statistics for dependent random variables. J. Multivar. Anal. 1996, 57, 84–100. [Google Scholar] [CrossRef] [Green Version]
  39. Stute, W. Symmetrized NN-conditional U-statistics. In Research Developments in Probability and Statistics; VSP: Utrecht, The Netherlands, 1996; pp. 231–237. [Google Scholar]
  40. Dony, J.; Mason, D.M. Uniform in bandwidth consistency of conditional U-statistics. Bernoulli 2008, 14, 1108–1133. [Google Scholar] [CrossRef]
  41. Bouzebda, S.; Nemouchi, B. Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Statist. 2019, 28, 169–207. [Google Scholar] [CrossRef]
  42. Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 2020, 32, 452–509. [Google Scholar] [CrossRef]
  43. Bouzebda, S.; Elhattab, I.; Nemouchi, B. On the uniform-in-bandwidth consistency of the general conditional U-statistics based on the copula representation. J. Nonparametr. Stat. 2021, 33, 321–358. [Google Scholar] [CrossRef]
  44. Bouzebda, S.; El-hadjali, T.; Ferfache, A.A. Uniform in bandwidth consistency of conditional U-statistics adaptive to intrinsic dimension in presence of censored data. Sankhya A 2022, 1–59. [Google Scholar] [CrossRef]
  45. Bouzebda, S.; Nezzal, A. Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 2022, 5, 431–533. [Google Scholar] [CrossRef]
  46. Aneiros, G.; Cao, R.; Fraiman, R.; Genest, C.; Vieu, P. Recent advances in functional data analysis and high-dimensional statistics. J. Multivar. Anal. 2019, 170, 3–9. [Google Scholar] [CrossRef]
  47. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2002; p. x+190, Methods and case studies. [Google Scholar] [CrossRef]
  48. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis. Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006; p. xx+258. [Google Scholar]
  49. Araujo, A.; Giné, E. The Central Limit Theorem for Real and Banach Valued Random Variables; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA; Chichester, UK; Brisbane, Australia, 1980; p. xiv+233. [Google Scholar]
  50. Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 681–691. [Google Scholar] [CrossRef]
  51. Bosq, D. Linear Processes in Function Spaces. Theory and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2000; Volume 149, p. xiv+283. [Google Scholar] [CrossRef]
  52. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; p. xiv+422. [Google Scholar] [CrossRef]
  53. Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949. [Google Scholar] [CrossRef]
  54. Ferraty, F.; Laksaci, A.; Tadj, A.; Vieu, P. Rate of uniform consistency for nonparametric estimates with functional variables. J. Statist. Plann. Inference 2010, 140, 335–352. [Google Scholar] [CrossRef]
  55. Kara-Zaitri, L.; Laksaci, A.; Rachdi, M.; Vieu, P. Uniform in bandwidth consistency for various kernel estimators involving functional data. J. Nonparametr. Stat. 2017, 29, 85–107. [Google Scholar] [CrossRef]
  56. Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional Z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 21. [Google Scholar] [CrossRef]
  57. Attouch, M.; Laksaci, A.; Rafaa, F. On the local linear estimate for functional regression: Uniform in bandwidth consistency. Comm. Statist. Theory Methods 2019, 48, 1836–1853. [Google Scholar] [CrossRef]
  58. Ling, N.; Meng, S.; Vieu, P. Uniform consistency rate of kNN regression estimation for functional time series data. J. Nonparametr. Stat. 2019, 31, 451–468. [Google Scholar] [CrossRef]
  59. Bouzebda, S.; Chaouch, M.; Laïb, N. Limiting law results for a class of conditional mode estimates for functional stationary ergodic data. Math. Methods Statist. 2016, 25, 168–195. [Google Scholar] [CrossRef]
  60. Mohammedi, M.; Bouzebda, S.; Laksaci, A. The consistency and asymptotic normality of the kernel type expectile regression estimator for functional data. J. Multivar. Anal. 2021, 181, 24. [Google Scholar] [CrossRef]
  61. Bouzebda, S.; Mohammedi, M.; Laksaci, A. The k-Nearest Neighbors method in single index regression model for functional quasi-associated time series data. Rev. Mat. Complut. 2022, 1–30. [Google Scholar] [CrossRef]
  62. Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. 2022, 1–56. [Google Scholar] [CrossRef]
  63. Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Nonparametric estimation of expectile regression in functional dependent data. J. Nonparametr. Stat. 2022, 34, 250–281. [Google Scholar] [CrossRef]
  64. Almanjahie, I.M.; Bouzebda, S.; Chikr Elmezouar, Z.; Laksaci, A. The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Stat. Risk Model. 2022, 38, 47–63. [Google Scholar] [CrossRef]
  65. Litimein, O.; Laksaci, A.; Mechab, B.; Bouzebda, S. Local linear estimate of the functional expectile regression. Statist. Probab. Lett. 2023, 192, 109682. [Google Scholar] [CrossRef]
  66. Chentsov, N.N. Evaluation of an unknown distribution density from observations. Sov. Math., Dokl. 1962, 3, 1559–1562. [Google Scholar]
  67. Watson, G.S.; Leadbetter, M.R. Hazard analysis. II. Sankhyā Indian J. Stat. Ser. A 1964, 26, 101–116. [Google Scholar]
  68. Winter, B.B. Rate of strong consistency of two nonparametric density estimators. Ann. Statist. 1975, 3, 759–766. [Google Scholar] [CrossRef]
  69. Susarla, V.; Walter, G. Estimation of a multivariate density function using delta sequences. Ann. Statist. 1981, 9, 347–355. [Google Scholar] [CrossRef]
  70. Prakasa Rao, B.L.S. Density estimation for Markov processes using delta-sequences. Ann. Inst. Statist. Math. 1978, 30, 321–328. [Google Scholar] [CrossRef]
  71. Ahmad, I.A. A note on nonparametric density estimation for dependent variables using a delta sequence. Ann. Inst. Statist. Math. 1981, 33, 247–254. [Google Scholar] [CrossRef]
  72. Basu, A.K.; Sahoo, D.K. Speed of convergence in density estimation by delta sequence and estimation of mode. Calcutta Statist. Assoc. Bull. 1985, 34, 1–13. [Google Scholar] [CrossRef]
  73. Nolan, D.; Marron, J.S. Uniform consistency of automatic and location-adaptive delta-sequence estimators. Probab. Theory Relat. Fields 1989, 80, 619–632. [Google Scholar] [CrossRef]
  74. Isogai, E. Nonparametric estimation of a regression function by delta sequences. Ann. Inst. Statist. Math. 1990, 42, 699–708. [Google Scholar] [CrossRef]
  75. Marron, J.S.; Härdle, W. Random approximations to some measures of accuracy in nonparametric curve estimation. J. Multivar. Anal. 1986, 20, 91–113.
  76. Cristóbal Cristóbal, J.A.; Faraldo Roca, P.; González Manteiga, W. A class of linear regression parameter estimators constructed by nonparametric estimation. Ann. Statist. 1987, 15, 603–609.
  77. Prakasa Rao, B.L.S. Nonparametric density estimation for functional data by delta sequences. Braz. J. Probab. Stat. 2010, 24, 468–478.
  78. Ouassou, I.; Rachdi, M. Regression operator estimation by delta-sequences method for functional data and its applications. AStA Adv. Stat. Anal. 2012, 96, 451–465.
  79. Bouzebda, S. On the strong approximation of bootstrapped empirical copula processes with applications. Math. Methods Statist. 2012, 21, 153–188.
  80. Belitser, E. Local minimax pointwise estimation of a multivariate density. Statist. Neerl. 2000, 54, 351–365.
  81. Walter, G.G. A general approach to classification problems. Inform. Sci. 1983, 30, 67–77.
  82. Bouzebda, S.; Nezzal, A. Asymptotic properties of conditional U-statistics using delta sequences. Comm. Statist. Theory Methods 2023, 1–45.
  83. Fu, K.A. An application of U-statistics to nonparametric functional data analysis. Comm. Statist. Theory Methods 2012, 41, 1532–1542.
  84. Dabo-Niang, S. Kernel density estimator in an infinite-dimensional space with a rate of convergence in the case of diffusion process. Appl. Math. Lett. 2004, 17, 381–386.
  85. Prakasa Rao, B.L.S. Statistical Inference for Diffusion Type Processes; Kendall’s Library of Statistics; Arnold: London, UK, 1999; Volume 8.
  86. Prakasa Rao, B.L.S. Semimartingales and Their Statistical Inference; Monographs on Statistics and Applied Probability; Chapman & Hall/CRC: Boca Raton, FL, USA, 1999; Volume 83.
  87. Rachdi, M.; Monsan, V. Asymptotic properties of p-adic spectral estimates of second order. J. Combin. Inform. Syst. Sci. 1999, 24, 113–142.
  88. Prakasa Rao, B.L.S. Nonparametric Functional Estimation; Probability and Mathematical Statistics; Academic Press (Harcourt Brace Jovanovich, Publishers): Orlando, FL, USA, 1983; p. xiv+522.
  89. Ferraty, F.; Vieu, P. Erratum of: Non-parametric models for functional data, with application in regression, time-series prediction and curve discrimination. J. Nonparametr. Stat. 2008, 20, 187–189.
  90. Maouloud, S.M.O. Quelques Aspects Fonctionnels et Non Fonctionnels des Grandes Déviations et des Déviations Modérées en Estimation Non-Paramétrique. Ph.D. Thesis, Université de Reims Champagne-Ardenne, Reims, France, 2007.
  91. Walter, G.G. Expansions of distributions. Trans. Am. Math. Soc. 1965, 116, 492–510.
  92. Walter, G.G.; Blum, J. Probability density estimation using delta sequences. Ann. Statist. 1979, 7, 328–340.
  93. Deheuvels, P. One bootstrap suffices to generate sharp uniform bounds in functional estimation. Kybernetika 2011, 47, 855–865.
  94. Bouzebda, S.; Taachouche, N. Rates of the strong uniform consistency for the kernel-type regression function estimators with general kernels on manifolds. Math. Methods Statist. 2022, 1–62.
  95. Chokri, K.; Bouzebda, S. Uniform-in-bandwidth consistency results in the partially linear additive model components estimation. Comm. Statist. Theory Methods 2022, 2153605.
  96. Maillot, B.; Viallon, V. Uniform limit laws of the logarithm for nonparametric estimators of the regression function in presence of censored data. Math. Methods Statist. 2009, 18, 159–184.
  97. Kohler, M.; Máthé, K.; Pintér, M. Prediction from randomly right censored data. J. Multivar. Anal. 2002, 80, 73–100.
  98. Carbonez, A.; Györfi, L.; van der Meulen, E.C. Partitioning-estimates of a regression function under random censoring. Statist. Decis. 1995, 13, 21–37.
  99. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 1958, 53, 457–481.
  100. Brunel, E.; Comte, F. Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Statist. 2006, 15, 233–255.
  101. Datta, S.; Bandyopadhyay, D.; Satten, G.A. Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scand. J. Statist. 2010, 37, 680–700.
  102. Stute, W.; Wang, J.L. Multi-sample U-statistics for censored data. Scand. J. Statist. 1993, 20, 369–374.
  103. Chen, Y.; Datta, S. Adjustments of multi-sample U-statistics to right censored data and confounding covariates. Comput. Statist. Data Anal. 2019, 135, 1–14.
  104. Yuan, A.; Giurcanu, M.; Luta, G.; Tan, M.T. U-statistics with conditional kernels for incomplete data models. Ann. Inst. Statist. Math. 2017, 69, 271–302.
  105. Bouzebda, S.; El-hadjali, T. Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametr. Stat. 2020, 32, 864–914.
  106. Földes, A.; Rejto, L. A LIL type result for the product limit estimator. Z. Wahrsch. Verw. Geb. 1981, 56, 75–86.
  107. Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93.
  108. Stute, W. Universally consistent conditional U-statistics. Ann. Statist. 1994, 22, 460–473.
  109. Stute, W. Lp-convergence of conditional U-statistics. J. Multivar. Anal. 1994, 51, 71–82.
  110. Bellet, A.; Habrard, A.; Sebban, M. A survey on metric learning for feature vectors and structured data. arXiv 2013, arXiv:1306.6709.
  111. Clémençon, S.; Colin, I.; Bellet, A. Scaling-up empirical risk minimization: Optimization of incomplete U-statistics. J. Mach. Learn. Res. 2016, 17, 2682–2717.
  112. Jin, R.; Wang, S.; Zhou, Y. Regularized distance metric learning: Theory and algorithm. In Advances in Neural Information Processing Systems; Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., Culotta, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2009; Volume 22.
  113. Bellet, A.; Habrard, A. Robustness and generalization for metric learning. Neurocomputing 2015, 151, 259–267.
  114. Cao, Q.; Guo, Z.C.; Ying, Y. Generalization bounds for metric and similarity learning. Mach. Learn. 2016, 102, 115–132.
  115. Blum, J.R.; Kiefer, J.; Rosenblatt, M. Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist. 1961, 32, 485–498.
  116. Bouzebda, S. General tests of independence based on empirical processes indexed by functions. Stat. Methodol. 2014, 21, 59–87.
  117. Bouzebda, S. Some new multivariate tests of independence. Math. Methods Statist. 2011, 20, 192–205.
  118. Bouzebda, S.; Zari, T. Asymptotic behavior of weighted multivariate Cramér-von Mises-type statistics under contiguous alternatives. Math. Methods Statist. 2013, 22, 226–252.
  119. Bouzebda, S.; El Faouzi, N.E.; Zari, T. On the multivariate two-sample problem using strong approximations of empirical copula processes. Comm. Statist. Theory Methods 2011, 40, 1490–1509.
  120. Bouzebda, S. General tests of conditional independence based on empirical processes indexed by functions. Jpn. J. Stat. Data Sci. 2023, 1–60.
  121. Bergsma, W.; Dassios, A. A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli 2014, 20, 1006–1028.
  122. Bouzebda, S.; Soukarieh, I. Non-parametric conditional U-processes for locally stationary functional random fields under stochastic sampling design. Mathematics 2023, 11, 16.
  123. Didi, S.; Bouzebda, S. Wavelet density and regression estimators for continuous time functional stationary and ergodic processes. Mathematics 2022, 10, 4356.
  124. Rachdi, M.; Vieu, P. Nonparametric regression for functional data: Automatic smoothing parameter selection. J. Statist. Plann. Inference 2007, 137, 2784–2801.
  125. Arcones, M.A. A Bernstein-type inequality for U-statistics and U-processes. Stat. Probab. Lett. 1995, 22, 239–247.
  126. Serfling, R.J. Approximation Theorems of Mathematical Statistics; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1980; p. xiv+371.

Bouzebda, S.; Nezzal, A.; Zari, T. Uniform Consistency for Functional Conditional U-Statistics Using Delta-Sequences. Mathematics 2023, 11, 161. https://doi.org/10.3390/math11010161

