
Statistica Sinica 33 (2023), 2787-2808

MEASURING, TESTING, AND IDENTIFYING
HETEROGENEITY OF LARGE PARALLEL DATASETS

Liuhua Peng, Guanghui Wang and Changliang Zou

The University of Melbourne, East China Normal University
and Nankai University

Abstract: When working with large parallel data sets, it is necessary to check whether they are generated from different regression models before conducting further modeling, estimation, and inference. We propose a novel metric for such heterogeneity based on a projection strategy. We then use this metric to construct a new, fully data-driven test for the equivalence of a large number of unknown regression models. We also establish the asymptotic normality of the proposed test statistic, and apply the test to identify outlying data sets whose regression models deviate from the majority. Extensive numerical studies demonstrate that our methods perform satisfactorily.

Key words and phrases: Heterogeneity, outlier detection, parallel data sets, projections, U-statistics
