    Citation: Liang Bin, Li Guanghui, Dai Chenglong. G-mean Weighted Classification Method for Imbalanced Data Stream with Concept Drift[J]. Journal of Computer Research and Development, 2022, 59(12): 2844-2857. DOI: 10.7544/issn1000-1239.20210471

    G-mean Weighted Classification Method for Imbalanced Data Stream with Concept Drift

      Abstract: Concept drift and class imbalance in data streams seriously degrade the performance and stability of data stream classification algorithms. To address both problems in binary data stream classification, this paper proposes OGUEIL (online G-mean update ensemble for imbalance learning), an online G-mean weighted ensemble classification method for imbalanced data streams with concept drift. OGUEIL introduces an online update mechanism for component classifier weights into block-based ensemble classification and combines it with hybrid resampling and an adaptive sliding window. Built on an ensemble learning framework, the method updates every component classifier and its weight online whenever a new instance arrives, and randomly oversamples minority-class instances at the same time. Each component classifier's weight is determined by its G-mean performance on the most recent instances, which is computed incrementally using a time decay factor. In addition, OGUEIL periodically constructs a class-balanced dataset from the data in the current sliding window, trains a new candidate classifier on it, and selectively adds that classifier to the ensemble. Experimental results on both real-world and synthetic datasets show that the overall performance of the proposed method is better than that of the other compared methods.
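
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update formula, so the decayed-count scheme below, the class name `DecayedGMean`, the helper `weighted_vote`, and the default `decay` value are all assumptions made for illustration. The sketch tracks, for one binary classifier, a time-decayed recall per class and reports the G-mean as the geometric mean of the two recalls; an ensemble could then use that value as the classifier's voting weight.

```python
import math


class DecayedGMean:
    """Time-decayed G-mean of one binary classifier (labels 0 and 1).

    Hypothetical helper: the decay scheme is an assumption, not the
    formula from the paper.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.correct = [0.0, 0.0]  # decayed count of correct predictions per class
        self.total = [0.0, 0.0]    # decayed count of observed instances per class

    def update(self, y_true, y_pred):
        # Assumption: only the statistics of the observed class are decayed,
        # so a long run of majority instances does not erase minority recall.
        hit = 1.0 if y_pred == y_true else 0.0
        self.total[y_true] = self.decay * self.total[y_true] + 1.0
        self.correct[y_true] = self.decay * self.correct[y_true] + hit

    def value(self):
        # Recall per class; a class that has not been seen yet counts as 0.
        recalls = [c / t if t > 0 else 0.0
                   for c, t in zip(self.correct, self.total)]
        return math.sqrt(recalls[0] * recalls[1])


def weighted_vote(predictions, weights):
    """Combine binary predictions using per-classifier weights."""
    score = [0.0, 0.0]
    for pred, weight in zip(predictions, weights):
        score[pred] += weight
    return 1 if score[1] > score[0] else 0


# Usage: feed (true label, predicted label) pairs as instances arrive.
tracker = DecayedGMean(decay=0.5)
for y_true, y_pred in [(1, 1), (1, 0), (0, 0), (0, 0)]:
    tracker.update(y_true, y_pred)
gmean_weight = tracker.value()
```

Because the G-mean is zero whenever either class recall is zero, a classifier that ignores the minority class receives no voting weight under this scheme, which is the motivation for weighting by G-mean rather than accuracy on imbalanced streams.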

       
