Cite this article: | 张亦翔,虞佳淼,王慧芳,费正明,罗华峰,宣佳卓.基于类内空间夹角约束和小样本采样的错误标签数据识别方法[J].电力自动化设备,2025,45(4):169-176,185 |
| ZHANG Yixiang, YU Jiamiao, WANG Huifang, FEI Zhengming, LUO Huafeng, XUAN Jiazhuo. Mislabeled data identification method based on intra-class spatial angle constraint and few-sample sampling[J]. Electric Power Automation Equipment, 2025, 45(4): 169-176, 185 |
|
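Abstract: |
Text classification sample sets in the electric power domain often contain mislabeled data whose proportion is hard to determine. As a result, the accuracy of classification models trained with neural networks is difficult to raise through algorithmic improvements alone, and an efficient, accurate way of building high-quality datasets is urgently needed. To address this, an additive angular margin penalty is introduced and a mislabeled data identification method based on an intra-class spatial angle constraint and few-sample sampling is proposed. The method defines the intra-class spatial angle of a feature vector and uses it as the criterion for the confidence of model predictions, so that the confidence has strong geometric characteristics and samples are easier to tell apart. The effect of mislabeled data on the distribution of the intra-class spatial angle of feature vectors is analyzed, an additive angular margin constraint is applied to the intra-class spatial angle to separate the mislabeled data, and an automatic method for selecting the confidence threshold is proposed. A few-sample sampling method is proposed to further improve the identification of mislabeled data. Experiments on the public THUCNews sample set and on a power field operation text dataset verify the effectiveness of the proposed method. |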
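Keywords: text classification in the electric power domain; mislabeled data identification; intra-class spatial angle; additive angular margin penalty; few-sample sampling |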
DOI:10.16081/j.epae.202412010 |
CLC number: TM73 |
Funding: Science and Technology Project of the East China Branch of State Grid Corporation of China (520800230008) |
|
Mislabeled data identification method based on intra-class spatial angle constraint and few-sample sampling |
ZHANG Yixiang1, YU Jiamiao1, WANG Huifang1, FEI Zhengming2, LUO Huafeng3, XUAN Jiazhuo3
|
1. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China; 2. State Grid East China Branch, Shanghai 200120, China; 3. Electric Power Research Institute of Zhejiang Electric Power Company Limited, Hangzhou 310014, China
|
Abstract: |
Text classification datasets in the electric power domain often contain mislabeled data in proportions that are hard to determine, which prevents the accuracy of neural-network-based classification models from being substantially improved through better algorithms alone. Efficient and accurate methods for constructing high-quality datasets are therefore urgently needed. To this end, an additive angular margin penalty is introduced, and a mislabeled data identification method based on an intra-class spatial angle constraint and few-sample sampling is proposed. The method introduces the concept of the intra-class spatial angle of a feature vector and uses it as the criterion for evaluating the confidence of model predictions, which gives the confidence strong geometric characteristics and enhances the discrimination between samples. The influence of mislabeled data on the distribution of the intra-class spatial angle of feature vectors is analyzed, an additive angular margin constraint is applied to the intra-class spatial angle to separate the mislabeled data, and an automatic selection method for the confidence threshold is proposed. A few-sample sampling method is proposed to further improve the identification of mislabeled data. Experiments on the public THUCNews dataset and a power field operation text dataset verify the effectiveness of the proposed method. |
Key words: text classification in electric power professional domain; mislabeled data identification; intra-class spatial angle; additive angular margin penalty; few-sample sampling |
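
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the intra-class spatial angle is the angle between a sample's feature vector and the weight vector of its labeled class, and that the additive angular margin is applied in the ArcFace style (the margin is added to that angle before the cosine is taken). The names `angle_deg`, `margin_logits`, `flag_suspects`, the margin and scale values, and the fixed 45-degree threshold are all hypothetical; the paper selects the threshold automatically, and its few-sample sampling step is not shown here.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def margin_logits(feature, class_weights, label, margin_deg=20.0, scale=16.0):
    """ArcFace-style logits: scale * cos(theta_j), with the margin added
    to the angle of the labeled class only (assumed formulation)."""
    logits = []
    for j, w in enumerate(class_weights):
        theta = math.radians(angle_deg(feature, w))
        if j == label:
            # Simplification: cap the penalized angle at pi.
            theta = min(theta + math.radians(margin_deg), math.pi)
        logits.append(scale * math.cos(theta))
    return logits


def flag_suspects(features, labels, class_weights, threshold_deg):
    """Indices of samples whose angle to their labeled class vector
    exceeds the threshold (a larger angle means lower confidence)."""
    return [
        i
        for i, (f, y) in enumerate(zip(features, labels))
        if angle_deg(f, class_weights[y]) > threshold_deg
    ]


# Toy 2-D data: two classes whose weight vectors lie along the axes.
class_weights = [[1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.2, 1.0], [0.1, 0.95]]
labels = [0, 1, 0]  # the third sample sits near class 1 but is labeled 0

suspects = flag_suspects(features, labels, class_weights, threshold_deg=45.0)
print(suspects)  # prints [2]
```

In this toy setting the third sample lies about 84 degrees from the vector of its labeled class, so it is the only one flagged. During training, the margin in `margin_logits` lowers the logit of the labeled class, which would push correctly labeled samples toward smaller intra-class angles and leave mislabeled ones easier to separate by angle.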