The paper "DBCGM: A Granular Model for Big Data Classification based on Data Bisection and Cascade Weighted Clustering," written by laboratory PhD student Jiande Huang (黄建德), has been accepted by IEEE Transactions on Knowledge and Data Engineering, a Class A international journal recommended by the China Computer Federation (CCF). The paper will be formally published in 2025.
Abstract—Classifying and modeling data has become increasingly challenging as system complexity and data volumes grow and as the demand for accurate and reliable models rises. Information granules are a fundamental component of Granular Computing (GrC) and play a crucial role in human cognition. In this study, we develop a granular classifier, DBCGM, that delivers models with improved accuracy and efficiency for big data classification. DBCGM works in four steps. First, we construct a 1-D index for each point and apply a context-based data bisection method to obtain non-overlapping subsets. These disjoint subsets improve both the quality and the efficiency of big data clustering and make it possible to process the entire dataset simultaneously. Next, we propose a cascade weighted clustering (CWC) algorithm that generates numeric prototypes from the obtained subsets. Then, following the principle of justification granularity (PJG), the numeric prototypes are refined into information granules. Finally, the classifier is expressed as a weighted sum of the values of the spatial relationships between the input instance and all the information granules. We evaluate DBCGM in terms of accuracy, V-measure, and execution time, comparing it with benchmark classifiers and three big data granulating methods. Experimental results on both synthetic and public datasets show that DBCGM outperforms the existing methods: on average, it reduces running time by 9.15% and improves V-measure and accuracy by 5.62% and 10.64%, respectively.
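The first two steps (indexing/bisection, then clustering the subsets into numeric prototypes) can be illustrated with a minimal sketch. The abstract does not define the 1-D index, the context-based bisection rule, or the CWC weighting, so everything below is an assumption. The hypothetical `one_d_index` uses each point's distance to the global mean. The hypothetical `bisect_data` halves the index-sorted data recursively. The hypothetical `cascade_weighted_clustering` is a simple stand-in that runs k-means inside each subset and then a size-weighted k-means over the local centroids. It is not the authors' CWC algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def one_d_index(points: Sequence[Point]) -> List[float]:
    """Hypothetical 1-D index: Euclidean distance of each point to the global mean."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [math.dist(p, mean) for p in points]


def bisect_data(points: Sequence[Point], depth: int) -> List[List[Point]]:
    """Sort points by the 1-D index, then halve recursively into at most 2**depth disjoint subsets."""
    index = one_d_index(points)
    ordered = [points[i] for i in sorted(range(len(points)), key=index.__getitem__)]

    def split(chunk: List[Point], level: int) -> List[List[Point]]:
        if level == 0 or len(chunk) < 2:
            return [chunk]
        mid = len(chunk) // 2
        return split(chunk[:mid], level - 1) + split(chunk[mid:], level - 1)

    return split(ordered, depth)


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 100) -> Tuple[List[Point], List[float]]:
    """Lloyd's k-means in which each point pulls its center in proportion to its weight."""
    dim = len(points[0])
    # Deterministic farthest-first initialisation (no randomness).
    centers: List[Point] = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))

    def assign(cs: List[Point]) -> List[int]:
        return [min(range(k), key=lambda c: math.dist(p, cs[c])) for p in points]

    for _ in range(iters):
        labels = assign(centers)
        new_centers: List[Point] = []
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
            else:
                new_centers.append(tuple(sum(p[d] * w for p, w in members) / total
                                         for d in range(dim)))
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)
    sizes = [sum(w for w, lab in zip(weights, labels) if lab == c) for c in range(k)]
    return centers, sizes


def cascade_weighted_clustering(subsets: Sequence[Sequence[Point]], k_local: int,
                                k_global: int) -> List[Point]:
    """Stand-in for CWC: k-means inside each subset, then size-weighted k-means on the centroids."""
    prototypes: List[Point] = []
    weights: List[float] = []
    for subset in subsets:
        centers, sizes = weighted_kmeans(subset, [1.0] * len(subset), min(k_local, len(subset)))
        for center, size in zip(centers, sizes):
            if size > 0:  # drop empty local clusters
                prototypes.append(center)
                weights.append(size)
    centers, _ = weighted_kmeans(prototypes, weights, min(k_global, len(prototypes)))
    return centers


if __name__ == "__main__":
    blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
    blob_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(5)]
    subsets = bisect_data(blob_a + blob_b, depth=2)
    print([len(s) for s in subsets])
    print(cascade_weighted_clustering(subsets, k_local=2, k_global=2))
```

Weighting each local centroid by its cluster size means the second-level centers equal the means of the underlying points they summarise, so the subsets can be clustered independently without losing that information. Farthest-first initialisation only serves to keep the example deterministic.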
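The last two steps (refining prototypes into granules under PJG, then classifying by a weighted sum of spatial relationships) can likewise be sketched under stated assumptions. Here a granule is a hypersphere around a numeric prototype. Its radius maximises coverage × specificity, a common reading of the principle of justification granularity, with a linear specificity 1 − r/r_max assumed. The hypothetical `relation` value and the per-class weighted sum in `classify` are illustrative choices, not the formulas used in DBCGM.

```python
import bisect
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Granule:
    center: Point   # numeric prototype the granule is built around
    radius: float   # radius selected by the PJG-style search below
    label: str      # class the granule represents
    weight: float   # hypothetical weight: coverage achieved at the chosen radius


def justify_granule(center: Point, data: Sequence[Point], label: str) -> Granule:
    """Pick the radius r that maximises coverage(r) * specificity(r).

    Assumptions: coverage(r) is the share of same-class points within r of the center,
    specificity(r) = 1 - r / r_max, and r_max is the distance to the farthest such point.
    """
    dists = sorted(math.dist(center, x) for x in data)
    r_max = dists[-1] or 1.0  # guard against division by zero if all points coincide
    best_r, best_value, best_cov = 0.0, -1.0, 0.0
    for r in dists:  # coverage is a step function, so the optimum lies at an observed distance
        coverage = bisect.bisect_right(dists, r) / len(dists)
        value = coverage * (1.0 - r / r_max)
        if value > best_value:
            best_r, best_value, best_cov = r, value, coverage
    return Granule(tuple(center), best_r, label, best_cov)


def relation(x: Point, g: Granule) -> float:
    """Hypothetical spatial-relationship value: 1 inside the granule, decaying with distance outside."""
    return 1.0 / (1.0 + max(0.0, math.dist(x, g.center) - g.radius))


def classify(x: Point, granules: Sequence[Granule]) -> str:
    """Weighted sum of relationship values per class (weights normalised per class); highest score wins."""
    scores: Dict[str, float] = {}
    totals: Dict[str, float] = {}
    for g in granules:
        scores[g.label] = scores.get(g.label, 0.0) + g.weight * relation(x, g)
        totals[g.label] = totals.get(g.label, 0.0) + g.weight
    return max(scores, key=lambda c: scores[c] / totals[c])


if __name__ == "__main__":
    blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
    blob_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(5)]
    granules = [justify_granule((0.2, 0.2), blob_a, "a"),
                justify_granule((10.2, 10.2), blob_b, "b")]
    print(classify((0.5, 0.3), granules), classify((9.0, 9.5), granules))
```

Normalising the weights within each class keeps a class that happens to own many granules from winning by count alone. A plain unnormalised sum would be an equally valid reading of the abstract.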