The paper "CIRCUS: A Causal Intervention-based Framework for Enhancing Counterfactual Fairness in Trained Classifiers," written by Yang Qifen, a Ph.D. student in our laboratory, has been accepted by the leading international journal IEEE Transactions on Neural Networks and Learning Systems. The paper is scheduled for formal publication in 2026.
Abstract—Ensuring model fairness to prevent potential biases based on any sensitive attribute is crucial for the societal acceptance of artificial intelligence in critical applications. Among various fairness concepts, counterfactual fairness has gained prominence because it is grounded in causal inference. This concept requires that an individual's prediction in the original world remain consistent with that in the counterfactual world where the sensitive feature value is modified. In this article, we aim to mitigate counterfactual biases of the model through causal intervention. Specifically, we first achieve effective causal intervention and counterfactual generation by proposing the Causal Inference Tabular Generative Adversarial Network (CITGAN) architecture. Unlike prior approaches based on variational autoencoders, which inherently lack structural causal model fidelity due to simultaneous generation, CITGAN strictly enforces causal consistency via an end-to-end topological generation process. By integrating exogenous variable inference with sequential generation, CITGAN ensures that functional dependencies are structurally preserved. Building on CITGAN, we design CIRCUS, a causal intervention-based framework that intuitively enhances counterfactual fairness in trained classifiers. CIRCUS generates counterfactually discriminatory samples via causal intervention, guided by gradients and feature contributions, and then applies bias-correction preprocessing to their labels for classifier retraining. Experimental results demonstrate that CIRCUS effectively enhances counterfactual fairness while maintaining robust classification performance. Specifically, for the Deep Neural Network (DNN) model, the MMDL and MMDK values are reduced by an average of 39.7% and 40.4%, respectively, compared with the second-best result; for the Residual Network (ResNet) model, the reductions are 56.7% and 54.5%, respectively.
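The counterfactual-generation idea the abstract describes (inferring exogenous variables, then regenerating features in topological order after an intervention on the sensitive attribute) follows the classic abduction-action-prediction pattern. The minimal sketch below illustrates that pattern on an assumed toy linear structural causal model A → X → Y; the structural equations and coefficients are illustrative assumptions, not the paper's actual CITGAN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy structural equations (A -> X -> Y):
#   A ~ Bernoulli(0.5)   sensitive attribute (root node)
#   X = 2*A + U_x        feature caused by A
#   Y = 3*X + U_y        outcome caused by X
n = 1000
A = rng.integers(0, 2, size=n).astype(float)
U_x = rng.normal(size=n)
U_y = rng.normal(size=n)
X = 2 * A + U_x
Y = 3 * X + U_y

# Step 1 (abduction): infer the exogenous variables from observed data.
U_x_hat = X - 2 * A
U_y_hat = Y - 3 * X

# Step 2 (action): intervene on the sensitive attribute, do(A := 1 - A).
A_cf = 1 - A

# Step 3 (prediction): regenerate descendants in topological order,
# reusing the inferred exogenous variables so the functional
# dependencies of the SCM are preserved.
X_cf = 2 * A_cf + U_x_hat
Y_cf = 3 * X_cf + U_y_hat
```

A counterfactual-fairness check on a trained classifier would then compare its predictions on (A, X) against those on (A_cf, X_cf); a large gap indicates the kind of counterfactual bias CIRCUS is designed to correct.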