Towards fair graph-based machine learning software: unveiling and mitigating graph model bias

Published in Journal of AI and Ethics, 2025

Machine Learning (ML) software is increasingly influencing decisions that impact individuals’ lives. However, some of these decisions discriminate against certain social groups defined by sensitive attributes (e.g., gender or race), thereby introducing algorithmic bias. This has elevated software fairness bugs to an increasingly significant concern for software engineering (SE). Yet most existing bias mitigation works focus on tabular data; graph data, which is widely prevalent in real-world applications, has received comparatively little attention. This paper therefore sheds light on how biases within graph data affect the fairness of ML models and how to detect biases in graph-based software. Subsequently, we introduce Fair Graph Redistribution and Generation (FGRG), a novel fair imbalanced node learning technique designed to enhance the fairness of graph-based ML software. Specifically, FGRG first generates unbiased embeddings for each node. It then identifies node similarities within the embedding space and generates new nodes to rebalance the internal distribution, ensuring that subgroups with different sensitive attributes have balanced representation across both the positive and negative classes. Experiments on 8 real-world graphs across 3 fairness and 4 performance measures show that our framework significantly outperforms state-of-the-art baselines while achieving comparable prediction performance. In particular, FGRG beats the trade-off baseline in 79.16% of the evaluated fairness cases and 62.5% of the evaluated performance cases.
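
To make the rebalancing idea in the abstract concrete, below is a minimal illustrative sketch, not the paper's actual FGRG implementation. It assumes node embeddings have already been computed (e.g., by a GNN encoder) and applies a SMOTE-style interpolation within each (sensitive attribute, class) subgroup so that all subgroups reach the size of the largest one. The function name `rebalance_subgroups` and all details of the generation step are hypothetical.

```python
import numpy as np

def rebalance_subgroups(embeddings, labels, sensitive, rng=None):
    """Illustrative sketch: oversample under-represented (sensitive attribute,
    class) subgroups by interpolating between each sampled node and its nearest
    neighbour in the embedding space, until every subgroup matches the largest."""
    rng = np.random.default_rng(rng)

    # Group node indices by (sensitive attribute, class label).
    groups = {}
    for idx, (y, s) in enumerate(zip(labels, sensitive)):
        groups.setdefault((s, y), []).append(idx)
    target = max(len(v) for v in groups.values())

    new_emb, new_y, new_s = [], [], []
    for (s, y), idxs in groups.items():
        need = target - len(idxs)
        if need <= 0 or len(idxs) < 2:
            continue  # subgroup already balanced, or too small to interpolate
        sub = np.asarray(embeddings)[idxs]
        for _ in range(need):
            i = rng.integers(len(idxs))
            # Nearest neighbour of node i within its own subgroup.
            dists = np.linalg.norm(sub - sub[i], axis=1)
            dists[i] = np.inf
            j = int(np.argmin(dists))
            # Synthesize a new node embedding on the segment between the pair.
            lam = rng.random()
            new_emb.append(sub[i] + lam * (sub[j] - sub[i]))
            new_y.append(y)
            new_s.append(s)

    if not new_emb:
        return np.asarray(embeddings), np.asarray(labels), np.asarray(sensitive)
    return (np.vstack([embeddings, new_emb]),
            np.concatenate([labels, new_y]),
            np.concatenate([sensitive, new_s]))
```

In this sketch, a downstream classifier would then be trained on the augmented embeddings and labels; the unbiased embedding generation step that FGRG performs beforehand is deliberately omitted here.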