The human voice conveys unique characteristics of an individual, making voice biometrics a key technology for verifying identities in various industries. Despite the impressive progress of speaker recognition systems in terms of accuracy, a number of ethical and legal concerns have been raised, specifically relating to the fairness of such systems. Our work aims to explore the disparity in performance achieved by state-of-the-art deep speaker recognition systems when different groups of individuals characterized by a common sensitive attribute (e.g., gender) are considered. In order to mitigate the unfairness we uncovered by means of an exploratory study, we investigate whether balancing the representation of the different groups of individuals in the training set can lead to a more equal treatment of these demographic groups. Experiments on two state-of-the-art neural architectures and a large-scale public dataset show that models trained with demographically balanced training sets exhibit fairer behavior across different groups, while still being accurate. Our study is expected to provide a solid basis for instilling beyond-accuracy objectives (e.g., fairness) in speaker recognition.
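The demographic balancing described above can be sketched as a simple downsampling step: every group defined by the sensitive attribute is reduced to the size of the smallest group before training. The function and record layout below are illustrative assumptions, not the actual pipeline used in the study.

```python
import random
from collections import defaultdict

def balance_by_attribute(records, attribute, seed=0):
    """Downsample so that every group of `attribute` contributes the
    same number of records (the size of the smallest group)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    smallest = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, smallest))
    return balanced

# Hypothetical speaker records; 'gender' plays the role of the
# sensitive attribute (here 10 female vs. 20 male speakers).
speakers = [
    {"id": i, "gender": "female" if i % 3 == 0 else "male"}
    for i in range(30)
]
balanced = balance_by_attribute(speakers, "gender")
```

After balancing, both groups contribute the same number of speakers (10 each in this toy example), which is the condition the balanced training sets in the study are meant to satisfy.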
Here you can find some statistics about the dataset.
1,046,078
The data we collected includes individuals from different countries who speak different languages (i.e., Chinese, French, German, English, and Kabyle).
12,057
Each speaker declared some sensitive attributes, i.e., their accent, age, and gender.
Speaker distribution across languages.
Age distribution across speakers.
Utterance distribution across speakers.
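Distributions like the ones shown above can be recomputed from the dataset's speaker metadata. The snippet below is a minimal sketch assuming a CSV layout with `speaker_id`, `language`, `gender`, and `age` columns; the actual FairVoice metadata files may use different column names.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata snippet standing in for the real metadata file.
metadata_csv = """speaker_id,language,gender,age
s001,English,female,thirties
s002,French,male,twenties
s003,English,male,thirties
s004,German,female,forties
"""

rows = list(csv.DictReader(io.StringIO(metadata_csv)))
by_language = Counter(row["language"] for row in rows)  # speakers per language
by_gender = Counter(row["gender"] for row in rows)      # speakers per gender
by_age = Counter(row["age"] for row in rows)            # speakers per age band
```

With the real metadata file, replacing the in-memory string with `open(path)` yields the per-language, per-gender, and per-age counts behind the plots.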
Here you can find the code and the dataset used for our experiments. If you want to download the dataset, please fill out the form at the following link.
The FairVoice dataset is available for download for commercial and research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners. A complete version of the license can be found here.
Please contact the authors below if you have any questions regarding the dataset.
Please cite the following if you make use of the dataset.