Team Members: Joshua Riznyk, Harry L. Sanders, Siddhi Suresh
Research Proposal Topic: Fairness in Facial Classification Algorithms
Proposal Abstract: Our research aims to better understand how the use of demographically balanced versus unbalanced data sets in the training and testing of classification algorithms affects their ability to classify individuals, especially those in minority groups. We plan to take one or more untrained classification algorithms and train them with proportional data (in terms of social categorizations) and with disproportional data, to determine whether algorithmic bias is inherent to the design of the algorithm or instead arises from the methods and data used in its training. Recent studies suggest that algorithms can be inherently biased even before training; we aim to better understand how biased training data can further affect the level of bias present within these algorithms. Highlighting this disparity could refine the direction of future studies, both in understanding how these biases enter classification algorithms and in developing methods to prevent such disparities in future classification models. Open-source, untrained classification models will be trained on both unbiased and biased data sets so that the results of each can be compared. After training each algorithm with the different data sets, we will apply a fairness metric to determine how fair the resulting algorithms are, and thus the effect of biased and unbiased training sets on their outcomes. Additionally, this research plans to draw correlations between the proportionality of the data sets used for training and the quantitative fairness of the algorithm. The fairness metric is defined as the degree to which an algorithm performs equally across all subjects, regardless of demographics such as race, gender, or age.
This metric will use the ratio of the best to the worst positive predictive value (PPV) across demographic groups; from this ratio, it can be determined whether a correlation exists between the fairness level of the algorithm and the proportionality of the training data set. This research is intended to lead to improved algorithms whose better accuracy and precision can increase trust in, and the value of, this technology and contribute to its fair use in society at large.
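As an illustration of the fairness metric described above, the following minimal Python sketch computes the positive predictive value, TP / (TP + FP), for each demographic group and then takes the ratio of the best to the worst value. It assumes a binary classification task with labels 0 and 1; the function names `ppv_by_group` and `fairness_ratio`, the group labels "A" and "B", and the toy data are hypothetical and are not part of the proposal. Under this reading, a ratio of 1.0 means every group has the same PPV, and larger values indicate a larger disparity.

```python
from collections import defaultdict


def ppv_by_group(y_true, y_pred, groups):
    """Return {group: TP / (TP + FP)} for binary labels (assumed 0/1)."""
    tp = defaultdict(int)  # true positives per group
    fp = defaultdict(int)  # false positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            if truth == 1:
                tp[group] += 1
            else:
                fp[group] += 1

    ppv = {}
    for group in set(groups):
        predicted_positive = tp[group] + fp[group]
        if predicted_positive == 0:
            # PPV is undefined when a group receives no positive predictions.
            raise ValueError(f"no positive predictions for group {group!r}")
        ppv[group] = tp[group] / predicted_positive
    return ppv


def fairness_ratio(ppv):
    """Ratio of the best to the worst per-group PPV (1.0 means equal PPV)."""
    best, worst = max(ppv.values()), min(ppv.values())
    if worst == 0:
        return float("inf")  # the worst group has no correct positive predictions
    return best / worst


# Toy data, invented purely for illustration (not experimental results).
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

ppv = ppv_by_group(y_true, y_pred, groups)
print(ppv["A"], ppv["B"])   # 0.75 0.5
print(fairness_ratio(ppv))  # 1.5
```

Computing this ratio once per trained model, alongside a measure of how proportional that model's training set was, would yield the paired values needed to examine the correlation between fairness and training-set proportionality.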