See Discrimination and Privacy in the Information Society: Call for more information.
Data mining is a research discipline aimed at developing automatic tools for analyzing huge collections of data. One of its tasks is classification: given a set of training examples divided into classes, a model is learned that can automatically classify new examples. Many models and learning algorithms exist for this problem. The quality of the learned models, however, depends critically on the quality of the training data: if the training data is incorrect, poor models will result. Within this research topic we study cases in which the input data contains dependencies between some attributes and the class that are either incorrect or undesired. Such cases arise naturally when the data is collected from different sources that labelled it in different ways. Examples include comparing reviewer scores for research papers, movie ratings given by different people, and grades for different courses. In this project we propose the classification with independence constraints problem as a principled approach to tackle this issue. These results also have ethical and societal implications because of the connection with discrimination-aware data mining.
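
To make the notion of an undesired dependency concrete, the following minimal sketch measures how strongly a classifier's predictions depend on one attribute, here the source that produced the label. The measure (the gap in positive-prediction rate between groups), the function name `positive_rate_gap`, and the toy data are all assumptions made for illustration; the project description does not fix a particular dependency measure, and other definitions are possible.

```python
from collections import defaultdict


def positive_rate_gap(groups, predictions, positive=1):
    """Return (gap, rates): per-group positive-prediction rates and the
    difference between the highest and lowest rate.

    A gap of 0 means the predictions are independent of the group under
    this (assumed) measure; a larger gap means a stronger dependency.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        if prediction == positive:
            positives[group] += 1
    rates = {group: positives[group] / totals[group] for group in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data: papers scored by two reviewers, "A" and "B",
# with predicted accept (1) / reject (0) decisions.
reviewers = ["A", "A", "A", "A", "B", "B", "B", "B"]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

gap, rates = positive_rate_gap(reviewers, predicted)
print(rates)  # positive-prediction rate per reviewer
print(gap)    # dependency between reviewer and predicted class
```

Under this reading, an independence constraint would ask the learned model to keep such a gap at or near zero for the designated attribute while remaining as accurate as possible.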