Mammography dataset

Dataset Information

The original Mammography data set (Woods et al., 1993) was made available courtesy of Aleksandar Lazarevic and is publicly available on OpenML. It has 11,183 samples, of which 260 are calcifications. If predictive accuracy is used to judge a classifier here, labeling every sample as non-calcification already yields a default accuracy of 97.68%. Accuracy alone is therefore misleading: the classifier should predict most of the calcifications correctly. For outlier detection, the minority calcification class is treated as the outlier class and the non-calcification class as the inliers.
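The default-accuracy figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `majority_baseline` is hypothetical, and the two counts are taken from the description above.

```python
N_SAMPLES = 11_183        # total samples, from the description above
N_CALCIFICATIONS = 260    # size of the minority (outlier) class


def majority_baseline(n_total, n_minority):
    """Accuracy and minority-class recall when every sample is labeled as the majority class."""
    accuracy = (n_total - n_minority) / n_total
    recall = 0 / n_minority  # no minority sample is ever predicted
    return accuracy, recall


accuracy, recall = majority_baseline(N_SAMPLES, N_CALCIFICATIONS)
print(f"accuracy = {accuracy:.2%}, calcification recall = {recall:.2%}")
```

The baseline reaches high accuracy while detecting none of the calcifications, which is why a recall-oriented or ranking-based measure is usually more informative for this data.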

Source (citation)

File: mammography.mat

Description: X = Multi-dimensional point data, y = labels (1 = outliers, 0 = inliers)
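As a minimal sketch of how the file might be read in Python, the code below uses `scipy.io.loadmat` and assumes that the variables are stored under the keys `X` and `y`, as the description suggests, and that the file is in a MATLAB format `loadmat` can read. The function names and the default path are hypothetical.

```python
from collections import Counter


def load_mammography(path="mammography.mat"):
    """Load X and y from the .mat file (the default path is an assumption)."""
    # SciPy is imported here so label_summary below works without it.
    from scipy.io import loadmat

    data = loadmat(path)
    X = data["X"]            # feature matrix, one row per sample
    y = data["y"].ravel()    # flatten the label array to one dimension
    return X, y


def label_summary(y):
    """Count inliers (0) and outliers (1) and return the outlier fraction."""
    counts = Counter(int(label) for label in y)
    n_total = sum(counts.values())
    return counts[0], counts[1], counts[1] / n_total
```

Calling `load_mammography()` and passing the returned `y` to `label_summary` gives a quick check that the outlier count matches the figures stated in the dataset information.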