The original MNIST dataset of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. This dataset is converted for outlier detection as digit-zero class is considered as inliers, while 700 images are sampled from digit-six class as the outliers. In addition, 100 features are randomly selected from 784 total features.
Bandaragoda, Tharindu R., et al. “Efficient Anomaly Detection by Isolation Using Nearest Neighbour Ensemble.” 2014 IEEE International Conference on Data Mining Workshop. IEEE, 2014.
Description: X = Multi-dimensional point data, y = labels (1 = outliers, 0 = inliers)