Dataset Information
The original pendigits (Pen-Based Recognition of Handwritten Digits) dataset from the UCI Machine Learning Repository is a multiclass classification dataset with 16 integer attributes and 10 classes (0 … 9). The digit database was created by collecting 250 samples from each of 44 writers. The samples written by 30 writers are used for training, cross-validation and writer-dependent testing, and the digits written by the other 14 writers are used for writer-independent testing. In the original dataset, all classes have equal frequencies. For outlier detection, the number of objects in one class (corresponding to the digit “0”) is therefore reduced by a factor of 10%, so that this class becomes rare and can serve as the outlier class.
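The downsampling step described above can be sketched as follows. This is only an illustration on toy labels, not the procedure actually used to build the file: the function name `downsample_class`, the fixed seed, and the reading of "a factor of 10%" as "keep 10% of the class" (`keep_fraction=0.1`) are all assumptions.

```python
import random


def downsample_class(labels, target, keep_fraction, seed=0):
    """Return sorted indices of the objects kept after thinning one class.

    Hypothetical helper: every object outside `target` is kept, and only a
    random `keep_fraction` of the objects in `target` survives.
    """
    rng = random.Random(seed)
    target_idx = [i for i, lab in enumerate(labels) if lab == target]
    n_keep = round(len(target_idx) * keep_fraction)
    kept_target = set(rng.sample(target_idx, n_keep))
    return [i for i, lab in enumerate(labels)
            if lab != target or i in kept_target]


# Toy stand-in for the balanced 10-class labels: 100 objects per digit.
labels = [digit for digit in range(10) for _ in range(100)]

# Assumption: "reduced by a factor of 10%" means keeping 10% of the digit-0 objects.
kept = downsample_class(labels, target=0, keep_fraction=0.1)

# Binary labels in the style of the file description: 1 = outlier, 0 = inlier.
outlier_flags = [1 if labels[i] == 0 else 0 for i in kept]

print(len(kept), sum(outlier_flags))
```

With 100 objects per class, this keeps 900 inliers and 10 objects of the thinned class.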
Source (citation)
F. Keller, E. Muller, K. Bohm. “HiCS: High-contrast subspaces for density-based outlier ranking.” ICDE, 2012.
Saket Sathe and Charu C. Aggarwal. LODES: Local Density meets Spectral Outlier Detection. SIAM Conference on Data Mining, 2016.
Download
File: pendigits.mat
Description: X = Multi-dimensional point data, y = labels (1 = outliers, 0 = inliers)
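A minimal sketch of reading the file in Python, assuming it is a MATLAB file in a format that `scipy.io.loadmat` can read (files saved in the MATLAB v7.3 format would need an HDF5 reader instead) and that the variables are stored under the keys `X` and `y` as described above. The helper names `load_outlier_mat` and `outlier_fraction` are hypothetical.

```python
import numpy as np
from scipy.io import loadmat


def load_outlier_mat(path):
    """Load X (points) and y (1 = outlier, 0 = inlier) from a .mat file.

    Assumes the keys "X" and "y" exist, as stated in the file description.
    """
    data = loadmat(path)
    X = np.asarray(data["X"], dtype=float)
    # MATLAB stores vectors as 2-D arrays; flatten the labels to 1-D.
    y = np.asarray(data["y"]).ravel().astype(int)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    return X, y


def outlier_fraction(y):
    """Fraction of points labeled as outliers (label 1)."""
    return float(np.mean(np.asarray(y) == 1))
```

With the downloaded file in the working directory, `X, y = load_outlier_mat("pendigits.mat")` should return the point matrix and a flat label vector, and `outlier_fraction(y)` gives the share of points labeled as outliers.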