Wisconsin-Breast Cancer (Diagnostics) dataset

Dataset information

The original Wisconsin-Breast Cancer (Diagnostics) dataset (WBC) from the UCI Machine Learning Repository is a classification dataset that records measurements for breast cancer cases. It has two classes, benign and malignant. In this version, the malignant class is downsampled to 21 points, which are treated as outliers, while all points in the benign class are treated as inliers.
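The page does not say how the 21 malignant points were selected. The sketch below assumes uniform random sampling without replacement and shows how a two-class dataset can be turned into an outlier-detection dataset in this way: every benign point is kept as an inlier (label 0) and a random subset of malignant points becomes the outliers (label 1). The function name `make_outlier_dataset`, its parameters, and the seed are hypothetical, chosen for illustration only.

```python
import numpy as np


def make_outlier_dataset(X, y_class, benign_label=0, malignant_label=1,
                         n_outliers=21, seed=0):
    """Hypothetical helper: keep all benign points as inliers and a random
    subset of n_outliers malignant points as outliers.

    Assumption: outliers are drawn uniformly at random without replacement;
    the original dataset page does not specify the sampling procedure.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y_class = np.asarray(y_class)

    benign_idx = np.flatnonzero(y_class == benign_label)
    malignant_idx = np.flatnonzero(y_class == malignant_label)

    # Draw the outliers from the malignant class (raises ValueError if the
    # class has fewer than n_outliers points).
    outlier_idx = rng.choice(malignant_idx, size=n_outliers, replace=False)

    # Inliers first, then the sampled outliers.
    keep = np.concatenate([benign_idx, outlier_idx])
    X_out = X[keep]

    # Relabel: 0 = inlier (benign), 1 = outlier (sampled malignant).
    y_out = np.concatenate([np.zeros(len(benign_idx), dtype=int),
                            np.ones(n_outliers, dtype=int)])
    return X_out, y_out
```

For example, applied to a toy array with 70 benign and 30 malignant rows, the helper returns 91 rows, of which 21 are labelled as outliers. With the original UCI data the class labels are strings, so `benign_label` and `malignant_label` would be set accordingly.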

Source (citation)

F. Keller, E. Müller, and K. Böhm, “HiCS: High-contrast subspaces for density-based outlier ranking,” in ICDE, 2012.

A. Zimek, M. Gaudet, R. J. Campello, and J. Sander, “Subsampling for efficient and effective unsupervised outlier detection ensembles,” in ACM SIGKDD, 2013, pp. 428–436.

C. C. Aggarwal and S. Sathe, “Theoretical foundations and algorithms for outlier ensembles,” ACM SIGKDD Explorations Newsletter, vol. 17, no. 1, pp. 24–47, 2015.

Downloads

File: wbc.mat

Description: X = multi-dimensional point data, y = labels (1 = outlier, 0 = inlier)
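A minimal sketch of reading the file in Python, assuming NumPy and SciPy are available and that `wbc.mat` stores its data under the variable names `X` and `y` as described above. The helper name `load_wbc` is hypothetical; `scipy.io.loadmat` returns MATLAB variables as 2-D arrays, so the label vector is flattened to 1-D.

```python
import numpy as np
from scipy.io import loadmat


def load_wbc(path="wbc.mat"):
    """Hypothetical helper: return (X, y) from a .mat file.

    X: (n_points, n_features) float array of point data.
    y: (n_points,) int array of labels, 1 = outlier, 0 = inlier.
    `path` may be a filename or a file-like object.
    """
    data = loadmat(path)
    X = np.asarray(data["X"], dtype=float)
    # loadmat returns y as a 2-D column (or row) array; flatten it.
    y = np.asarray(data["y"]).ravel().astype(int)
    return X, y
```

For example, `X, y = load_wbc("wbc.mat")` followed by `int(y.sum())` should report the number of outliers, which the dataset description puts at 21.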