Japanese Vowels dataset

Dataset information

The original Japanese Vowels (Vowels) dataset from the UCI Machine Learning Repository is a multivariate time series dataset in which nine male speakers uttered the two Japanese vowels /ae/ in succession. Each utterance by a speaker forms one time series of 7 to 29 frames, and each frame has 12 features (12 coefficients). The original task is classification: identifying the speaker.

For outlier detection, each frame in the training data is treated as an individual data point, whereas the UCI repository treats a block of frames (an utterance) as an individual point. Class (speaker) 1 is downsampled to 50 frames, which serve as the outliers. Classes 6, 7 and 8 form the inliers. All other classes are discarded.
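The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce the published file: the input layout (a list of `(speaker, frames)` pairs), the function name `build_frame_level_dataset`, and the use of uniform random sampling without replacement for the downsampling step are all hypothetical, since the description above only states that class 1 is reduced to 50 outliers.

```python
import random


def build_frame_level_dataset(utterances, outlier_class=1,
                              inlier_classes=(6, 7, 8),
                              n_outliers=50, seed=0):
    """Turn labelled utterances into a frame-level outlier dataset.

    utterances: list of (speaker_id, frames) pairs, where frames is a list
    of 12-value feature lists (hypothetical input layout).
    Returns (X, y) with y = 1 for outliers and y = 0 for inliers.
    """
    inliers, outlier_pool = [], []
    for speaker, frames in utterances:
        if speaker == outlier_class:
            outlier_pool.extend(frames)   # every frame is its own point
        elif speaker in inlier_classes:
            inliers.extend(frames)
        # frames from all other speakers are discarded

    # Assumption: downsample by uniform random sampling without replacement.
    rng = random.Random(seed)
    outliers = rng.sample(outlier_pool, min(n_outliers, len(outlier_pool)))

    X = inliers + outliers
    y = [0] * len(inliers) + [1] * len(outliers)
    return X, y


# Synthetic stand-in data: 9 speakers, 10 utterances each, 7 frames per
# utterance, 12 features per frame (values are placeholders, not real data).
toy = [(s, [[float(s)] * 12 for _ in range(7)])
       for s in range(1, 10) for _ in range(10)]
X, y = build_frame_level_dataset(toy)
```

With the synthetic input, speaker 1 contributes 70 frames that are reduced to 50 outliers, and speakers 6, 7 and 8 contribute 210 inlier frames.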

Source (citation)

C. C. Aggarwal and S. Sathe, “Theoretical foundations and algorithms for outlier ensembles.” ACM SIGKDD Explorations Newsletter, vol. 17, no. 1, pp. 24–47, 2015.

S. Sathe and C. C. Aggarwal, “LODES: Local density meets spectral outlier detection.” SIAM Conference on Data Mining, 2016.

Downloads

File: vowels.mat

Description: X = Multi-dimensional point data, y = labels (1 = outliers, 0 = inliers)
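A minimal loading sketch, assuming the file is stored in a MATLAB format that `scipy.io.loadmat` can read (files saved as MATLAB v7.3 are HDF5-based and would need a different reader). The helper name `load_outlier_mat` is hypothetical. The keys `X` and `y` follow the description above.

```python
import numpy as np
from scipy.io import loadmat


def load_outlier_mat(path):
    """Load the point matrix X and label vector y from a .mat file."""
    data = loadmat(path)  # assumption: a pre-v7.3 .mat file
    X = np.asarray(data["X"], dtype=float)
    # Labels are typically stored as an (n, 1) column; flatten to shape (n,).
    y = np.asarray(data["y"]).ravel().astype(int)
    return X, y
```

Calling `load_outlier_mat("vowels.mat")` on the downloaded file should return `X` with one row per frame and `y` with 1 for outliers and 0 for inliers.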