ODDS

Speech dataset

Dataset Information

The real-world speech data set consists of 3686 segments of English speech spoken with different accents. It is provided by the Speech Processing Group at Brno University of Technology, Czech Republic. Most of the data corresponds to an American accent; only 1.65% corresponds to one of seven other accents, and these segments are treated as outliers. Each speech segment is represented by a 400-dimensional i-vector, a fixed-length feature vector widely used in state-of-the-art speaker and language recognition. The data set is a subset of a larger collection described in the original source.
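The stated class balance can be turned into an approximate outlier count. Assuming the 1.65% figure is rounded to two decimals, the implied count is:

```latex
0.0165 \times 3686 \approx 60.8, \qquad
\frac{60}{3686} \approx 1.63\%, \quad
\frac{61}{3686} \approx 1.65\%, \quad
\frac{62}{3686} \approx 1.68\%
```

Under that assumption, 61 is the only whole number of outlier segments consistent with 1.65%, which would leave 3625 inliers.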

Source (citation)

Learning Outlier Ensembles: The Best of Both Worlds – Supervised and Unsupervised. Barbora Micenková, Brian McWilliams, and Ira Assent. KDD ODD² Workshop, 2014.

Downloads

File: speech.mat

Description: X = multi-dimensional point data (one 400-dimensional i-vector per row), y = labels (1 = outlier, 0 = inlier)
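A minimal sketch for loading the file in Python, assuming SciPy can read it with `scipy.io.loadmat` and that it stores variables named `X` (one sample per row) and `y` (a label column) as described above. The helper names (`load_odds_mat`, `summarize`, `centroid_distance_scores`, `roc_auc`) are hypothetical. The distance-to-mean score is a deliberately naive unsupervised baseline for sanity-checking the labels, not the ensemble method of the cited paper.

```python
import os

import numpy as np
from scipy.io import loadmat
from scipy.stats import rankdata


def load_odds_mat(path):
    """Load an ODDS-style .mat file and return (X, y).

    Assumption: the file holds variables 'X' (n_samples x n_features)
    and 'y' (n_samples x 1, with 1 = outlier and 0 = inlier).
    """
    data = loadmat(path)
    X = np.asarray(data["X"], dtype=float)
    # loadmat returns y as a 2-D column; flatten it to a 1-D label vector
    y = np.asarray(data["y"]).ravel().astype(int)
    return X, y


def summarize(X, y):
    """Return sample count, feature count, outlier count and outlier fraction."""
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    n_outliers = int((y == 1).sum())
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "n_outliers": n_outliers,
        "outlier_fraction": n_outliers / X.shape[0],
    }


def centroid_distance_scores(X):
    """Naive baseline: Euclidean distance of each row to the mean of all rows."""
    return np.linalg.norm(X - X.mean(axis=0), axis=1)


def roc_auc(y, scores):
    """ROC AUC via the Mann-Whitney U statistic; average ranks handle ties."""
    y = np.asarray(y)
    pos = y == 1
    n_pos = int(pos.sum())
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one outlier and one inlier")
    ranks = rankdata(scores)  # rank 1 = lowest score
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Only runs if speech.mat has been downloaded to the working directory
if os.path.exists("speech.mat"):
    X, y = load_odds_mat("speech.mat")
    print(summarize(X, y))
    print("baseline AUC:", roc_auc(y, centroid_distance_scores(X)))
```

If the file loads as assumed, the summary should report 400 features, and the outlier fraction can be compared against the stated 1.65%. Computing AUC from ranks keeps the sketch free of extra dependencies beyond NumPy and SciPy.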


Copyright © 2023 ODDS. All Rights Reserved.
