Outliers, or anomalies, are instances that do not conform to the norm of a dataset. Outlier detection is an important data mining problem that has been studied across diverse research areas and application domains, such as intrusion detection, fraud detection, unusual event detection, and disease condition detection.
The exact notion of an outlier differs from one application domain to another. Hence, applying a technique developed for one domain to another is not straightforward. In addition, labeled data for training and validating outlier detection methods is scarce. Noise in the data also tends to resemble outliers, which makes the two difficult to distinguish. Because of these challenges, outlier detection is not an easy problem to solve. Furthermore, research on outlier detection has been held back by the lack of good benchmark datasets with ground truth. Existing benchmarks are typically either proprietary or highly artificial, and existing real-world outlier/anomaly detection datasets lack ground truth.
In ODDS, we openly provide access to a large collection of outlier detection datasets with ground truth (where available). Our goal is to provide datasets from different domains and present them on a single platform for the research community. To that end, the ODDS library organizes the datasets by type into separate tables.
The ODDS library has been under active development since summer 2016. It continues to grow through our research on outlier/anomaly mining and through our effort to support the wider research community. Researchers are welcome to share their datasets for inclusion in the ODDS library by emailing firstname.lastname@example.org.
Disclaimer: The ODDS library contains datasets collected by DATALab as well as by many other research groups. For questions about a specific dataset, readers are encouraged to email the contacts of the research group that provided it.
- Shebuti Rayana – email@example.com
Please use the contact above or the comment section of the website to send us comments, questions, bug reports, broken links, and inquiries about hosting your datasets on our website.