Data Repositories
This is a list of public dataset repositories we aim to connect to in order to bring more varied datasets into OpenML. Their data formats vary widely, so we need both manual selection and automatic conversion or meta-data extraction to make them easily usable.
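As a rough illustration of what meta-data extraction could look like for one of the simpler formats, here is a minimal sketch for a CSV table with a header row. The function name `extract_metadata`, the sample data, and the numeric/nominal typing rule are all assumptions made for this example; this is not how OpenML itself processes uploads.

```python
import csv
import io


def extract_metadata(csv_text):
    """Return basic meta-data for a CSV table that has a header row.

    Hypothetical helper for illustration only.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    columns = {}
    for index, name in enumerate(header):
        # Empty cells are treated as missing values and ignored for typing.
        values = [row[index] for row in data if row[index] != ""]
        # A column counts as numeric only if every non-missing value parses as a float.
        numeric = bool(values) and all(is_number(v) for v in values)
        columns[name] = "numeric" if numeric else "nominal"

    return {"n_rows": len(data), "n_columns": len(header), "columns": columns}


# Made-up sample data, including one missing value.
sample = "sepal_length,species\n5.1,setosa\n4.9,setosa\n,virginica\n"
metadata = extract_metadata(sample)
print(metadata)
```

Real repositories would need more than this (ragged rows, other delimiters, date columns, sparse formats such as LIBSVM), which is why manual selection remains part of the process.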
A collection of sources made by different users
Machine learning dataset repositories (mostly already in OpenML)
- UCI: https://archive.ics.uci.edu/ml/index.html
- KEEL: http://sci2s.ugr.es/keel/datasets.php
- LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
- AutoWEKA datasets: http://www.cs.ubc.ca/labs/beta/Projects/autoweka/datasets/
- skData package: https://github.com/jaberg/skdata/tree/master/skdata
- Rdatasets: http://vincentarelbundock.github.io/Rdatasets/datasets.html
- DataBrewer: https://github.com/rmax/databrewer
Time series data:
- UCR: http://timeseriesclassification.com/
- Older version: http://www.cs.ucr.edu/~eamonn/time_series_data/
Deep learning datasets (mostly image data)
- http://deeplearning.net/datasets/
- https://deeplearning4j.org/opendata
- http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
Extreme classification:
MLData (will merge with OpenML in 2017)
Kaggle public datasets
RAMP Challenge datasets
Wolfram data repository
Data.world
Figshare (needs digging, lots of Excel files)
KDnuggets list of data sets (meta-list, lots of stuff here):
Benchmark Data Sets for Highly Imbalanced Binary Classification
Feature Selection Challenge Datasets
BigML's list of 1000+ data sources
Massive list from Data Science Central.
R packages (also see https://github.com/openml/openml-r/issues/185)
- http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html
- mlbench
- Stata datasets: http://www.stata-press.com/data/r13/r.html
UTwente Activity recognition datasets:
Vanderbilt:
Quandl
Microarray data:
- http://genomics-pubs.princeton.edu/oncology/
- http://svitsrv25.epfl.ch/R-doc/library/multtest/html/golub.html
Medical data:
- http://www.healthdata.gov/
- http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/PPRPAGES/pprdat.htm
- http://hcup-us.ahrq.gov/
- https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
- https://nsduhweb.rti.org/respweb/homepage.cfm
- http://orwh.od.nih.gov/resources/policyreports/womenofcolor.asp
Nature.com Scientific data repositories list