Dataset generator Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.

DELVE - Data for Evaluating Learning in Valid Experiments A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. DELVE makes it possible for users to compare their learning methods with other methods on many datasets.

HS3D - Homo Sapiens Splice Sites Dataset A database of Homo sapiens exon, intron and splice regions extracted from GenBank primate sequences, Rel. 123. The aim of this dataset is to provide standardized material for training computational approaches to gene identification and characterization and for assessing their prediction accuracy.

National Space Science Data Center Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software.

Time Series Data Library A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject.

TREC Data Text datasets used in information retrieval and learning in text domains.

UCI Machine Learning Repository A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

Web->KB dataset Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.