LED dataset
The information provided here has been largely replicated from the notes
accompanying the LED dataset in the UCI repository of
machine learning databases.
1. Title of Database: LED display domain + 17 irrelevant attributes
2. Sources:

Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984).
Classification and Regression Trees. Wadsworth International
Group: Belmont, California. (see pages 43-49).
 Donor: David Aha
 Date: 11/10/1988
3. Past Usage: (many)

 CART book (above):
   Optimal Bayes classification rate: 74%
   CART decision tree algorithm: 70%
     200 training and 5000 test instances
   nearest neighbor algorithm: 41%
 Quinlan,J.R. (1987). Simplifying Decision Trees. In International
 Journal of Man-Machine Studies.
   C4 decision tree algorithm: 72.6% (using pessimistic pruning)
     2000 training and 500 test instances
 Tan,M. & Eshelman,L. (1988). Using Weighted Networks to Represent
 Classification Knowledge in Noisy Domains. In Proceedings of the
 5th International Conference on Machine Learning, 121-134, Ann
 Arbor, Michigan: Morgan Kaufmann.
   IWN system: 73.3% (using the And-OR classification algorithm)
     400 training and 500 test cases
 Aha,D.W., & Kibler,D. (1988). Unpublished data.
   NTgrowth+ instance-based learning algorithm: (500 test instances)
     700 training instances: 70.7%
     1000 training instances: 71.5%
4. Relevant Information Paragraph:
This simple domain contains 7 Boolean attributes and 10 concepts,
the set of decimal digits. Recall that LED displays contain 7
light-emitting diodes, hence the 7 attributes. The problem would
be easy if not for the introduction of noise. In this case, each
attribute value has a 10% probability of being inverted,
independently of the other attributes.
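
The noise process described above can be sketched in Python. This is
a minimal illustration, not the generator distributed with the
dataset: the segment ordering and the DIGIT_SEGMENTS table assume the
conventional seven-segment layout (these notes do not spell out the
encoding), and the function name make_led_instance is hypothetical.

```python
import random

# Assumed segment order: top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom. This is the conventional
# seven-segment layout, not something stated in the notes.
DIGIT_SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def make_led_instance(rng, noise=0.10):
    """Return (attributes, digit) for one noisy LED instance."""
    digit = rng.randrange(10)  # every digit is equally likely
    clean = DIGIT_SEGMENTS[digit]
    # Invert each segment independently with probability `noise`;
    # the class label itself is never altered.
    noisy = tuple(bit ^ 1 if rng.random() < noise else bit for bit in clean)
    return noisy, digit


rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = [make_led_instance(rng) for _ in range(5)]
for attributes, digit in sample:
    print(attributes, digit)
```

Passing noise=0.0 returns the clean pattern for the chosen digit,
which is a convenient way to check the encoding table.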
It's valuable to know the optimal Bayes rate for these databases.
In this case, the misclassification rate is 26% (74% classification
accuracy).
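
Because there are only 2^7 = 128 possible attribute vectors, the
optimal Bayes accuracy can be computed exactly by summing, over every
vector x, the largest joint probability P(digit) * P(x | digit). The
sketch below does this under two assumptions that are not stated in
these notes: the conventional seven-segment encoding in
DIGIT_SEGMENTS, and independent inversion of each attribute. The
function name bayes_accuracy is hypothetical. If the assumed encoding
matches the one used to generate the data, the printed value should
land near the quoted 74%.

```python
from itertools import product

# Assumed segment order: top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom (conventional layout; an assumption).
# The list index is the digit.
DIGIT_SEGMENTS = [
    (1, 1, 1, 0, 1, 1, 1),  # 0
    (0, 0, 1, 0, 0, 1, 0),  # 1
    (1, 0, 1, 1, 1, 0, 1),  # 2
    (1, 0, 1, 1, 0, 1, 1),  # 3
    (0, 1, 1, 1, 0, 1, 0),  # 4
    (1, 1, 0, 1, 0, 1, 1),  # 5
    (1, 1, 0, 1, 1, 1, 1),  # 6
    (1, 0, 1, 0, 0, 1, 0),  # 7
    (1, 1, 1, 1, 1, 1, 1),  # 8
    (1, 1, 1, 1, 0, 1, 1),  # 9
]


def bayes_accuracy(noise=0.10, patterns=DIGIT_SEGMENTS):
    """Exact accuracy of the Bayes-optimal classifier under uniform priors."""
    prior = 1.0 / len(patterns)
    n_segments = len(patterns[0])
    total = 0.0
    for x in product((0, 1), repeat=n_segments):
        best = 0.0
        for pattern in patterns:
            # Number of segments that would have to be inverted
            # to turn this digit's clean pattern into x.
            flips = sum(a != b for a, b in zip(x, pattern))
            likelihood = noise ** flips * (1 - noise) ** (n_segments - flips)
            best = max(best, prior * likelihood)
        # The optimal classifier picks the digit with the largest
        # joint probability, so that is the mass it gets right on x.
        total += best
    return total


rate = bayes_accuracy()
print(f"Bayes accuracy: {rate:.3f}, misclassification: {1 - rate:.3f}")
```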
5. Number of Attributes:
7, all Boolean-valued.
6. Attribute Information:
 All attribute values are either 0 or 1, according to whether
the corresponding light is on or not for the decimal digit.
 Each attribute (excluding the class attribute, which is an
integer ranging between 0 and 9 inclusive) has a 10%
chance of being inverted.
7. Missing Attribute Values: None
8. Class Distribution: 10% (Theoretical)
Each concept (digit) has the same theoretical probability