
LED dataset

The information provided here has been largely replicated from the notes accompanying the LED dataset in the UCI repository of machine learning databases.

1. Title of Database: LED display domain + 17 irrelevant attributes

2. Sources:

Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group: Belmont, California. (see pages 43-49).
Donor: David Aha
Date: 11/10/1988

3. Past Usage: (many)

CART book (above):
- Optimal Bayes classification rate: 74%
- CART decision tree algorithm: 70%
- 200 training and 5000 test instances
- nearest neighbor algorithm: 41%

Quinlan,J.R. (1987). Simplifying Decision Trees. In International Journal of Man-Machine Studies.
- C4 decision tree algorithm: 72.6% (using pessimistic pruning)
- 2000 training and 500 test instances

Tan,M. & Eshelman,L. (1988). Using Weighted Networks to Represent Classification Knowledge in Noisy Domains. In Proceedings of the 5th International Conference on Machine Learning, 121-134, Ann Arbor, Michigan: Morgan Kaufmann.
- IWN system: 73.3% (using the And-OR classification algorithm)
- 400 training and 500 test cases

Aha,D.W., & Kibler,D. (1988). Unpublished data.
- NTgrowth+ instance-based learning algorithm: (500 test instances)
- 700 training instances: 70.7%
- 1000 training instances: 71.5%

4. Relevant Information Paragraph:

This simple domain contains 7 Boolean attributes and 10 concepts, the set of decimal digits. LED displays contain 7 light-emitting diodes, hence the 7 attributes. The problem would be easy if not for the introduction of noise: each attribute value has a 10% probability of being inverted.
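The noise model described above can be sketched in a few lines of Python. This is an illustrative generator only, not the program distributed with the dataset. The segment ordering (top, upper-left, upper-right, middle, lower-left, lower-right, bottom), the digit encodings in SEGMENTS, and the names make_instance and make_dataset are assumptions made for this example.

```python
import random

# Assumed segment order: top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom. The encodings below follow a
# conventional seven-segment layout and are an assumption.
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def make_instance(rng, noise=0.10):
    """Draw a digit uniformly, then invert each segment with probability `noise`."""
    digit = rng.randrange(10)
    attrs = [bit ^ 1 if rng.random() < noise else bit
             for bit in SEGMENTS[digit]]
    return attrs, digit


def make_dataset(n, seed=0, noise=0.10):
    """Return n (attributes, class) pairs from a seeded generator."""
    rng = random.Random(seed)
    return [make_instance(rng, noise) for _ in range(n)]


if __name__ == "__main__":
    for attrs, digit in make_dataset(5, seed=42):
        print(attrs, digit)
```

Only the seven attributes are subject to noise; the class label is left untouched, matching the attribute description in these notes.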

The optimal Bayes rate is a useful reference point for this database. In this case, the optimal misclassification rate is 26% (74% classification accuracy).
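Because there are only 2^7 = 128 possible attribute patterns, the optimal Bayes accuracy can be checked by direct enumeration: for each pattern, take the largest joint probability over the ten digits, then sum. The sketch below assumes equal class priors, independent 10% inversion of each attribute, and a conventional seven-segment encoding (the SEGMENTS table and the function names are assumptions for this example). Under those assumptions the result should come out close to the 74% quoted above.

```python
from itertools import product

# Assumed segment order: top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom. Index in the list is the digit.
SEGMENTS = [
    (1, 1, 1, 0, 1, 1, 1),  # 0
    (0, 0, 1, 0, 0, 1, 0),  # 1
    (1, 0, 1, 1, 1, 0, 1),  # 2
    (1, 0, 1, 1, 0, 1, 1),  # 3
    (0, 1, 1, 1, 0, 1, 0),  # 4
    (1, 1, 0, 1, 0, 1, 1),  # 5
    (1, 1, 0, 1, 1, 1, 1),  # 6
    (1, 0, 1, 0, 0, 1, 0),  # 7
    (1, 1, 1, 1, 1, 1, 1),  # 8
    (1, 1, 1, 1, 0, 1, 1),  # 9
]


def likelihood(pattern, clean, noise):
    """P(pattern | digit): each differing segment costs one inversion."""
    flips = sum(p != c for p, c in zip(pattern, clean))
    return noise ** flips * (1 - noise) ** (len(clean) - flips)


def bayes_accuracy(noise=0.10):
    """Sum, over all 128 patterns, of the best joint probability P(pattern, digit)."""
    prior = 1 / len(SEGMENTS)
    return sum(
        max(prior * likelihood(pattern, clean, noise) for clean in SEGMENTS)
        for pattern in product((0, 1), repeat=7)
    )


if __name__ == "__main__":
    acc = bayes_accuracy()
    print(f"Bayes accuracy: {acc:.3f}, misclassification: {1 - acc:.3f}")
```

Changing the noise argument shows how quickly the ceiling drops as the inversion probability grows; with noise set to 0 the ten digits are perfectly separable.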

5. Number of Attributes:

7, all Boolean-valued.

6. Attribute Information:

All attribute values are either 0 or 1, according to whether the corresponding light is on or not for the decimal digit.
Each attribute (excluding the class attribute, which is an integer ranging between 0 and 9 inclusive) has a 10% chance of being inverted.

7. Missing Attribute Values: None

8. Class Distribution: 10% (Theoretical)

Each concept (digit) has the same theoretical probability.

Last Updated 7 November 1996
Comments and questions to: delve@cs.toronto.edu