LED dataset
The information here is largely reproduced from the notes
accompanying the LED dataset in the UCI Repository of Machine
Learning Databases.
1. Title of Database: LED display domain + 17 irrelevant attributes
2. Sources:
- Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984).
Classification and Regression Trees. Wadsworth International
Group: Belmont, California. (see pages 43-49).
- Donor: David Aha
- Date: 11/10/1988
3. Past Usage: (many)
- CART book (above):
  - Optimal Bayes classification rate: 74%
  - CART decision tree algorithm: 70%
  - 200 training and 5000 test instances
  - nearest neighbor algorithm: 41%
- Quinlan,J.R. (1987). Simplifying Decision Trees. In International
  Journal of Man-Machine Studies.
  - C4 decision tree algorithm: 72.6% (using pessimistic pruning)
  - 2000 training and 500 test instances
- Tan,M. & Eshelman,L. (1988). Using Weighted Networks to Represent
  Classification Knowledge in Noisy Domains. In Proceedings of the
  5th International Conference on Machine Learning, 121-134, Ann
  Arbor, Michigan: Morgan Kaufmann.
  - IWN system: 73.3% (using the And-OR classification algorithm)
  - 400 training and 500 test cases
- Aha,D.W., & Kibler,D. (1988). Unpublished data.
  - NTgrowth+ instance-based learning algorithm: (500 test instances)
    - 700 training instances: 70.7%
    - 1000 training instances: 71.5%
4. Relevant Information Paragraph:
This simple domain contains 7 Boolean attributes and 10 concepts,
the set of decimal digits. Recall that LED displays contain 7
light-emitting diodes, hence the 7 attributes. The problem would
be easy if not for the introduction of noise: each attribute value
has a 10% probability of being inverted.
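The noise model above can be sketched as a small generator. This is a minimal Python sketch under stated assumptions, not the original UCI generator program: the segment naming (a = top, b = upper right, c = lower right, d = bottom, e = lower left, f = upper left, g = middle), the attribute order, and the exact shapes of digits such as 7 and 9 are assumptions, as are the helper names `generate_instance` and `generate_dataset`.

```python
import random

# ASSUMED 7-segment encoding (standard a-g naming, not taken from the UCI files):
# a = top, b = upper right, c = lower right, d = bottom,
# e = lower left, f = upper left, g = middle.
# SEGMENTS[digit] holds the clean on (1) / off (0) state of segments a..g.
SEGMENTS = [
    (1, 1, 1, 1, 1, 1, 0),  # 0
    (0, 1, 1, 0, 0, 0, 0),  # 1
    (1, 1, 0, 1, 1, 0, 1),  # 2
    (1, 1, 1, 1, 0, 0, 1),  # 3
    (0, 1, 1, 0, 0, 1, 1),  # 4
    (1, 0, 1, 1, 0, 1, 1),  # 5
    (1, 0, 1, 1, 1, 1, 1),  # 6
    (1, 1, 1, 0, 0, 0, 0),  # 7
    (1, 1, 1, 1, 1, 1, 1),  # 8
    (1, 1, 1, 1, 0, 1, 1),  # 9
]


def generate_instance(rng, noise=0.1):
    """Return (attributes, digit) for one noisy LED example (hypothetical helper).

    The digit is drawn uniformly from 0-9; each of the 7 attributes is then
    inverted independently with probability `noise`. The class label itself
    is never corrupted.
    """
    digit = rng.randrange(10)
    attrs = [bit ^ 1 if rng.random() < noise else bit for bit in SEGMENTS[digit]]
    return attrs, digit


def generate_dataset(n, seed=0, noise=0.1):
    """Generate n independent examples from a reproducible seed (hypothetical helper)."""
    rng = random.Random(seed)
    return [generate_instance(rng, noise) for _ in range(n)]


if __name__ == "__main__":
    for attrs, digit in generate_dataset(5, seed=1):
        print(attrs, digit)
```

Setting `noise=0.0` reproduces the clean digit patterns exactly, which is a quick way to check the encoding table.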
It is valuable to know the optimal Bayes rate for these databases,
since no classifier can exceed it on average. In this case, the
Bayes misclassification rate is 26% (74% classification accuracy).
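Because each attribute is flipped independently, the Bayes rate can be computed exactly rather than estimated. With equiprobable digits, P(x | c) = 0.1^d * 0.9^(7-d), where d is the Hamming distance between the observed pattern x and the clean pattern of digit c, and the Bayes accuracy is the sum over all 2^7 = 128 patterns of max_c P(c) P(x | c). The sketch below enumerates those patterns. The segment encoding is the same assumed a-g layout (not taken from the UCI files), and `bayes_accuracy` is a hypothetical helper name; under that assumed encoding the result comes out at roughly 0.740, consistent with the 74% figure quoted above.

```python
from itertools import product

# ASSUMED standard a-g seven-segment encoding; the attribute order used by
# the original UCI generator may differ.
SEGMENTS = [
    (1, 1, 1, 1, 1, 1, 0),  # 0
    (0, 1, 1, 0, 0, 0, 0),  # 1
    (1, 1, 0, 1, 1, 0, 1),  # 2
    (1, 1, 1, 1, 0, 0, 1),  # 3
    (0, 1, 1, 0, 0, 1, 1),  # 4
    (1, 0, 1, 1, 0, 1, 1),  # 5
    (1, 0, 1, 1, 1, 1, 1),  # 6
    (1, 1, 1, 0, 0, 0, 0),  # 7
    (1, 1, 1, 1, 1, 1, 1),  # 8
    (1, 1, 1, 1, 0, 1, 1),  # 9
]


def hamming(x, y):
    """Number of positions at which two bit tuples differ."""
    return sum(a != b for a, b in zip(x, y))


def likelihood(x, clean, noise):
    """P(x | digit) when each bit of `clean` is flipped with probability `noise`."""
    d = hamming(x, clean)
    return noise ** d * (1 - noise) ** (len(x) - d)


def bayes_accuracy(noise=0.1, segments=SEGMENTS):
    """Exact optimal Bayes accuracy: sum over all patterns x of max_c P(c) P(x | c)."""
    n_bits = len(segments[0])
    prior = 1.0 / len(segments)  # equiprobable classes
    total = 0.0
    for x in product((0, 1), repeat=n_bits):
        total += prior * max(likelihood(x, s, noise) for s in segments)
    return total


if __name__ == "__main__":
    acc = bayes_accuracy()
    print(f"Bayes accuracy: {acc:.4f}, Bayes error: {1 - acc:.4f}")
```

Two sanity checks follow directly from the formula: with `noise=0.0` the accuracy is 1, and with `noise=0.5` every pattern is equally likely under every digit, so the accuracy falls to chance (0.1).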
5. Number of Attributes:
7, all Boolean-valued.
6. Attribute Information:
- All attribute values are either 0 or 1: 1 if the corresponding
light is on for the decimal digit, and 0 if it is off.
- Each attribute (excluding the class attribute, which is an
integer ranging from 0 to 9 inclusive) has a 10% chance of being
inverted.
7. Missing Attribute Values: None
8. Class Distribution: 10% per class (theoretical)
Each concept (digit) has the same theoretical probability, 1/10.
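With every digit equally likely and each attribute inverted with probability 0.1 < 0.5, P(x | c) shrinks as the Hamming distance between x and the clean pattern of c grows, so the Bayes-optimal rule reduces to picking the digit whose clean pattern is nearest in Hamming distance (ties broken arbitrarily). The Python sketch below implements that rule and estimates its accuracy by simulation, also tallying the empirical class frequencies, which should each sit near 10%. The segment encoding and the names `classify` and `estimate_accuracy` are assumptions for illustration, not part of the UCI distribution.

```python
import random

# ASSUMED standard a-g seven-segment encoding (not taken from the UCI files).
SEGMENTS = [
    (1, 1, 1, 1, 1, 1, 0),  # 0
    (0, 1, 1, 0, 0, 0, 0),  # 1
    (1, 1, 0, 1, 1, 0, 1),  # 2
    (1, 1, 1, 1, 0, 0, 1),  # 3
    (0, 1, 1, 0, 0, 1, 1),  # 4
    (1, 0, 1, 1, 0, 1, 1),  # 5
    (1, 0, 1, 1, 1, 1, 1),  # 6
    (1, 1, 1, 0, 0, 0, 0),  # 7
    (1, 1, 1, 1, 1, 1, 1),  # 8
    (1, 1, 1, 1, 0, 1, 1),  # 9
]


def hamming(x, y):
    """Number of positions at which two bit sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def classify(attrs):
    """Minimum-Hamming-distance rule (hypothetical helper): return the digit
    whose clean pattern differs from `attrs` in the fewest positions; the
    lowest such digit wins ties."""
    return min(range(10), key=lambda digit: hamming(attrs, SEGMENTS[digit]))


def estimate_accuracy(n=20000, noise=0.1, seed=0):
    """Monte Carlo estimate of the rule's accuracy (hypothetical helper).

    Digits are drawn uniformly (the 10% class distribution) and each bit is
    flipped with probability `noise`. Returns (accuracy, class frequencies).
    """
    rng = random.Random(seed)
    correct = 0
    counts = [0] * 10
    for _ in range(n):
        digit = rng.randrange(10)
        counts[digit] += 1
        attrs = [b ^ 1 if rng.random() < noise else b for b in SEGMENTS[digit]]
        correct += classify(attrs) == digit
    return correct / n, [c / n for c in counts]


if __name__ == "__main__":
    acc, freqs = estimate_accuracy()
    print(f"estimated accuracy: {acc:.3f}")
    print("class frequencies:", [round(f, 3) for f in freqs])
```

With the default 20,000 simulated examples, the estimated accuracy should land within about a percentage point of 74%, the gap being sampling error.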