LED dataset

The information provided here has been largely replicated from the notes accompanying the LED dataset in the UCI repository of machine learning databases.

1. Title of Database: LED display domain + 17 irrelevant attributes

2. Sources:

3. Past Usage: (many)

  1. CART book (above):
  2. Quinlan, J.R. (1987). Simplifying Decision Trees. In International Journal of Man-Machine Studies.
  3. Tan, M. & Eshelman, L. (1988). Using Weighted Networks to Represent Classification Knowledge in Noisy Domains. In Proceedings of the 5th International Conference on Machine Learning, 121-134, Ann Arbor, Michigan: Morgan Kaufmann.
  4. Aha, D.W. & Kibler, D. (1988). Unpublished data.

4. Relevant Information Paragraph:

This simple domain contains 7 Boolean attributes and 10 concepts, the set of decimal digits. Recall that LED displays contain 7 light-emitting diodes, hence the 7 attributes. The problem would be easy if not for the introduction of noise: each attribute value has a 10% probability of being inverted.
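The noise process described above can be sketched in a few lines of Python. This is only an illustration, not the generator used to build the dataset: the segment order (top, upper-left, upper-right, middle, lower-left, lower-right, bottom), the digit shapes in `SEGMENTS`, and the function name `sample_led` are assumptions based on the usual seven-segment convention and are not stated in these notes.

```python
import random

# ASSUMPTION: segment order is top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom, with the usual seven-segment digit shapes.
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def sample_led(rng, noise=0.10):
    """Draw one (attributes, digit) case; each segment flips with probability `noise`."""
    digit = rng.randrange(10)  # every digit is equally likely
    clean = SEGMENTS[digit]
    # XOR with a Bernoulli(noise) draw inverts the segment when the draw is True.
    noisy = tuple(bit ^ (rng.random() < noise) for bit in clean)
    return noisy, digit


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample_led(rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; the 10% default mirrors the noise level quoted above.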

It's valuable to know the optimal Bayes rate for a domain like this one. In this case, the optimal misclassification rate is 26% (74% classification accuracy).
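Because there are only 2^7 = 128 possible displays, the Bayes rate can be checked by brute force: for every observable pattern, take the most probable digit and add up its joint probability. The sketch below does this under stated assumptions that are not part of these notes: the usual seven-segment digit shapes (listed in `SEGMENTS`, indexed by digit), independent segment flips, and a uniform prior over digits. The function name `bayes_accuracy` is hypothetical.

```python
from itertools import product

# ASSUMPTION: usual seven-segment shapes, in the order top, upper-left,
# upper-right, middle, lower-left, lower-right, bottom. Index = digit.
SEGMENTS = [
    (1, 1, 1, 0, 1, 1, 1),  # 0
    (0, 0, 1, 0, 0, 1, 0),  # 1
    (1, 0, 1, 1, 1, 0, 1),  # 2
    (1, 0, 1, 1, 0, 1, 1),  # 3
    (0, 1, 1, 1, 0, 1, 0),  # 4
    (1, 1, 0, 1, 0, 1, 1),  # 5
    (1, 1, 0, 1, 1, 1, 1),  # 6
    (1, 0, 1, 0, 0, 1, 0),  # 7
    (1, 1, 1, 1, 1, 1, 1),  # 8
    (1, 1, 1, 1, 0, 1, 1),  # 9
]


def bayes_accuracy(noise=0.10):
    """Accuracy of the Bayes-optimal classifier under independent segment flips."""
    prior = 1.0 / len(SEGMENTS)  # uniform prior over the 10 digits
    total = 0.0
    for observed in product((0, 1), repeat=7):
        best = 0.0
        for clean in SEGMENTS:
            flips = sum(o != c for o, c in zip(observed, clean))
            likelihood = noise ** flips * (1.0 - noise) ** (7 - flips)
            best = max(best, prior * likelihood)
        total += best  # the optimal rule picks the digit with the largest joint probability
    return total


if __name__ == "__main__":
    accuracy = bayes_accuracy(0.10)
    print(f"Bayes accuracy: {accuracy:.3f}, error: {1.0 - accuracy:.3f}")
```

If the assumed digit shapes match those used to generate the data, the printed error should land close to the 26% quoted above.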

5. Number of Attributes:

7, all Boolean-valued.

6. Attribute Information:

7. Missing Attribute Values: None

8. Class Distribution: 10% (Theoretical)

Each concept (digit) has the same theoretical probability.



Last Updated 7 November 1996
Comments and questions to: delve@cs.toronto.edu