Census dataset
The dataset was constructed from data provided by the US Census Bureau (under Lookup access: Summary
Tape File 1).
The data were collected as part of the 1990 US census. They are
mostly counts accumulated at different survey levels. For the purpose of
this data set the State-Place level was used. Data from all states
were obtained. Most of the counts were converted into appropriate
proportions.
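As an illustration of the count-to-proportion conversion mentioned above, here is a minimal Python sketch. The function name to_percentages and the example counts are hypothetical. The source does not state exactly how each proportion was derived, so dividing a category count by the total of its table is an assumption.

```python
def to_percentages(counts):
    """Convert a dict of category counts into percentages of their total.

    Assumption: each derived input is a category count divided by the
    total of its table; the dataset description does not spell this out.
    """
    total = sum(counts.values())
    if total == 0:
        # A region with no entries has no meaningful proportions.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical counts for one State-Place region (not real census data).
sex_counts = {"male": 480, "female": 520}
print(to_percentages(sex_counts))  # {'male': 48.0, 'female': 52.0}
```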
There are 4 prototasks:
- house-price-8H
- house-price-8L
- house-price-16H
- house-price-16L
These are all concerned with predicting the median price of houses
in a region based on the demographic composition and the state of the housing
market in that region. The number in the name signifies the dimensionality of
the input. The letter that follows denotes a very rough approximation of
the difficulty of the task. For Low task difficulty, the more correlated
inputs were chosen, as indicated by a univariate smooth fit of each
input on the target. Tasks with High difficulty have had their inputs
chosen to make the modeling more difficult, through higher variance or
lower correlation of the inputs with the target.
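The naming convention described above can be decoded mechanically. The following is a minimal Python sketch. The helper parse_prototask and its regular expression are hypothetical and not part of the dataset distribution; they only encode the rule that the number is the input dimensionality and the letter is the difficulty.

```python
import re

# Pattern for names such as "house-price-8H": a number of inputs followed
# by a difficulty letter (H = High, L = Low).
_NAME_RE = re.compile(r"^house-price-(\d+)([HL])$")


def parse_prototask(name):
    """Return (input_dimension, difficulty) for a prototask name."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised prototask name: {name!r}")
    dimension = int(match.group(1))
    difficulty = {"H": "High", "L": "Low"}[match.group(2)]
    return dimension, difficulty


for name in ["house-price-8H", "house-price-8L",
             "house-price-16H", "house-price-16L"]:
    print(name, parse_prototask(name))
```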
Each prototask has 6 tasks of sizes: 64, 128, 256, 512, 1024 and 2048.
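For reference, the full grid of tasks implied by the four prototasks and six sizes can be enumerated as below. This is only a sketch: the constant names PROTOTASKS, TASK_SIZES and ALL_TASKS are hypothetical, and the (name, size) pairs are an illustrative representation, not the dataset's own file layout.

```python
from itertools import product

PROTOTASKS = ["house-price-8H", "house-price-8L",
              "house-price-16H", "house-price-16L"]
# The six task sizes are the powers of two from 2**6 to 2**11.
TASK_SIZES = [2 ** k for k in range(6, 12)]

# Every (prototask, size) combination: 4 prototasks x 6 sizes = 24 tasks.
ALL_TASKS = list(product(PROTOTASKS, TASK_SIZES))

print(TASK_SIZES)      # [64, 128, 256, 512, 1024, 2048]
print(len(ALL_TASKS))  # 24
```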
The test sets are hierarchical - within each task, a separate
test set is used for each instance.
The inputs used by any of the four prototasks are as follows:
- P1 ---- total persons count in the region
- P2 ---- total count of families in the region
- P3 ---- total number of households (HH's)
- P5.1 --- percentage of males
- P6.2 --- percentage of black people
- P6.4 --- percentage of people who are of Asian or Pacific Islander race
- P11.3 -- percentage of people between 25 and 64 years of age
- P11.4 -- percentage over 64 years old
- P14.6 -- percentage of never-married females
- P14.9 -- percentage of widowed females
- P15.1 -- percentage of people in family HH's
- P15.3 -- percentage of people in group quarters (incl. jails)
- P16.1 -- percentage of HH's with 1 person
- P16.2 -- percentage of HH's with 2 or more persons which are family HH's
- P17A -- average family size
- P18.2 -- percentage of HH's with 1+ persons under 18 which are non-family HH's
- P19.2 -- percentage of HH's with black Householder (HH'lder)
- P19.4 -- percentage of HH's with Asian HH'lder
- P20.1 -- percentage of HH's with Hispanic HH'lder
- P25.1 -- percentage of HH's with more than two persons 65 years old or more
- P26.1 -- percentage of HH's with more than one non-relative living in
- P27.4 -- percentage of HH's which are non-family with 2+ persons
- H1 ---- total number of Housing Units (HU's)
- H2.1 --- percentage of HU's occupied
- H2.2 --- percentage of HU's vacant
- H3.1 --- percentage of occupied HU's which are owner-occupied
- H5.2 -- percentage of vacant HU's which are for sale only
- H5.6 -- percentage of vacant HU's which are not for rent, sale, or migrant workers, nor for seasonal, recreational or occasional use
- H8.1 --- percentage of occupied HU's with white HH'lder
- H8.2 --- percentage of occupied HU's with black HH'lder
- H10.1 -- percentage of occupied HU's with HH'lder not of Hispanic origin
- H10.2 -- percentage of occupied HU's with HH'lder of Hispanic origin
- H13.1 -- percentage of HU's with 1-4 rooms
- H15.1 -- average number of rooms in an owner-occupied HU
- H18.A -- average number of persons per owner-occupied HU
- H40.4 -- percentage of vacant-for-sale HU's vacant more than 6 months
Contributed by: Rafal Kustra.