Categorical variables represent types of data that can be divided into groups. Examples of categorical variables are race, sex, age group, and educational level. While the latter two variables can also be treated numerically, by using exact values for age and highest grade completed, it is often more informative to categorize such variables into a relatively small number of groups.

Analysis of categorical data generally involves the use of data tables. A two-way table presents categorical data by counting the number of observations that fall into each group for two variables, one divided into rows and the other divided into columns. For example, suppose a survey was conducted of a group of 20 individuals, who were asked to identify their hair and eye color. A two-way table presenting the results might appear as follows:

                       Eye Color
Hair Color     Blue  Green  Brown  Black  Total
-----------------------------------------------
Blonde            2      1      2      1      6
Red               1      1      2      0      4
Brown             1      0      4      2      7
Black             1      0      2      0      3
-----------------------------------------------
Total             5      2     10      3     20

The totals for each category, also known as marginal distributions, give the number of individuals in each row or column without accounting for the effect of the other variable (in the example above, the total number of individuals with blue eyes, regardless of hair color, is 5).
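As an illustration, the marginal totals can be recomputed from the cell counts with a few lines of Python. This is a minimal sketch: the variable names (`counts`, `hair_totals`, `eye_totals`) are chosen here for illustration and are not part of the original example.

```python
# Cell counts from the hair/eye color survey: one row per hair color,
# with columns in the order Blue, Green, Brown, Black (eye color).
eye_colors = ["Blue", "Green", "Brown", "Black"]
counts = {
    "Blonde": [2, 1, 2, 1],
    "Red":    [1, 1, 2, 0],
    "Brown":  [1, 0, 4, 2],
    "Black":  [1, 0, 2, 0],
}

# Marginal distribution of hair color: sum across each row.
hair_totals = {hair: sum(row) for hair, row in counts.items()}

# Marginal distribution of eye color: sum down each column.
eye_totals = {
    eye: sum(row[j] for row in counts.values())
    for j, eye in enumerate(eye_colors)
}

grand_total = sum(hair_totals.values())

print(hair_totals)
print(eye_totals)
print(grand_total)
```

Both sets of marginal totals add up to the same grand total, which is a quick consistency check on any two-way table.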

Since simple counts are often difficult to analyze, two-way tables are often converted into percentages. In the above example, there are 4 individuals with red hair. Since there were a total of 20 observations, this means that 20% of the individuals surveyed are redheads. One might also want to investigate the percentages within a given category: of the 4 redheads, 2 (50%) have brown eyes, 1 (25%) has blue eyes, and 1 (25%) has green eyes.
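The same arithmetic can be sketched in Python. The names used here (`red_row`, `share_of_sample`, `within_red`) are illustrative assumptions; the counts come from the "Red" row of the table.

```python
# Eye color counts for the red-haired respondents (the "Red" row).
red_row = {"Blue": 1, "Green": 1, "Brown": 2, "Black": 0}
total_observations = 20

red_total = sum(red_row.values())

# Redheads as a percentage of everyone surveyed.
share_of_sample = 100 * red_total / total_observations

# Percentages within the red-hair category: each cell divided by the row total.
within_red = {eye: 100 * n / red_total for eye, n in red_row.items()}

print(share_of_sample)
print(within_red)
```

The first figure uses the grand total as its denominator, while the second uses the row total. Keeping track of which denominator is in use is the main thing to watch when reading percentage tables.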

For a more detailed example, consider the following dataset, “Weights of 1996 US Olympic Rowing Team.” The rowers are listed in two side-by-side blocks. Within each block, the first column gives the name of the rower, the second gives his event, and the third gives his weight. Altogether there are 8 different event categories, with weight given as numeric data.

Name       Event             Weight    Name        Event             Weight
Auth       LW_double_sculls     154    Klepacki    four                 205
Beasley    single_sculls        224    Koven       eight                200
Brown      eight                214    Mueller     quad                 215
Burden     eight                195    Murphy      eight                220
Carlucci   LW_four              160    Murray      four                 205
Collins,D  LW_four              155    Peterson,M  pair                 210
Collins,P  eight                195    Peterson,S  LW_double_sculls     160
Gailes     quad                 205    Pfaendtner  LW_four              160
Hall       four                 195    Schnieder   LW_four              158
Holland    pair                 195    Scott       four                 208
Honebein   eight                200    Segaloff    coxswain             121
Jamieson   quad                 210    Smith       eight                207
Kaehler    eight                210    Young       quad                 207

Data source: Team member biographies given on the NBC Olympic web site. Dataset available through the JSE Dataset Archive.

Before creating a two-way table for events and weights, the analyst must first divide the numeric “weight” column into groups, creating a categorical variable. Using the MINITAB “DESCRIBE” command gives the following information about the weight data:

Descriptive Statistics

Variable       N     Mean   Median  Tr Mean    StDev  SE Mean
Weight        26   191.85   202.50   193.46    26.27     5.15

Variable     Min      Max       Q1       Q3
Weight    121.00   224.00   160.00   210.00
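For readers without MINITAB, a rough equivalent of this summary can be sketched with Python's standard `statistics` module. Two points are assumptions rather than facts from the source: the trimmed mean is taken to drop 5% of the observations (rounded to the nearest whole number) from each end, and the quartiles use Python's default "exclusive" method. Statistical packages differ in how they compute quartiles, so agreement on other datasets is not guaranteed.

```python
import statistics

# The 26 weights (lbs) from the rowing dataset, left block then right block.
weights = [154, 224, 214, 195, 160, 155, 195, 205, 195, 195, 200, 210, 210,
           205, 200, 215, 220, 205, 210, 160, 160, 158, 208, 121, 207, 207]

n = len(weights)
mean = statistics.mean(weights)
median = statistics.median(weights)
stdev = statistics.stdev(weights)      # sample standard deviation (n - 1)
se_mean = stdev / n ** 0.5             # standard error of the mean

# Trimmed mean (assumption): drop round(5% of n) values from each end.
k = round(0.05 * n)
trimmed = sorted(weights)[k:n - k]
trimmed_mean = statistics.mean(trimmed)

# Quartiles using Python's default "exclusive" method (assumption).
q1, _, q3 = statistics.quantiles(weights, n=4)

print(f"N={n} Mean={mean:.2f} Median={median:.2f} TrMean={trimmed_mean:.2f}")
print(f"StDev={stdev:.2f} SEMean={se_mean:.2f}")
print(f"Min={min(weights)} Max={max(weights)} Q1={q1:.2f} Q3={q3:.2f}")
```

The median (202.50) sits well above the mean (191.85), which reflects the handful of much lighter rowers pulling the mean down.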

One might choose, based on this information, to divide the weight values into 4 groups, such as under 150 lbs, 150-175 lbs, 175-200 lbs, and over 200 lbs. Once the data have been categorized (the MINITAB “CODE” command may be used to perform this function), the MINITAB “TABLE” command will create two-way tables, as follows:

Rows: Event     Columns: Weight_Class

              <150  150-175  175-200     >200      All

LW_doubl         0        2        0        0        2
single_s         0        0        0        1        1
eight            0        0        4        4        8
LW_four          0        4        0        0        4
quad             0        0        0        4        4
four             0        0        1        3        4
pair             0        0        1        1        2
coxswain         1        0        0        0        1
All              1        6        6       13       26
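The categorize-then-tabulate steps can be sketched in Python as follows. The function `weight_class` and its boundary handling are assumptions: the text does not say which class a weight of exactly 200 lbs belongs to, but placing it in the 175-200 class reproduces the counts in the table. The class codes 0 through 3 run from lightest to heaviest.

```python
from collections import Counter

# (event, weight in lbs) for each of the 26 rowers.
rowers = [
    ("LW_double_sculls", 154), ("single_sculls", 224), ("eight", 214),
    ("eight", 195), ("LW_four", 160), ("LW_four", 155), ("eight", 195),
    ("quad", 205), ("four", 195), ("pair", 195), ("eight", 200),
    ("quad", 210), ("eight", 210), ("four", 205), ("eight", 200),
    ("quad", 215), ("eight", 220), ("four", 205), ("pair", 210),
    ("LW_double_sculls", 160), ("LW_four", 160), ("LW_four", 158),
    ("four", 208), ("coxswain", 121), ("eight", 207), ("quad", 207),
]


def weight_class(weight):
    """Code a weight as 0 (<150), 1 (150-175), 2 (175-200), or 3 (>200).

    Assumption: upper boundaries are inclusive, so 200 lbs falls in class 2.
    """
    if weight < 150:
        return 0
    if weight <= 175:
        return 1
    if weight <= 200:
        return 2
    return 3


# Count how many rowers fall into each (event, weight class) cell.
cell_counts = Counter((event, weight_class(w)) for event, w in rowers)

# Events in order of first appearance; a missing cell counts as 0.
events = list(dict.fromkeys(event for event, _ in rowers))
table = {e: [cell_counts[(e, c)] for c in range(4)] for e in events}
column_totals = [sum(row[c] for row in table.values()) for c in range(4)]

for event, row in table.items():
    print(f"{event:<18}" + "".join(f"{x:>5}" for x in row) + f"{sum(row):>5}")
print(f"{'All':<18}" + "".join(f"{x:>5}" for x in column_totals)
      + f"{sum(column_totals):>5}")
```

Moving a boundary (for example, treating 200 lbs as part of the top class) would shift two of the "eight" rowers between columns, so the choice of cut points deserves to be stated explicitly in any report.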

Using the “ROWPERCENT” subcommand reproduces this table with the percentages of rowers in each weight category by event (the weight classes are coded 0 through 3, from lightest to heaviest):

Rows: Event     Columns: Weight_Class

                 0        1        2        3      All

LW_doubl        --   100.00       --       --   100.00
single_s        --       --       --   100.00   100.00
eight           --       --    50.00    50.00   100.00
LW_four         --   100.00       --       --   100.00
quad            --       --       --   100.00   100.00
four            --       --    25.00    75.00   100.00
pair            --       --    50.00    50.00   100.00
coxswain    100.00       --       --       --   100.00
All           3.85    23.08    23.08    50.00   100.00

These results indicate that half of all rowers are in the upper weight class, with the remainder evenly divided between the two middle classes (with the exception of the coxswain, who is the only team member in the lightest weight group). Similarly, the “COLPERCENT” subcommand provides the percentage of rowers in each event category by weight.
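To make the difference between row and column percentages concrete, here is a minimal Python sketch that computes both from the count table. The helper names `row_percents` and `column_percents` are hypothetical and are not MINITAB commands. The code assumes that no row or column total is zero, which holds for this table.

```python
# Counts by event (rows) and weight class 0-3 (columns), from the table above.
counts = {
    "LW_doubl": [0, 2, 0, 0],
    "single_s": [0, 0, 0, 1],
    "eight":    [0, 0, 4, 4],
    "LW_four":  [0, 4, 0, 0],
    "quad":     [0, 0, 0, 4],
    "four":     [0, 0, 1, 3],
    "pair":     [0, 0, 1, 1],
    "coxswain": [1, 0, 0, 0],
}


def row_percents(table):
    """Each cell as a percentage of its row total (each row sums to 100)."""
    return {label: [100 * x / sum(row) for x in row]
            for label, row in table.items()}


def column_percents(table):
    """Each cell as a percentage of its column total (each column sums to 100)."""
    totals = [sum(col) for col in zip(*table.values())]
    return {label: [100 * x / t for x, t in zip(row, totals)]
            for label, row in table.items()}


row_pct = row_percents(counts)
col_pct = column_percents(counts)

# The "All" row: each weight class as a percentage of all rowers.
column_totals = [sum(col) for col in zip(*counts.values())]
overall = [100 * t / sum(column_totals) for t in column_totals]

print(row_pct["four"])
print([round(p, 2) for p in col_pct["eight"]])
print([round(p, 2) for p in overall])
```

Row percentages answer "within this event, how are the rowers spread across weight classes?", while column percentages answer "within this weight class, which events do the rowers come from?"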