Categorical variables represent types of data that can be divided into groups. Examples of categorical variables are race, sex, age group, and educational level. While the latter two variables can also be treated numerically by using exact values for age and highest grade completed, it is often more informative to categorize such variables into a relatively small number of groups.
Analysis of categorical data generally involves the use of data tables. A two-way table presents categorical data by counting the number of observations that fall into each group for two variables, one divided into rows and the other divided into columns. For example, suppose a survey was conducted of a group of 20 individuals, who were asked to identify their hair and eye color. A two-way table presenting the results might appear as follows:
                        Eye Color
Hair Color    Blue   Green   Brown   Black   Total
--------------------------------------------------
Blonde           2       1       2       1       6
Red              1       1       2       0       4
Brown            1       0       4       2       7
Black            1       0       2       0       3
--------------------------------------------------
Total            5       2      10       3      20
The totals for each category, also known as marginal distributions, give the number of individuals in each row or column without accounting for the effect of the other variable (in the example above, the total number of individuals with blue eyes, regardless of hair color, is 5).
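As a minimal sketch of how the marginal distributions arise, the following Python snippet sums the rows and columns of the hair and eye color counts given above. The variable names (`counts`, `row_totals`, `col_totals`) are illustrative choices, not part of the original survey example.

```python
# Counts from the hair/eye color table: rows are hair color,
# columns are eye color (in the order listed in eye_colors).
eye_colors = ["Blue", "Green", "Brown", "Black"]
counts = {
    "Blonde": [2, 1, 2, 1],
    "Red":    [1, 1, 2, 0],
    "Brown":  [1, 0, 4, 2],
    "Black":  [1, 0, 2, 0],
}

# Marginal distribution of hair color: sum across each row.
row_totals = {hair: sum(row) for hair, row in counts.items()}

# Marginal distribution of eye color: sum down each column.
col_totals = {
    eye: sum(row[j] for row in counts.values())
    for j, eye in enumerate(eye_colors)
}

grand_total = sum(row_totals.values())

print(row_totals)   # {'Blonde': 6, 'Red': 4, 'Brown': 7, 'Black': 3}
print(col_totals)   # {'Blue': 5, 'Green': 2, 'Brown': 10, 'Black': 3}
print(grand_total)  # 20
```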
Since raw counts are often difficult to interpret, two-way tables are often converted into percentages. In the above example, there are 4 individuals with red hair. Since there were a total of 20 observations, this means that 20% of the individuals surveyed are redheads. One might also want to examine the percentages within a given category: of the 4 redheads, 2 (50%) have brown eyes, 1 (25%) has blue eyes, and 1 (25%) has green eyes.
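The two kinds of percentage described here (a share of all observations versus a share within one category) can be sketched in a few lines of Python. The names `red_hair`, `overall_pct`, and `within_red_pct` are hypothetical and chosen only for illustration.

```python
# Eye color counts for the red-haired row of the survey table.
red_hair = {"Blue": 1, "Green": 1, "Brown": 2, "Black": 0}
total_observations = 20

red_total = sum(red_hair.values())  # 4 redheads

# Share of all surveyed individuals who have red hair.
overall_pct = 100 * red_total / total_observations

# Eye color percentages within the red-haired group only.
within_red_pct = {eye: 100 * n / red_total for eye, n in red_hair.items()}

print(overall_pct)     # 20.0
print(within_red_pct)  # {'Blue': 25.0, 'Green': 25.0, 'Brown': 50.0, 'Black': 0.0}
```

The denominator is the only thing that changes: 20 for the overall percentage, 4 for the percentages within the red-haired category.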
For a more detailed example, consider the following dataset, “Weights of 1996 US Olympic Rowing Team.” The first column gives the name of the rower, the second gives his event, and the third gives his weight. Weight is given as numeric data, and there are 8 different event categories.
Name        Event             Weight    Name        Event             Weight
Auth        LW_double_sculls  154       Klepacki    four              205
Beasley     single_sculls     224       Koven       eight             200
Brown       eight             214       Mueller     quad              215
Burden      eight             195       Murphy      eight             220
Carlucci    LW_four           160       Murray      four              205
Collins,D   LW_four           155       Peterson,M  pair              210
Collins,P   eight             195       Peterson,S  LW_double_sculls  160
Gailes      quad              205       Pfaendtner  LW_four           160
Hall        four              195       Schnieder   LW_four           158
Holland     pair              195       Scott       four              208
Honebein    eight             200       Segaloff    coxswain          121
Jamieson    quad              210       Smith       eight             207
Kaehler     eight             210       Young       quad              207
Data source: Team member biographies given on the NBC Olympic website. Dataset available through the JSE Dataset Archive.
Before creating a two-way table for events and weights, the analyst must first divide the numeric “weight” column into groups, creating a categorical variable. The MINITAB “DESCRIBE” command gives the following information about the weight data:
Descriptive Statistics

Variable      N     Mean   Median  Tr Mean    StDev  SE Mean
Weight       26   191.85   202.50   193.46    26.27     5.15

Variable    Min      Max       Q1       Q3
Weight   121.00   224.00   160.00   210.00
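For readers without MINITAB, most of these summary figures can be approximated with Python's standard `statistics` module. This is only a sketch: the `weights` list is transcribed from the dataset above, the variable names are illustrative, and quartile conventions differ between packages. The default "exclusive" method of `statistics.quantiles` happens to agree with the MINITAB quartiles for this dataset, which is not guaranteed in general.

```python
import math
import statistics

# Weights (lbs) of the 26 rowers, transcribed from the dataset above.
weights = [
    154, 224, 214, 195, 160, 155, 195, 205, 195, 195, 200, 210, 210,
    205, 200, 215, 220, 205, 210, 160, 160, 158, 208, 121, 207, 207,
]

n = len(weights)
mean = statistics.mean(weights)
median = statistics.median(weights)
stdev = statistics.stdev(weights)   # sample standard deviation (n - 1)
se_mean = stdev / math.sqrt(n)      # standard error of the mean

# Quartiles using the module's default "exclusive" method.
q1, q2, q3 = statistics.quantiles(weights, n=4)

print(n, round(mean, 2), median)              # 26 191.85 202.5
print(round(stdev, 2), round(se_mean, 2))     # 26.27 5.15
print(min(weights), max(weights), q1, q3)     # 121 224 160.0 210.0
```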
One might choose, based on this information, to divide the weight values into 4 groups, such as under 150 lbs, 150-175 lbs, 175-200 lbs, and over 200 lbs. Once the data have been categorized (the MINITAB “CODE” command may be used to perform this function), the MINITAB “TABLE” command will create two-way tables, as follows:
Rows: Event     Columns: Weight_Class

            <150  150-175  175-200   >200    All

LW_doubl       0        2        0      0      2
single_s       0        0        0      1      1
eight          0        0        4      4      8
LW_four        0        4        0      0      4
quad           0        0        0      4      4
four           0        0        1      3      4
pair           0        0        1      1      2
coxswain       1        0        0      0      1
All            1        6        6     13     26
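The categorize-then-tabulate procedure can be sketched in Python as follows. This is not the MINITAB implementation. The function `weight_class` and its boundary handling are assumptions: the text does not say which class a weight of exactly 150, 175, or 200 lbs belongs to, so each upper bound is treated here as inclusive, which is consistent with the counts in the table (the two 200 lb rowers fall in the 175-200 class).

```python
from collections import Counter

# (event, weight in lbs) for each of the 26 rowers, from the dataset above.
rowers = [
    ("LW_double_sculls", 154), ("single_sculls", 224), ("eight", 214),
    ("eight", 195), ("LW_four", 160), ("LW_four", 155), ("eight", 195),
    ("quad", 205), ("four", 195), ("pair", 195), ("eight", 200),
    ("quad", 210), ("eight", 210), ("four", 205), ("eight", 200),
    ("quad", 215), ("eight", 220), ("four", 205), ("pair", 210),
    ("LW_double_sculls", 160), ("LW_four", 160), ("LW_four", 158),
    ("four", 208), ("coxswain", 121), ("eight", 207), ("quad", 207),
]

def weight_class(weight):
    """Code a weight as 0-3 (assumed convention: upper bounds inclusive)."""
    if weight < 150:
        return 0   # under 150 lbs
    if weight <= 175:
        return 1   # 150-175 lbs
    if weight <= 200:
        return 2   # 175-200 lbs
    return 3       # over 200 lbs

# Two-way table: count rowers for each (event, weight class) pair.
table = Counter((event, weight_class(w)) for event, w in rowers)

# Events in order of first appearance in the data.
events = list(dict.fromkeys(event for event, _ in rowers))

for event in events:
    row = [table[(event, c)] for c in range(4)]
    print(f"{event:<18}", *row, sum(row))

# Column totals (marginal distribution of weight class).
col_totals = [sum(table[(e, c)] for e in events) for c in range(4)]
print(f"{'All':<18}", *col_totals, sum(col_totals))
```

A `Counter` keyed on pairs returns 0 for combinations that never occur, so the empty cells of the table need no special handling.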
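Using the “ROWPERCENT” subcommand reproduces the table of events by weight class with the percentages of rowers in each weight category by event. Here the weight classes are coded 0 through 3, from lightest to heaviest: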
Rows: Event     Columns: Weight_Class

                0        1        2        3      All

LW_doubl       --   100.00       --       --   100.00
single_s       --       --       --   100.00   100.00
eight          --       --    50.00    50.00   100.00
LW_four        --   100.00       --       --   100.00
quad           --       --       --   100.00   100.00
four           --       --    25.00    75.00   100.00
pair           --       --    50.00    50.00   100.00
coxswain   100.00       --       --       --   100.00
All          3.85    23.08    23.08    50.00   100.00
These results indicate that half of all rowers are in the heaviest weight class, with the rest evenly divided between the two middle classes (with the exception of the coxswain, who is the only team member in the lightest weight group). Similarly, the “COLPERCENT” subcommand provides the percentage of rowers in each event category by weight class.
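The difference between row and column percentages comes down to which total each count is divided by. The sketch below starts from the counts in the event by weight class table; the names `row_pct`, `col_pct`, and `all_pct` are illustrative and do not correspond to MINITAB output.

```python
# Counts from the event by weight-class table (classes 0-3, lightest first).
counts = {
    "LW_doubl": [0, 2, 0, 0],
    "single_s": [0, 0, 0, 1],
    "eight":    [0, 0, 4, 4],
    "LW_four":  [0, 4, 0, 0],
    "quad":     [0, 0, 0, 4],
    "four":     [0, 0, 1, 3],
    "pair":     [0, 0, 1, 1],
    "coxswain": [1, 0, 0, 0],
}

# Row percentages: each count divided by its event (row) total.
row_pct = {
    event: [100 * c / sum(row) for c in row]
    for event, row in counts.items()
}

# Column percentages: each count divided by its weight-class (column) total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {
    event: [100 * c / t for c, t in zip(row, col_totals)]
    for event, row in counts.items()
}

# The "All" row of the row-percent table: share of rowers in each class.
grand_total = sum(col_totals)
all_pct = [round(100 * t / grand_total, 2) for t in col_totals]

print(row_pct["four"])  # [0.0, 0.0, 25.0, 75.0]
print(all_pct)          # [3.85, 23.08, 23.08, 50.0]
print([round(p, 2) for p in col_pct["eight"]])  # [0.0, 0.0, 66.67, 30.77]
```

Read across a row, `row_pct` answers "what share of this event's rowers fall in each weight class?", while `col_pct` answers "what share of this weight class comes from each event?"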