Menu location: **Data_Cleaning and Encoding_Categorise**.

This function enables you to categorise any set of data into groups that you specify, for example ages into age groups.

Typically, a continuous variable is divided into categories or groups. Take the IgM variable in the parametric sheet of the test workbook, for example: it has 298 observations, which you might want to summarise as counts within ranges of values. To do this, select the Categorise menu item (see the menu location above) and then select the IgM column of data. You are then offered several ways to group your data into bins (intervals) and count the observations in each:

- Quartiles: 4 bins (< lower quartile, lower quartile to median, median to upper quartile, >= upper quartile)
- Quintiles: 5 bins (< first quintile… >= fourth quintile)
- Deciles: 10 bins (< first decile… >= ninth decile)
- Age groups: one of four common groupings (<15, 15-19… five-yearly bands to 85+; <15, 15-24… ten-yearly bands to 85+; <1, 1-4… five-yearly bands to 85+; <1, 1-4… ten-yearly bands to 75+)
- User-defined: k intervals of equal size (step), starting from a minimum value min (< min + 1 * step, >= min + 1 * step to < min + 2 * step… up to >= min + (k - 1) * step, giving k intervals in total)
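The user-defined option can be sketched in a few lines of Python. This is only an illustration of the binning rule, not StatsDirect's own code: the function name `categorise` and its parameters are hypothetical, and it assumes that k counts every bin including the two open-ended ones, so that the last bin starts at min + (k - 1) * step, as in the 10-interval IgM table in this section.

```python
from bisect import bisect_right
from collections import Counter


def categorise(values, minimum, step, k):
    """Count values in k bins: one open-ended bin below minimum + step,
    k - 2 bins of width step, and one open-ended bin at the top.
    Hypothetical helper for illustration; requires k >= 2."""
    # Interior cut points: minimum + 1*step, ..., minimum + (k - 1)*step
    cuts = [minimum + i * step for i in range(1, k)]
    # bisect_right places a value equal to a cut point in the bin above it,
    # which gives the ">= lower; < upper" convention used on this page.
    counts = Counter(bisect_right(cuts, v) for v in values)
    labels = [f"< {cuts[0]:g}"]
    labels += [f">= {lo:g}; < {hi:g}" for lo, hi in zip(cuts, cuts[1:])]
    labels.append(f">= {cuts[-1]:g}")
    return [(label, counts.get(i, 0)) for i, label in enumerate(labels)]


# Made-up values for illustration only (not the IgM data)
sample = [0.2, 0.5, 0.9, 1.0, 1.4, 4.7]
for label, count in categorise(sample, minimum=0, step=0.5, k=10):
    print(f"{label} | {count}")
```

Note that a value exactly equal to a cut point falls into the bin above it, so 0.5 is counted under ">= 0.5; < 1" and not under "< 0.5".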

Using the IgM example in quartiles:

| category | count |
| --- | --- |
| < 0.5 | 56 |
| >= 0.5; < 0.7 | 67 |
| >= 0.7; < 1 | 98 |
| >= 1 | 77 |
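The quantile-based options follow the same pattern, with the cut points taken from the data instead of being specified by the user. The sketch below uses Python's standard `statistics.quantiles` function to obtain the cut points; the helper name `quantile_counts` is hypothetical, and the `"exclusive"` method is an assumption chosen for illustration that may not match either of the quantile methods StatsDirect offers.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_counts(values, n=4):
    """Split values into n quantile-based bins and count each bin.
    Hypothetical helper: n=4 gives quartiles, 5 quintiles, 10 deciles."""
    # n - 1 cut points; the quantile definition here is an assumption
    cuts = quantiles(values, n=n, method="exclusive")
    counts = [0] * n
    for v in values:
        # A value equal to a cut point is counted in the bin above it
        counts[bisect_right(cuts, v)] += 1
    return cuts, counts


# Made-up values for illustration only (not the IgM data)
cuts, counts = quantile_counts(list(range(1, 12)), n=4)
print(cuts, counts)
```

The counts in quantile bins need not be equal, for instance when several observations tie at a cut point, since tied values all go into the same bin.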

Using the IgM example in 10 intervals of 0.5 from a minimum of 0:

| category | count |
| --- | --- |
| < 0.5 | 56 |
| >= 0.5; < 1 | 165 |
| >= 1; < 1.5 | 54 |
| >= 1.5; < 2 | 14 |
| >= 2; < 2.5 | 6 |
| >= 2.5; < 3 | 2 |
| >= 3; < 3.5 | 0 |
| >= 3.5; < 4 | 0 |
| >= 4; < 4.5 | 0 |
| >= 4.5 | 1 |

A quick look at the counts above shows much the same picture as a histogram would: the data are not spread evenly across the ranges of values, i.e. they are skewed. The text-based histogram also gives counts, but note that the bin values it reports are the mid-points of the bins, not the cut-off values between bins; each mid-point equals the corresponding user-defined bin cut-off value minus half of the step size.
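The relationship between cut-off values and histogram mid-points can be checked with a short calculation. The function name `bin_midpoints` below is hypothetical, and the sketch assumes every bin has the same width (step).

```python
def bin_midpoints(minimum, step, k):
    """Return (cut-off, mid-point) pairs for k equal-width bins.
    Hypothetical helper for illustration only."""
    # Upper cut-off of each bin: minimum + 1*step, ..., minimum + k*step
    cutoffs = [minimum + i * step for i in range(1, k + 1)]
    # Each mid-point is the bin's upper cut-off minus half of the step size
    return [(c, c - step / 2) for c in cutoffs]


for cutoff, midpoint in bin_midpoints(minimum=0, step=0.5, k=3):
    print(f"cut-off {cutoff:g} -> mid-point {midpoint:g}")
```

With a step of 0.5 from a minimum of 0, the cut-offs 0.5, 1 and 1.5 correspond to mid-points 0.25, 0.75 and 1.25.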

__Technical note__

Two different options are offered for calculating the quantiles used as cut points in this categorisation function. The methods are described on the Quantiles page. Method 1 (default) corresponds to the default method used in Stata, and Method 2 is equivalent to the alternative definition used in Stata.

Copyright © 2000-2016 StatsDirect Limited, all rights reserved.