Different ways to create multicategory variables

Multicategory variables can be coded in several ways. The simplest is to use the generate and replace commands to code one category at a time, which works well when there are only a few categories. You first set an initial value with generate, and then use replace commands to change the value based on conditions (one command line per value). The disadvantage is that you risk ending up with many command lines and long scripts that use more resources and take a long time to run.

If you want to code many categories, possibly with complicated conditions, the recode() command is recommended. It lets you set up all the coding conditions in a single command, which makes scripts more compact and faster to run. With recode() you can, among other things, specify value intervals and create the associated value labels directly, so you do not have to add them afterwards with the define-labels and assign-labels commands.

A third set of tools for coding multicategory variables is the inlist() and inrange() functions. They are well suited to extensive coding conditions, usually in combination with generate and replace, for example when you want a rough grouping of municipalities and need to list large sets of municipality codes.

For those who want to set up coding expressions for many categories, there is a fourth option: automatic generation of the recoding by uploading a delimiter-separated recoding file.

You will find more information about using generate, replace, recode(), inlist(), inrange() and automatic recoding in chapters 3.1–3.2 of the User Guide.

This script demonstrates the different ways to code multicategory variables:

require no.ssb.fdb:23 as ds

create-dataset demo

// Create several categories using generate and replace
import ds/INNTEKT_BRUTTOFORM 2020-01-01 as formue

generate formueint = 1
replace formueint = 2 if formue > 500000
replace formueint = 3 if formue > 1000000
replace formueint = 4 if formue > 1500000

tabulate formueint
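
// Sketch: the same grouping written as a single recode on a copy of the variable.
// The name formuegr is illustrative. Assumes whole-number, non-negative values;
// unlike the generate/replace version above, units with missing formue are
// assumed to stay missing instead of being placed in group 1.
generate formuegr = formue
recode formuegr (0/500000 = 1) (500001/1000000 = 2) (1000001/1500000 = 3) (1500001/max = 4)
tabulate formuegr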


// Create several categories (world regions) using recode
create-dataset befolkning
import ds/BEFOLKNING_STATUSKODE 2021-01-01 as statuskode
keep if statuskode == '1'

import ds/BEFOLKNING_FODELAND as fødeland
tabulate fødeland

destring fødeland
recode fødeland (111 120 138 139 140 148 155 156 159/164 = 2 'Europeiske land utenom EU') (101/141 144/158 = 1 'EU/EØS') (203/393 = 3 'Afrika') (143 404/578 = 4 'Asia med Tyrkia') (612 684 = 5 'Nord-Amerika') (601/775 = 6 'Sør- og Mellom-Amerika') (802/840 = 7 'Oseania') (980 = 8 'Statsløse') (990 = 9 'Uoppgitt')
tabulate fødeland


// Code a big-city indicator from the municipality numbers of the four largest municipalities using inlist()
import ds/BOSATTEFDT_BOSTED 2021-12-31 as kommune
generate storby = 0
replace storby = 1 if inlist(kommune,'0301','4601','1103','5001')
tabulate kommune if storby, rowsort()
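
// Sketch: the same approach extended to more than two categories.
// The variable name storbygr and the grouping itself are illustrative only.
generate storbygr = 0
replace storbygr = 1 if inlist(kommune,'0301')
replace storbygr = 2 if inlist(kommune,'4601','1103','5001')
tabulate storbygr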


// Group annual wage income into six groups using inrange()
import ds/INNTEKT_LONN 2021-12-31 as lønn

generate lønn_gr = 0
replace lønn_gr = 1 if inrange(lønn,1,200000)
replace lønn_gr = 2 if inrange(lønn,200001,400000)
replace lønn_gr = 3 if inrange(lønn,400001,600000)
replace lønn_gr = 4 if inrange(lønn,600001,800000)
replace lønn_gr = 5 if lønn > 800000

define-labels lønn_int 0 '0 kr' 1 '1 - 200 000 kr' 2 '200 001 - 400 000 kr' 3 '400 001 - 600 000 kr' 4 '600 001 - 800 000 kr' 5 '800 001 kr ->'
assign-labels lønn_gr lønn_int
tabulate lønn_gr
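
// Sketch: inlist() and inrange() can also be combined in a single condition.
// The variable name storby_lønn4 is illustrative; & is assumed to be the logical AND operator.
generate storby_lønn4 = 0
replace storby_lønn4 = 1 if inlist(kommune,'0301','4601','1103','5001') & inrange(lønn,600001,800000)
tabulate storby_lønn4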

// Alternative way to group wage income using recode
replace lønn = 0 if sysmiss(lønn)
recode lønn (1/200000 = 1)(200001/400000 = 2)(400001/600000 = 3)(600001/800000 = 4)(800001/max = 5)
assign-labels lønn lønn_int
tabulate lønn