Restructure datasets from wide to long format

For statistics and analyzes in microdata.no, datasets created through the command import are normally used. These are data sets of the “wide” type, where information about all units in a population is structured horizontally at a variable level. The new reshape-to-panel command now makes it possible to change the data structure to long-format (panel-format), where information about each unit (individual) is structured vertically at the observation / record level.

Variables that are measured over several times and that you want in long / panel format, must be named through reshape-to-panel with specified prefixes that consist of the letters (prefix) from the original variable in the wide dataset. Other variables for which no prefix is ​​specified, typically information that is only measured once (gender, country of birth, etc), are automatically defined as fixed information and the values ​​for these are repeated for all sub-levels of each unit.

The script below shows how to use reshape-to-panel in practice. First, a standard dataset of the wide type is created, consisting of a 1% random sample of all who were registered as resident in Norway as of 1/1 2018. In addition, the variables regstat (register status), sivstand (marital status) and lønn (annual salary) are imported for the years 2018-2020, as well as the fixed information kjønn (gender). Then the reshape-to panel command is used to restructure the dataset to long-format (panel data). The data we want in long / panel format is specified by entering the prefix of the sets of variables (the letter part preceding the digit that is linked to the year or the like), in this case regstat, sivstand and lønn. Gender is a fixed information, and we therefore do not need to include a variable prefix for this in the command expression. The finished long dataset will include the variable date@panel which contains the value of the sublevel of each unit. In this case, all units will have sub-levels 18, 19 and 20 (i.e. three sub-levels / observations / records each).

require no.ssb.fdb:17 as db

//Create a common type wide format dataset consisting of a 1% random sample of all residents per 1/1 2018
create-dataset wide
import db/BEFOLKNING_STATUSKODE 2018-01-01 as regstat18
keep if regstat18 == '1'
sample 0.01 333
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstat19
import db/BEFOLKNING_STATUSKODE 2020-01-01 as regstat20
import db/SIVSTANDFDT_SIVSTAND 2018-01-01 as sivstand18
import db/SIVSTANDFDT_SIVSTAND 2019-01-01 as sivstand19
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as sivstand20
import db/BEFOLKNING_KJOENN as kjønn
import db/INNTEKT_WLONN 2018-01-01 as lønn18
import db/INNTEKT_WLONN 2019-01-01 as lønn19
import db/INNTEKT_WLONN 2020-01-01 as lønn20

//Run some testings of wide format data
tabulate regstat18, missing
tabulate regstat19, missing
tabulate regstat20, missing
tabulate sivstand18, missing
tabulate sivstand19, missing
tabulate sivstand20, missing
tabulate kjønn, missing

summarize lønn18 lønn19 lønn20

//Convert to long format (panel data)
reshape-to-panel regstat sivstand lønn

//Run some new tests of the long format data for comparing purposes
tabulate date@panel, missing

tabulate-panel regstat, missing
tabulate-panel sivstand, missing
tabulate-panel regstat sivstand, missing

tabulate-panel kjønn, missing
tabulate-panel regstat kjønn, missing
tabulate-panel sivstand kjønn, missing

summarize lønn
summarize-panel lønn