Restructure datasets from wide to long format

For statistics and analyses in microdata.no, datasets created through the command import are normally used. These are data sets of the “wide” type, where information about all units in a population is structured horizontally at a variable level. The new reshape-to-panel command now makes it possible to change the data structure to long-format (panel-format), where information about each unit (individual) is structured vertically at the observation / record level.

Variables that are measured over several times and that you want in long / panel format, must be named through reshape-to-panel with specified prefixes that consist of the letters (prefix) from the original variable in the wide dataset. Other variables for which no prefix is specified, typically information that is only measured once (gender, country of birth, etc), are automatically defined as fixed information and the values for these are repeated for all sub-levels of each unit.

The script below shows how to use reshape-to-panel in practice. First, a standard dataset of the wide type is created, consisting of a 1% random sample of everyone who was registered resident in Norway as of 1/1 2018. In addition, the variables regstat (register status), civstat (civil status) and wage (annual salary) are imported for the years 2018-2020, as well as the fixed information gender. Then the command reshape-to-panel is used to restructure the dataset into long format (panel data). The data we want in long/panel format is entered by specifying the prefix to the sets of variables (the letter part before numbers that refers to years etc.), in this case regstat, civstat and wage. Gender is a fixed information, and we therefore do not need to include a variable prefix for this in the command expression. The finished long data set will include the variable date@panel which contains the sublevel value of each unit. In this case, all units will have sublevels 18, 19 and 20 (i.e. three sublevels/observations/records each).

require no.ssb.fdb:23 as db

//First create a regular wide dataset consisting of a 1% sample of all residents per 1/1 2018
create-dataset wide
import db/BEFOLKNING_STATUSKODE 2018-01-01 as regstat18
keep if regstat18 == '1'
sample 0.01 333
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstat19
import db/BEFOLKNING_STATUSKODE 2020-01-01 as regstat20
import db/SIVSTANDFDT_SIVSTAND 2018-01-01 as civstat18
import db/SIVSTANDFDT_SIVSTAND 2019-01-01 as civstat19
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as civstat20
import db/BEFOLKNING_KJOENN as gender
import db/INNTEKT_WLONN 2018-01-01 as wage18
import db/INNTEKT_WLONN 2019-01-01 as wage19
import db/INNTEKT_WLONN 2020-01-01 as wage20

//Generate statistics
tabulate regstat18, missing
tabulate regstat19, missing
tabulate regstat20, missing
tabulate civstat18, missing
tabulate civstat19, missing
tabulate civstat20, missing
tabulate gender, missing

summarize wage18 wage19 wage20

//Convert to long format (paneldata)
reshape-to-panel regstat civstat wage

//Test for compliance with the count for the wide dataset
tabulate date@panel, missing

tabulate-panel regstat, missing
tabulate-panel civstat, missing
tabulate-panel regstat civstat, missing

tabulate-panel gender, missing
tabulate-panel regstat gender, missing
tabulate-panel civstat gender, missing

summarize wage
summarize-panel wage