How to define and create populations

The analysis system microdata.no is based on the left-join principle. This means that the first variable you import determines the size of the population. If this is a universal variable such as gender, your dataset will contain as many individuals as the database holds, including people who had died, had emigrated or had not yet been born at the time you want to analyse.

If you instead start by importing a variable that applies only to a limited number of individuals, e.g. disability benefits, your dataset will consist only of people in Norway who received disability benefits at the measurement date.

You cannot make the population larger than the first variable allows (unless you use the option outer_join). You therefore need to think about which variable to import first, so that everyone you want to analyse is included. You can, however, make the population smaller if you wish, using the drop if or keep if commands.

//Connect to datastore
require no.ssb.fdb:23 as db

//Example 1: Population = all residents of Bergen as of 1/1 2021
create-dataset eks1
import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as residence
keep if residence == '4601'

//Example 2: Population = all residents of Vestland as of 1/1 2021
create-dataset eks2
import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as residence
keep if substr(residence, 1, 2) == '46'

//Example 3: Population = everyone with an occupational income in the year 2021 (= everyone with an annual occupational income > 0 in 2021)
create-dataset eks3
import db/INNTEKT_WYRKINNT 2021-12-31 as work_income

//Example 4: When you start with a universal variable but actually want to analyse individuals with given characteristics at a given time
create-dataset eks4
import db/BEFOLKNING_KJOENN as gender
import db/INNTEKT_WYRKINNT 2021-12-31 as work_income
drop if sysmiss(work_income)

//Example 5: When you start with a universal variable but want to analyse only individuals who were actually resident in Norway at a given time. The variable BEFOLKNING_STATUSKODE is suitable for this purpose, as it contains codes for "resident", "dead" and "emigrated". "Resident" has the code '1'.
create-dataset eks5
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_STATUSKODE 2021-01-01 as regstatus
keep if regstatus == '1'
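
//Example 6 (a sketch, not one of the original examples): When you want the population to be larger than the first variable allows. This assumes that outer_join is written as an option after a comma on the import command; check the documentation for import before relying on it. Without the option, the second import would only add residence values to people who already have a work income. With it, the population should consist of everyone with a work income in 2021 plus everyone with a registered municipality of residence as of 1/1 2021.
create-dataset eks6
import db/INNTEKT_WYRKINNT 2021-12-31 as work_income
import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as residence, outer_join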