How to define and create populations

The microdata.no analysis system is based on the left-join principle. This means that the first variable imported into your dataset defines the maximum size of the population. If this is a universal variable such as gender, the dataset will cover as much of the total Norwegian population as possible, including persons who are either dead, emigrated, or not yet born at the time(s) of measurement on which you wish to base your analysis.

If you start by importing a variable that only covers a limited number of individuals, e.g. data on disability benefits, your dataset will only consist of people in Norway who received such benefits at the particular time of measurement.

It is not possible to make the population larger than the first imported variable allows. It is therefore important to consider which variable to import first, so that all relevant individuals/units are included in your analysis. You can make the population smaller at later stages by using the drop if or keep if commands.

//Connect to datastore
require no.ssb.fdb:13 as db

//Example 1: Population = all residents in Bergen as of 1 January 2020
create-dataset example1
import db/BEFOLKNING_KOMMNR_FAKTISK 2020-01-01 as municipality
keep if municipality == '4601'

//Example 2: Population = all residents in Vestland county as of 1 January 2020
create-dataset example2
import db/BEFOLKNING_KOMMNR_FAKTISK 2020-01-01 as municipality
keep if substr(municipality, 1, 2) == '46'

//Example 3: Population = people with work-related income during 2019 (= people with yearly work-related income > 0 in 2019)
create-dataset example3
import db/INNTEKT_WYRKINNT 2019-12-31 as workincome

//Example 4: Starting with a universal variable, then restricting the population to individuals with a given characteristic at a given time (here: a registered work-related income for 2019)
create-dataset example4
import db/BEFOLKNING_KJOENN as gender
import db/INNTEKT_WYRKINNT 2019-12-31 as workincome
drop if sysmiss(workincome)

//Example 5: Starting with a universal variable, then restricting the population to individuals who were actually resident in Norway at a given time. The variable db/BEFOLKNING_STATUSKODE is suitable for this purpose, as it contains codes for "resident", "dead" and "emigrated" respectively. "Resident" has the code '1'.
create-dataset example5
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_STATUSKODE 2020-01-01 as regstatus
keep if regstatus == '1'
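
//Example 6 (illustrative sketch; the dataset name example6 is chosen for illustration only): The import order decides the population. Here the income variable is imported first, so the dataset only contains individuals with a registered work-related income value for 2019. Importing gender afterwards adds a variable but no new individuals, so no drop if or keep if step is needed. Under the left-join principle, this should give the same population as example 4. The tabulate command is used here only to inspect the resulting number of individuals.
create-dataset example6
import db/INNTEKT_WYRKINNT 2019-12-31 as workincome
import db/BEFOLKNING_KJOENN as gender
tabulate gender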