Perform a random sub-selection from a total population

The example below demonstrates how to draw a random sub-selection from a large total population using the sample command.

The first input parameter defines the sample size. If it is a decimal number (0.0-1.0), that share of the population is extracted, e.g. 0.1 gives a 10% sample. If it is a positive integer > 1000, a random sample consisting of exactly that number of individuals is extracted.

The last input parameter is a seed number, i.e. a positive integer of your own choice. Using the same seed ensures that the same individuals are drawn each time the sample command is executed. Choosing a new seed number gives a new random selection of individuals.
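
The effect of the two parameters can be illustrated outside microdata.no with a small Python sketch. This is only an analogy: the helper draw_sample, the synthetic population and the use of Python's random module are assumptions made for illustration, and they say nothing about how the sample command is actually implemented.

```python
import random


def draw_sample(population, size, seed):
    """Hypothetical helper: draw a seeded random sample.

    size between 0 and 1 is read as a share of the population,
    any larger value as a fixed number of individuals.
    """
    rng = random.Random(seed)  # same seed gives the same sequence of draws
    if 0 < size < 1:
        k = round(len(population) * size)  # share of the population
    else:
        k = int(size)  # fixed number of individuals
    return rng.sample(population, k)


# Synthetic stand-in for person identifiers (not real data)
population = list(range(100_000))

ten_percent = draw_sample(population, 0.1, 999)  # 10% share
fixed_a = draw_sample(population, 5000, 888)     # 5000 individuals
fixed_b = draw_sample(population, 5000, 888)     # same seed as fixed_a
fixed_c = draw_sample(population, 5000, 950)     # new seed

print(len(ten_percent), fixed_a == fixed_b, fixed_a == fixed_c)
```

In this sketch, repeating a draw with the same seed returns the same individuals, while a new seed returns a different selection. A different selection is not necessarily disjoint from the earlier one: two independent random draws from the same population can share some individuals.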

//Connect to datastore
require no.ssb.fdb:13 as db

//Create dataset containing all residents in Norway as of 1 January 2019 and then draw a 10% sample
create-dataset totalpop
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstatus19
keep if regstatus19 == '1'
sample 0.1 999

//Create dataset containing all residents in Norway as of 1 January 2019 and then draw a sample consisting of 5000 individuals
create-dataset totalpop2
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstatus19
keep if regstatus19 == '1'
sample 5000 888

//Create dataset containing all residents in Norway as of 1 January 2019 and then draw a new sample of 5000 individuals (a new seed gives a different selection than the previous sample)
create-dataset totalpop3
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstatus19
keep if regstatus19 == '1'
sample 5000 950