Creating and modifying a dataset

You can create a standard wide-format dataset in three steps:

  1. Create a link to a database using the require command (only needs to be done once per script).
  2. Create an empty dataset using the create-dataset command.
  3. Import at least one variable into the empty dataset using the import command.

Unless you have special needs, connect to the latest version of the relevant database. This gives you access to the newest variables and the latest updates. The version number is shown at the top left of the variable overview.

When creating a wide-format dataset, the import command imports only one variable at a time. Each import does two things:

  • Retrieves data observations for a given time (measurement time is not specified for fixed information such as gender)
  • Links the data to the current population via a unique built-in unit identifier series (on first import, all observations for the given time are retrieved)

By default, every import after the first adds values only for units that are already in the dataset population (the so-called left-join principle). You can override this with the import option outer_join, which retrieves all observations for the given time, including those for units that are not yet in the dataset population. This is useful if you want data on all individuals over a longer period (through repeated measurements of a given variable), and not just on those who had an observation at the first measurement time. Chapter 2.3.1 in the User Guide explains more about this.
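
The difference between the default import and an import with outer_join can be sketched as follows. This is a minimal illustration, not taken from the User Guide. It assumes that import options are written after a comma at the end of the command. The dataset name formuedata, the variable aliases, and the second measurement date are hypothetical and may not be available in the database version you connect to.

require no.ssb.fdb:23 as db

create-dataset formuedata

// First import: defines the population (all units with an observation on 2020-01-01)
import db/INNTEKT_BRUTTOFORM 2020-01-01 as formue20

// Default import: adds values only for units already in the population
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as sivstand20

// Assumed option syntax: also adds units observed on 2021-01-01 that are not yet in the population
import db/INNTEKT_BRUTTOFORM 2021-01-01 as formue21, outer_join

Units added by the last import would be expected to have missing values for formue20 and sivstand20, since they were not part of the population when those variables were imported.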

After the dataset is created, it can be modified as needed. For example, you can rename datasets or variables, remove variables, or remove observations.

Example:

require no.ssb.fdb:23 as db

create-dataset demografidata
import db/BEFOLKNING_KJOENN as kjønn
import db/BEFOLKNING_FOEDSELS_AAR_MND as faarmnd
import db/SIVSTANDFDT_SIVSTAND 2020-01-01 as sivstand
import db/INNTEKT_BRUTTOFORM 2020-01-01 as formue

// Rename variables by adding a year suffix
rename sivstand sivstand20
rename formue formue20

// Remove the variable kjønn from the dataset
drop kjønn

// Keep only married persons in the dataset
keep if sivstand20 == '2'