New functionality: Outer join

For variable import in, the standard solution is “left join”, where the first variable that is imported into a dataset determines the population. But now we introduce the possibility to import data using “outer join” so that also new individuals / units that don’t exist in your original population will be added to your dataset.

Until now, it has been the case that if you want to include all units / individuals for a set of variables, then you had to start by importing a universal variable such as “gender”. Then your population will be all individuals with a value for the variable “gender”, i.e. almost everyone in the database, and there is no risk that units / individuals that only have a value for one of the next variables will not be included in the population. You can avoid this now, since you can use “outer join” as an option for import. Then you can concentrate on the variables you are actually interested in, without going a detour via “gender”.

An example of a useful case is when creating a dataset consisting of monthly job information measured over a longer period of time. Then the standard solution with “left join” will mean that only the jobs that are measured at the first measurement time form the dataset’s population. When importing job data for the following months, new jobs will not be added to the dataset since they do not exist for the first measurement date. By using “outer join”, new jobs will also be added to the dataset so that you get all active jobs over the entire time period.

The default solution with “left join” is used as long as you do not specify an “outer join” option. Example of using “outer join”: import db/INNTEKT_WLONN 2019-12-31 as wage19, outer_join

Note that “outer join” is not relevant for the first import command you run. It is only for the subsequent import commands that this functionality can be useful.

The default solution with “left join” will still be preferable if you have created a population and do not want to change it when importing new variables.

Example A: Import of working hours measured in January-March 2021 for jobs that were active in January, using standard left join:

Example B: Import of working hours measured in January-March 2021 for all jobs that were active during the period using outer join: