How to aggregate and create municipal level contextual data

The example below shows how to create aggregated data at municipality level (contextual data) for use in a simple regression analysis.

You begin by creating a regular personal data set with selected information, including a variable to be used as a link key for the aggregated data, in this case the variable municipality (actual municipality of residence). Two new data sets are then created which are used to retrieve wages/wealth and unemployment/residence figures at municipal level.

The method to aggregate is to use the command collapse(), where one uses the option by() to specify the level at which one should aggregate (municipality). Then the command merge is used to connect the aggregated data on the personal data set via the provided link key (municipality).

Remember to rename the aggregated variables before connecting them to the personal data set.

The script is used as part of the thematic course on contextual data, where working with family data is also demonstrated. Click here for more about our courses.


textblock
How to link contextual data into a person dataset
-----------------------------------------------------
endblock

//Connect to database
require no.ssb.fdb:23 as db

//Create person dataset
create-dataset persons
import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as municipality
import db/BEFOLKNING_KJOENN as gender
import db/INNTEKT_WLONN 2021-01-01 as wage
import db/INNTEKT_BER_BRFORM 2021-01-01 as wealth
summarize wage wealth
histogram wage, freq
histogram wealth, freq

//Create figures for meanwage og meanwealth per municipality and link into person dataset
create-dataset municipalitydata_wage_wealth
import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as municipality
import db/INNTEKT_WLONN 2021-01-01 as wage
import db/INNTEKT_BER_BRFORM 2021-01-01 as wealth
collapse (mean) wage wealth, by(municipality)
rename wage meanwage_municipality
rename wealth meanwealth_municipality
summarize meanwage_municipality meanwealth_municipality
merge meanwage_municipality meanwealth_municipality into persons on municipality

//Create figures for number of unemployed and residents per municipality and link into person dataset
create-dataset municipalitydata_unempl_resident
import db/BEFOLKNING_KOMMNR_FAKTISK 2021-01-01 as municipality
import db/ARBSOEK2001FDT_HOVED 2021-01-01 as ws_status
generate unemployed = 1 if ws_status == '1'
generate resident = 1
collapse (sum) unemployed resident, by(municipality)
summarize unemployed resident
merge unemployed resident into persons on municipality

//Use the person dataset to perform a simple regression analysis
use persons
generate unempl_pst = (unemployed / resident) * 100

generate male = 0
replace male = 1 if gender == '1'

generate oslo = 0
replace oslo = 1 if municipality == '0301'

generate bigmunicipality = 0
replace bigmunicipality = 1 if resident > 80000

generate rich_municipality = 0
replace rich_municipality = 1 if meanwealth_municipality > 2000000

regress wage male oslo wealth meanwealth_municipality meanwage_municipality unempl_pst resident
regress wage male oslo wealth rich_municipality meanwage_municipality unempl_pst bigmunicipality