How to aggregate and create municipal level contextual data

The example below shows how to create aggregate data at the municipal level (contextual data) for use in a simple regression analysis.

You start by creating a regular person level data set with selected information, including a variable to be used as the merge key for the aggregated data, in this case the variable municipality (actual municipality of residence). Next, you need to create two new datasets that are used to retrieve resp. wage/wealth and unemployment/resident figures at municipal level.

The method for aggregating is to use the collapse() command in combination with the option by() to specify the level of aggregation (municipality). Then the merge command is used to connect the aggregated data into the person level data set via the specified merge key (municipality).

NB! Remember to rename the aggregate variables before merging them into the person level dataset.

//Connect to database
require no.ssb.fdb:15 as db

//Create person level dataset 
create-dataset persons
import db/BEFOLKNING_KOMMNR_FAKTISK 2019-01-01 as municipality
import db/BEFOLKNING_KJOENN as gender
import db/INNTEKT_WLONN 2019-01-01 as wage
import db/INNTEKT_BER_BRFORM 2019-01-01 as wealth
summarize wage wealth
histogram wage, freq
histogram wealth, freq

//Create municipality level dataset containing mean wage and wealth, and merge into person level dataset
create-dataset municipality_wage_wealth
import db/BEFOLKNING_KOMMNR_FAKTISK 2019-01-01 as municipality
import db/INNTEKT_WLONN 2019-01-01 as wage
import db/INNTEKT_BER_BRFORM 2019-01-01 as wealth
collapse (mean) wage wealth, by(municipality)
rename wage meanwage_mun
rename wealth meanwealth_mun
summarize meanwage_mun meanwealth_mun
merge meanwage_mun meanwealth_mun into persons on municipality

//Create municipality level dataset containing number of unemployed and residents, and merge into person level dataset
create-dataset municipality_unempl_residents
import db/BEFOLKNING_KOMMNR_FAKTISK 2019-01-01 as municipality
import db/ARBSOEK2001FDT_HOVED 2019-01-01 as ws_status
generate num_unempl = 1 if ws_status == '1'
generate num_residents = 1
collapse (sum) num_unempl num_residents, by(municipality)
summarize num_unempl num_residents
merge num_unempl num_residents into persons on municipality

//Finally, use person level dataset to perform a simple regression analysis
use persons
generate unempl_pst = (num_unempl / num_residents) * 100

generate male = 0
replace male = 1 if gender == '1'

generate oslo = 0
replace oslo = 1 if municipality == '0301'

generate large_mun = 0
replace large_mun = 1 if num_residents > 80000

generate rich_mun = 0
replace rich_mun = 1 if meanwealth_mun > 2000000

regress wage male oslo wealth meanwealth_mun meanwage_mun unempl_pst num_residents 
regress wage male oslo wealth rich_mun meanwage_mun unempl_pst large_mun