Using instrumental variables in regression analysis

The command ivregress lets you specify instrumental variables. This is applicable if you suspect multicollinearity, that is, correlation between at least two of the independent variables. The instrumental variables are defined inside the parenthetical expression: the variable to the left of the equals sign is the instrumented variable, and the variables to the right are its instruments. In the example below, wealth_high is the instrumented variable and age is the instrument. You can use as many instruments as you like. For example, if you think that place of residence (= Oslo) also affects the amount of wealth, you can use the parenthetical expression (wealth_high = age oslo). In principle, ivregress treats all independent variables as instruments, except for the instrumented variable.

require no.ssb.fdb:23 as db

create-dataset ivtest

import db/INNTEKT_WLONN 2021-12-31 as wage
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month

generate age = 2020 - int(birth_year_month /100)
drop if age < 18 | age > 60

import db/BEFOLKNING_KJOENN as gender
generate male = 0
replace male = 1 if gender == '1'

import db/INNTEKT_BRUTTOFORM 2020-12-31 as wealth
generate wealth_high = 0
replace wealth_high = 1 if wealth > 1500000

//First run a regular linear regression
regress wage age male wealth_high

//A correlation between age and wealth is suspected. Use a model in which wealth_high is instrumented by age
ivregress wage male (wealth_high = age)

//In addition to comparing the two results, check for multicollinearity and normally distributed residuals
correlate wealth_high age
regress-predict wage age male wealth_high, residuals(res1)
ivregress-predict wage male (wealth_high = age), residuals(res2)

histogram res1
histogram res2
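
//Sketch: add place of residence (Oslo) as a second instrument, as described in the introduction.
//Assumptions, not part of the original example: the variable name BEFOLKNING_KOMMNR_FORMELL,
//the measurement date 2020-01-01 and the alias municipality. Verify them in the variable catalogue.
//'0301' is the municipality code for Oslo.
import db/BEFOLKNING_KOMMNR_FORMELL 2020-01-01 as municipality
generate oslo = 0
replace oslo = 1 if municipality == '0301'

//wealth_high is now instrumented by both age and oslo
ivregress wage male (wealth_high = age oslo)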