Using instrumental variables in regression analysis

The ivregress command runs a regression with instrumental variables. This is relevant when an independent variable is suspected of being endogenous, that is, correlated with the error term, for example because it is closely tied to other factors that also affect the dependent variable. The instrumented variable and its instruments are specified inside the parentheses, in the form (instrumented variable = instruments). In the example below, wealth_high is the instrumented variable and age is the instrument. You can use as many instruments as you want. For example, if you also think that living in Oslo affects wealth_high, you can create a dummy variable oslo and write (wealth_high = age oslo). In principle, ivregress also treats all the other independent variables (here male) as instruments; only the instrumented variable itself is excluded.

require no.ssb.fdb:13 as db

create-dataset ivtest

import db/INNTEKT_WLONN 2019-12-31 as wage
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthdate

// birthdate is stored as YYYYMM: compute age in 2018 and keep ages 18-60
generate age = 2018 - int(birthdate/100)
drop if age < 18 | age > 60

import db/BEFOLKNING_KJOENN as gender
generate male = 0
replace male = 1 if gender == '1'

import db/INNTEKT_BRUTTOFORM 2018-12-31 as wealth
generate wealth_high = 0
replace wealth_high = 1 if wealth > 1500000

// Ordinary linear regression, for comparison
regress wage age male wealth_high

// Suspected correlation between age and wealth: instrument wealth_high with age
ivregress wage male (wealth_high = age)

// In addition to comparing the two outputs, check the correlation between wealth_high and age, and whether the residuals are normally distributed
correlate wealth_high age
regress-predict wage age male wealth_high, residuals(res1)
ivregress-predict wage male (wealth_high = age), residuals(res2)

histogram res1
histogram res2
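
To make the idea behind the instrumental-variable regression more concrete, the following is a minimal conceptual sketch of two-stage least squares in Python with NumPy. It is not how microdata.no implements ivregress, and it uses synthetic data only: the variable names (z, u, x, w, y) and all coefficients are assumptions made up for illustration. In the first stage the endogenous variable is regressed on the instrument and the other independent variables; in the second stage the dependent variable is regressed on the fitted values from the first stage.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data; every name and coefficient here is hypothetical.
z = rng.normal(size=n)                    # instrument: affects x, not y directly
u = rng.normal(size=n)                    # unobserved factor that makes x endogenous
w = rng.normal(size=n)                    # ordinary (exogenous) independent variable
x = 0.8 * z + u + rng.normal(size=n)      # endogenous independent variable
y = 2.0 * x + 1.0 * w - 3.0 * u + rng.normal(size=n)  # true effect of x is 2.0


def ols(X, target):
    """Least-squares coefficients for target regressed on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


const = np.ones(n)

# Ordinary regression: the coefficient on x is biased because x correlates with u.
beta_ols = ols(np.column_stack([const, x, w]), y)

# Stage 1: regress x on the instrument and the other independent variables.
Z = np.column_stack([const, z, w])
x_hat = Z @ ols(Z, x)

# Stage 2: replace x with its fitted values from stage 1.
beta_iv = ols(np.column_stack([const, x_hat, w]), y)

print("OLS coefficient on x:", beta_ols[1])
print("IV coefficient on x: ", beta_iv[1])
```

With this setup the instrumental-variable estimate should land close to the true value of 2.0, while the ordinary regression estimate is pulled away from it. Note that running the second stage by hand like this gives usable coefficients but not correct standard errors, which is one reason to rely on a dedicated command such as ivregress in practice.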