Using instrumental variables in regression analysis
The ivregress command fits a regression with instrumental variables. This is relevant when one of the independent variables is suspected of being endogenous, i.e. correlated with the error term. The example below is motivated by a suspected correlation between two of the independent variables (age and wealth). The instrumented variable and its instruments are specified inside the parentheses expression, in the form (instrumented variable = instruments). In the example below, wealth_high is the instrumented variable and age is the instrument. You can use as many instruments as you want. For example, if you also think that place of residence (= Oslo) affects wealth_high, you can use the parentheses expression (wealth_high = age oslo). In addition, ivregress treats all independent variables outside the parentheses as instruments; only the variable to the left of the equals sign is instrumented.
require no.ssb.fdb:13 as db
create-dataset ivtest
import db/INNTEKT_WLONN 2019-12-31 as wage
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthdate
generate age = 2018 - int(birthdate /100)
drop if age < 18 | age > 60
import db/BEFOLKNING_KJOENN as gender
generate male = 0
replace male = 1 if gender == '1'
import db/INNTEKT_BRUTTOFORM 2018-12-31 as wealth
generate wealth_high = 0
replace wealth_high = 1 if wealth > 1500000
//Performs a regular linear regression
regress wage age male wealth_high
//Suspects correlation between age and wealth. Uses age as an instrument for wealth_high
ivregress wage male (wealth_high = age)
//In addition to comparing the two outputs, we need to check for multicollinearity and normally distributed residuals
correlate wealth_high age
regress-predict wage age male wealth_high, residuals(res1)
ivregress-predict wage male (wealth_high = age), residuals(res2)
histogram res1
histogram res2
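As a rough illustration of what an instrumental-variable estimate does, the sketch below implements two-stage least squares in Python with NumPy on simulated data. It is not how ivregress is implemented in microdata.no. The function names (ols, two_stage_least_squares), the simulated variables and all coefficients are assumptions made for this example only.

```python
import numpy as np


def ols(X, y):
    """Return least-squares coefficients b for the model y = X b."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, exog, endog, instruments):
    """Two-stage least squares (hypothetical helper for illustration).

    exog:        (n, k) exogenous regressors, including a constant column
    endog:       (n,)   the instrumented (endogenous) regressor
    instruments: (n, m) instruments excluded from the main equation
    Returns coefficients ordered as [exog..., endog].
    """
    # Stage 1: regress the endogenous variable on all exogenous
    # regressors plus the excluded instruments, and keep the fitted values.
    Z = np.column_stack([exog, instruments])
    endog_hat = Z @ ols(Z, endog)
    # Stage 2: regress y on the exogenous regressors and the fitted values.
    X_hat = np.column_stack([exog, endog_hat])
    return ols(X_hat, y)


# Simulated data (assumption): u is unobserved and affects both x and y,
# which makes x endogenous. z affects y only through x.
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)      # endogenous regressor
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)  # true slope on x is 2

const = np.ones(n)
b_ols = ols(np.column_stack([const, x]), y)
b_iv = two_stage_least_squares(y, const[:, None], x, z[:, None])

print("OLS slope:", b_ols[-1])
print("2SLS slope:", b_iv[-1])
```

In this simulation the ordinary regression slope is biased upward from the true value of 2 because x is correlated with the unobserved u, while the two-stage estimate should land close to 2. The point estimates are the only thing this sketch reproduces; the standard errors of a naive second-stage regression are not valid and are left out.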