Panel data analysis
Panel data analysis is a more advanced form of linear regression analysis that takes variation over time into account for the included variables. It has much in common with ordinary least squares (OLS) regression. Among other things, the dependent variable (listed first in the regress-panel command) must be continuous/metric, e.g. income.
The main difference from ordinary regression analysis (cf. the command regress) is that the data must be organized in panel format, where every variable is measured once at each of the measurement points specified in the command import-panel. A panel dataset then consists of T x N observations, where T is the number of measurement points and N is the number of units in the population.
NB! Panel datasets can quickly become very large because each unit is measured more than once. If you analyse the entire population with 2 measurement points, the dataset will typically contain approx. 10 million observations (5 million x 2). Therefore, use as small a population as possible, preferably below 1 million units. Otherwise, executions will be very time-consuming.
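The T x N arithmetic above can be illustrated outside microdata.no with a small Python sketch. The unit identifiers, the dates and the helper panel_size are hypothetical and exist only for this illustration; they are not part of the platform.

```python
from itertools import product

# Hypothetical units and measurement dates, for illustration only.
units = ["A", "B", "C"]                # N = 3 units
dates = ["2016-01-01", "2017-01-01"]   # T = 2 measurement points

# Panel (long) format: one row per combination of unit and measurement date.
panel_rows = [{"unit": u, "date": d} for u, d in product(units, dates)]


def panel_size(n_units, n_dates):
    """Number of observations in a balanced panel: T x N."""
    return n_units * n_dates


print(len(panel_rows))           # rows in the toy panel
print(panel_size(5_000_000, 2))  # whole population, 2 measurement points
```

Each extra measurement date adds another N rows, which is why a small population keeps the dataset manageable.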
//Connect to database
require no.ssb.fdb:23 as db
//First create a panel data population (should be as small as possible)
//Population: Persons who completed their master's studies in the autumn semester of 2015
create-dataset population
import db/NUDB_AAR_FORSTE_FULLF_HOV as compl_master
keep if compl_master > 201507 & compl_master < 201601
//Create a new and empty dataset that consists of the individuals from the dataset population
clone-units population paneldata
//Import a set of variables measured at given measurement dates into the empty dataset
use paneldata
import-panel db/INNTEKT_WLONN db/SIVSTANDFDT_SIVSTAND db/BEFOLKNING_KOMMNR_FAKTISK 2016-01-01 2017-01-01 2018-01-01 2019-01-01
//Recode and run descriptive statistics and regression analysis
rename INNTEKT_WLONN wage
generate married = 0
replace married = 1 if SIVSTANDFDT_SIVSTAND == '2'
generate oslo = 0
replace oslo = 1 if BEFOLKNING_KOMMNR_FAKTISK == '0301'
tabulate-panel married
tabulate-panel oslo
tabulate-panel married oslo
summarize-panel wage
transitions-panel oslo married
//Run panel data regression with fixed effects (fe) and random effects (re), respectively
regress-panel wage married oslo, fe
regress-panel wage married oslo, re
//Run the Hausman test
hausman wage married oslo
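
To give an intuition for what the fe option estimates, the following Python sketch applies the standard textbook within transformation (subtracting each unit's own mean over time) to a hypothetical toy panel. This is an illustration outside microdata.no; the data and function names are invented for the example, and it is an assumption, not a documented fact, that regress-panel computes fixed effects in exactly this way.

```python
# Hypothetical toy panel: unit -> list of (x, y) pairs over time.
# Constructed so that y = unit-specific level + 2 * x.
panel = {
    "unit_1": [(0, 10), (1, 12), (2, 14)],   # level 10
    "unit_2": [(1, -3), (3, 1), (5, 5)],     # level -5
}


def mean(values):
    return sum(values) / len(values)


def slope(pairs):
    """OLS slope of y on x for a list of (x, y) pairs."""
    x_bar = mean([x for x, _ in pairs])
    y_bar = mean([y for _, y in pairs])
    num = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    den = sum((x - x_bar) ** 2 for x, _ in pairs)
    return num / den


def pooled_slope(panel):
    """Ordinary regression that ignores which unit each row belongs to."""
    return slope([pair for obs in panel.values() for pair in obs])


def within_slope(panel):
    """Fixed effects (within) estimate: demean x and y per unit, then regress."""
    demeaned = []
    for obs in panel.values():
        x_bar = mean([x for x, _ in obs])
        y_bar = mean([y for _, y in obs])
        demeaned.extend((x - x_bar, y - y_bar) for x, y in obs)
    return slope(demeaned)


print(pooled_slope(panel))
print(within_slope(panel))
```

In this toy data the pooled slope is -0.8125, while the within slope recovers the true value 2.0, because demeaning removes the unit-specific levels that do not change over time.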