Regression analyses with robust and cluster estimation

Until now, robust and cluster estimation could only be used with ordinary linear regressions. It is now possible to use them with all types of regression in the analysis system, including logit, probit, multinomial logit and linear panel regressions.

The robust and cluster() options are used separately to request robust or cluster estimation, respectively. The regression output then shows adjusted standard errors for the estimated coefficients, and the associated t-, z- and p-values change accordingly. The coefficient estimates and all other values are the same as with standard estimation.
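To illustrate why only the standard errors and test statistics change, here is a minimal Python sketch (not the analysis system's own syntax). It keeps one coefficient fixed and computes its z-value and two-sided p-value under a standard and an adjusted standard error. The function name and the numbers are hypothetical, and the p-value uses the standard normal distribution, as is usual for z-values; t-values in linear regressions would use a t-distribution instead.

```python
import math


def z_and_p(coef, se):
    """Return the z-value and two-sided p-value of a coefficient, given its standard error."""
    z = coef / se
    # Two-sided p-value under the standard normal distribution:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical values: the coefficient estimate is identical in both cases;
# only the standard error differs between standard and robust estimation.
coef = 0.50
for label, se in [("standard", 0.20), ("robust", 0.30)]:
    z, p = z_and_p(coef, se)
    print(f"{label:<8} coef={coef:.2f}  se={se:.2f}  z={z:.2f}  p={p:.4f}")
```

A larger adjusted standard error shrinks the z-value and raises the p-value, so a coefficient can lose statistical significance even though its estimate is unchanged. An adjusted standard error can also turn out smaller than the standard one.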

Note that robust and cluster cannot be combined, since cluster estimation already implies robust estimation.

Robust estimation can be used on data where problematic outliers or heteroskedasticity are suspected.
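The heteroskedasticity-robust standard error can be illustrated with a minimal Python sketch for a regression with one explanatory variable. It uses the HC0 (White) sandwich formula without any small-sample correction; the analysis system may use a corrected variant, which is not stated here. The function name and the data are hypothetical.

```python
import math


def simple_ols_with_robust_se(x, y):
    """Fit y = a + b*x by OLS; return (a, b, standard SE of b, HC0 robust SE of b)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    dx = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in dx)
    # Slope and intercept: identical whichever standard error is reported.
    b = sum(d * yi for d, yi in zip(dx, y)) / sxx
    a = y_mean - b * x_mean
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

    # Standard SE assumes one common error variance for all observations.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_standard = math.sqrt(s2 / sxx)

    # HC0 robust SE: each squared residual serves as its own variance estimate.
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2)
    return a, b, se_standard, se_robust


# Hypothetical data where the spread around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 9.5, 13.0, 12.1, 17.5]
a, b, se_s, se_r = simple_ols_with_robust_se(x, y)
print(f"slope={b:.3f}  standard SE={se_s:.3f}  robust SE={se_r:.3f}")
```

When the error variance is constant, the two standard errors tend to be close. When the spread of the residuals varies with x, they can differ noticeably, in either direction.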

Cluster estimation is used when observations within groups, e.g. schools or municipalities, are suspected to be systematically dependent on each other. The groups are specified by a variable (the cluster variable) given in the parentheses of the cluster option, e.g. cluster(school) or cluster(municipality). The following requirements apply; otherwise the system returns an error message:

  • There must be a sufficiently large number of groups
  • The cluster variable must be numeric
  • The cluster variable cannot itself be one of the variables in the regression expression
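The requirements above, and the cluster-robust standard error they guard, can be sketched in Python for a regression with one explanatory variable. The function names, the min_groups threshold and the data are hypothetical: the actual minimum number of groups is not stated, and the variance formula is the basic CR0 version without small-sample correction. Residuals are weighted and summed within each group before squaring, which allows errors to be correlated inside a group. With one observation per group, the formula reduces to the HC0 robust standard error, which is one way to see why cluster estimation implies robust estimation.

```python
import math
from numbers import Real


def check_cluster_variable(cluster_name, cluster_values, regressor_names, min_groups):
    """Mirror the listed requirements; min_groups is a hypothetical threshold."""
    if not all(isinstance(v, Real) for v in cluster_values):
        raise ValueError(f"cluster variable '{cluster_name}' must be numeric")
    if cluster_name in regressor_names:
        raise ValueError(f"cluster variable '{cluster_name}' cannot be a regressor")
    n_groups = len(set(cluster_values))
    if n_groups < min_groups:
        raise ValueError(f"too few groups ({n_groups}) in '{cluster_name}'")


def cluster_robust_se_slope(x, y, groups):
    """Return (b, CR0 cluster-robust SE of b) for y = a + b*x."""
    n = len(x)
    if not (n == len(y) == len(groups)) or n < 3:
        raise ValueError("x, y and groups must have the same length (at least 3)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    dx = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * yi for d, yi in zip(dx, y)) / sxx
    a = y_mean - b * x_mean
    # Sum weighted residuals per group, so dependence within a group is allowed.
    score = {}
    for d, xi, yi, g in zip(dx, x, y, groups):
        score[g] = score.get(g, 0.0) + d * (yi - (a + b * xi))
    var_b = sum(s * s for s in score.values()) / sxx ** 2
    return b, math.sqrt(var_b)


# Hypothetical data: income (y) against education (x) in four municipalities.
x = [10, 12, 12, 14, 16, 11, 13, 15]
y = [300, 340, 360, 400, 470, 310, 350, 430]
municipality = [301, 301, 301, 1103, 1103, 4601, 4601, 5001]
check_cluster_variable("municipality", municipality, ["education"], min_groups=3)
b, se = cluster_robust_se_slope(x, y, municipality)
print(f"slope={b:.3f}  cluster-robust SE={se:.3f}")
```

In practice, cluster-robust standard errors are only reliable with a reasonably large number of groups, which is presumably why the system enforces a minimum.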

Examples:

regress income man married high_education, robust
regress income man married high_education, cluster(municipality)

For more information about the robust and cluster options, and about other available options, use the following commands:

  • help regress
  • help logit
  • help probit
  • help mlogit
  • help regress panel