The difference is that we have now constrained the variance of u for
group=1 to be the same as the variance of u for group=2.

If you perform this experiment with real data, you will observe the following:

You will obtain the same values for the coefficients either way.

You will obtain different standard errors and therefore
different test statistics and confidence intervals.

If u is known to have the same variance in the two groups, the
standard errors obtained from the pooled regression are better—they
are more efficient. If the variances really are different, however, then
the standard errors obtained from the pooled regression are wrong.

2. Illustration
(See the do-file and the log with the results in section 7)

I have created a dataset (containing made-up data) on y, x1,
and x2. The dataset has 74 observations for group=1 and another 71
observations for group=2. Using these data, I can run the regressions
separately by typing
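
[1] . regress y x1 x2 if group==1
[2] . regress y x1 x2 if group==2

or I can pool the data, constraining the residual variances of the two
groups to be equal, and type

. gen g2 = (group==2)
. gen g2x1 = g2*x1
. gen g2x2 = g2*x2
[3] . regress y x1 x2 g2 g2x1 g2x2

The output from these commands appears in the log in section 7.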

The coefficients are the same, estimated either way. (The fact that the
coefficients in [3] are a little off from those in [2] is just because I did
not write down enough digits.)

The standard errors for the coefficients are different.

I also wrote down the estimated Var(u); its square root is what is
reported as Root MSE in Stata’s regression output. In standard deviation
terms, u has
s.d. 15.528 in group=1, 6.8793 in group=2, and if we constrain these two
very different numbers to be the same, the pooled s.d. is 12.096.
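
(If you prefer to pick the number up programmatically rather than read it
off the output, regress leaves it behind in e(rmse); for example,

. regress y x1 x2 if group==1
. display e(rmse)

displays the group=1 value, 15.528, to more digits.)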

3. Pooling data without constraining residual variance

We can pool the data and estimate an equation without constraining the
residual variances of the groups to be the same. Previously we typed
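
. gen g2 = (group==2)
. gen g2x1 = g2*x1
. gen g2x2 = g2*x2
[3] . regress y x1 x2 g2 g2x1 g2x2

which constrains the residual variances to be equal. To leave them
unconstrained, we instead weight each observation by the inverse of its
group’s estimated residual variance. In outline (a sketch; see the
technical note below and the do-file in section 7 for the details), the
commands are

. predict r, resid
. sum r if group==1
. gen w = r(Var)*(r(N)-1)/(r(N)-3) if group==1
. sum r if group==2
. replace w = r(Var)*(r(N)-1)/(r(N)-3) if group==2
[4] . regress y x1 x2 g2 g2x1 g2x2 [aw=1/w]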

In the above, the constant 3 appears twice because three coefficients are
being estimated in each group (an intercept, a coefficient for x1, and a
coefficient for x2). If a different number of coefficients were being
estimated, that number would change.

In any case, this will reproduce exactly the standard errors reported by
estimating the two models separately. The advantage is that we can now test
equality of coefficients between the two equations. For instance, we can
now read right off the pooled regression results whether the effect of
x1 is the same in groups 1 and 2: the question is simply whether
_b[g2x1]==0, because _b[x1] is the effect in group 1 and
_b[x1]+_b[g2x1] is the effect in group 2, so the difference between the
groups is _b[g2x1]. And, using test, we can test other constraints as
well.
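
For example, after the pooled fit [4], commands along these lines do the
job:

. test g2x1
. test g2 g2x1 g2x2

The first asks whether the effect of x1 differs between the two groups;
the second jointly tests whether any of the coefficients (intercept or
slopes) differ.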

For instance, if you wanted to prove to yourself that the results of [4] are
the same as typing regress y x1 x2 if group==2, you could type
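something along these lines (a sketch: the x1, x2, and _cons rows of the
regress output are the group-1 results, and lincom recovers the group-2
coefficients and standard errors):

. regress y x1 x2 g2 g2x1 g2x2 [aw=1/w]
. lincom _cons + g2
. lincom x1 + g2x1
. lincom x2 + g2x2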

Those results are the same as [1] and [2]. (Pay no attention to the RMSE
reported by regress at this last step; the reported RMSE is the
standard deviation of neither of the two groups but is instead a weighted
average; see the FAQ on this if you care. If you
want to know the standard deviations of the respective residuals, look back at
the output from the summarize statements typed when producing the
weighting variable.)

Technical Note:
In creating the weights, we typed

. sum r if group==1
. gen w = r(Var)*(r(N)-1)/(r(N)-3) if group==1

and similarly for group 2. The 3 that appears in the finite-sample
normalization factor (r(N)-1)/(r(N)-3) appears because there are three
coefficients per group being estimated. If our model had fewer or more
coefficients, that number would change. In fact, the finite-sample
normalization factor changes results very little; with 74 observations in
group 1, for example, it scales the variance by only 73/71, about 3%. In
real work, I would have ignored it and typed

. sum r if group==1
. gen w = r(Var) if group==1

unless the number of observations in one of the groups was very small. The
normalization factor was included here so that [4] would produce the same
results as [1] and [2].

5. The (lack of) importance of not constraining the variance

Does it matter whether we constrain the variance? Here, it does not matter
much. For instance, if after

If instead we had constrained the variances to be the same, estimating the
model using

[3] . regress y x1 x2 g2 g2x1 g2x2

and then repeated the test, the reported F-statistic would be 309.08.
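
In outline, the comparison amounts to fitting the model both ways and
repeating the same test after each fit; for a joint test of all the
group-difference terms, for example, the commands would be along these
lines (a sketch):

[4] . regress y x1 x2 g2 g2x1 g2x2 [aw=1/w]
. test g2 g2x1 g2x2

[3] . regress y x1 x2 g2 g2x1 g2x2
. test g2 g2x1 g2x2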

If there were more groups, and the variance differences were great among the
groups, this could become more important.

6. Another way to fit the variance-unconstrained model

Stata’s xtgls, panels(het) command (see xtgls) fits exactly the
model we have been describing, the only difference being that it does not
make all the finite-sample adjustments, so its standard errors are just a
little different from those produced by the method just described. (To be
clear, xtgls, panels(het) does not make the adjustment described in
the technical note above, and it does not make the
finite-sample adjustments regress itself makes, so
variances are invariably normalized by N, the number of observations,
rather than N-k, observations minus number of estimated
coefficients.)

Anyway, to fit the model with xtgls, panels(het), you pool the data just as
always,

. gen g2 = (group==2)
. gen g2x1 = g2*x1
. gen g2x2 = g2*x2

and then type

[5] . xtgls y x1 x2 g2 g2x1 g2x2, panels(het) i(group)

to estimate the model. The results of doing that with my fictional data appear in the log in section 7.

The standard errors produced by xtgls, panels(het) here are about 2%
smaller than those produced by [4] and in general will be a little smaller
because xtgls, panels(het) is an asymptotically based estimator. The
two estimators are asymptotically equivalent, however, and in fact quickly
become identical. The only caution I would advise is not to use xtgls,
panels(het) if the number of degrees of freedom (observations minus
number of coefficients) is below 25 in any of the groups. Then, the
weighted OLS approach [4] is better (and you should make the finite-sample
adjustment described in the above technical note).
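
With these data that is not a concern; a quick

. tab group

confirms 74 observations in group=1 and 71 in group=2, so with three
estimated coefficients per group the degrees of freedom are 71 and 68,
both well above 25.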

7. Appendix: do-file and log providing results reported above

7.1 do-file

The following do-file, named uncv.do, was used. Up until the line reading
“BEGINNING OF DEMONSTRATION”, the do-file is concerned with
constructing the artificial dataset for the demonstration:
uncv.do