# Rosetta store: Generalized Mixed

keywords jamovi, R, SPSS, logistic, generalized mixed, mixed models

2.0.0

Draft version, mistakes may be around

Here you can find comparisons of results obtained in jamovi GAMLj, R, and SPSS. When not explicitly discussed, the code of different software is written with the aim of obtaining identical results across programs (not necessarily with the most efficient strategy).

In this example we compare GAMLj results with R results obtained by IDRE on their tutorial web page, and with SPSS results.

# The research design

In this example, we are going to explore lung cancer remission using
a simulated dataset. A variety of outcomes were collected on patients,
who are nested within doctors, who are in turn nested within hospitals.
There are also a few doctor level variables, such as
*Experience*, that we will use in our example (IDRE).
Data can be downloaded here

# R

The researcher is interested in studying the relationships between
cancer remission and both patients and doctors characteristics. The
analyst uses the R `glmer`

command from the
`lme4 Package`

to estimate a mixed effects logistic
regression model with `Il6`

, `CRP`

, and
`LengthofStay`

as patient level continuous predictors,
`CancerStage`

as a patient level categorical predictor (I,
II, III, or IV), `Experience`

as a doctor level continuous
predictor, and a random intercept by `DID`

, doctor ID.

## Results

```
# estimate the model and store results in m
m <- glmer(remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience +
(1 | DID), data = hdp, family = binomial, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
# print the mod results without correlations among fixed effects
print(m, corr = FALSE)
```

```
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
## Family: binomial ( logit )
## Formula:
## remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience +
## (1 | DID)
## Data: hdp
## AIC BIC logLik deviance df.resid
## 7397 7461 -3690 7379 8516
## Random effects:
## Groups Name Std.Dev.
## DID (Intercept) 2.01
## Number of obs: 8525, groups: DID, 407
## Fixed Effects:
## (Intercept) IL6 CRP CancerStageII
## -2.0527 -0.0568 -0.0215 -0.4139
## CancerStageIII CancerStageIV LengthofStay Experience
## -1.0035 -2.3370 -0.1212 0.1201
```

The analyst goes on in computing the confidence intervals of the estimates applying the simple formula \(\theta \pm 1.96 \cdot SE_{\theta}\).

```
se <- sqrt(diag(vcov(m)))
# table of estimates with 95% CI
(tab <- cbind(Est = fixef(m), LL = fixef(m) - 1.96 * se, UL = fixef(m) + 1.96 *se))
```

```
## Est LL UL
## (Intercept) -2.05269 -3.09435 -1.011025
## IL6 -0.05677 -0.07935 -0.034196
## CRP -0.02148 -0.04151 -0.001455
## CancerStageII -0.41393 -0.56243 -0.265436
## CancerStageIII -1.00346 -1.19610 -0.810828
## CancerStageIV -2.33704 -2.64683 -2.027241
## LengthofStay -0.12118 -0.18710 -0.055261
## Experience 0.12009 0.06628 0.173895
```

Also the \(e^B\) are computed.

`exp(tab)`

```
## Est LL UL
## (Intercept) 0.12839 0.04530 0.3638
## IL6 0.94481 0.92372 0.9664
## CRP 0.97875 0.95934 0.9985
## CancerStageII 0.66104 0.56982 0.7669
## CancerStageIII 0.36661 0.30237 0.4445
## CancerStageIV 0.09661 0.07088 0.1317
## LengthofStay 0.88587 0.82936 0.9462
## Experience 1.12760 1.06853 1.1899
```

# GAMLj

jamovi
GAMLj module for
`Generalized mixed models`

can be used to estimate a random
coefficient mixed model. First, it requires choose the specific model we
want to run: the `Logistic`

model.

Then we set the variables in the right field depending on their role in the model and their type.

Please note that we put `LengthOfStay`

in covariates
because it is a continuous variable, even though jamovi
recognizes it a categorical. By putting it in Covariates, we implicitly ask the module to treat
it a continuous variable. The opposite holds for `DID`

(doctors ID) which jamovi
recognizes as continuous but the module treats as categorical after we
put it in the Clusters field.

After the first setup, we need to specify the random effects.
Following the IDRE analysis, we specify the intercept as random
coefficient across doctors (`DID`

).

As soon as we include a random effect, the model is estimated

## Results

Let’s us first focus on the fixed parameters estimates, the B coefficients.

For the predictors coefficients we have substantially the same
results as in R, apart from rounding. For the intercept we have
different results. The reason is the different default coding of the
variables in GAMLj and in R. In R
default, factors are coded with the *dummy coding scheme*, that
is

```
> contrasts(hdp$CancerStage)
II III IV
I 0 0 0
II 1 0 0
III 0 1 0
IV 0 0 1
```

and continuous independent variables are not scaled. In GAMLj the default coding system for categorical variable is `simple.

which results in a centered contrast.

Continuous variables are centered by default.

Thus, in GAMLj the intercept is the expected mean of the (transformed) dependent variable, whereas in R is the expected mean of the (transformed) dependent variable for all continuous independent variables equal to zero and the group scoring all zeros in the categorical variable.

Also the exp(B) and their confidence intervals are equivalent in the two analyses, apart from rounding.

As regards the random coefficients variance, `lme4::glmer`

produced a `SD`

of 2.01, thus a variance of 4.0401, which is
noticeably different from the GAMLj results.

The reason of this difference is in the precision of the estimation.
`lme4::glmer`

estimates the model parameters evaluating the
adaptive Gauss-Hermite approximation to the log-likelihood. How good the
approximation will be is decided by the option `nAGQ`

,
specified in the `glmer`

command above. The higher the
number, the better the approximation, with a penalty in speed. Also
GAMLj lets the user decide the precision level of the estimation with
the equivalent parameter `precision/speed`

in the
`Options`

tab.

GAMLj default is a value of 1 (fast but not highly precise
estimation). If we set the `precision/speed`

parameter to 10,
we obtain exactly the same results obtained in R. The estimation will
take some time, however.

## Omnibus tests

The IDRE analysis did not report the omnibus tests. In R, one can
obtain the omnibus tests employing the `car::Anova`

command.
The command lets us specify what Type of analysis to do. We can ask of
the Type III analysis and we get the Chi-square tests.

```
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: remission
Chisq Df Pr(>Chisq)
(Intercept) 14.918 1 0.0001123 ***
IL6 24.293 1 8.273e-07 ***
CRP 4.420 1 0.0355197 *
CancerStage 256.523 3 < 2.2e-16 ***
LengthofStay 12.982 1 0.0003145 ***
Experience 19.135 1 1.218e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Compared with GAMLj results, there’s substantial agreement even though the figures are not the same, apart from rounding. The reason is that the estimation of the deviance of the model, on which the omnibus tests are based, may be slightly influenced by the scale of the variables. It is well-known that in the general linear model (i.e. regression), the linear coefficients are invariant to linear scaling of the independent variables. That is, if you have centered or not centered the variables, the B’s are the same (when no interaction is present in the model). In generalized mixed models, the scale may slightly influence the estimation, yielding different results for re-scaled variables.

Thus, the small difference in the omnibus tests obtained with R and with GAMLj are due to the fact that in the R analysis the variables are not re-scaled, while in GAMLj they are centered. In fact, if we do not center the variables in GAMLj, results become identical to the R results.

Notice, however, that the conclusions of the analyses are equivalent no matter the scale of the variables.

# SPSS

In SPSS we can run a logistic mixed models with
`Generalized mixed models`

menu. However, from within the
`Generalized mixed models`

module one cannot specify whether
a variable is a continuous or categorical one, so we have to define
variables types in the `Variable view`

tab of the SPSS
spreadsheet window.

We want to be sure that `LengthOfStay`

, `IL6`

,
`CRP`

and `Experience`

are set as
`Scale`

in the `measure`

column, whereas
`DID`

and `CancerStage`

should be
`Nominal`

.

Then we can go to the mixed models menu.

First, we should specify the cluster variable, by dragging
`DID`

into the `Subject`

field, on the right, in
the `Canvas`

panel.

Then we can move to the `Fields & Effects`

tab and
specify the model type (`Binary logistic regression`

) and the
dependent variable (`remission`

).

On the `Select an item`

field we can then move to
`Fixed Effects`

, and set the independent variables.

The random intercept can be specified in the
`Random Effects`

tab.

Here we need to flag the `Include intercept`

option and
select `DID`

from the menu `Subject combination`

.
This means *allow intercepts to vary across DID*.

We can then run the analysis.

## Results

As regards the fixed effects coefficients, we notice that the
continuous predictors show B’s that are equivalent to R and jamovi
estimates, but with opposite sign. The intercept and the categorical
predictor are clearly very differ. Those differences are due to way SPSS
picks the reference level of the dummy coding for categorical variables.
In R and jamovi,
the reference level is the first level in the standard ordering of the
variable levels (such as the cardinal order or the alphabetical one).
SPSS picks the last group! Thus, here we are predicting the probability
of *not remission*, that is `remission=0`

, because
SPSS sets `remission=1`

as the reference level. That explains
the opposite sign. For the categorical independent variable
`CancerStage`

, SPSS picks the group
`CancerStage=IV`

as reference group, whereas GAMLj and R pick
`CancerStage=I`

. That explains the differences in the
parameters.

In SPSS generalized mixed models we can change the reference level in
the `Build Options`

tab, selecting `Descending`

in
the `Sorting Order`

options.

Now the interpretation of the coefficients is equivalent to the one for R and GAMLj estimation, although the numerical estimates are still different in SPSS. Not a big difference, but worth investigating the reason.

As regards the random coefficients variance, it is quite different from the R or GAMLj results.

Finally, as regards the omnibus tests, SPSS outputs the F-tests, not the Chi-square. Nevertheless, the p-values associated with the tests are in line with the ones obtained in GAMLj and R.

Thus, GAMLj and R `lme4::glmer`

produce results that are
numerically very close and practically identical. SPSS produces results
that are numerically different but substantially equivalent to the other
software results.

## Comments?

Got comments, issues or spotted a bug? Please open an issue on GAMLj at github or send me an email