keywords jamovi, GLM, effect size indices, omega-squared, eta-squared, epsilon-squared
GAMLj version ≥ 2.6.1
Standardized Effect size indices produced by GLM module are the following:
All coefficients but the betas are computed with the approapriate function of the R package effectsize, with some adjustment.
For continuous variables, it simply corresponds to the B coefficient obtained after standardizing all variables in the model. The standardization of the continuous variables is done before any transformation is applied, so if a complex model requires interaction or polynomial terms, the terms are computed after standardization, and the \(\beta\) are consistent.
For categorical variables, however, some comments are in order:
Categorical variables are not standardized in GAMLj, so the \(\beta\) should be interpreted in terms of
standardized differences in the dependent variable between the levels
contrasted by the corresponding contrast. Consider the following
example: Two groups (variable groups
) of size 20 and 10
respectively, are compared on a variable Y. If one uses GAMLj default contrast coding
(simple
), the B is the difference in groups means. The
\(\beta\) is the difference between the
average z-scores of the dependent variable between the two groups.
Assume these are the results:
The beta is 0.352, so it means that if we compute the mean difference between groups in the standardized y, we obtain 0.352. In fact.
However, the \(\beta\) you obtain is not the correlation between zy and groups. The correlation is 0.169:
Why is there this discrepancy? Because the groups are not balanced, so when the correlation is computed, the variable groups is standardized, so the contrast coding values depend on the relative size of the groups. The actual groups coding values used by the Pearson’s correlations are the following:
Thus, the correlation corresponds to running a regression with zy as dependent variables and a continuous variable featuring either -.695 or 1.390 as values. The \(\beta\) yielded by GAMLj, instead, is the mean difference between X levels on the standardized Y. Please notice that other software may yield different \(\beta\)’s for categorical variables.
If two groups are balanced and homeschedastic, the \(\beta\) associated with a
deviation
contrast corresponds to the fully standardized
coefficient.
This is the proportion of total variance uniquely explained by the associated effect. Being \(SS_{eff}\) the sum of squares of the effect, \(SS_{res}\) the sum of squares of the residuals or of SS error, and \(SS_{model}\) the sum of sum of squares of the whole model, we have:
\[\eta^2={{SS_{eff} \over {SS_{model}+SS_{res}}}}\]
where \(SS_{model}+SS_{res}=SS_{total}\) and \(SS_{total}=\sum(y_i-\bar{y})^2\) and \(SS_{model}=\sum(\hat{y_i}-\bar{\hat{y}})^2\).
Please notice that although the computation of the effect size
indexes and their confidence intervals is carried out employing effectsize R package,
GAMLj makes a correction to the
computation of \(\eta^2\) and of the
other non-partial indices. effectsize
R package, infact,
defines the total sum of squares as \(SS_{total}^*=\sum{SS_{f}+SS_{res}}\), where
\(f\) is any effect in the model. For
balanced designs and many other models, \(SS_{total}^*=SS_{total}\), so no issue
arises. However, there are certain models in which \(SS_{total}^*\ne SS_{total}\), and so the
index looses some of its properties when computed based on \(SS_{total}^*\). GAMLj operates a correction such that all the
non-partial indeces are always computed based on \(SS_{total}=SS_{model}+SS_{res}\).
With the correction, one property \(\eta^2\) retains even when \(SS_{total}^*\ne SS_{total}\) is that \(\eta^2=r_{sp}^2\), where \(r_{sp}\) is the semi-partial correlation
(Cohen, West, and Aiken 2014). We can use
the exercise
dataset from (Cohen, West, and Aiken
2014) to see this in practice (refer to
GLM: Multiple regression, moderated
regression, and simple slopes for a complete analysis). If we run a
multiple regression yendu~xage+zexer
and ask for the \(\eta^2\)’s, we obtain the following
results:
Going in jamovi, Regression->Partial correlation we can compute the semi-partial correlation of each IV coviariating the other. This gives:
Squaring it gives \(r_{yx.z}^2=-0.230^2=0.053\), which is equal to the corrisponding \(\eta^2=.053\).
Squaring the second \(r_{sp}^2\) gives \(r_{yz.x}^2=0.388^2=0.15\), considering rounding, which is equal to the corrisponding \(\eta^2=.150\).
If we used \(SS_{total}^*=\sum{SS_{f}}+SS_{res}\), results would be:
that are clearly not corresponding to the \(r_{sp}^2\), as they should be. We should note, however, that the two methods of estimation usually give very similar results, even when \(SS_{total}^* \ne SS_{total}\) and exactly the same results when \(SS_{total}^*=SS_{total}\).
The same reasoning holds for all the non-partial indices.
For many people, the fact that \(SS_{model}^* \ne SS_{model}\) may come as a surprise. Maybe because in the ANOVA tradition, with balanced designs, \(\sum{SS_{f}}\) is always equal to \(SS_{model}\), or maybe because it would be nicer if it was always like in the ANOVA. Since we have to accept that in the GLM \(\sum{SS_{f}}\) is not necessarily equal to \(SS_{model}\), let us see when this happens.
There are two cases: The easiest to understand is when \(\sum{SS_{f}} \lt SS_{model}\). This case happens when the independent variables are positively (or all negatively) correlated so each variable explains a unique part of the variance, but the model sum of square involves also some shared variance, which ends up in \(SS_{model}\) but not in \(\sum{SS_{f}}\).
A little trickier is the case when \(\sum{SS_{f}} \gt SS_{model}\), because it seems strange that the sum of effects is larger than the overall combined effect.
Consider a regression \(y=a+b_{yx.z}x+b_{yz.x}z\). Recall the the \(SS_{yx.z}\) (\(x\) explains \(y\) keeping constant \(z\)) is computed as \(SS_{model}-SS_{yz}\), where \(SS_{yz}\) is the sum of squares explained by \(z\) without \(x\) in the model, and \(SS_{yz.x}=SS_{model}-SS_{yx}\). It follows that the \(\sum{SS_f}=2\cdot SS_{model}-SS_{yx}-SS_{yz}\) is larger than \(SS_{model}\) when \(SS_{model} > SS_{yx}+SS_{yz}\). This necessarily means that at least one variable explains more variance while keeping constant the other than alone.
Indeed, in the example above about exercising, the SS of age alone is 452, and exer alone is 3234, which sum to 3686, less than 4751, the multiple regression model SS.
The question is now: when does this happen? Well, it happens when \(B_{yz}\), the coefficient associated with \(z\) in simple regression, is smaller (in absolute value) than the partial coefficient of \(B_{yz.x}\) of a multiple regression. Because \(B_{yz.x}=B_{yz} - B_{yx.z}\cdot B_{xz}\), this happens when \(B_{yz}\) and \(B_{yx.z}\cdot B_{xz}\) have different signs: therefore, we have a suppression effect!
In the example above, focusing on exer (the z variable), we have \(B_{yz.x}=.916\), \(B_{yx.z}=-.257\) and \(B_{xz}=.598\), thus \(B_{yx.z}\cdot B_{xz}=-.257*.598=-0.154\), which has a different sign than \(B_{yz}=.762\). Thus, the effect of exercising is stronger when keeping constant age than alone: age suppresses the effect of exercising on endurance.
This is the proportion of partial variance uniquely explained by the associated effect. That is, the variance uniquely explained by the effect expressed as the proportion of variance not explained by the other effects. Here the variance explained by the other effects in the model is completely partialed out. Its formula is:
\[\eta^2p={{SS_{eff} \over {SS_{eff}+SS_{res}}}}\]
clearly, if there is only one independent variable, \(\eta^2=\eta^2p\)
This is the expected value in the population of the proportion of variance uniquely explained by the associated effect. In other words, it is the unbiased version of \(\eta^2\). There are different formulas to visualize its computation, here is one. If \(df_{res}\) are the degrees of freedon of the residual variance, \(df_{eff}\) are the degrees of freedom of the effect, we have:
\[\omega^2={{SS_{eff}-SS_{res} \cdot ({df_{eff}/df_{res}) \cdot }}\over{ SS_{model}+SS_{res}(df_{res}+1)/df_{res}}}\]
It’s clear that omega is similat to \(\eta^2\), but applies a correction for the denominator.
This is the expected value in the population of the proportion of partial variance uniquely explained by the associated effect. In other words,it is the unbiased version of \(\eta^2p\). With N being the sample size, We have:
\[\omega^2p={{SS_{eff}-SS_{res} \cdot ({df_{eff}/df_{res}) \cdot }}\over{ SS_{eff}+SS_{res} \cdot [{(N-df_{eff})/df_{res}}] }}\]
It’s clear that omega is similat to \(\eta^2p\), but applies a correction for the degress of freedom. In fact, as N increases, the two indices converge.
Epsilon-squared is another correction of \(\eta^2\), but the correction involves only the estimation of the sum of squares of the effect, not the partial variance on which the effect is compared
\[\epsilon^2={{SS_{eff}-SS_{res} \cdot ({df_{eff}/df_{res}) \cdot }}\over{ SS_{model}+SS_{res}}}\]
As for the non-partial Epsilon, the partial Epsilon-squared is a correction of \(\eta^2p\), but the correction involves only the estimation of the sum of squares of the effect, not the partial variance on which the effect is compared
\[\epsilon^2p={{SS_{eff}-SS_{res} \cdot ({df_{eff}/df_{res}) \cdot }}\over{ SS_{eff}+SS_{res}}}\]
From version 2.6.1 on, all the effect size indices are available also
for the simple effects. To compute them, GAMLj extracts the SS of the simple effect
from R emmeans
F-tests. The SS residuals and SS model is
extracted from the model summary, given that both SS do not change when
simple effects are computed. Then the indices are computed using the
previously described formulas.
In particular, if the simple effect is \(se\): \[SS_{res}=\sigma^2\cdot df_{res}\]
where \(\sigma\) is extracted as
sigma(model)
.
\[SS_{model}={{F_{model}\cdot df_{model}} \cdot {SS_{res} \over {df_{res}}}}\]
and
\[SS_{se}={{F_{se}\cdot df_{se}} \cdot {SS_{res} \over {df_{res}}}}\]
In option tab Options
it is possible to ask additional
tables for the effect size indices, containing the effect size indices
and their confidence intervals (here an example with the
exercise
dataset)
Details for the confidence intervals computation can be found in Ben-Shachar, Makowski & Lüdecke (2020). Compute and interpret indices of effect size. CRAN
Got comments, issues or spotted a bug? Please open an issue on GAMLj at github or send me an email