keywords jamovi, SPSS, R, contrasts, comparisons, planned comparisons, LMATRIX, contr.sum, emmeans
GAMLj version ≥ 0.9.5
Here you can find comparisons of results obtained in jamovi GAMLj, jamovi (jmv), pure R, and SPSS. When not explicitely discussed, the code of different software is written with the aim of obtaing identical results across programs.
Example data are here. One continuous dependent variable, one factor with 4 groups.
Unfortunately, contrasts coding schemes get different names in different publications and they are implemented in different ways across software. Things can get a bit confusing because of this. For partially overlapping coding scheme definitions see UCLA idre web site, SPSS manual and Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences. Routledge. If you have familiarity with python, you can compare results also with python statmodels results
We use jamovi, SPSS and R. In jamovi we compare the ANOVA module in jamovi jmv
and the
<span class="modulename">GAMLj</span>
module.
In R we use pure R, meaning that we estimate the contrast
comparisons by estimating a linear model (lm()
in
particular), with the categorical variable properly coded. For more
straigthforward estimation we employ emmeans
R module. In
SPSS we use the UNIVARIATE
module.
Contrasts can be computed in two different ways:
As the coefficients of a model where the categorical variable(s) is coded accordingly to a coding scheme. jamovi GAMLj and standard R use this strategy. Understanding this approach opens to the possibility to test contrasts-based hypotheses in complex models, such as when interactions are involved, mediatiated effects or simple effects are of interest.
As a comparison between sets of means tested with model-derived
standard errors. jamovi ANOVA,
emmeans
R and SPSS use this approach. This approach is
simpler and straightforward. In simple models, it works perfectly
fine.
As a comparison between groups means, with group specific standard errors. None of the software used here uses this approach, so we do not deal with it here.
The first two approaches give the same results if the contrasts are defined in the same way. However, the contrast coding displayed by the software may be different. The majority of software prints out the contrast matrix, which contains the coding scheme describing the tested comparisons. R model estimation requires the model matrix, which is generally different from the contrast matrix and represents the weights associated with the dummy variables needed to cast a categorical variable into the linear model estimating the contrasts. How they are related and how to obtain one from the other is explained in contrast vs model matrix details
Furthermore, in simple models (such as one-way ANOVA), the two first approaches give exactly the same results. When models get more complex, results may diverge across software because they estimate different things. The present results apply to one-way ANOVA, balanced factorial ANOVA with centered coding scheme, ANCOVA models with continuous covariates centered to their means.
To interpret a contrast, it is often necessary to refer to “the first group” or “subsequent groups”. The order is always defined as the alphanumeric order.
Furthermore, different software packages have different rounding rules, so we say “the same results” when results agree apart for decimal rounding.
Definition: Compares each group mean to the grand mean. The grand mean is the mean of the groups means. The first group is omitted
Setting the contrasts to deviation
Results:
Results:
The expected means of the four groups are:
<img src=“rosetta/contrasts/r.c1.GAMLj2.png” class=“img-responsive” alt=““>
Please note that SPSS default sets the last group as reference group
(the omitted group), thus to obtain the same results as before, we
should set /CONTRAST(Group)=Deviation(1)
, mind the “1”“,
which corresponds to”first” in the GUI options.
UNIANOVA Score BY Group
/CONTRAST(Group)=Deviation(1)
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=TEST(LMATRIX)
/CRITERIA=ALPHA(.05)
/DESIGN=Group.
In R, deviation contrast is called contr.sum
. The
function defaults to omitting the last group, so we reverse it to omit
the first.
file<-"../data/rosetta.contrasts.csv"
dat<-read.csv2(file,stringsAsFactors = F)
dat$Group<-factor(dat$Group)
dat$Score<-as.numeric(dat$Score)
### reverse contrast sum ###
contrasts(dat$Group)<-matrix(rev(contr.sum(4)),ncol=3)
contrasts(dat$Group)
## [,1] [,2] [,3]
## 1 -1 -1 -1
## 2 1 0 0
## 3 0 1 0
## 4 0 0 1
model<-lm(Score~Group,data=dat)
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group1 -0.2467 0.0719 -3.43 0.00076 ***
## Group2 0.0969 0.0667 1.45 0.14809
## Group3 0.2965 0.0667 4.44 0.000016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
## means
(mm<-tapply(dat$Score, dat$Group, mean))
## 1 2 3 4
## 0.00955 -0.09056 0.25311 0.45267
To employ emmeans
package, we first estimate the model
(which coding system we use does not matter), then we run the function
emm<-emmeans(model,specs)
, where specs
is
the factor for which we want to compare the means, and then we apply
contrast(emm,contrast_type)
to the emmeans object.
contrast_type
is a string equal to the contrast function we
want to use. We add adjust="none"
to obtain the same
p-values as in the previous analyses, but any adjustment supported by
emmeans
can be applied here. If we omit the
adjust
option, the default multiplicity adjustment method
is “fdr” , cf. emmeans
package.
The deviation
contrast it is named eff
in
emmeans.
library(emmeans)
model<-lm(Score~Group,data=dat)
emm<-emmeans(model,~Group)
contrast(emm,"eff",reverse=T,adjust = "none")
## contrast estimate SE df t.ratio p.value
## Group1 effect -0.1466 0.0672 166 -2.180 0.0305
## Group2 effect -0.2467 0.0719 166 -3.430 0.0008
## Group3 effect 0.0969 0.0667 166 1.450 0.1481
## Group4 effect 0.2965 0.0667 166 4.440 <.0001
Please notice that emmeans
computes all possible
comparisons: group 1 against the grand mean, group 2 against the grand
mean, etc.
Definition: Compares each group with the first group.
We set the contrasts to simple
Results:
This is GAMLj default (since version 1.5). The contrasts in GAMLj is identical to the ANOVA module.
Please note that SPSS default sets the last group as reference group,
thus to obtain the same results as before, we should set
/CONTRAST(Group)=Simple(1)
, mind the “1”“, which
corresponds to”first” in the GUI options.
UNIANOVA Score BY Group
/CONTRAST(Group)=Simple(1)
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=TEST(LMATRIX)
/CRITERIA=ALPHA(.05)
/DESIGN=Group.
Results:
In R, simple contrast can be obtained as follows:
#### contrast=simple #######
dat$Group<-factor(dat$Group)
k<-4 # number of levels
contrasts(dat$Group)<-contr.treatment(k)-(1/k)
contrasts(dat$Group)
## 2 3 4
## 1 -0.25 -0.25 -0.25
## 2 0.75 -0.25 -0.25
## 3 -0.25 0.75 -0.25
## 4 -0.25 -0.25 0.75
model<-lm(Score~Group,data=dat)
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group2 -0.1001 0.1148 -0.87 0.38454
## Group3 0.2436 0.1083 2.25 0.02585 *
## Group4 0.4431 0.1083 4.09 0.000067 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
If one need the uncentered dummy coding, it can be obtained
in R using contr.treatment(k)
.
The simple
contrast can be achieved in emmeans with
dunnett
.
model<-lm(Score~Group,data=dat)
emm<-emmeans(model,~Group)
contrast(emm,"dunnett",adjust = "none")
## contrast estimate SE df t.ratio p.value
## Group2 - Group1 -0.100 0.115 166 -0.870 0.3850
## Group3 - Group1 0.244 0.108 166 2.250 0.0260
## Group4 - Group1 0.443 0.108 166 4.090 <.0001
GAMLj distinguishes between
simple and dummy coding schemes. They give equivalent
results in means comparisons, simple effects, and contrast coefficients.
The only difference is that the Dummy
is not centered, and
so when the codes are involved in interactions, the effects of the other
variables have different meaning. In the presence of interactions, when
simple is used, the other variables effects are computed
averaging across the sample; when dummy is used, the other
variables effects are computed for the reference group (the first group)
defined by the dummy coding.
Definition: Compares each group with the subsequent group
Results:
UNIANOVA Score BY Group
/CONTRAST(Group)=REPEATED
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=TEST(LMATRIX)
/CRITERIA=ALPHA(.05)
/DESIGN=Group.
R does not have an out-of-the-box contrast equivalent to
repeated
contrast. One can, howerver, uses the
contr.sdif()
function in MASS
package. As
compared with jamovi and SPSS the
comparisons are in the opposite direction, so we simply multiply the
code by -1 (t-tests and p-values do not depend on this):
library(MASS)
repeated<-(-1)*contr.sdif(4)
repeated
## 2-1 3-2 4-3
## 1 0.75 0.5 0.25
## 2 -0.25 0.5 0.25
## 3 -0.25 -0.5 0.25
## 4 -0.25 -0.5 -0.75
contrasts(dat$Group)<-repeated
summary(lm(Score~Group,data=dat))
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group2-1 0.1001 0.1148 0.87 0.38454
## Group3-2 -0.3437 0.1142 -3.01 0.00304 **
## Group4-3 -0.1996 0.1077 -1.85 0.06568 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
… and we’re just fine.
The repeated
contrast can be achieved in emmeans with
consec
. To align the results to the jamovi SPSS, we should reverse the contrast
coding, with the option reverse
.
model<-lm(Score~Group,data=dat)
emm<-emmeans(model,~Group)
contrast(emm,"consec",adjust = "none",reverse=T)
## contrast estimate SE df t.ratio p.value
## Group1 - Group2 0.100 0.115 166 0.872 0.3850
## Group2 - Group3 -0.344 0.114 166 -3.008 0.0030
## Group3 - Group4 -0.200 0.108 166 -1.853 0.0660
Definition: Test polynomial (linear, quadratic, cubic, etc.) trends in the means pattern
Results:
UNIANOVA Score BY Group
/CONTRAST(Group)=POLYNOMIAL
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=TEST(LMATRIX)
/CRITERIA=ALPHA(.05)
/DESIGN=Group.
R has an out-of-the-box contrast named contr.poly()
contrast. It works like a sharm.
contrasts(dat$Group)<-contr.poly(4)
summary(lm(Score~Group,data=dat))
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group.L 0.3741 0.0770 4.86 0.0000027 ***
## Group.Q 0.1498 0.0787 1.90 0.05870 .
## Group.C -0.1315 0.0804 -1.64 0.10381
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
… and we can sleep like babies.
The poly
contrast has a direct implementation in
emmeans
with poly
.
model<-lm(Score~Group,data=dat)
emm<-emmeans(model,~Group)
contrast(emm,"poly",adjust = "none")
## contrast estimate SE df t.ratio p.value
## linear 1.673 0.344 166 4.860 <.0001
## quadratic 0.300 0.157 166 1.900 0.0587
## cubic -0.588 0.359 166 -1.640 0.1038
Difference and helmert contrasts are great source of confusion when one is comparing different software results. SPSS and jamovi use the same definition for them, but R has a different implementation. Different authors uses different definitions, so be aware of which comparisons are implemented.
Definition: Compares each group with the average of previous groups.
This is sometimes named Reverse Helmert Coding , cf. UCLA idre web site.
Results:
UNIANOVA Score BY Group
/CONTRAST(Group)=DIFFERENCE
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=TEST(LMATRIX)
/CRITERIA=ALPHA(.05)
/DESIGN=Group.
The t-tests and p-values associated with the difference
contrast can be obtained in R using the
contr.helmert()
function. Despite the name,
contr.helmert()
implements what is usually named “Reversed
Helmert contrast”. Obviously, this is perfectly fine (cf. An
R and S-Plus Companion to Applied Regression), but one should be
aware of this difference. The contrast weights in R are scaled
differently than in SPSS or jamovi, thus the
contrast values (the coefficients) are not identical to the ones
obtained in jamovi and SPSS, but they are
proportional to them (that is why the t-tests are the same).
dat$Group<-factor(dat$Group)
contrasts(dat$Group)<-contr.helmert(4)
contrasts(dat$Group)
## [,1] [,2] [,3]
## 1 -1 -1 -1
## 2 1 -1 -1
## 3 0 2 -1
## 4 0 0 3
model<-lm(Score~Group,data=dat)
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group1 -0.0501 0.0574 -0.87 0.38454
## Group2 0.0979 0.0318 3.08 0.00243 **
## Group3 0.0988 0.0222 4.44 0.000016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
To obtain the jamovi/SPSS estimate values in R, we should multiply the coefficients obtained in R by the index of the column of the contrasts (starting from 2 because 1 is for the intercept). To make things a bit more general, we can define a conversion vector that works both for deviation and helmert contrasts. Let the conversion vector be \(\mathbf{cv}=max(c_i)+1\), where \(c_i\) is a column of the contrast matrix.
cv<-apply(contr.helmert(4),2,max)+1
cv
## [1] 2 3 4
To obtain the same coefficients in R and in jamovi/SPSS, we can multiply the model coefficients by \(\mathbf{cv}\)
coef(model)[c(2:4)]*cv
## Group1 Group2 Group3
## -0.100 0.294 0.395
Or divide the contrast weights by \(\mathbf{cv}\)
### make a diagonal matrix with cv in the diagonal ##########
diag_cv<-matrix(rep(cv,each=4),ncol=3)
contrasts(dat$Group)<-contr.helmert(4)/diag_cv
contrasts(dat$Group)
## [,1] [,2] [,3]
## 1 -0.5 -0.333 -0.25
## 2 0.5 -0.333 -0.25
## 3 0.0 0.667 -0.25
## 4 0.0 0.000 0.75
model<-lm(Score~Group,data=dat)
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group1 -0.1001 0.1148 -0.87 0.38454
## Group2 0.2936 0.0954 3.08 0.00243 **
## Group3 0.3953 0.0889 4.44 0.000016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
… and we get the same values, t-tests and p-values as in jamovi and SPSS.
Difference and helmert are not implemented in emmeans package, but
emmeans
has a very clever way to implement custom contrast
coding. We need to define a function whose name ends with
.emmc
and returns the coding as a data.frame, and then pass
it to emmeans
contrasts
function. The coding
weights are the one of the contrast matrix used in SPSS (see contrast matrix vs
model matrix for details).
library(MASS) ## needed for ginv(), see below
difference.emmc<<-function(levs) {
n<-length(levs)
#### model matrix ########
con_weights<-contr.helmert(n)
cv<-apply(con_weights,2,max)+1
diag_cv<-matrix(rep(cv,each=n),ncol=n-1)
model_mat<-con_weights/diag_cv
##### contrast matrix ########
cont_mat<-ginv(t(model_mat))
M <- as.data.frame(cont_mat)
names(M) <- paste(levs[-1],"vs previous")
attr(M, "desc") <- "Difference contrasts"
M
}
#### contrast matrix #########
round(difference.emmc(levels(dat$Group)),digits = 3)
## 2 vs previous 3 vs previous 4 vs previous
## 1 -1 -0.5 -0.333
## 2 1 -0.5 -0.333
## 3 0 1.0 -0.333
## 4 0 0.0 1.000
#### results ##############
contrast(emm,method="difference",adjust="none")
## contrast estimate SE df t.ratio p.value
## Group2 vs previous -0.100 0.1148 166 -0.870 0.3850
## Group3 vs previous 0.294 0.0954 166 3.080 0.0020
## Group4 vs previous 0.395 0.0889 166 4.440 <.0001
Same results as before.
Definition: Compares each group with the average of subsequent groups
Results:
UNIANOVA Score BY Group
/CONTRAST(Group)=HELMERT
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/PRINT=TEST(LMATRIX)
/CRITERIA=ALPHA(.05)
/DESIGN=Group.
R defines helmert contrast in the reverse order as compared with
jamovi and SPSS, and uses a different
scaling. To obtain the helmert contrast as it is was defined above, we
should reverse contr.helmert()
and rescale the coding as we
did for the difference
contrasts. Without rescaling, the
t-test and p-values are the same as in jamovi/spss, but the estimated values are
different (but proportional).
dat$Group<-factor(dat$Group)
contrasts(dat$Group)<-matrix(rev(contr.helmert(4)),ncol=3)
con_weights<-contrasts(dat$Group)
#### not scaled contrast weights ######
con_weights
## [,1] [,2] [,3]
## 1 3 0 0
## 2 -1 2 0
## 3 -1 -1 1
## 4 -1 -1 -1
model<-lm(Score~Group,data=dat)
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group1 -0.0489 0.0224 -2.18 0.03048 *
## Group2 -0.1478 0.0336 -4.40 0.000019 ***
## Group3 -0.0998 0.0539 -1.85 0.06568 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
To obtain the jamovi/SPSS values in R, we should compute \(\mathbf{cv}=max(c_i)+1\), and multiply it by the model coefficients.
cv<-apply(con_weights,2,max)+1
cv
## [1] 4 3 2
coef(model)[c(2:4)]*cv
## Group1 Group2 Group3
## -0.196 -0.443 -0.200
or divide the contrast weights by \(\mathbf{cv}\)
### make a diagonal matrix with cv in the diagonal ##########
diag_cv<-matrix(rep(cv,each=4),ncol=3)
######## divide the contrast weights ##########
helm<-con_weights/diag_cv
### scaled contrast weights ############
helm
## [,1] [,2] [,3]
## 1 0.75 0.000 0.0
## 2 -0.25 0.667 0.0
## 3 -0.25 -0.333 0.5
## 4 -0.25 -0.333 -0.5
contrasts(dat$Group)<-helm
model<-lm(Score~Group,data=dat)
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group1 -0.1955 0.0896 -2.18 0.03048 *
## Group2 -0.4434 0.1007 -4.40 0.000019 ***
## Group3 -0.1996 0.1077 -1.85 0.06568 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
… and we’re happy campers.
We need to define a funtion whose name ends with .emmc
which returns the contrast matrix, and then pass it to emmeans
contrasts
. The contrast matrix is the same used in SPSS and
can be obtained from the model matrix used in standard R (see contrast matrix vs model
matrix for details).
library(MASS) # needed for ginv(), see below
helmert.emmc<<-function(levs) {
n<-length(levs)
#### build model matricx ######
con_weights<-matrix(rev(contr.helmert(n)),ncol=n-1)
cv<-apply(con_weights,2,max)+1
diag_cv<-matrix(rep(cv,each=n),ncol=n-1)
model_mat<-con_weights/diag_cv
#### obtain contrast matrix #######
cont_mat<-ginv(t(model_mat))
M <- as.data.frame(cont_mat)
names(M) <- paste(levs[1:(length(levs)-1)],"vs subsequent")
attr(M, "desc") <- "Helmert contrasts"
M
}
helmert.emmc(levels(dat$Group))
## 1 vs subsequent 2 vs subsequent 3 vs subsequent
## 1 1.000 -0.0000000000000000634 -0.000000000000000110
## 2 -0.333 1.0000000000000000000 -0.000000000000000126
## 3 -0.333 -0.5000000000000001110 0.999999999999999556
## 4 -0.333 -0.5000000000000001110 -0.999999999999999778
contrast(emm,"helmert",adjust="none")
## contrast estimate SE df t.ratio p.value
## Group1 vs subsequent -0.196 0.0896 166 -2.180 0.0305
## Group2 vs subsequent -0.443 0.1007 166 -4.400 <.0001
## Group3 vs subsequent -0.200 0.1077 166 -1.850 0.0657
Same results!
We have seen that jamovi, SPSS and
emmeans
provide descriptions of the comparisons being
estimated as labels of the contrast. However, if one needs to deepen the
understanding of the contrast at hand, one needs to examine the contrast
coding scheme. SPSS and emmeans
can show the coding scheme
in the form of contrast matrix.
To visualize the contrast matrix in in SPSS one uses the option
/PRINT=TEST(LMATRIX)
, and in emmeans
one can
use the function coef()
on the contrast object output by
the contrast()
function.
Let’s see an example using the repeated
contrast:
This is the LMATRIX
one gets in SPSS
and this is the contrast matrix one gets from
emmeans
:
library(emmeans)
model<-lm(Score~Group,data=dat)
emm<-emmeans(model,~Group)
cont<-contrast(emm,"consec",reverse=T,adjust = "none")
coef(cont)
## Group c.1 c.2 c.3
## Group1 1 1 0 0
## Group2 2 -1 1 0
## Group3 3 0 -1 1
## Group4 4 0 0 -1
When it comes to R model estimation ( lm()
,
lmer()
, glm()
), however, we should notice
that to obtain the same comparisons obtained by the other packages, one
needs to code the comparisons in form of the model
matrix, not the contrast matrix. This may be confusing, but
that’s the way it is.
In fact, upon inspecting the coding scheme generated by the
contr.*
functions, one see that the coding does not
correspond to the contrast matrix seen before.
For repeated
contrasts in R we use:
(-1)*contr.sdif(4)
## 2-1 3-2 4-3
## 1 0.75 0.5 0.25
## 2 -0.25 0.5 0.25
## 3 -0.25 -0.5 0.25
## 4 -0.25 -0.5 -0.75
Recall (cf. the repeated contrast results above) that the results are
identical between the R, SPSS and emmeans
, even thought the
coding seems different.
Now, if you try to use the SPSS LMATRIX in R, you are not going to
get the results expected by the repeated
contrast. Indeed
(coefficients were .100,-.344,-.200
) :
ones<-matrix(c(1,-1,0,0,0,1,-1,0,0,0,1,-1),ncol=3)
ones
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] -1 1 0
## [3,] 0 -1 1
## [4,] 0 0 -1
contrasts(dat$Group)<-ones
summary(model)
##
## Call:
## lm(formula = Score ~ Group, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0631 -0.2795 -0.0011 0.2853 1.6573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1562 0.0394 3.97 0.00011 ***
## Group1 -0.1955 0.0896 -2.18 0.03048 *
## Group2 -0.4434 0.1007 -4.40 0.000019 ***
## Group3 -0.1996 0.1077 -1.85 0.06568 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 166 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.132
## F-statistic: 9.56 on 3 and 166 DF, p-value: 0.0000074
The reason of this discrepancy is that the LMATRIX is the contrast matrix, a matrix that shows the linear hypotheses implied by the contrast and represents the starting point of the coding system, not the model matrix itself. Thus, the constrast matrix (SPSS LMATRIX) is useful to understand what is going on in the comparisons: by inspecting the contrast matrix one can easily see that in column 1 groups 1 and 2 are compared, in column 2 groups 2 vs 3, and so on.
A clear technical explanation is provided in Venables 2017, where you find the definition of the contrast matrix (\(C\) in the Venable’s vignette) and the model matrix (\(B\) in the vignette).
Ok, but what is the relation between the contrast matrix and the R
model matrix? or even better: How do I get from the contrast matrix to
the R model matrix? Just take the inverse of the contrast matrix
and traspose it. In R, one can take the inverse of a matrix by
using MASS::ginv()
function. Let’s see:
##### SPSS LMATRIX ####
ones
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] -1 1 0
## [3,] 0 -1 1
## [4,] 0 0 -1
### take the inverse ####
iones<-ginv(ones)
### transpose it ###
t(iones)
## [,1] [,2] [,3]
## [1,] 0.75 0.5 0.25
## [2,] -0.25 0.5 0.25
## [3,] -0.25 -0.5 0.25
## [4,] -0.25 -0.5 -0.75
and there you are: the R model matrix.
Does it work in general? Yes! Take deviation
contrast,
with group 1 omitted.
SPSS LMATRIX is:
and R model matrix we have seen before:
(deviation<-matrix(rev(contr.sum(4)),ncol=3))
## [,1] [,2] [,3]
## [1,] -1 -1 -1
## [2,] 1 0 0
## [3,] 0 1 0
## [4,] 0 0 1
They seem different, but taking the inverse of the LMATRIX and transposing it, we get the R coding system.
##### SPSS LMATRIX ####
lmat<-matrix(rep(-.25,12),ncol=3)
lmat[2,1]<-.75
lmat[3,2]<-.75
lmat[4,3]<-.75
lmat
## [,1] [,2] [,3]
## [1,] -0.25 -0.25 -0.25
## [2,] 0.75 -0.25 -0.25
## [3,] -0.25 0.75 -0.25
## [4,] -0.25 -0.25 0.75
### take the inverse ####
ilmat<-ginv(lmat)
### transpose it ###
round(t(ilmat),digits = 3)
## [,1] [,2] [,3]
## [1,] -1 -1 -1
## [2,] 1 0 0
## [3,] 0 1 0
## [4,] 0 0 1
There you have the actual model matrix.
Let us be sure and check polynomial contrasts.
SPSS LMATRIX is:
R model matrix:
(polynom<-contr.poly(4))
## .L .Q .C
## [1,] -0.671 0.5 -0.224
## [2,] -0.224 -0.5 0.671
## [3,] 0.224 -0.5 -0.671
## [4,] 0.671 0.5 0.224
They are the same this time! Does it means that in this case the “inversion” does not apply? Well, it does apply, but because the polynomial contrast codes are orthogonal, the inverse of the contrast matrix is equal to the transpose of the contrast matrix, thus taking the inverse and then transpose it gives back the same matrix (cf orthogonal matrix)
ipoly<-ginv(polynom)
t(ipoly)
## [,1] [,2] [,3]
## [1,] -0.671 0.5 -0.224
## [2,] -0.224 -0.5 0.671
## [3,] 0.224 -0.5 -0.671
## [4,] 0.671 0.5 0.224
We have seen that emmeans
allows to define custom
contrast coding. When we do that, we should remember that we need to
define the contrast matrix, not the model matrix. Thus,
for repeated contrast, in R we used (-1)*contr.sdif(n)
:
(rcodes<-(-1)*contr.sdif(4))
## 2-1 3-2 4-3
## 1 0.75 0.5 0.25
## 2 -0.25 0.5 0.25
## 3 -0.25 -0.5 0.25
## 4 -0.25 -0.5 -0.75
and emmeans
uses:
cont<-contrast(emm,"consec",referse=T,adjust = "none")
(weigths<-coef(cont)[,2:4])
## c.1 c.2 c.3
## Group1 -1 0 0
## Group2 1 -1 0
## Group3 0 1 -1
## Group4 0 0 1
To get the contrast matrix from the model matrix we neet to get the inverse of the transpose:
#### model matrix ######
rcodes
## 2-1 3-2 4-3
## 1 0.75 0.5 0.25
## 2 -0.25 0.5 0.25
## 3 -0.25 -0.5 0.25
## 4 -0.25 -0.5 -0.75
#### transpose the R model matrix ######
tcodes<-t(rcodes)
#### invert it #####
round(ginv(tcodes),digits = 3)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] -1 1 0
## [3,] 0 -1 1
## [4,] 0 0 -1
Now we can build a custom contrast function for emmeans
:
let’s do helmert
as we defined above.
Let’s start with the model matrix used in R.
con_weights<-matrix(rev(contr.helmert(4)),ncol=3)
cv<-apply(con_weights,2,max)+1
diag_cv<-matrix(rep(cv,each=4),ncol=3)
helm<-con_weights/diag_cv
helm
## [,1] [,2] [,3]
## [1,] 0.75 0.000 0.0
## [2,] -0.25 0.667 0.0
## [3,] -0.25 -0.333 0.5
## [4,] -0.25 -0.333 -0.5
Take the inverse of the transpose of it
(contr_mat<-ginv(t(helm)))
## [,1] [,2] [,3]
## [1,] 1.000 -0.0000000000000000634 -0.000000000000000110
## [2,] -0.333 1.0000000000000000000 -0.000000000000000126
## [3,] -0.333 -0.5000000000000001110 0.999999999999999556
## [4,] -0.333 -0.5000000000000001110 -0.999999999999999778
This is the contrast matrix we can pass to emmeans
. We
just need to wrap it up in a function with .emmc()
reference class and levels
as input.
helmert.emmc<<-function(levs) {
n<-length(levs)
con_weights<-matrix(rev(contr.helmert(n)),ncol=n-1)
cv<-apply(con_weights,2,max)+1
diag_cv<-matrix(rep(cv,each=n),ncol=n-1)
helm<-con_weights/diag_cv
model_mat<-ginv(t(helm))
M <- as.data.frame(model_mat)
names(M) <- paste(levs[1:(length(levs)-1)],"vs subsequent")
attr(M, "desc") <- "Helmert contrasts"
M
}
helmert.emmc(levels(dat$Group))
## 1 vs subsequent 2 vs subsequent 3 vs subsequent
## 1 1.000 -0.0000000000000000634 -0.000000000000000110
## 2 -0.333 1.0000000000000000000 -0.000000000000000126
## 3 -0.333 -0.5000000000000001110 0.999999999999999556
## 4 -0.333 -0.5000000000000001110 -0.999999999999999778
contrast(emm,"helmert",adjust="none")
## contrast estimate SE df t.ratio p.value
## Group1 vs subsequent -0.196 0.0896 166 -2.180 0.0305
## Group2 vs subsequent -0.443 0.1007 166 -4.400 <.0001
## Group3 vs subsequent -0.200 0.1077 166 -1.850 0.0657
and we get the contrast estimates as expected. For the
difference
contrast we can use
con_weights<-contr.helmert(n)
in the function and we get
the correct codes.
Got comments, issues or spotted a bug? Please open an issue on GAMLj at github or send me an email