Keywords: non-linear regression, polynomial model, non-linear effects, linear model

GAMLj version ≥ 1.5.0

In this example we work through a polynomial regression in the GLM, using the jamovi GAMLj module. The (simulated) data are available here.

The dataset has three variables of interest. Imagine we measured athletes' performance in a match using a standard scale, together with the number of hours they trained in a week. The idea is to study the relationship between hours of training and performance. Because training can be good for performance but too much training may be detrimental, we expect a non-linear effect of training hours on performance. Furthermore, athletes are divided into two groups (variable `type`), professionals (`P`) and amateurs (`A`), and we want to check whether the effect of training differs between the two groups.

We first set up a linear regression with only linear effects. We launch `General Linear Model` from the `Linear Models` menu. We put `performance` in the dependent variable field and `hours` in the `covariate` field.

By defining the variables we obtain a simple regression in the output, but we want to specify a quadratic effect of `hours`, so we go to the `Model` panel. As soon as we select `hours` in the `Components` field, we can see a little `1` appearing to the right of the variable.

That little number indicates the order of the effect that we want to insert in the model, that is, the exponent of the term we want to include. The number `1` (default) means `linear` effects. To include a quadratic effect (second order), we should increase the number to `2`, as in the Figure. We can then push the arrow to move the quadratic term into the model.

If we also want, for instance, the cubic term, we should increase the number to `3` and move it to the model as well.
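Written out, the model built in these steps is a cubic polynomial in `hours`. The coefficient symbols \(b_0,\dots,b_3\) are notation introduced here for illustration, not labels taken from the GAMLj output:

\[
\widehat{performance} = b_0 + b_1\,hours + b_2\,hours^2 + b_3\,hours^3
\]

Each exponent selected in the `Model` panel adds one term to this sum, so leaving the number at `1` gives only the \(b_1\) term, and raising it to `2` or `3` adds the \(b_2\) and \(b_3\) terms.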

Results show that the polynomial (linear + quadratic + cubic) effects of `hours` on `performance` explain about 49% of the variance (\(R^2=.486\)).

By inspecting the F-tests and the estimates (B coefficients) we can see that we have a *linear* (\(hours\)) and a *quadratic* (\(hours^2\)) effect of `hours` on `performance`, whereas the *cubic* effect (\(hours^3\)) is trivial and can be disregarded (the \(\eta^2_p\) is practically zero).
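To make concrete what such a fit computes, here is a minimal sketch of a least-squares polynomial regression in plain Python. It is not GAMLj's implementation: the function names `fit_polynomial` and `r_squared` are hypothetical, and the data are invented, noise-free values on a purely quadratic curve (they are not the tutorial's dataset), so the cubic coefficient should come out at essentially zero.

```python
def fit_polynomial(x, y, degree):
    """Return least-squares coefficients [b0, b1, ..., b_degree]."""
    n_terms = degree + 1
    # Normal equations (X'X) b = X'y for a design matrix with columns 1, x, x^2, ...
    xtx = [[sum(xi ** (r + c) for xi in x) for c in range(n_terms)]
           for r in range(n_terms)]
    xty = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(n_terms)]
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]

    # Gaussian elimination with partial pivoting.
    for col in range(n_terms):
        pivot = max(range(col, n_terms), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_terms):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_terms + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coefs = [0.0] * n_terms
    for r in range(n_terms - 1, -1, -1):
        tail = sum(aug[r][c] * coefs[c] for c in range(r + 1, n_terms))
        coefs[r] = (aug[r][n_terms] - tail) / aug[r][r]
    return coefs


def r_squared(x, y, coefs):
    """Proportion of variance in y reproduced by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    fitted = [sum(b * xi ** k for k, b in enumerate(coefs)) for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical, noise-free data: hours centered on 10, purely quadratic curve.
hours_centered = [float(h - 10) for h in range(21)]
performance = [5.0 + 0.3 * h - 0.05 * h ** 2 for h in hours_centered]

coefs = fit_polynomial(hours_centered, performance, degree=3)
fit = r_squared(hours_centered, performance, coefs)
```

With real, noisy data the cubic coefficient would not be exactly zero, which is why the F-test and the effect size are used to judge whether it can be disregarded.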

When it comes to polynomial models, the best way to figure out the relationship between variables is to plot the effects. We can do that by selecting the `Plot` panel and putting `hours` in the `Horizontal Axis` field (mind that by default GAMLj centers the IV on its mean; to obtain a nice plot I changed the IV scaling to `none` in the `Covariates scaling` panel).

We can see that, on average, up to 10 hours, one more hour of training is good for the performance, but after 10 hours, increasing training is not advantageous in terms of performance. That is, we have a curvilinear effect of the IV on the DV.
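The location of the peak can also be read from the coefficients. As a sketch, ignoring the negligible cubic term and assuming `hours` is left unscaled, the slope of the fitted curve is the derivative of the quadratic, and the curve turns where that slope is zero (the symbols \(b_1\), \(b_2\) and \(hours^{*}\) are illustrative notation):

\[
\frac{d\,\widehat{performance}}{d\,hours} = b_1 + 2\,b_2\,hours = 0
\quad\Longrightarrow\quad
hours^{*} = -\frac{b_1}{2\,b_2}
\]

When \(b_2 < 0\) this turning point is a maximum, which matches a curve that rises up to a certain number of hours and flattens or declines afterwards.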

We can now analyze possible differences due to the type of athlete by introducing `type` as a factor in the model.

When we go to the `Model` panel, we see that the main effect of `type` is automatically inserted in the model terms.

However, we want to see if the effect of `hours` depends on `type`, so we need to include the interactions. We need two interactions: *linear hours* by *type*, and *quadratic hours* by *type* (I removed *cubic hours* based on the previous analysis).

For the *linear* by *type* interaction, we select both `type` and `hours` and we press the `arrow` to move the interaction term to the `Model Terms` field.

For the *quadratic* by *type* interaction, we select both `type` and `hours`, and we increase the exponent of `hours` to signal that we want the quadratic term to interact with `type`. We press the `arrow` to move the interaction term to the `Model Terms` field.

We have finished setting up the new model.
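In equation form, the model specified in these steps can be sketched as follows, where \(type\) stands for the numeric contrast code of the factor and the symbols \(b_0,\dots,b_5\) are illustrative notation rather than GAMLj labels:

\[
\widehat{performance} = b_0 + b_1\,hours + b_2\,hours^2 + b_3\,type + b_4\,(hours \times type) + b_5\,(hours^2 \times type)
\]

Collecting terms shows why both interactions are needed: the linear slope of `hours` becomes \(b_1 + b_4\,type\) and the curvature becomes \(b_2 + b_5\,type\), so each group is allowed its own slope and its own bend.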

The model info table shows the actual R-syntax model we estimated and the \(R^2\), the latter clearly larger than the \(R^2\) of the previous model.

As regards the effects, we can see that we do have a *quadratic hours* by *type* interaction, so we can say that the effect of `hours` on `performance` has a different shape depending on the type of athlete.

Inspecting the plot makes the interpretation easier.

For professional athletes (`P`), performance increases almost linearly with hours of training: the more hours they train, the better the performance. For amateur athletes (`A`), performance is positively linked to training hours up to about 9 hours, after which more training means a strong decrease in performance. Thus, for amateurs there is an inverted-U-shaped effect of training on performance, whereas for professionals the relationship is practically linear.

Assume we want to test group differences along the training hours continuum. That is, we want to test the difference between the two groups at different levels of training. To do that, it is convenient to rescale the variables: we standardize the independent variable and code the factor with `simple` coding, which yields a coefficient for the factor equal to the difference between the groups in the expected value of the dependent variable.
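The following minimal sketch illustrates why such a coding makes the factor coefficient equal to the group difference. It assumes the two levels are coded one unit apart (here -0.5 for `A` and +0.5 for `P`, which is an assumption about the coding scheme, not something read from the GAMLj output), and it uses a handful of invented scores with no covariate. The helper `simple_regression` is a hypothetical name.

```python
def simple_regression(codes, y):
    """Ordinary least squares of y on a single numeric predictor."""
    n = len(y)
    mean_c = sum(codes) / n
    mean_y = sum(y) / n
    sxy = sum((c - mean_c) * (v - mean_y) for c, v in zip(codes, y))
    sxx = sum((c - mean_c) ** 2 for c in codes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_c
    return intercept, slope


# Assumed coding: levels one unit apart and centered on zero.
SIMPLE_CODES = {"A": -0.5, "P": 0.5}

# Invented scores: amateurs average 5, professionals average 9.
groups = ["A", "A", "A", "P", "P"]
scores = [4.0, 5.0, 6.0, 8.0, 10.0]

codes = [SIMPLE_CODES[g] for g in groups]
intercept, slope = simple_regression(codes, scores)

mean_a = sum(s for g, s in zip(groups, scores) if g == "A") / groups.count("A")
mean_p = sum(s for g, s in zip(groups, scores) if g == "P") / groups.count("P")
```

Here `slope` reproduces `mean_p - mean_a`, and `intercept` is the average of the two group means. In the full model the same logic holds at the point where the standardized covariate equals zero.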

We then ask for the simple effects of `type` at different levels (mean and mean plus/minus one SD) of `hours`.

The simple effects tables show that for low (-1SD) and high (+1SD) training the groups are statistically different, with the difference favoring the professional group (\(P-A=2.977\) and \(P-A=2.667\), respectively), whereas for average training performance does not seem to differ between the two groups (\(P-A=-0.0403\)). By changing the covariate conditioning in the `Covariate Scaling` panel one can test these differences at any value of `hours` one wishes.

To visualize what we are doing, let’s see the plot after standardizing the IV.

In practice, the simple effects tests we have seen tested the difference between the blue and the yellow curve at `hours` equal to -1, 0, and 1. Because we standardized `hours`, those values correspond to -1SD, mean, and +1SD of training hours.
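The three reported differences are enough to sketch the whole difference curve. Assuming the model contains only the `type` main effect and the linear and quadratic interactions with standardized `hours`, and that the factor codes are one unit apart, the gap between the curves is \(d(z) = b_{type} + b_{lin}\,z + b_{quad}\,z^2\). The function and coefficient names below are hypothetical, and the coefficients are back-calculated from the three \(P-A\) values quoted above, not copied from the GAMLj coefficient table.

```python
def difference_coefficients(diff_low, diff_mean, diff_high):
    """Solve d(z) = b_type + b_lin*z + b_quad*z**2 from d(-1), d(0), d(+1)."""
    b_type = diff_mean                          # d(0) is the intercept of the gap
    b_lin = (diff_high - diff_low) / 2.0        # antisymmetric part
    b_quad = (diff_high + diff_low) / 2.0 - diff_mean  # symmetric part
    return b_type, b_lin, b_quad


def group_difference(z, coefs):
    """Expected P - A difference at standardized hours z."""
    b_type, b_lin, b_quad = coefs
    return b_type + b_lin * z + b_quad * z ** 2


# P - A differences quoted in the text at -1SD, mean, and +1SD of hours.
coefs = difference_coefficients(2.977, -0.0403, 2.667)

# Evaluate the gap at a few other standardized values of hours.
for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(z, round(group_difference(z, coefs), 3))
```

The large positive quadratic component is what makes the gap close to zero at the mean and wide at both extremes. The significance of the difference at other values still has to come from GAMLj, by changing the covariate conditioning.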