Different imput styles fo logistic analysis

keywords jamovi, logistic models, generalized linear models, weighted data, success/fails

3.4.0

Data suitable for running a logistic model may come in different shapes. Statistical software may allow to analyze the data the way they are or a reshape of the data might be necessary. In GAMLj, logistic models may be estimated with different types of data shapes, so here we show how to. We focus on the Logistic Model, but exactely the same procedures can be used for the Probit model.

There are five ways one can setup a logistic model in GAMLj. To demonstrate how they are, we are going to use a different versions of datasets of an hypothetical study in which there is an independent variable \(X\) and one dichotomous dependent variable indicating success with \(Y=1\) and fail with \(Y=0\). All datasets contain the same data, only formatted in different ways.

The head of the file looks like this:

We have two variables, \(X\) and \(Y\), that contain the individual values for the independent and dependent variable, respectively. There are 5500 cases in the 2693 of which show \(Y=1\), implying that 49% of the sample has a \(Y=1\).

For instance, the first row shows that \(Y=1\) and \(X=5\) occurred 100 times in the sample, while the combination \(Y=0\) and \(X=5\) (row 2) occurred 400 times. It is worth noting that the sum of the weights for each level of \(X\) always equals 500. This constant sum of weights is not a requirement of logistic regression; it just happens to be the case in this file.

For example, the first row shows that \(Y=1\) and \(X=5\) occurred 100 times in the sample, while \(Y=0\) and \(X=5\) (row 2) occurred 400 times. The \(N\) variable is not relevant here, but we will use it later. The same pattern applies to the other rows in the dataset.

Long format data

File wlogistic_verylong.omv

For long format data, we simply estimate the model setting the dependent and the independent variable.

Notice at the bottom of the UI the Input Method option. By default is Responses, indating that the dependent variable represent each case response to the dichotomous variable.

Let’s take a note of some results, so we can compare them with the results of other input methods. Here we have

an \(R^2=.114\), \(\chi^=871\), and \(exp(B)=1.145\).

finally, we can appreciate that the sample size is indeed 5500 cases

Weighted Long format

File wlogistic_long.omv

The weighted long format features a dichotomous dependent variable, with weights used to assign the frequency of responses. To run the model, we need to utilize the jamovi weighting tool. First, we set up the dataset to be weighted by the variable \(Counts\).

Next, we run the model exactly as we did before:

And we obtain the same results:


Note that the sample size is now 22 (the number of rows), but the software recognizes that the analysis is equivalent to one run on 5,500 cases.

success/ failure format

File wlogistic_wide.omv

The success/failure format works with two key columns: one for the number of successes (\(Y=1\)) and the other for the number of failures (\(Y=0\)), represented as counts. To run the model, you don’t need to apply weights; instead, select the Success/Failure option in Input Method. This will provide two input fields for the dependent variable, where you assign the columns that represent the counts of successes and failures.

Once that’s done, you’ll get the same reliable results:


What’s worth noting is the footnote in the Model Fit table, which reminds you that the \(R^2\) is calculated as if the full long format were available. This is a crucial distinction because, in other software, calculating the \(R^2\) using this format can sometimes lead to discrepancies—often producing an incorrect value.

Probability over totals

Alternative, one can use the probability of \(Y=1\) along with the total number of cases represented in each row. File wlogistic_wide.omv. As compare with the example above, it entails simply to select P(success)/Totals in the Input Method option.

Results are as expected.


Comments?

Got comments, issues or spotted a bug? Please open an issue on GAMLj at github or send me an email

Return to main help pages

Main page Generalized linear models