The rest of this webpage will describe the database and give some guidance on how to interpret the results of the analysis that you will get from running the R script.
| CONSTRUCTION | VERB | REDUCED | PARTICIPLE |
|---|---|---|---|
| goal : 871 | _zero:393 | no :1353 | no : 895 |
| theme:1049 | na :368 | yes: 567 | yes:1025 |
| | po :703 | | |
| | za :456 | | |
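The table above has the layout that R's `summary()` prints for a data frame of factors, so it was presumably produced by a call along the lines of `summary(loaddata)`; the script is not reproduced here, so treat that as an assumption. As a quick sanity check, the counts copied from the table can be summed per variable. The list name `counts` is ours, chosen for illustration.

```r
# Counts copied from the summary table; `counts` is an illustrative name.
counts <- list(
  CONSTRUCTION = c(goal = 871, theme = 1049),
  VERB         = c("_zero" = 393, na = 368, po = 703, za = 456),
  REDUCED      = c(no = 1353, yes = 567),
  PARTICIPLE   = c(no = 895, yes = 1025)
)

# Every variable is recorded for every observation, so the counts
# for each variable should sum to the same total.
totals <- sapply(counts, sum)
print(totals)
```

Each variable sums to 1920, the number of lines in the dataset.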
Next comes the logistic regression analysis, which you will find under the heading “Logistic Regression Model” in the R output. We used a procedure (following Baayen 2008 and Gries 2009) for discovering the minimal adequate model for our data. This means that we started with a maximal model in which all independent variables serve as main effects and also interact with each other, and then progressively removed the terms that were not significant until we arrived at a model that contained only significant relationships. We will not walk you through this whole procedure, but just show you the resulting model. This model has all of the independent variables as main effects, plus an interaction between the VERB and PARTICIPLE variables. The formula for this model is represented this way in your R output (and in the R script):
lrm(formula = CONSTRUCTION ~ VERB + REDUCED + PARTICIPLE + VERB:PARTICIPLE, data = loaddata, x = T, y = T, linear.predictors = T)
This can be stated in prose thus: “CONSTRUCTION varies according to VERB, REDUCED, and PARTICIPLE as main effects, plus an interaction between VERB and PARTICIPLE.”
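R can list the terms of a formula without fitting anything, which makes it easy to see what was removed on the way to the final model. The sketch below assumes that the starting model was the full crossing `VERB * REDUCED * PARTICIPLE` (our reading of "all independent variables interact with each other"). The names `full`, `final`, and `grid` are ours, and the factor levels are taken from the summary table.

```r
# Assumed starting point: every main effect plus every interaction.
full  <- CONSTRUCTION ~ VERB * REDUCED * PARTICIPLE
# Final model, as given in the R output.
final <- CONSTRUCTION ~ VERB + REDUCED + PARTICIPLE + VERB:PARTICIPLE

full_terms  <- attr(terms(full), "term.labels")
final_terms <- attr(terms(final), "term.labels")

# Terms present in the assumed full model but absent from the final one.
dropped <- setdiff(full_terms, final_terms)
print(dropped)

# One row per combination of factor levels (levels from the summary table).
grid <- expand.grid(
  VERB       = c("_zero", "na", "po", "za"),
  REDUCED    = c("no", "yes"),
  PARTICIPLE = c("no", "yes")
)

# Design matrix for the right-hand side of the final model; without the
# intercept column there is one column per estimated coefficient.
X <- model.matrix(~ VERB + REDUCED + PARTICIPLE + VERB:PARTICIPLE, data = grid)
n_coef <- ncol(X) - 1
print(n_coef)
```

The count comes out as 8: three coefficients for VERB (four values minus the reference value), one each for REDUCED and PARTICIPLE, and three for the interaction. This is the same number that the model output reports as d.f.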
Next comes a little table telling you the overall number of items for each value for the dependent variable CONSTRUCTION: goal has 871, and theme has 1049.
Next come some figures that indicate how well the model performs. We will interpret only some of them. For more discussion, see Cohen et al. 2003.
Statistic | Value | Interpretation |
---|---|---|
Obs | 1920 | This is the number of observations, i.e. the number of lines in the dataset. |
Model L.R. | 1738.47 | This is the likelihood-ratio (LL-ratio) χ2, the difference between the two deviance values, without and with predictors. In other words, this is a test comparing how our model performs compared to a default model without any predictors at all. Our predictors do a good job, and we get a high value here. |
d.f. | 8 | This tells us that there are eight degrees of freedom in our model. |
P | 0 | This tells us that the overall p-value for our model is so small that R prints it as zero. The p-value is the probability that we would find a sample with this strong a deviation from a random pattern, or an even stronger one, if there were no pattern at all in a potentially infinite population of examples of ‘load’ verbs in Russian. |
C | 0.964 | This is the coefficient of concordance, which according to Gries (2009) should ideally be 0.8 or higher. The maximum here is 1.0, so this is a high value. |
Dxy | 0.928 | This is Somers’ Dxy, the rank correlation between predicted and observed responses. The maximum here is 1.0, so this is a high value. |
R2 | 0.796 | This is Nagelkerke’s R2, which tells us the correlational strength in terms of the amount of variance that is accounted for by the model. The maximum here is 1.0, so this is a high value. |
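Several of these figures are related to each other, and the relationships can be checked with a few lines of base R. The sketch below uses only the values reported on this page and assumes the usual textbook definitions: the p-value is the upper tail of a χ2 distribution, Dxy equals 2(C − 0.5), and Nagelkerke's R2 is the likelihood-ratio R2 divided by its maximum attainable value. The variable names are ours.

```r
# Values reported on this page.
n        <- 1920      # Obs
n_goal   <- 871       # CONSTRUCTION = goal
n_theme  <- 1049      # CONSTRUCTION = theme
model_lr <- 1738.47   # Model L.R.
df       <- 8         # d.f.
c_index  <- 0.964     # C

# p-value of the likelihood-ratio test: upper tail of a chi-square distribution.
p_value <- pchisq(model_lr, df = df, lower.tail = FALSE)

# Somers' Dxy rescales C so that chance level (0.5) maps to 0
# and perfect concordance (1.0) maps to 1.
dxy <- 2 * (c_index - 0.5)

# Deviance of the model without predictors; it depends only on
# the proportions of the two outcomes.
null_dev <- -2 * (n_goal * log(n_goal / n) + n_theme * log(n_theme / n))

# Nagelkerke's R2: likelihood-ratio R2 divided by its maximum possible value.
r2 <- (1 - exp(-model_lr / n)) / (1 - exp(-null_dev / n))

print(c(P = p_value, Dxy = dxy, R2 = r2))
```

The p-value is far below anything R would print as non-zero, and the recomputed Dxy and R2 should agree with the reported 0.928 and 0.796 up to rounding.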
Next comes a table that has these headings: Coef, S.E., Wald Z, P. The rightmost column lists the p-value for each predictor. Most of these are printed as zero, which indicates that they are highly significant. This doesn't give us a lot of detail, however, which is why we will also fit the logistic regression with another function below, which gives us some additional information.
Next comes a table with the headings Factor, Chi.Square, d.f., P. This is an ANOVA table comparing how the various factors perform. We see that VERB is strongest, followed by PARTICIPLE, then VERB * PARTICIPLE (the interaction), and finally REDUCED.
Next comes an alternative way of calculating the logistic regression model, using the binomial version of the generalized linear model (glm). This gives many of the same values, but also gives us some different insights into the data. Under Coefficients, the column Pr(>|z|) gives the p-values for the coefficients associated with the individual values of the variables.
Under "These are the confidence interval values:", we get a 95% confidence interval for each of the variable values. Note that none of these confidence intervals includes 0.0, which indicates that each of them makes a significant contribution as a predictor.
Under "These are the odds of success for each predictor variable:", we get the odds that each predictor value has of predicting a correct outcome in our model. This is another way of ranking the predictor variables.
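The glm part of the analysis can be sketched as follows. Since neither the dataset nor the script is reproduced on this page, the data below are simulated with arbitrary effect sizes that carry no linguistic meaning, and the steps are assumptions about what the script does: Wald intervals from `confint.default()` stand in for the confidence intervals, and exponentiated coefficients (odds ratios) stand in for the "odds" section. Only the model formula is taken from the R output.

```r
set.seed(1)

# Simulated stand-in for the real data: 25 rows per combination of levels.
sim <- expand.grid(
  VERB       = c("_zero", "na", "po", "za"),
  REDUCED    = c("no", "yes"),
  PARTICIPLE = c("no", "yes"),
  rep        = 1:25
)

# Arbitrary log-odds of "theme", used only to generate example outcomes.
eta <- -0.5 + 0.8 * (sim$VERB == "po") - 0.6 * (sim$VERB == "za") +
  0.5 * (sim$REDUCED == "yes") + 0.7 * (sim$PARTICIPLE == "yes")
sim$CONSTRUCTION <- factor(
  ifelse(rbinom(nrow(sim), 1, plogis(eta)) == 1, "theme", "goal"),
  levels = c("goal", "theme")
)

# Binomial GLM with the same formula as the lrm() call.
fit <- glm(CONSTRUCTION ~ VERB + REDUCED + PARTICIPLE + VERB:PARTICIPLE,
           data = sim, family = binomial)

coef_table <- summary(fit)$coefficients  # Estimate, Std. Error, z value, Pr(>|z|)
ci         <- confint.default(fit)       # Wald 95% intervals on the log-odds scale
odds       <- exp(coef(fit))             # exponentiated coefficients (odds ratios)

# An interval that excludes 0 on the log-odds scale corresponds to
# an odds ratio whose interval excludes 1.
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
print(cbind(round(coef_table, 3), excludes_zero))
print(round(odds, 3))
```

With a factor response, `glm()` treats the first level (here `goal`) as the reference outcome, so positive coefficients and odds ratios above 1 favor `theme`.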