4.3 At-Home Exercises

This week, we will take our first steps into the world of factor analysis. We’ll begin by fitting some CFA models using personality scale data.


4.3.1 Setup

Data:

You will use the following dataset for these exercises.

This dataset contains 1000 observations of 16X personality items collected as part of the International Personality Item Pool. These items are meant to assess Raymond Cattell’s 16 personality factors (16PF):

  • Warmth
  • Reasoning
  • Emotional Stability
  • Dominance
  • Liveliness
  • Rule-Consciousness
  • Social Boldness
  • Sensitivity
  • Vigilance
  • Abstractness
  • Privateness
  • Apprehension
  • Openness to Change
  • Self-Reliance
  • Perfectionism
  • Tension

The 16PF Wikipedia page provides a nice summary of the scale.

Packages:

To complete these exercises, you will need to install the package described below.

Package   Description
semPlot   Programmatically draw path diagrams in R

Use the install.packages() function to install this package.

Click to show code
install.packages("semPlot", dependencies = TRUE)

4.3.2

Load the cattell data.

  • The relevant data are contained in the cattell.rds file.
Click to show code
dataDir <- "data"
cattell <- readRDS(here::here(dataDir, "cattell.rds"))

We’ll first estimate a simple one-dimensional CFA of the Warmth factor. Our target model is defined as follows:

  • One reflective latent factor representing the Warmth dimension
  • 10 observed indicators: {A1, A2, …, A10}
  • 10 residual variances but no residual covariances

Basically, our CFA should estimate the default measurement model through which one latent factor generates 10 observed indicator variables.


4.3.3

Sketch a path diagram of the CFA model described above.

  • Use pencil and paper (or the equivalent); don’t generate the diagram programmatically.

4.3.4

Calculate the following quantities for the Warmth CFA described above.

  1. The number of parameters in the model.
  2. The number of unique pieces of information provided by the data.
  3. The number of estimated parameters after applying the necessary scaling constraints.
  4. The degrees of freedom for the estimated model.
Click to show the answer
  1. The model contains 21 parameters
    • 10 factor loadings
    • 10 residual variances
    • One latent variance
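  2. Our model will include 10 observed variables, so the data provide \(10(10 + 1)/2 = 55\) pieces of information (10 variances and 45 unique covariances).
  3. To identify the covariance structure model, we need to fix one parameter to set the scale of the latent factor (e.g., one factor loading or the latent variance). So, we will estimate 21 - 1 = 20 parameters.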
  4. The estimated model will have 55 - 20 = 35 degrees of freedom.
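The counting above can be reproduced with a few lines of arithmetic. The following is only a sketch of the bookkeeping; the object names are made up for illustration.

```r
## Number of observed indicators in the Warmth model
p <- 10

## Unique variances and covariances in the observed covariance matrix
nInfo <- p * (p + 1) / 2

## All model parameters: p loadings + p residual variances + 1 latent variance
nParTotal <- p + p + 1

## One parameter is fixed to set the scale of the latent factor
nParEst <- nParTotal - 1

## Degrees of freedom of the estimated model
dfMod <- nInfo - nParEst

c(info = nInfo, total = nParTotal, estimated = nParEst, df = dfMod)
```

Changing p lets you repeat the same calculation for a one-factor model with a different number of indicators.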

4.3.5

Define the lavaan model syntax for the unidimensional CFA of Warmth described above.

  • Do not specify any mean structure.
  • Save this model syntax as an object in your environment.
Click to show code
mod1 <- '
warmth =~ A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10
'
Click for explanation

At this point, we only need to define the factor loading map. We can specify all the other options when we estimate the model.
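If you want to see what the loading map expands into before fitting anything, one option is to build the parameter table with lavaan's lavaanify() function. The sketch below switches on only two of the defaults that cfa() applies (auto.fix.first and auto.var), which is enough for a one-factor model; it is an illustration, not a required step of the exercise.

```r
library(lavaan)

mod1 <- '
warmth =~ A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10
'

## Expand the syntax into a parameter table, applying marker-variable
## scaling and adding the residual and latent variances automatically
pt <- lavaanify(mod1, auto.fix.first = TRUE, auto.var = TRUE)

## Loadings (=~) and variances (~~); free == 0 marks a fixed parameter
pt[ , c("lhs", "op", "rhs", "free", "ustart")]

## Count the freely estimated parameters
sum(pt$free > 0)
```

The table should contain 21 rows, with the A1 loading fixed to 1 and the remaining 20 parameters free.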


4.3.6

Estimate the CFA model you defined in 4.3.5, and summarize the results.

  • Use the lavaan::cfa() function to estimate the model.
  • Use the default settings for the cfa() function.

Check the results, and answer the following questions:

  • What is the estimated variance of the Warmth factor?
  • How is the model identified when using the default settings?
Click to show code
## Load the lavaan package:
library(lavaan)

## Estimate the CFA model:
outH1.1 <- cfa(mod1, data = cattell)

## Summarize the fitted model:
summary(outH1.1)
## lavaan 0.6-19 ended normally after 23 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        20
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                               336.181
##   Degrees of freedom                                35
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   warmth =~                                           
##     A1                1.000                           
##     A2                0.908    0.049   18.670    0.000
##     A3                0.796    0.052   15.342    0.000
##     A4                0.786    0.049   15.952    0.000
##     A5                0.893    0.049   18.362    0.000
##     A6                0.831    0.048   17.422    0.000
##     A7                0.783    0.047   16.505    0.000
##     A8               -0.516    0.057   -8.996    0.000
##     A9               -0.860    0.053  -16.269    0.000
##     A10              -0.418    0.052   -7.979    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .A1                0.571    0.030   18.989    0.000
##    .A2                0.419    0.023   18.565    0.000
##    .A3                0.680    0.033   20.572    0.000
##    .A4                0.582    0.029   20.325    0.000
##    .A5                0.438    0.023   18.847    0.000
##    .A6                0.472    0.024   19.548    0.000
##    .A7                0.513    0.026   20.067    0.000
##    .A8                1.156    0.053   21.921    0.000
##    .A9                0.652    0.032   20.182    0.000
##    .A10               0.993    0.045   22.025    0.000
##     warmth            0.475    0.042   11.229    0.000
Click for explanation

The estimated latent variance is \(\psi_{11} = 0.475\).

The cfa() function is just a wrapper for the lavaan() function with several options set at the defaults you would want for a standard CFA.

  • By default, the model is identified by fixing the first factor loading of each factor to 1 (due to the argument auto.fix.first = TRUE).

To see a full list of the (many) options you can specify to tweak the behavior of lavaan estimation functions, run ?lavOptions.
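To make the wrapper relationship concrete, the sketch below fits the same model once with cfa() and once with lavaan(), spelling out the defaults that matter for a simple CFA without a mean structure. It uses the HolzingerSwineford1939 data that ship with lavaan as a stand-in (not the cattell data), and the two-factor model is chosen only for illustration.

```r
library(lavaan)

## Stand-in model for the built-in example data
modHS <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
'

## The wrapper with its default settings
fitCfa <- cfa(modHS, data = HolzingerSwineford1939)

## The general function with the relevant defaults made explicit
fitLav <- lavaan(modHS,
                 data           = HolzingerSwineford1939,
                 auto.fix.first = TRUE, # fix the first loading of each factor to 1
                 auto.var       = TRUE, # estimate residual and latent variances
                 auto.cov.lv.x  = TRUE) # estimate covariances among exogenous latent factors

## The two sets of estimates should match
all.equal(coef(fitCfa), coef(fitLav), tolerance = 1e-6)
```

The cfa() function sets a few more defaults (e.g., for thresholds and single-indicator factors), but they do not come into play for this model.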


4.3.7

Do the number of model parameters and degrees of freedom reported in the lavaan output agree with the values you calculated in 4.3.4?

Click to show answer

Yes, the values should agree. The first part of the lavaan model summary reports the number of model parameters and the degrees of freedom.

## lavaan 0.6-19 ended normally after 23 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        20
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                               336.181
##   Degrees of freedom                                35
##   P-value (Chi-square)                           0.000

In this case, we see that we’ve estimated 20 parameters and we have df = 35.
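You can also pull these two numbers out of a fitted model programmatically with fitMeasures(). The sketch below uses simulated stand-in data (the population loadings of 0.7 and the seed are arbitrary assumptions) so that it runs without the cattell file. The parameter count and degrees of freedom depend only on the model structure, so they should still come out as 20 and 35.

```r
library(lavaan)

## Population model used only to simulate stand-in data
popMod <- paste("warmth =~", paste0("0.7*A", 1:10, collapse = " + "))
simDat <- simulateData(popMod, sample.nobs = 1000, seed = 235711)

## The analysis model from 4.3.5
mod1 <- '
warmth =~ A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10
'
fit <- cfa(mod1, data = simDat)

## Extract the number of estimated parameters and the degrees of freedom
fm <- fitMeasures(fit, c("npar", "df"))
fm
```

With the real data, the same call on outH1.1 would return the values shown in the summary header.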


4.3.8

Use the semPaths() function from the semPlot package to draw a path diagram of your estimated CFA.

  • Include the parameter estimates in the diagram.
Click to show code
library(semPlot)
semPaths(outH1.1, whatLabels = "estimates")

Now, we’re going to complicate matters slightly by adding another dimension to our model. We’ll now estimate a two-dimensional CFA that contains correlated Warmth and Dominance factors.

Define this new model as follows:

  • Two correlated, reflective latent factors: one Warmth factor and one Dominance factor
  • 10 observed indicators of Warmth: {A1, A2, …, A10}
  • 10 observed indicators of Dominance: {D1, D2, …, D10}
  • No cross-loadings
  • No residual correlations

Again, this CFA should estimate the default measurement model through which two correlated latent factors generate a set of 20 observed indicators.


4.3.9

Sketch a path diagram of the CFA model described above.

  • Use pencil and paper (or the equivalent); don’t generate the diagram programmatically.

4.3.10

Define the lavaan model syntax for a two-factor model of the Warmth and Dominance items.

  • Save this syntax as an object in your environment.
Click to show code
mod2 <- '
warmth    =~ A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10
dominance =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8 + D9 + D10
'

4.3.11

Estimate the two-factor model you specified in 4.3.10, and summarize the results.

  • Identify the model by fixing the first factor loading of each construct to 1.
  • Do not estimate any mean structure.
Click to show code
## Estimate the two factor model:
outH2.1 <- cfa(mod2, data = cattell)

## Summarize the results:
summary(outH2.1)
## lavaan 0.6-19 ended normally after 26 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        41
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1158.738
##   Degrees of freedom                               169
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   warmth =~                                           
##     A1                1.000                           
##     A2                0.908    0.048   18.880    0.000
##     A3                0.780    0.051   15.188    0.000
##     A4                0.771    0.049   15.829    0.000
##     A5                0.900    0.048   18.679    0.000
##     A6                0.835    0.047   17.684    0.000
##     A7                0.773    0.047   16.468    0.000
##     A8               -0.508    0.057   -8.921    0.000
##     A9               -0.847    0.052  -16.199    0.000
##     A10              -0.414    0.052   -7.954    0.000
##   dominance =~                                        
##     D1                1.000                           
##     D2                0.774    0.040   19.507    0.000
##     D3                0.500    0.041   12.244    0.000
##     D4                0.462    0.042   11.078    0.000
##     D5                0.841    0.034   24.730    0.000
##     D6                0.449    0.034   13.166    0.000
##     D7               -0.875    0.040  -22.044    0.000
##     D8               -0.426    0.036  -11.774    0.000
##     D9               -0.712    0.039  -18.223    0.000
##     D10              -0.630    0.046  -13.799    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   warmth ~~                                           
##     dominance         0.156    0.023    6.920    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .A1                0.567    0.030   18.988    0.000
##    .A2                0.414    0.022   18.548    0.000
##    .A3                0.689    0.033   20.683    0.000
##    .A4                0.590    0.029   20.443    0.000
##    .A5                0.428    0.023   18.734    0.000
##    .A6                0.465    0.024   19.494    0.000
##    .A7                0.518    0.026   20.166    0.000
##    .A8                1.159    0.053   21.939    0.000
##    .A9                0.659    0.032   20.288    0.000
##    .A10               0.994    0.045   22.035    0.000
##    .D1                0.307    0.021   14.578    0.000
##    .D2                0.677    0.033   20.272    0.000
##    .D3                0.877    0.040   21.695    0.000
##    .D4                0.935    0.043   21.827    0.000
##    .D5                0.376    0.021   17.825    0.000
##    .D6                0.602    0.028   21.576    0.000
##    .D7                0.606    0.031   19.352    0.000
##    .D8                0.696    0.032   21.750    0.000
##    .D9                0.689    0.033   20.626    0.000
##    .D10               1.064    0.050   21.486    0.000
##     warmth            0.480    0.042   11.314    0.000
##     dominance         0.677    0.045   15.133    0.000

4.3.12

Report the following parameter estimates from the model you estimated in 4.3.11.

  1. The warmth \(\rightarrow\) A3 factor loading
  2. The dominance \(\rightarrow\) D2 factor loading
  3. The latent covariance between warmth and dominance
Click to show answer
  1. \(\lambda_{3,1} = 0.780\)
  2. \(\lambda_{12,2} = 0.774\)
  3. \(\psi_{2,1} = 0.156\)

4.3.13

Based on the CFA you estimated in 4.3.11, can we infer a significant linear association between Warmth and Dominance?

  • Use the statistics from your estimated model to justify your conclusion.
Click to show answer

Yes, warmth and dominance are significantly correlated (\(\psi_{2,1} = 0.156\), \(SE = 0.023\), \(z = 6.92\), \(p < 0.001\)).
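The z-value and p-value in the summary come from a Wald test: the estimate divided by its standard error, compared against a standard normal distribution. The sketch below recomputes them by hand from the printed values. Because the printed estimate and standard error are rounded to three decimals, the recomputed z will differ slightly from the 6.920 that lavaan reports.

```r
## Rounded values copied from the summary output in 4.3.11
est <- 0.156
se  <- 0.023

## Wald z-statistic
z <- est / se

## Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(z = z, p = p)
```

Either way, the p-value is far below 0.001, so the conclusion does not change.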


4.3.14

Use the semPaths() function to draw a path diagram of your estimated CFA.

  • Include the parameter estimates in the diagram.
Click to show code
semPaths(outH2.1, whatLabels = "estimates")


4.3.15

Estimate a modified version of the two-factor CFA from 4.3.11, and summarize the results.

  • Estimate the mean structure.
  • Identify the model with the fixed factor method (i.e., standardize the latent variables).
Click to show code
outH2.2 <- cfa(mod2, data = cattell, meanstructure = TRUE, std.lv = TRUE)
summary(outH2.2)
## lavaan 0.6-19 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        61
## 
##   Number of observations                          1000
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1158.738
##   Degrees of freedom                               169
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   warmth =~                                           
##     A1                0.692    0.031   22.629    0.000
##     A2                0.629    0.027   23.601    0.000
##     A3                0.540    0.031   17.309    0.000
##     A4                0.534    0.029   18.281    0.000
##     A5                0.624    0.027   23.203    0.000
##     A6                0.578    0.027   21.349    0.000
##     A7                0.535    0.028   19.290    0.000
##     A8               -0.352    0.038   -9.296    0.000
##     A9               -0.586    0.031  -18.859    0.000
##     A10              -0.286    0.035   -8.216    0.000
##   dominance =~                                        
##     D1                0.823    0.027   30.267    0.000
##     D2                0.636    0.032   20.101    0.000
##     D3                0.411    0.033   12.379    0.000
##     D4                0.380    0.034   11.178    0.000
##     D5                0.692    0.026   26.161    0.000
##     D6                0.370    0.028   13.335    0.000
##     D7               -0.720    0.031  -22.953    0.000
##     D8               -0.351    0.029  -11.894    0.000
##     D9               -0.586    0.031  -18.697    0.000
##     D10              -0.519    0.037  -13.994    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   warmth ~~                                           
##     dominance         0.273    0.035    7.906    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .A1                3.748    0.032  115.859    0.000
##    .A2                3.895    0.028  136.858    0.000
##    .A3                3.865    0.031  123.414    0.000
##    .A4                3.726    0.030  125.967    0.000
##    .A5                3.876    0.029  135.635    0.000
##    .A6                3.701    0.028  130.883    0.000
##    .A7                3.919    0.028  138.175    0.000
##    .A8                2.973    0.036   83.024    0.000
##    .A9                2.146    0.032   67.772    0.000
##    .A10               2.423    0.033   73.864    0.000
##    .D1                3.522    0.031  112.305    0.000
##    .D2                3.383    0.033  102.831    0.000
##    .D3                3.545    0.032  109.611    0.000
##    .D4                3.548    0.033  107.977    0.000
##    .D5                3.686    0.029  126.029    0.000
##    .D6                3.618    0.027  133.174    0.000
##    .D7                2.701    0.034   80.578    0.000
##    .D8                2.111    0.029   73.779    0.000
##    .D9                2.910    0.032   90.589    0.000
##    .D10               2.462    0.037   67.444    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .A1                0.567    0.030   18.988    0.000
##    .A2                0.414    0.022   18.548    0.000
##    .A3                0.689    0.033   20.683    0.000
##    .A4                0.590    0.029   20.443    0.000
##    .A5                0.428    0.023   18.734    0.000
##    .A6                0.465    0.024   19.494    0.000
##    .A7                0.518    0.026   20.166    0.000
##    .A8                1.159    0.053   21.939    0.000
##    .A9                0.659    0.032   20.288    0.000
##    .A10               0.994    0.045   22.035    0.000
##    .D1                0.307    0.021   14.578    0.000
##    .D2                0.677    0.033   20.272    0.000
##    .D3                0.877    0.040   21.695    0.000
##    .D4                0.935    0.043   21.827    0.000
##    .D5                0.376    0.021   17.825    0.000
##    .D6                0.602    0.028   21.576    0.000
##    .D7                0.606    0.031   19.352    0.000
##    .D8                0.696    0.032   21.750    0.000
##    .D9                0.689    0.033   20.626    0.000
##    .D10               1.064    0.050   21.486    0.000
##     warmth            1.000                           
##     dominance         1.000
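
Notice that this model has 61 parameters rather than 41, yet the test statistic and degrees of freedom match those from 4.3.11. A short tally, sketched below with illustrative object names, shows why: the mean structure adds 20 observed means to the data and 20 intercepts to the model, and swapping the two fixed loadings for two fixed latent variances leaves the parameter count unchanged.

```r
## Number of observed indicators in the two-factor model
p <- 20

## 4.3.11: covariance structure only, marker-variable identification
infoCov <- p * (p + 1) / 2              # unique variances and covariances
parCov  <- 18 + 20 + 2 + 1              # loadings, residual variances, latent variances, latent covariance

## 4.3.15: mean structure added, latent variances fixed to 1
infoMean <- infoCov + p                 # plus 20 observed means
parMean  <- 20 + 20 + 1 + 20            # loadings, residual variances, latent covariance, intercepts

c(dfCov = infoCov - parCov, dfMean = infoMean - parMean)
```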

4.3.16

Report the following parameter estimates from the model you estimated in 4.3.15.

  1. The item intercept for A1
  2. The item intercept for D5
  3. The mean of the Warmth factor
Click to show answer
  1. \(\tau_1 = 3.75\)
  2. \(\tau_{15} = 3.69\)
  3. This is a bit of a trick question. We fixed the latent means to zero for model identification, so we don’t have an estimated latent mean for the warmth factor.

4.3.17

Based on the CFA you estimated in 4.3.15, how strong is the association between Warmth and Dominance?

  • Use the statistics from your estimated model to justify your conclusion.
Click to show answer

We set the scale by standardizing the latent factors, so we can interpret the latent covariance as a correlation. Hence, the estimated latent covariance of \(\psi_{2,1} = 0.273\) suggests a weak linear association between Warmth and Dominance.
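The two identification methods describe the same model on different scales, so the estimates from 4.3.11 can be converted into those from 4.3.15 by hand. The sketch below uses the rounded values printed in the 4.3.11 summary, so the results agree with the 4.3.15 output only up to rounding error.

```r
## Rounded estimates copied from the marker-variable solution in 4.3.11
covWD <- 0.156  # latent covariance
varW  <- 0.480  # variance of warmth
varD  <- 0.677  # variance of dominance

## Dividing the covariance by the product of the latent SDs gives the correlation
r <- covWD / sqrt(varW * varD)

## Multiplying a loading by the latent SD gives the fixed-factor loading
lambdaA2 <- 0.908 * sqrt(varW)
lambdaD1 <- 1.000 * sqrt(varD)

round(c(r = r, A2 = lambdaA2, D1 = lambdaD1), 3)
```

Compare these values to the latent covariance and the A2 and D1 loadings reported in 4.3.15.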


4.3.18

Use the semPaths() function to draw a path diagram of the CFA you estimated in 4.3.15.

  • Include the parameter estimates in the diagram.
Click to show code
semPaths(outH2.2, whatLabels = "estimates")


End of At-Home Exercises