4.2 Reading
When doing CFA, we’re primarily interested in estimating the measurement model through which a set of unobserved latent factors gives rise to our observed data. In other words, we estimate a statistical model to operationalize hypothetical constructs as estimated latent factors. The reading for this week doesn’t focus directly on CFA, but it covers ideas that are vitally important for CFA modeling. This paper discusses common mistakes researchers make when operationalizing the constructs that go into their models.
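For reference, one common way to write this measurement model is

$$
\mathbf{x} = \boldsymbol{\Lambda}\boldsymbol{\xi} + \boldsymbol{\delta},
$$

where $\mathbf{x}$ holds the observed indicators, $\boldsymbol{\Lambda}$ holds the factor loadings, $\boldsymbol{\xi}$ holds the latent factors, and $\boldsymbol{\delta}$ holds the unique (error) terms.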
Reference
Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Questions
- What are researcher degrees of freedom?
- What are questionable measurement practices, as described by Flake and Fried?
- What are the six questions Flake & Fried recommend we ask ourselves if we want to promote good measurement practices?
- For each question you listed above, briefly explain how the question is meant to improve measurement practices.
Keep an eye out for questionable measurement practices when reading other papers. They’ll pop up more frequently than you might expect.
Answers
Q1:
The term “researcher degrees of freedom” describes the impactful decisions researchers must make in a statistical analysis. All statistical analyses require myriad design decisions, and each of these decisions can impact the outcome of the analysis. Hence, we can view these branching paths of analytic decisions as a type of flexibility in the analysis.
Q2:
“[The authors] define questionable measurement practices as decisions researchers make that raise doubts about the validity of the measures used in a study, and ultimately the validity of the final conclusion.” (Flake & Fried, 2020, p. 458)
Q3:
- What is your construct?
- Why and how did you select your measure?
- What measure did you use to operationalize the construct?
- How did you quantify your measure?
- Did you modify the scale? And if so, how and why?
- Did you create a measure on the fly?
Q4:
We should be certain of what we’re trying to measure. What hypothetical construct are we trying to model?
We should choose our scales carefully. Is this really the most suitable scale for my study? Is this scale sufficiently validated?
We need to fully identify and adequately cite the scales we use: provide correct citations and, for scales with multiple versions, state which version we used.
We have many options when it comes to transforming observed scale responses into the scores/variables that we analyze in our model. Creating a naive sum score implies a very different operationalization of the hypothetical construct than using CFA to estimate a latent factor (see the sketch below). So, we need to be transparent about all data processing, transformation, and scoring procedures we apply.
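To make that contrast concrete, here is a minimal sketch in Python. The item names, responses, and loadings are hypothetical, and the loading-weighted composite is only a stand-in for genuine model-based factor scores:

```python
import pandas as pd

# Hypothetical responses to a 4-item scale (names and values are illustrative only).
items = ["item1", "item2", "item3", "item4"]
df = pd.DataFrame({
    "item1": [3, 4, 2, 5],
    "item2": [2, 4, 3, 5],
    "item3": [3, 5, 2, 4],
    "item4": [4, 4, 3, 5],
})

# Operationalization 1: naive sum score -- every item counts equally.
df["sum_score"] = df[items].sum(axis=1)

# Operationalization 2: a composite weighted by (hypothetical) standardized
# loadings, standing in for the model-based scoring a fitted CFA would imply --
# items contribute in proportion to how strongly they reflect the latent factor.
loadings = pd.Series({"item1": 0.80, "item2": 0.65, "item3": 0.55, "item4": 0.70})
df["weighted_score"] = (df[items] * loadings).sum(axis=1)

print(df[["sum_score", "weighted_score"]])
```

The point is not that one score is better, but that the two columns operationalize the construct differently, which is exactly why the scoring procedure must be reported and justified.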
Validated scales are only valid in their original form. If we modify a validated scale, we can’t really claim to be measuring the construct the original scale assessed. So, we should only modify scales when we have very strong reasons to do so, and we need to explicitly state and justify those reasons.
Any scales we create for one-off use are almost certainly bad. Without rigorous statistical validation, a measurement instrument can only be face valid (and, often, we don’t even clear that low bar). Deep down, we all tend to think that we won’t make the same mistakes as others. So, it’s common to overestimate our scale-development skills and tell ourselves that the face validity of our scale will surely generalize to other forms of validity, but we’re probably wrong. Scale development is very hard. If we could all whip together a valid scale, there would be no need for the fields of psychometrics or educational testing.