confound_set("dvars") selects both dvars and std_dvars, yielding collinear nuisance regressors #67

@bbuchsbaum

Summary

confound_set("dvars") appears to select both dvars and std_dvars from fMRIPrep confounds. In at least one fMRIPrep dataset these columns are perfectly correlated (r = 1) in every run, which creates rank-deficient nuisance designs downstream.

This surfaced in a first-level fMRI GLM using bidser::read_confounds() plus fmridesign::baseline_model() / fmrireg::fmri_lm(). The fit failed inside chol2inv() because the design matrix was singular.

Reprex / diagnostic

library(bidser)

proj <- bids_project(
  "/project/rrg-brad/dsets/sdam",
  derivatives = "auto",
  strict_participants = FALSE
)

cvars <- c(
  confound_set("cosine"),
  confound_set("dvars"),
  confound_set("acompcor", n = 3),
  confound_set("tcompcor", n = 3)
)

cf <- read_confounds(
  proj,
  subid = "2001",
  task = "autobio",
  session = "2",
  cvars = cvars,
  nest = TRUE
)

names(cf$data[[1]])

The selected columns included both:

dvars
std_dvars

A correlation check showed the two columns are perfectly correlated in every run checked:

sub-2001 run-01: dvars vs std_dvars, r = 1
sub-2001 run-02: dvars vs std_dvars, r = 1
sub-2002 run-01: dvars vs std_dvars, r = 1
sub-2002 run-02: dvars vs std_dvars, r = 1
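
A check along these lines can be run per run in base R. This is a minimal sketch, not the exact code used above: dvars_cor() is a hypothetical helper, and the synthetic runs stand in for cf$data, assuming std_dvars is a rescaled copy of dvars (which is consistent with r = 1).

```r
# Hypothetical helper: correlation between the two DVARS columns in one run.
dvars_cor <- function(run_df) {
  # The first DVARS row is typically missing, so use complete rows only.
  cor(run_df$dvars, run_df$std_dvars, use = "complete.obs")
}

# Synthetic stand-in for cf$data (one data frame per run); not real data.
set.seed(1)
make_run <- function(n = 50) {
  raw <- c(NA, abs(rnorm(n - 1, mean = 30, sd = 5)))
  # Assumption: std_dvars is dvars divided by a per-run constant.
  data.frame(dvars = raw, std_dvars = raw / 25)
}
runs <- list(make_run(), make_run())

# One correlation per run.
r_by_run <- vapply(runs, dvars_cor, numeric(1))
print(r_by_run)
```

With the nested result from the reprex, the same helper could be applied as vapply(cf$data, dvars_cor, numeric(1)).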

The selected nuisance columns looked like:

cosine00
cosine01
cosine02
cosine03
cosine04
cosine05
dvars
std_dvars
a_comp_cor_00
a_comp_cor_01
a_comp_cor_02
t_comp_cor_00
t_comp_cor_01
t_comp_cor_02

Current behavior

confound_set("dvars") expands to a selector broad enough to return both dvars and std_dvars.

This is risky: users reasonably expect a named confound-family selector either to return a single variant or to document that it returns several DVARS variants.

Expected behavior

Possible fixes:

  1. Make confound_set("dvars") select only one DVARS variant, preferably std_dvars because it is the standardized fMRIPrep column often used as a nuisance/censoring metric.

  2. Add explicit selectors for each variant, e.g.:

confound_set("dvars")       # maybe std_dvars by default
confound_set("raw_dvars")   # dvars
confound_set("std_dvars")   # std_dvars

  3. If keeping current behavior, document clearly that confound_set("dvars") returns multiple DVARS columns and may introduce collinearity.

  4. Optionally provide a helper or option in read_confounds() to drop zero-variance or rank-dependent columns by run.
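
As a rough illustration of the last option, a per-run cleaner could rely on the column pivoting in base R's qr(). This is only a sketch: drop_dependent_columns() is a hypothetical name, not an existing bidser function, and it assumes the run's confound matrix has no missing values (DVARS columns usually have a missing first row, which would need to be dropped or imputed first).

```r
# Hypothetical helper: drop columns that are constant or linearly dependent
# on earlier columns (given an intercept) within a single run.
drop_dependent_columns <- function(X, tol = 1e-7) {
  X <- as.matrix(X)
  # Prepend an intercept so zero-variance columns count as dependent.
  Xi <- cbind(1, X)
  decomp <- qr(Xi, tol = tol)
  # qr() moves dependent columns to the end of the pivot order;
  # the first `rank` pivots index the independent columns.
  keep <- decomp$pivot[seq_len(decomp$rank)]
  # Remove the intercept and shift back to indices into X.
  keep <- sort(keep[keep != 1L]) - 1L
  X[, keep, drop = FALSE]
}

# Synthetic single-run example; not real fMRIPrep output.
set.seed(42)
n <- 40
dv <- abs(rnorm(n, mean = 30, sd = 5))
X <- cbind(
  cosine00 = cos(pi * (seq_len(n) - 0.5) / n),
  dvars = dv,
  std_dvars = dv / 25,       # rescaled copy of dvars
  flat = 0,                  # zero-variance column
  a_comp_cor_00 = rnorm(n)
)
cleaned <- drop_dependent_columns(X)
print(colnames(cleaned))
```

Because the earlier column of a dependent pair is retained, column order decides which DVARS variant survives; placing std_dvars before dvars would keep the standardized one.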

Why this matters

Collinear nuisance regressors are common with fMRIPrep outputs and can produce opaque downstream failures in GLM tools. In this case, fmrireg::fmri_lm() failed with:

Error in chol2inv(qr.lm(lmfit)$qr) :
  element (31, 31) is zero, so the inverse cannot be computed

The root cause was nuisance matrix rank deficiency. dvars/std_dvars duplication was one clear contributor, and some selected columns were also zero-variance in individual runs.

Workaround

For now, downstream code can avoid confound_set("dvars") and select one DVARS column explicitly, e.g. std_dvars, then apply per-run rank/zero-variance cleaning before modeling.
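
If the confounds have already been read, the same effect can be approximated after the fact by dropping the unwanted variant from each run. A minimal sketch, assuming each run is a data frame with fMRIPrep column names; keep_one_dvars() is a hypothetical helper, not part of bidser:

```r
# Hypothetical helper: keep only one DVARS variant in a run's confound table.
keep_one_dvars <- function(run_df, keep = "std_dvars") {
  variants <- c("dvars", "std_dvars")
  stopifnot(keep %in% variants)
  # Drop whichever variant was not requested; other columns are untouched.
  unwanted <- setdiff(variants, keep)
  run_df[, setdiff(names(run_df), unwanted), drop = FALSE]
}

# Tiny synthetic run; values are made up for illustration.
run_df <- data.frame(
  cosine00 = c(0.1, 0.2, 0.3),
  dvars = c(NA, 30, 31),
  std_dvars = c(NA, 1.2, 1.24)
)
cleaned <- keep_one_dvars(run_df)
print(names(cleaned))
```

For the nested result from the reprex this could be applied as cf$data <- lapply(cf$data, keep_one_dvars), followed by the per-run rank/zero-variance cleaning.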
