Modify eQTL effect for eGenes / non-eGenes
Chris Dong
Department of Statistics and Data Science, University of California, Los Angelescycd@g.ucla.edu
Yihui Cen
Department of Computational Medicine, University of California, Los Angelesyihuicen@g.ucla.edu
13 February 2026
Source:vignettes/scDesignPop-modify-eQTL-effects.Rmd
scDesignPop-modify-eQTL-effects.RmdIntroduction
scDesignPop enables user-specified modification of eQTL effect sizes, allowing systematic benchmarking of various eQTL mapping methods with different ground truth.
Library and data preparation
Here, we use an example SingleCellExperiment object
example_sce with 1000 genes and 7811 cells and an example
eQTL genotype dataframe example_eqtlgeno to demonstrate the
tutorial. These two objects contains the gene expression and SNP
genotypes of 40 anonymized individuals while the eQTL genotype dataframe
provides 2826 putative cell-type-specific eQTLs.
library(scDesignPop)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
library(SummarizedExperiment)
library(ggplot2)
data("example_sce")
data("example_eqtlgeno")Model fitting
Step 1: construct a data list
To run scDesignPop, a list of data is required as input. This is done
using the constructDataPop function. A
SingleCellExperiment object and an eqtlgeno
dataframe are the two main inputs needed. The eqtlgeno
dataframe consists of eQTL annotations (it must have cell state, gene,
SNP, chromosome, and position columns at a minimum), and genotypes
across individuals (columns) for every SNP (rows). The structure of an
example eqtlgeno dataframe is given below.
data_list <- constructDataPop(
sce = example_sce,
eqtlgeno_df = example_eqtlgeno,
new_covariate = as.data.frame(colData(example_sce)),
overlap_features = NULL,
sampid_vec = NULL,
copula_variable = "cell_type",
slot_name = "counts",
snp_mode = "single",
celltype_colname = "cell_type",
feature_colname = "gene_id",
snp_colname = "snp_id",
loc_colname = "POS",
chrom_colname = "CHR",
indiv_colname = "indiv",
prune_thres = 0.9
)
#> Constructing eqtlgeno list...Step 2: fit marginal model
Next, a marginal model is specified to fit each gene using the
fitMarginalPop function.
Here we use a Negative Binominal as the parametric model using
"nb", and we specify
mean_formula = "(1|indiv) + cell_type.
marginal_list_sel <- fitMarginalPop(
data_list = data_list,
mean_formula = "(1|indiv) + cell_type",
model_family = "nb",
interact_colnames = "cell_type",
parallelization = "pbmcapply",
n_threads = 20L,
loc_colname = "POS",
snp_colname = "snp_id",
celltype_colname = "cell_type",
indiv_colname = "indiv",
filter_snps = TRUE,
snpvar_thres = 0,
force_formula = FALSE,
data_maxsize = 1
)Modifying eQTL effects
To modify the eQTL effects fitted, we use the
modifyMarginalModels function.
marginal_mod <- scDesignPop::modifyMarginalModels(
marginal_list = marginal_list_sel["ENSG00000163221"],
eqtlgeno_list = data_list[["eqtl_geno_list"]],
features = "ENSG00000163221",
celltype = "cd4nc",
mean_log2fc = 0,
eqtl_log2fc = 2,
neg_ctrl = TRUE,
mean_baseline = NULL,
eqtl_baseline = NULL,
mean_baseline_only = FALSE,
eqtl_baseline_only = FALSE,
disp_scaling = "linear",
celltype_colname = "cell_type",
snp_colname = "snp_id",
verbose = TRUE,
debug = TRUE,
mod_scale = "link")
#> Modifying parameters for ENSG00000163221 in cd4nc celltype...
#> Per-allele link-scale eQTL effect: -0.19084 ===> 0.00000
#> celltype effect: -6.44683 ===> new value: -6.63767
#> interaction effect: -0.16812 ===> new value: 0.02273
#> << Setting conditional means to be negative controls (no eQTL effect) >>
#> conditional means at geno 0,1,2: 0.00963, 0.00796, 0.00657 ===> 0.00796, 0.00796, 0.00796
#> eQTL slope between geno 1 and 0: -0.00167 ===> new value: 0.00000
#> Phi parameter: 1.14919 ===> new value: 1.14919