Power analysis based on user specified eQTL effect sizes
Chris Dong
Department of Statistics and Data Science, University of California, Los Angelescycd@g.ucla.edu
Yihui Cen
Department of Computational Medicine, University of California, Los Angelesyihuicen@g.ucla.edu
13 February 2026
Source:vignettes/scDesignPop-power-analysis-ES-specification.Rmd
scDesignPop-power-analysis-ES-specification.RmdIntroduction
scDesignPop provides power analysis tools at cell-type-specific level. Users can specify cell-type-specific eQTL effect sizes to check hypothetical scenarios and how power curve will change based on eQTL effect size specifications.
Library and data preparation
For data preparation, a list of data is required as input. This is
done using the constructDataPop function. A
SingleCellExperiment object and an eqtlgeno
dataframe are the two main inputs needed. The eqtlgeno
dataframe consists of eQTL annotations (it must have cell state, gene,
SNP, chromosome, and position columns at a minimum), and genotypes
across individuals (columns) for every SNP (rows). The structure of an
example eqtlgeno dataframe is given below.
library(scDesignPop)
library(SingleCellExperiment)
data("example_sce")
data("example_eqtlgeno")
example_sce_sel <- example_sce[c("ENSG00000163221","ENSG00000135218"),]
example_eqtlgeno_sel <- example_eqtlgeno[
which(example_eqtlgeno$gene_id%in%c("ENSG00000163221","ENSG00000135218")),]
data_list_sel <- constructDataPop(
sce = example_sce_sel,
eqtlgeno_df = example_eqtlgeno_sel,
new_covariate = as.data.frame(colData(example_sce_sel)),
overlap_features = NULL,
sampid_vec = NULL,
copula_variable = "cell_type",
slot_name = "counts",
snp_mode = "single",
celltype_colname = "cell_type",
feature_colname = "gene_id",
snp_colname = "snp_id",
loc_colname = "POS",
chrom_colname = "CHR",
indiv_colname = "indiv",
prune_thres = 0.9
)Fitting the marginal model
Next, a marginal model is specified to fit each gene using the
fitMarginalPop function.
Here we use a Negative Binominal as the parametric model using
"nb".
marginal_list_sel <- fitMarginalPop(
data_list = data_list_sel,
mean_formula = "(1|indiv) + cell_type",
model_family = "nb",
interact_colnames = "cell_type",
parallelization = "pbmcapply",
n_threads = 1L,
loc_colname = "POS",
snp_colname = "snp_id",
celltype_colname = "cell_type",
indiv_colname = "indiv",
filter_snps = TRUE,
snpvar_thres = 0,
force_formula = FALSE,
data_maxsize = 1
)Performing power analysis with user specified eQTL effect size at cell-type level
Given fitted marginal model, scDesignPop can perform simulation-based
power analysis for a specific gene-SNP pair across selected cell types
using the runPowerAnalysis function. Based on the previous
naming of covariates, we specify the fitted snpid as
"1:153337943", the name of the column for fixed cell state
effect and random individual effect as "cell_type" and
"indiv" in the input parameters. To check these namings, we
can call the covariate data frame using
marginal_list_sel[["ENSG00000163221"]]$fit$frame. The
selected cell types for testing are specified in cellstate_vector and
have to be consistent with the covariate data frame.
Particarly, parameters snp_number and
gene_number are used to account for multiple testing
correction with Bonferroni correction. Parameter methods is
used to specify the marginal eQTL model from
c("nb", "poisson", "gaussian", "pseudoBulkLinear").
Parameter nindivs and ncells are used to
specify the number of individuals and number of cells per individual,
from which we can analyze the performance of power analysis and find the
optimal setting. Here, we set power_nsim = 1000 to increase
the number of simulations so we can calculate power with a higher
resolution. Using power_nsim = 100 in default or smaller
values can reduce the computation time cost.
Specifically, users can specify different eQTL effect sizes in each
cell type using the input parameter
celltype_specific_ES_list as follows, where each element of
the list represents one scenario allowing eQTL effect sizes in multiple
cell types can be specified at the same time. (bmem cell type is also
included but without user specification)
set.seed(123)
power_data <- runPowerAnalysis(marginal_list = marginal_list_sel,
marginal_model = "nb",
geneid = "ENSG00000163221",
snpid = "1:153337943",
celltype_colname = "cell_type",
celltype_vector = c("monoc"),
celltype_specific_ES_list = list(
c("monoc" = 0.05),
c("monoc" = 0.1),
c("monoc" = 0.2)),
indiv_colname = "indiv",
methods = c("poisson","pseudoBulkLinear"),
nindivs = c(50, 200),
ncells = c(10, 50),
alpha = 0.05,
power_nsim = 1000,
snp_number = 10,
gene_number = 800,
CI_nsim = 1000,
CI_conf = 0.05,
ncores = 25)
#> [1] 1.803924
#> [1] 0.05
#> [1] 1.803924
#> [1] 0.1
#> [1] 1.803924
#> [1] 0.2
#> [1] 1.803924
#> [1] 0.05
#> [1] 1.803924
#> [1] 0.1
#> [1] 1.803924
#> [1] 0.2Visualization of power results
The power analysis results can be visualized using the
visualizePowerCurve function. The cell type names in the
cellstate_vector must be included in the above power
analysis.
visualizePowerCurve(power_result = power_data,
celltype_vector = c("monoc"),
x_axis = "specifiedES",
x_facet="nindiv",
y_facet = "ncell",
col_group = "method",
geneid = "ENSG00000163221",
snpid = "1:153337943")
By swaping the x and y axis, we can show the result in a different way.
visualizePowerCurve(power_result = power_data,
celltype_vector = c("monoc"),
x_axis = "specifiedES",
x_facet="ncell",
y_facet = "nindiv",
col_group = "method",
geneid = "ENSG00000163221",
snpid = "1:153337943")