Power analysis based on user specified eQTL effect sizes

Chris Dong

Department of Statistics and Data Science, University of California, Los Angeles
cycd@g.ucla.edu

Yihui Cen

Department of Computational Medicine, University of California, Los Angeles
yihuicen@g.ucla.edu

13 February 2026

Source: vignettes/scDesignPop-power-analysis-ES-specification.Rmd

scDesignPop-power-analysis-ES-specification.Rmd

Introduction

scDesignPop provides power analysis tools at cell-type-specific level. Users can specify cell-type-specific eQTL effect sizes to check hypothetical scenarios and how power curve will change based on eQTL effect size specifications.

Library and data preparation

For data preparation, a list of data is required as input. This is done using the constructDataPop function. A SingleCellExperiment object and an eqtlgeno dataframe are the two main inputs needed. The eqtlgeno dataframe consists of eQTL annotations (it must have cell state, gene, SNP, chromosome, and position columns at a minimum), and genotypes across individuals (columns) for every SNP (rows). The structure of an example eqtlgeno dataframe is given below.

library(scDesignPop)
library(SingleCellExperiment)

data("example_sce")
data("example_eqtlgeno")
example_sce_sel <- example_sce[c("ENSG00000163221","ENSG00000135218"),]
example_eqtlgeno_sel <- example_eqtlgeno[
    which(example_eqtlgeno$gene_id%in%c("ENSG00000163221","ENSG00000135218")),]

data_list_sel <- constructDataPop(
    sce = example_sce_sel,
    eqtlgeno_df = example_eqtlgeno_sel,
    new_covariate = as.data.frame(colData(example_sce_sel)),
    overlap_features = NULL,
    sampid_vec = NULL,
    copula_variable = "cell_type",
    slot_name = "counts",
    snp_mode = "single",
    celltype_colname = "cell_type",
    feature_colname = "gene_id",
    snp_colname = "snp_id",
    loc_colname = "POS",
    chrom_colname = "CHR",
    indiv_colname = "indiv",
    prune_thres = 0.9
    )

Fitting the marginal model

Next, a marginal model is specified to fit each gene using the fitMarginalPop function.
Here we use a Negative Binominal as the parametric model using "nb".

marginal_list_sel <- fitMarginalPop(
    data_list = data_list_sel,
    mean_formula = "(1|indiv) + cell_type",
    model_family = "nb",
    interact_colnames = "cell_type",
    parallelization = "pbmcapply",
    n_threads = 1L,
    loc_colname = "POS",
    snp_colname = "snp_id",
    celltype_colname = "cell_type",
    indiv_colname = "indiv",
    filter_snps = TRUE,
    snpvar_thres = 0,
    force_formula = FALSE,
    data_maxsize = 1
    )

Performing power analysis with user specified eQTL effect size at cell-type level

Given fitted marginal model, scDesignPop can perform simulation-based power analysis for a specific gene-SNP pair across selected cell types using the runPowerAnalysis function. Based on the previous naming of covariates, we specify the fitted snpid as "1:153337943", the name of the column for fixed cell state effect and random individual effect as "cell_type" and "indiv" in the input parameters. To check these namings, we can call the covariate data frame using marginal_list_sel[["ENSG00000163221"]]$fit$frame. The selected cell types for testing are specified in cellstate_vector and have to be consistent with the covariate data frame.

Particarly, parameters snp_number and gene_number are used to account for multiple testing correction with Bonferroni correction. Parameter methods is used to specify the marginal eQTL model from c("nb", "poisson", "gaussian", "pseudoBulkLinear"). Parameter nindivs and ncells are used to specify the number of individuals and number of cells per individual, from which we can analyze the performance of power analysis and find the optimal setting. Here, we set power_nsim = 1000 to increase the number of simulations so we can calculate power with a higher resolution. Using power_nsim = 100 in default or smaller values can reduce the computation time cost.

Specifically, users can specify different eQTL effect sizes in each cell type using the input parameter celltype_specific_ES_list as follows, where each element of the list represents one scenario allowing eQTL effect sizes in multiple cell types can be specified at the same time. (bmem cell type is also included but without user specification)

set.seed(123)
power_data <- runPowerAnalysis(marginal_list = marginal_list_sel,
                               marginal_model = "nb",
                               geneid = "ENSG00000163221",
                               snpid = "1:153337943",
                               celltype_colname = "cell_type",
                               celltype_vector = c("monoc"),
                               celltype_specific_ES_list = list(
                                   c("monoc" = 0.05),
                                   c("monoc" = 0.1),
                                   c("monoc" = 0.2)),
                               indiv_colname = "indiv",
                               methods = c("poisson","pseudoBulkLinear"),
                               nindivs = c(50, 200),
                               ncells = c(10, 50),
                               alpha = 0.05,
                               power_nsim = 1000,
                               snp_number = 10,
                               gene_number = 800,
                               CI_nsim = 1000,
                               CI_conf = 0.05,
                               ncores = 25)
#> [1] 1.803924
#> [1] 0.05
#> [1] 1.803924
#> [1] 0.1
#> [1] 1.803924
#> [1] 0.2
#> [1] 1.803924
#> [1] 0.05
#> [1] 1.803924
#> [1] 0.1
#> [1] 1.803924
#> [1] 0.2

Visualization of power results

The power analysis results can be visualized using the visualizePowerCurve function. The cell type names in the cellstate_vector must be included in the above power analysis.

visualizePowerCurve(power_result = power_data,
                    celltype_vector = c("monoc"),
                    x_axis = "specifiedES",
                    x_facet="nindiv",
                    y_facet = "ncell",
                    col_group = "method",
                    geneid = "ENSG00000163221",
                    snpid = "1:153337943")

By swaping the x and y axis, we can show the result in a different way.

visualizePowerCurve(power_result = power_data,
                    celltype_vector = c("monoc"),
                    x_axis = "specifiedES",
                    x_facet="ncell",
                    y_facet = "nindiv",
                    col_group = "method",
                    geneid = "ENSG00000163221",
                    snpid = "1:153337943")