Skip to contents

Introduction

scDesignPop enables user-specified modification of eQTL effect sizes, allowing systematic benchmarking of various eQTL mapping methods with different ground truth.

Library and data preparation

Here, we use an example SingleCellExperiment object example_sce with 1000 genes and 7811 cells and an example eQTL genotype dataframe example_eqtlgeno to demonstrate the tutorial. These two objects contains the gene expression and SNP genotypes of 40 anonymized individuals while the eQTL genotype dataframe provides 2826 putative cell-type-specific eQTLs.

library(scDesignPop)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(SummarizedExperiment)
library(ggplot2)

data("example_sce")
data("example_eqtlgeno")

Model fitting

Step 1: construct a data list

To run scDesignPop, a list of data is required as input. This is done using the constructDataPop function. A SingleCellExperiment object and an eqtlgeno dataframe are the two main inputs needed. The eqtlgeno dataframe consists of eQTL annotations (it must have cell state, gene, SNP, chromosome, and position columns at a minimum), and genotypes across individuals (columns) for every SNP (rows). The structure of an example eqtlgeno dataframe is given below.

data_list <- constructDataPop(
    sce = example_sce,
    eqtlgeno_df = example_eqtlgeno,
    new_covariate = as.data.frame(colData(example_sce)),
    overlap_features = NULL,
    sampid_vec = NULL,
    copula_variable = "cell_type",
    slot_name = "counts",
    snp_mode = "single",
    celltype_colname = "cell_type",
    feature_colname = "gene_id",
    snp_colname = "snp_id",
    loc_colname = "POS",
    chrom_colname = "CHR",
    indiv_colname = "indiv",
    prune_thres = 0.9
    )
#> Constructing eqtlgeno list...

Step 2: fit marginal model

Next, a marginal model is specified to fit each gene using the fitMarginalPop function.
Here we use a Negative Binominal as the parametric model using "nb", and we specify mean_formula = "(1|indiv) + cell_type.

marginal_list_sel <- fitMarginalPop(
    data_list = data_list,
    mean_formula = "(1|indiv) + cell_type",
    model_family = "nb",
    interact_colnames = "cell_type",
    parallelization = "pbmcapply",
    n_threads = 20L,
    loc_colname = "POS",
    snp_colname = "snp_id",
    celltype_colname = "cell_type",
    indiv_colname = "indiv",
    filter_snps = TRUE,
    snpvar_thres = 0,
    force_formula = FALSE,
    data_maxsize = 1
    )

Modifying eQTL effects

To modify the eQTL effects fitted, we use the modifyMarginalModels function.

marginal_mod <- scDesignPop::modifyMarginalModels(
  marginal_list = marginal_list_sel["ENSG00000163221"],
  eqtlgeno_list = data_list[["eqtl_geno_list"]],
  features = "ENSG00000163221",
  celltype = "cd4nc",
  mean_log2fc = 0,
  eqtl_log2fc = 2,
  neg_ctrl = TRUE,
  mean_baseline = NULL,
  eqtl_baseline = NULL,
  mean_baseline_only = FALSE,
  eqtl_baseline_only = FALSE,
  disp_scaling = "linear",
  celltype_colname = "cell_type",
  snp_colname = "snp_id",
  verbose = TRUE,
  debug = TRUE,
  mod_scale = "link")
#> Modifying parameters for ENSG00000163221 in cd4nc celltype...
#> Per-allele link-scale eQTL effect: -0.19084 ===> 0.00000
#> celltype effect: -6.44683 ===> new value: -6.63767
#> interaction effect: -0.16812 ===> new value: 0.02273
#> << Setting conditional means to be negative controls (no eQTL effect) >>
#> conditional means at geno 0,1,2: 0.00963, 0.00796, 0.00657 ===> 0.00796, 0.00796, 0.00796
#> eQTL slope between geno 1 and 0: -0.00167 ===> new value: 0.00000
#> Phi parameter: 1.14919 ===> new value: 1.14919