Title: | Multiple Hypothesis Testing on an Aggregation Tree Method |
---|---|
Description: | An implementation of the TEAM algorithm to identify local differences between two (e.g. case and control) independent, univariate distributions, as described in J Pura, C Chan, and J Xie (2019) <arXiv:1906.07757>. The algorithm embeds a multiple-testing procedure in a hierarchical structure to identify high-resolution differences between two distributions. The hierarchical structure is designed to identify strong, short-range differences at lower layers and weaker but longer-range differences at higher layers. TEAM yields consistent layer-specific and overall false discovery rate control. |
Authors: | John Pura, Cliburn Chan, and Jichun Xie |
Maintainer: | John Pura <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-10-29 03:30:26 UTC |
Source: | https://github.com/cran/TEAM |
Rolling Sum over distinct chunks
chunk.sum(v, n, na.rm = TRUE)
v |
Numeric Vector |
n |
Size of chunk |
na.rm |
Remove NAs (default=TRUE) |
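The idea of summing over distinct (non-overlapping) chunks can be illustrated with a minimal base R sketch. The helper name chunk_sum_sketch is hypothetical and is not the package's chunk.sum; in particular, how the package treats a final chunk shorter than n is an assumption here (the sketch simply sums whatever is left).

```r
# Hypothetical helper (not the package's chunk.sum): sum consecutive,
# non-overlapping chunks of length n.
chunk_sum_sketch <- function(v, n, na.rm = TRUE) {
  # Label each element with its chunk: 1,1,...,2,2,... (n labels per chunk)
  chunk_id <- ceiling(seq_along(v) / n)
  # Sum within each chunk; as.vector() drops the names and dims from tapply()
  as.vector(tapply(v, chunk_id, sum, na.rm = na.rm))
}

v <- c(1, 2, 3, 4, NA, 6, 7)
sums <- chunk_sum_sketch(v, n = 2)
print(sums)
```

With n = 2 the vector is treated as the chunks (1, 2), (3, 4), (NA, 6) and (7), and the NA is dropped before summing because na.rm = TRUE.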
Estimate threshold to control FDR in multiple testing procedure
est.c.hat(l, n, theta0, x.l, c.hats, alpha, m.l)
l |
Layer |
n |
Number of pooled case and control observations in each layer 1 bin |
theta0 |
Nominal boundary level for binomial parameter at layer 1 |
x.l |
Vector of case counts in each bin |
c.hats |
Previous c.hats calculated from layers 1 to l-1 |
alpha |
Nominal FDR level |
m.l |
Number of leaf hypotheses at layer l |
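To give a feel for what estimating an FDR-controlling threshold involves, here is a simplified sketch for a single layer. It is not the package's est.c.hat: it ignores the thresholds from earlier layers (c.hats), assumes every bin holds exactly n pooled observations, and uses a generic plug-in FDR estimate (expected null rejections divided by observed rejections). The function name fdr_threshold_sketch and the simulated data are hypothetical.

```r
# Hypothetical single-layer sketch (not the package's est.c.hat).
# Under the null, each bin's case count is assumed Binomial(n, theta0).
fdr_threshold_sketch <- function(x, n, theta0, alpha = 0.05) {
  m <- length(x)
  # Candidate thresholds: only counts at or above the null mean are considered
  candidates <- seq(ceiling(n * theta0), n)
  # Expected number of rejections if all m bins were null
  expected_false <- m * pbinom(candidates, size = n, prob = theta0,
                               lower.tail = FALSE)
  # Observed number of bins whose case count exceeds each candidate
  n_reject <- vapply(candidates, function(thr) sum(x > thr), integer(1))
  # Plug-in FDR estimate; pmax() avoids division by zero
  fdr_hat <- expected_false / pmax(n_reject, 1)
  # Smallest candidate meeting the target; the candidate n always qualifies
  # because P(X > n) = 0
  candidates[which(fdr_hat <= alpha)[1]]
}

set.seed(1)
n <- 100
theta0 <- 0.5
# 950 null bins and 50 bins enriched for cases (simulated, illustrative only)
x <- c(rbinom(950, n, theta0), rbinom(50, n, 0.8))
c_hat <- fdr_threshold_sketch(x, n, theta0, alpha = 0.05)
discoveries <- which(x > c_hat)
```

Bins whose case count exceeds c_hat are flagged as discoveries. The package additionally carries information forward across layers, which this sketch does not attempt.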
Step-down multiple-testing procedure
est.FDR.hat.l(min.x, max.x, c.prev, n.l, x.l, theta0, l)
min.x |
Lower limit of the search range |
max.x |
Upper limit of the search range |
c.prev |
Previous c.hat from layer l-1 |
n.l |
Vector of number of pooled observations in layer l bins |
x.l |
Vector of case counts in each bin at layer l |
theta0 |
Nominal boundary level for binomial parameter at layer 1 |
l |
Layer |
Enumerate possible counts for calculating binomial probability
expand.mat(mat, vec)
mat |
Matrix |
vec |
Numeric Vector |
Split a vector into distinct chunks of specified size
splitNoOverlap(vec, seg.length)
vec |
Numeric Vector |
seg.length |
Length of each chunk; vec is split into consecutive, non-overlapping chunks of this size |
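A minimal base R sketch of splitting a vector into non-overlapping chunks follows. The helper split_no_overlap_sketch is hypothetical and only illustrates the idea; it assumes seg.length is the size of each chunk and that a shorter final chunk is kept as-is, which may differ from the package's behaviour.

```r
# Hypothetical helper (not the package's splitNoOverlap): split vec into
# consecutive, non-overlapping chunks of seg.length elements.
split_no_overlap_sketch <- function(vec, seg.length) {
  # Chunk label for every position: 1,1,1,2,2,2,...
  chunk_id <- ceiling(seq_along(vec) / seg.length)
  # split() returns one list element per chunk; unname() drops the labels
  unname(split(vec, chunk_id))
}

chunks <- split_no_overlap_sketch(1:7, seg.length = 3)
str(chunks)
```

Here the seven values are split into the chunks 1:3, 4:6 and a final chunk holding the single leftover value 7.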
This function performs multiple testing embedded in a hierarchical structure in order to identify local differences between two independent distributions (e.g. case and control).
TEAM(x1, x2, theta0 = length(x2)/length(c(x1, x2)), K = 14, alpha = 0.05, L = 3)
x1 |
Numeric vector of N1 control observations |
x2 |
Numeric vector of N2 case observations |
theta0 |
Nominal boundary level for binomial parameter - default is N2/(N1+N2) |
K |
Log2 of the number of bins (i.e. 2^K bins) |
alpha |
Nominal false discovery rate (FDR) level |
L |
Number of layers in the aggregation tree |
List containing the discoveries (S.list) in each layer and the estimated layer-specific thresholds (c.hats)
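The inputs that the layer-wise tests operate on (pooled observations per bin and case counts per bin) can be sketched in base R as follows. This is an illustrative assumption about the preprocessing, not the package's code: it bins the pooled sample at its empirical quantiles into 2^K bins, counts cases per bin, and then forms a coarser layer by merging adjacent pairs of bins. All variable names and the small simulated data set are hypothetical.

```r
set.seed(1)
x1 <- rnorm(2000, mean = 0)    # controls (simulated, illustrative only)
x2 <- rnorm(2000, mean = 0.1)  # cases
K  <- 4                        # 2^K = 16 bins at the finest layer

pooled  <- c(x1, x2)
is_case <- rep(c(0L, 1L), times = c(length(x1), length(x2)))

# Default boundary level from the usage line: N2 / (N1 + N2)
theta0 <- length(x2) / length(pooled)

# Assumed binning: equal-probability bins from the pooled empirical quantiles
breaks <- quantile(pooled, probs = seq(0, 1, length.out = 2^K + 1))
bin    <- cut(pooled, breaks = breaks, include.lowest = TRUE)

bin_sizes   <- as.vector(table(bin))                  # pooled observations per bin
case_counts <- as.vector(tapply(is_case, bin, sum))   # cases per bin

# Assumed aggregation: merge adjacent pairs of bins to form the next layer.
# matrix() fills column-wise, so column j holds bins 2j-1 and 2j.
layer2_counts <- colSums(matrix(case_counts, nrow = 2))
```

Under the null of no local difference, the case count in a bin with n pooled observations would be roughly Binomial(n, theta0), which is why theta0 defaults to the overall case fraction.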
Pura J, Chan C, Xie J (2019). Multiple Testing Embedded in an Aggregation Tree to Identify where Two Distributions Differ. arXiv:1906.07757. https://arxiv.org/abs/1906.07757
set.seed(1)

# Simulate a local shift difference between the two populations,
# each drawn from a mixture of normals
N1 <- N2 <- 1e6
require(ks)  # provides the rnorm.mixt function

# Controls
x1 <- rnorm.mixt(N1, mus = c(0.2, 0.89), sigmas = c(0.04, 0.01), props = c(0.97, 0.03))
# Cases
x2 <- rnorm.mixt(N2, mus = c(0.2, 0.88), sigmas = c(0.04, 0.01), props = c(0.97, 0.03))

res <- TEAM(x1, x2, K = 14, alpha = 0.05, L = 3)

# Discoveries in each layer: each element is a growing set of
# indices captured at each layer
res$S.list

# Map the final discoveries in layer 3 back to the corresponding regions
levels(res$dat$quant)[res$S.list[[3]]]
Enumerate matrix of valid counts for a vector of values
valid.counts(x, c.prev)
x |
Vector |
c.prev |
c.hat calculated at layer l-1 |