Title: | Various Methods for the Two Sample Problem |
---|---|
Description: | The routine twosample_test() in this package runs the two sample test using various test statistics. The p values are found via permutation or large sample theory. The routine twosample_power() allows the calculation of the power in various cases, and plot_power() draws the corresponding power graphs. The routine run.studies() allows a user to quickly study the power of a new method and how it compares to some of the standard ones. |
Authors: | Wolfgang Rolke [aut, cre] |
Maintainer: | Wolfgang Rolke <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.1.0 |
Built: | 2025-03-09 13:42:26 UTC |
Source: | https://github.com/cran/R2sample |
This function finds the p values of several tests based on large sample theory
asymptotic_pvalues(x, n, m)
x |
a vector of test statistics |
n |
size of sample 1 |
m |
size of sample 2 |
A vector of p values.
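As an illustration of what a p value "based on large sample theory" looks like, the sketch below computes the asymptotic p value of the two-sample Kolmogorov-Smirnov statistic from the limiting Kolmogorov distribution. The helper name `ks_asymptotic_pvalue` is hypothetical and not part of R2sample; asymptotic_pvalues() may cover other tests and use different formulas. The result is compared with base R's `ks.test()`.

```r
# Hypothetical helper, not part of R2sample: asymptotic p value of the
# two-sample Kolmogorov-Smirnov statistic D for sample sizes n and m.
ks_asymptotic_pvalue <- function(D, n, m, terms = 100) {
  lambda <- sqrt(n * m / (n + m)) * D     # scaled test statistic
  k <- seq_len(terms)
  # Tail of the limiting Kolmogorov distribution:
  # P(K > lambda) = 2 * sum_{k >= 1} (-1)^(k-1) * exp(-2 * k^2 * lambda^2)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(1, max(0, p))                       # keep the truncated series in [0, 1]
}

set.seed(1)
x <- rnorm(200)
y <- rnorm(300, mean = 0.3)
D <- unname(ks.test(x, y)$statistic)
p_sketch <- ks_asymptotic_pvalue(D, length(x), length(y))
p_ref <- ks.test(x, y, exact = FALSE)$p.value   # base R asymptotic p value
c(sketch = p_sketch, ks.test = p_ref)
```

The truncated series is only reliable when the scaled statistic is not very close to zero, which is why the result is clamped to the unit interval.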
This function creates the functions needed to run the various case studies.
case.studies(which, nsample = 500)
which |
name of the case study. |
nsample |
=500, sample size. |
a list of functions
This function runs the chi-square test for continuous or discrete data
chi_power( rxy, alpha = 0.05, B = 1000, xparam, yparam, nbins = c(50, 10), minexpcount = 5, typeTS )
rxy |
a function to generate data |
alpha |
=0.05 type I error probability of test |
B |
=1000 number of simulation runs |
xparam |
vector of parameter values |
yparam |
vector of parameter values |
nbins |
=c(50, 10) number of desired bins |
minexpcount |
=5 smallest number of counts required in each bin |
typeTS |
type of problem, continuous/discrete, with/without weights |
A matrix of power values
a local function needed for the vignette
myTS2(x, y, vals)
x |
An integer vector. |
y |
An integer vector. |
vals |
A numeric vector with the values of the discrete rv. |
A vector with test statistics
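A test statistic for discrete data receives counts rather than raw observations, together with the values those counts refer to. The following is a minimal sketch of such a routine with the same argument list as myTS2(x, y, vals). The function `mean_diff_TS` and the example counts are hypothetical and only illustrate how counts and values combine; they are not taken from the package.

```r
# Hypothetical statistic, not part of R2sample: absolute difference between
# the two sample means, computed from counts x and y on the values vals.
mean_diff_TS <- function(x, y, vals) {
  mean_x <- sum(vals * x) / sum(x)   # mean of sample 1 from its counts
  mean_y <- sum(vals * y) / sum(y)   # mean of sample 2 from its counts
  out <- abs(mean_x - mean_y)
  names(out) <- "MeanDiff"           # name the statistic, as in the package examples
  out
}

vals <- 1:5
x <- c(10, 20, 40, 20, 10)   # counts of the values 1..5 in sample 1
y <- c(5, 10, 20, 30, 35)    # counts of the values 1..5 in sample 2
mean_diff_TS(x, y, vals)
```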
This function draws the power graph, with curves sorted by the mean power and smoothed for easier reading.
plot_power(pwr, xname = " ", title = " ", Smooth = TRUE, span = 0.25)
pwr |
a matrix of power values, usually from the twosample_power command |
xname |
Name of variable on x axis |
title |
(Optional) title of graph |
Smooth |
=TRUE lines are smoothed for easier reading |
span |
=0.25 bandwidth of smoothing method |
plt, an object of class ggplot.
Find the power of built-in continuous two sample tests using Rcpp and parallel computing.
power_cont_R( rxy, xparam, yparam, TS, typeTS, TSextra, alpha = 0.05, B = 1000, maxProcessor )
rxy |
function to generate a list with data sets x, y and (optional) vals, weights |
xparam |
first argument passed to rxy |
yparam |
second argument passed to rxy |
TS |
test statistic |
typeTS |
which format does TS have? |
TSextra |
list of items passed to TS |
alpha |
=0.05, the level of the hypothesis test |
B |
= 1000 number of simulation runs |
maxProcessor |
maximum number of cores to use. If maxProcessor=1 no parallel computing is used. |
A numeric vector of power values.
Find the power of built-in discrete two sample tests using Rcpp and parallel computing.
power_disc_R( rxy, xparam, yparam, TS, typeTS, TSextra, alpha = 0.05, samplingmethod = 1, B = 1000, maxProcessor )
rxy |
function to generate a list with data sets x, y and (optional) vals, weights |
xparam |
first argument passed to rxy |
yparam |
second argument passed to rxy |
TS |
test statistic |
typeTS |
which format does TS have? |
TSextra |
list of items passed to TS |
alpha |
=0.05, the level of the hypothesis test |
samplingmethod |
=independence or MCMC in discrete data case |
B |
= 1000 number of simulation runs |
maxProcessor |
maximum number of cores to use. If maxProcessor=1 no parallel computing is used. |
A numeric vector of power values.
This function estimates the power of test routines that calculate p value(s)
power_newtest(TS, f, param_alt, TSextra, alpha = 0.05, B = 1000)
TS |
routine to calculate test statistics. |
f |
routine that generates data. |
param_alt |
values of parameter under the alternative hypothesis. |
TSextra |
list passed to TS. |
alpha |
=0.05 type I error. |
B |
= 1000 number of simulation runs to estimate the power. |
A matrix of power values
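The idea behind estimating the power of a routine that returns a p value can be sketched in base R: generate B data sets under the alternative, and record how often the p value falls below alpha. The function `estimate_power` and its arguments are hypothetical and not the package's implementation; the Welch t test stands in for a user-supplied test.

```r
# Hypothetical sketch, not the package's implementation: Monte Carlo power
# of a test routine that returns a p value.
estimate_power <- function(pval_fun, gen, param_alt, alpha = 0.05, B = 1000) {
  sapply(param_alt, function(p) {
    rejections <- replicate(B, {
      dta <- gen(p)                      # list with components x and y
      pval_fun(dta$x, dta$y) < alpha     # TRUE if the null hypothesis is rejected
    })
    mean(rejections)                     # fraction of rejections estimates the power
  })
}

set.seed(2)
gen <- function(mu) list(x = rnorm(50), y = rnorm(50, mu))
t_pval <- function(x, y) t.test(x, y)$p.value
param_alt <- c(0, 0.5, 1)
pw <- estimate_power(t_pval, gen, param_alt, B = 200)
names(pw) <- param_alt
pw
```

At a parameter value where the null hypothesis holds (here mu = 0), the estimate should be close to alpha, which gives a quick check of the type I error.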
the results of the included power studies
power_studies_results
A list of matrices with powers
data to draw a graph in vignette
pvaluecdf
A matrix
Runs the shiny app associated with the R2sample package
run_shiny()
No return value, called for side effect of opening a shiny app
This function runs the case studies included in the package
run.studies( TS, study, TSextra, With.p.value = FALSE, BasicComparison = TRUE, nsample = 500, alpha = 0.05, param_alt, maxProcessor, B = 1000 )
TS |
routine to calculate test statistics. |
study |
either the name of the study, or its number. If missing all the studies are run. |
TSextra |
list passed to TS. |
With.p.value |
=FALSE does user supplied routine return p values? |
BasicComparison |
=TRUE if true compares tests on one default value of parameter of the alternative distribution. |
nsample |
= 500, desired sample size. |
alpha |
=0.05 type I error |
param_alt |
(list of) values of parameter under the alternative hypothesis. If missing included values are used. |
maxProcessor |
number of cores to use for parallel programming |
B |
=1000, number of simulation runs |
A (list of) matrices of p values
# The new test is a simple chisquare test:
chitest = function(x, y, TSextra) {
   nbins=TSextra$nbins
   nx=length(x);ny=length(y);n=nx+ny
   xy=c(x,y)
   bins=quantile(xy, (0:nbins)/nbins)
   Ox=hist(x, bins, plot=FALSE)$counts
   Oy=hist(y, bins, plot=FALSE)$counts
   tmp=sqrt(sum(Ox)/sum(Oy))
   chi = sum((Ox/tmp-Oy*tmp)^2/(Ox+Oy))
   pval=1-pchisq(chi, nbins-1)
   out=ifelse(TSextra$statistic,chi,pval)
   names(out)="ChiSquare"
   out
}
TSextra=list(nbins=5,statistic=FALSE) # Use 5 bins and calculate p values
run.studies(chitest,TSextra=TSextra, With.p.value=TRUE, B=100)
This function does some rounding to nice numbers
## S3 method for class 'digits'
signif(x, d = 4)
x |
a list of two vectors |
d |
=4 number of digits to round to |
A list with rounded vectors
test function
timecheck(dta, TS, typeTS, TSextra)
dta |
data set |
TS |
test statistics |
typeTS |
format of TS |
TSextra |
additional info passed to TS |
Mean computation time
Find the power of various two sample tests using Rcpp and parallel computing.
twosample_power( f, ..., TS, TSextra, alpha = 0.05, B = 1000, nbins = c(50, 10), minexpcount = 5, UseLargeSample, samplingmethod = "independence", maxProcessor )
f |
function to generate a list with data sets x, y and (optional) vals, weights |
... |
additional arguments passed to f, up to 2 |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
alpha |
=0.05, the level of the hypothesis test |
B |
=1000, number of simulation runs. |
nbins |
=c(50,10), number of bins for chi large and chi small. |
minexpcount |
=5 minimum required count for chi square tests |
UseLargeSample |
should p values be found via large sample theory if n,m>10000? |
samplingmethod |
=independence or MCMC in discrete data case |
maxProcessor |
maximum number of cores to use. If maxProcessor=1 no parallel computing is used. |
A numeric vector of power values.
f=function(mu) list(x=rnorm(25), y=rnorm(25, mu))
twosample_power(f, mu=c(0,2), B=100, maxProcessor = 1)
f=function(n, p) list(x=table(sample(1:5, size=1000, replace=TRUE)),
       y=table(sample(1:5, size=n, replace=TRUE, prob=c(1, 1, 1, 1, p))), vals=1:5)
twosample_power(f, n=c(1000, 2000), p=c(1, 1.5), B=100, maxProcessor = 1)
This function runs a number of two sample tests using Rcpp and parallel computing.
twosample_test( x, y, vals = NA, TS, TSextra, wx = rep(1, length(x)), wy = rep(1, length(y)), B = 5000, nbins = c(50, 10), minexpcount = 5, maxProcessor, UseLargeSample, samplingmethod = "independence", doMethods = "all" )
x |
a vector of numbers if data is continuous or of counts if data is discrete. |
y |
a vector of numbers if data is continuous or of counts if data is discrete. |
vals |
=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous. |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
wx |
A numeric vector of weights of x. |
wy |
A numeric vector of weights of y. |
B |
=5000, number of simulation runs for permutation test |
nbins |
=c(50,10), number of bins for chi square tests. |
minexpcount |
=5, minimum required expected counts for chi-square tests. |
maxProcessor |
maximum number of cores to use. If missing (the default) no parallel processing is used. |
UseLargeSample |
should p values be found via large sample theory if n,m>10000? |
samplingmethod |
="independence" or "MCMC" for discrete data |
doMethods |
="all" Which methods should be included? If missing all methods are used. |
A list of two numeric vectors, the test statistics and the p values.
R2sample::twosample_test(rnorm(1000), rt(1000, 4), B=1000)
myTS=function(x,y) {z=c(mean(x)-mean(y),sd(x)-sd(y));names(z)=c("M","S");z}
R2sample::twosample_test(rnorm(1000), rt(1000, 4), TS=myTS, B=1000)
vals=1:5
x=table(sample(vals, size=100, replace=TRUE))
y=table(sample(vals, size=100, replace=TRUE, prob=c(1,1,3,1,1)))
R2sample::twosample_test(x, y, vals)
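The permutation approach to finding p values can be sketched in base R: pool the two samples, reassign the group labels at random B times, and compare the observed statistic with the permuted ones. The functions `perm_pvalue` and `ks_stat` are hypothetical and assume a statistic for which large values indicate a difference; this is not the package's Rcpp implementation.

```r
# Hypothetical sketch, not the package's implementation: permutation p value
# for a two-sample statistic where large values indicate a difference.
perm_pvalue <- function(x, y, TS, B = 1000) {
  observed <- TS(x, y)
  pooled <- c(x, y)
  nx <- length(x)
  perm_stats <- replicate(B, {
    idx <- sample(length(pooled), nx)   # random relabelling of the pooled data
    TS(pooled[idx], pooled[-idx])
  })
  # Adding 1 to numerator and denominator keeps the estimate above 0
  (1 + sum(perm_stats >= observed)) / (B + 1)
}

# Kolmogorov-Smirnov distance between the two empirical distribution functions
ks_stat <- function(x, y) {
  grid <- sort(c(x, y))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

set.seed(3)
perm_pvalue(rnorm(100), rnorm(100, 1), ks_stat, B = 500)
```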
This function runs a number of two sample tests using Rcpp and parallel computing and then finds the correct p value for the combined tests.
twosample_test_adjusted_pvalue( x, y, vals = NA, TS, TSextra, wx = rep(1, length(x)), wy = rep(1, length(y)), B = c(5000, 1000), nbins = c(50, 10), minexpcount = 5, samplingmethod = "independence", doMethods )
x |
a vector of numbers if data is continuous or of counts if data is discrete. |
y |
a vector of numbers if data is continuous or of counts if data is discrete. |
vals |
=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous. |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
wx |
A numeric vector of weights of x. |
wy |
A numeric vector of weights of y. |
B |
=c(5000, 1000), number of simulation runs for permutation test |
nbins |
=c(50,10), number of bins for chi square tests. |
minexpcount |
= 5, minimum required expected counts for chi-square tests |
samplingmethod |
="independence" or "MCMC" for discrete data |
doMethods |
Which methods should be included? |
A list of two numeric vectors, the test statistics and the p values.
x=rnorm(100)
y=rt(200, 4)
R2sample::twosample_test_adjusted_pvalue(x, y, B=c(500, 500))
vals=1:5
x=table(c(1:5, sample(1:5, size=100, replace=TRUE)))-1
y=table(c(1:5, sample(1:5, size=100, replace=TRUE, prob=c(1,1,3,1,1))))-1
R2sample::twosample_test_adjusted_pvalue(x, y, vals, B=c(500, 500))
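When several tests are run and the smallest p value is reported, that minimum is no longer uniformly distributed under the null hypothesis and needs an adjustment. The sketch below shows one simple way to do this by simulating the null distribution of the minimum with permutations. The functions `pvals_all` and `adjusted_min_pvalue` are hypothetical, and the three base R tests with analytic p values are stand-ins chosen to avoid a nested simulation; the package's own algorithm and set of tests may differ.

```r
# Hypothetical sketch, not the package's algorithm: adjust the smallest of
# several p values by simulating its distribution under the null hypothesis.
pvals_all <- function(x, y) {
  c(t = t.test(x, y)$p.value,
    wilcox = wilcox.test(x, y)$p.value,
    ks = ks.test(x, y)$p.value)
}

adjusted_min_pvalue <- function(x, y, B = 500) {
  observed_min <- min(pvals_all(x, y))
  pooled <- c(x, y)
  nx <- length(x)
  null_min <- replicate(B, {
    idx <- sample(length(pooled), nx)   # permuting the labels mimics the null
    min(pvals_all(pooled[idx], pooled[-idx]))
  })
  # Share of null minima that are at least as small as the observed minimum
  c(unadjusted = observed_min,
    adjusted = (1 + sum(null_min <= observed_min)) / (B + 1))
}

set.seed(4)
res_adj <- adjusted_min_pvalue(rnorm(100), rt(200, 4), B = 200)
res_adj
```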