Package 'R2sample' reference manual

Title:	Various Methods for the Two Sample Problem
Description:	The routine twosample_test() in this package runs the two sample test using various test statistics. The p values are found via permutation or large sample theory. The routine twosample_power() calculates the power in various cases, and plot_power() draws the corresponding power graphs. The routine run.studies() allows a user to quickly study the power of a new method and compare it to some of the standard ones.
Authors:	Wolfgang Rolke [aut, cre]
Maintainer:	Wolfgang Rolke <[email protected]>
License:	GPL (>= 2)
Version:	3.1.0
Built:	2025-03-09 13:42:26 UTC
Source:	https://github.com/cran/R2sample

This function finds the p values of several tests based on large sample theory

Description

This function finds the p values of several tests based on large sample theory

Usage

asymptotic_pvalues(x, n, m)

Arguments

`x`	a vector of test statistics
`n`	size of sample 1
`m`	size of sample 2

Value

A vector of p values.
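
As an illustration of what a large sample p value looks like, the sketch below evaluates the limiting Kolmogorov distribution for a two sample Kolmogorov-Smirnov statistic in base R. It does not call asymptotic_pvalues(); the helper name ks_asymptotic_pvalue and its arguments are hypothetical, and which statistics the package routine actually handles is not stated here.

```r
# Large-sample p value of the two sample Kolmogorov-Smirnov statistic D,
# using the limiting Kolmogorov distribution:
#   P(sqrt(n*m/(n+m)) * D > t) ~ 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * t^2)
# ks_asymptotic_pvalue is a hypothetical helper, not part of R2sample.
ks_asymptotic_pvalue <- function(D, n, m, terms = 100) {
  t <- sqrt(n * m / (n + m)) * D
  # For very small t the series converges slowly and the p value is 1
  # to within rounding error, so return 1 directly.
  if (t < 0.2) return(1)
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
  min(1, max(0, p))
}

# Two samples of size 200 each and an observed statistic D = 0.136
p_example <- ks_asymptotic_pvalue(0.136, n = 200, m = 200)
```

With these inputs the scaled statistic is 1.36, close to the familiar 5% critical value of the Kolmogorov distribution, so p_example comes out just under 0.05.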

This function creates the functions needed to run the various case studies.

Description

This function creates the functions needed to run the various case studies.

Usage

case.studies(which, nsample = 500)

Arguments

`which`	name of the case study.
`nsample`	=500, sample size.

Value

a list of functions

This function runs the chi-square test for continuous or discrete data

Description

This function runs the chi-square test for continuous or discrete data

Usage

chi_power(
  rxy,
  alpha = 0.05,
  B = 1000,
  xparam,
  yparam,
  nbins = c(50, 10),
  minexpcount = 5,
  typeTS
)

Arguments

`rxy`	a function to generate data
`alpha`	=0.05 type I error probability of test
`B`	=1000 number of simulation runs
`xparam`	vector of parameter values
`yparam`	vector of parameter values
`nbins`	=c(50, 10) number of desired bins
`minexpcount`	=5 smallest number of counts required in each bin
`typeTS`	type of problem, continuous/discrete, with/without weights

Value

A matrix of power values

a local function needed for the vignette

Description

a local function needed for the vignette

Usage

myTS2(x, y, vals)

Arguments

`x`	An integer vector.
`y`	An integer vector.
`vals`	A numeric vector with the values of the discrete rv.

Value

A vector with test statistics

This function draws the power graph, with curves sorted by the mean power and smoothed for easier reading.

Description

This function draws the power graph, with curves sorted by the mean power and smoothed for easier reading.

Usage

plot_power(pwr, xname = " ", title = " ", Smooth = TRUE, span = 0.25)

Arguments

`pwr`	a matrix of power values, usually from the twosample_power command
`xname`	Name of variable on x axis
`title`	(Optional) title of graph
`Smooth`	=TRUE lines are smoothed for easier reading
`span`	=0.25 bandwidth of smoothing method

Value

plt, an object of class ggplot.
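
The sorting and smoothing described for this function can be sketched in base R. The example below does not call plot_power(); it assumes a power matrix with one row per parameter value and one column per method, and it uses loess() as the smoother. Both the matrix layout and the choice of loess() are assumptions made for illustration, and the method names are made up.

```r
# Hypothetical power matrix: rows are parameter values, columns are methods.
mu <- seq(0, 2, length.out = 41)
pwr <- cbind(
  MethodA = pnorm(2.0 * mu - 1.645),
  MethodB = pnorm(1.2 * mu - 1.645),
  MethodC = pnorm(1.6 * mu - 1.645)
)

# Sort the methods by their mean power, highest first.
ord <- order(colMeans(pwr), decreasing = TRUE)
pwr_sorted <- pwr[, ord, drop = FALSE]

# Smooth one power curve with a local regression (span plays the role of
# the bandwidth) and clamp the fitted values to the interval [0, 1].
smooth_curve <- function(p, x, span = 0.25) {
  fit <- loess(p ~ x, span = span)
  pmin(1, pmax(0, predict(fit)))
}
pwr_smooth <- apply(pwr_sorted, 2, smooth_curve, x = mu)
```

The smoothed matrix can then be drawn with, for example, matplot(mu, pwr_smooth, type = "l"), with the legend listing the methods in order of mean power.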

Find the power of built-in continuous two sample tests using Rcpp and parallel computing.

Description

Find the power of built-in continuous two sample tests using Rcpp and parallel computing.

Usage

power_cont_R(
  rxy,
  xparam,
  yparam,
  TS,
  typeTS,
  TSextra,
  alpha = 0.05,
  B = 1000,
  maxProcessor
)

Arguments

`rxy`	function to generate a list with data sets x, y and (optional) vals, weights
`xparam`	first argument passed to rxy
`yparam`	second argument passed to rxy
`TS`	test statistic
`typeTS`	format of TS
`TSextra`	list of items passed to TS
`alpha`	=0.05, the level of the hypothesis test
`B`	= 1000 number of simulation runs
`maxProcessor`	maximum number of cores to use. If maxProcessor=1 no parallel computing is used.

Value

A numeric vector of power values.

Find the power of built-in discrete two sample tests using Rcpp and parallel computing.

Description

Find the power of built-in discrete two sample tests using Rcpp and parallel computing.

Usage

power_disc_R(
  rxy,
  xparam,
  yparam,
  TS,
  typeTS,
  TSextra,
  alpha = 0.05,
  samplingmethod = 1,
  B = 1000,
  maxProcessor
)

Arguments

`rxy`	function to generate a list with data sets x, y and (optional) vals, weights
`xparam`	first argument passed to rxy
`yparam`	second argument passed to rxy
`TS`	test statistic
`typeTS`	format of TS
`TSextra`	list of items passed to TS
`alpha`	=0.05, the level of the hypothesis test
`samplingmethod`	=1, sampling method in the discrete data case (independence or MCMC)
`B`	= 1000 number of simulation runs
`maxProcessor`	maximum number of cores to use. If maxProcessor=1 no parallel computing is used.

Value

A numeric vector of power values.

This function estimates the power of test routines that calculate p value(s)

Description

This function estimates the power of test routines that calculate p value(s)

Usage

power_newtest(TS, f, param_alt, TSextra, alpha = 0.05, B = 1000)

Arguments

`TS`	routine to calculate test statistics.
`f`	routine that generates data.
`param_alt`	values of parameter under the alternative hypothesis.
`TSextra`	list passed to TS.
`alpha`	=0.05 type I error.
`B`	= 1000 number of simulation runs to estimate the power.

Value

A matrix of power values
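
The idea of estimating power for a routine that returns p values can be sketched in base R: generate data under each alternative, record the p values, and report the share of runs in which each p value falls below alpha. The code below is an illustration only and does not call power_newtest(); the names TS_p, f and estimate_power are hypothetical, and the way the package passes arguments to TS may differ.

```r
# Hypothetical routine returning a named vector of p values,
# here taken from two base R tests.
TS_p <- function(x, y) {
  c(t = t.test(x, y)$p.value, KS = ks.test(x, y)$p.value)
}

# Hypothetical data generator: the parameter shifts the mean of y.
f <- function(mu) list(x = rnorm(50), y = rnorm(50, mu))

# Estimated power: proportion of B simulated data sets with p < alpha.
estimate_power <- function(TS, f, param_alt, alpha = 0.05, B = 200) {
  out <- t(sapply(param_alt, function(p) {
    pvals <- replicate(B, {
      dta <- f(p)
      TS(dta$x, dta$y)
    })
    rowMeans(pvals < alpha)  # one rejection rate per test
  }))
  rownames(out) <- param_alt
  out
}

set.seed(1)
pw <- estimate_power(TS_p, f, param_alt = c(0, 0.5, 1))
```

The result has one row per parameter value and one column per test. The first row (mu = 0) estimates the type I error rate, so it should be close to alpha.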

power_studies_results

Description

the results of the included power studies

Usage

power_studies_results

Format

'power_studies_results'

A list of matrices with powers

pvaluecdf

Description

data to draw a graph in vignette

Usage

pvaluecdf

Format

'pvaluecdf'

A matrix

Runs the shiny app associated with R2sample package

Description

Runs the shiny app associated with R2sample package

Usage

run_shiny()

Value

No return value, called for side effect of opening a shiny app

This function runs the case studies included in the package

Description

This function runs the case studies included in the package

Usage

run.studies(
  TS,
  study,
  TSextra,
  With.p.value = FALSE,
  BasicComparison = TRUE,
  nsample = 500,
  alpha = 0.05,
  param_alt,
  maxProcessor,
  B = 1000
)

Arguments

`TS`	routine to calculate test statistics.
`study`	either the name of the study, or its number. If missing all the studies are run.
`TSextra`	list passed to TS.
`With.p.value`	=FALSE does user supplied routine return p values?
`BasicComparison`	=TRUE if true compares tests on one default value of parameter of the alternative distribution.
`nsample`	= 500, desired sample size.
`alpha`	=0.05 type I error
`param_alt`	(list of) values of parameter under the alternative hypothesis. If missing included values are used.
`maxProcessor`	number of cores to use for parallel programming
`B`	= 1000, number of simulation runs.

Value

A (list of) matrices of p values

Examples

#The new test is a simple chisquare test:
chitest = function(x, y, TSextra) {
   nbins=TSextra$nbins
   nx=length(x);ny=length(y);n=nx+ny
   xy=c(x,y)
   bins=quantile(xy, (0:nbins)/nbins)
   Ox=hist(x, bins, plot=FALSE)$counts
   Oy=hist(y, bins, plot=FALSE)$counts
   tmp=sqrt(sum(Ox)/sum(Oy))
   chi = sum((Ox/tmp-Oy*tmp)^2/(Ox+Oy))
   pval=1-pchisq(chi, nbins-1)
   out=ifelse(TSextra$statistic,chi,pval)
   names(out)="ChiSquare"
   out
}
TSextra=list(nbins=5,statistic=FALSE) # Use 5 bins and calculate p values
run.studies(chitest,TSextra=TSextra, With.p.value=TRUE, B=100)

This function does some rounding to nice numbers

Description

This function does some rounding to nice numbers

Usage

## S3 method for class 'digits'
signif(x, d = 4)

Arguments

`x`	a list of two vectors
`d`	=4 number of digits to round to

Value

A list with rounded vectors

test function

Description

test function

Usage

timecheck(dta, TS, typeTS, TSextra)

Arguments

`dta`	data set
`TS`	test statistics
`typeTS`	format of TS
`TSextra`	additional info passed to TS

Value

Mean computation time

Find the power of various two sample tests using Rcpp and parallel computing.

Description

Find the power of various two sample tests using Rcpp and parallel computing.

Usage

twosample_power(
  f,
  ...,
  TS,
  TSextra,
  alpha = 0.05,
  B = 1000,
  nbins = c(50, 10),
  minexpcount = 5,
  UseLargeSample,
  samplingmethod = "independence",
  maxProcessor
)

Arguments

`f`	function to generate a list with data sets x, y and (optional) vals, weights
`...`	additional arguments passed to f, up to 2
`TS`	routine to calculate test statistics for non-chi-square tests
`TSextra`	additional info passed to TS, if necessary
`alpha`	=0.05, the level of the hypothesis test
`B`	=1000, number of simulation runs.
`nbins`	=c(50,10), number of bins for chi large and chi small.
`minexpcount`	=5 minimum required count for chi square tests
`UseLargeSample`	should p values be found via large sample theory if n,m>10000?
`samplingmethod`	=independence or MCMC in discrete data case
`maxProcessor`	maximum number of cores to use. If maxProcessor=1 no parallel computing is used.

Value

A numeric vector of power values.

Examples

 f=function(mu) list(x=rnorm(25), y=rnorm(25, mu))
 twosample_power(f, mu=c(0,2), B=100, maxProcessor = 1)
 f=function(n, p) list(x=table(sample(1:5, size=1000, replace=TRUE)), 
       y=table(sample(1:5, size=n, replace=TRUE, 
       prob=c(1, 1, 1, 1, p))), vals=1:5)
 twosample_power(f, n=c(1000, 2000), p=c(1, 1.5), B=100, maxProcessor = 1)

This function runs a number of two sample tests using Rcpp and parallel computing.

Description

This function runs a number of two sample tests using Rcpp and parallel computing.

Usage

twosample_test(
  x,
  y,
  vals = NA,
  TS,
  TSextra,
  wx = rep(1, length(x)),
  wy = rep(1, length(y)),
  B = 5000,
  nbins = c(50, 10),
  minexpcount = 5,
  maxProcessor,
  UseLargeSample,
  samplingmethod = "independence",
  doMethods = "all"
)

Arguments

`x`	a vector of numbers if data is continuous or of counts if data is discrete.
`y`	a vector of numbers if data is continuous or of counts if data is discrete.
`vals`	=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous.
`TS`	routine to calculate test statistics for non-chi-square tests
`TSextra`	additional info passed to TS, if necessary
`wx`	A numeric vector of weights of x.
`wy`	A numeric vector of weights of y.
`B`	=5000, number of simulation runs for permutation test
`nbins`	=c(50,10), number of bins for chi square tests.
`minexpcount`	=5, minimum required expected counts for chi-square tests.
`maxProcessor`	maximum number of cores to use. If missing (the default) no parallel processing is used.
`UseLargeSample`	should p values be found via large sample theory if n,m>10000?
`samplingmethod`	="independence" or "MCMC" for discrete data
`doMethods`	="all" Which methods should be included? If missing all methods are used.

Value

A list of two numeric vectors, the test statistics and the p values.

Examples

 R2sample::twosample_test(rnorm(1000), rt(1000, 4), B=1000)
 myTS=function(x,y) {z=c(mean(x)-mean(y),sd(x)-sd(y));names(z)=c("M","S");z}
 R2sample::twosample_test(rnorm(1000), rt(1000, 4), TS=myTS, B=1000)
 vals=1:5
 x=table(sample(vals, size=100, replace=TRUE))
 y=table(sample(vals, size=100, replace=TRUE, prob=c(1,1,3,1,1)))
 R2sample::twosample_test(x, y, vals)
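
For readers unfamiliar with how a permutation p value is obtained, the following base R sketch shows the general technique for continuous data. It is not the package's implementation (which uses Rcpp); the names perm_pvalue and abs_mean_diff are hypothetical, and the statistic is assumed to reject for large values.

```r
# Permutation p value for a two sample statistic TS(x, y) that rejects
# for large values. Under the null hypothesis the pooled observations are
# exchangeable, so they are randomly re-split into two groups B times.
perm_pvalue <- function(x, y, TS, B = 1000) {
  obs <- TS(x, y)
  xy <- c(x, y)
  n <- length(x)
  sim <- replicate(B, {
    idx <- sample(length(xy))
    TS(xy[idx[seq_len(n)]], xy[idx[-seq_len(n)]])
  })
  # share of permuted statistics at least as large as the observed one
  mean(sim >= obs)
}

# Hypothetical test statistic: absolute difference of the sample means
abs_mean_diff <- function(x, y) abs(mean(x) - mean(y))

set.seed(2)
p_null  <- perm_pvalue(rnorm(100), rnorm(100), abs_mean_diff)
p_shift <- perm_pvalue(rnorm(100), rnorm(100, 1), abs_mean_diff)
```

Here p_shift is essentially zero because the second sample is shifted by one standard deviation, while p_null is computed from two samples drawn from the same distribution.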

This function runs a number of two sample tests using Rcpp and parallel computing and then finds the correct p value for the combined tests.

Description

This function runs a number of two sample tests using Rcpp and parallel computing and then finds the correct p value for the combined tests.

Usage

twosample_test_adjusted_pvalue(
  x,
  y,
  vals = NA,
  TS,
  TSextra,
  wx = rep(1, length(x)),
  wy = rep(1, length(y)),
  B = c(5000, 1000),
  nbins = c(50, 10),
  minexpcount = 5,
  samplingmethod = "independence",
  doMethods
)

Arguments

`x`	a vector of numbers if data is continuous or of counts if data is discrete.
`y`	a vector of numbers if data is continuous or of counts if data is discrete.
`vals`	=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous.
`TS`	routine to calculate test statistics for non-chi-square tests
`TSextra`	additional info passed to TS, if necessary
`wx`	A numeric vector of weights of x.
`wy`	A numeric vector of weights of y.
`B`	=c(5000, 1000), number of simulation runs for permutation test
`nbins`	=c(50,10), number of bins for chi square tests.
`minexpcount`	= 5, minimum required expected counts for chi-square tests
`samplingmethod`	="independence" or "MCMC" for discrete data
`doMethods`	Which methods should be included?

Value

A list of two numeric vectors, the test statistics and the p values.

Examples

 x=rnorm(100)
 y=rt(200, 4)
 R2sample::twosample_test_adjusted_pvalue(x, y, B=c(500, 500))
 vals=1:5
 x=table(c(1:5, sample(1:5, size=100, replace=TRUE)))-1
 y=table(c(1:5, sample(1:5, size=100, replace=TRUE, prob=c(1,1,3,1,1))))-1
 R2sample::twosample_test_adjusted_pvalue(x, y, vals, B=c(500, 500))
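
One common way to obtain a single valid p value from several tests run on the same data is a min-p adjustment: use the smallest p value as the test statistic and estimate its null distribution by permutation. The base R sketch below illustrates that technique under the assumption that this is the kind of adjustment intended here; it is not taken from the package source. The component tests are base R stand-ins, and the names pvals and adjusted_pvalue are hypothetical.

```r
# Hypothetical component tests with closed-form p values (stand-ins only).
pvals <- function(x, y) {
  c(t = t.test(x, y)$p.value,
    KS = ks.test(x, y)$p.value,
    Wilcoxon = wilcox.test(x, y)$p.value)
}

# Min-p adjustment: compare the smallest observed p value with the
# smallest p values obtained after randomly re-splitting the pooled data.
adjusted_pvalue <- function(x, y, B = 500) {
  obs_min <- min(pvals(x, y))
  xy <- c(x, y)
  n <- length(x)
  null_min <- replicate(B, {
    idx <- sample(length(xy))
    min(pvals(xy[idx[seq_len(n)]], xy[idx[-seq_len(n)]]))
  })
  mean(null_min <= obs_min)
}

set.seed(3)
p_adj <- adjusted_pvalue(rnorm(100), rt(200, 4), B = 200)
p_adj_shift <- adjusted_pvalue(rnorm(50), rnorm(50, 2), B = 100)
```

Taking the smallest of several p values at face value would inflate the type I error rate; comparing it with its own permutation distribution corrects for having run several tests.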
