Title: | Meta-Analysis of RNA-Seq Data |
---|---|
Description: | Implementation of two p-value combination techniques (inverse normal and Fisher methods). A vignette is provided to explain how to perform a meta-analysis from two independent RNA-seq experiments. |
Authors: | Guillemette Marot [aut, cre], Andrea Rau [aut, cre], Florence Jaffrezic [aut], Samuel Blanck [ctb] |
Maintainer: | samuel Blanck <[email protected]> |
License: | GPL |
Version: | 1.0.7 |
Built: | 2025-02-16 05:17:43 UTC |
Source: | https://github.com/cran/metaRNASeq |
Implementation of two p-value combination techniques (inverse normal and Fisher methods). A vignette is provided to explain how to perform a meta-analysis from two independent RNA-seq experiments.
Package: | metaRNASeq |
Type: | Package |
Version: | 1.0.2 |
Date: | 2015-01-26 |
License: | GPL |
Andrea Rau, Guillemette Marot, Florence Jaffrezic
Maintainer: Guillemette Marot <[email protected]>
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
#An User's guide with detailed examples can be downloaded in interactive R sessions if(interactive()){ vignette("metaRNASeq") }
#An User's guide with detailed examples can be downloaded in interactive R sessions if(interactive()){ vignette("metaRNASeq") }
The adjusted p-values provided here result from the following procedure:
1) simulation of two RNA-seq experiments with four replicates in each condition via the sim.function
, 2) analysis of differentially expressed tags using the DESeq package.
data(adjpval)
data(adjpval)
List of length 2, where each list is a vector containing the adjusted p-values for 14,456 genes from individual differential analyses (obtained using DESeq v1.8.3) of each of the simulated RNA-seq datasets.
It is possible to reproduce these adjusted p-values using the procedure described in the package vignette.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
data(adjpval) ## Maybe str(adjpval)
data(adjpval) ## Maybe str(adjpval)
Gamma regression parameters describing the mean-dispersion relationship for each of the two real datasets considered in the associated paper, as estimated using the DESeq package version 1.8.3 (Anders and Huber, 2010).
data(dispFuncs)
data(dispFuncs)
List of length 2, where each list is a vector containing the two estimated coefficients ( and
) for the gamma regression in each study (see details below).
The dispFuncs
object contains the estimated coefficients from the parametric gamma regressions
describing the mean-dispersion relationship for the two real datasets considered in the associated paper.
The gamma regressions were estimated using the DESeq package version 1.8.3 (Anders and Huber, 2010).
Briefly, after estimating a per-gene mean expression and dispersion values, the DESeq package fits a curve
through these estimates. These fitted values correspond to an estimation of the typical relationship between
mean expression values and dispersions
within a given dataset. By default, this relationship
is estimated using a gamma-family generalized linear model (GLM), where two coefficients
and
are found to parameterize the fit as
.
For the first dataset (F078), the estimated mean-dispersion relationship is described using the following gamma-family GLM:
For the second dataset (F088), the estimated mean-dispersion relationship is described using the following gamma-family GLM:
These gamma-family GLMs describing the mean-dispersions relationship in each of the two datasets are used in this package to simulate data using dispersion parameters that are as realistic as possible.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
S. Anders and W. Huber (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.
data(dispFuncs)
data(dispFuncs)
The FC provided here result from the following procedure:
1) simulation of two RNA-seq experiments with four replicates in each condition via the sim.function
, 2) analysis of differentially expressed tags using the DESeq package.
data(FC)
data(FC)
List of length 2, where each list is a vector containing the FC for 14,456 genes from individual differential analyses (obtained using DESeq v1.8.3) of each of the simulated RNA-seq datasets.
It is possible to reproduce these FC using the procedure described in the package vignette.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
data(FC) ## Maybe str(FC)
data(FC) ## Maybe str(FC)
Combines one sided p-values using Fisher's method.
fishercomb(indpval, BHth = 0.05)
fishercomb(indpval, BHth = 0.05)
indpval |
List of vectors of one sided p-values to be combined. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
The test statistic for each gene g is defined as
where corresponds to the raw p-value obtained for gene g in a differential
analysis for study s (assumed to be uniformly distributed under the null hypothesis). Under the
null hypothesis, the test statistic
follows a
distribution with 2S
degrees of freedom. Classical procedures for the correction of multiple testing, such as that of Benjamini
and Hochberg (1995) may subsequently be applied to the obtained p-values to control the false
discovery rate at a desired rate
.
DEindices |
Indices of differentially expressed genes at the chosen Benjamini Hochberg threshold. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
rawpval |
Vector with raw p-values for differential expression in the meta-analysis. |
adjpval |
Vector with adjusted p-values for differential expression in the meta-analysis. |
Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate: a pratical and powerful approach to multiple testing. JRSS B (57): 289-300.
M. Brown (1975). A method for combining non-independent, one-sided tests of significance. Biometrics 31(4): 987-992.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
data(rawpval) fishcomb <- fishercomb(rawpval, BHth = 0.05) DE <- ifelse(fishcomb$adjpval<=0.05,1,0) hist(fishcomb$rawpval,nclass=100) ## A more detailed example is given in the vignette of the package: ## vignette("metaRNASeq")
data(rawpval) fishcomb <- fishercomb(rawpval, BHth = 0.05) DE <- ifelse(fishcomb$adjpval<=0.05,1,0) hist(fishcomb$rawpval,nclass=100) ## A more detailed example is given in the vignette of the package: ## vignette("metaRNASeq")
Calculates the gain or the loss of differentially expressed genes due to meta-analysis compared to individual studies.
IDD.IRR(meta_de, ind_de)
IDD.IRR(meta_de, ind_de)
meta_de |
Vector of differentially expressed tags (or indices of these tags) with the meta-analysis |
ind_de |
List of vectors storing differentially expressed tags (or indices of these tags) in each individual study |
DE |
Number of Differentially Expressed (DE) genes |
IDD |
Integration Driven Discoveries: number of genes that are declared DE in the meta-analysis that were not identified in any of the individual studies alone. |
Loss |
Number of genes that are declared DE in individual studies but not in meta-analysis. |
IDR |
Integration-driven Discovery Rate: proportion of genes that are identified as DE in the meta-analysis that were not identified in any of the individual studies alone. |
IRR |
Integration-driven Revision Rate: percentage of genes that are declared DE in individual studies but not in meta-analysis. |
Guillemette Marot
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
data(rawpval) adjpval<-lapply(rawpval, FUN=function(x) p.adjust(x, method="BH")) ind_smalladjp<-lapply(adjpval, FUN=function(x) which(x <= 0.05)) #indicators corresponding to the inverse normal p-value combination invnormcomb <- invnorm(rawpval,nrep=c(8,8), BHth = 0.05) IDD.IRR(invnormcomb$DEindices,ind_smalladjp) #indicators corresponding to the p-value combination with Fisher's method fishcomb <- fishercomb(rawpval, BHth = 0.05) IDD.IRR(fishcomb$DEindices,ind_smalladjp)
data(rawpval) adjpval<-lapply(rawpval, FUN=function(x) p.adjust(x, method="BH")) ind_smalladjp<-lapply(adjpval, FUN=function(x) which(x <= 0.05)) #indicators corresponding to the inverse normal p-value combination invnormcomb <- invnorm(rawpval,nrep=c(8,8), BHth = 0.05) IDD.IRR(invnormcomb$DEindices,ind_smalladjp) #indicators corresponding to the p-value combination with Fisher's method fishcomb <- fishercomb(rawpval, BHth = 0.05) IDD.IRR(fishcomb$DEindices,ind_smalladjp)
Combines one sided p-values using the inverse normal method.
invnorm(indpval, nrep, BHth = 0.05)
invnorm(indpval, nrep, BHth = 0.05)
indpval |
List of vectors of one sided p-values to be combined. |
nrep |
Vector of numbers of replicates used in each study to calculate the previous one-sided p-values. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
For each gene g, let
where corresponds to the raw p-value obtained for gene g in a differential
analysis for study s (assumed to be uniformly distributed under the null hypothesis),
the
cumulative distribution function of the standard normal distribution, and
a set of weights.
We define the weights
as in Marot and Mayer (2009):
where is the total number of biological replicates in study s. This allows
studies with large numbers of biological replicates to be attributed a larger weight than smaller studies.
Under the null hypothesis, the test statistic follows a N(0,1) distribution. A unilateral
test on the righthand tail of the distribution may then be performed, and classical procedures for the
correction of multiple testing, such as that of Benjamini and Hochberg (1995), may subsequently be applied to
the obtained p-values to control the false discovery rate at a desired level
.
DEindices |
Indices of differentially expressed genes at the chosen Benjamini Hochberg threshold. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
rawpval |
Vector with raw p-values for differential expression in the meta-analysis. |
adjpval |
Vector with adjusted p-values for differential expression in the meta-analysis. |
This function resembles the function directpvalcombi
in the metaMA R package; there is, however, one
important difference in the calculation of p-values. In particular, for microarray data, it is typically
advised to separate under- and over-expressed genes prior to the meta-analysis. In the case of RNA-seq data,
differential analyses from individual studies typically make use of negative binomial models and exact tests,
which lead to one-sided, rather than two-sided, p-values. As such, we suggest performing a meta-analysis over
the full set of genes, followed by an a posteriori check, and if necessary filter, of genes with conflicting
results (over vs. under expression) among studies.
Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate: a pratical and powerful approach to multiple testing. JRSS B (57): 289-300.
Hedges, L. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press.
Marot, G. and Mayer, C.-D. (2009). Sequential analysis for microarray data based on sensitivity and meta-analysis. SAGMB 8(1): 1-33.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
data(rawpval) ## 8 replicates simulated in each study invnormcomb <- invnorm(rawpval,nrep=c(8,8), BHth = 0.05) DE <- ifelse(invnormcomb$adjpval<=0.05,1,0) hist(invnormcomb$rawpval,nclass=100) ## A more detailed example is given in the vignette of the package: ## vignette("metaRNASeq")
data(rawpval) ## 8 replicates simulated in each study invnormcomb <- invnorm(rawpval,nrep=c(8,8), BHth = 0.05) DE <- ifelse(invnormcomb$adjpval<=0.05,1,0) hist(invnormcomb$rawpval,nclass=100) ## A more detailed example is given in the vignette of the package: ## vignette("metaRNASeq")
Mean simulation parameters obtained from the analysis of a real dataset
data(param)
data(param)
A data frame with 26408 observations on the following 3 variables.
mucond1
a numeric vector with mean parameters for condition 1
mucond2
a numeric vector with mean parameters for condition 2
DE
a logical vector indicating which tags are differentially expressed (value 1)
Mean parameters provided in this package are empirical means (obtained after normalization for library size differences) of real data described in the following references.
Supplementary material of (Dillies et al., 2013) paper.
M.A. Dillies, A. Rau, J. Aubert, C. Hennequet-Antier, M. Jeanmougin, N. Servant, C. Keime, G. Marot, D. Castel, J. Estelle, G. Guernec, B. Jagla, L. Jouneau, D. Laloe, C. Le Gall, B. Schaeffer, S. Le Crom, M. Guedj and F. Jaffrezic, on behalf of the French StatOmique Consortium (2013) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics 14(6):671-83 .
T. Strub, S. Giuliano, T. Ye, et al. (2011) Essential role of microphthalmia transcription factor for DNA replication, mitosis and genomic stability in melanoma. emphOncogene 30:2319-32.
data(param)
data(param)
The p-values provided here result from the following procedure:
1) simulation of two RNA-seq experiments with four replicates in each condition via the sim.function
, 2) analysis of differentially expressed tags using the DESeq package.
data(rawpval)
data(rawpval)
List of length 2, where each list is a vector containing the raw p-values for 14,456 genes from individual differential analyses (obtained using DESeq v1.8.3) of each of the simulated RNA-seq datasets.
It is possible to reproduce these p-values using the procedure described in the package vignette.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
data(rawpval) ## Maybe str(rawpval)
data(rawpval) ## Maybe str(rawpval)
Simulate data arising from multiple independent RNA-seq experiments
sim.function(param, dispFuncs, nrep = 4, classes = NULL, inter.sd = 0.3)
sim.function(param, dispFuncs, nrep = 4, classes = NULL, inter.sd = 0.3)
param |
Mean expression levels: |
dispFuncs |
List of length equal to the number of studies to be simulated, containing the gamma regression parameters describing the mean-dispersion relationship for each one; these are the mean-dispersion functions linking mean and intra-study variability for each independent experiment. |
nrep |
Number of replicates to be simulated in each condition in each study. Ignored if |
classes |
List of class memberships, one per study (maximum 20 studies). Each vector or factor of the list can only
contain two levels which correspond to the two conditions studied. If NULL, |
inter.sd |
Inter-study variability. By default, |
Details about the simulation procedure are given in the following paper:
A matrix with simulated expression levels, one row per gene and one column per replicate. Names of studies are given in the column names of the matrix.
If the param
data provided in this package are not used to simulate data, one should check that the
per-condition means in param
are reasonable. Note also that for genes to be simulated as non-differentially
expressed, the values of "mucond1" and "mucond2" in param
should be equal.
A. Rau, G. Marot and F. Jaffrezic (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91
## Load simulation parameters data(param) data(dispFuncs) ## Simulate data matsim <- sim.function(param = param, dispFuncs = dispFuncs) sim.conds <- colnames(matsim) rownames(matsim) <-paste("tag", 1:dim(matsim)[1],sep="") # extract simulated data from one study simstudy1 <- extractfromsim(matsim,"study1") head(simstudy1$study) simstudy1$pheno
## Load simulation parameters data(param) data(dispFuncs) ## Simulate data matsim <- sim.function(param = param, dispFuncs = dispFuncs) sim.conds <- colnames(matsim) rownames(matsim) <-paste("tag", 1:dim(matsim)[1],sep="") # extract simulated data from one study simstudy1 <- extractfromsim(matsim,"study1") head(simstudy1$study) simstudy1$pheno