nf-core/differentialabundance
Differential abundance analysis for feature/observation matrices from platforms such as RNA-seq
Define where the pipeline should find input data and save output data.
A string to identify results in the output directory
string
study
A string identifying the technology used to produce the data
string
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.(csv|tsv)$
A CSV file describing sample contrasts
string
^\S+\.(csv|tsv)$
A YAML file describing sample contrasts
string
^\S+\.(yaml|yml)$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Type of abundance measure used, platform-dependent
string
counts
To how many digits should numeric output in different modules be rounded? If -1 or null, will not round.
integer
4
Ways of providing your abundance values
TSV/CSV-format abundance matrix
string
^\S+\.(tsv|csv)$|\S*proteinGroups\.txt$
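The file name patterns listed above can be checked locally before launching a run. The following is a minimal sketch, assuming the patterns are applied with unanchored "search" semantics (as JSON Schema validators generally do); the helper name `matches` is hypothetical and not part of the pipeline.

```python
import re

# Patterns copied verbatim from the parameter entries above.
MATRIX_PATTERN = re.compile(r"^\S+\.(tsv|csv)$|\S*proteinGroups\.txt$")
SHEET_PATTERN = re.compile(r"^\S+\.(csv|tsv)$")


def matches(pattern, path):
    """Return True if `pattern` is found anywhere in `path`.

    Assumption: search semantics, so only the explicit ^ and $ anchors
    in each alternative restrict where a match may start or end.
    """
    return pattern.search(path) is not None


if __name__ == "__main__":
    for candidate in ["counts.tsv", "my counts.tsv", "results/proteinGroups.txt"]:
        print(candidate, matches(MATRIX_PATTERN, candidate))
```

Note that the first alternative rejects any whitespace in the path, while the `proteinGroups.txt` alternative is only anchored at the end of the string.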
(RNA-seq only): optional transcript length matrix with the same samples and genes as the abundance matrix
string
Alternative to matrix: a compressed CEL files archive such as often found in GEO
string
Use SOFT files from GEO by providing the GSE study identifier
string
Column in the samples sheet to be used as the primary sample identifier
string
sample
Type of observation
string
sample
Column in the sample sheet to be used as the display identifier for observations. If unset, the value of --observations_id_col is used.
string
Options related to features
Feature ID attribute in the abundance table as well as in the GTF file (e.g. the gene_id field)
string
gene_id
Feature name attribute in the abundance table as well as in the GTF file (e.g. the gene symbol field)
string
gene_name
Type of feature we have, often ‘gene’
string
gene
When set, use the control features in scaling/normalisation
boolean
A text file listing technical features (e.g. spikes)
string
Comma-separated string, specifies feature metadata columns to be used for exploratory analysis, platform-specific
string
gene_id,gene_name,gene_biotype
This parameter allows you to supply your own feature annotations. These can often be automatically derived from the GTF used upstream for RNA-seq, or from the Bioconductor annotation package (for affy arrays).
string
^\S+\.(csv|tsv)$
Options related to the use of paramsheet
Name of the paramset (as specified in paramsheet) to run
string
Path to a paramsheet file
string
^\S+\.csv$
Options for processing of affy arrays with justRMA()
Column of the sample sheet containing the Affymetrix CEL file name
string
file
logical value. If TRUE, then background correct using RMA background correction.
boolean
true
integer value indicating which RMA background to use
integer
2
logical value. If TRUE, then works on the PM matrix in place as much as possible, good for large datasets.
boolean
Used to specify the name of an alternative cdf package. If set to NULL, then the usual cdf package based on Affymetrix’ mappings will be used.
string
logical value. If TRUE, a matrix of probe annotations will be derived.
boolean
true
Should the spots marked as ‘MASKS’ be set to NA?
boolean
Should the spots marked as ‘OUTLIERS’ be set to NA?
boolean
If TRUE, overrides the values of rm.mask and rm.outliers.
boolean
Genome annotation file in GTF format
string
^\S+\.gtf(\.gz)?
Where a GTF file is supplied, which feature type to use
string
transcript
Where a GTF file is supplied, which field should go first in the converted output table
string
gene_id
Options for processing of proteomics MaxQuant tables with the Proteus R package
Prefix of the column names of the MaxQuant proteingroups table in which the intensity values are saved; the prefix has to be followed by the sample names that are also found in the samplesheet. Default: ‘LFQ intensity’; will search for both the prefix as entered and the prefix followed by one whitespace.
string
LFQ intensity
Normalization function to use on the MaxQuant intensities.
string
Which method to use for plotting sample distributions of the MaxQuant intensities; one of ‘violin’, ‘dist’, ‘box’.
string
Should a loess line be added to the plot of mean-variance relationship of the conditions? Default: true.
boolean
true
Valid R palette name
string
Set1
Options related to filtering upstream of differential analysis
Minimum abundance value
number
1
Minimum observations that must pass the threshold to retain the row/feature (e.g. gene).
number
1
A minimum proportion of observations, given as a number between 0 and 1, that must pass the threshold. Overrides minimum_samples
number
An optional grouping variable to be used to calculate a min_samples value
string
A minimum proportion of observations, given as a number between 0 and 1, that must have a value (not NA) to retain the row/feature (e.g. gene).
number
0.5
Minimum observations that must have a value (not NA) to retain the row/feature (e.g. gene). Overrides filtering_min_proportion_not_na.
number
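The interaction of the filtering options above can be illustrated per feature (row). This is a minimal sketch and not the pipeline's implementation: the function and argument names are hypothetical, the comparisons are assumed to be inclusive (>=), and the optional grouping variable is left out. The defaults mirror the values listed above.

```python
import math


def keep_feature(values, min_abundance=1, min_samples=1, min_proportion=None,
                 min_proportion_not_na=0.5, min_samples_not_na=None):
    """Decide whether one row of the abundance matrix is retained.

    Assumptions (not confirmed by the parameter descriptions):
    thresholds are inclusive, and missing values are None or NaN.
    """
    n = len(values)
    present = [v for v in values if v is not None and not math.isnan(v)]

    # Missing-value filter: an explicit count overrides the proportion.
    if min_samples_not_na is not None:
        required_present = min_samples_not_na
    else:
        required_present = min_proportion_not_na * n
    if len(present) < required_present:
        return False

    # Abundance filter: a proportion, when given, overrides the count.
    if min_proportion is not None:
        required_passing = min_proportion * n
    else:
        required_passing = min_samples
    passing = sum(1 for v in present if v >= min_abundance)
    return passing >= required_passing


if __name__ == "__main__":
    print(keep_feature([0, 0, 5, 0]))                      # one sample passes
    print(keep_feature([0, 0, 5, 0], min_proportion=0.5))  # half must pass
```

With the defaults, a feature is kept when at least half of its observations have a value and at least one observation reaches an abundance of 1.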
Set to run IMMUNEDECONV
boolean
Set method to run with IMMUNEDECONV. Available options can be found in ‘https://omnideconv.org/immunedeconv/articles/immunedeconv.html’
string
quantiseq
Set function to run with IMMUNEDECONV. Available options can be found in ‘https://omnideconv.org/immunedeconv/articles/immunedeconv.html’
string
deconvolute
Options related to data exploration
Clustering method used in dendrogram creation
string
ward.D2
Correlation method used in dendrogram creation
string
spearman
Number of features selected before certain exploratory analyses. If -1, will use all features.
integer
500
Length of the whiskers in boxplots as multiple of IQR. Defaults to 1.5.
number
1.5
Threshold on MAD score for outlier identification
integer
-5
How should the main grouping variable be selected? ‘auto_pca’, ‘contrasts’, or a valid column name from the observations table.
string
auto_pca
Specifies assay names to be used for matrices, platform-specific.
string
raw,normalised,variance_stabilised
Specifies final assay to be used for exploratory analysis, platform-specific
string
variance_stabilised
Assays for which to compute log2 values during exploratory analysis. Not necessary for MaxQuant data, as this is controlled by the pipeline.
string
raw,normalised
Valid R palette name
string
Set1
Options related to differential operations
Differential analysis method
string
Advanced option: the suffix associated with tabular differential results tables. By default, the appropriate suffix for the study_type is used.
string
The feature identifier column in differential results tables
string
gene_id
The fold change column in differential results tables
string
log2FoldChange
The p value column in differential results tables
string
pvalue
The q value column in differential results tables.
string
padj
Minimum fold change used to calculate differential feature numbers
number
2
Maximum p value used to calculate differential feature numbers
number
1
Maximum q value used to calculate differential feature numbers
number
0.05
Where a features file (GTF) has been provided, which attribute to use to name features
string
gene_name
Indicate whether or not fold changes are on the log scale (default is to assume they are)
boolean
true
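The threshold parameters above determine which features are counted as differential. The sketch below shows one plausible reading and is not taken from the pipeline: it assumes the fold change threshold is given on the linear scale and is log2-transformed when fold changes are logged, that the test is two-sided, and that rows with missing values are not counted. The function name is hypothetical; column names and defaults are those listed above.

```python
import math


def is_differential(row, fc_threshold=2, p_threshold=1, q_threshold=0.05,
                    fc_is_logged=True, fc_col="log2FoldChange",
                    p_col="pvalue", q_col="padj"):
    """Return True if a results row passes all three thresholds."""
    fc, p, q = row.get(fc_col), row.get(p_col), row.get(q_col)
    if fc is None or p is None or q is None:
        return False  # assumption: incomplete rows are never counted

    # Work on the log2 scale so up- and down-regulation are symmetric.
    log_fc = fc if fc_is_logged else math.log2(fc)
    return (abs(log_fc) >= math.log2(fc_threshold)
            and p <= p_threshold
            and q <= q_threshold)


if __name__ == "__main__":
    results = [
        {"gene_id": "g1", "log2FoldChange": 1.5, "pvalue": 0.001, "padj": 0.01},
        {"gene_id": "g2", "log2FoldChange": 0.5, "pvalue": 0.001, "padj": 0.01},
        {"gene_id": "g3", "log2FoldChange": -2.0, "pvalue": 0.01, "padj": 0.2},
    ]
    print(sum(is_differential(r) for r in results))
```

Under these assumptions the default fold change threshold of 2 corresponds to an absolute log2 fold change of 1, and the default maximum p value of 1 leaves the q value as the effective significance filter.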
Valid R palette name
string
Set1
In differential analysis (DESeq2 or limma), subset to the contrast samples before modelling variance?
boolean
‘test’ parameter passed to DESeq()
string
‘fitType’ parameter passed to DESeq()
string
‘sfType’ parameter passed to DESeq()
string
‘minReplicatesForReplace’ parameter passed to DESeq()
integer
7
‘useT’ parameter passed to DESeq2
boolean
‘independentFiltering’ parameter passed to results()
boolean
true
‘lfcThreshold’ parameter passed to results()
integer
‘altHypothesis’ parameter passed to results()
string
greaterAbs
‘pAdjustMethod’ parameter passed to results()
string
BH
‘alpha’ parameter passed to results()
number
0.1
‘minmu’ parameter passed to results()
number
0.5
variance stabilisation method to use when making a variance stabilised matrix
string
Shrink fold changes in results?
boolean
true
Number of cores
integer
1
‘blind’ parameter for rlog() and/or vst()
boolean
true
‘nsub’ parameter passed to vst()
integer
1000
passed to lmFit(), positive integer giving the number of times each distinct probe is printed on each array.
number
passed to lmFit(), positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows.
string
Sample sheet column to be used to derive a vector or factor specifying a blocking variable on the arrays
string
passed to lmFit(), the inter-duplicate or inter-technical replicate correlation
string
passed to lmFit(), the fitting method
string
passed to eBayes(), a numeric value between 0 and 1, assumed proportion of genes which are differentially expressed
number
0.01
passed to eBayes(), logical, should an intensity-dependent trend be allowed for the prior variance?
boolean
passed to eBayes(), logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?
boolean
passed to eBayes(), comma-separated string of two values, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes
string
0.1,4
passed to eBayes(), comma-separated string of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.
string
0.05,0.1
passed to topTable(), minimum absolute log2-fold-change required
integer
passed to topTable(), logical, should 95% confidence intervals be output for logFC? Alternatively, can take a numeric value between zero and one specifying the confidence level required.
boolean
passed to topTable(), method used to adjust the p-values for multiple testing.
string
cutoff value for adjusted p-values. Only genes with lower p-values are listed.
number
1
Turns on and off usage of voom normalisation in the Limma module.
boolean
Functional analysis method
string
Gene sets in GMT or GMX format. For GSEA, multiple comma-separated input files in either format are possible. For gprofiler2, a single file in GMT format is possible; this has lowest priority and will be overridden by --gprofiler2_token and --gprofiler2_organism.
string
Permutation type
string
Number of permutations
integer
1000
Enrichment statistic
string
Metric for ranking genes
string
Gene list sorting mode
string
Gene list ordering mode
string
Max size: exclude larger sets
integer
500
Min size: exclude smaller sets
integer
15
Normalisation mode
string
Randomization mode
string
Make detailed geneset report?
boolean
true
Use median for class metrics
boolean
Number of markers
integer
100
Plot graphs for the top sets of each phenotype
integer
20
Seed for permutation
string
timestamp
Save random ranked lists
boolean
Make a zipped file with all reports
boolean
Short name of the organism being analysed, e.g. hsapiens for Homo sapiens.
string
Should only significant enrichment results be considered?
boolean
true
Should underrepresentation be measured instead of overrepresentation?
boolean
The method that should be used for multiple testing correction.
string
On which source databases to run the gprofiler query
string
Whether to include evcodes in the results.
boolean
Maximum q value used for significance testing.
number
0.05
Token that should be used as a query.
string
Path to CSV/TSV/TXT file that should be used as a background for the query; alternatively, ‘auto’ (default) or ‘false’.
string
auto
^\S+\.(csv|tsv|txt)$|auto|false
Which column to use as gene IDs in the background matrix.
string
How to calculate the statistical domain size.
string
How many genes must be differentially expressed in a pathway for it to be considered enriched? Default 1.
integer
1
Valid R palette name
string
Blues
Should a Shiny app be built?
boolean
true
Should the app be deployed to shinyapps.io?
boolean
Your shinyapps.io account name
string
The name of the app to push to in your shinyapps.io account
string
Rmd report template from which to create the pipeline report
string
${projectDir}/assets/differentialabundance_report.Rmd
^\S+\.Rmd$
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
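To see which addresses the email pattern above accepts, it can be tried directly. This is a small sketch using Python's `re` module; the example addresses are made up for illustration and the helper name is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter entry above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted(address):
    """Return True if the address satisfies the pattern."""
    return EMAIL_PATTERN.search(address) is not None


if __name__ == "__main__":
    for address in ["user.name@example.co.uk", "user@example.museum", "user+tag@example.org"]:
        print(address, is_accepted(address))
```

The pattern requires a final label of two to five letters and does not allow a plus sign in the local part, so some otherwise valid addresses are rejected.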
A logo to display in the report instead of the generic pipeline logo
string
${projectDir}/docs/images/nf-core-differentialabundance_logo_light.png
CSS to use to style the output, in lieu of the default nf-core styling
string
${projectDir}/assets/nf-core_style.css
A markdown file containing citations to include in the final report
string
${projectDir}/CITATIONS.md
A title for reporting outputs
string
An author for reporting outputs
string
Semicolon-separated string of contributor info that should be listed in the report.
string
A description for reporting outputs
string
Whether to generate a scree plot in the report
boolean
true
Reference genome related files and options required for the workflow.
Name of iGenomes reference.
string
Do not load the iGenomes reference config.
boolean
The base path to the igenomes reference files
string
s3://ngi-igenomes/igenomes/
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
boolean
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Boolean whether to validate parameters against the schema at runtime
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string