Analyze

Identify differentially abundant genes between the control (the inoculum) and treatment conditions with `mbarq analyze`

Input/Output Files

Required Inputs

Count file produced by mbarq merge

barcode	Name	Sample1	Sample2	…
ACCTGGTAG	geneA	500	1000	…
ACCGGGGAA	geneA	100	500	…
CCCGGGAAA	geneB	300	300	…

Sample data file (CSV) in the following format:

sampleID	treatment
Sample1	control
Sample2	treatment1
…	…

Specify the name of the sample data column that indicates treatment with --treatment_column (for the example above, --treatment_column treatment).
Specify the treatment level to use as the control/baseline with --baseline (for the example above, --baseline control).
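Before running the analysis, it can help to confirm that the two input files agree. The following is a minimal sketch, not part of mBARq: it assumes both files are comma-separated, that the first column of the sample data holds the sample IDs, and that every count file column other than the barcode and gene name columns is a sample. The function name check_inputs is hypothetical.

```python
import csv
import io


def check_inputs(count_text, sample_text, treatment_column, baseline, gene_column="Name"):
    """Return a list of problems; an empty list means the two inputs agree."""
    problems = []
    header = next(csv.reader(io.StringIO(count_text)))
    # Assumption: first column is the barcode, gene_column holds gene names,
    # and every remaining column is a sample.
    count_samples = [col for col in header[1:] if col != gene_column]

    reader = csv.DictReader(io.StringIO(sample_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    if treatment_column not in fields:
        return [f"treatment column {treatment_column!r} not found in sample data"]
    if baseline not in {row[treatment_column] for row in rows}:
        problems.append(f"baseline {baseline!r} is not a level of {treatment_column!r}")

    # Assumption: the first sample data column contains the sample IDs.
    sample_ids = {row[fields[0]] for row in rows}
    for sample in count_samples:
        if sample not in sample_ids:
            problems.append(f"sample {sample!r} has no row in the sample data")
    return problems


COUNTS = "barcode,Name,Sample1,Sample2\nACCTGGTAG,geneA,500,1000\nCCCGGGAAA,geneB,300,300\n"
SAMPLES = "sampleID,treatment\nSample1,control\nSample2,treatment1\n"

print(check_inputs(COUNTS, SAMPLES, "treatment", "control"))
```

An empty list means the treatment column exists, the baseline is one of its levels, and every sample in the count file has a row in the sample data.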

Suggested Inputs

We highly recommend adding control strains (i.e. strains with barcodes inserted into fitness-neutral locations) to the barcode library. This greatly facilitates quality control and analysis of the data.
If control strains are present in the library, the control barcodes can be specified with a control file using the --control_file option.
- In the simplest case, the control file contains only the barcode sequences of the control strains (1 barcode per line).
- If different control strains were added at different concentrations, the concentration of each barcode can be specified in the second column.
- If the control strains include different genotypes (e.g., wild type as well as negative control strains), the genotype can be specified in the third column.
- Only wild-type strains will be used for quality control and analysis. The wild-type genotype should be specified as wt, WT, or wildtype.
- The control file should be in CSV format, and contain NO header.

[Required]	[Optional]	[Optional]
ACCTGGGTT	0.005	wt
CCGGAAGGT	0.001	wt
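The layout described above (no header, barcode first, optional concentration second, optional genotype third) can be written with Python's csv module. This is a sketch under the assumption that a genotype is only given together with a concentration, since the genotype belongs in the third column. The helper name write_control_file is hypothetical and not part of mBARq.

```python
import csv
import io


def write_control_file(handle, controls):
    """Write (barcode, concentration, genotype) tuples as a header-less CSV.

    concentration and genotype may be None; a genotype without a
    concentration is rejected because it would land in the wrong column.
    """
    writer = csv.writer(handle, lineterminator="\n")
    for barcode, concentration, genotype in controls:
        if genotype is not None and concentration is None:
            raise ValueError(f"{barcode}: genotype given without a concentration")
        row = [barcode]
        if concentration is not None:
            row.append(concentration)
        if genotype is not None:
            row.append(genotype)
        writer.writerow(row)


buffer = io.StringIO()
write_control_file(buffer, [("ACCTGGGTT", 0.005, "wt"), ("CCGGAAGGT", 0.001, "wt")])
print(buffer.getvalue(), end="")
```

Passing an open file handle instead of the in-memory buffer writes the same rows to disk.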

Output Files

mbarq_merged_counts_batch.txt: Information on sample and batch
mbarq_merged_counts.correlations.csv: Correlation for each batch
mbarq_merged_counts_rra_results.csv: Information for each gene about number of barcodes, LFC and false discovery rate
mbarq_merged_counts_barcodes_results.csv: Information for each barcode about LFC and significance scores

For each comparison:

mbarq_merged_counts_cond1_vs_cond0.gene_summary.txt: Summary for each gene
mbarq_merged_counts_cond1_vs_cond0.report.Rmd: MAGeCK Comparison Report
mbarq_merged_counts_cond1_vs_cond0.sgrna_summary.txt: Summary for each sgRNA

Output Format Options

The final results can be output in two formats:

Long format (default): Each row represents a gene-treatment combination. This format includes a ‘contrast’ column indicating the treatment condition.
Wide format: Each row represents a gene, with separate columns for each treatment condition (e.g., ‘LFC_d1’, ‘LFC_d2’). This format is useful for downstream analysis and visualization.

Use the --format option to specify the desired output format.

Format Examples:

Long format (default):

Name	number_of_barcodes	LFC	neg_selection_fdr	pos_selection_fdr	contrast
geneA	3	1.2	0.05	0.9	d1
geneA	3	1.5	0.03	0.8	d2
geneB	2	-0.8	0.8	0.1	d1
geneB	2	-0.9	0.7	0.2	d2

Wide format:

Name	number_of_barcodes	LFC_d1	LFC_d2	neg_selection_fdr_d1	neg_selection_fdr_d2	pos_selection_fdr_d1	pos_selection_fdr_d2
geneA	3	1.2	1.5	0.05	0.03	0.9	0.8
geneB	2	-0.8	-0.9	0.8	0.7	0.1	0.2
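The relationship between the two layouts can be illustrated with a small standard-library sketch that reshapes long-format rows into wide-format rows. It only mirrors the column naming shown in the examples above (value column, underscore, contrast). It is not how mBARq produces its output, and the function name long_to_wide is hypothetical.

```python
import csv
import io

LONG = """Name,number_of_barcodes,LFC,neg_selection_fdr,pos_selection_fdr,contrast
geneA,3,1.2,0.05,0.9,d1
geneA,3,1.5,0.03,0.8,d2
geneB,2,-0.8,0.8,0.1,d1
geneB,2,-0.9,0.7,0.2,d2
"""


def long_to_wide(rows,
                 id_cols=("Name", "number_of_barcodes"),
                 value_cols=("LFC", "neg_selection_fdr", "pos_selection_fdr"),
                 contrast_col="contrast"):
    """Collapse one row per gene-contrast pair into one row per gene."""
    wide = {}
    for row in rows:
        key = tuple(row[col] for col in id_cols)
        # Create the gene's record on first sight, seeded with its id columns.
        record = wide.setdefault(key, dict(zip(id_cols, key)))
        for col in value_cols:
            record[f"{col}_{row[contrast_col]}"] = row[col]
    return list(wide.values())


wide_rows = long_to_wide(csv.DictReader(io.StringIO(LONG)))
for record in wide_rows:
    print(record)
```

Values stay as strings because they come straight from the CSV reader, and the order of the value columns within each record follows the input rows rather than the grouping shown in the wide example.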

Example Usage

# Basic usage with long format output (default)
mbarq analyze -i <count_file> -s <sample_data_file> -c <control_file> \
--treatment_column treatment --baseline control

# Output results in wide format
mbarq analyze -i <count_file> -s <sample_data_file> -c <control_file> \
--treatment_column treatment --baseline control --format wide
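When the analysis is launched from a script, the same command can be assembled as an argument list. The sketch below uses only the flags shown in the examples above. The file names are placeholders and the helper build_analyze_command is hypothetical.

```python
import shlex


def build_analyze_command(count_file, sample_data, treatment_column, baseline,
                          control_file=None, output_format=None):
    """Assemble an `mbarq analyze` argument list from the documented flags."""
    cmd = ["mbarq", "analyze", "-i", count_file, "-s", sample_data]
    if control_file:
        cmd += ["-c", control_file]
    cmd += ["--treatment_column", treatment_column, "--baseline", baseline]
    if output_format:
        cmd += ["--format", output_format]
    return cmd


# Placeholder file names; substitute the real paths.
cmd = build_analyze_command("counts.csv", "samples.csv", "treatment", "control",
                            control_file="controls.csv", output_format="wide")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where mbarq is installed.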

All Options

mbarq analyze
Usage: mbarq analyze <options>

Options:
  -i, --count_file FILE    CSV file produced by `mbarq merge`
  -s, --sample_data FILE   CSV file containing sample data
  -c, --control_file FILE  control barcode file, see documentation for proper
                           format
  -g, --gene_name STR      column in the count file containing gene
                           identifiers [Name]
  --treatment_column STR   column in sample data file indicating treatment
  --baseline STR           treatment level to use as control/baseline, ex.
                           day0
  -n, --name STR           experiment name, by default will try to use count
                           file name
  -o, --out_dir DIR        Output directory
  --norm_method STR        mageck normalization method: median, total, or 
                           control. By default will use control barcodes if 
                           provided, otherwise median
  --filter_low_counts INT  filter out barcodes with < N reads across all 
                           conditions [0]
  -f, --format STR         output file format: long or wide [long]
  -h, --help               Show this message and exit.

mBARq 1.0 documentation
