Map#
Identify insertion sites for barcoded sequencing library with mbarq map#
Input/Output Files:#
Required Inputs
FASTQ file generated from sequencing barcoded mutant library
Genome FASTA file of the bacteria used to generate the library
Transposon construct structure (encoded as described under the -tn option below)
Suggested inputs
Annotation file in GFF3 format (this will allow mapping insertion sites to genomic features).
Filtering parameter (-l): in our hands, filtering out barcodes supported by fewer than 100 reads produced reliable library annotations. The appropriate threshold depends on sequencing depth and should be tested for each use case.
Report the closest gene (-c; --closest_gene in the option list below): if a GFF file is provided, mbarq by default reports only features overlapping the insertion site. With this option, mbarq also reports the location and distance of the closest downstream feature for barcodes that do not directly overlap any feature.
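The difference between the default overlap-only annotation and the closest-gene report can be pictured with a small sketch. This is not how mbarq implements it (the output list below suggests it relies on bedtools); the `Feature` type and `annotate_site` function are hypothetical, and the sketch assumes a single chromosome, ignores strand, and takes the nearest feature in either direction rather than only downstream.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive, as in GFF3
    end: int    # 1-based, inclusive


def annotate_site(site, features, closest=False):
    """Return (feature_name, distance) for one insertion site.

    Without `closest`, only overlapping features are reported.
    With `closest`, a non-overlapping site gets the nearest feature
    and its distance in nucleotides (assumption: strand is ignored).
    """
    overlapping = [f for f in features if f.start <= site <= f.end]
    if overlapping:
        return overlapping[0].name, 0
    if not closest or not features:
        return None, None

    def distance(f):
        return f.start - site if site < f.start else site - f.end

    nearest = min(features, key=distance)
    return nearest.name, distance(nearest)


# Hypothetical features, for illustration only
features = [Feature("geneA", 100, 500), Feature("geneB", 900, 1500)]
inside = annotate_site(300, features)
intergenic_default = annotate_site(800, features)
intergenic_closest = annotate_site(800, features, closest=True)
print(inside, intergenic_default, intergenic_closest)
```

An intergenic insertion is left unannotated by default and is assigned to the nearest feature only when the closest-gene behaviour is requested.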
Output files
library.annotated.csv: final library map with annotations
library.map.csv: final library map without annotations
library_mapping.log: log file
library.blastn: blast output for each barcode
library.fasta: fasta files of barcodes and host sequences
library.output.bed: bedtools intersection of gff and barcode locations
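The column layout of the CSV outputs is not described here, so a quick way to get oriented is to print the header and the first few rows. The sketch below uses only the Python standard library; `preview_map` is a hypothetical helper, and the column names in the demo data are invented for illustration and may not match the real files.

```python
import csv
import io


def preview_map(handle, n_rows=5):
    """Return (column_names, first n_rows rows) from an open CSV handle."""
    reader = csv.DictReader(handle)
    rows = [row for _, row in zip(range(n_rows), reader)]
    return reader.fieldnames, rows


# Demo on in-memory data; these column names are hypothetical.
demo = io.StringIO(
    "barcode,chr,insertion_site\n"
    "AGTACTTTACTACTACT,chr1,1200\n"
    "TTTACGGATCCATGCAA,chr1,5400\n"
)
header, rows = preview_map(demo)
print(header)
for row in rows:
    print(row)

# With a real file:
# with open("library.map.csv", newline="") as fh:
#     header, rows = preview_map(fh)
```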
Example Usage#
mbarq map -f <library_R1.fastq.gz> -g <host.fasta> -a <host.gff> -l 100 \
-n LibraryName -tn B17N13GTGTATAAGAGACAG
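The -tn value in the command above describes the read layout: a 17-nt barcode (B17), a 13-nt spacer (N13), then the conserved motif, as detailed under the -tn option in All Options. To make the encoding concrete, here is a minimal sketch that turns such a string into a regular expression and applies it to the example read from the option description. It is an illustration only, not mbarq's own parser; `structure_to_regex` is a hypothetical helper and assumes a single barcode and a motif made of A, C, G and T.

```python
import re


def structure_to_regex(structure):
    """Translate an encoding such as B17N13GTGTATAAGAGACAG into a regex."""
    if not re.fullmatch(r"(?:[BN]\d+|[ACGT]+)+", structure):
        raise ValueError(f"unrecognised construct structure: {structure!r}")
    parts = []
    for kind, length, motif in re.findall(r"([BN])(\d+)|([ACGT]+)", structure):
        if kind == "B":    # barcode: captured
            parts.append(f"(?P<barcode>[ACGT]{{{length}}})")
        elif kind == "N":  # spacer: any bases, not captured
            parts.append(f"[ACGTN]{{{length}}}")
        else:              # conserved motif: matched literally
            parts.append(f"(?P<motif>{motif})")
    return re.compile("".join(parts))


# Example read from the -tn description: barcode, spacer, motif, host
read = "AGTACTTTACTACTACT" + "TACCTGACCGTAA" + "GTGTATAAGAGACAG" + "TTACCTGACCGAC"
match = structure_to_regex("B17N13GTGTATAAGAGACAG").search(read)
barcode = match.group("barcode")
host = read[match.end():]  # valid here because the motif is the last element
print(barcode, host)

# Reversed layout: motif first, then spacer, then barcode
read_rev = "GTGTATAAGAGACAG" + "TACCTGACCGTAA" + "AGTACTTTACTACTACT"
barcode_rev = (
    structure_to_regex("GTGTATAAGAGACAGN13B17").search(read_rev).group("barcode")
)
```

The order of the tokens in the string fixes the order of the pieces in the regex, which is why the relative position of barcode and motif matters.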
All Options#
Usage: mbarq map <options>
Options:
  -f, --forward FILE          input file for reads in forward orientation;
                              FASTQ formatted; gz is ok.  [required]
  -g, --genome FILE           reference genome in FASTA format  [required]
  -a, --gff FILE              annotation file in GFF format
  -n, --name STR              unique library name, by default will try to use
                              FASTQ filename
  -tn, --transposon STR       transposon construct structure, consisting of the following:
                              1. barcode length, written as B[# of nt], eg. B17
                              2. conserved sequence motif, usually part of transposons inverted repeat (IR), eg. GTGTATAAGAGACAG
                              3. if there are extra nucleotides between the barcode and
                                 conserved sequence motif, indicate with N[# of nt], eg. N13
                              The default represents the following construct:
                              ----------------------------------------------------------------------------
                              Read      ||AGTACTTTACTACTACT||TACCTGACCGTAA||GTGTATAAGAGACAG||TTACCTGACCGAC
                              ----------||-----------------||-------------||---------------||-------------
                              Components||     barcode     ||   spacer    ||   conserved   ||    host
                                        ||                 ||             ||  motif (IR)   ||
                              ----------||-----------------||-------------||---------------||-------------
                              Encoding  ||       B17       ||     N13     ||GTGTATAAGAGACAG||
                              ----------------------------------------------------------------------------
                              Note: relative position of barcode and conserved sequence motif matters,
                              i.e. if conserved sequence motif comes before the barcode,
                              it should be written as GTGTATAAGAGACAGN13B17.
                              [B17N13GTGTATAAGAGACAG]
  -o, --out_dir DIR           output directory  [.]
  -l, --filter_low_counts INT filter out barcodes supported by [INT] or less
                              reads  [0]
  -ft, --feat_type STR        feature type in the GFF file to be used for
                              annotation, e.g. gene, exon, CDS  [gene]
  --attributes STR[,STR]      Feature attributes to extract from GFF file
                              [ID,Name,locus_tag]
  --closest_gene              for barcodes not directly overlapping a
                              feature, report the closest feature  [False]
  -h, --help                  Show this message and exit.
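The -l option is described as removing barcodes supported by INT or fewer reads, so a barcode with exactly INT reads is dropped. The sketch below shows that boundary on made-up counts; `filter_low_counts` is a hypothetical function written for this illustration, not part of mbarq, and the barcodes and read counts are invented.

```python
from collections import Counter


def filter_low_counts(barcode_counts, threshold):
    """Keep barcodes supported by more than `threshold` reads."""
    return {bc: n for bc, n in barcode_counts.items() if n > threshold}


# Invented read counts per barcode
counts = Counter(
    {
        "AGTACTTTACTACTACT": 250,
        "TTTACGGATCCATGCAA": 100,  # exactly at the threshold: removed
        "CCGGATTACGATTACGA": 3,
    }
)
kept = filter_low_counts(counts, 100)
print(kept)
```

With the default of 0, only barcodes with no supporting reads would be excluded, which in practice means no filtering.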
Re-annotate mapping file#
The library map file library.map.csv can be re-annotated without re-mapping (for example, to use different feature types or attributes) using mbarq annotate-mapped.
Required Inputs
Unannotated barcode file produced by mbarq map
An annotation file in GFF format
Example Usage#
mbarq annotate-mapped -i <library.map.csv> -a <host.gff> -ft gene --attributes ID,Name,locus_tag
All Options#
Usage: mbarq annotate-mapped <options>
Options:
  -i, --barcode_file FILE     unannotated barcode file produced by "map"
  -a, --gff FILE              annotation file in gff format
  -n, --name STR              unique library name, by default will try to use
                              FASTQ filename
  -o, --out_dir DIR           output directory  [.]
  -ft, --feat_type STR        feature type in the gff file to be used for
                              annotation, e.g. gene, exon, CDS  [gene]
  --attributes STR[,STR]      Feature attributes to extract from annotation file
                              [ID,Name,locus_tag]
  --closest_gene              for barcodes not directly overlapping a feature,
                              report the closest feature  [False]
  -h, --help                  Show this message and exit.