# Map ## Identify insertion sites for barcoded sequencing library with `mbarq map` ### Input/Output Files: **Required Inputs** - FASTQ file generated from sequencing barcoded mutant library - Genome FASTA file of the bacteria used to generate the library - Transposon construct structure **add diagram**. **Suggested inputs** - Annotation file in GFF3 format (this will allow mapping insertion sites to genomic features). - Filtering parameter (``-l``, in our hands, filtering barcodes supported by less than 100 reads produced reliable library annotations. This of course is dependent on the depth of the sequencing, and should be tested for each use case). - Report the closest gene (``-c``). If a ``gff`` file is provided, by default, ``mbarq`` will only report features overlapping the insertion site. In addition, ``mbarq`` can report the location and distance of the closest downstream feature for barcodes that do not directly overlap any features. **Output files** - ``library.annotated.csv``: final library map with annotations - ``library.map.csv``: final library map without annotations - ``library_mapping.log``: log file - ``library.blastn``: blast output for each barcode - ``library.fasta``: fasta files of barcodes and host sequences - ``library.output.bed``: bedtools intersection of gff and barcode locations ### Example Usage ```shell mbarq map -f -g -a -l 100 \ -n LibraryName -tn B17N13GTGTATAAGAGACAG ``` ### All Options ``` Usage: mbarq map Options: -f, --forward FILE input file for reads in forward orientation; FASTQ formatted; gz is ok. [required] -g, --genome FILE reference genome in FASTA format [required] -a, --gff FILE annotation file in GFF format -n, --name STR unique library name, by default will try to use FASTQ filename -tn, --transposon STR transposon construct structure, consisting of the following: 1. barcode length, written as B[# of nt], eg. B17 2. conserved sequence motif, usually part of transposons inverted repeat (IR), eg. GTGTATAAGAGACAG 3. if there are extra nucleotides between the barcode and conserved sequence motif, indicate with N[# of nt], eg. N13 The default represents the following construct: ---------------------------------------------------------------------------- Read ||AGTACTTTACTACTACT||TACCTGACCGTAA||GTGTATAAGAGACAG||TTACCTGACCGAC ----------||-----------------||-------------||---------------||------------- Components|| barcode || spacer || conserved || host || || || motif (IR) || ----------||-----------------||-------------||---------------||------------- Encoding || B17 || N13 ||GTGTATAAGAGACAG|| ---------------------------------------------------------------------------- Note: relative position of barcode and conserved sequence motif matters, i.e. if conserved sequence motif comes before the barcode, it should be written as GTGTATAAGAGACAGN13B17. [B17N13GTGTATAAGAGACAG] -o, --out_dir DIR output directory [.] -l, --filter_low_counts INT filter out barcodes supported by [INT] or less reads [0] -ft, --feat_type STR feature type in the GFF file to be used for annotation, e.g. gene, exon, CDS [gene] --attributes STR[,STR] Feature attributes to extract from GFF file [ID,Name,locus_tag] --closest_gene for barcodes not directly overlapping a feature, report the closest feature [False] -h, --help Show this message and exit. ``` ## Re-annotate mapping file Library map file `library.map.csv` can be re-annotated (for example, if you would like to use different feature types or attributes) without re-mapping using `mbarq annotate-mapped`. **Required Inputs** - Unannotated barcode file produced by ``mbarq map`` - An annotation file in gff format ### Example Usage ```shell mbarq annotate-mapped -i -a -ft gene --attributes ID,Name,locus_tag ``` ### All Options ``` Usage: mbarq annotate-mapped Options: -i, --barcode_file FILE unannotated barcode file produced by "map" -a, --gff FILE annotation file in gff format -n, --name STR unique library name, by default will try to use FASTQ filename -o, --out_dir DIR output directory [.] -ft, --feat_type STR feature type in the gff file to be used for annotation, e.g. gene, exon, CDS [gene] --attributes STR[,STR] Feature attributes to extract from annotation file [ID,Name,locus_tag] --closest_gene for barcodes not directly overlapping a feature, report the closest feature [False] -h, --help Show this message and exit. ```