Funannotate Commands
A description of all funannotate commands.
Funannotate wrapper script
Funannotate is a series of Python scripts that are launched from a Python wrapper script. Each command has a help menu, which you can print to the terminal by issuing the command without any arguments. For example, running funannotate with no arguments yields the following.
$ funannotate
Usage: funannotate <command> <arguments>
version: 1.8.14
Description: Funannotate is a genome prediction, annotation, and comparison pipeline.
Commands:
clean Find/remove small repetitive contigs
sort Sort by size and rename contig headers
mask Repeatmask genome assembly
train RNA-seq mediated training of Augustus/GeneMark
predict Run gene prediction pipeline
fix Fix annotation errors (generate new GenBank file)
update RNA-seq/PASA mediated gene model refinement
remote Partial functional annotation using remote servers
iprscan InterProScan5 search (Docker or local)
annotate Assign functional annotation to gene predictions
compare Compare funannotated genomes
util Format conversion and misc utilities
setup Setup/Install databases
test Download/Run funannotate installation tests
check Check Python, Perl, and External dependencies [--show-versions]
species list pre-trained Augustus species
database Manage databases
outgroups Manage outgroups for funannotate compare
Written by Jon Palmer (2016-2022) nextgenusfs@gmail.com
Preparing Genome for annotation
funannotate clean
This script “cleans” an assembly by looking for duplicated contigs. It first sorts the contigs by size, then, starting with the shortest contig, runs a “leave one out” alignment using minimap2 to determine whether the contig is duplicated elsewhere. The script is meant to be run on a haploid genome; it has not been tested as a method to haploidize a polyploid assembly.
Usage: funannotate clean <arguments>
version: 1.8.14
Description: The script sorts contigs by size, starting with shortest contigs it uses minimap2
to find contigs duplicated elsewhere, and then removes duplicated contigs.
Arguments:
-i, --input Multi-fasta genome file (Required)
-o, --out Cleaned multi-fasta output file (Required)
-p, --pident Percent identity of overlap. Default = 95
-c, --cov Percent coverage of overlap. Default = 95
-m, --minlen Minimum length of contig to keep. Default = 500
--exhaustive Test every contig. Default is to stop at N50 value.
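The help text says that, without --exhaustive, testing stops at the N50 value. The following is a minimal sketch of how an N50 can be computed and how a shortest-first candidate list could be built from it. The function names are hypothetical, and the reading that only contigs shorter than the N50 are tested is an assumption; the actual funannotate logic may differ.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def candidates_shortest_first(contigs, minlen=500, exhaustive=False):
    """Order contigs for duplicate testing, shortest first.

    contigs: dict mapping contig name -> length.
    Assumption (not confirmed by the help text): without --exhaustive,
    only contigs shorter than the N50 are tested. Contigs below minlen
    are dropped outright and so are never tested.
    """
    kept = {name: size for name, size in contigs.items() if size >= minlen}
    cutoff = n50(list(kept.values()))
    ordered = sorted(kept, key=lambda name: kept[name])
    if exhaustive:
        return ordered
    return [name for name in ordered if kept[name] < cutoff]


contigs = {"a": 8000, "b": 4000, "c": 3000, "d": 2000, "e": 300}
print(n50(list(contigs.values())))         # 4000
print(candidates_shortest_first(contigs))  # ['d', 'c']
```

Each candidate would then be aligned against the remaining contigs and removed if the overlap meets the --pident and --cov thresholds.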
funannotate sort
Simple script to sort and rename a genome assembly. Often assemblers output contig/scaffold names that are incompatible with NCBI submission rules. Use this script to rename and/or drop scaffolds that are shorter than a minimum length.
Usage: funannotate sort <arguments>
version: 1.8.14
Description: This script sorts the input contigs by size (longest->shortest) and then relabels
the contigs with a simple name (e.g. scaffold_1). Augustus can have problems with
some complicated contig names.
Arguments:
-i, --input Multi-fasta genome file. (Required)
-o, --out Sorted by size and relabeled output file. (Required)
-b, --base Base name to relabel contigs. Default: scaffold
--minlen Shorter contigs are discarded. Default: 0
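The behaviour described above (sort longest to shortest, relabel with a simple base name, discard short contigs) can be sketched in a few lines of Python. This is an illustration of the idea, not the funannotate source; the function name and the in-memory record format are assumptions.

```python
def sort_and_rename(records, base="scaffold", minlen=0):
    """Sort contigs longest-first and give them simple names.

    records: list of (header, sequence) tuples.
    Contigs shorter than minlen are discarded; the rest are renamed
    base_1, base_2, ... in order of decreasing length.
    """
    kept = [(header, seq) for header, seq in records if len(seq) >= minlen]
    kept.sort(key=lambda rec: len(rec[1]), reverse=True)
    return [(f"{base}_{i}", seq) for i, (_, seq) in enumerate(kept, start=1)]


records = [
    ("NODE_7_length_4_cov_2.5", "ACGT"),
    ("NODE_2_length_8_cov_9.1", "ACGTACGT"),
    ("NODE_9_length_2_cov_1.0", "AC"),
]
for name, seq in sort_and_rename(records, minlen=3):
    print(f">{name}\n{seq}")
```

The short, predictable names also stay well within the --header_length limit that funannotate predict enforces by default.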
funannotate species
This function will output the current trained species in Augustus.
$ funannotate species
Species Augustus GeneMark Snap GlimmerHMM CodingQuarry Date
E_coli_K12 augustus pre-trained None None None None 2019-10-24
elegans augustus pre-trained None None None None 2019-10-24
awesome_testicus augustus pre-trained None None None None 2019-10-24
thermoanaerobacter_tengcongensis augustus pre-trained None None None None 2019-10-24
pfalciparum augustus pre-trained None None None None 2019-10-24
s_pneumoniae augustus pre-trained None None None None 2019-10-24
culex augustus pre-trained None None None None 2019-10-24
bombus_impatiens1 augustus pre-trained None None None None 2019-10-24
cryptococcus augustus pre-trained None None None None 2019-10-24
histoplasma augustus pre-trained None None None None 2019-10-24
neurospora_crassa augustus pre-trained None None None None 2019-10-24
schistosoma augustus pre-trained None None None None 2019-10-24
schistosoma augustus pre-trained None None None None 2019-10-24
pichia_stipitis augustus pre-trained None None None None 2019-10-24
candida_tropicalis augustus pre-trained None None None None 2019-10-24
histoplasma_capsulatum augustus pre-trained None None None None 2019-10-24
honeybee1 augustus pre-trained None None None None 2019-10-24
elephant_shark augustus pre-trained None None None None 2019-10-24
cryptococcus_neoformans_neoformans_JEC21 augustus pre-trained None None None None 2019-10-24
coprinus augustus pre-trained None None None None 2019-10-24
chlamy2011 augustus pre-trained None None None None 2019-10-24
verticillium_longisporum1 augustus pre-trained None None None None 2019-10-24
arabidopsis augustus pre-trained None None None None 2019-10-24
galdieria augustus pre-trained None None None None 2019-10-24
rice augustus pre-trained None None None None 2019-10-24
fly augustus pre-trained None None None None 2019-10-24
adorsata augustus pre-trained None None None None 2019-10-24
c_elegans_trsk augustus pre-trained None None None None 2019-10-24
pseudogymnoascus_destructans_20631-21 augustus pre-trained None None None None 2019-10-24
parasteatoda augustus pre-trained None None None None 2019-10-24
saccharomyces_cerivisiae_1234 augustus pre-trained None None None None 2019-10-24
template_prokaryotic augustus pre-trained None None None None 2019-10-24
s_aureus augustus pre-trained None None None None 2019-10-24
testicus_genome augustus pre-trained None None None None 2019-10-24
chaetomium_globosum augustus pre-trained None None None None 2019-10-24
caenorhabditis augustus pre-trained None None None None 2019-10-24
rhizopus_oryzae augustus pre-trained None None None None 2019-10-24
rhodnius augustus pre-trained None None None None 2019-10-24
lodderomyces_elongisporus augustus pre-trained None None None None 2019-10-24
tetrahymena augustus pre-trained None None None None 2019-10-24
coyote_tobacco augustus pre-trained None None None None 2019-10-24
chlamydomonas augustus pre-trained None None None None 2019-10-24
b_pseudomallei augustus pre-trained None None None None 2019-10-24
pneumocystis augustus pre-trained None None None None 2019-10-24
eremothecium_gossypii augustus pre-trained None None None None 2019-10-24
phanerochaete_chrysosporium augustus pre-trained None None None None 2019-10-24
fusarium augustus pre-trained None None None None 2019-10-24
cryptococcus_neoformans_gattii augustus pre-trained None None None None 2019-10-24
seahare augustus pre-trained None None None None 2019-10-24
ustilago_maydis augustus pre-trained None None None None 2019-10-24
lamprey augustus pre-trained None None None None 2019-10-24
nasonia augustus pre-trained None None None None 2019-10-24
tribolium2012 augustus pre-trained None None None None 2019-10-24
aspergillus_nidulans augustus pre-trained None None None None 2019-10-24
cryptococcus_neoformans_neoformans_B augustus pre-trained None None None None 2019-10-24
verticillium_albo_atrum1 augustus pre-trained None None None None 2019-10-24
wheat augustus pre-trained None None None None 2019-10-24
test_genome augustus pre-trained None None None None 2019-10-24
schizosaccharomyces_pombe augustus pre-trained None None None None 2019-10-24
amphimedon augustus pre-trained None None None None 2019-10-24
saccharomyces_cerevisiae_rm11-1a_1 augustus pre-trained None None None None 2019-10-24
aspergillus_fumigatus augustus pre-trained None None None None 2019-10-24
aedes augustus pre-trained None None None None 2019-10-24
aspergillus_terreus augustus pre-trained None None None None 2019-10-24
rubicus_maboogago augustus pre-trained None None None None 2019-10-24
awe_test augustus pre-trained None None None None 2019-10-24
neurospora augustus pre-trained None None None None 2019-10-24
ancylostoma_ceylanicum augustus pre-trained None None None None 2019-10-24
saccharomyces_cerevisiae_S288C augustus pre-trained None None None None 2019-10-24
yarrowia_lipolytica augustus pre-trained None None None None 2019-10-24
Conidiobolus_coronatus augustus pre-trained None None None None 2019-10-24
rubeus_macgubis augustus pre-trained None None None None 2019-10-24
botrytis_cinerea augustus pre-trained None None None None 2019-10-24
candida_guilliermondii augustus pre-trained None None None None 2019-10-24
anidulans augustus pre-trained None None None None 2019-10-24
trichinella augustus pre-trained None None None None 2019-10-24
candida_albicans augustus pre-trained None None None None 2019-10-24
aspergillus_oryzae augustus pre-trained None None None None 2019-10-24
fusarium_graminearum augustus pre-trained None None None None 2019-10-24
chlorella augustus pre-trained None None None None 2019-10-24
saccharomyces augustus pre-trained None None None None 2019-10-24
chicken augustus pre-trained None None None None 2019-10-24
magnaporthe_grisea augustus pre-trained None None None None 2019-10-24
bombus_terrestris2 augustus pre-trained None None None None 2019-10-24
laccaria_bicolor augustus pre-trained None None None None 2019-10-24
cacao augustus pre-trained None None None None 2019-10-24
generic augustus pre-trained None None None None 2019-10-24
maize5 augustus pre-trained None None None None 2019-10-24
debaryomyces_hansenii augustus pre-trained None None None None 2019-10-24
heliconius_melpomene1 augustus pre-trained None None None None 2019-10-24
toxoplasma augustus pre-trained None None None None 2019-10-24
kluyveromyces_lactis augustus pre-trained None None None None 2019-10-24
camponotus_floridanus augustus pre-trained None None None None 2019-10-24
coprinus_cinereus augustus pre-trained None None None None 2019-10-24
my_genome augustus pre-trained None None None None 2019-10-24
ustilago augustus pre-trained None None None None 2019-10-24
encephalitozoon_cuniculi_GB augustus pre-trained None None None None 2019-10-24
human augustus pre-trained None None None None 2019-10-24
tomato augustus pre-trained None None None None 2019-10-24
brugia augustus pre-trained None None None None 2019-10-24
pea_aphid augustus pre-trained None None None None 2019-10-24
yeast augustus pre-trained None None None None 2019-10-24
zebrafish augustus pre-trained None None None None 2019-10-24
sulfolobus_solfataricus augustus pre-trained None None None None 2019-10-24
Xipophorus_maculatus augustus pre-trained None None None None 2019-10-24
schistosoma2 augustus pre-trained None None None None 2019-10-24
pchrysosporium augustus pre-trained None None None None 2019-10-24
leishmania_tarentolae augustus pre-trained None None None None 2019-10-24
coccidioides_immitis augustus pre-trained None None None None 2019-10-24
ophidiomyces_ophiodiicola_cbs-122913 augustus pre-trained None None None None 2019-10-24
maize augustus pre-trained None None None None 2019-10-24
Options for this script:
To print a parameter file to terminal:
funannotate species -p myparameters.json
To print the parameter details for a species in the database:
funannotate species -s aspergillus_fumigatus
To add a new species to the database:
funannotate species -s new_species_name -a new_species_name.parameters.json
funannotate mask
Repetitive elements should be soft-masked in a genome assembly to help direct the ab-initio gene predictors. This can be accomplished with the often-used RepeatModeler/RepeatMasker programs, for which the funannotate mask script provides a wrapper. Note that you can use any other software to soft-mask your genome prior to running the gene prediction script.
Usage: funannotate mask <arguments>
version: 1.8.14
Description: This script is a wrapper for repeat masking. Default is to run very simple
repeat masking with tantan. The script can also run RepeatMasker and/or
RepeatModeler. It will generate a softmasked genome. Tantan is probably not
sufficient for soft-masking an assembly, but with RepBase no longer being
available RepeatMasker/Modeler may not be functional for many users.
Arguments:
-i, --input Multi-FASTA genome file. (Required)
-o, --out Output softmasked FASTA file. (Required)
Optional:
-m, --method Method to use. Default: tantan [repeatmasker, repeatmodeler]
-s, --repeatmasker_species Species to use for RepeatMasker
-l, --repeatmodeler_lib Custom repeat database (FASTA format)
--cpus Number of cpus to use. Default: 2
--debug Keep intermediate files
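Soft-masking conventionally means that repeat regions are written in lowercase while the rest of the sequence stays uppercase. Assuming that convention, a quick way to check how much of an assembly is masked (for example, before passing it to funannotate predict) is to count lowercase bases. The function name below is hypothetical.

```python
def softmasked_fraction(fasta_text):
    """Return the fraction of bases written in lowercase in a FASTA string."""
    masked = total = 0
    for line in fasta_text.splitlines():
        seq = line.strip()
        if not seq or seq.startswith(">"):
            continue  # skip blank lines and header lines
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


example = ">scaffold_1\nACGTacgtAC\n>scaffold_2\nggggCCCCTT\n"
print(f"{softmasked_fraction(example):.1%} of bases are soft-masked")
```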
Training Ab-initio Gene Predictors
funannotate train
To use this script you need RNA-seq data from the genome you are annotating. If you don’t have RNA-seq data, funannotate predict will train Augustus during runtime. This script is a wrapper for genome-guided Trinity RNA-seq assembly followed by PASA assembly. These methods generate the input data for funannotate predict, i.e. coordinate-sorted BAM alignments, Trinity transcripts, and high-quality PASA GFF3 annotation. This script unfortunately has many dependencies, including Hisat2, Trinity, Samtools, Fasta, GMAP, Blat, MySQL, PASA, and RapMap. The $PASAHOME and $TRINITYHOME environment variables need to be set or passed at runtime.
Usage: funannotate train <arguments>
version: 1.8.14
Description: Script is a wrapper for de novo genome-guided transcriptome assembly using
Trinity followed by PASA. Illumina and Long-read (nanopore/pacbio) RNA-seq
is also supported. Dependencies are hisat2, Trinity, samtools, fasta,
minimap2, PASA.
Required:
-i, --input Genome multi-fasta file
-o, --out Output folder name
-l, --left Left/Forward FASTQ Illumina reads (R1)
-r, --right Right/Reverse FASTQ Illumina reads (R2)
-s, --single Single ended FASTQ reads
Optional:
--stranded If RNA-seq library stranded. [RF,FR,F,R,no]
--left_norm Normalized left FASTQ reads (R1)
--right_norm Normalized right FASTQ reads (R2)
--single_norm Normalized single-ended FASTQ reads
--pacbio_isoseq PacBio long-reads
--nanopore_cdna Nanopore cDNA long-reads
--nanopore_mrna Nanopore mRNA direct long-reads
--trinity Pre-computed Trinity transcripts (FASTA)
--jaccard_clip Turn on jaccard clip for dense genomes [Recommended for fungi]
--no_normalize_reads Skip read Normalization
--no_trimmomatic Skip Quality Trimming of reads
--memory RAM to use for Jellyfish. Default: 50G
-c, --coverage Depth to normalize reads. Default: 50
-m, --min_coverage Min depth for normalizing reads. Default: 5
--pasa_db Database to use. Default: sqlite [mysql,sqlite]
--pasa_alignment_overlap PASA --stringent_alignment_overlap. Default: 30.0
--aligners Aligners to use with PASA: Default: minimap2 blat [gmap]
--pasa_min_pct_aligned PASA --MIN_PERCENT_ALIGNED. Default: 90
--pasa_min_avg_per_id PASA --MIN_AVG_PER_ID. Default: 95
--pasa_num_bp_splice PASA --NUM_BP_PERFECT_SPLICE_BOUNDARY. Default: 3
--max_intronlen Maximum intron length. Default: 3000
--species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
--strain Strain name
--isolate Isolate name
--cpus Number of CPUs to use. Default: 2
--no-progress Do not print progress to stdout for long sub jobs
ENV Vars: If not passed, will try to load from your $PATH.
--PASAHOME
--TRINITYHOME
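As an illustration of how the arguments above fit together, the sketch below assembles a funannotate train command line for paired-end Illumina reads using only flags listed in the help text. The helper name and all file names are hypothetical; the command is printed, not executed.

```python
import shlex


def build_train_command(genome, outdir, left, right, species,
                        stranded=None, cpus=2, jaccard_clip=False):
    """Build an argument list for `funannotate train` (paired-end reads)."""
    cmd = [
        "funannotate", "train",
        "-i", genome,
        "-o", outdir,
        "-l", left,
        "-r", right,
        "--species", species,
        "--cpus", str(cpus),
    ]
    if stranded:
        cmd += ["--stranded", stranded]
    if jaccard_clip:
        cmd.append("--jaccard_clip")
    return cmd


# Hypothetical input files, for illustration only.
cmd = build_train_command(
    "genome.softmasked.fa", "fun_out",
    "reads_R1.fastq.gz", "reads_R2.fastq.gz",
    "Aspergillus fumigatus", stranded="RF", cpus=8, jaccard_clip=True,
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run; keeping the arguments as a list avoids quoting problems with binomial species names.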
Gene Prediction
funannotate predict
This script is the “meat and potatoes” of funannotate. It will parse the data you provide and choose the best method to train the ab-initio gene predictors Augustus and GeneMark. After the predictors are trained, it runs Evidence Modeler to generate consensus gene models from all of the data present. Finally, the GFF3 file is converted to NCBI GenBank format.
Usage: funannotate predict <arguments>
version: 1.8.14
Description: Script takes genome multi-fasta file and a variety of inputs to do a comprehensive whole
genome gene prediction. Uses AUGUSTUS, GeneMark, Snap, GlimmerHMM, BUSCO, EVidence Modeler,
tbl2asn, tRNAScan-SE, Exonerate, minimap2.
Required:
-i, --input Genome multi-FASTA file (softmasked repeats)
-o, --out Output folder name
-s, --species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
Optional:
-p, --parameters Ab intio parameters JSON file to use for gene predictors
--isolate Isolate name, e.g. Af293
--strain Strain name, e.g. FGSCA4
--name Locus tag name (assigned by NCBI?). Default: FUN_
--numbering Specify where gene numbering starts. Default: 1
--maker_gff MAKER2 GFF file. Parse results directly to EVM.
--pasa_gff PASA generated gene models. filename:weight
--other_gff Annotation pass-through to EVM. filename:weight
--rna_bam RNA-seq mapped to genome to train Augustus/GeneMark-ET
--stringtie StringTie GTF result
-w, --weights Ab-initio predictor and EVM weight. Example: augustus:2 or pasa:10
--augustus_species Augustus species config. Default: uses species name
--min_training_models Minimum number of models to train Augustus. Default: 200
--genemark_mode GeneMark mode. Default: ES [ES,ET]
--genemark_mod GeneMark ini mod file
--busco_seed_species Augustus pre-trained species to start BUSCO. Default: anidulans
--optimize_augustus Run 'optimze_augustus.pl' to refine training (long runtime)
--busco_db BUSCO models. Default: dikarya. `funannotate outgroups --show_buscos`
--organism Fungal-specific options. Default: fungus. [fungus,other]
--ploidy Ploidy of assembly. Default: 1
-t, --tbl2asn Assembly parameters for tbl2asn. Default: "-l paired-ends"
-d, --database Path to funannotate database. Default: $FUNANNOTATE_DB
--protein_evidence Proteins to map to genome (prot1.fa prot2.fa uniprot.fa). Default: uniprot.fa
--protein_alignments Pre-computed protein alignments in GFF3 format
--p2g_pident Exonerate percent identity. Default: 80
--p2g_diamond_db Premade diamond genome database for protein2genome mapping
--p2g_prefilter Pre-filter hits software selection. Default: diamond [tblastn]
--transcript_evidence mRNA/ESTs to align to genome (trans1.fa ests.fa trinity.fa). Default: none
--transcript_alignments Pre-computed transcript alignments in GFF3 format
--augustus_gff Pre-computed AUGUSTUS GFF3 results (must use --stopCodonExcludedFromCDS=False)
--genemark_gtf Pre-computed GeneMark GTF results
--trnascan Pre-computed tRNAscanSE results
--min_intronlen Minimum intron length. Default: 10
--max_intronlen Maximum intron length. Default: 3000
--soft_mask Softmasked length threshold for GeneMark. Default: 2000
--min_protlen Minimum protein length. Default: 50
--repeats2evm Use repeats in EVM consensus model building
--keep_evm Keep existing EVM results (for rerunning pipeline)
--evm-partition-interval Min length between genes to make a partition: Default: 1500
--no-evm-partitions Do not split contigs into partitions
--repeat_filter Repetitive gene model filtering. Default: overlap blast [overlap,blast,none]
--keep_no_stops Keep gene models without valid stops
--SeqCenter Sequencing facilty for NCBI tbl file. Default: CFMR
--SeqAccession Sequence accession number for NCBI tbl file. Default: 12345
--force Annotated unmasked genome
--cpus Number of CPUs to use. Default: 2
--no-progress Do not print progress to stdout for long sub jobs
--tmpdir Volume/location to write temporary files. Default: /tmp
--header_length Maximum length of FASTA headers. Default: 16
ENV Vars: If not specified at runtime, will be loaded from your $PATH
--EVM_HOME
--AUGUSTUS_CONFIG_PATH
--GENEMARK_PATH
--BAMTOOLS_PATH
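Several options above (--pasa_gff, --other_gff, -w/--weights) take values of the form name:weight, such as augustus:2 or pasa:10. The sketch below shows one way such a value could be split. It is not the funannotate parser: the function name is hypothetical, integer weights are an assumption, and a missing weight is returned as None because the default is not stated in the help text.

```python
def parse_weighted(value):
    """Split 'name:weight' into (name, weight).

    The split is made on the last colon so that names containing colons
    are left intact. If no integer weight follows the last colon, the
    whole string is treated as the name and the weight is None.
    """
    name, sep, weight = value.rpartition(":")
    if sep and weight.isdigit():
        return name, int(weight)
    return value, None


for item in ["augustus:2", "pasa.gff3:10", "models.gff3"]:
    print(parse_weighted(item))
```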
funannotate fix
While funannotate predict does its best to generate gene models that will pass NCBI annotation specs, occasionally gene models fall through the cracks (i.e. they trigger errors that the author has not seen yet). Gene models that generate submission errors are automatically flagged by funannotate predict and reported to the user. The user must manually edit the .tbl annotation file to fix these models. This script is a wrapper for archiving the previous GenBank annotations and generating a new set from the supplied .tbl annotation file.
Usage: funannotate fix <arguments>
version: 1.8.14
Description: Script takes a GenBank genome annotation file and an NCBI tbl file to
generate updated annotation. Script is used to fix problematic gene models
after running funannotate predict or funannotate update.
Required:
-i, --input Annotated genome in GenBank format.
-t, --tbl NCBI tbl annotation file.
-d, --drop Gene models to remove/drop from annotation. File with locus_tag 1 per line.
Optional:
-o, --out Output folder
--tbl2asn Parameters for tbl2asn. Default: "-l paired-ends"
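The -d/--drop option expects a plain text file with one locus_tag per line. A minimal sketch for producing such a file from a list of flagged gene models follows; the helper name and the locus tags are hypothetical (the FUN_ prefix is the default locus tag name used by funannotate predict).

```python
import tempfile
from pathlib import Path


def write_drop_list(locus_tags, path):
    """Write unique, non-empty locus tags to path, one per line."""
    cleaned = (tag.strip() for tag in locus_tags)
    unique = list(dict.fromkeys(tag for tag in cleaned if tag))
    Path(path).write_text("".join(f"{tag}\n" for tag in unique))
    return unique


# Written to a temporary directory here so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    drop_file = Path(tmp) / "drop.txt"
    tags = write_drop_list(["FUN_000012", "FUN_000345 ", "FUN_000012", ""], drop_file)
    contents = drop_file.read_text()
    print(contents, end="")
```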
funannotate update
This script updates gene models from funannotate predict using RNA-seq data. The method relies on RNA-seq --> Trinity --> PASA --> Kallisto. Using this script you can also update an NCBI GenBank genome using RNA-seq data, i.e. you can update gene models on a pre-existing submission and the script will maintain proper annotation naming/updating in accordance with NCBI rules.
Usage: funannotate update <arguments>
version: 1.8.14
Description: Script will run PASA mediated update of gene models. It can directly update
the annotation from an NCBI downloaded GenBank file using RNA-seq data or can be
used after funannotate predict to refine UTRs and gene model predictions. Kallisto
is used to evidence filter most likely PASA gene models. Dependencies are
hisat2, Trinity, samtools, fasta, minimap2, PASA, kallisto, bedtools.
Required:
-i, --input Funannotate folder or Genome in GenBank format (.gbk,.gbff).
or
-f, --fasta Genome in FASTA format
-g, --gff Annotation in GFF3 format
--species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
Optional:
-o, --out Output folder name
-l, --left Left/Forward FASTQ Illumina reads (R1)
-r, --right Right/Reverse FASTQ Illumina reads (R2)
-s, --single Single ended FASTQ reads
--stranded If RNA-seq library stranded. [RF,FR,F,R,no]
--left_norm Normalized left FASTQ reads (R1)
--right_norm Normalized right FASTQ reads (R2)
--single_norm Normalized single-ended FASTQ reads
--pacbio_isoseq PacBio long-reads
--nanopore_cdna Nanopore cDNA long-reads
--nanopore_mrna Nanopore mRNA direct long-reads
--trinity Pre-computed Trinity transcripts (FASTA)
--jaccard_clip Turn on jaccard clip for dense genomes [Recommended for fungi]
--no_normalize_reads Skip read Normalization
--no_trimmomatic Skip Quality Trimming of reads
--memory RAM to use for Jellyfish. Default: 50G
-c, --coverage Depth to normalize reads. Default: 50
-m, --min_coverage Min depth for normalizing reads. Default: 5
--pasa_config PASA assembly config file, i.e. from previous PASA run
--pasa_db Database to use. Default: sqlite [mysql,sqlite]
--pasa_alignment_overlap PASA --stringent_alignment_overlap. Default: 30.0
--aligners Aligners to use with PASA: Default: minimap2 blat [gmap]
--pasa_min_pct_aligned PASA --MIN_PERCENT_ALIGNED. Default: 90
--pasa_min_avg_per_id PASA --MIN_AVG_PER_ID. Default: 95
--pasa_num_bp_splice PASA --NUM_BP_PERFECT_SPLICE_BOUNDARY. Default: 3
--max_intronlen Maximum intron length. Default: 3000
--min_protlen Minimum protein length. Default: 50
--alt_transcripts Expression threshold (percent) to keep alt transcripts. Default: 0.1 [0-1]
--p2g NCBI p2g file (if updating NCBI annotation)
-t, --tbl2asn Assembly parameters for tbl2asn. Example: "-l paired-ends"
--name Locus tag name (assigned by NCBI?). Default: use existing
--sbt NCBI Submission file
--species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
--strain Strain name
--isolate Isolate name
--SeqCenter Sequencing facilty for NCBI tbl file. Default: CFMR
--SeqAccession Sequence accession number for NCBI tbl file. Default: 12345
--cpus Number of CPUs to use. Default: 2
--no-progress Do not print progress to stdout for long sub jobs
ENV Vars: If not passed, will try to load from your $PATH.
--PASAHOME
--TRINITYHOME
Adding Functional Annotation
funannotate remote
Some programs are Linux-only and not compatible with Mac OSX. To accommodate all users, a series of remote searches can be run from the command line: Phobius and antiSMASH secondary metabolite gene cluster prediction can be done from this interface. Note that if you can install these tools locally, those searches will likely be much faster and are thus preferred.
Usage: funannotate remote <arguments>
version: 1.8.14
Description: Script runs remote server functional annotation for Phobius and
antiSMASH (fungi). These searches are slow, if you can setup these services
locally it will be much faster to do that. PLEASE do not abuse services!
Required:
-m, --methods Which services to run, space separated [phobius,antismash,all]
-e, --email Email address to identify yourself to services.
-i, --input Funannotate input folder.
or
-g, --genbank GenBank file (must be annotated).
-o, --out Output folder name.
--force Force query even if antiSMASH server looks busy
funannotate iprscan
This script is a wrapper for a local InterProScan5 run or a local Docker-based IPR run. The Docker build uses the blaxterlab/interproscan image.
Usage: funannotate iprscan <arguments>
version: 1.8.14
Description: This script is a wrapper for running InterProScan5 using Docker or from a
local installation. The script splits proteins into smaller chunks and then
launches several interproscan.sh "processes". It then combines the results.
Arguments:
-i, --input Funannotate folder or FASTA protein file. (Required)
-m, --method Search method to use: [local, docker] (Required)
-n, --num Number of fasta files per chunk. Default: 1000
-o, --out Output XML InterProScan5 file
Docker arguments:
-c, --cpus Number of CPUs (total). Default: 12
--cpus_per_chunk Number of cpus per Docker instance. Default: 4
Local arguments:
--iprscan_path Path to interproscan.sh. Default: which(interproscan.sh)
-c, --cpus Number of InterProScan instances to run
(configure cpu/thread control in interproscan.properties file)
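The description above says the script splits the proteins into smaller chunks before launching InterProScan. The chunking step can be sketched as follows, assuming -n/--num is the number of sequences per chunk; the function name and record format are hypothetical and this is not the funannotate source.

```python
def chunk_records(records, num=1000):
    """Yield successive lists of at most num records."""
    if num < 1:
        raise ValueError("num must be a positive integer")
    for start in range(0, len(records), num):
        yield records[start:start + num]


# 2500 placeholder protein records: (identifier, sequence)
proteins = [(f"prot_{i}", "MKT") for i in range(1, 2501)]
sizes = [len(chunk) for chunk in chunk_records(proteins, num=1000)]
print(sizes)  # [1000, 1000, 500]
```

Each chunk would be written to its own FASTA file and handed to a separate interproscan.sh process, after which the per-chunk results are combined.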
funannotate annotate
This script is run after funannotate predict or funannotate update and assigns functional annotation to the protein-coding gene models. The best functional annotation is obtained when InterProScan 5 is run on your proteins prior to running this script.
Usage: funannotate annotate <arguments>
version: 1.8.14
Description: Script functionally annotates the results from funannotate predict. It pulls
annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology.
Required:
-i, --input Folder from funannotate predict
or
--genbank Genome in GenBank format
-o, --out Output folder for results
or
--gff Genome GFF3 annotation file
--fasta Genome in multi-fasta format
-s, --species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
-o, --out Output folder for results
Optional:
--sbt NCBI submission template file. (Recommended)
-a, --annotations Custom annotations (3 column tsv file)
-m, --mito-pass-thru Mitochondrial genome/contigs. append with :mcode
--eggnog Eggnog-mapper annotations file (if NOT installed)
--antismash antiSMASH secondary metabolism results (GBK file from output)
--iprscan InterProScan5 XML file
--phobius Phobius pre-computed results (if phobius NOT installed)
--signalp SignalP pre-computed results (-org euk -format short)
--isolate Isolate name
--strain Strain name
--rename Rename GFF gene models with locus_tag from NCBI.
--fix Gene/Product names fixed (TSV: GeneID Name Product)
--remove Gene/Product names to remove (TSV: Gene Product)
--busco_db BUSCO models. Default: dikarya
-t, --tbl2asn Additional parameters for tbl2asn. Default: "-l paired-ends"
-d, --database Path to funannotate database. Default: $FUNANNOTATE_DB
--force Force over-write of output folder
--cpus Number of CPUs to use. Default: 2
--tmpdir Volume/location to write temporary files. Default: /tmp
--p2g protein2genome pre-computed results
--header_length Maximum length of FASTA headers. Default: 16
--no-progress Do not print progress to stdout for long sub jobs
Comparative Genomics
funannotate compare
This script takes “funannotate” genomes (output from multiple funannotate annotate runs) and runs some comparative genomic operations. The script compares the annotation and generates graphs, CSV files, GO enrichment, dN/dS ratios, orthology, etc. The output is HTML that can be viewed in a web browser.
Usage: funannotate compare <arguments>
version: 1.8.14
Description: Script does light-weight comparative genomics between funannotated genomes. Output
is graphs, phylogeny, CSV files, etc --> visualized in web-browser.
Required:
-i, --input List of funannotate genome folders or GBK files
Optional:
-o, --out Output folder name. Default: funannotate_compare
-d, --database Path to funannotate database. Default: $FUNANNOTATE_DB
--cpus Number of CPUs to use. Default: 2
--run_dnds Calculate dN/dS ratio on all orthologs. [estimate,full]
--go_fdr P-value for FDR GO-enrichment. Default: 0.05
--heatmap_stdev Cut-off for heatmap. Default: 1.0
--num_orthos Number of Single-copy orthologs to use for ML. Default: 500
--bootstrap Number of boostrap replicates to run with RAxML. Default: 100
--outgroup Name of species to use for ML outgroup. Default: no outgroup
--proteinortho ProteinOrtho5 POFF results.
--ml_method Maxmimum Liklihood method: Default: raxml [raxml,iqtree]
--ml_model Substitution model for IQtree. Default: modelfinder
--no-progress Do not print progress to stdout for long sub jobs
Installation and Database Management
funannotate setup
This command needs to be run to download the required databases. It requires the user to specify a location to save the database files. This location can then be added to ~/.bash_profile (as the $FUNANNOTATE_DB environment variable) so funannotate knows where to locate the database files.
Usage: funannotate setup <arguments>
version: 1.8.14
Description: Script will download/format necessary databases for funannotate.
Options:
-i, --install Download format databases. Default: all
[merops,uniprot,dbCAN,pfam,repeats,go,
mibig,interpro,busco_outgroups,gene2product]
-b, --busco_db Busco Databases to install. Default: dikarya [all,fungi,aves,etc]
-d, --database Path to funannotate database
-u, --update Check remote md5 and update if newer version found
-f, --force Force overwriting database
-w, --wget Use wget to download instead of python requests
-l, --local Use local resource JSON file instead of current on github
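Several commands list `$FUNANNOTATE_DB` as the default for -d/--database. A minimal sketch of that lookup order (explicit argument first, then the environment variable) is shown below; the function name is hypothetical and this is not the funannotate source.

```python
import os


def resolve_database(cli_value=None, environ=None):
    """Return the database path from -d/--database or $FUNANNOTATE_DB."""
    environ = os.environ if environ is None else environ
    path = cli_value or environ.get("FUNANNOTATE_DB")
    if not path:
        raise RuntimeError(
            "No database location: pass -d/--database or set $FUNANNOTATE_DB"
        )
    return path


# An explicit path wins over the environment variable.
print(resolve_database("/usr/local/share/funannotate", environ={}))
```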
funannotate database
A simple script that displays the currently installed databases.
$ funannotate database
Funannotate Databases currently installed:
Database Type Version Date Num_Records Md5checksum
pfam hmmer3 35.0 2021-11 19632 c78ab387de299860bd242d6f57930c7f
gene2product text 1.82 2022-09-25 34212 23a8436fb3a7d09c87febc7f2ee86615
interpro xml 90.0 2022-08-04 40597 0cd0aff2b5df0d5c57a888e5953a754e
dbCAN hmmer3 10.0 2021-10-03 641 04696dfba1c3bb82ff9b72cfbb3e4a65
busco_outgroups outgroups 1.0 2022-09-25 8 6795b1d4545850a4226829c7ae8ef058
merops diamond 12.0 2017-10-04 5009 a6dd76907896708f3ca5335f58560356
mibig diamond 1.4 2019-10-20 31023 118f2c11edde36c81bdea030a0228492
uniprot diamond 2022_03 2022-08-03 568002 30ad53c6d2b4bc36b75ed2814a3708f7
go text 2022-09-19 2022-09-19 47343 8f0f6557c8140bc68af67ac57239236d
repeats diamond 1.0 2019-10-20 11950 4e8cafc3eea47ec7ba505bb1e3465d21
To update a database, type:
funannotate setup -i DBNAME -d /usr/local/share/funannotate --force
To see the installed BUSCO outgroups, type:
funannotate database --show-outgroups
To see the BUSCO tree, type:
funannotate database --show-buscos
funannotate outgroups
This script is a helper function to manage and update outgroups for funannotate compare. Outgroup species can be specified in funannotate compare to use as a reference for BUSCO-mediated maximum likelihood phylogeny. This script allows the user to add a genome to the available outgroups folder by running BUSCO and formatting it appropriately.
Usage: funannotate outgroups <arguments>
version: 1.8.14
Description: Managing the outgroups folder for funannotate compare
Arguments:
-i, --input Proteome multi-fasta file. Required.
-s, --species Species name for adding a species. Required.
-b, --busco_db BUSCO db to use. Default. dikarya
-c, --cpus Number of CPUs to use for BUSCO search.
-d, --database Path to funannotate database. Default: $FUNANNOTATE_DB