After your genome has gone through the gene prediction module and you have gene models that pass NCBI specs the next step is to add functional annotate to the protein-coding genes. Funannotate accomplishes this using several curated databases and is run using the
funannotate annotate command.
Funannotate will parse the protein-coding models from the annotation and identify Pfam domains, CAZYmes, secreted proteins, proteases (MEROPS), and BUSCO groups. If you provide the script with InterProScan5 data
--iprscan, funannotate will also generate additional annotation: InterPro terms, GO ontology, and fungal transcription factors. If Eggnog-mapper is installed locally or you pass eggnog results via
--eggnog, then Eggnog annotations and COGs will be added to the functional annotation. The scripts will also parse UniProtKb/SwissProt searches with Eggnog-mapper searches (optional) to generate gene names and product descriptions.
InterProScan5 and Eggnog-Mapper are two functional annotation pipelines that can be parsed by funannotate, however due to the large database sizes they are not run directly. If
emapper.py (Eggnog-mapper) is installed, then it will be run automatically during the functional annotation process. Because InterProScan5 is Linux only, it must be run outside funannotate and the results passed to the script. If you are on Mac, I’ve included a method to run InterProScan5 using Docker and the
funannotate predict output will let the user know how to run this script. Alternatively, you can run the InterProScan5 search remotely using the
funannotate remote command.
Phobius and SignalP will be run automatically if they are installed (i.e. in the PATH), however, Phobius will not run on Mac. If you are on Mac you can run Phobius with the
funannotate remote script.
If you are annotating a fungal genome, you can run Secondary Metabolite Gene Cluster prediction using antiSMASH. This can be done on the webserver, submit your GBK file from predict (predict_results/yourGenome.gbk) or alternatively you can submit from the command line using
funannotate remote. Of course, if you are on Linux you can install the antiSMASH program locally and run that way as well. The annotated GBK file is fed back to this script with the
funannotate predict, the output from
funannotate annotate will be populated in the output/annotate_results folder. The output files are:
|Basename.gbk||Annotated Genome in GenBank Flat File format|
|Basename.contigs.fsa||Multi-fasta file of contigs, split at gaps (use for NCBI submission)|
|Basename.agp||AGP file; showing linkage/location of contigs (use for NCBI submission)|
|Basename.tbl||NCBI tbl annotation file (use for NCBI submission)|
|Basename.sqn||NCBI Sequin genome file (use for NCBI submission)|
|Basename.scaffolds.fa||Multi-fasta file of scaffolds|
|Basename.proteins.fa||Multi-fasta file of protein coding genes|
|Basename.transcripts.fa||Multi-fasta file of transcripts (mRNA)|
|Basename.discrepency.report.txt||tbl2asn summary report of annotated genome|
|Basename.annotations.txt||TSV file of all annotations added to genome. (i.e. import into excel)|
|Gene2Products.must-fix.txt||TSV file of Gene Name/Product deflines that failed to pass tbl2asn checks and must be fixed|
|Gene2Products.need-curating.txt||TSV file of Gene Name/Product defines that need to be curated|
|Gene2Products.new-names-passed.txt||TSV file of Gene Name/Product deflines that passed tbl2asn but are not in Gene2Products database. Please submit a PR with these.|
$ funannotate annotate Usage: funannotate annotate <arguments> version: 1.7.0 Description: Script functionally annotates the results from funannotate predict. It pulls annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology. Required: -i, --input Folder from funannotate predict or --genbank Genome in GenBank format -o, --out Output folder for results or --gff Genome GFF3 annotation file --fasta Genome in multi-fasta format -s, --species Species name, use quotes for binomial, e.g. "Aspergillus fumigatus" -o, --out Output folder for results Optional: --sbt NCBI submission template file. (Recommended) -a, --annotations Custom annotations (3 column tsv file) --eggnog Eggnog-mapper annotations file (if NOT installed) --antismash antiSMASH secondary metabolism results (GBK file from output) --iprscan InterProScan5 XML file --phobius Phobius pre-computed results (if phobius NOT installed) --isolate Isolate name --strain Strain name --rename Rename GFF gene models with locus_tag from NCBI. --fix Gene/Product names fixed (TSV: GeneID Name Product) --remove Gene/Product names to remove (TSV: Gene Product) --busco_db BUSCO models. Default: dikarya -t, --tbl2asn Additional parameters for tbl2asn. Default: "-l paired-ends" -d, --database Path to funannotate database. Default: $FUNANNOTATE_DB --force Force over-write of output folder --cpus Number of CPUs to use. Default: 2