03 - custom curation database

Build custom databases for protein-coding gene curation

By default, MitoPilot uses protein-coding gene sequences from NCBI RefSeq to fine-tune the start and stop codon positions of your annotations.

MitoPilot provides a helper function, MitoPilot::custom_curation_db, that lets you supplement the RefSeq databases with your own protein-coding gene sequences. A custom database can greatly improve automatic curation if your focal clade is poorly represented in RefSeq.

You can gather additional protein-coding gene sequences from many sources, including:

  • GenBank mitogenomes that are not part of RefSeq
  • mitogenomes from other data repositories
  • your own unpublished mitogenomes

Carefully consider what you add to the custom database. Use only high-confidence sequences, because poor-quality reference data will result in poorly curated gene models.
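As one illustration of what screening candidate sequences might look like, the base R sketch below drops sequences that contain ambiguous bases or are implausibly short. The helper screen_sequences, its min_length cutoff, and the example sequences are all hypothetical and are not part of MitoPilot. Passing this check does not by itself make a sequence high-confidence.

```r
# Hypothetical helper (not part of MitoPilot): keep only sequences that
# consist solely of A, C, G, T and meet a minimum length.
# seqs: named character vector, one protein-coding gene sequence per element.
screen_sequences <- function(seqs, min_length = 90) {
  seqs <- toupper(seqs)
  # Rejects N and any other IUPAC ambiguity codes, gaps, or stray characters
  only_acgt <- grepl("^[ACGT]+$", seqs)
  # min_length is an arbitrary illustrative cutoff, not a MitoPilot setting
  long_enough <- nchar(seqs) >= min_length
  seqs[only_acgt & long_enough]
}

# Made-up example sequences
candidates <- c(
  good      = strrep("ATGGCA", 20),
  ambiguous = paste0(strrep("ATGGCA", 19), "ATGNNN"),
  too_short = "ATGGCATAA"
)

kept <- screen_sequences(candidates)
names(kept)
```

Only the sequence named good passes both checks here. In practice you would likely also confirm the source, taxonomy, and annotation quality of each sequence by hand.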

Please see the MitoPilot::custom_curation_db documentation for further instructions.
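A minimal way to view that documentation and the function's argument list from an R session, assuming the MitoPilot package is installed. The snippet uses only standard R help utilities and makes no assumptions about the function's arguments.

```r
# Check whether MitoPilot is available before touching its namespace
has_mitopilot <- requireNamespace("MitoPilot", quietly = TRUE)

if (has_mitopilot) {
  # Open the help page for the helper function
  print(help("custom_curation_db", package = "MitoPilot"))
  # Show the function's argument list as defined in the installed version
  print(args(MitoPilot::custom_curation_db))
} else {
  message("MitoPilot is not installed; install it to view the documentation.")
}
```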

Note

MitoPilot only needs a curation database for protein-coding genes. rRNA and tRNA curation does not utilize reference alignment.

Using a custom curation database with MitoPilot

To use a custom curation database, you need to specify the database directory path in the ref_dir field in the Curate Opts. window of the Annotate module. This path will be printed by the MitoPilot::custom_curation_db function.

You should also specify “Metazoa_RefSeq231_custom” or “Chordata_custom” in the ref_db field, depending on which base database you used with the MitoPilot::custom_curation_db function.

Fig. 4