CAZyme Annotation
Introduction
CAZyme annotation identifies and classifies Carbohydrate-Active Enzymes (CAZymes) in biological sequences. The run_dbcan tool annotates CAZymes from the following input types:
Prokaryotic genomes (nucleotide sequences)
Metagenomic contigs (nucleotide sequences)
Protein sequences (prokaryotic or eukaryotic)
The annotation combines several methods (HMMER searches against the dbCAN HMM database, DIAMOND sequence searches, and dbCAN-sub HMM searches) to improve both the sensitivity and the specificity of CAZyme identification.
Command Syntax
run_dbcan CAZyme_annotation --input_raw_data <INPUT_FILE> --output_dir <OUTPUT_DIRECTORY> --db_dir <DATABASE_DIRECTORY> --mode <MODE>
Key Parameters
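| Parameter | Description |
|---|---|
| --input_raw_data | Path to the input sequence file (FASTA format) |
| --output_dir | Directory for output files |
| --db_dir | Directory containing the database files |
| --mode | Analysis mode, e.g. prok for prokaryotic genomes or protein for protein sequences |
| --methods | Optional: methods to run (hmm, diamond, dbCANsub); repeat the flag to select several |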
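Because --input_raw_data expects a FASTA file, a quick pre-flight check can catch a wrong or empty input before a long run. The sketch below is a hypothetical helper (check_fasta is not part of run_dbcan); it assumes only that a FASTA file begins with a ">" header line, and it counts records by their header lines.

```sh
# Hypothetical pre-flight helper (not part of run_dbcan).
# Prints the number of FASTA records, or fails if the file is missing,
# unreadable, or does not start with a '>' header line.
check_fasta() {
  file=$1
  if [ ! -f "$file" ] || [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  case $(head -n 1 "$file") in
    '>'*) ;;  # first line looks like a FASTA header
    *)
      echo "not FASTA (first line is not a '>' header): $file" >&2
      return 1
      ;;
  esac
  # Every record starts with exactly one '>' line, so this counts sequences.
  grep -c '^>' "$file"
}
```

For example, check_fasta EscheriaColiK12MG1655.fna prints the number of sequences in the genome file, or an error message if the file is not FASTA.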
Usage Examples
Analyzing Prokaryotic Genomes
When working with bacterial or archaeal genomes, use the prok mode:
run_dbcan CAZyme_annotation --input_raw_data EscheriaColiK12MG1655.fna --output_dir output_EscheriaColiK12MG1655_fna --db_dir db --mode prok
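To annotate several genomes with the same settings, the command above can be wrapped in a loop. This is a minimal sketch, assuming all genomes sit in one directory and use the .fna extension; annotate_genomes is a hypothetical helper name, and the output directory names simply follow the output_<name>_fna pattern used in the example.

```sh
# Hypothetical helper: run prok-mode annotation on every *.fna file in a
# directory, writing each result to output_<name>_fna in the current directory.
# Usage: annotate_genomes <genome_dir> <db_dir>
annotate_genomes() {
  genome_dir=$1
  db_dir=$2
  for fna in "$genome_dir"/*.fna; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$fna" ] || continue
    name=$(basename "$fna" .fna)
    run_dbcan CAZyme_annotation --input_raw_data "$fna" \
      --output_dir "output_${name}_fna" --db_dir "$db_dir" --mode prok ||
      return  # stop at the first failure, keeping its exit status
  done
}
```

For example, annotate_genomes genomes db runs one annotation per genome file in the genomes directory, one after another.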
Analyzing Protein Sequences
For pre-translated protein sequences, use the protein mode:
run_dbcan CAZyme_annotation --input_raw_data EscheriaColiK12MG1655.faa --output_dir output_EscheriaColiK12MG1655_faa --db_dir db --mode protein
Analyzing Eukaryotic Proteins
Eukaryotic proteins are processed the same way using protein mode:
run_dbcan CAZyme_annotation --input_raw_data Xylona_heveae_TC161.faa --output_dir output_Xylona_heveae_TC161_faa --db_dir db --mode protein
run_dbcan CAZyme_annotation --input_raw_data Xylhe1_GeneCatalog_proteins_20130827.aa.fasta --output_dir output_Xylhe1_faa --db_dir db --mode protein
Tip
For large eukaryotic datasets, consider adjusting the computational resources with --threads, which sets the number of CPU cores to use.
By default, run_dbcan uses all cores on your machine.
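On a shared machine you may not want to use every core. One option is to compute a thread count and pass it to --threads. half_cores is a hypothetical helper; it relies on nproc from GNU coreutils (not installed by default on macOS) and falls back to a single thread if nproc is missing.

```sh
# Hypothetical helper: print half the available CPU cores (minimum 1),
# for use with --threads on a shared machine.
half_cores() {
  # nproc comes from GNU coreutils; fall back to 1 if it is unavailable.
  cores=$(nproc 2>/dev/null) || cores=1
  threads=$((cores / 2))
  [ "$threads" -ge 1 ] || threads=1
  echo "$threads"
}
```

For example:
run_dbcan CAZyme_annotation --input_raw_data Xylona_heveae_TC161.faa --output_dir output_Xylona_heveae_TC161_faa --db_dir db --mode protein --threads "$(half_cores)"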
Output Files
The annotation process generates several key output files in your specified output directory:
uniInput.faa - Unified input file for all tools
overview.txt - Summary of identified CAZymes
dbCAN_hmm_results.tsv - Detailed HMMER results
diamond.out - DIAMOND search results
dbCANsub_hmm_results.tsv - dbCAN sub-HMM results, including substrate specificity
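overview.txt is a natural starting point for filtering results, for example by keeping only genes that several methods agree on. The sketch below counts genes reported by at least N methods. It assumes overview.txt is tab-separated with a header row containing a column named #ofTools; check the header of your own file, because column names may differ between run_dbcan versions. count_by_tools is a hypothetical helper, and it locates the column by name rather than by position.

```sh
# Hypothetical helper: count genes in overview.txt that were reported by at
# least <min_tools> methods (default 2).
# ASSUMPTION: the file is tab-separated and its header row has a column named
# "#ofTools"; the column is located by name, not by position.
# Usage: count_by_tools <overview.txt> [min_tools]
count_by_tools() {
  awk -F'\t' -v min="${2:-2}" '
    NR == 1 {
      for (i = 1; i <= NF; i++)
        if ($i == "#ofTools") col = i
      if (!col) {
        print "no #ofTools column in header" > "/dev/stderr"
        missing = 1
        exit 1
      }
      next
    }
    ($col + 0) >= (min + 0) { n++ }
    END {
      if (missing) exit 1
      print n + 0
    }
  ' "$1"
}
```

For example, count_by_tools output_EscheriaColiK12MG1655_fna/overview.txt 3 would report how many genes all three methods flagged, under the column-name assumption above.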
Customizing the Analysis
To customize which analytical methods are used:
run_dbcan CAZyme_annotation --input_raw_data input.fna --output_dir output --db_dir db --mode prok --methods hmm --methods diamond
Available methods are hmm, diamond, and dbCANsub; pass --methods once per method to run any combination of them.
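When the method list is chosen at run time (for example, inside a pipeline), each entry has to become its own --methods flag, as in the command above. The POSIX sh sketch below does that expansion; run_with_methods is a hypothetical wrapper, not part of run_dbcan, and it passes only the flags shown in this section.

```sh
# Hypothetical wrapper: run CAZyme_annotation with a method list chosen at
# run time, expanding each method into its own "--methods <name>" pair.
# Usage: run_with_methods <input> <output_dir> <db_dir> <mode> <method>...
run_with_methods() {
  if [ "$#" -lt 5 ]; then
    echo "usage: run_with_methods <input> <output_dir> <db_dir> <mode> <method>..." >&2
    return 2
  fi
  input=$1 outdir=$2 db=$3 mode=$4
  shift 4
  # Rebuild "$@" in place: append a pair for each method, then drop the original.
  for m in "$@"; do
    set -- "$@" --methods "$m"
    shift
  done
  run_dbcan CAZyme_annotation --input_raw_data "$input" --output_dir "$outdir" \
    --db_dir "$db" --mode "$mode" "$@"
}
```

For example, run_with_methods input.fna output db prok hmm diamond issues the same command as the example above. The positional-parameter rebuild avoids bash-only arrays, so the sketch also runs under dash or busybox sh.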
Tip
Optional signal peptide and transmembrane topology columns (SignalP 6.0 / DeepTMHMM) require separate installation and testing. See SignalP 6.0 and DeepTMHMM (optional tools).