User Guide
Update: What’s New in run_dbCAN
The new version of run_dbCAN introduces multiple new features and significant performance improvements, making the pipeline more user-friendly and efficient. We strongly recommend upgrading to this version. If you have any questions or suggestions, please feel free to contact us:
Dr. Yanbin Yin, Professor (yyin@unl.edu)
Xinpeng Zhang, PhD Student (xzhang55@huskers.unl.edu)
Dr. Haidong Yi, Software Engineer (hyi@stjude.org)
All conda environment dependencies can be found at the following link: run_dbCAN Conda Environments
Key Features and Improvements
Simplified Database Downloading
Added a new database command for downloading database files, making the process simpler than before.
Supports downloading from both HTTP and AWS S3 sources (use the --aws_s3 flag for faster and more stable downloads).
Use the --cgc/--no-cgc option to control whether to download CGC-related databases.
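As a rough illustration of how the download options above combine, the following sketch assembles an argument list without executing anything. The `database` subcommand and the `--aws_s3`, `--cgc`, and `--no-cgc` flags come from the text above; the executable name `run_dbcan` and the `--db_dir` option are assumptions made for illustration and should be checked against the help output of your installation.

```python
from typing import List


def build_database_command(db_dir: str,
                           use_aws_s3: bool = False,
                           include_cgc: bool = True) -> List[str]:
    """Assemble an argument list for the database download step.

    ASSUMPTION: the executable name `run_dbcan` and the `--db_dir` option
    are illustrative; only `database`, `--aws_s3`, `--cgc` and `--no-cgc`
    are taken from the release notes.
    """
    args = ["run_dbcan", "database", "--db_dir", db_dir]
    if use_aws_s3:
        # Described in the notes as the faster, more stable download source.
        args.append("--aws_s3")
    # Choose whether CGC-related databases are downloaded as well.
    args.append("--cgc" if include_cgc else "--no-cgc")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_database_command("db", use_aws_s3=True)))
```

Building the list first and printing it keeps the sketch safe to run on a machine where the pipeline is not installed; the list could later be passed to `subprocess.run` once the options are confirmed.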
Enhanced Input Processing
Replaced prodigal with pyrodigal for input processing.
Improved HMMER Performance
Replaced HMMER with pyHMMER, which is faster and more efficient.
Redesigned memory usage to support both low-memory and high-efficiency modes (https://pyhmmer.readthedocs.io/en/stable/examples/performance_tips.html).
Modular Code Structure
Reorganized the logic and structure of run_dbCAN by splitting functions into modules and following object-oriented programming.
Rewrote non-Python code in Python for improved readability.
Centralized parameter management using configuration files.
Leveraged the power of pandas for efficient data processing.
Added extensive logging and time reporting to make the pipeline more user-friendly.
Enhanced dbCAN-sub and overview Features
Added coverage justifications and location information for dbCAN-sub.
Included CAZyme justification in the final results with an extra column called “Best Results.”
Now follows the rule CAZy-sub > dbCAN-sub > dbCAN-fam for the final results.
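The priority rule above can be read as "take the annotation from the highest-ranked source that produced one". The sketch below shows that reading only; it is not the pipeline's actual code. The dictionary layout, the function name `best_result`, and the sample annotation strings are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Highest priority first, following the rule stated in the release notes.
PRIORITY = ("CAZy-sub", "dbCAN-sub", "dbCAN-fam")


def best_result(hits: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (source, annotation) from the highest-priority source with a hit.

    ASSUMPTION: `hits` maps a source name to its annotation string, and a
    missing key, None, or an empty string means "no hit". This data shape
    is illustrative, not run_dbCAN's internal representation.
    """
    for source in PRIORITY:
        annotation = hits.get(source)
        if annotation:
            return source, annotation
    return None


if __name__ == "__main__":
    # Hypothetical annotations for a single protein.
    example = {"dbCAN-fam": "GH5", "dbCAN-sub": "GH5_e1"}
    print(best_result(example))
```

Keeping the ranking in a single tuple means the order can be changed in one place if the rule is revised.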
Redesigned CGCFinder
Now supports JGI, NCBI, and Prodigal gff formats.
Directly searches eukaryotic genomes, including fungi (beta function).
Added a new function to visualize the CGCs on the genome (beta function).
Faster Substrate Prediction
Replaced blastp with DIAMOND for substrate prediction, significantly improving speed and efficiency.
Updated Metagenomic Protocols
Improved steps for metagenomic data processing (https://www.biorxiv.org/content/10.1101/2024.01.10.575125v1).
SignalP 6.0 and DeepTMHMM (optional topology)
SignalP 6.0: signal peptide prediction in the CAZyme_annotation command via --run_signalp. Results are merged into overview.tsv in a SignalP column. Organism class: --signalp_org (other/euk).
DeepTMHMM: transmembrane topology via --run_deeptmhmm and --deeptmhmm_dir (directory containing the user-installed predict.py). Results are merged into a DeepTMHMM column in overview.tsv.
Neither tool is bundled with dbcan. Install and test them locally, then enable the flags. See SignalP 6.0 and DeepTMHMM (optional tools) and the SignalP 6.0 installation instructions.
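Because neither tool is bundled, it can help to check the local setup before enabling the flags. The sketch below validates the inputs described above and returns the extra flags for the CAZyme_annotation command. The flag names and the other/euk choices come from the text; the helper name `topology_flags` is hypothetical, and the assumption that `--signalp_org` takes its value as a separate argument should be confirmed against the command's help output.

```python
from pathlib import Path
from typing import List, Optional


def topology_flags(run_signalp: bool = False,
                   signalp_org: str = "other",
                   run_deeptmhmm: bool = False,
                   deeptmhmm_dir: Optional[str] = None) -> List[str]:
    """Return optional SignalP / DeepTMHMM flags after basic sanity checks.

    ASSUMPTION: flag/value layout (e.g. `--signalp_org euk`) is illustrative.
    """
    flags: List[str] = []
    if run_signalp:
        # The notes list two organism classes: other and euk.
        if signalp_org not in ("other", "euk"):
            raise ValueError("signalp_org must be 'other' or 'euk'")
        flags += ["--run_signalp", "--signalp_org", signalp_org]
    if run_deeptmhmm:
        # The directory must contain the user-installed predict.py.
        if deeptmhmm_dir is None or not (Path(deeptmhmm_dir) / "predict.py").is_file():
            raise FileNotFoundError("deeptmhmm_dir must contain predict.py")
        flags += ["--run_deeptmhmm", "--deeptmhmm_dir", deeptmhmm_dir]
    return flags


if __name__ == "__main__":
    print(topology_flags(run_signalp=True, signalp_org="euk"))
```

Failing early on a missing predict.py avoids discovering the problem only after the rest of the annotation has run.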
Global Logging System
Implemented a comprehensive logging system available for all commands.
Use --log-level to set the logging level (DEBUG/INFO/WARNING/ERROR/CRITICAL, default: WARNING).
Use --log-file to write logs to a file in addition to console output.
Use the --verbose or -v flag for detailed debug logging (equivalent to --log-level DEBUG).
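To make the interaction of these three options concrete, here is a minimal sketch using Python's standard argparse and logging modules. It mirrors the behaviour described above (WARNING by default, --verbose equivalent to DEBUG, optional file output alongside the console) but is not run_dbCAN's own implementation; the function names are hypothetical.

```python
import argparse
import logging
from typing import List, Optional

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the three logging options described in the notes."""
    parser = argparse.ArgumentParser(description="logging options sketch")
    parser.add_argument("--log-level", choices=LEVELS, default="WARNING")
    parser.add_argument("--log-file", default=None)
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


def configure_logging(argv: Optional[List[str]] = None) -> int:
    """Configure the root logger and return the numeric level in effect."""
    args = build_parser().parse_args(argv)
    # --verbose takes precedence and is treated as --log-level DEBUG.
    level = logging.DEBUG if args.verbose else getattr(logging, args.log_level)
    handlers: List[logging.Handler] = [logging.StreamHandler()]
    if args.log_file:
        # File output is added to, not substituted for, console output.
        handlers.append(logging.FileHandler(args.log_file))
    # force=True (Python 3.8+) replaces any previously installed handlers.
    logging.basicConfig(level=level, handlers=handlers, force=True)
    return level


if __name__ == "__main__":
    configure_logging(["--log-level", "INFO"])
    logging.getLogger("sketch").info("logging configured")
```

Letting --verbose override --log-level is one possible reading of "equivalent to"; the real pipeline may resolve a conflict between the two differently.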
Hint
If you want to run the pipeline from raw metagenomic reads, please refer to the following part: metagenomics_pipeline
Otherwise, refer to the instructions below. Please note that some precomputed results may have different names compared to the previous version.
Note
For detailed instructions, refer to the respective sections in the documentation.
Getting started
User guide
Comparison
Metagenomics pipeline
- Run from Raw Reads (Cater 2023): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Amelia 2024): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Priest 2023): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Wastyk 2021): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Emilson 2024): Automated CAZyme and Glycan Substrate Annotation in Microbiomes: A Step-by-Step Protocol
- Run from Raw Reads (Cater 2023): Supplementary Protocol for co-assembly
- Run from Raw Reads (Cater 2023): Supplementary Protocol for subsample
- Run from Raw Reads (Cater 2023): Supplementary Protocol for assembly-free
Nextflow Pipeline
Change logs
References
Contributors