Database Description
The built-in downloader uses the db_current HTTP snapshot from pro.unl.edu by default.
You can also download the pinned S3 release for faster and more stable transfers by using the --aws_s3 flag with the run_dbcan database command (see Preparing Databases).
The databases are generally updated annually between July and September.
The databases used by run_dbCAN are described below.
CAZyme Databases
DIAMOND database
- Description: Fast protein alignment database used to identify CAZyme sequences.
- Filename: CAZy.dmnd
dbCAN HMM database
- Description: HMM profiles of CAZyme families, used for sensitive CAZyme identification.
- Filename: dbCAN.hmm
dbCAN-sub HMM database
- Description: Subfamily-level HMM profiles for fine-grained CAZyme subfamily identification.
- Filename: dbCAN-sub.hmm
CGC Databases
Transporter DIAMOND database (from TCDB)
- Description: Transporter protein database used to identify transporters in CGCs.
- Filename: TCDB.dmnd
Transcription Factor HMM database (fungi, from MycoCosm)
- Description: HMM profiles of transcription factors, used for fungal datasets.
- Filename: TF.hmm
Transcription Factor DIAMOND database (prokaryotes, from PRODORIC)
- Description: Transcription factor protein database used to identify TFs in prokaryotic CGCs.
- Filename: TF.dmnd
Signal Transduction Protein HMM database
- Description: HMM profiles of signal transduction proteins.
- Filename: STP.hmm
Sulfatase DIAMOND database (from SulfAtlas)
- Description: Sulfatase protein database used to identify sulfatases.
- Filename: sulfatlas_db.dmnd
Peptidase DIAMOND database (from MEROPS)
- Description: Peptidase protein database used to identify peptidases.
- Filename: peptidase_db.dmnd
dbCAN-PUL DIAMOND database
- Description: Polysaccharide Utilization Loci (PUL) protein database used for PUL identification.
- Filename: PUL.dmnd
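Because the transcription factor database depends on the organism type (MycoCosm HMM profiles for fungi, PRODORIC DIAMOND database for prokaryotes), the choice can be sketched as a small lookup. This is a hypothetical Python helper, not part of run_dbCAN: the function name `tf_database_for`, the domain labels `"fungi"` and `"prokaryote"`, and the assumption that both files sit directly in one database directory are illustrative only.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping that mirrors the descriptions above;
# run_dbCAN's own selection logic may differ.
TF_DB_BY_DOMAIN = {
    "fungi": "TF.hmm",        # HMM profiles from MycoCosm
    "prokaryote": "TF.dmnd",  # DIAMOND database from PRODORIC
}


def tf_database_for(domain: str, db_dir: str | Path) -> Path:
    """Return the TF database path for a 'fungi' or 'prokaryote' dataset."""
    key = domain.strip().lower()
    if key not in TF_DB_BY_DOMAIN:
        raise ValueError(
            f"unknown domain {domain!r}; expected one of {sorted(TF_DB_BY_DOMAIN)}"
        )
    # Assumes a flat layout: every database file sits directly in db_dir.
    return Path(db_dir) / TF_DB_BY_DOMAIN[key]


print(tf_database_for("fungi", "db"))  # db/TF.hmm on POSIX systems
```

Keeping the mapping in a dictionary makes the fungal/prokaryotic split explicit and raises an error for any other label instead of silently falling back to one of the two files.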
Substrate Prediction Databases
Substrate mapping table
- Description: Table mapping CAZyme families and EC numbers to known substrates.
- Filename: fam-substrate-mapping.tsv
dbCAN-PUL substrate table
- Description: Substrate mapping associated with PULs from the dbCAN-PUL database.
- Directory: dbCAN-PUL/
- Spreadsheet: dbCAN-PUL.xlsx
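After downloading, a quick sanity check is to confirm that every file listed on this page is present. The sketch below is a hypothetical Python helper (not a run_dbCAN command); the name `missing_entries` is illustrative, and it assumes a flat layout in which all files, the dbCAN-PUL/ directory, and dbCAN-PUL.xlsx sit directly inside one database directory, which may differ from what the downloader actually produces.

```python
from __future__ import annotations

from pathlib import Path

# Filenames as listed on this page, grouped by section.
EXPECTED_FILES = {
    "CAZyme": ["CAZy.dmnd", "dbCAN.hmm", "dbCAN-sub.hmm"],
    "CGC": [
        "TCDB.dmnd", "TF.hmm", "TF.dmnd", "STP.hmm",
        "sulfatlas_db.dmnd", "peptidase_db.dmnd", "PUL.dmnd",
    ],
    "Substrate": ["fam-substrate-mapping.tsv", "dbCAN-PUL.xlsx"],
}
# Directories expected per section (assumed to live in the same folder).
EXPECTED_DIRS = {"Substrate": ["dbCAN-PUL"]}


def missing_entries(db_dir: str | Path) -> dict[str, list[str]]:
    """Map each section name to the expected entries not found in db_dir."""
    root = Path(db_dir)
    missing: dict[str, list[str]] = {}
    for section, names in EXPECTED_FILES.items():
        absent = [n for n in names if not (root / n).is_file()]
        # Missing directories are reported with a trailing slash.
        absent += [
            f"{d}/" for d in EXPECTED_DIRS.get(section, [])
            if not (root / d).is_dir()
        ]
        if absent:
            missing[section] = absent
    return missing
```

For example, `missing_entries("db")` returns an empty dictionary when everything is in place; otherwise it lists, per section, which files or directories still need to be downloaded.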