CARD Download README

Use or reproduction of these materials, in whole or in part, by any commercial 
organization whether or not for non-commercial (including research) or commercial purposes
is prohibited, except with written permission of McMaster University. Commercial uses are
offered only pursuant to a written license and user fee. To obtain permission and begin 
the licensing process, see http://card.mcmaster.ca/about.

CITATION:

Alcock et al. 2023. "CARD 2023: expanded curation, support for machine learning, and resistome 
prediction at the Comprehensive Antibiotic Resistance Database" Nucleic Acids Research, 
51, D690-D699. https://pubmed.ncbi.nlm.nih.gov/36263822/

CARD SHORT NAMES:

A CARD Short Name is a CARD-specific abbreviation for an AMR gene name associated with an
Antibiotic Resistance Ontology (ARO) term, often not based on the literature. It is used
for programmatic and compatibility purposes and is not ontologically relevant. Each
ontology term with an associated AMR detection model has a CARD Short Name that appears in
CARD data files and in output generated by the Resistance Gene Identifier (RGI). If the
original gene name is fewer than 15 characters, the CARD Short Name is identical; if the
gene name is more than 15 characters, the CARD Short Name has been abbreviated by CARD
curators specifically to identify the proper gene or protein name.
All CARD Short Names are unique and have whitespace characters 
replaced by underscore characters. The convention for pathogen names is capitalized 
first letter of the genus followed by the lowercase first three letters of the species 
name. The antibiotic abbreviations are from https://journals.asm.org/journal/aac/abbreviations
plus some custom abbreviations by the CARD curators. Simple CARD Short Names often do not
involve either, e.g. CTX-M-15, but where applicable the CARD Short Names follow pathogen_gene
or pathogen_gene_drug. The full lists of abbreviations can be found in the enclosed files: 

"shortname_antibiotics.tsv"
"shortname_pathogens.tsv"

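The naming convention described above can be sketched in a few lines of Python. This is
only an illustration of the stated rules (genus initial plus the first three letters of the
species, parts joined as pathogen_gene or pathogen_gene_drug, whitespace replaced by
underscores). The function names and the drug code "XYZ" are hypothetical, and actual CARD
Short Names are assigned by curators, so the enclosed TSV files remain the authoritative
source.

```python
from typing import Optional


def pathogen_abbreviation(binomial: str) -> str:
    """Capitalized first letter of the genus + lowercase first three letters of the species."""
    genus, species = binomial.split()[:2]
    return genus[0].upper() + species[:3].lower()


def sketch_short_name(gene: str,
                      pathogen: Optional[str] = None,
                      drug_code: Optional[str] = None) -> str:
    """Assemble gene, pathogen_gene, or pathogen_gene_drug (hypothetical helper).

    drug_code is expected to be an already-abbreviated antibiotic code; this
    sketch does not look anything up in shortname_antibiotics.tsv.
    """
    parts = []
    if pathogen:
        parts.append(pathogen_abbreviation(pathogen))
    parts.append(gene)
    if drug_code:
        parts.append(drug_code)
    # Whitespace characters are replaced by underscore characters.
    return "_".join("_".join(parts).split())


simple = sketch_short_name("CTX-M-15")
with_pathogen = sketch_short_name("gyrA", pathogen="Mycobacterium tuberculosis")
with_drug = sketch_short_name("gyrA", pathogen="Mycobacterium tuberculosis", drug_code="XYZ")
```

Here "XYZ" is a placeholder rather than a real antibiotic abbreviation.
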
FASTA:

Nucleotide and corresponding protein FASTA downloads are available as separate files for 
each model type. For example, the "protein homolog" model type contains sequences of
antimicrobial resistance genes that do not include mutation as a determinant of resistance;
these data are appropriate for BLAST analysis of metagenomic data or searches excluding
secondary screening for resistance mutations. In contrast, the "protein variant" model type
includes reference wild-type sequences used for mapping SNPs conferring antimicrobial
resistance. Without secondary mutation screening, analyses using these data will include
false positives for antibiotic-resistant gene variants or mutants.
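
For readers who want to inspect the FASTA downloads before running BLAST, a minimal
plain-Python reader might look like the sketch below. It assumes standard FASTA layout
(a ">" header line followed by one or more sequence lines) and makes no assumptions about
how CARD formats its headers. The example data are invented, and the file name passed to
open() is left to the user.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Invented two-record example; a real file would be read with open(path).
example = io.StringIO(">seq1 hypothetical header\nMKT\nAIL\n>seq2\nMSN\n")
records = list(read_fasta(example))
```

Note that sequences from the "protein variant" files still require secondary mutation
screening, as described above; reading the file does not perform that step.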

MODELS:

The file "card.json" contains the complete data for all of CARD's AMR detection models, 
including reference sequences, SNP mapping data, model parameters, and ARO classification.
"card.json" is used by the Resistance Gene Identifier software. 

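As a rough sketch of how "card.json" could be inspected outside of RGI, the Python below
counts detection models per model type. It assumes, without confirmation from this README,
that the top level of the file is a JSON object whose model entries are themselves objects
carrying a "model_type" key; entries that do not fit that shape are skipped. The sample
data are invented stand-ins.

```python
import json
from collections import Counter


def load_card(path: str) -> dict:
    """Load card.json from a user-supplied path."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_model_types(card: dict) -> Counter:
    """Count entries per model type (the "model_type" key is an assumption)."""
    counts = Counter()
    for entry in card.values():
        # Skip anything that is not a model record, e.g. metadata strings.
        if isinstance(entry, dict) and "model_type" in entry:
            counts[entry["model_type"]] += 1
    return counts


# Invented stand-in for the real file contents.
sample = {
    "_note": "metadata entries are ignored",
    "1": {"model_type": "protein homolog model"},
    "2": {"model_type": "protein variant model"},
    "3": {"model_type": "protein homolog model"},
}
counts = count_model_types(sample)
```

With the real file, the equivalent call would be count_model_types(load_card("card.json")).
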
Values for "High Confidence TB", "Moderate Confidence TB", "Minimal Confidence TB", and
"Indeterminate Confidence TB" were obtained from https://platform.reseqtb.org.

INDEX FILES:

The file "aro_index.tsv" lists the ARO terms assigned to the GenBank accessions stored in 
CARD.

The file "aro_categories.tsv" contains a list of ARO terms used to categorize all entries
in CARD and all results produced by RGI. These categories reflect AMR gene family, target drug 
class, and mechanism of resistance.

The file "aro_categories_index.tsv" contains a list of GenBank accessions stored 
in CARD cross-referenced with the major categories within the ARO. These categories 
reflect AMR gene family, target drug class, and mechanism of resistance, so GenBank 
accessions may have more than one cross-reference. For more complex categorization of 
the data, use the full ARO available at http://card.mcmaster.ca/download.
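
Because an accession may carry more than one cross-reference, it can help to group the
rows of a tab-separated index file by accession. The sketch below does this with Python's
csv module. The column names "accession" and "category" are hypothetical placeholders, so
check the header row of the actual file and pass the real column names instead.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def group_column(handle: TextIO, key_column: str, value_column: str) -> Dict[str, Set[str]]:
    """Map each value of key_column to the set of value_column entries seen with it."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row[key_column]].add(row[value_column])
    return dict(grouped)


# Invented rows with placeholder column names and values.
example = io.StringIO(
    "accession\tcategory\n"
    "ACC0001\tcategory A\n"
    "ACC0001\tcategory B\n"
    "ACC0002\tcategory A\n"
)
by_accession = group_column(example, "accession", "category")
```

For a real file, open it with open(path, newline="", encoding="utf-8") and pass the handle
to group_column.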

The file "snps.txt" lists the SNPs associated with specific detection models.

The file "pmids.tsv" lists the citations for each Antibiotic Resistance Ontology term.

