CARD Variants Download README

Use or reproduction of these materials, in whole or in part, by any commercial organization whether or not for non-commercial (including research) or commercial purposes is prohibited, except with written permission of McMaster University. Commercial uses are offered only pursuant to a written license and user fee. To obtain permission and begin the licensing process, see http://card.mcmaster.ca/about.

For details on how these data are generated, see https://card.mcmaster.ca/genomes and https://card.mcmaster.ca/prevalence.

CITATION:

Alcock et al. 2023. "CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database" Nucleic Acids Research, 51, D690-D699. https://pubmed.ncbi.nlm.nih.gov/36263822/

CARD SHORT NAMES:

A CARD-specific abbreviation for AMR gene names associated with Antibiotic Resistance Ontology terms, often not based on the literature. This is used for programmatic and compatibility purposes and is not ontologically relevant. Each ontology term with an associated AMR detection model has a CARD Short Name that appears in CARD data files and output generated by RGI.

If the original gene name is less than 15 characters, the CARD short name is identical; if the gene name is greater than 15 characters, the CARD Short Name has been abbreviated by CARD curators specifically to identify the proper gene or protein name. All CARD Short Names are unique and have whitespace characters replaced by underscore characters.

The convention for pathogen names is capitalized first letter of the genus followed by the lowercase first three letters of the species name. The antibiotic abbreviations are from https://journals.asm.org/journal/aac/abbreviations plus some custom abbreviations by the CARD curators. Simple CARD Short Names often do not involve either, e.g. CTX-M-15, but where applicable the CARD Short Names follow pathogen_gene or pathogen_gene_drug.
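The naming pattern above can be illustrated with a short Python sketch. This is not how CARD produces Short Names: they are assigned by curators, and the function names, the example gene, and the "FLO" drug abbreviation below are assumptions chosen only to show the pathogen_gene and pathogen_gene_drug shapes.

```python
from typing import Optional


def pathogen_abbreviation(binomial: str) -> str:
    """Capitalized first letter of the genus plus the lowercase
    first three letters of the species name."""
    genus, species = binomial.split()[:2]
    return genus[0].upper() + species[:3].lower()


def compose_short_name(gene: str,
                       pathogen: Optional[str] = None,
                       drug: Optional[str] = None) -> str:
    """Join the available parts as gene, pathogen_gene or
    pathogen_gene_drug, replacing whitespace with underscores.

    Illustrative only: real CARD Short Names are curated by hand.
    """
    parts = []
    if pathogen:
        parts.append(pathogen_abbreviation(pathogen))
    parts.append(gene)
    if drug:
        parts.append(drug)  # drug abbreviation supplied by the caller
    # Collapse any run of whitespace into a single underscore.
    return "_".join("_".join(parts).split())


if __name__ == "__main__":
    print(pathogen_abbreviation("Escherichia coli"))
    print(compose_short_name("CTX-M-15"))
    print(compose_short_name("gyrA", "Escherichia coli", "FLO"))
```

Short Names should always be looked up in the CARD data files rather than generated, since uniqueness and the abbreviation of long names are handled by the curators.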
The full lists of abbreviations can be found at: https://card.mcmaster.ca/latest/data

FASTA:

The provided FASTA files are separated by sequence type (nucleotide and peptide sequences) and by model type. The included AMR detection model types from CARD are: protein homolog model, protein variant model, protein overexpression model and rRNA gene variant model. Note that the rRNA gene variant model is nucleotide-only, as this model identifies putative AMR-conferring mutations in rRNA. For an in-depth description of each model type, please see the CARD website or Alcock et al., 2023.

The header for each FASTA sequence includes: 1) the prevalence sequence ID (cross-referenced with the "index-for-model-sequences.txt" described below); 2) the ARO name for that determinant as it appears in CARD; 3) the ARO accession for that determinant; and 4) the detection model type used to predict that determinant.

Please note that these sequences are as they appeared in the assembly and so do not directly indicate the predicted AMR-relevant mutation (as applicable to protein variant, protein overexpression and rRNA gene variant models). In addition, in a small number of cases, sequences may have multiple ARO names and accessions delimited by commas.

INDEX FILES:

"index-for-model-sequences.txt.gz": contains all the detection statistics for the sequences available in the above FASTA files, indicating pathogen, detection criteria, ARO categorization, and similarity to the curated CARD reference sequence.
The headers included are:

"prevalence_sequence_id": indicates the corresponding sequence in the FASTA file
"model_id": CARD-assigned ID indicating the AMR detection model
"aro_term": indicates the name of the AMR determinant to which the model is attached
"aro_accession": the unique ARO accession for each ontology term in CARD
"detection_model": the detection model type used for this determinant
"species_name": the analyzed pathogen in which this determinant was detected
"ncbi_accession": the NCBI RefSeq accession from which this determinant was detected
"data_type": the assembly level of the analyzed genome (chromosome, plasmid, wgs)
"rgi_criteria": the confidence of the AMR determinant hit from RGI (Perfect or Strict)
"percent_identity": the percent identity of the matched sequence region to the CARD reference
"bitscore": the bitscore of the matched sequence region to the CARD reference
"amr_gene_family": the AMR Gene Family from CARD for this determinant
"resistance_mechanism": the Resistance Mechanism from CARD for this determinant
"drug_class": the Drug Class from CARD for this determinant

"card-genomes.txt.gz": tab-separated; indicates for each analyzed genome which putative AMR determinants were detected by RGI at both the Perfect and Strict confidence levels. This provides a "putative resistome" for each genome. Available and viewable on the web at: https://card.mcmaster.ca/genomes.
The headers included are:

"dna_accession": the NCBI RefSeq accession analyzed
"pathogen": the pathogen/species indicated for this assembly
"data_source": the assembly level of the analyzed genome (chromosome, plasmid, wgs)
"perfect_hits": comma-separated list of AMR determinants identified as "perfect" by RGI
"strict_hits": comma-separated list of AMR determinants identified as "strict" by RGI

"card_prevalence.txt.gz": tab-separated; indicates the prevalence of each detected AMR determinant by assembly type (chromosome, plasmid, wgs) and RGI confidence criteria (Perfect or Strict). Available and viewable on the web at https://card.mcmaster.ca/prevalence.

The headers included are:

"ARO Accession": the unique ARO identifier from CARD for each ontology term
"Name": the name for this accession as it appears in CARD
"Model ID": the AMR detection model ID used to predict this determinant
"Model Type": the AMR detection model type used to predict this determinant
"Pathogen": the pathogen/species being described by prevalence statistics
"NCBI Plasmid / NCBI Chromosome / NCBI WGS / NCBI Genomic Island": the prevalence (as %) of this determinant across all analyzed assemblies for this data type
"Criteria": the RGI criteria (perfect or strict) used to calculate the prevalence of this determinant
"ARO Categories": semicolon-separated list of ARO categories listed for this determinant (AMR Gene Family, Resistance Mechanism, Drug Class)

KMER FILES:

The files "61_kmer_db.json.gz" and "all_amr_61mers.txt.gz" are used by the Resistome Gene Identifier (RGI) for pathogen-of-origin analysis of predicted antimicrobial resistance genes; see https://github.com/arpcard/rgi.
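
EXAMPLE (reading "card-genomes.txt.gz"):

As an illustration of the "card-genomes.txt.gz" layout described under INDEX FILES, the following Python sketch loads the per-genome putative resistomes. It assumes the first row of the file holds the header names listed above and that hit names contain no embedded commas; the function names and the sample rows (including the accession "XX_000000.1") are made up for the example and are not taken from the CARD data.

```python
import csv
import gzip
import io


def split_hits(field):
    """Split a comma-separated hit list, dropping empty entries."""
    return [hit.strip() for hit in (field or "").split(",") if hit.strip()]


def read_resistomes(handle):
    """Map each dna_accession to its pathogen, data source and hit lists.

    `handle` is any text-mode file object over tab-separated rows.
    If an accession appears more than once, the last row wins.
    """
    resistomes = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        resistomes[row["dna_accession"]] = {
            "pathogen": row["pathogen"],
            "data_source": row["data_source"],
            "perfect": split_hits(row["perfect_hits"]),
            "strict": split_hits(row["strict_hits"]),
        }
    return resistomes


# Synthetic rows for demonstration only (not real CARD data).
SAMPLE = (
    "dna_accession\tpathogen\tdata_source\tperfect_hits\tstrict_hits\n"
    "XX_000000.1\tEscherichia coli\tplasmid\tCTX-M-15\tgeneA, geneB\n"
    "XX_000001.1\tEscherichia coli\tchromosome\t\tgeneC\n"
)

if __name__ == "__main__":
    demo = read_resistomes(io.StringIO(SAMPLE))
    for accession, entry in demo.items():
        print(accession, entry["perfect"], entry["strict"])
    # For the real file, open it in text mode instead:
    # with gzip.open("card-genomes.txt.gz", "rt", newline="") as fh:
    #     resistomes = read_resistomes(fh)
```

The same approach should carry over to "card_prevalence.txt.gz", which is also described as tab-separated, using its own header names.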