
|--------------------------------------------------------------------------
| Resistance Gene Identifier (RGI) Documentation
|--------------------------------------------------------------------------

	Before you run the RGI scripts, make sure the following external tools are installed:

|--------------------------------------------------------------------------
| MetaGeneMark http://exon.gatech.edu/GeneMark/license_download.cgi
|--------------------------------------------------------------------------

	- Tested with MetaGeneMark v3.26 on 64-bit Linux and Mac OS X
	- After downloading MetaGeneMark, copy the mgm directory into the release-rgi folder
	- Follow the INSTALL instructions inside the mgm directory
	- Change directory to release-rgi
	- Test MetaGeneMark using the following command:

	$ ./mgm/gmhmmp (this should print a list of commands)

|--------------------------------------------------------------------------
| BLAST ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
|--------------------------------------------------------------------------

	- Tested with BLAST 2.2.28 and BLAST 2.2.31+ on 64-bit Linux and Mac OS X

	* You can also run the following command to install BLAST. Note that this installs version 2.2.28 only.

	$ sudo apt-get install ncbi-blast+

	- Test the BLAST install with the following command:

	$ makeblastdb

	* Biopython http://biopython.org/DIST/docs/install/Installation.html#sec12
	* Run the following command to install Biopython:

	$ sudo apt-get install python-biopython

	* Download the database file card.json from Downloads on the CARD website (a copy may be included with this release)
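
	The checks above can also be scripted. The following is a minimal sketch, not part of the RGI release: the function name check_prerequisites is hypothetical, and it assumes Python 3.3 or later (for shutil.which) and that it is pointed at the release-rgi folder.

```python
import importlib.util
import os
import shutil


def check_prerequisites(rgi_dir="."):
    """Return a dict mapping each prerequisite to True (found) or False."""
    gmhmmp = os.path.join(rgi_dir, "mgm", "gmhmmp")
    return {
        # MetaGeneMark binary copied into the release folder
        "mgm/gmhmmp": os.path.isfile(gmhmmp) and os.access(gmhmmp, os.X_OK),
        # BLAST+ tools must be reachable through PATH
        "makeblastdb": shutil.which("makeblastdb") is not None,
        # Biopython is imported as the "Bio" package
        "Bio": importlib.util.find_spec("Bio") is not None,
        # CARD database file next to the scripts
        "card.json": os.path.isfile(os.path.join(rgi_dir, "card.json")),
    }
```

	Calling check_prerequisites("release-rgi") and printing the entries whose value is False gives a quick list of what is still missing.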

|--------------------------------------------------------------------------
| Running RGI:
|--------------------------------------------------------------------------

	Open a terminal and type:

	$ python rgi.py -h

|--------------------------------------------------------------------------
| RGI inputs
|--------------------------------------------------------------------------

	$ python rgi.py -h

		usage: rgi.py [-h] [-t INTYPE] [-i INPUTSEQ] [-n THREADS] [-o OUTPUT]
		              [-e CRITERIA] [-c CLEAN]

		Resistance Gene Identifier - Version 3.0.3

		optional arguments:
		  -h, --help            show this help message and exit
		  -t INTYPE, --inType INTYPE
		                        must be one of contig, orf, protein, read (default:
		                        contig)
		  -i INPUTSEQ, --inputSeq INPUTSEQ
		                        input file must be in either FASTA (contig and
		                        protein), FASTQ(read) or gzip format! e.g
		                        myFile.fasta, myFasta.fasta.gz
		  -n THREADS, --num_threads THREADS
		                        Number of threads (CPUs) to use in the BLAST search
		                        (default=32)
		  -o OUTPUT, --out_file OUTPUT
		                        Output JSON file (default=Report)
		  -e CRITERIA, --exclude_loose CRITERIA
		                        This option is used to include or exclude the loose
		                        hits. Options are 0 or 1 (default=1 for exclude)
		  -c CLEAN, --clean CLEAN
		                        This removes temporary files in the results directory
		                        after run. Options are 0 or 1 (default=1 for remove)
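
	When RGI is driven from another script, the options listed in the help text above can be assembled programmatically. This is a sketch only: build_rgi_command is a hypothetical helper, and the defaults it uses are simply copied from the help text.

```python
VALID_INTYPES = ("contig", "orf", "protein", "read")


def build_rgi_command(input_seq, in_type="contig", threads=32,
                      out_file="Report", exclude_loose=1, clean=1):
    """Build the argument list for one rgi.py run (nothing is executed)."""
    if in_type not in VALID_INTYPES:
        raise ValueError("inType must be one of: " + ", ".join(VALID_INTYPES))
    return [
        "python", "rgi.py",
        "-t", in_type,             # input type
        "-i", input_seq,           # FASTA, FASTQ or gzip file
        "-n", str(threads),        # threads for the BLAST search
        "-o", out_file,            # output JSON file name
        "-e", str(exclude_loose),  # 1 excludes loose hits, 0 includes them
        "-c", str(clean),          # 1 removes temporary files after the run
    ]
```

	The returned list can be passed to subprocess.check_call(cmd, cwd="release-rgi"), assuming the scripts live in that folder.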


	INTYPE selects the kind of input. Three of the types listed in the help text above ('contig', 'protein' and 'read') are described here:

	1. 'contig' means that the input sequence is a DNA sequence stored in a FASTA file, presumably a complete genome or assembly contigs. RGI will predict ORFs de novo and predict the resistome using a combination of BLASTP against the CARD data, curated cut-offs, and SNP screening.

	2. 'protein', as its name suggests, requires a FASTA file with protein sequences. As above, RGI predicts the resistome using a combination of BLASTP against the CARD data, curated cut-offs, and SNP screening.

	3. 'read' expects raw FASTQ format nucleotide data and predicts the resistome using a combination of BLASTX against the CARD data, curated cut-offs, and SNP screening. This is an experimental tool and we have yet to adjust the CARD cut-offs for BLASTX. We will be exploring other metagenomics or FASTQ screening methods. Note that RGI does not perform any pre-processing of the FASTQ data (linker trimming, etc.).
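
	Since 'contig' and 'protein' expect FASTA while 'read' expects FASTQ, it can help to check a file before choosing INTYPE. The sketch below is not part of RGI; sniff_sequence_format is a hypothetical helper that looks only at the first non-blank line and assumes gzip files end in .gz. It cannot tell DNA FASTA from protein FASTA.

```python
import gzip


def sniff_sequence_format(path):
    """Return 'fasta', 'fastq' or 'unknown' based on the first non-blank line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):   # FASTA header line
                return "fasta"
            if line.startswith("@"):   # FASTQ header line
                return "fastq"
            return "unknown"
    return "unknown"                   # empty file
```

	A result of 'fastq' suggests -t read; a result of 'fasta' leaves the choice between contig and protein to the user.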


|--------------------------------------------------------------------------
| RGI outputs
|--------------------------------------------------------------------------

	RGI produces a detailed JSON file: Report.json

	The JSON is structured as follows (the example shows only one hit):

	- gene_71|gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence: {
		// Hit 1
		gnl|BL_ORD_ID|39|hsp_num:0: {
			"SequenceFromBroadStreet": "MRYIRLCIISLLATLPLAVHASPQPLEQIKQSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGASKRGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAA",
			"orf_start": 67822,
			"ARO_name": "SHV-12",
			"type_match": "Loose",
			"query": "INDWRLDYNECRPHSSLNYLTPAEFAAGWRN",
			"evalue": 3.82304,
			"max-identities": 10,
			"orf_strand": "-",
			"bit-score": 24.6386,
			"cvterm_id": "35914",
			"sequenceFromDB": "LDRWETELNEALPGDARDTTTPASMAATLRK",
			"match": "++ W  + NE  P  + +  TPA  AA  R ",
			"model_id": "103",
			"orf_From": "gi|378406451|gb|JN420336.1| Klebsiella pneumoniae plasmid pNDM-MAR, complete sequence",
			"pass_evalue": 1e-100,
			"query_end": 68607,
			"ARO_category": {
				"36696": {
					"category_aro_name": "antibiotic inactivation enzyme",
					"category_aro_cvterm_id": "36696",
					"category_aro_accession": "3000557",
					"category_aro_description": "Enzyme that catalyzes the inactivation of an antibiotic resulting in resistance.  Inactivation includes chemical modification, destruction, etc."
				},
				"36268": {
					"category_aro_name": "beta-lactam resistance gene",
					"category_aro_cvterm_id": "36268",
					"category_aro_accession": "3000129",
					"category_aro_description": "Genes conferring resistance to beta-lactams."
				}
			},
			"ARO_accession": "3001071",
			"query_start": 68515,
			"model_name": "SHV-12",
			"model_type": "protein homolog model",
			"orf_end": 68646
		},
		...
		// Hit 2
		...
		// Hit 3
		...
	}
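
	The nesting shown above (ORF identifier, then hit identifier, then hit fields) can be walked with the standard json module. This is a minimal sketch under two assumptions: that the real Report.json is valid JSON with this nesting, and that ranking hits by "bit-score" is a reasonable way to pick one per ORF. The names load_report and best_hits are hypothetical.

```python
import json


def load_report(path="Report.json"):
    """Load an RGI report into a dict keyed by ORF identifier."""
    with open(path) as handle:
        return json.load(handle)


def best_hits(report, include_loose=False):
    """Return {orf_id: hit} keeping the highest bit-score hit per ORF."""
    best = {}
    for orf_id, hits in report.items():
        if not isinstance(hits, dict):
            continue  # skip anything that is not a hit table
        candidates = [
            hit for hit in hits.values()
            if isinstance(hit, dict)
            and (include_loose or hit.get("type_match") != "Loose")
        ]
        if candidates:
            best[orf_id] = max(candidates, key=lambda h: h.get("bit-score", 0))
    return best
```

	For example, iterating over best_hits(load_report()).items() and printing each hit's "ARO_name" and "type_match" gives a one-line summary per ORF.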

|--------------------------------------------------------------------------
| Getting Tab Delimited output after running RGI:
|--------------------------------------------------------------------------

	Run the following command for help on producing the tab-delimited output:

	$ python convertJsonToTSV.py -h

|--------------------------------------------------------------------------
| convertJsonToTSV inputs
|--------------------------------------------------------------------------

	$ python convertJsonToTSV.py -h

		usage: convertJsonToTSV.py [-h] [-i AFILE] [-o OUTPUT]

		Convert RGI JSON file to Tab-delimited file

		optional arguments:
		  -h, --help            show this help message and exit
		  -i AFILE, --afile AFILE
		                        must be a json file generated from RGI in JSON or gzip
		                        format e.g out.json, out.json.gz
		  -o OUTPUT, --out_file OUTPUT
		                        Output JSON file (default=dataSummary)

|--------------------------------------------------------------------------
| convertJsonToTSV outputs
|--------------------------------------------------------------------------

	This outputs a tab-delimited text file: dataSummary.txt

	The tab-output is as follows:

	ORF_ID	CONTIG	START	STOP	ORIENTATION	CUT_OFF	Best_Hit_evalue	Best_Hit_ARO	Best_Identites	ARO	ARO_name	Model_type	SNP	AR0_category	bit_score
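
	The tab-delimited file can be read with the standard csv module. This is a sketch, assuming the first line of dataSummary.txt is the header row shown above; read_summary is a hypothetical helper.

```python
import csv


def read_summary(path="dataSummary.txt"):
    """Return the rows of the tab-delimited summary as a list of dicts."""
    with open(path, newline="") as handle:
        # Column names are taken from the header line of the file
        return list(csv.DictReader(handle, delimiter="\t"))
```

	Each row is then addressable by column name, for example row["ORF_ID"] or row["ARO_name"].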
					

|--------------------------------------------------------------------------
| File Structure
|--------------------------------------------------------------------------


`-- rgi
   |-- mgm
   |   |-- ...
   |   |-- ...
   |-- __init__.py
   |-- card.json
   |-- clean.py
   |-- contigToORF.py
   |-- contigToProteins.py
   |-- convertJsonToTSV.py
   |-- filepaths.py
   |-- formatJson.py
   |-- fqToFsa.py
   |-- INSTALL
   |-- load.py
   |-- README
   |-- rgi.py


|--------------------------------------------------------------------------
| Loading new card.json:
|--------------------------------------------------------------------------

	* If a new card.json is available, replace the card.json in the directory shown above. Use the following command:

	$ python load.py -h


|--------------------------------------------------------------------------
| Load inputs
|--------------------------------------------------------------------------

	$ python load.py -h

		usage: load.py [-h] [-i AFILE]

		Load card database json file

		optional arguments:
		  -h, --help            show this help message and exit
		  -i AFILE, --afile AFILE
		                        must be a card database json file


|--------------------------------------------------------------------------
| Clean databases
|--------------------------------------------------------------------------

	* The BLAST databases are created when rgi.py is run. After a new card.json is loaded, run clean.py to remove the old databases from the directory:

	$ python clean.py -h


|--------------------------------------------------------------------------
| Clean inputs
|--------------------------------------------------------------------------

	$ python clean.py -h

		usage: clean.py [-h]

		Removes BLAST databases created using card.json

		optional arguments:
		  -h, --help  show this help message and exit
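
	The update procedure described above (load the new card.json, then remove the old BLAST databases) can be wrapped in a small script. This is a sketch only: both function names are hypothetical, and it assumes the scripts are run with the python on PATH from inside the rgi directory.

```python
import subprocess


def card_update_commands(new_card_json):
    """Return the two commands, in order, for switching to a new card.json."""
    return [
        ["python", "load.py", "-i", new_card_json],  # load the new database file
        ["python", "clean.py"],                      # remove old BLAST databases
    ]


def run_card_update(new_card_json, rgi_dir="."):
    """Run the update commands in rgi_dir, stopping at the first failure."""
    for command in card_update_commands(new_card_json):
        subprocess.check_call(command, cwd=rgi_dir)
```

	The BLAST databases are rebuilt the next time rgi.py is run.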

|--------------------------------------------------------------------------
| Format JSON
|--------------------------------------------------------------------------

	$ python formatJson.py -h 

		usage: formatJson.py [-h] [-i IN_FILE] [-o OUT_FILE]

		Convert RGI JSON file to Readable JSON file

		optional arguments:
		  -h, --help            show this help message and exit
		  -i IN_FILE, --in_file IN_FILE
		                        input file must be in JSON format e.g Report.json
		  -o OUT_FILE, --out_file OUT_FILE
		                        Output JSON file (default=ReportFormatted)	

|--------------------------------------------------------------------------
| Contact Us:
|--------------------------------------------------------------------------

	For help please contact the following awesome people:

	* Dr. Andrew McArthur <mcarthua@mcmaster.ca>
	* Amos Raphenya <raphenar@mcmaster.ca>
	* Pearl Guo <p7guo@uwaterloo.ca>
	* Justin Jia <jiabf@mcmaster.ca>
 