genehood-cli
Command-line interface to generate GeneHood datasets.
Dependencies
GeneHood needs Node.js version 10+ and ncbi-tools+ version 2.6+ to run.
Install
npm install -g genehood-cli
Usage
GeneHood uses the MiST3 API to collect the information needed for the analysis. Thus, the only inputs required from the user are:
- A list of reference genes
- How many upstream and downstream genes should be in the analysis
- A phylogeny in Newick format (optional)
Reference Genes
GeneHood reads a list of reference genes from the user and searches MiST3 for the genes upstream and downstream of each of them.
For this reason, GeneHood uses the MiST3 standard for gene identifiers, called the stable id.
It is a composite of the NCBI genome version and the locus number of the gene, joined by a hyphen.
Here are some examples:
MiST3 stable id | description |
---|---|
GCF_000005845.2-b4355 | Chemoreceptor tsr (b4355) from Escherichia coli str. K-12 substr. MG1655 |
GCF_000006765.1-PA5040 | Secretin pilQ (PA5040) from Pseudomonas aeruginosa PAO1 |
GCF_000006905.1-CC_2066 | Part of the L-ring, flgH (CC_2066), from Caulobacter crescentus CB15 |
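Since a stable id is a composite of two parts, it can be useful to split it before building identifier lists. The following is a minimal Python sketch, not part of genehood-cli; the names `StableId` and `parse_stable_id` are hypothetical, and it assumes the genome version itself never contains a hyphen, as in the examples above.

```python
from typing import NamedTuple


class StableId(NamedTuple):
    genome_version: str  # e.g. "GCF_000005845.2"
    locus: str           # e.g. "b4355"


def parse_stable_id(stable_id: str) -> StableId:
    """Split a MiST3 stable id into its genome version and locus."""
    # Assumption: the first hyphen separates the two parts; the locus
    # itself may contain further hyphens or underscores.
    genome_version, sep, locus = stable_id.strip().partition("-")
    if not sep or not genome_version or not locus:
        raise ValueError(f"not a stable id: {stable_id!r}")
    return StableId(genome_version, locus)
```

Calling `parse_stable_id("GCF_000006905.1-CC_2066")` yields the genome version `GCF_000006905.1` and the locus `CC_2066`.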
Performing GeneHood analysis
Once genehood-cli is installed globally (-g option), npm generates an executable called genehood.
genehood takes one argument, the name of the project (in this example myProject), and a mandatory --action flag with four possible values:
value | description |
---|---|
init | Initializes the configuration and data files for the project |
run | Starts a new run from an existing configuration file |
keepGoing | Restarts a run from the last successful step of the analysis pipeline |
cleanUp | Deletes the temporary files generated by GeneHood |
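When genehood is driven from a script, it may help to validate the action before calling the executable. This is a small Python sketch under the assumption that genehood is on the PATH; `genehood_command` and `run_genehood` are hypothetical helper names, and only the project argument and the --action flag described above are used.

```python
import subprocess

# The four values of --action listed in the table above.
ACTIONS = ("init", "run", "keepGoing", "cleanUp")


def genehood_command(project: str, action: str) -> list:
    """Build the argument list for one genehood invocation."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    return ["genehood", project, "--action", action]


def run_genehood(project: str, action: str) -> None:
    """Run genehood and raise CalledProcessError if it exits with an error."""
    subprocess.run(genehood_command(project, action), check=True)
```

For instance, `run_genehood("myProject", "init")` would execute the same command as typing `genehood myProject --action init`.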
Step 1: Initialize the project
To start a new analysis, we must initialize a new project.
genehood myProject --action init
This command will generate two files:
myProject.geneHood.config.json
myProject.geneHood.data.json.gz
Now, we must edit the config file to tell GeneHood for which genes it should collect gene neighborhood information.
Step 2: Edit the config file to set initial parameters
genehood-cli version 0.2.8 has flags to facilitate this process; see the alternative Step 2 below.
There are several parts in the GeneHood config file, but what matters is under the section user. There we will find four sub-sections:
section | description |
---|---|
settings | This is where all the input data goes |
newickTree | This is where we should add a Newick tree (optional) |
startingStep | For advanced users who want to start from a step other than the default |
stopStep | For advanced users who don't want to run the entire pipeline |
Let's focus on the settings section first. It has four sub-sections, three of which need user input:
section | description |
---|---|
stableIds | This is where we will add reference genes using MiST3 stable identifiers |
upstream | Integer of how many genes should be collected upstream from the reference gene |
downstream | Integer of how many genes should be collected downstream from the reference gene |
geneHoodPrefix | This is pre-filled with the name of the GeneHood project |
For example, let us add as reference genes the cheA genes from the three chemosensory systems in Vibrio cholerae:
system | stable Ids |
---|---|
F6 | GCF_000006745.1-VC2063 |
F7 | GCF_000006745.1-VCA1095 |
F9 | GCF_000006745.1-VC1397 |
Let us also include 15 genes upstream and 15 downstream from the reference genes.
To do that, we can edit the config file using any text editor.
The user section of the config file will look something like this:
"user": {
  "newickTree": "",
  "settings": {
    "downstream": 15,
    "geneHoodPrefix": "vibrio",
    "stableIds": [
      "GCF_000006745.1-VC1397",
      "GCF_000006745.1-VC2063",
      "GCF_000006745.1-VCA1095"
    ],
    "upstream": 15
  },
  "startingStep": "fetchData",
  "stopStep": ""
}
Save the file and proceed to the next step.
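The same edit can also be scripted instead of done by hand. The sketch below is not part of genehood-cli: it assumes the config file is plain JSON with the user section laid out as shown above, and the function names are hypothetical. Keeping a backup copy of the config before rewriting it is a sensible precaution.

```python
import json
from pathlib import Path


def set_user_settings(config: dict, stable_ids, upstream: int, downstream: int) -> dict:
    """Fill in user.settings in an already-loaded config dictionary."""
    settings = config.setdefault("user", {}).setdefault("settings", {})
    # Keep the caller's order but drop duplicate identifiers.
    settings["stableIds"] = list(dict.fromkeys(stable_ids))
    settings["upstream"] = int(upstream)
    settings["downstream"] = int(downstream)
    return config


def update_config_file(path, stable_ids, upstream: int, downstream: int) -> None:
    """Load the config file, update user.settings, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    set_user_settings(config, stable_ids, upstream, downstream)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

A call such as `update_config_file("myProject.geneHood.config.json", ids, 15, 15)` would leave all other sections of the file, including geneHoodPrefix, untouched.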
Step 2 (alternative): Set parameters using flags.
We can set the number of genes downstream and upstream using --addRange.
We can add the identifiers to a text file (one identifier per line) and pass it to genehood using the flag --addStableIds.
If we put the identifiers into a file named vibrioIds.txt, we can accomplish the same setup as before by typing:
genehood myProject --addRange 15 15 --addStableIds vibrioIds.txt
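The identifier file for --addStableIds only needs one identifier per line, so it is easy to generate from a script. A minimal Python sketch follows; the helper name `write_stable_ids` is hypothetical, and the trailing newline at the end of the file is a choice of this sketch rather than a documented requirement.

```python
from pathlib import Path


def write_stable_ids(path, stable_ids) -> int:
    """Write one stable id per line and return how many were written."""
    # Strip surrounding whitespace and skip blank entries.
    lines = [s.strip() for s in stable_ids if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    write_stable_ids(
        "vibrioIds.txt",
        [
            "GCF_000006745.1-VC2063",
            "GCF_000006745.1-VCA1095",
            "GCF_000006745.1-VC1397",
        ],
    )
```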
Step 3: Running GeneHood
Make sure we have an Internet connection and that blastp and makeblastdb are available as executables on our system. Then run:
genehood myProject --action run
That is it. GeneHood should do all the rest.
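Before launching a run, the check for blastp and makeblastdb can be automated. This Python sketch uses the standard-library function shutil.which to look for each program on the PATH; the helper name `missing_tools` is hypothetical, and the sketch does not verify the installed version.

```python
import shutil

# Executables that the run step expects to find on the PATH.
REQUIRED_TOOLS = ("blastp", "makeblastdb")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the names of the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables were found.")
```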
Step 4: Clean up
If everything goes as expected, we should have a file called myProject.geneHood.pack.json.gz in our directory. The directory will probably also contain a number of other files that GeneHood used temporarily.
We can safely remove these temporary files using the cleanUp action of genehood:
genehood myProject --action cleanUp
GeneHood removes all the files except two: the config file and the pack file. This is a little redundant, since GeneHood's pack also contains the config file. We made it this way so that users can easily see how they ran the analysis, or re-run it with a few changes to the config file, if needed.
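To take a quick look inside the pack without the viewer, it can be opened from a script. The sketch below assumes, from the file extension, that the pack is gzip-compressed JSON; its internal layout is not described here, so the example only loads the file and leaves inspection to the reader. The name `load_pack` is hypothetical.

```python
import gzip
import json


def load_pack(path):
    """Read a gzip-compressed JSON file and return the parsed content."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    pack = load_pack("myProject.geneHood.pack.json.gz")
    # Print only the top-level structure, since the layout is not documented here.
    print(type(pack).__name__)
    if isinstance(pack, dict):
        print(sorted(pack))
```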
Now we just need to visualize the data.
Optional step 4.5: Add Phylogeny
We can add a phylogeny (in Newick format) to the config file at any moment, and genehood-cli has a helper option for it: --addPhylogeny. If we add the phylogeny after the pack has been built, genehood-cli will repack the file for us.
Adding a phylogeny lets the viewer order the gene clusters following the order of the phylogenetic tree. The tree can be built in any way: from a single gene, from multiple concatenated genes, and so on. However, for the viewer to work, the names of the leaves need to be exactly the same as the identifiers of the reference genes.
To add a new phylogeny:
genehood myProject --addPhylogeny myPhylogeny.nwk
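Because the leaf names must match the reference gene identifiers exactly, it can be worth checking the tree before adding it. The following Python sketch is not part of genehood-cli and is not a full Newick parser: it assumes unquoted labels without bracketed comments, and it treats any label that directly follows "(" or "," as a leaf. The function names are hypothetical.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels follow ")".
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_leaves(newick: str) -> list:
    """Return the leaf labels of a Newick string with unquoted labels."""
    return _LEAF.findall(newick)


def compare_leaves(newick: str, stable_ids):
    """Return (ids missing from the tree, leaves that are not reference ids)."""
    leaves = set(newick_leaves(newick))
    ids = set(stable_ids)
    return sorted(ids - leaves), sorted(leaves - ids)
```

If both returned lists are empty, every reference gene has a leaf with the same name and the tree has no extra leaves.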
Step 5: Load the data on genehood.io
Open GeneHood (genehood.io) in a web browser and load the myProject.geneHood.pack.json.gz file.
Now just explore the data.
To learn more about the GeneHood viewer, go to genehood.io and click on Demo.
Developers Documentation
... to be continued.