genehood-cli
0.2.9-1 • Public • Published

npm License: CC0-1.0 pipeline status coverage report

Command-line interface to generate GeneHood datasets.

Dependencies

GeneHood needs Node.js version 10+ and ncbi-tools+ version 2.6+ to run.
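To check the Node.js requirement before installing, a small shell sketch like the following may help. The helper name node_major is hypothetical, and the sketch assumes that node --version prints a string of the form v<major>.<minor>.<patch>.

```shell
# Hypothetical helper: extract the major version from a string such as "v10.24.1".
node_major() {
  version="${1#v}"                 # drop the leading "v"
  printf '%s\n' "${version%%.*}"   # keep everything before the first dot
}

# Report whether the installed Node.js satisfies the 10+ requirement.
if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node --version)")
  if [ "$major" -ge 10 ]; then
    echo "Node.js major version $major: OK"
  else
    echo "Node.js major version $major: too old, GeneHood needs 10+"
  fi
else
  echo "node not found on PATH"
fi
```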

Install

npm install -g genehood-cli

Usage

GeneHood uses the MiST3 API to collect the information needed for the analysis. Thus, the only inputs required from the user are:

  • a list of reference genes,
  • how many upstream and downstream genes should be in the analysis,
  • a phylogeny in Newick format (optional).

Reference Genes

GeneHood reads a list of reference genes from the user and queries MiST3 for the genes upstream and downstream of each one.

For this reason, GeneHood uses the MiST3 standard for gene identifiers called stable id.

It is a composite of the NCBI genome version and the locus number of the gene.

Here are some examples:

MiST3 stable id            Description
GCF_000005845.2-b4355      Chemoreceptor tsr (b4355) from Escherichia coli str. K-12 substr. MG1655
GCF_000006765.1-PA5040     Secretin pilQ (PA5040) from Pseudomonas aeruginosa PAO1
GCF_000006905.1-CC_2066    Part of the L-ring, flgH (CC_2066), from Caulobacter crescentus CB15
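Because a stable id is the genome version and the locus joined by a hyphen, it can be split in the shell. This is only a sketch: the helper names genome_version and locus are hypothetical, and it assumes the genome version itself never contains a hyphen, as in the examples above.

```shell
# Hypothetical helpers: split a MiST3 stable id at its first hyphen.
genome_version() { printf '%s\n' "${1%%-*}"; }   # part before the first "-"
locus()          { printf '%s\n' "${1#*-}"; }    # part after the first "-"

genome_version "GCF_000005845.2-b4355"   # prints GCF_000005845.2
locus "GCF_000005845.2-b4355"            # prints b4355
```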

Performing GeneHood analysis

Once genehood-cli is installed globally (-g option), npm provides an executable called genehood.

genehood takes the name of the project as its one argument (in this example, myProject) and a mandatory --action flag with four possible values:

Value        Description
init         Initializes the configuration and data files for the project
run          Starts a new run from an existing configuration file
keepGoing    Restarts a run from the last successful step of the analysis pipeline
cleanUp      Deletes the temporary files generated by GeneHood
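When calling genehood from scripts, it can help to reject a mistyped action before invoking the tool. The wrapper below is a sketch, not part of genehood-cli: the function names are hypothetical, and it treats the four values as case-sensitive, which is an assumption.

```shell
# Hypothetical helper: succeed only for the four documented --action values.
is_valid_action() {
  case "$1" in
    init|run|keepGoing|cleanUp) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: refuse to call genehood with an unknown action.
genehood_action() {
  project="$1"
  action="$2"
  if ! is_valid_action "$action"; then
    echo "unknown action: $action" >&2
    return 2
  fi
  genehood "$project" --action "$action"
}

# Usage: genehood_action myProject init
```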

Step 1: Initialize the project

To start a new analysis, we must initialize a new project.

genehood myProject --action init

This command will generate two files:

  • myProject.geneHood.config.json
  • myProject.geneHood.data.json.gz

Now, we must edit the config file to tell GeneHood for which genes it should collect gene neighborhood information.

Step 2: Edit the config file to set initial parameters

Starting with version 0.2.8, genehood-cli has flags to facilitate this process; see the alternative Step 2 further down.

There are several parts in the GeneHood config file, but the one that matters is the user section. It contains four sub-sections:

Section         Description
settings        This is where all the input data goes
newickTree      This is where we should add a Newick tree (optional)
startingStep    For advanced users who want to start from a step other than the default
stopStep        For advanced users who do not want to run the entire pipeline

Let's focus on the settings section first. It has four fields; the first three need user input:

Field             Description
stableIds         This is where we add the reference genes, using MiST3 stable identifiers
upstream          Integer number of genes to collect upstream of each reference gene
downstream        Integer number of genes to collect downstream of each reference gene
geneHoodPrefix    Pre-filled with the name of the GeneHood project

For example, let us add as reference genes the cheA genes from the three chemosensory systems in Vibrio cholerae:

System    Stable id
F6        GCF_000006745.1-VC2063
F7        GCF_000006745.1-VCA1095
F9        GCF_000006745.1-VC1397

Let us also include 15 genes upstream and 15 genes downstream of each reference gene.

To do that, we can edit the config file using any text editor.

The user section of the config file will be something like this:

"user": {
 "newickTree": "",
 "settings": {
  "downstream": 15,
  "geneHoodPrefix": "vibrio",
  "stableIds": [
   "GCF_000006745.1-VC1397",
   "GCF_000006745.1-VC2063",
   "GCF_000006745.1-VCA1095"
  ],
  "upstream": 15
 },
 "startingStep": "fetchData",
 "stopStep": ""
}

Save the file and proceed to the next step.

Step 2 (alternative): Set parameters using flags.

We can set the number of upstream and downstream genes using the flag --addRange.

We can add the identifiers to a text file (one identifier per line) and pass it to genehood using the flag --addStableIds.

If we put the identifiers into a file named vibrioIds.txt, we can accomplish the same setup as before by typing:

genehood myProject --addRange 15 15 --addStableIds vibrioIds.txt
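The identifier file used by this command can be prepared and sanity-checked from the shell. The sketch below writes the three cheA stable ids from the example and counts lines that do not look like a stable id. The helper count_bad_ids is hypothetical, and its pattern is inferred from the examples in this README (an NCBI assembly accession with version, a hyphen, then a locus), not taken from MiST3 documentation, so it may need loosening.

```shell
# Write the three cheA stable ids, one per line.
cat > vibrioIds.txt <<'EOF'
GCF_000006745.1-VC2063
GCF_000006745.1-VCA1095
GCF_000006745.1-VC1397
EOF

# Hypothetical check: count the lines that do NOT match "<accession>.<version>-<locus>".
# The pattern is an assumption based on the examples shown in this README.
count_bad_ids() {
  grep -E -v -c '^GC[AF]_[0-9]+\.[0-9]+-[A-Za-z0-9_]+$' "$1"
}

# grep exits with status 1 when the count is zero, hence the "|| true".
bad=$(count_bad_ids vibrioIds.txt) || true
echo "lines that do not look like stable ids: $bad"
```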

Step 3: Running GeneHood

Make sure we have an Internet connection and that blastp and makeblastdb are available as executables on our system.
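A quick way to confirm that both executables can be found is sketched here. The helper name missing_tools is hypothetical, and the check only looks the commands up on PATH; it does not verify their versions or the Internet connection.

```shell
# Hypothetical helper: print every named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools blastp makeblastdb)
if [ -n "$missing" ]; then
  echo "missing from PATH:" $missing
else
  echo "blastp and makeblastdb found"
fi
```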

Then run:

genehood myProject --action run

That is it. GeneHood should do all the rest.
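If a run is interrupted, for example by a dropped connection, the keepGoing action described earlier can pick it up again. The wrapper below is a sketch under a stated assumption: it relies on genehood signalling failure through a non-zero exit status, which this README does not confirm. GENEHOOD_BIN and run_or_resume are illustrative names, not part of genehood-cli.

```shell
# Illustrative variable: the executable to call (defaults to genehood).
GENEHOOD_BIN="${GENEHOOD_BIN:-genehood}"

# Try a fresh run; if it exits with a non-zero status, resume once with keepGoing.
# Assumes failures are reported through the exit status.
run_or_resume() {
  project="$1"
  "$GENEHOOD_BIN" "$project" --action run ||
    "$GENEHOOD_BIN" "$project" --action keepGoing
}

# Usage: run_or_resume myProject
```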

Step 4: Clean up

If everything goes as expected, we should have a file called myProject.geneHood.pack.json.gz in our directory. The directory will probably also contain several other files that GeneHood used temporarily.
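Before cleaning up, we may want to confirm that the pack file exists and is a readable gzip archive. The helper pack_ok below is hypothetical, and it only tests the compression layer with gzip -t; it says nothing about whether the JSON inside is complete.

```shell
# Hypothetical helper: succeed if <project>.geneHood.pack.json.gz exists
# and passes gzip's integrity test.
pack_ok() {
  pack="$1.geneHood.pack.json.gz"
  [ -f "$pack" ] && gzip -t "$pack" 2>/dev/null
}

if pack_ok myProject; then
  echo "pack file looks intact"
else
  echo "pack file missing or not valid gzip"
fi
```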

We can safely remove these temp files using the action cleanUp from genehood:

genehood myProject --action cleanUp

GeneHood removes all the files except two: the config file and the pack file. This is a little redundant, since GeneHood's pack also contains the config. We made it this way so that users can easily see how they ran the analysis, or re-run it with a few changes to the config file if needed.

Now we just need to visualize the data.

Optional step 4.5: Add Phylogeny

We can add a phylogeny (in Newick format) to the config file at any time, and genehood-cli has a helper option for this: --addPhylogeny. If we add the phylogeny after the pack has been built, genehood-cli will repack the file for us.

Adding a phylogeny lets the viewer order the gene clusters following the order of the phylogenetic tree. The tree can be built in any way: from a single gene, from multiple concatenated genes, etc. However, for the viewer to work, the names of the leaves need to be exactly the same as the identifiers of the reference genes.
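Since the leaf names must match the reference gene identifiers exactly, it can be worth comparing the two before adding the tree. The following is a rough sketch rather than a full Newick parser: leaf_names and compare_leaves_to_ids are hypothetical helpers, and they assume unquoted labels with no spaces and an identifier file with one stable id per line.

```shell
# Hypothetical helper: list the leaf names of a Newick tree, one per line, sorted.
# Labels and branch lengths that follow ")" belong to internal nodes and are dropped;
# branch lengths on leaves (":0.1") are dropped as well.
leaf_names() {
  tr -d ' \t\r\n' < "$1" |
    sed 's/)[^,();]*/)/g' |
    tr '(),;' '\n\n\n\n' |
    sed 's/:.*//' |
    grep -v '^$' |
    LC_ALL=C sort
}

# Hypothetical helper: print names found in only one of the two inputs.
# No output means the tree leaves and the identifier list match.
compare_leaves_to_ids() {
  tree_tmp=$(mktemp)
  ids_tmp=$(mktemp)
  leaf_names "$1" > "$tree_tmp"
  grep -v '^$' "$2" | LC_ALL=C sort > "$ids_tmp"
  LC_ALL=C comm -3 "$tree_tmp" "$ids_tmp"
  rm -f "$tree_tmp" "$ids_tmp"
}

# Usage: compare_leaves_to_ids myPhylogeny.nwk vibrioIds.txt
```

In the output of comm -3, names in the first column appear only in the tree, and names in the tab-indented second column appear only in the identifier file.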

To add a new phylogeny:

genehood myProject --addPhylogeny myPhylogeny.nwk

Step 5: Load the data on genehood.io

Open GeneHood (genehood.io) in a web browser and load the file myProject.geneHood.pack.json.gz.

Now just explore the data.

To learn more about the GeneHood viewer, go to genehood.io and click on Demo.

Developer's Documentation

... to be continued.

Written in TypeScript.

Collaborators

  • daviortega