txreader

description

Handles transcript information data (e.g. UCSC knownGene.txt) in Node.js.

installation

$ npm install txreader

dependencies

usage

create an instance

var TxReader = require('txreader'); // assumes the module exports TxReader
var tx = TxReader.create('knownGene.txt', {
  xref: 'kgXref.txt' // gene name info (optional)
});

get info from ucsc transcript id

var info = tx.getInfo('uc001acn.2'); // info object (described below)

get ucsc transcript ids from gene name

var BCRs = tx.getTxsByGene('BCR');

get ucsc transcript ids from refseq id

var NMs  = tx.getTxsByRefSeqId('NM_033487'); // get list of transcripts whose refseq id is 'NM_033487'

API Documentation

  • TxReader.create(knownGene, options)
  • txr.getTxsByExon(formattedExon)
  • txr.getTxsByGene(geneName)
  • txr.getTxsByRefSeqId(refseqId)
  • txr.getNames()
  • txr.getGeneName(txname)
  • txr.getRefSeqId(txname)
  • txr.getInfo(name)
  • txr.getExons(name)
  • txr.getSeq(name, fr, options)
  • TxReader.parseLine(line)

TxReader.create(knownGene, options)

Creates an instance of TxReader.

knownGene is the path to a knownGene file provided by UCSC. The file format follows http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz

options is an options object.

key | description | example
xref | xref file (compatible with http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/kgXref.txt.gz ) | kgXref.txt
noCacheInfo | if true, transcript information is not cached | true

txr.getInfo(name)

Gets the information object for a transcript.

name is the name of a transcript.

Returns an information object in the following format.

key name | description | example
name | name of the transcript | uc011msz.1
chrom | chromosome name | chr11
strand | strand of the transcript (+/-) | +
isMinus | whether the strand is minus (boolean) | false
txStart | transcription start position (0-based coordinate system) | 12345880
txEnd | transcription end position (0-based coordinate system) | 12346880
cdsStart | coding region start position (0-based coordinate system) | 12345880
cdsEnd | coding region end position (0-based coordinate system) | 12346880
proteinID | protein ID | B7ZGX9
exons | list of exons ordered by exon number (0-based coordinates) | [{chr: xxx, start: xxx, end: xxx, strand: xxx}, ...]
gene | gene name | ALG13
refseqId | RefSeq ID | NM_033487
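
As a rough illustration of working with such an object, the sketch below computes the spliced length of a transcript and prints an exon as a 1-based region string. It does not call txreader; the info object is hand-written with made-up values, and the helper names splicedLength and toOneBasedRegion are hypothetical. It also assumes that end positions are exclusive (the UCSC half-open convention), which the table above does not state explicitly.

```javascript
// Hand-written object shaped like the info format above (values are made up).
var info = {
  name: 'uc000xxx.1',
  chrom: 'chr1',
  strand: '+',
  isMinus: false,
  txStart: 100,
  txEnd: 400,
  exons: [
    { chr: 'chr1', start: 100, end: 150, strand: '+' },
    { chr: 'chr1', start: 300, end: 400, strand: '+' }
  ]
};

// Sum of exon lengths, assuming `end` is exclusive (half-open intervals).
function splicedLength(info) {
  return info.exons.reduce(function (sum, exon) {
    return sum + (exon.end - exon.start);
  }, 0);
}

// Convert a 0-based half-open interval to a 1-based inclusive region string.
function toOneBasedRegion(exon) {
  return exon.chr + ':' + (exon.start + 1) + '-' + exon.end;
}

console.log(splicedLength(info));             // 150
console.log(toOneBasedRegion(info.exons[0])); // chr1:101-150
```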

txr.getSeq(name, fr, options)

Gets the sequence of the transcript with the given name.

name is a UCSC transcript id.

fr is an instance of fastareader.

options is as follows.

key name | description | default | example
startExon | start exon number | 1 | 2
startBase | start base in the start exon (0-based coordinate) | 0 | 21
endExon | end exon number | (exons.length) | 4
endBase | end base in the end exon (0-based coordinate) | (exon length of the endExon) | 300
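
One way to read these defaults is sketched below. This is not the library's implementation: it works on a plain array of exon sequences instead of a fastareader instance, the function name sliceExons is hypothetical, and it assumes endBase is exclusive (which the default of "exon length of the endExon" suggests but does not confirm).

```javascript
// exonSeqs: array of exon sequences in exon-number order (exon 1 first).
// Applies the documented defaults, then joins the selected exon pieces.
function sliceExons(exonSeqs, options) {
  options = options || {};
  var startExon = options.startExon || 1;
  var startBase = options.startBase || 0;
  var endExon = options.endExon || exonSeqs.length;
  // Assumption: endBase is exclusive, so the default covers the whole exon.
  var endBase = (options.endBase != null)
    ? options.endBase
    : exonSeqs[endExon - 1].length;

  var parts = [];
  for (var n = startExon; n <= endExon; n++) {
    var seq = exonSeqs[n - 1];
    var from = (n === startExon) ? startBase : 0;
    var to = (n === endExon) ? endBase : seq.length;
    parts.push(seq.slice(from, to));
  }
  return parts.join('');
}

var exonSeqs = ['AAAA', 'CCCC', 'GGGG'];
console.log(sliceExons(exonSeqs)); // AAAACCCCGGGG
console.log(sliceExons(exonSeqs, {
  startExon: 2, startBase: 1, endExon: 3, endBase: 2
})); // CCCGG
```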

txr.getNames()

Gets a list of the names of all transcripts.

txr.getTxsByExon(formattedExon)

Gets a hash of the transcripts that contain the given exon.

formattedExon is an exon string in the format used by the dna library, for example:

chr2:34100214-34101989,-

Returns a hash whose keys are UCSC transcript ids and whose values are exon numbers.
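
To make the string format concrete, here is a small sketch that splits a formatted exon into its parts. The function parseFormattedExon is hypothetical and is not part of txreader or the dna library; it only assumes the chrom:start-end,strand layout shown in the example above.

```javascript
// Split "chrom:start-end,strand" into its components.
function parseFormattedExon(str) {
  var m = /^([^:]+):(\d+)-(\d+),([+-])$/.exec(str);
  if (!m) {
    throw new Error('unexpected exon format: ' + str);
  }
  return {
    chr: m[1],
    start: Number(m[2]),
    end: Number(m[3]),
    strand: m[4]
  };
}

var exon = parseFormattedExon('chr2:34100214-34101989,-');
console.log(exon.chr, exon.start, exon.end, exon.strand);
// chr2 34100214 34101989 -
```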

txr.getTxsByGene(geneName)

Gets a list of UCSC transcripts whose gene name is geneName.

geneName must match a gene name listed in the options.xref file.

txr.getTxsByRefSeqId(refseqId)

Gets a list of UCSC transcripts whose RefSeq id is refseqId.

refseqId must match a RefSeq id listed in the options.xref file.
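
The lookups by gene name and RefSeq id both depend on the xref file. The sketch below shows how such lookup tables could be built from kgXref-style lines. It is an illustration only, not how txreader is implemented: the function buildXrefIndex and the sample rows are made up, and the column positions (transcript id in column 1, gene symbol in column 5, RefSeq id in column 6, tab-separated) are an assumption based on the hg19 kgXref table layout.

```javascript
// Build gene-name and RefSeq-id lookup tables from kgXref-style lines.
function buildXrefIndex(lines) {
  var byGene = Object.create(null);
  var byRefSeq = Object.create(null);
  lines.forEach(function (line) {
    var f = line.split('\t');
    var txId = f[0];
    var gene = f[4];   // assumed geneSymbol column
    var refseq = f[5]; // assumed refseq column
    if (gene) {
      (byGene[gene] = byGene[gene] || []).push(txId);
    }
    if (refseq) {
      (byRefSeq[refseq] = byRefSeq[refseq] || []).push(txId);
    }
  });
  return { byGene: byGene, byRefSeq: byRefSeq };
}

// Made-up rows, not real UCSC data.
var lines = [
  ['ucA.1', 'mrna1', '', '', 'GENE1', 'NM_TEST1', '', 'made-up'].join('\t'),
  ['ucB.1', 'mrna2', '', '', 'GENE1', 'NM_TEST2', '', 'made-up'].join('\t')
];
var index = buildXrefIndex(lines);
console.log(index.byGene['GENE1']);      // [ 'ucA.1', 'ucB.1' ]
console.log(index.byRefSeq['NM_TEST2']); // [ 'ucB.1' ]
```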

TxReader.parseLine(line)

(static) Parses a line from a UCSC knownGene file. Returns an information object (described in txr.getInfo(name)).
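
For readers unfamiliar with the file, the following sketch shows roughly what parsing one knownGene line involves. It is not TxReader.parseLine itself: the function parseKnownGeneLine is hypothetical, the sample line is made up, and it assumes the tab-separated UCSC column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID). Reversing exons on the minus strand so that exon 1 comes first is also an assumption about what "order by exon number" means.

```javascript
// Parse one tab-separated knownGene line into an info-like object.
function parseKnownGeneLine(line) {
  var f = line.split('\t');
  // exonStarts / exonEnds are comma-separated lists with a trailing comma.
  var starts = f[8].split(',').filter(Boolean).map(Number);
  var ends = f[9].split(',').filter(Boolean).map(Number);
  var exons = starts.map(function (start, i) {
    return { chr: f[1], start: start, end: ends[i], strand: f[2] };
  });
  var isMinus = f[2] === '-';
  if (isMinus) {
    exons.reverse(); // assumption: exon 1 is the 5'-most exon
  }
  return {
    name: f[0],
    chrom: f[1],
    strand: f[2],
    isMinus: isMinus,
    txStart: Number(f[3]),
    txEnd: Number(f[4]),
    cdsStart: Number(f[5]),
    cdsEnd: Number(f[6]),
    proteinID: f[10],
    exons: exons
  };
}

// Made-up line, not real UCSC data.
var line = ['ucTEST.1', 'chr1', '-', '100', '400', '120', '380', '2',
            '100,300,', '150,400,', 'P00000', 'ucTEST.1'].join('\t');
var info = parseKnownGeneLine(line);
console.log(info.exons.length, info.exons[0].start); // 2 300
```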