txreader
description
Reads and queries transcript information files such as UCSC's knownGene.txt (Node.js).
installation
```sh
$ npm install txreader
```
dependencies
usage
create an instance
```javascript
var TxReader = require('txreader');

var tx = TxReader.create('knownGene.txt', {
  xref: 'kgXref.txt' // gene name info (optional)
});
```
get info from a UCSC transcript ID
```javascript
var info = tx.getInfo('uc001acn.2'); // information object (format described under txr.getInfo(name))
```
get UCSC transcript IDs from a gene name
```javascript
var BCRs = tx.getTxsByGene('BCR');
```
get UCSC transcript IDs from a RefSeq ID
```javascript
var NMs = tx.getTxsByRefSeqId('NM_033487'); // list of transcripts whose RefSeq ID is 'NM_033487'
```
API Documentation
- TxReader.create(knownGene, options)
- txr.getTxsByExon(formattedExon)
- txr.getTxsByGene(geneName)
- txr.getTxsByRefSeqId(refseqId)
- txr.getNames()
- txr.getGeneName(txname)
- txr.getRefSeqId(txname)
- txr.getInfo(name)
- txr.getExons(name)
- txr.getSeq(name, fr, options)
- TxReader.parseLine(line)
TxReader.create(knownGene, options)
Creates an instance of TxReader.
knownGene is the path to a knownGene file provided by UCSC. Its format follows http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz
options is an object with the following keys.
key | description | example |
---|---|---|
xref | path to a cross-reference file supplying gene names and RefSeq IDs (compatible with http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/kgXref.txt.gz) | kgXref.txt |
noCacheInfo | if true, transcript information is not cached | true |
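The xref file is what links UCSC transcript IDs to gene names and RefSeq IDs. As a rough illustration of what one of its lines carries, here is a hypothetical helper (not part of txreader's API) that pulls those three fields out of a tab-separated line. It assumes the hg19 kgXref column order (kgID, mRNA, spID, spDisplayID, geneSymbol, refseq, ...); the sample values are made up.

```javascript
// Hypothetical helper, not part of txreader.
// Assumes the hg19 kgXref column order:
// kgID, mRNA, spID, spDisplayID, geneSymbol, refseq, protAcc, description, ...
function parseXrefLine(line) {
  const cols = line.split('\t');
  return {
    name: cols[0],     // UCSC transcript ID (kgID)
    gene: cols[4],     // gene symbol
    refseqId: cols[5], // RefSeq ID (may be an empty string)
  };
}

// Synthetic example line (all values are made up).
const line = ['ucTEST.1', 'NM_000001', 'P00000', 'TEST_HUMAN', 'GENE1',
  'NM_000001', 'NP_000001', 'test gene'].join('\t');
console.log(parseXrefLine(line)); // { name: 'ucTEST.1', gene: 'GENE1', refseqId: 'NM_000001' }
```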
txr.getInfo(name)
Gets an information object of a transcript.
name is a UCSC transcript ID.
Returns an information object with the following format.
key name | description | example |
---|---|---|
name | name of the transcript | uc011msz.1 |
chrom | chromosome name | chr11 |
strand | strand of the transcript (+/-) | + |
isMinus | whether the strand is minus (boolean) | false |
txStart | transcription start position (0-based coordinate system) | 12345880 |
txEnd | transcription end position (0-based coordinate system) | 12346880 |
cdsStart | coding region start position (0-based coordinate system) | 12345880 |
cdsEnd | coding region end position (0-based coordinate system) | 12346880 |
proteinID | protein ID | B7ZGX9 |
exons | list of exons ordered by exon number (0-based coordinates) | [{chr: xxx, start: xxx, end: xxx, strand: xxx}, ...] |
gene | gene name | ALG13 |
refseqId | refseq ID | NM_033487 |
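The exons list is ordered by exon number rather than by genomic position. A minimal sketch of what that ordering means, assuming (as is usual for gene annotations, but not stated by txreader) that exon 1 on the minus strand is the exon with the highest genomic coordinates. orderExonsByNumber is a hypothetical name.

```javascript
// Hypothetical helper, not part of txreader.
// Assumption: on the '-' strand, exon 1 is the exon farthest right in genomic
// coordinates, so the ascending genomic order is reversed.
function orderExonsByNumber(exons, strand) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...exons].sort((a, b) => a.start - b.start);
  return strand === '-' ? sorted.reverse() : sorted;
}

const exons = [
  { chr: 'chr1', start: 100, end: 200, strand: '-' },
  { chr: 'chr1', start: 500, end: 650, strand: '-' },
];
console.log(orderExonsByNumber(exons, '-').map((e) => e.start)); // [ 500, 100 ]
```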
txr.getSeq(name, fr, options)
Gets the sequence of the given transcript.
name is a UCSC transcript ID.
fr is an instance of fastareader.
options is an object with the following keys.
key name | description | default | example |
---|---|---|---|
startExon | start exon number | 1 | 2 |
startBase | start base in the start exon (0-based coordinate) | 0 | 21 |
endExon | end exon number | (exons.length) | 4 |
endBase | end base in the end exon (0-based coordinate) | (exon length of the endExon) | 300 |
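To show how the four options could combine, here is a hypothetical sketch (not txreader's implementation) that cuts a range out of exon sequences that are already extracted. It assumes exon numbers start at 1, startBase is inclusive, and endBase is exclusive; the exclusive end is inferred from the default being the full exon length. Fetching the sequences from fastareader is left out.

```javascript
// Hypothetical sketch, not txreader's implementation.
// exonSeqs: sequences of each exon, in exon-number order (assumption).
// startExon/endExon: 1-based exon numbers.
// startBase: inclusive 0-based offset; endBase: exclusive offset (assumption).
function sliceExons(exonSeqs, options = {}) {
  const startExon = options.startExon ?? 1;
  const startBase = options.startBase ?? 0;
  const endExon = options.endExon ?? exonSeqs.length;
  const endBase = options.endBase ?? exonSeqs[endExon - 1].length;

  const parts = [];
  for (let n = startExon; n <= endExon; n++) {
    const seq = exonSeqs[n - 1];
    const from = n === startExon ? startBase : 0;   // trim only the first exon's head
    const to = n === endExon ? endBase : seq.length; // trim only the last exon's tail
    parts.push(seq.slice(from, to));
  }
  return parts.join('');
}

const exonSeqs = ['AAAA', 'CCCC', 'GGGG'];
console.log(sliceExons(exonSeqs)); // AAAACCCCGGGG
console.log(sliceExons(exonSeqs, { startExon: 2, startBase: 1, endExon: 3, endBase: 2 })); // CCCGG
```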
txr.getNames()
Gets a list of the names (UCSC IDs) of all transcripts.
txr.getTxsByExon(formattedExon)
Gets a hash of the transcripts that contain the given exon.
formattedExon is a string in the format used by the dna library, e.g. chr2:34100214-34101989,-
Returns a hash whose keys are UCSC transcript IDs and whose values are the exon numbers.
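One way such a lookup can work is to key every exon by its formatted string. The sketch below is hypothetical (formatExon, parseFormattedExon and buildExonIndex are not txreader functions); it assumes each transcript's exons are already in exon-number order, that exon numbers start at 1, and that the string uses the same coordinates as the exon objects.

```javascript
// Hypothetical sketch, not txreader's implementation.
// Format an exon object as "chr:start-end,strand".
function formatExon(exon) {
  return `${exon.chr}:${exon.start}-${exon.end},${exon.strand}`;
}

// Parse "chr:start-end,strand" back into an exon object.
function parseFormattedExon(str) {
  const m = /^([^:]+):(\d+)-(\d+),([+-])$/.exec(str);
  if (!m) throw new Error(`not a formatted exon: ${str}`);
  return { chr: m[1], start: Number(m[2]), end: Number(m[3]), strand: m[4] };
}

// Map each formatted exon to { transcriptId: exonNumber }.
// Assumes tx.exons is in exon-number order and numbering starts at 1.
function buildExonIndex(transcripts) {
  const index = new Map();
  for (const tx of transcripts) {
    tx.exons.forEach((exon, i) => {
      const key = formatExon(exon);
      if (!index.has(key)) index.set(key, {});
      index.get(key)[tx.name] = i + 1;
    });
  }
  return index;
}

const index = buildExonIndex([
  { name: 'txA', exons: [{ chr: 'chr2', start: 100, end: 200, strand: '+' }] },
  { name: 'txB', exons: [
    { chr: 'chr2', start: 50, end: 80, strand: '+' },
    { chr: 'chr2', start: 100, end: 200, strand: '+' },
  ] },
]);
console.log(index.get('chr2:100-200,+')); // { txA: 1, txB: 2 }
console.log(parseFormattedExon('chr2:34100214-34101989,-'));
```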
txr.getTxsByGene(geneName)
Gets a list of UCSC transcripts whose gene name is geneName.
geneName must match a gene name as written in the options.xref file.
txr.getTxsByRefSeqId(refseqId)
Gets a list of UCSC transcripts whose RefSeq ID is refseqId.
refseqId must match a RefSeq ID as written in the options.xref file.
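Both lookups amount to grouping transcript IDs by a key taken from the xref file. A minimal sketch under that assumption; buildXrefIndexes is a hypothetical name, and its input records ({ name, gene, refseqId }) are assumed to have been read from the xref file already. Empty gene names and RefSeq IDs are skipped.

```javascript
// Hypothetical sketch, not txreader's implementation.
// records: [{ name, gene, refseqId }, ...] built from the xref file (assumption).
function buildXrefIndexes(records) {
  const byGene = new Map();
  const byRefSeq = new Map();
  const push = (map, key, value) => {
    if (!key) return; // skip empty gene names / RefSeq IDs
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(value);
  };
  for (const r of records) {
    push(byGene, r.gene, r.name);
    push(byRefSeq, r.refseqId, r.name);
  }
  return { byGene, byRefSeq };
}

const { byGene, byRefSeq } = buildXrefIndexes([
  { name: 'txA.1', gene: 'GENE1', refseqId: 'NM_000001' },
  { name: 'txB.1', gene: 'GENE1', refseqId: 'NM_000002' },
  { name: 'txC.1', gene: 'GENE2', refseqId: '' },
]);
console.log(byGene.get('GENE1'));       // [ 'txA.1', 'txB.1' ]
console.log(byRefSeq.get('NM_000002')); // [ 'txB.1' ]
```

An unknown key yields undefined here; a caller wanting an empty list could use `byGene.get(name) || []`.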
TxReader.parseLine(line)
(static) Parses one line of a UCSC knownGene file and returns an information object in the format described under txr.getInfo(name).
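For orientation, here is a hypothetical sketch of that parsing step (not txreader's code). It assumes the hg19 knownGene column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID) and, as an unconfirmed assumption, reverses minus-strand exons so that they are in exon-number order. gene and refseqId are omitted because they come from the xref file. The sample line is synthetic.

```javascript
// Hypothetical sketch, not txreader's implementation.
// Assumed column order (hg19 knownGene): name, chrom, strand, txStart, txEnd,
// cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID.
function parseKnownGeneLine(line) {
  const cols = line.replace(/[\r\n]+$/, '').split('\t');
  const [name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ,
    exonStarts, exonEnds, proteinID] = cols;
  // exonStarts/exonEnds are comma-separated lists with a trailing comma.
  const toInts = (s) => s.split(',').filter((x) => x !== '').map(Number);
  const starts = toInts(exonStarts);
  const ends = toInts(exonEnds);
  let exons = starts.map((start, i) => ({ chr: chrom, start, end: ends[i], strand }));
  // Assumption: on the minus strand, exon 1 is the rightmost block.
  if (strand === '-') exons = exons.reverse();
  return {
    name, chrom, strand,
    isMinus: strand === '-',
    txStart: Number(txStart), txEnd: Number(txEnd),
    cdsStart: Number(cdsStart), cdsEnd: Number(cdsEnd),
    proteinID, exons,
  };
}

const line = ['ucTEST.1', 'chr1', '-', '100', '900', '150', '850', '2',
  '100,700,', '300,900,', 'P00000', 'ucTEST.1'].join('\t');
const info = parseKnownGeneLine(line);
console.log(info.isMinus);                   // true
console.log(info.exons.map((e) => e.start)); // [ 700, 100 ]
```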