Utility functions to handle DNA/RNA string data


Utility functions to handle DNA/RNA strings in bioinformatics. (Node.js)

git clone https://github.com/shinout/dna.git


npm install dna

API

  • dna.complStrand(str, rev, rna)
  • dna.getRandomSeq(len, rna)
  • dna.getFormat(chr, start, end, strand)
  • dna.parseFormat(str, objFormat)
  • dna.writeFasta(rname, seq, wstream, num)
  • dna.padN(seq, len, options)
  • dna.writeFastq(name, seq, qual, wstream)
  • dna.getPosLen(start, end)
  • dna.getStartEnd(pos, len)
  • dna.numberize()

dna.complStrand(str, rev, rna)

Gets the complementary strand of str.

If rev is true, the result is also reversed, giving the reverse complement read 5' -> 3'.

If rna is true, T is replaced with U.

dna.complStrand("ACCTG") // TGGAC
dna.complStrand("ACCTG", true) // CAGGT
dna.complStrand("ACCTG", null, true) // UGGAC
dna.complStrand("ACCTG", true, true) // CAGGU
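
The following is a minimal sketch of how such a complement could be computed with a lookup table. It is not the dna package's own code: complStrandSketch is a hypothetical name, and the sketch assumes uppercase A, C, G, T and U only, passing any other character (including lowercase bases) through unchanged.

```javascript
// Hypothetical sketch, not the dna package's implementation.
// Assumption: only uppercase A, C, G, T and U are complemented.
const COMPLEMENT = { A: "T", C: "G", G: "C", T: "A", U: "A" };

function complStrandSketch(str, rev, rna) {
  let out = "";
  for (const base of str) {
    out += COMPLEMENT[base] || base; // unknown characters pass through unchanged
  }
  if (rev) out = out.split("").reverse().join(""); // reverse complement, read 5' -> 3'
  if (rna) out = out.replace(/T/g, "U"); // switch to the RNA alphabet
  return out;
}

console.log(complStrandSketch("ACCTG"));             // TGGAC
console.log(complStrandSketch("ACCTG", true));       // CAGGT
console.log(complStrandSketch("ACCTG", null, true)); // UGGAC
console.log(complStrandSketch("ACCTG", true, true)); // CAGGU
```

Complementing first and reversing afterwards gives the same result as the other order; the table approach simply keeps each step easy to read.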

dna.getRandomSeq(len, rna)

Gets a random sequence of length len.

If rna is true, U is used instead of T.
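
A rough sketch of such a generator, assuming each position is drawn uniformly from the four bases (the package's actual distribution is not documented here). getRandomSeqSketch is a hypothetical name, and Math.random is fine for test data but not for anything security-related.

```javascript
// Hypothetical sketch. Assumption: bases are drawn uniformly and independently.
function getRandomSeqSketch(len, rna) {
  const bases = rna ? "ACGU" : "ACGT";
  let seq = "";
  for (let i = 0; i < len; i++) {
    seq += bases[Math.floor(Math.random() * bases.length)];
  }
  return seq;
}

console.log(getRandomSeqSketch(10));       // a 10-base DNA string; varies per run
console.log(getRandomSeqSketch(10, true)); // same, but with U instead of T
```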

dna.getFormat(chr, start, end, strand)

Builds a formatted position expression from four values: chr, start, end and strand.

The format is:

<reference name>:<0 based start>-<0 based end>,<strand>

So this function returns a string in that form.
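
As a rough illustration, assuming the function does nothing more than join its four arguments in the format above (no validation or coordinate conversion), a hypothetical getFormatSketch could look like this. The example values are illustrative only.

```javascript
// Hypothetical sketch: <reference name>:<start>-<end>,<strand>
// Assumption: arguments are joined as given, with no validation.
function getFormatSketch(chr, start, end, strand) {
  return `${chr}:${start}-${end},${strand}`;
}

console.log(getFormatSketch("chr13", 12345678, 13456789, "-")); // chr13:12345678-13456789,-
```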


dna.parseFormat(str, objFormat)

Parses str as a formatted position expression.

If objFormat is true, returns an object:

{ chr    : "chr13",
  start  : 12345678,
  end    : 13456789,
  strand : '-'
}

By default, returns an array.

["chr13", 12345678, 13456789, '-']
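
A minimal parsing sketch using a regular expression, under these assumptions: the reference name contains no ":", start and end are non-negative integers, strand is "+" or "-", and malformed input throws. parseFormatSketch is a hypothetical name, not the package's API.

```javascript
// Hypothetical sketch of parsing "<reference name>:<start>-<end>,<strand>".
// Assumption: malformed input throws an Error.
function parseFormatSketch(str, objFormat) {
  const m = /^([^:]+):(\d+)-(\d+),([+-])$/.exec(str);
  if (!m) throw new Error(`invalid position expression: ${str}`);
  const [, chr, start, end, strand] = m;
  const values = [chr, Number(start), Number(end), strand];
  if (objFormat) {
    // Object form, as returned when objFormat is true
    return { chr: values[0], start: values[1], end: values[2], strand: values[3] };
  }
  return values; // array form, the default
}

console.log(parseFormatSketch("chr13:12345678-13456789,-"));
console.log(parseFormatSketch("chr13:12345678-13456789,-", true));
```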

dna.writeFasta(rname, seq, wstream, num)

Writes FASTA to wstream.

rname is the reference name. Required.

seq is the sequence to write. Required.

wstream is a writable stream. By default, process.stdout.

num is the line width at which the sequence is folded. By default, 50.

Returns nothing.

dna.writeFasta("title", "GCTTCAA");
// >title
// GCTTCAA
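
The folding can be sketched as follows, assuming one header line followed by the sequence cut into chunks of num characters. formatFasta and writeFastaSketch are hypothetical names; splitting formatting from writing keeps the string easy to check.

```javascript
// Hypothetical sketch, not the dna package's code.
// Assumption: the sequence is wrapped every `num` characters.
function formatFasta(rname, seq, num) {
  num = num || 50;
  const lines = [">" + rname];
  for (let i = 0; i < seq.length; i += num) {
    lines.push(seq.slice(i, i + num));
  }
  return lines.join("\n") + "\n";
}

function writeFastaSketch(rname, seq, wstream, num) {
  wstream = wstream || process.stdout; // default stream, as documented
  wstream.write(formatFasta(rname, seq, num));
}

writeFastaSketch("title", "GCTTCAA");
// >title
// GCTTCAA
writeFastaSketch("title", "GCTTCAA", process.stdout, 3);
// >title
// GCT
// TCA
// A
```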

dna.padN(seq, len, options)

Pads seq with N until seq.length reaches len.

seq is a string. len is a number.

options is an optional object.

If options.cutIfOver is true and seq.length already exceeds len, seq is truncated to length len.

If options.left is true, Ns are padded to the left of seq. (By default, N are padded on the right.)

dna.padN("ACGA", 10) // ACGANNNNNN
dna.padN("ACGA", 3)  // ACGA
dna.padN("ACGA", 3, {cutIfOver: true})  // ACG
dna.padN("ACGA", 10, {left: true}) // NNNNNNACGA
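
A short sketch that reproduces the four examples above. padNSketch is a hypothetical name, and the combination of cutIfOver with left is not documented, so this sketch simply assumes truncation keeps the first len characters.

```javascript
// Hypothetical sketch of padN's documented behaviour.
// Assumption: cutIfOver keeps the first `len` characters regardless of options.left.
function padNSketch(seq, len, options) {
  options = options || {};
  if (seq.length >= len) {
    // Only shorten when explicitly requested
    return options.cutIfOver ? seq.slice(0, len) : seq;
  }
  const pad = "N".repeat(len - seq.length);
  return options.left ? pad + seq : seq + pad;
}

console.log(padNSketch("ACGA", 10));                    // ACGANNNNNN
console.log(padNSketch("ACGA", 3));                     // ACGA
console.log(padNSketch("ACGA", 3, { cutIfOver: true })); // ACG
console.log(padNSketch("ACGA", 10, { left: true }));    // NNNNNNACGA
```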

dna.writeFastq(name, seq, qual, wstream)

Writes FASTQ to wstream.

name is the name of the sequence. Required.

seq is the sequence to write. Required.

qual is the base-quality string of the sequence; qual.length must equal seq.length.

wstream is a writable stream. By default, process.stdout.

Returns a boolean indicating whether wstream is writable.

dna.writeFastq("title", "GCTTCAA", "IIHHHAI");
// @title
// GCTTCAA
// +
// IIHHHAI
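
A sketch of the standard four-line FASTQ record. formatFastq and writeFastqSketch are hypothetical names; the length check is an addition of this sketch, and returning the result of wstream.write() (false when the stream's buffer is full) is an assumption about what "writable or not" means.

```javascript
// Hypothetical sketch: one FASTQ record is @name, sequence, "+", quality.
function formatFastq(name, seq, qual) {
  if (qual.length !== seq.length) {
    throw new Error("qual.length must equal seq.length"); // check added in this sketch
  }
  return `@${name}\n${seq}\n+\n${qual}\n`;
}

function writeFastqSketch(name, seq, qual, wstream) {
  wstream = wstream || process.stdout;
  // Writable#write returns false when the caller should wait for 'drain'
  return wstream.write(formatFastq(name, seq, qual));
}

writeFastqSketch("title", "GCTTCAA", "IIHHHAI");
// @title
// GCTTCAA
// +
// IIHHHAI
```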

dna.getPosLen(start, end)

Converts a 0-based start and end into a 1-based position and a length. Because the length is end-start, end is exclusive.

Returns an array: [start+1, end-start]

dna.getPosLen(3, 10) // [4, 7]

dna.getStartEnd(pos, len)

Converts a 1-based position and a length into a 0-based start and end (end exclusive).

Returns an array: [pos-1, pos+len-1]

dna.getStartEnd(3, 10) // [2, 12]
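
The two formulas can be written out directly to check that they invert each other. getPosLenSketch and getStartEndSketch are hypothetical names that only restate the documented formulas; reading end as exclusive follows from end-start being the length.

```javascript
// Sketch of the documented formulas:
// 0-based half-open [start, end)  <->  1-based position + length
function getPosLenSketch(start, end) {
  return [start + 1, end - start];
}

function getStartEndSketch(pos, len) {
  return [pos - 1, pos + len - 1];
}

console.log(getPosLenSketch(3, 10));   // [ 4, 7 ]
console.log(getStartEndSketch(3, 10)); // [ 2, 12 ]

// Round trip: converting back recovers the original position and length
const [s, e] = getStartEndSketch(3, 10);
console.log(getPosLenSketch(s, e)); // [ 3, 10 ]
```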

dna.numberize()

Converts num to a number.

If Number(num) is NaN, returns _default if given; otherwise throws an error. name identifies the offending value in the error message.

dna.numberize("14") // 14
dna.numberize("AA", 10) // 10
dna.numberize("AA", "position") // throws an error: "position is NaN"
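
A sketch consistent with the three examples above. numberizeSketch is a hypothetical name, and the argument handling is an assumption: because the last example passes "position" as the second argument and gets it back in the error message, this sketch treats a string second argument as name rather than as _default.

```javascript
// Hypothetical sketch. Assumption: a string second argument is used as `name`
// (matching the "position" example); any other value is `_default`.
function numberizeSketch(num, _default, name) {
  if (typeof _default === "string" && name === undefined) {
    name = _default;
    _default = undefined;
  }
  const n = Number(num);
  if (!Number.isNaN(n)) return n;
  if (_default !== undefined) return _default;
  throw new Error(`${name || "value"} is NaN`);
}

console.log(numberizeSketch("14"));     // 14
console.log(numberizeSketch("AA", 10)); // 10
try {
  numberizeSketch("AA", "position");
} catch (err) {
  console.log(err.message); // position is NaN
}
```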

Gets the code (id) of a given chromosome:

  • chr9 -> 9
  • chr22 -> 22
  • chrX -> 23
  • chrY -> 24
  • chrM -> 25
  • others -> an exception is thrown

name is the name of a chromosome.

If nothrow is true and name is not found, name is returned unchanged instead of throwing.
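
A sketch that covers only the mappings listed above. getChromCodeSketch is a hypothetical name, and the accepted input shape ("chr" followed by 1-22, X, Y or M, prefix case-insensitive) is an assumption; the real function may accept more spellings.

```javascript
// Hypothetical sketch. Assumption: names look like "chr" + 1-22, X, Y or M.
const SPECIAL_CODES = { X: 23, Y: 24, M: 25 };

function getChromCodeSketch(name, nothrow) {
  const m = /^chr(\d+|[XYM])$/i.exec(String(name));
  if (m) {
    const id = m[1].toUpperCase();
    if (id in SPECIAL_CODES) return SPECIAL_CODES[id];
    const n = Number(id);
    if (n >= 1 && n <= 22) return n;
  }
  if (nothrow) return name; // give back the original value
  throw new Error(`unknown chromosome: ${name}`);
}

console.log(getChromCodeSketch("chr9"));            // 9
console.log(getChromCodeSketch("chrX"));            // 23
console.log(getChromCodeSketch("scaffold1", true)); // scaffold1
```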

Gets the canonical name of a given chromosome:

  • 9 -> chr9
  • chromosome22 -> chr22
  • CHRUN_GL000239 -> chrUn_gl000239
  • others -> an exception is thrown

name is the name of a chromosome.
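
A sketch of one way to normalize names like those listed above; the actual rules of the dna package may differ. normalizeChromNameSketch is a hypothetical name. It assumes an optional "chromosome", "chrom" or "chr" prefix (any case) before 1-22, X, Y or M, and lower-cases the suffix of chrUn_ contig names.

```javascript
// Hypothetical sketch; the real normalization rules may differ.
function normalizeChromNameSketch(name) {
  const s = String(name);
  // "chromosome22", "chrom22", "chr22" or bare "22" -> "chr22"
  const m = /^(?:chromosome|chrom|chr)?(\d+|[XYM])$/i.exec(s);
  if (m) return "chr" + m[1].toUpperCase();
  // unplaced contigs such as "CHRUN_GL000239" -> "chrUn_gl000239"
  const un = /^chrun_(.+)$/i.exec(s);
  if (un) return "chrUn_" + un[1].toLowerCase();
  throw new Error(`unknown chromosome: ${name}`);
}

console.log(normalizeChromNameSketch(9));                // chr9
console.log(normalizeChromNameSketch("chromosome22"));   // chr22
console.log(normalizeChromNameSketch("CHRUN_GL000239")); // chrUn_gl000239
```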

Gets the possible reference names for a chromosome code:

  • 9 -> [9, chr9, Chr9, CHR9, chrom9, Chrom9, CHROM9]

code is a chromosome code or a chromosome name.

fn is optional: a function called with each candidate rname as its argument. It should return some value when the rname is valid and false when it is invalid. If fn is given, dna.getChromList() returns the value that fn returns.
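
A sketch of the candidate list and the optional callback. getChromListSketch is a hypothetical name, and several details are assumptions: candidates are returned as strings, a leading "chr" or "chrom" prefix is stripped from names, numeric codes are not translated back to X/Y/M, and with fn the first candidate for which fn does not return false wins (false if none does).

```javascript
// Hypothetical sketch; candidate order follows the example above.
function getChromListSketch(code, fn) {
  // Accept 9, "9", "chr9", "chrom9", ... and keep only the id part
  const id = String(code).replace(/^(?:chrom|chr)/i, "");
  const candidates = [id];
  for (const prefix of ["chr", "Chr", "CHR", "chrom", "Chrom", "CHROM"]) {
    candidates.push(prefix + id);
  }
  if (typeof fn !== "function") return candidates;
  for (const rname of candidates) {
    const result = fn(rname);
    if (result !== false) return result; // first valid candidate wins
  }
  return false;
}

// e.g. pick whichever spelling a hypothetical reference file actually uses
const known = new Set(["CHR9"]);
console.log(getChromListSketch(9, (rname) => (known.has(rname) ? rname : false))); // CHR9
```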