@gmod/cram

    1.6.4 • Public • Published

    Read CRAM files (indexed or unindexed) in pure JavaScript. Works in Node.js and in the browser.

    • Reads CRAM 3.x and 2.x (3.1 added in v1.6.0)
    • Does not read CRAM 1.x
    • Can use .crai indexes out of the box, for efficient sequence fetching, but also has an index API that would allow use with other index types
    • Implements the bzip2 codec but not lzma (yet); if lzma is important to your use case, please file an issue

    Install

    $ npm install --save @gmod/cram
    # or
    $ yarn add @gmod/cram

    Usage

    const { IndexedCramFile, CramFile, CraiIndex } = require('@gmod/cram')
    
    // Use indexedfasta library for seqFetch, if using local file (see below)
    const { IndexedFasta, BgzipIndexedFasta } = require('@gmod/indexedfasta')
    
    // This example uses local file paths (Node.js) for IndexedFasta. For remote
    // URLs, see the indexedfasta docs on filehandles and
    // https://github.com/gmod/generic-filehandle
    const t = new IndexedFasta({
      path: '/filesystem/yourfile.fa',
      faiPath: '/filesystem/yourfile.fa.fai',
    })
    
    // example of fetching records from an indexed CRAM file.
    // NOTE: only numeric IDs for the reference sequence are accepted.
    // For indexedfasta the numeric ID is the order in which the sequence names
    // appear in the header
    
    // Wrap in an async function and then run
    const run = async () => {
      const idToName = []
      const nameToId = {}
    
      // example opening local files on node.js
      // can also pass `cramUrl` (for the IndexedCramFile class), and `url` (for
      // the CraiIndex) params to open remote URLs
      //
      // alternatively, `cramFilehandle` (for the IndexedCramFile class) and
      // `filehandle` (for the CraiIndex) can be used; for examples, see
      // https://github.com/gmod/generic-filehandle
    
      const indexedFile = new IndexedCramFile({
        cramPath: '/filesystem/yourfile.cram',
        //or
        //cramUrl: 'url/to/file.cram'
        //cramFilehandle: a generic-filehandle or similar filehandle
        index: new CraiIndex({
          path: '/filesystem/yourfile.cram.crai',
          // or
          // url: 'url/to/file.cram.crai'
          // filehandle: a generic-filehandle or similar filehandle
        }),
        seqFetch: async (seqId, start, end) => {
          // note:
          // * seqFetch should return a promise for a string, in this instance retrieved from IndexedFasta
          // * we use start-1 because cram-js uses 1-based but IndexedFasta uses 0-based coordinates
          // * the seqId is a numeric identifier, so we convert it back to a name with idToName
          // * you can return an empty string from this function for testing if you want, but you may not get proper interpretation of record.readFeatures
          return t.getSequence(idToName[seqId], start - 1, end)
        },
        checkSequenceMD5: false,
      })
      const samHeader = await indexedFile.cram.getSamHeader()
    
      // use the @SQ lines in the header to figure out the
      // mapping between ref ID numbers and names
    
      const sqLines = samHeader.filter(l => l.tag === 'SQ')
      sqLines.forEach((sqLine, refId) => {
        sqLine.data.forEach(item => {
          if (item.tag === 'SN') {
            // this is the ref name
            const refName = item.value
            nameToId[refName] = refId
            idToName[refId] = refName
          }
        })
      })
    
      const records = await indexedFile.getRecordsForRange(
        nameToId['chr1'],
        10000,
        20000,
      )
      records.forEach(record => {
        console.log(`got a record named ${record.readName}`)
        if (record.readFeatures != undefined) {
          record.readFeatures.forEach(({ code, pos, refPos, ref, sub }) => {
            // process the read features. this can be used similar to
            // CIGAR/MD strings in SAM. see CRAM specs for more details.
            if (code === 'X') {
              console.log(
                `${record.readName} shows a base substitution of ${ref}->${sub} at ${refPos}`,
              )
            }
          })
        }
      })
    }
    
    run()
    

    You can also use cram-js without NPM via cram-bundle.js. See the example directory for usage with a script tag.
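
    The header-to-ID mapping step in the usage example can be factored into a small standalone helper. The sketch below assumes the parsed header has the shape used in that example (an array of `{ tag, data: [{ tag, value }] }` objects); `buildRefMaps` and `exampleHeader` are hypothetical names introduced here for illustration, not part of the package.

```javascript
// Build name <-> numeric ID lookup tables from a parsed SAM header.
// Assumption: the header is an array of { tag, data: [{ tag, value }, ...] }
// objects, as in the usage example. The numeric ID is the position of the
// @SQ line among all @SQ lines.
function buildRefMaps(samHeader) {
  const idToName = []
  const nameToId = {}
  samHeader
    .filter(line => line.tag === 'SQ')
    .forEach((sqLine, refId) => {
      const sn = sqLine.data.find(item => item.tag === 'SN')
      if (sn) {
        idToName[refId] = sn.value
        nameToId[sn.value] = refId
      }
    })
  return { idToName, nameToId }
}

// Hypothetical header, for illustration only
const exampleHeader = [
  { tag: 'HD', data: [{ tag: 'VN', value: '1.5' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr1' }, { tag: 'LN', value: '1000' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr2' }, { tag: 'LN', value: '2000' }] },
]
const refMaps = buildRefMaps(exampleHeader)
console.log(refMaps.nameToId, refMaps.idToName)
```

    With the real library, the same function could be given the result of `indexedFile.cram.getSamHeader()`.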

    API (auto-generated)

    CramRecord

    Class of each CRAM record returned by this API.

    isPaired

    Returns boolean true if the read is paired, regardless of whether both segments are mapped

    isProperlyPaired

    Returns boolean true if the read is paired, and both segments are mapped

    isSegmentUnmapped

    Returns boolean true if the read itself is unmapped; mutually exclusive with isProperlyPaired

    isMateUnmapped

    Returns boolean true if the mate is unmapped; mutually exclusive with isProperlyPaired

    isReverseComplemented

    Returns boolean true if the read is mapped to the reverse strand

    isMateReverseComplemented

    Returns boolean true if the mate is mapped to the reverse strand

    isRead1

    Returns boolean true if this is read number 1 in a pair

    isRead2

    Returns boolean true if this is read number 2 in a pair

    isSecondary

    Returns boolean true if this is a secondary alignment

    isFailedQc

    Returns boolean true if this read has failed QC checks

    isDuplicate

    Returns boolean true if the read is an optical or PCR duplicate

    isSupplementary

    Returns boolean true if this is a supplementary alignment

    isDetached

    Returns boolean true if the read is detached

    hasMateDownStream

    Returns boolean true if the read has a mate in this same CRAM segment

    isPreservingQualityScores

    Returns boolean true if the read contains qual scores

    isUnknownBases

    Returns boolean true if the read has no sequence bases
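
    The flag accessors above can be combined to filter records. The following is a minimal sketch that assumes each accessor is called as a method on the record; `keepPrimaryAlignments` and `mockRecord` are hypothetical helpers, and the mock objects only stand in for real CramRecord instances.

```javascript
// Keep only mapped, primary, non-duplicate, QC-passing alignments.
// Assumption: each flag accessor is callable as a method on the record.
function keepPrimaryAlignments(records) {
  return records.filter(
    r =>
      !r.isSegmentUnmapped() &&
      !r.isSecondary() &&
      !r.isSupplementary() &&
      !r.isDuplicate() &&
      !r.isFailedQc(),
  )
}

// Hypothetical stand-in for a CramRecord, for illustration only
const mockRecord = (readName, flags = {}) => ({
  readName,
  isSegmentUnmapped: () => flags.unmapped === true,
  isSecondary: () => flags.secondary === true,
  isSupplementary: () => flags.supplementary === true,
  isDuplicate: () => flags.duplicate === true,
  isFailedQc: () => flags.failedQc === true,
})

const keptRecords = keepPrimaryAlignments([
  mockRecord('read1'),
  mockRecord('read2', { duplicate: true }),
  mockRecord('read3', { secondary: true }),
  mockRecord('read4'),
])
console.log(keptRecords.map(r => r.readName))
```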

    getReadBases

    Get the original sequence of this read.

    Returns String sequence basepairs

    getPairOrientation

    Get the pair orientation of a paired read. Adapted from igv.js

    Returns String of paired orientation

    addReferenceSequence

    Annotates this feature with the given reference sequence basepair information. This will add a sub and a ref item to base substitution read features given the actual substituted and reference base pairs, and will make the getReadBases() method work.

    Parameters
    • refRegion object

    • compressionScheme CramContainerCompressionScheme

    Returns undefined nothing

    ReadFeatures

    The feature objects appearing in the readFeatures member of CramRecord objects that show insertions, deletions, substitutions, etc.

    Static fields

    • code (character): One of "bqBXIDiQNSPH". See page 15 of the CRAM v3 spec for their meanings.
    • data (any): the data associated with the feature. The format of this varies depending on the feature code.
    • pos (number): location relative to the read (1-based)
    • refPos (number): location relative to the reference (1-based)
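
    As an illustration of how these fields can be consumed, the sketch below tallies read features by their `code`. It relies only on the `readFeatures` member and the `code` field described above; `countFeatureCodes` and `exampleRecords` are hypothetical names, and the plain objects stand in for real CramRecord instances.

```javascript
// Count read features by code across a list of records.
// Records without a readFeatures member are skipped.
function countFeatureCodes(records) {
  const counts = {}
  for (const record of records) {
    for (const feature of record.readFeatures || []) {
      counts[feature.code] = (counts[feature.code] || 0) + 1
    }
  }
  return counts
}

// Hypothetical records, for illustration only
const exampleRecords = [
  {
    readName: 'read1',
    readFeatures: [
      { code: 'X', pos: 5, refPos: 10004 },
      { code: 'D', pos: 9, refPos: 10008 },
    ],
  },
  { readName: 'read2' }, // no read features
  { readName: 'read3', readFeatures: [{ code: 'X', pos: 2, refPos: 10051 }] },
]
const featureCounts = countFeatureCodes(exampleRecords)
console.log(featureCounts)
```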

    IndexedCramFile

    constructor

    Parameters
    • args object

      • args.cram CramFile
      • args.index Index-like object that supports getEntriesForRange(seqId,start,end) -> Promise[Array[index entries]]
      • args.cacheSize number? optional maximum number of CRAM records to cache. default 20,000
      • args.fetchSizeLimit number? optional maximum number of bytes to fetch in a single getRecordsForRange call. Default 3 MiB.
      • args.checkSequenceMD5 boolean? default true. if false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequences to be fetched.

    getRecordsForRange

    Parameters
    • seq number numeric ID of the reference sequence
    • start number start of the range of interest. 1-based closed coordinates.
    • end number end of the range of interest. 1-based closed coordinates.
    • opts (optional, default {})
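
    Because these coordinates are 1-based and closed, they need converting before being passed to a 0-based sequence source, as the `start - 1` in the usage example's seqFetch does. The sketch below makes that conversion explicit. It assumes the target convention is 0-based half-open, which is consistent with the usage example but not stated by this document; `toZeroBasedHalfOpen` is a hypothetical helper.

```javascript
// Convert a 1-based closed range [start, end] into a 0-based half-open
// range [start, end). Assumption: the consumer expects half-open ranges.
function toZeroBasedHalfOpen(start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end)) {
    throw new TypeError('start and end must be integers')
  }
  if (start < 1 || end < start) {
    throw new RangeError('expected 1 <= start <= end')
  }
  // Only the start shifts: the closed end and the half-open end coincide.
  return { start: start - 1, end }
}

const converted = toZeroBasedHalfOpen(10000, 20000)
// Both conventions describe the same number of bases.
const closedLength = 20000 - 10000 + 1
const halfOpenLength = converted.end - converted.start
console.log(converted, closedLength === halfOpenLength)
```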

    hasDataForReferenceSequence

    Parameters

    Returns Promise true if the CRAM file contains data for the given reference sequence numerical ID

    CramFile

    constructor

    Parameters
    • args object

      • args.filehandle object? a filehandle that implements the stat() and read() methods of the Node filehandle API https://nodejs.org/api/fs.html#fs_class_filehandle
      • args.path object? path to the cram file
      • args.url object? url for the cram file. also supports file:// urls for local files
      • args.seqFetch function? a function with signature (seqId, startCoordinate, endCoordinate) that returns a promise for a string of sequence bases
      • args.cacheSize number? optional maximum number of CRAM records to cache. default 20,000
      • args.checkSequenceMD5 boolean? default true. if false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequences to be fetched.

    containerCount

    CraiIndex

    constructor

    Parameters

    hasDataForReferenceSequence

    Parameters

    Returns Promise true if the index contains entries for the given reference sequence ID, false otherwise

    getEntriesForRange

    fetch index entries for the given range

    Parameters

    Returns Promise promise for an array of objects of the form {start, span, containerStart, sliceStart, sliceBytes }
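
    Since IndexedCramFile accepts any index-like object with a getEntriesForRange(seqId, start, end) method, a custom index can be sketched as below. This is not the CraiIndex implementation: `InMemoryIndex` and `entryOverlaps` are hypothetical, the entry values are made up, and the overlap rule (an entry covers positions from `start` up to but not including `start + span`) is an assumption about how the entry fields are interpreted.

```javascript
// Assumption: an entry covers reference positions [start, start + span).
function entryOverlaps(entry, start, end) {
  return entry.start <= end && entry.start + entry.span > start
}

// Minimal index-like object backed by an in-memory table keyed by seqId.
class InMemoryIndex {
  constructor(entriesBySeqId) {
    this.entriesBySeqId = entriesBySeqId
  }

  async hasDataForReferenceSequence(seqId) {
    return (this.entriesBySeqId[seqId] || []).length > 0
  }

  async getEntriesForRange(seqId, start, end) {
    const entries = this.entriesBySeqId[seqId] || []
    return entries.filter(entry => entryOverlaps(entry, start, end))
  }
}

// Hypothetical entries, for illustration only
const exampleEntries = [
  { start: 100, span: 100, containerStart: 0, sliceStart: 0, sliceBytes: 512 },
  { start: 500, span: 50, containerStart: 1024, sliceStart: 0, sliceBytes: 256 },
]
const exampleIndex = new InMemoryIndex({ 0: exampleEntries })
exampleIndex
  .getEntriesForRange(0, 150, 160)
  .then(found => console.log(found.length))
```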

    CramUnimplementedError

    Extends Error

    Error caused by encountering a part of the CRAM spec that has not yet been implemented

    CramMalformedError

    Extends CramError

    An error caused by malformed data.

    CramBufferOverrunError

    Extends CramMalformedError

    An error caused by attempting to read beyond the end of the defined data.
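
    Given the class hierarchy above, error handling should test the most specific class first. The sketch below shows that ordering with local stand-in classes so it runs on its own. The stand-ins, the `describeCramError` helper, and the assumption that CramError extends Error are all hypothetical; this document does not say how the real classes are exported.

```javascript
// Local stand-ins mirroring the documented hierarchy (illustration only).
class CramError extends Error {} // assumption: base class extends Error
class CramUnimplementedError extends Error {}
class CramMalformedError extends CramError {}
class CramBufferOverrunError extends CramMalformedError {}

// Classify an error, checking subclasses before their parents.
// The classes are passed in so the real ones could be supplied instead.
function describeCramError(err, classes) {
  if (err instanceof classes.CramBufferOverrunError) {
    return 'truncated'
  }
  if (err instanceof classes.CramMalformedError) {
    return 'malformed'
  }
  if (err instanceof classes.CramUnimplementedError) {
    return 'unsupported'
  }
  return 'unknown'
}

const standIns = {
  CramBufferOverrunError,
  CramMalformedError,
  CramUnimplementedError,
}
console.log(describeCramError(new CramBufferOverrunError('eof'), standIns))
```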

    Academic Use

    This package was written with funding from the NHGRI as part of the JBrowse project. If you use it in an academic project that you publish, please cite the most recent JBrowse paper, which will be linked from jbrowse.org.

    License

    MIT © Robert Buels

    Collaborators

    • teresam856
    • nathandunn
    • rbuels
    • enuggetry
    • cmdcolin
    • garrettjstevens