# @gmod/cram

[NPM version](https://npmjs.org/package/@gmod/cram)
[Coverage Status](https://codecov.io/gh/GMOD/cram-js/branch/master)
[Build Status](https://github.com/GMOD/cram-js/actions?query=branch%3Amaster+workflow%3APush+)

Read CRAM files (indexed or unindexed) with pure JavaScript, in Node.js or in
the browser.

- Reads CRAM 3.x and 2.x (3.1 support added in v1.6.0)
- Does not read CRAM 1.x
- Can use .crai indexes out of the box for efficient sequence fetching, and
  also has an index API that allows use with other index types
- Has preliminary support for the bzip2 and lzma codecs. lzma support requires
  the latest @gmod/cram version and uses WebAssembly; if you are unable to
  compile it, you can try downgrading

## Install

```bash
$ npm install --save @gmod/cram
```

or

```bash
$ yarn add @gmod/cram
```

## Usage

```js
const { IndexedCramFile, CramFile, CraiIndex } = require('@gmod/cram')

// Use indexedfasta library for seqFetch, if using local file (see below)
const { IndexedFasta, BgzipIndexedFasta } = require('@gmod/indexedfasta')

// this uses local file paths for node.js for IndexedFasta, for usages using
// remote URLs see indexedfasta docs for filehandles and
// https://github.com/gmod/generic-filehandle2
const t = new IndexedFasta({
  path: '/filesystem/yourfile.fa',
  faiPath: '/filesystem/yourfile.fa.fai',
})

// example of fetching records from an indexed CRAM file.
// NOTE: only numeric IDs for the reference sequence are accepted.
// For indexedfasta the numeric ID is the order in which the sequence names
// appear in the header

// Wrap in an async function and then run it
const run = async () => {
  const idToName = []
  const nameToId = {}

  // example opening local files on node.js
  // can also pass `cramUrl` (for the IndexedCramFile class), and `url` (for
  // the CraiIndex) params to open remote URLs
  //
  // alternatively `cramFilehandle` (for the IndexedCramFile class) and
  // `filehandle` (for the CraiIndex) can be used, see for examples
  // https://github.com/gmod/generic-filehandle2
  const indexedFile = new IndexedCramFile({
    cramPath: '/filesystem/yourfile.cram',
    // or
    // cramUrl: 'url/to/file.cram'
    // cramFilehandle: a generic-filehandle2 or similar filehandle
    index: new CraiIndex({
      path: '/filesystem/yourfile.cram.crai',
      // or
      // url: 'url/to/file.cram.crai'
      // filehandle: a generic-filehandle2 or similar filehandle
    }),
    seqFetch: async (seqId, start, end) => {
      // note:
      // * seqFetch should return a promise for a string, in this instance retrieved from IndexedFasta
      // * we use start-1 because cram-js uses 1-based but IndexedFasta uses 0-based coordinates
      // * the seqId is a numeric identifier, so we convert it back to a name with idToName
      // * you can return an empty string from this function for testing if you want, but you may not get proper interpretation of record.readFeatures
      return t.getSequence(idToName[seqId], start - 1, end)
    },
    checkSequenceMD5: false,
  })
  const samHeader = await indexedFile.cram.getSamHeader()

  // use the @SQ lines in the header to figure out the
  // mapping between ref ID numbers and names
  const sqLines = samHeader.filter(l => l.tag === 'SQ')
  sqLines.forEach((sqLine, refId) => {
    sqLine.data.forEach(item => {
      if (item.tag === 'SN') {
        // this is the ref name
        const refName = item.value
        nameToId[refName] = refId
        idToName[refId] = refName
      }
    })
  })

  const records = await indexedFile.getRecordsForRange(
    nameToId['chr1'],
    10000,
    20000,
  )
  records.forEach(record => {
    console.log(`got a record named ${record.readName}`)
    if (record.readFeatures !== undefined) {
      record.readFeatures.forEach(({ code, pos, refPos, ref, sub }) => {
        // process the read features. this can be used similar to
        // CIGAR/MD strings in SAM. see CRAM specs for more details.
        if (code === 'X') {
          console.log(
            `${record.readName} shows a base substitution of ${ref}->${sub} at ${refPos}`,
          )
        }
      })
    }
  })
}

run().catch(console.error)
```

You can also use cram-js without NPM via `cram-bundle.js`. See the example directory for usage with a script tag.

## API (auto-generated)

- CramRecord - format of CRAM records returned by this API
- ReadFeatures - format of read features on records
- IndexedCramFile - indexed access into a CRAM file
- CramFile - .cram API
- CraiIndex - .crai index API
- Error Classes - special error classes thrown by this API

### CramRecord

##### Table of Contents

- CramRecord
  - Parameters
  - isPaired
  - isProperlyPaired
  - isSegmentUnmapped
  - isMateUnmapped
  - isReverseComplemented
  - isMateReverseComplemented
  - isRead1
  - isRead2
  - isSecondary
  - isFailedQc
  - isDuplicate
  - isSupplementary
  - isDetached
  - hasMateDownStream
  - isPreservingQualityScores
  - isUnknownBases
  - getReadBases
  - getPairOrientation
  - addReferenceSequence
    - Parameters

#### CramRecord

Class of each CRAM record returned by this API.

##### Parameters

- `$0` **any**
  - `$0.flags`
  - `$0.cramFlags`
  - `$0.readLength`
  - `$0.mappingQuality`
  - `$0.lengthOnRef`
  - `$0.qualityScores`
  - `$0.mateRecordNumber`
  - `$0.readBases`
  - `$0.readFeatures`
  - `$0.mateToUse`
  - `$0.readGroupId`
  - `$0.readName`
  - `$0.sequenceId`
  - `$0.uniqueId`
  - `$0.templateSize`
  - `$0.alignmentStart`
  - `$0.tags`

##### isPaired

Returns boolean true if the read is paired, regardless of whether both segments are mapped

##### isProperlyPaired

Returns boolean true if the read is paired, and both segments are mapped

##### isSegmentUnmapped

Returns boolean true if the read itself is unmapped; conflicts with isProperlyPaired

##### isMateUnmapped

Returns boolean true if the read's mate is unmapped; conflicts with isProperlyPaired

##### isReverseComplemented

Returns boolean true if the read is mapped to the reverse strand

##### isMateReverseComplemented

Returns boolean true if the mate is mapped to the reverse strand

##### isRead1

Returns boolean true if this is read number 1 in a pair

##### isRead2

Returns boolean true if this is read number 2 in a pair

##### isSecondary

Returns boolean true if this is a secondary alignment

##### isFailedQc

Returns boolean true if this read has failed QC checks

##### isDuplicate

Returns boolean true if the read is an optical or PCR duplicate

##### isSupplementary

Returns boolean true if this is a supplementary alignment
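The predicates above correspond to the standard bits of the SAM FLAG field, presumably stored in the `flags` constructor parameter. As an illustration only, here is a standalone sketch that decodes a FLAG integer into booleans with the same names, using the bit values from the SAM specification; `decodeSamFlags` is a hypothetical helper, not part of the @gmod/cram API.

```js
// Hypothetical helper (not part of @gmod/cram): decode a SAM/BAM FLAG
// integer into booleans named after the CramRecord predicates.
// Bit values follow the SAM specification.
const SAM_FLAG_BITS = {
  isPaired: 0x1,
  isProperlyPaired: 0x2,
  isSegmentUnmapped: 0x4,
  isMateUnmapped: 0x8,
  isReverseComplemented: 0x10,
  isMateReverseComplemented: 0x20,
  isRead1: 0x40,
  isRead2: 0x80,
  isSecondary: 0x100,
  isFailedQc: 0x200,
  isDuplicate: 0x400,
  isSupplementary: 0x800,
}

function decodeSamFlags(flags) {
  const result = {}
  for (const [name, bit] of Object.entries(SAM_FLAG_BITS)) {
    // A predicate is true when its bit is set in the FLAG value
    result[name] = (flags & bit) !== 0
  }
  return result
}

// 99 = 0x1 + 0x2 + 0x20 + 0x40: first read of a properly paired
// read pair whose mate maps to the reverse strand
const decoded = decodeSamFlags(99)
console.log(decoded.isRead1, decoded.isMateReverseComplemented) // true true
```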

##### isDetached

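Returns boolean true if the read is detached, meaning its mate information is stored explicitly in the record rather than resolved from a linked mate record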

##### hasMateDownStream

Returns boolean true if the read's mate appears later (downstream) in the same CRAM slice

##### isPreservingQualityScores

Returns boolean true if the read contains quality scores

##### isUnknownBases

Returns boolean true if the read has no sequence bases
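isDetached, hasMateDownStream, isPreservingQualityScores and isUnknownBases describe CRAM-specific record properties, presumably read from the `cramFlags` constructor parameter. A similar sketch, assuming the CF (CRAM record flag) bit values from the CRAM v3 specification; `decodeCramFlags` is hypothetical and not part of the library.

```js
// Hypothetical helper (not part of @gmod/cram): decode CRAM record flags
// (the CF data series) into booleans named after the CramRecord predicates.
// Bit values are assumed from the CRAM v3 specification.
const CRAM_FLAG_BITS = {
  isPreservingQualityScores: 0x1, // quality scores stored as an array
  isDetached: 0x2, // mate information stored explicitly in the record
  hasMateDownStream: 0x4, // mate record appears later in the slice
  isUnknownBases: 0x8, // sequence decodes as "*"
}

function decodeCramFlags(cramFlags) {
  return Object.fromEntries(
    Object.entries(CRAM_FLAG_BITS).map(([name, bit]) => [
      name,
      (cramFlags & bit) !== 0,
    ]),
  )
}

// 0x5 = 0x1 + 0x4: quality scores preserved, mate further downstream
const flags = decodeCramFlags(0x5)
console.log(flags.hasMateDownStream, flags.isDetached) // true false
```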

##### getReadBases

Get the original sequence of this read.

Returns String sequence basepairs

##### getPairOrientation

Get the pair orientation of a paired read. Adapted from igv.js

Returns String pair orientation
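Pair orientation strings such as `F1R2` combine each mate's strand (`F`/`R`) and read number (`1`/`2`), ordered by position on the reference. The following is a simplified, hypothetical sketch of that idea, assuming both mates are mapped to the same reference sequence and that a positive template size means this read is the leftmost mate. It is not cram-js's actual implementation, which may handle more cases.

```js
// Hypothetical, simplified sketch of an igv.js-style pair orientation
// string. Assumes both mates are mapped to the same reference sequence.
function pairOrientation({ isReverse, isMateReverse, isRead1, templateSize }) {
  const self = (isReverse ? 'R' : 'F') + (isRead1 ? '1' : '2')
  const mate = (isMateReverse ? 'R' : 'F') + (isRead1 ? '2' : '1')
  // A positive template size means this read is the leftmost mate,
  // so its strand and read number are written first
  return templateSize > 0 ? self + mate : mate + self
}

// First read on the forward strand, mate on the reverse strand
console.log(
  pairOrientation({ isReverse: false, isMateReverse: true, isRead1: true, templateSize: 300 }),
) // F1R2

// The same pair seen from the second read
console.log(
  pairOrientation({ isReverse: true, isMateReverse: false, isRead1: false, templateSize: -300 }),
) // F1R2
```

Both mates of a pair yield the same string, which is what makes the value useful for grouping or coloring pairs.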

##### addReferenceSequence

Annotates this feature with the given reference sequence basepair information. This will add a `sub` and a `ref` item to base substitution read features given the actual substituted and reference base pairs, and will make the `getReadSequence()` method work.

###### Parameters

- `refRegion` **object**
  - `refRegion.start` **number**
  - `refRegion.end` **number**
  - `refRegion.seq` **string**
- `compressionScheme` **CramContainerCompressionScheme**

Returns undefined nothing
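Part of this annotation is an offset calculation: a read feature's 1-based `refPos` becomes an index into `refRegion.seq`. A minimal sketch, assuming `refRegion.start` is the 1-based reference coordinate of the first character of `refRegion.seq`; `referenceBaseAt` is a hypothetical helper. Deriving `sub` presumably also needs the `compressionScheme` argument, which this sketch does not attempt.

```js
// Hypothetical helper: look up the reference base under a read feature.
// Assumes refRegion.start is the 1-based coordinate of refRegion.seq[0]
// and refRegion.end is the 1-based coordinate of its last character.
function referenceBaseAt(refRegion, refPos) {
  if (refPos < refRegion.start || refPos > refRegion.end) {
    throw new RangeError(`refPos ${refPos} is outside the reference region`)
  }
  // Convert the 1-based coordinate to a 0-based string index and
  // normalize soft-masked (lowercase) bases to uppercase
  return refRegion.seq.charAt(refPos - refRegion.start).toUpperCase()
}

const refRegion = { start: 100, end: 107, seq: 'acgtACGT' }
console.log(referenceBaseAt(refRegion, 100)) // A
console.log(referenceBaseAt(refRegion, 103)) // T
```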

### ReadFeatures

The feature objects appearing in the `readFeatures` member of CramRecord objects that show insertions, deletions, substitutions, etc.

#### Static fields

- `code` (character): One of "bqBXIDiQNSPH". See page 15 of the CRAM v3 spec for their meanings.
- `data` (any): the data associated with the feature. The format of this varies depending on the feature code.
- `pos` (number): location relative to the read (1-based)
- `refPos` (number): location relative to the reference (1-based)
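As an illustration of consuming these fields, the sketch below tallies a record's read features by code. The labels paraphrase the feature codes in the CRAM v3 specification; `summarizeReadFeatures` is hypothetical and accepts any array of objects with a `code` field (or `undefined`, since a record may have no read features).

```js
// Descriptions of CRAM read feature codes (paraphrasing the CRAM v3 spec)
const FEATURE_LABELS = {
  b: 'stretch of bases',
  q: 'stretch of quality scores',
  B: 'base with quality score',
  X: 'base substitution',
  I: 'insertion',
  D: 'deletion',
  i: 'single-base insertion',
  Q: 'single quality score',
  N: 'reference skip',
  S: 'soft clip',
  P: 'padding',
  H: 'hard clip',
}

// Hypothetical helper: count read features by kind, e.g. for a quick
// overview of the insertions, deletions and substitutions in a read
function summarizeReadFeatures(readFeatures = []) {
  const counts = {}
  for (const { code } of readFeatures) {
    const label = FEATURE_LABELS[code] ?? `unknown (${code})`
    counts[label] = (counts[label] ?? 0) + 1
  }
  return counts
}

const features = [{ code: 'X' }, { code: 'I' }, { code: 'X' }]
console.log(summarizeReadFeatures(features))
// { 'base substitution': 2, insertion: 1 }
```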

### IndexedCramFile

##### Table of Contents

- constructor
  - Parameters
- getRecordsForRange
  - Parameters
- hasDataForReferenceSequence
  - Parameters

#### constructor

##### Parameters

- `args` **object**
  - `args.cram` **CramFile**
  - `args.index` an Index-like object that supports `getEntriesForRange(seqId, start, end) -> Promise[Array[index entries]]`
  - `args.cacheSize` **number?** optional maximum number of CRAM records to cache. Default 20,000
  - `args.checkSequenceMD5` **boolean?** Default true. If false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequences to be fetched.

#### getRecordsForRange

##### Parameters

- `seq` **number** numeric ID of the reference sequence
- `start` **number** start of the range of interest. 1-based closed coordinates.
- `end` **number** end of the range of interest. 1-based closed coordinates.
- `opts` **{viewAsPairs: boolean?, pairAcrossChr: boolean?, maxInsertSize: number?}** (optional, default `{}`)
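Because `start` and `end` are 1-based and closed, coordinates from a 0-based, half-open source (such as a BED file) need to be shifted before calling getRecordsForRange. A minimal sketch; `toOneBasedClosed` is a hypothetical helper.

```js
// Hypothetical helper: convert a 0-based, half-open interval [start, end)
// (the BED convention) to the 1-based, closed interval [start, end]
// expected by getRecordsForRange.
function toOneBasedClosed(start0, end0) {
  // The first base moves up by one; the exclusive end becomes the
  // inclusive end without changing its value
  return [start0 + 1, end0]
}

// BED interval 9999-20000 covers the same bases as 1-based 10000..20000
const [start, end] = toOneBasedClosed(9999, 20000)
console.log(start, end) // 10000 20000
// e.g. indexedFile.getRecordsForRange(nameToId['chr1'], start, end)
```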

#### hasDataForReferenceSequence

##### Parameters

- `seqId` **number**

Returns Promise true if the CRAM file contains data for the given reference sequence numerical ID

### CramFile

##### Table of Contents

- containerCount

#### containerCount

Returns **Promise<(number | undefined)>**

### CraiIndex

##### Table of Contents

- constructor
  - Parameters
- hasDataForReferenceSequence
  - Parameters
- getEntriesForRange
  - Parameters

#### constructor

##### Parameters

- `args` **object**
  - `args.path` **string?**
  - `args.url` **string?**
  - `args.filehandle` **FileHandle?**

#### hasDataForReferenceSequence

##### Parameters

- `seqId` **number**

Returns Promise true if the index contains entries for the given reference sequence ID, false otherwise

#### getEntriesForRange

Fetch index entries for the given range.

##### Parameters

- `seqId` **number**
- `queryStart` **number**
- `queryEnd` **number**

Returns Promise promise for an array of objects of the form `{ start, span, containerStart, sliceStart, sliceBytes }`
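Since IndexedCramFile only requires an index-like object with `getEntriesForRange`, other index types can be plugged in. Below is a hypothetical in-memory index that implements that method plus `hasDataForReferenceSequence`, mirroring CraiIndex. It assumes each entry also carries a `seqId` field and covers the half-open reference interval `[start, start + span)`; check these assumptions against the CraiIndex source before relying on them.

```js
// Hypothetical in-memory index (not part of @gmod/cram) exposing the two
// async methods documented for CraiIndex. Entries are assumed to look like
// { seqId, start, span, containerStart, sliceStart, sliceBytes }.
class InMemoryIndex {
  constructor(entries) {
    // Group entries by reference sequence ID for fast lookup
    this.bySeqId = new Map()
    for (const entry of entries) {
      if (!this.bySeqId.has(entry.seqId)) {
        this.bySeqId.set(entry.seqId, [])
      }
      this.bySeqId.get(entry.seqId).push(entry)
    }
  }

  async hasDataForReferenceSequence(seqId) {
    return this.bySeqId.has(seqId)
  }

  // Return entries overlapping the 1-based closed range [queryStart, queryEnd],
  // treating each entry as covering [start, start + span)
  async getEntriesForRange(seqId, queryStart, queryEnd) {
    const entries = this.bySeqId.get(seqId) ?? []
    return entries.filter(
      e => e.start <= queryEnd && e.start + e.span > queryStart,
    )
  }
}

// Two adjacent slices on reference sequence 0 (byte offsets are placeholders)
const index = new InMemoryIndex([
  { seqId: 0, start: 1, span: 5000, containerStart: 0, sliceStart: 0, sliceBytes: 0 },
  { seqId: 0, start: 5001, span: 5000, containerStart: 0, sliceStart: 0, sliceBytes: 0 },
])
index
  .getEntriesForRange(0, 4000, 4500)
  .then(hits => console.log(hits.length)) // 1
```

An object like this could then be passed as the `index` argument of IndexedCramFile, provided its entries point at real container and slice offsets.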

### Error Classes

#### CramUnimplementedError

Extends Error

Error caused by encountering a part of the CRAM spec that has not yet been
implemented

#### CramMalformedError

Extends CramError

An error caused by malformed data.

#### CramBufferOverrunError

Extends CramMalformedError

An error caused by attempting to read beyond the end of the defined data.
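Because the buffer-overrun error class described above (named `CramBufferOverrunError` in cram-js) extends CramMalformedError, an `instanceof CramMalformedError` check also catches overruns, while the unimplemented-feature error, which extends Error directly, is not a CramError. The sketch below demonstrates this with local stand-in classes mirroring the documented hierarchy; in application code you would use the classes from @gmod/cram instead, assuming your version exports them. `describeCramError` is hypothetical.

```js
// Stand-in classes mirroring the documented hierarchy; in real code these
// would come from @gmod/cram instead of being defined locally.
class CramError extends Error {}
class CramUnimplementedError extends Error {}
class CramMalformedError extends CramError {}
class CramBufferOverrunError extends CramMalformedError {}

// Hypothetical helper: classify an error thrown while reading a CRAM file.
// Check the most specific class first, since instanceof also matches
// subclasses.
function describeCramError(err) {
  if (err instanceof CramBufferOverrunError) {
    return 'attempted to read beyond the end of the data'
  }
  if (err instanceof CramMalformedError) {
    return 'malformed CRAM data'
  }
  if (err instanceof CramUnimplementedError) {
    return 'CRAM feature not yet implemented'
  }
  return 'other error'
}

console.log(describeCramError(new CramBufferOverrunError('overrun')))
// attempted to read beyond the end of the data
console.log(new CramBufferOverrunError() instanceof CramMalformedError) // true
```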

## Academic Use

This package was written with funding from the NHGRI as
part of the JBrowse project. If you use it in an academic
project that you publish, please cite the most recent JBrowse paper, which will
be linked from jbrowse.org.

## License

MIT © Robert Buels
