Tools for working with the illumina2bam fork located at https://github.com/staliv/illumina2bam. The fork is modified for the Lund oncology research department workflow.
Requires Node.js. To build and install it from source you can use:

    git clone https://github.com/joyent/node.git
    cd node/
    git checkout v0.6.19  # for example
    ./configure
    make
    make install

Now that you have node and npm set up, continue by installing illumina2bam-tools:

    sudo npm install illumina2bam-tools -g

Or, to install from a clone of the repository:

    git clone http://github.com/staliv/illumina2bam-tools.git
    cd illumina2bam-tools
    sudo npm install . -g

This will create links in the /usr/local/bin/ directory for the existing tools in this package.
- specify which directory contains the distributed jars from the illumina2bam project
- the scripts will automatically search for the jars by looking recursively one folder "up" from your current working directory (when invoking for example the
- the scripts will try to write to the settings.json file, but they will fail (somewhat gracefully) as long as the scripts aren't running as a user with write permissions to the
- specify the directory where your jars are located (in the settings.json file) in order to speed up the process
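
The automatic jar lookup described above can be pictured roughly as follows. This is a minimal Python sketch rather than the package's own code: the function name `find_jar_dirs` is hypothetical, and it assumes that "one folder up" means the parent of the current working directory.

```python
import os


def find_jar_dirs(start_dir):
    """Return the directories below the parent of start_dir that contain .jar files.

    Hypothetical helper: mirrors the "search recursively one folder up" rule.
    """
    # Assumption: the search root is the parent of the working directory.
    search_root = os.path.dirname(os.path.abspath(start_dir))
    found = set()
    for dirpath, _dirnames, filenames in os.walk(search_root):
        if any(name.endswith(".jar") for name in filenames):
            found.add(dirpath)
    return sorted(found)


if __name__ == "__main__":
    for jar_dir in find_jar_dirs(os.getcwd()):
        print(jar_dir)
```

A walk like this visits every directory next to the working directory, which is why recording the jar directory once in the settings.json file avoids repeating the search on every invocation.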
Wrapper for performing Illumina BCL to BAM encoding and demultiplexing.

    Options:
      -s, --samplesheet         Samplesheet  [required]
      -b, --basecallsDirectory  Basecalls directory  [required]
      -o, --outputDirectory     Output directory, sub dirs /project/RunID will be created  [required]
      -t, --tempDirectory       Temp directory, sub dirs will be created  [required]
      -f                        Output format [bam|sam], default to 'bam'  [default: "bam"]
      -v, --verbose             Verbose output
      -m                        Maximum mismatches for a barcode to be considered a match  [default: 0]
      -d                        Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match  [default: 2]
      -n                        Maximum allowable number of no-calls in a barcode read before it is considered unmatchable  [default: 0]
      --im                      Maximum memory heap size for illumina2bam process, defaults to 2g  [default: "2g"]
      --ib                      Maximum memory heap size for BamIndexDecoder process, defaults to 1g  [default: "1g"]
      --debug                   Parse the first tile in each lane  [default: false]
      --force                   Disables check if library already exists, hence overwrites files if they already exist  [default: false]
      --omitLanes               Comma separated list with numbers identifying lanes to omit  [default: ""]
      --keepUndetermined        Keeps output of undetermined reads. Useful for debugging purposes.  [default: false]

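
To illustrate how the three barcode thresholds (-m, -d and -n) could interact, here is a minimal Python sketch. It is not the decoder's actual code: the function and parameter names are hypothetical, and it assumes that all three checks must pass and that no-calls (N) are counted separately from mismatches.

```python
def count_mismatches(read, barcode):
    """Count differing positions, skipping no-calls ('N') in the read (assumption)."""
    return sum(1 for r, b in zip(read, barcode) if r != "N" and r != b)


def match_barcode(read, barcodes, max_mismatches=0, min_mismatch_delta=2, max_no_calls=0):
    """Return the matching barcode, or None if the read stays undetermined."""
    # -n: too many no-calls makes the barcode read unmatchable.
    if read.count("N") > max_no_calls:
        return None
    scored = sorted((count_mismatches(read, b), b) for b in barcodes)
    best_mismatches, best_barcode = scored[0]
    # -m: the best candidate may not exceed the mismatch limit.
    if best_mismatches > max_mismatches:
        return None
    # -d: the best candidate must beat the runner-up by a clear margin.
    if len(scored) > 1 and scored[1][0] - best_mismatches < min_mismatch_delta:
        return None
    return best_barcode


if __name__ == "__main__":
    barcodes = ["TCTCGCCAT", "AGATAGGTT", "GTCGCTAGT", "CAGATATCT"]
    print(match_barcode("TCTCGCCAT", barcodes))
    print(match_barcode("TCACGCCAT", barcodes, max_mismatches=1))
```

With the defaults (-m 0, -d 2, -n 0) only exact barcode reads without no-calls are assigned; everything else ends up among the undetermined reads, which --keepUndetermined preserves.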
- Headers in the samplesheet that have (nn) after the header name, for example Pool (po), will have that id added as metadata, with the corresponding value, to the read group of the resulting BAM.
- Lines that begin with a "#" will be ignored.
- The ReadString accepts the values:
  - I = Index/Barcode
  - Y = Bases are read
  - N = Bases are skipped
  - J = Joker positions in the barcode. This is useful if one cycle is messed up and you need to mask one of the bases in the barcode while still being able to demultiplex. For example, if the barcode is TCTCGCCAT and the second index in the full read of I9Y90N2,I9Y90N2 has a bad cycle in the third position, it can be masked with the ReadString I9Y90N2,I2J1I6Y90N2. The full barcode in the underlying matching algorithm will then be TCTCGCCATTCJCGCCAT. The IndexDecoder, which is part of the subsequent splitting process, masks the specified base before counting mismatches on the barcode and determining whether the barcode is a match.
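
The ReadString notation and the joker masking can be sketched in Python as shown here. This is an illustration under stated assumptions, not the IndexDecoder implementation: all function names are hypothetical, and the samplesheet barcode is assumed to be repeated once per index read, as the example above implies.

```python
import re

SEGMENT = re.compile(r"([IYNJ])(\d+)")
READ = re.compile(r"(?:[IYNJ]\d+)+")


def parse_read_string(read_string):
    """Split e.g. 'I9Y90N2,I2J1I6Y90N2' into per-read lists of (code, cycles)."""
    reads = []
    for part in read_string.split(","):
        if not READ.fullmatch(part):
            raise ValueError(f"Unrecognised ReadString part: {part!r}")
        reads.append([(code, int(count)) for code, count in SEGMENT.findall(part)])
    return reads


def barcode_mask(read_string):
    """Return one character per barcode cycle: 'I' for a base, 'J' for a joker."""
    return "".join(
        code * count
        for segments in parse_read_string(read_string)
        for code, count in segments
        if code in "IJ"
    )


def apply_jokers(barcode, mask):
    """Replace the joker positions of the full barcode with 'J'."""
    if len(barcode) != len(mask):
        raise ValueError("Barcode and mask lengths differ")
    return "".join("J" if m == "J" else b for b, m in zip(barcode, mask))


def count_mismatches_masked(read_barcode, expected):
    """Count mismatches while ignoring joker positions in the expected barcode."""
    return sum(1 for r, e in zip(read_barcode, expected) if e != "J" and r != e)


if __name__ == "__main__":
    mask = barcode_mask("I9Y90N2,I2J1I6Y90N2")
    print(apply_jokers("TCTCGCCAT" * 2, mask))
```

Because the joker position is skipped when counting, a bad base in that cycle does not use up the mismatch budget set with -m.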
Example samplesheet, values are separated by one tab:

    #FCID Lane Index Library Sample Pool (po) Project (pr) Protocol (lp) Isize Control Operator (op) ReadString Concentration Priority Sequencing_Center Description
    FCID 5 TCTCGCCAT Lib1 Sample1 projectName protocolName 500 N staliv I9Y92,I9Y92 12 LuOnk Test run
    FCID 5 AGATAGGTT Lib1 Sample2 projectName protocolName 500 N staliv I9Y92,I9Y92 12 LuOnk Test run
    FCID 5 GTCGCTAGT Lib1 Sample3 projectName protocolName 500 N staliv I9Y92,I9Y92 12 LuOnk Test run
    FCID 5 CAGATATCT Lib1 Sample4 projectName protocolName 500 N staliv I9Y92,I9Y92 12 LuOnk Test run

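
Reading a samplesheet according to the rules above might look like the following Python sketch. The function name and the `read_group_tags` key are hypothetical. The sketch assumes that the first line is always the header, even though it starts with "#" as in the example, and that later lines starting with "#" are skipped. The sheet in the usage part is a shortened, made-up subset of the columns.

```python
import re

# Matches headers such as "Pool (po)" and captures the name and the two-letter id.
HEADER_TAG = re.compile(r"^(?P<name>.*?)\s*\((?P<tag>\w+)\)$")


def parse_samplesheet(text):
    """Parse tab-separated samplesheet text into a list of row dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Assumption: the first line is the header; strip its leading '#'.
    columns = []
    for field in lines[0].lstrip("#").split("\t"):
        match = HEADER_TAG.match(field.strip())
        if match:
            columns.append((match.group("name"), match.group("tag")))
        else:
            columns.append((field.strip(), None))
    rows = []
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # commented-out sample
        values = line.split("\t")
        row = {name: value for (name, _tag), value in zip(columns, values)}
        # Columns with an (nn) id become read group metadata when they have a value.
        row["read_group_tags"] = {
            tag: value for (_name, tag), value in zip(columns, values) if tag and value
        }
        rows.append(row)
    return rows


if __name__ == "__main__":
    sheet = (
        "#FCID\tLane\tIndex\tProject (pr)\tOperator (op)\n"
        "FCID\t5\tTCTCGCCAT\tprojectName\tstaliv\n"
    )
    for row in parse_samplesheet(sheet):
        print(row["Index"], row["read_group_tags"])
```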