a tool for detecting structural variations using soft-clipping information



ClipCrop is a tool for detecting structural variations (SVs) using soft-clipping information from SAM files.
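To illustrate what "soft-clipping information" means, here is a minimal sketch that reads the soft-clipped lengths from a SAM CIGAR string, where the `S` operation marks read bases that were not aligned. The function name `soft_clips` is hypothetical and this is not ClipCrop's own code. It only shows the kind of signal the tool relies on.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar):
    """Return (left, right) soft-clipped lengths for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips (H) can only appear outside soft clips, so drop them first.
    while ops and ops[0][1] == "H":
        ops.pop(0)
    while ops and ops[-1][1] == "H":
        ops.pop()
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right

print(soft_clips("30S70M"))  # (30, 0)
print(soft_clips("60M40S"))  # (0, 40)
```

A read with a long clip on one side suggests that the alignment breaks at that position, which is a candidate breakpoint.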



ClipCrop uses SHRiMP2 or bwa internally.

First, install SHRiMP2 or bwa (v0.5 or later) and add its binary to your PATH. If you use SHRiMP2, also add "export SHRIMP_FOLDER=/path/to/SHRiMP2" to your environment.

Install ClipCrop

ClipCrop is implemented in Node.js.

If you are already familiar with Node.js, just run:

$ npm install clipcrop

Node.js is not yet a major scripting language in bioinformatics. If you do not have it, install it with its version manager, nvm.

Do not install Node.js from apt-get or other OS package managers!

$ git clone git:// ~/.nvm

$ source ~/.nvm/

$ nvm install v0.6.1

$ nvm use v0.6.1

$ npm install clipcrop

Installing Node.js may take a long time, so please be patient.

For later sessions, add the following lines to your .bashrc (or the equivalent file for your shell).

source ~/.nvm/
nvm use v0.6.1


$ clipcrop <sam file> <reference fasta file> [<fasta information json file>]


sam file: SAM file with soft-clipping information. The recommended mapping tool is bwa.
reference fasta file: reference genome used for mapping.
fasta information json file (optional): JSON file for FASTAReader.

This file is optional and is used for faster reading of reference genomes.

See the FASTAReader README for more details.


dir: directory to put result files. default: basename(path)
bp_filter_parallel: number of processes used to filter breakpoints. default: 8
max_diff: max difference within breakpoint cluster values. default: 2
min_cluster_size: minimum cluster size to be a valid breakpoint. default: 10
min_quality: minimum base quality score to allow. default: 5
bases_around_break: number of extended bases around a breakpoint to be mapped by clipped sequences. default: 1000
sv_max_diff: max difference within breakpoint cluster values. default: 10
sv_min_cluster_size: minimum cluster size to be a valid SV. default: 10
bwa_threads: number of threads bwa uses. default: 8
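As a rough illustration of how max_diff and min_cluster_size might interact, the sketch below groups clip positions into clusters and discards the small ones. This is one possible reading of the two options, not ClipCrop's actual implementation: it assumes a position joins a cluster when it lies within max_diff of the cluster's smallest value. The function name `cluster_positions` and the sample positions are hypothetical.

```python
def cluster_positions(positions, max_diff=2, min_cluster_size=10):
    """Group positions into clusters and keep those with enough members."""
    clusters = []
    current = []
    for pos in sorted(positions):
        # Start a new cluster when pos is too far from the cluster's first value.
        if current and pos - current[0] > max_diff:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_cluster_size]

# Hypothetical clip positions: 12 reads near 1000 and 3 stray reads near 5000.
positions = [1000] * 5 + [1001] * 4 + [1002] * 3 + [5000, 5001, 5002]
for cluster in cluster_positions(positions):
    print(cluster[0], cluster[-1], len(cluster))  # 1000 1002 12
```

Under this reading, raising max_diff merges nearby positions into fewer, larger clusters, while raising min_cluster_size demands more supporting reads per breakpoint.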


Results are written in BED format.

#rname  start end type  subtype len score rname2  start2  caller  other
chr1  224199455 224199456 INS * * 38  = * clipcrop  num:158 LR:49/109
rname: the name of the chromosome
start: start position of the SV event
end: end position of the SV event
type: SV type, one of DEL, INS, INV, CTX (translocation), DUP
subtype: subtype of the SV type
len: length of the event
score: reliability score of the event. A score of 0 means the event is not reliable.
rname2: (for translocation) the chromosome of the second breakpoint
start2: (for translocation) the start position of the second breakpoint
caller: always "clipcrop"
other.num: the number of sequences supporting the breakpoint
other.LR: the numbers of L and R clips, written as L/R
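The columns above can be read with a few lines of code. The following is a minimal sketch, assuming the columns are whitespace-separated, that the last column ("other") holds space-separated key:value pairs as in the sample line, and that score is an integer. The function name `parse_result_line` is hypothetical.

```python
FIELDS = ["rname", "start", "end", "type", "subtype", "len",
          "score", "rname2", "start2", "caller", "other"]

def parse_result_line(line):
    """Parse one result line into a dict. Returns None for header or blank lines."""
    if line.startswith("#") or not line.strip():
        return None
    # Assumption: whitespace-separated columns; "other" keeps the remainder.
    values = line.split(None, len(FIELDS) - 1)
    record = dict(zip(FIELDS, values))
    for key in ("start", "end", "score"):
        record[key] = int(record[key])
    # "other" holds key:value pairs such as num:158 and LR:49/109.
    other = dict(item.split(":", 1) for item in record["other"].split())
    record["num"] = int(other["num"])
    record["L"], record["R"] = (int(x) for x in other["LR"].split("/"))
    return record

line = "chr1  224199455 224199456 INS * * 38  = * clipcrop  num:158 LR:49/109"
rec = parse_result_line(line)
print(rec["type"], rec["start"], rec["score"], rec["num"], rec["L"], rec["R"])
# INS 224199455 38 158 49 109
```

Columns that may contain "*" (subtype, len, rname2, start2) are left as strings. Events can then be filtered with a check such as rec["score"] > 0.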