SHEAR: Sample Heterogeneity Estimation and Assembly by Reference
SHEAR is a tool for next-generation sequencing data analysis that predicts structural variants (SVs), accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis. By building on structural variant detection algorithms, SHEAR also handles difficult SV types more robustly and runs with improved computational efficiency.
For more information, please refer to our article in BMC Genomics.
Please contact landman@cs.umn.edu with any questions, comments, or concerns.
Please cite the following article:
- Landman SR, Hwang TH, Silverstein KAT, Li Y, Dehm SM, Steinbach M, and Kumar V. SHEAR: sample heterogeneity estimation and assembly by reference. BMC Genomics 2014, 15:84.
Download SHEAR (v1.1.2)
Changelog and Previous Versions
- v1.1.2 (November 21 2015)
- Added support for SVs that have breakpoints at the ends of chromosomes.
- Fixed a bug that would cause *.vcf files from DELLY output to be combined with duplicate headers.
- Fixed a bug that would cause SHEAR to crash due to memory problems when trying to output very large SDI files.
- Fixed a bug that would sometimes delete prediction files provided via the --preds argument after parsing them.
- v1.1.1 (October 28 2015)
- Added a new parameter (i.e. --realign-unmapped) to use unmapped reads during local realignment. This was previously done automatically but is now off by default.
- Improved efficiency in processing very large BAM files.
- Fixed bugs that would throw errors if SV breakpoints occurred too close to the edge of a FASTA region.
- v1.1.0 (October 17 2015)
- Breakpoint microhomology is now reported for SVs (i.e. the number of identical bases at the breakpoints of the variant). When there is ambiguity in what the SV coordinates are due to microhomology, SHEAR will report the 5'-most location as well as the number of bases that are identical.
- Fixed heterogeneity estimation when using DELLY for SVs that have breakpoint microhomology.
- Fixed SV filtering log messages so that the true source of a false positive is reported instead of pointing to a different false positive.
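The microhomology reporting introduced in v1.1.0 can be illustrated for the simple case of a deletion. The following is a minimal sketch, not SHEAR's own code: the function name `normalize_deletion`, the 0-based coordinates, and the restriction to deletions are all assumptions made for illustration. It shifts a deletion as far toward the 5' end as possible without changing the resulting sequence, and counts how many equivalent positions exist, which equals the number of identical bases at the breakpoints.

```python
def normalize_deletion(ref, start, length):
    """Return (leftmost_start, microhomology) for deleting ref[start:start+length].

    Hypothetical helper for illustration only; coordinates are 0-based.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shift left while the base entering the deletion on the left matches
    # the base leaving it on the right (the result sequence is unchanged).
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shift right by the mirror-image rule to find the 3'-most placement.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    # The number of possible shifts is the breakpoint microhomology.
    return left, right - left


# Deleting 3 bases at position 4 of GGACTACTTT is equivalent to deleting
# them at positions 2, 3, or 5, so the 5'-most start is 2 with 3 bases
# of microhomology.
print(normalize_deletion("GGACTACTTT", 4, 3))  # (2, 3)
```

Reporting the 5'-most position together with the microhomology length lets two callers that place the same deletion at different equivalent coordinates be recognized as describing the same variant.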
- v1.0.0 (September 29 2015)
- Local realignment pipeline now uses BWA-MEM rather than BWA-SW. BWA-SW can still be used via the --align-algo option.
- SV prediction is now done using the DELLY algorithm. CREST can still be used via the --sv-algo option.
- Numerous improvements to the read extraction and local realignment pipeline, such as extracting larger regions surrounding the breakpoints while still only realigning those reads very close to the breakpoints.
- Iterative realignment now continues until no new unique SVs are predicted, rather than until no new SV breakpoints are predicted.
- Lots of general performance improvements.
- v0.2.14 (March 15 2015)
- Fixed a bug that would throw an error if SV breakpoints occurred too close to the edge of a FASTA region.
- v0.2.13 (December 29 2014)
- Now compatible with the latest versions of Picard (i.e. 1.124+).
- Fixed a bug with SDI output when translocations are predicted. These are not supposed to show up in SDI files (although support is coming soon) but were getting inserted as null values.
- Fixed a bug that would throw an error when parsing the -r parameter (i.e. region) for some sequence names.
- General performance improvements.
- v0.2.12 (September 11 2014)
- Now compatible with the latest versions of GATK (i.e. 3.2+). The configure_shear script will detect the version of GATK being used and configure the class paths correctly.
- Changed how SHEAR interfaces with Picard. Picard is no longer referenced during SHEAR compilation; instead, the PICARD_DIRECTORY environment variable should be set at runtime. This should solve various problems that would occur when running SHEAR with conflicting versions of GATK and Picard.
- v0.2.11 (July 12 2014)
- Now compatible with the latest versions of Picard (i.e. 1.114+). The configure_shear script will detect the version of Picard being used and configure the class paths correctly.
- Fixed a bug with SHEAR-Assemble that was throwing an error in some situations when it should have been skipping over overlapping variants.
- v0.2.10 (June 15 2014)
- Added a new parameter (i.e. --min-het) for SHEAR-SV that allows for a minimum threshold for the estimated heterogeneity level in order for variants to be included in the results.
- Fixed a bug with SHEAR-Assemble that was incorrectly processing input FASTA files with whitespace characters in the sequence header lines.
- v0.2.9 (Apr 22 2014)
- Fixed some compilation issues for certain versions of GATK and Picard.
- Fixed a bug that would cause SHEAR to crash when provided with input BAM files containing poorly-formatted CIGAR strings, a known issue with certain versions of BWA.
- Added additional validation for correctly-formatted *.2bit files. This will prevent SHEAR from stalling indefinitely when provided with bad *.2bit files.
- v0.2.8 (Feb 11 2014)
- SHEAR now automatically runs cleanup on BAM input files to fix common formatting errors before processing.
- v0.2.7 (Jan 28 2014)
- Fixed a bug that would result in SNPs not being properly represented in the output SDI files.
- Output report files now contain reference/variant bases for SNPs, deletions, insertions, and inversions.
- v0.2.6 (Nov 14 2013)
- Fixed a bug that would throw an error for certain types of inversion SVs.
- Fixed cleanup of temporary files.
- Fixed a bug related to processing alignments with no detectable SVs.
- v0.2.5 (Oct 30 2013)
- Fixed a minor bug that would throw an error during local realignment for bad SAM flags in BWA-SW output.
- v0.2.4 (Oct 21 2013)
- Fixed minor bugs that would occasionally prevent gfServer from locating a valid open port to use.
- v0.2.3 (Sep 18 2013)
- Improvements to the local re-alignment algorithm. Soft-clipped reads should mostly remain in their original neighborhoods now, rather than jump to the corresponding breakpoint.
- Added a "--preds" option that can specify a pre-existing CREST prediction file (i.e. *.predSV.txt) to be used to begin the SHEAR pipeline. This is useful for skipping the CREST step when CREST has already been run on the alignment, or for resuming a previous run of SHEAR that exited.
- Minor improvements to output log information.
- v0.2.2 (Sep 9 2013)
- Fixed a bug that would occasionally cause gfServer to start improperly and hang if two instances of SHEAR were started at nearly the same time on the same node. This was due to both instances of gfServer attempting to reserve the same port number.
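The port collision described for v0.2.2 is a general problem whenever two processes pick a port number independently. One common way to avoid it, shown here as a sketch only and not as the fix SHEAR actually uses, is to let the operating system assign an unused port by binding to port 0. The function name `reserve_free_port` and the loopback host are assumptions for illustration.

```python
import socket


def reserve_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS picks an unused TCP port.

    Returns the still-open socket and the chosen port number.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]
```

While both sockets remain open, two callers cannot be handed the same port. A server that has to bind the port itself would need the reserving socket closed first, which reopens a small window for a race, so this approach reduces collisions rather than eliminating them.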
- v0.2.1 (Sep 6 2013)
- SHEAR-SV now has support for SNP/INDEL calling via GATK's HaplotypeCaller. SNPs and INDELs will be predicted on the original alignment and their heterogeneity level will be estimated. These will be included in the output SDI file to be used by SHEAR-Assemble. SHEAR can be limited to detecting and assembling SVs only via the "--sv-only" option for SHEAR-SV.
- Added an argument to run SHEAR-SV on only a select region or locus in the alignment (e.g. -r chr1:10000-15000).
- Now accepts a relative path for the 2bit file instead of only an absolute path.
- Less clutter is output in normal (i.e. non-debug) mode.
- Slight speed improvements.
- Various code cleanup, especially with error handling.
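The region argument added in v0.2.1 takes strings such as chr1:10000-15000. The following is a minimal sketch of how such a string might be parsed; it is not SHEAR's parser, and the function name `parse_region` and the validation rules (non-negative coordinates, start not after end) are assumptions. Splitting on the last colon rather than the first keeps sequence names that themselves contain colons intact.

```python
def parse_region(region):
    """Split 'name:start-end' into (name, start, end).

    Hypothetical helper for illustration only.
    """
    # rpartition splits on the LAST colon, so colons in the name survive.
    name, colon, span = region.rpartition(":")
    if not colon or not name:
        raise ValueError(f"expected 'name:start-end', got {region!r}")

    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected 'start-end' after the colon, got {span!r}")

    start, end = int(start_text), int(end_text)  # ValueError if not numeric
    if start < 0 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    return name, start, end


print(parse_region("chr1:10000-15000"))  # ('chr1', 10000, 15000)
```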
- v0.1.2 (Jul 9 2013)
- Fixed a bug that caused problems when two instances of SHEAR were run on the same machine at nearly the same time due to colliding ports for their respective gfServers. The fix requires UNIX's lsof and pgrep commands to be on the PATH, so Windows support is dropped for now. Hopefully this will be added back in a future version.
- Fixed a bug that caused the SHEAR process to hang after an error if gfServer is still open.
- Read groups are no longer required for input BAM alignment. If read group information is missing it will be automatically added by SHEAR.
- v0.1.1 (May 13 2013)