LW-FQZip

Compresses FASTQ data. LW-FQZip is a lossless, lightweight, reference-based compression algorithm. The data are first split into metadata, short reads and quality scores, and each stream is then processed independently with a different scheme. The software is equipped with a lightweight mapping model, bitwise prediction by partial matching (PPM), arithmetic coding, and multi-threaded parallelism. It is compatible with long-read sequencing data and is intended to provide insight into the storage problems posed by new sequencing data.
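The stream-splitting step described above can be illustrated with a minimal Python sketch. It is not LW-FQZip's implementation: the function names (`split_fastq`, `compress_streams`, `decompress_streams`) are hypothetical, and the standard-library `lzma` module stands in for the reference mapping, PPM and arithmetic coding stages. It also assumes four-line records and does not preserve any text after the `+` on the separator line.

```python
import lzma


def split_fastq(text):
    """Split FASTQ text into metadata, read and quality-score streams.

    Assumes each record spans exactly four lines (no wrapped sequences).
    """
    lines = text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    metadata, reads, qualities = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"read/quality length mismatch at line {i + 1}")
        metadata.append(header[1:])  # drop the leading '@'
        reads.append(seq)
        qualities.append(qual)
    return metadata, reads, qualities


def compress_streams(text):
    """Compress each stream independently (LZMA is a stand-in codec)."""
    streams = split_fastq(text)
    return tuple(lzma.compress("\n".join(s).encode("ascii")) for s in streams)


def decompress_streams(blobs):
    """Rebuild FASTQ text from the three compressed streams."""
    metadata, reads, qualities = (
        lzma.decompress(b).decode("ascii").split("\n") for b in blobs
    )
    records = [
        f"@{h}\n{s}\n+\n{q}\n" for h, s, q in zip(metadata, reads, qualities)
    ]
    return "".join(records)
```

Keeping the three streams separate lets each one be handled by a scheme suited to its statistics, which is the design idea the description refers to; a real reference-based tool would additionally replace the read stream with positions and mismatches relative to a reference genome.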

LW-FQZip citation

[…] ata set by either (i) assembling reads into long contigs, typically by a de Bruijn graph-based approach (e.g., Quip, Leon, k-Path, and KIC), or by (ii) aligning the reads to a reference genome (e.g., LW-FQZip); the reads are then encoded as simple pointers to the reference or the assembled contigs. Because both sequence mapping and assembly are computationally intensive tasks, all the above HTS com […]

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; The Broad Institute, Cambridge, MA, USA; School of Computer Science, University of Birmingham, Birmingham, UK

LW-FQZip funding source(s)

This work was supported in part by the National Natural Science Foundation of China (61471246 and 61205092), Guangdong Foundation of Outstanding Young Teachers in Higher Education Institutions (Yq2013141), Shenzhen Scientific Research and Development Funding Program (JCYJ20130329115450637, KQC201108300045A and ZYC201105170243A) and Guangdong Natural Science Foundation (S2012010009545).

LW-FQZip review

Anonymous user #383

2015-12-24, 12:52

Desktop

I tried this tool but the compression doesn't seem to be that great.

I could compress the example FASTQ file from 31M to 8.6M. This is exactly the same size as you get by running gzip on the file. With bzip2, the file size is reduced even further, to 7.3M.

Additionally, when I tried to compress my own FASTQ files (human), I got the error 'error genome file size'. After working around this issue with a hard-coded fix, I got a buffer overflow when running the LWFQZip -c command.