ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter (2017)

ABySS 2.0 departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. The authors benchmarked ABySS 2.0 human genome assembly using a Genome in a Bottle data set of 250-bp Illumina paired-end and 6-kbp mate-pair libraries from a single individual. The assembly yielded an NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using <35 GB of RAM.
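To make the idea concrete, here is a minimal sketch of how a Bloom filter can stand in for a de Bruijn graph. It is not ABySS 2.0's implementation: the class `BloomFilter`, the helpers `kmers`, `build_kmer_filter` and `successors`, the BLAKE2b-based hashing, and the parameter values are all assumptions chosen for illustration, and reverse complements are ignored. The graph is implicit: nodes are k-mers stored in the filter, and edges are found by querying the four possible one-base extensions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings (illustrative, not ABySS 2.0's code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_filter(reads, k, num_bits, num_hashes):
    """Insert every k-mer of every read into a fresh Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf, kmer):
    """Neighbours of kmer: the one-base extensions the filter reports as present."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]
```

Because the filter stores only bits rather than the k-mers themselves, memory is fixed by `num_bits` regardless of k. The trade-off is that `successors` can occasionally report an edge that is not in the data (a false positive), which a real assembler has to detect and handle.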

Compacting de Bruijn graphs from sequencing data quickly and in low memory (2016)

The paper presents an algorithm and a tool, bcalm 2, for the compaction of de Bruijn graphs. bcalm 2 is a parallel algorithm that distributes the input based on a minimizer hashing technique, allowing for good balance of memory usage throughout its execution. For human sequencing data, bcalm 2 reduces the computational burden of compacting the de Bruijn graph to roughly an hour and 3 GB of memory. The authors also applied bcalm 2 to the 22 Gbp loblolly pine and 20 Gbp white spruce sequencing datasets. Compacted graphs were constructed from raw reads in less than 2 days and 40 GB of memory on a single machine. Hence, bcalm 2 is at least an order of magnitude more efficient than other available methods.
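The minimizer-based distribution can be sketched roughly as follows. This is a simplified illustration, not bcalm 2's actual scheme: the functions `minimizer` and `partition_kmers`, the lexicographic ordering of m-mers, the CRC32 bucket hash, and the parameters `m` and `num_buckets` are all assumptions made for the example. The point is that overlapping k-mers tend to share a minimizer, so they end up in the same bucket and each bucket can be processed separately with bounded memory.

```python
import zlib
from collections import defaultdict


def minimizer(kmer, m):
    """Lexicographically smallest length-m substring of kmer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def partition_kmers(reads, k, m, num_buckets):
    """Assign each distinct k-mer to a bucket chosen by hashing its minimizer."""
    buckets = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # CRC32 is deterministic across runs, unlike Python's built-in hash().
            bucket_id = zlib.crc32(minimizer(kmer, m).encode()) % num_buckets
            buckets[bucket_id].add(kmer)
    return buckets
```

For example, with `k=5` and `m=3` the k-mers `GATTA` and `ATTAC` from the read `GATTACA` both have minimizer `ATT`, so they are always placed in the same bucket, while `TTACA` has minimizer `ACA` and may go elsewhere.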

