Public Dataset List

Collections of public datasets:
https://www.openml.org/search?type=data
http://archive.ics.uci.edu/ml/datasets.php
https://www.re3data.org/
https://www.data.gov/
https://www.kdnuggets.com/datasets/index.html
http://dataportals.org/

Genome assembly by statisticians

GAML: genome assembly by maximum likelihood (2015)
Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling (2014)
ILP-based maximum likelihood genome scaffolding (2014)
Toward a statistically explicit understanding of de novo sequence assembly (2013)
CGAL: computing genome assembly likelihoods (2013)
De novo likelihood-based measures …

Large genome assembly

ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter (2017). ABySS 2.0 departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We benchmarked …
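
To make the Bloom filter idea concrete, here is a minimal Python sketch of storing the k-mers of a de Bruijn graph in a Bloom filter and recovering edges by membership queries. It is an illustration only, not ABySS 2.0's actual implementation: the class and function names (BloomFilter, build_graph, successors), the filter size, the number of hashes, and the use of blake2b are all assumptions made for this example, and reverse complements and error correction are ignored.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; queries may give false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            salt = i.to_bytes(8, "little")
            digest = hashlib.blake2b(item.encode(), salt=salt).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_graph(reads, k: int, num_bits: int = 1 << 16, num_hashes: int = 3):
    """Insert all k-mers from the reads; the filter is the whole graph."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf: BloomFilter, kmer: str):
    """Edges are implicit: try all four extensions and keep those (probably) present."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]
```

The memory saving comes from never storing the k-mers themselves: the graph costs a fixed number of bits regardless of k. The price is that a false positive in the filter shows up as a spurious node or edge, which a real assembler has to detect and prune.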

Improve assembly

A comparative evaluation of genome assembly reconciliation tools (2017) benchmarked seven assembly reconciliation tools, namely CISA, GAA, GAM_NGS, GARM, Metassembler, MIX, and ZORRO. Despite the inability of these assembly tools to solve the general assembly reconciliation problem, each tool demonstrated some strengths that could …