
Public Dataset List

Below are several collections of public datasets:
https://www.openml.org/search?type=data
http://archive.ics.uci.edu/ml/datasets.php
https://www.re3data.org/
https://www.data.gov/
https://www.kdnuggets.com/datasets/index.html
http://dataportals.org/

By zywang, ago

Using NN to perform Genome Assembly

A Machine Learning Approach to DNA Shotgun Sequence Assembly (2015)
DNA Fragment Assembly Using Neural Prediction Techniques (1999)
The main idea is to use a neural network (NN) to predict a pattern for each read. Reads that share the same predicted pattern are clustered into groups, and an existing assembler is then run on each group separately.
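The cluster-then-assemble idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `predict_pattern` is a hypothetical stand-in for a trained neural network (here it simply bins reads by GC content), and `assemble` is a placeholder for a call to an existing assembler. Neither function comes from the papers listed above.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def predict_pattern(read: str) -> int:
    """Hypothetical stand-in for a trained NN: bin a read by its GC content."""
    if not read:
        return 0
    gc_fraction = sum(base in "GC" for base in read) / len(read)
    return 0 if gc_fraction < 0.5 else 1


def cluster_reads(
    reads: List[str],
    predictor: Callable[[str], int] = predict_pattern,
) -> Dict[int, List[str]]:
    """Group reads that receive the same predicted pattern."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for read in reads:
        clusters[predictor(read)].append(read)
    return dict(clusters)


def assemble(reads: List[str]) -> List[str]:
    """Placeholder for an external assembler; it returns the reads unchanged."""
    return list(reads)


def assemble_by_cluster(
    reads: List[str],
    predictor: Callable[[str], int] = predict_pattern,
    assembler: Callable[[List[str]], List[str]] = assemble,
) -> Dict[int, List[str]]:
    """Run the assembler independently on each cluster of reads."""
    clusters = cluster_reads(reads, predictor)
    return {label: assembler(group) for label, group in clusters.items()}
```

In practice the predictor would be a trained model and the assembler an external tool; the point of the sketch is only that each cluster is assembled on its own, which keeps each assembly problem smaller.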

By zywang, ago

Annual Review of Statistics and Its Application

Volume 1. I have chosen two examples, contingency table analysis and causal inference, but I could have written similarly about the evolution of statistical thinking associated with time series analysis (again going back to early contributions of Yule), or about spatial statistics or meta-analysis. Over time, we hope to provide comprehensive reviews Read more…

By zywang, ago

Genome assembly by Statistician

GAML: genome assembly by maximum likelihood (2015)
Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling (2014)
ILP-based maximum likelihood genome scaffolding (2014)
Toward a statistically explicit understanding of de novo sequence assembly (2013)
CGAL: computing genome assembly likelihoods (2013)
De novo likelihood-based measures for comparing genome assemblies (2013)
ALE: a generic assembly likelihood Read more…

By zywang, ago

Combining long reads with draft assemblies

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments (2015)
Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology (2012) (PBJelly)

By zywang, ago

Large genome assembly

ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter (2017)
ABySS 2.0 departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We benchmarked ABySS 2.0 human genome assembly using a Genome in a Read more…
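To make the Bloom filter idea concrete, here is a minimal sketch of storing k-mers in a Bloom filter and querying it as an implicit de Bruijn graph. It is not ABySS 2.0's implementation: the class, the function names, the SHA-256 based hashing, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Fixed-size bit array with several hash functions (illustrative only)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive independent-looking hashes by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, but never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reads: Iterable[str], k: int) -> BloomFilter:
    """Insert every k-mer of every read into a Bloom filter."""
    bloom = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom


def successors(bloom: BloomFilter, kmer: str) -> List[str]:
    """Out-neighbours in the implicit graph: shift by one base and query."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]
```

Because the graph is never stored explicitly, memory use depends on the size of the bit array rather than on the number of edges. The cost is that a query can occasionally report a k-mer that was never inserted, so a real assembler needs some way to handle such false positives.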

By zywang, ago

Improve assembly

A comparative evaluation of genome assembly reconciliation tools (2017)
The study benchmarked seven assembly reconciliation tools, namely CISA, GAA, GAM_NGS, GARM, Metassembler, MIX, and ZORRO. Despite the inability of these assembly tools to solve the general assembly reconciliation problem, each tool demonstrated some strengths that could lead to algorithmic advances for this problem. For instance, CISA generally was Read more…

By zywang, ago

Genome assembly evaluation

SuRankCo: supervised ranking of contigs in de novo assemblies (2015)
A machine learning approach to predict quality scores for contigs and to enable the ranking of contigs within an assembly. Information on characteristics of contigs from a de novo assembly is extracted by the SuRankCo-Feature module. These features include common characteristics such as length Read more…

By zywang, ago

de Bruijn graph

The present and future of de novo whole-genome assembly, by Jang-il Sohn and Jin-Wu Nam
‘Table 1. Summary of short read assemblers’: speed, memory efficiency, N50 length, input data type, assembly steps
‘Table 2. Strategies for challenges’: assembly approach, sequencing error, complexity reducing, repeat resolving, uneven depth, RAM memory
‘Table 3. Comparison Read more…

By zywang, ago