
2. Introduction to DBC

spacocha edited this page Mar 18, 2015 · 1 revision

What is distribution-based clustering?

-Distribution-based clustering (DBC) is a way of organizing sequence data that maximizes the useful information in the data and reduces redundancy.

-It can be applied to any dataset with multiple samples, and is most useful for analyzing data across samples in which the abundance of organisms changes fairly dramatically.

-It can be used to conservatively identify true sequences in a sample or to conservatively estimate distinct populations.
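
To make the idea concrete, here is a minimal sketch of comparing how two sequences are distributed across samples. It is an illustration only, not the DBC code: this page does not name the statistical test, so the sketch assumes a plain Pearson chi-squared statistic on a 2 x N table, and the function name, the counts, and the 0.05 cutoff are all hypothetical.

```python
# Illustrative sketch only: the test, names, and counts below are assumptions,
# not taken from the DBC implementation.

def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-squared statistic for a 2 x N table of per-sample counts."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both sequences need a count for every sample")
    row_totals = [sum(counts_a), sum(counts_b)]
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("no counts to compare")
    stat = 0.0
    for row, row_total in zip((counts_a, counts_b), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip samples where neither sequence occurs
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts of three sequences across four samples.
seq_abundant = [900, 20, 10, 850]
seq_error_like = [9, 0, 0, 8]   # follows the abundant sequence across samples
seq_distinct = [1, 40, 35, 2]   # peaks in different samples

# Chi-squared critical value for 3 degrees of freedom at alpha = 0.05.
CRITICAL_DF3 = 7.815

same = chi_square_statistic(seq_abundant, seq_error_like)
diff = chi_square_statistic(seq_abundant, seq_distinct)

print("similar distribution:", same < CRITICAL_DF3)
print("different distribution:", diff > CRITICAL_DF3)
```

Under these assumptions, a rare sequence that tracks an abundant one across samples would be a candidate for merging, while one with a clearly different distribution would be kept as a separate population. With counts this small the chi-squared approximation is rough, so the fixed cutoff should be read as a teaching device rather than a recommended threshold.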

Different versions of distribution-based clustering

-The original implementation of DBC was slow because it relied on an inelegant interface to R, which performed the statistical test.

-The new implementation, written in Python, uses rpy2, a stable and elegant interface between Python and R.

-This has increased the speed of the algorithm without any loss of accuracy.

-This is the most current version on GitHub.