
dc.contributor.author Hu, Chenyue W.
dc.contributor.author Kornblau, Steven M.
dc.contributor.author Slater, John H.
dc.contributor.author Qutub, Amina A.
dc.date.accessioned 2015-11-09T18:52:17Z
dc.date.available 2015-11-09T18:52:17Z
dc.date.issued 2015
dc.identifier.citation Hu, Chenyue W., Kornblau, Steven M., Slater, John H., et al. "Progeny Clustering: A Method to Identify Biological Phenotypes." Scientific Reports, 5 (2015). http://dx.doi.org/10.1038/srep12894.
dc.identifier.uri https://hdl.handle.net/1911/82035
dc.description.abstract Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient computationally, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and the Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods on the ten-dimensional synthetic dataset as well as on the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset.
dc.language.iso eng
dc.rights This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.title Progeny Clustering: A Method to Identify Biological Phenotypes
dc.type Journal article
dc.contributor.funder National Science Foundation
dc.contributor.funder Leukemia and Lymphoma Society
dc.contributor.funder Howard Hughes Medical Institute
dc.citation.journalTitle Scientific Reports
dc.citation.volumeNumber 5
dc.contributor.publisher Nature Publishing Group
dc.type.dcmi Text
dc.identifier.doi http://dx.doi.org/10.1038/srep12894
dc.identifier.grantID CAREER 1150645 (National Science Foundation)
dc.identifier.grantID 6089 (Leukemia and Lymphoma Society)
dc.identifier.grantID Med-Into-Grad Fellowship (Howard Hughes Medical Institute)
dc.type.publication publisher version


