Models for the preprocessing of reverse phase protein arrays
Neeley, E. Shannon
Baggerly, Keith A.
Doctor of Philosophy
Reverse-phase protein lysate arrays (RPPAs) are becoming important tools for the analysis of proteins in biological systems. RPPAs combine current assays for detecting and measuring proteins with the high-throughput technology of microarrays. Protein-level assays can address questions about signaling pathways and post-translational modifications that genomic assays alone cannot answer. The importance of preprocessing microarray data has been shown in a variety of contexts over the years, and many of the same issues carry over to RPPAs, including spot-level correction, quantification, and normalization. In this thesis, we develop models and tools to improve upon the standard methods for preprocessing RPPA data. In particular, at the spot level, we suggest alternative methods for estimating background signal when the default estimates are compromised. Further, we introduce a multiplicative spot-level adjustment, modeled with a smoothed surface fit to the positive control spots, that removes spatial bias better than additive-only models. When multi-level information is available for the positive controls, a method that builds nested surfaces at each positive control level decreases spatial bias further. At the quantification level, we outline a newly developed R package called SuperCurve. This package uses a model that borrows strength from all samples on an array to estimate both an overall dose-response curve and individual estimates of relative protein expression for each sample. SuperCurve is easy to implement and is compatible with the latest version of R. Finally, we introduce a normalization model called Variable Slope (VS) normalization that corrects for sample loading bias while taking into account the fact that expression estimates are computed separately for each array. Previous normalization models fail to account for this feature, potentially adding variability to the expression measurements.
VS normalization is shown to recover the true correlation structure better than standard methods. As processing methods for RPPA data improve, this technology can help identify proteomic signatures that are unique to disease subtypes and may eventually be applied to personalized therapy.
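The multiplicative spatial adjustment described above can be illustrated with a minimal sketch. It rests on several stated assumptions. Every positive control spot is assumed to carry the same true concentration (the single-level case), and background correction is assumed to have been done already. A simple Gaussian kernel (Nadaraya-Watson) smoother is swapped in for whatever surface smoother the thesis actually uses. The function names `control_surface` and `multiplicative_adjust`, the `bandwidth` parameter, and the simulated gradient are all hypothetical. The sketch divides each spot by its relative spatial scale factor instead of subtracting an offset.

```python
import numpy as np


def control_surface(ctrl_rows, ctrl_cols, ctrl_vals, rows, cols, bandwidth=3.0):
    """Hypothetical helper: Gaussian-kernel (Nadaraya-Watson) smooth of the
    positive-control intensities, evaluated at every spot position."""
    # Squared grid distance from every spot to every positive-control spot.
    d2 = (rows[:, None] - ctrl_rows[None, :]) ** 2 + (cols[:, None] - ctrl_cols[None, :]) ** 2
    weights = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Kernel-weighted average of control intensities at each spot.
    return weights @ ctrl_vals / weights.sum(axis=1)


def multiplicative_adjust(signal, rows, cols, ctrl_mask, bandwidth=3.0):
    """Hypothetical helper: divide each spot by a relative spatial scale factor
    estimated from the positive controls (assumes one shared control level)."""
    surface = control_surface(rows[ctrl_mask], cols[ctrl_mask], signal[ctrl_mask],
                              rows, cols, bandwidth)
    scale = surface / np.median(surface)  # 1.0 means no spatial bias at that spot
    return signal / scale


# Usage on a simulated 20 x 20 array with a made-up left-to-right gradient.
r, c = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
rows, cols = r.ravel().astype(float), c.ravel().astype(float)
true_signal = np.full(rows.shape, 100.0)
bias = 1.0 + 0.02 * cols                      # simulated multiplicative bias
signal = true_signal * bias
ctrl = (rows % 4 == 0) & (cols % 4 == 0)      # hypothetical control layout
adjusted = multiplicative_adjust(signal, rows, cols, ctrl)
```

Dividing the surface by its median keeps the corrected intensities on the array's original scale. The bandwidth controls how local the estimated bias is allowed to be. The nested-surface method for multi-level controls is not covered by this sketch.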
Biology; Biostatistics; Statistics; Bioinformatics; Physics