"Log-spline density estimation"

Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

We accounted for the circular nature of the angular data by using trigonometric splines, which proved more efficient than the triangulation technique. This general framework also provides comprehensive machinery for clustering, model assessment, and data modeling for groups of protein backbone angles. Specifically, the estimated angular density corresponding to a protein structure has a basis expansion whose coefficients can be used as input to a clustering algorithm. Most existing protein classification techniques compare sequences and 3D structures, classifying proteins by (dis)similarity scores obtained from pairwise alignments. By contrast, the proposed method is an alignment-free procedure that associates each structure (density) with a vector of coefficients (i.e., features) that can be used directly to classify the proteins. The framework likewise offers a comprehensive means of assessing clustering models for other groups of circular data. We also developed a Shiny web application (available at https://pscde-t.shinyapps.io/PSCDE-T/) that the research community can use to reproduce the results in this paper and to estimate Ramachandran distributions collectively.

Collective Estimation of Multiple Bivariate Density Functions With Application to Angular-Sampling-Based Protein Loop Modeling

© 2016 American Statistical Association. This article develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by …