In silico classification of solubility using binary k-nearest neighbor and physicochemical descriptors

B.F. Jensen, Per B. Brockhoff, C. Vind, S.B. Padkjær, H.H. Refsgaard

Publication: Journal article or conference article in journal › Journal article › Research › peer-reviewed

Abstract

Solubility is a key property that strongly affects both the absorption and the screening efficiency of drug candidates. Here, we present an in silico k-nearest neighbor model that uses ten physicochemical descriptors to classify solubility, based on a diverse set of drug candidates. The data used for modeling consisted of turbidimetric solubility values at pH 7.4 for 518 drug candidates. The data were divided into a training set of 389 compounds and a test set of 129 compounds, and each compound was binned into one of two solubility classes: insoluble (solubility ≤0.02 mg/mL) or soluble (solubility >0.02 mg/mL). Furthermore, a structural fragment analysis of soluble versus insoluble compounds was performed, and the structural fragments and functional groups whose frequency differed statistically between the two solubility classes are presented. Of the ten descriptors used for modeling, clog D was found to separate the two solubility classes most efficiently. When the test set was classified, 84% of the compounds were assigned to the correct class. In a validation with 12 soluble marketed drugs, the model assigned 11 of the 12 compounds to the correct class. We found that the solubility model could be used to flag molecules with low solubility at an early stage of discovery projects.
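
The binning and classification steps described in the abstract can be illustrated with a minimal sketch. Only the 0.02 mg/mL class boundary comes from the abstract. The value of k, the Euclidean distance, the two-descriptor toy data, and the function names `solubility_class` and `knn_predict` are assumptions made for illustration; the abstract does not state the authors' settings, and this is not their implementation.

```python
from collections import Counter
import math

SOLUBILITY_CUTOFF_MG_PER_ML = 0.02  # class boundary stated in the abstract


def solubility_class(solubility_mg_per_ml):
    """Bin a measured solubility into one of the two classes."""
    if solubility_mg_per_ml > SOLUBILITY_CUTOFF_MG_PER_ML:
        return "soluble"
    return "insoluble"


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query.

    Assumption: Euclidean distance on unscaled descriptors and k=3.
    """
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, not from the paper:
# ((lipophilicity-like descriptor, second descriptor), solubility in mg/mL)
toy = [
    ((0.5, 1.0), 0.30), ((1.0, 0.8), 0.15), ((0.8, 1.2), 0.05),
    ((4.0, 3.0), 0.01), ((4.5, 3.5), 0.005), ((3.8, 2.9), 0.02),
]
train_x = [descriptors for descriptors, _ in toy]
train_y = [solubility_class(s) for _, s in toy]

prediction = knn_predict(train_x, train_y, (4.2, 3.1), k=3)
print(prediction)
```

With real data, the ten descriptors would normally be put on a common scale before computing distances, since a descriptor with a large numeric range would otherwise dominate the vote.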
Original language: English
Journal: QSAR and Combinatorial Science
Volume: 26
Issue number: 4
Pages (from-to): 452-459
ISSN: 1611-020X
Status: Published - 2007
Published externally: Yes
