TY - JOUR
T1 - Classification of Membrane Permeability of Drug Candidates: A Methodological Investigation
AU - Jensen, Berith F.
AU - Refsgaard, Hanne H.F.
AU - Bro, Rasmus
AU - Brockhoff, Per B.
PY - 2005
Y1 - 2005
N2 - A data set consisting of 1040 drug candidates was divided into a training set and a test set of 832 and 208 compounds, respectively. The training set was used to estimate a model for classification into two classes with respect to membrane permeation in a cell-based assay: 1) apparent permeability below 4 × 10⁻⁶ cm/s and 2) apparent permeability of 4 × 10⁻⁶ cm/s or higher. Nine molecular descriptors were calculated for each compound, and six classification techniques were applied: k-Nearest Neighbor, Linear and Quadratic Discriminant Analysis, Discriminant Adaptive Nearest-Neighbor, Soft Independent Modeling of Class Analogy and Classification Tree. A Discriminant Adaptive Nearest-Neighbor model based on four descriptors (number of flex bonds, number of hydrogen bond donors, molecular weight and molecular polar surface area) was selected as the best model. The selection was based on cross-validation and a new weighted classification accuracy measure introduced in this study. In the test set of 208 compounds, 9% were not classified. The false positive rate was 0.08 and the sensitivity was 0.76.
AB - A data set consisting of 1040 drug candidates was divided into a training set and a test set of 832 and 208 compounds, respectively. The training set was used to estimate a model for classification into two classes with respect to membrane permeation in a cell-based assay: 1) apparent permeability below 4 × 10⁻⁶ cm/s and 2) apparent permeability of 4 × 10⁻⁶ cm/s or higher. Nine molecular descriptors were calculated for each compound, and six classification techniques were applied: k-Nearest Neighbor, Linear and Quadratic Discriminant Analysis, Discriminant Adaptive Nearest-Neighbor, Soft Independent Modeling of Class Analogy and Classification Tree. A Discriminant Adaptive Nearest-Neighbor model based on four descriptors (number of flex bonds, number of hydrogen bond donors, molecular weight and molecular polar surface area) was selected as the best model. The selection was based on cross-validation and a new weighted classification accuracy measure introduced in this study. In the test set of 208 compounds, 9% were not classified. The false positive rate was 0.08 and the sensitivity was 0.76.
KW - Drug candidates
KW - Membrane permeation
KW - k-Nearest Neighbor (k-NN)
KW - Linear Discriminant Analysis (LDA)
KW - Quadratic Discriminant Analysis (QDA)
KW - Discriminant Adaptive Nearest-Neighbor (DANN)
KW - Soft Independent Modeling of Class Analogy (SIMCA)
U2 - 10.1002/qsar.200430928
DO - 10.1002/qsar.200430928
M3 - Journal article
SN - 1611-020X
VL - 24
SP - 449
EP - 457
JO - QSAR and Combinatorial Science
JF - QSAR and Combinatorial Science
IS - 4
ER -