TY - JOUR

T1 - Classification of Membrane Permeability of Drug Candidates: A Methodological Investigation

AU - Jensen, Berith F.

AU - Refsgaard, Hanne H.F.

AU - Bro, Rasmus

AU - Brockhoff, Per B.

PY - 2005

Y1 - 2005

AB - A data set of 1040 drug candidates was divided into a training set of 832 compounds and a test set of 208 compounds. The training set was used to estimate a model that classifies compounds into two classes with respect to membrane permeation in a cell-based assay: 1) apparent permeability below 4 × 10⁻⁶ cm/s and 2) apparent permeability of 4 × 10⁻⁶ cm/s or higher. Nine molecular descriptors were calculated for each compound, and six classification techniques were applied: k-Nearest Neighbor, Linear and Quadratic Discriminant Analysis, Discriminant Adaptive Nearest-Neighbor, Soft Independent Modeling of Class Analogy, and Classification Tree. A Discriminant Adaptive Nearest-Neighbor model based on four descriptors (number of flex bonds, number of hydrogen bond donors, molecular weight, and molecular polar surface area) was selected as the best model. The selection was based on cross-validation and a new weighted classification accuracy measure introduced in this study. In the test set of 208 compounds, 9% were not classified. The false positive rate was 0.08 and the sensitivity was 0.76.

KW - Drug candidates

KW - Membrane permeation

KW - k-Nearest Neighbor (k-NN)

KW - Linear Discriminant Analysis (LDA)

KW - Quadratic Discriminant Analysis (QDA)

KW - Discriminant Adaptive Nearest-Neighbor (DANN)

KW - Soft Independent Modeling of Class Analogy (SIMCA)

U2 - 10.1002/qsar.200430928

DO - 10.1002/qsar.200430928

M3 - Journal article

SN - 1611-020X

VL - 24

SP - 449

EP - 457

JO - QSAR and Combinatorial Science

JF - QSAR and Combinatorial Science

IS - 4

ER -