"The data set consists of 85 compounds, for which Wessel and co-workers gathered %HIA values from the literature. Sixteen of these 85 structures have Caco-2 cell permeability data, originating from Artursson and Karlsson that were used in the QSAR models proposed by Van de Waterbeemd et al. and by Norinder et al. In addition, Yazdanian et al. reported Caco-2 cell permeability data for 29 of these 85 structures. Thus, three experimental measures were used to derive our QSAR model".

131 compounds and their 'Fraction Absorbed' values were obtained from the table given in the paper. Some compounds have multiple reported values (obtained from various references); we averaged these and provide the averages in the SDF and AMP files. The "Sanghvi_values.txt" file contains all data (as reported in Table 1) and "Sanghvi_avg_values.txt" contains averaged data for compounds with multiple entries (as reported in Table 2).
"Benserazide, HCl" was retrieved as "Benserazide", "Ceftriaxone, Na" as "Ceftriaxone", "Foscarnet, Na" as "Foscarnet", "L-Dopa" as "Levodopa", "L-Leucine" as "Leucine", and "Triacrilast" as "Tiacrilast". The last name was found to be misspelt, as checked against the original reference.
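
The averaging step described above can be sketched as follows. This is only an illustration: it assumes a plain arithmetic mean, the records are hypothetical (name, value) pairs rather than data from the paper, and the function name average_by_compound is our own. The actual layout of "Sanghvi_values.txt" is not assumed here.

```python
from collections import defaultdict
from statistics import mean


def average_by_compound(records):
    """Return the arithmetic mean of all values reported for each compound."""
    grouped = defaultdict(list)
    for name, value in records:
        grouped[name].append(value)
    return {name: mean(values) for name, values in grouped.items()}


# Hypothetical entries for illustration only; not taken from the paper.
records = [
    ("compound_a", 90.0),
    ("compound_a", 100.0),
    ("compound_b", 50.0),
]
averaged = average_by_compound(records)
```

Compounds with a single reported value pass through unchanged, so the same routine can be applied to the whole table.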

A total of 86 drugs and their %HIA (Human Intestinal Absorption) values were taken from the table given in the paper.
Compound "trovoflaxicin" was retrieved as "trovafloxacin", "acetylsalicylic acid" as "aspirin", and "phenoxymethylpenicillinic acid" as "penicillinv" from ChemIDplus. A fragment was removed from "timolol maleate" during the "wash" step. The AMP file has a "Label" column indicating the 9 compounds used as a "Cross-Validation Set" and the 10 compounds used as an "External Prediction Set" by the authors. The remaining compounds are labeled as "Training Set".
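
As a rough sketch of how the "Label" column can be used, the snippet below groups compounds by label and works out the training-set size implied by the counts above. It assumes the AMP file has already been parsed into dictionaries; the "Name" key, the placeholder drug names, and the function split_by_label are hypothetical. Only the "Label" column and its three values come from the description above.

```python
from collections import defaultdict


def split_by_label(rows):
    """Group compound names by the value of their 'Label' column."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row["Label"]].append(row["Name"])
    return dict(subsets)


# Hypothetical rows for illustration only.
rows = [
    {"Name": "drug_1", "Label": "Training Set"},
    {"Name": "drug_2", "Label": "Cross-Validation Set"},
    {"Name": "drug_3", "Label": "External Prediction Set"},
    {"Name": "drug_4", "Label": "Training Set"},
]
subsets = split_by_label(rows)

# Training-set size implied by the counts above: 86 drugs in total,
# minus 9 cross-validation and 10 external prediction compounds.
expected_training_size = 86 - 9 - 10
```

Comparing the size of the "Training Set" group in the real file against expected_training_size is a quick consistency check after loading.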

A total of 55 drugs and their human % absorption values have been made available by the authors. One compound, "0311C90", was obtained by the authors from Glaxo-Wellcome; we do not provide its structure or values in our files. Compounds "acebutolol HCl", "alprenolol HCl", "bupropion HCl", "labetatol HCl", "oxyprenolol HCl", "propranolol HCl", "ranitidine HCl", and "sotalol HCl" were retrieved as "acebutolol", "alprenolol", "bupropion", "labetatol", "oxyprenolol", "propranolol", "ranitidine", and "sotalol". To the best of our understanding, omitting the HCl salt form should not matter for QSAR modeling purposes. Further, "terbutaline hemisulfate" was retrieved as "terbutaline".
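
The name mapping above amounts to dropping the salt designation before lookup. A minimal sketch of that idea, operating on name strings only, is given below. It is an assumption about how such a mapping could be automated, not a record of how the compounds were actually retrieved; SALT_SUFFIXES and strip_salt_suffix are hypothetical names, and the suffix list covers only the two salt forms mentioned above.

```python
# Salt designations that appear in the compound names listed above.
SALT_SUFFIXES = (" HCl", " hemisulfate")


def strip_salt_suffix(name):
    """Return the compound name without a trailing salt designation."""
    for suffix in SALT_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


names = ["acebutolol HCl", "terbutaline hemisulfate", "sotalol HCl"]
parents = [strip_salt_suffix(n) for n in names]
```

Names without a listed suffix are returned unchanged, so the function can be applied to the full compound list.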

86 drugs and their experimentally derived Intestinal Absorption (%) values are given. "Fluconasole" was retrieved as "Fluconazole" and "Hydrocortizone" as "hydrocortisone" from ChemIDplus. Each drug was part of either the 'Training and testing data sets' or the 'Validation data set'; this assignment is indicated in a 'Label' column in the AMP file.