A data set that contains information about compounds used in drug discovery. 
      Specifically, this data set consists of 5631 compounds on which an in-house
      solubility screen (the ability of a compound to dissolve in a water/solvent
      mixture) was performed. Based on this screen, compounds were categorized as
      either insoluble (n=3493) or soluble (n=2138).
      For each compound, 72 continuous, noisy structural descriptors were then
      computed.