Optimizing Learning Procedures Using Surprisal, a seminar by Gonçalo S. Martins
Modern machine learning techniques require large amounts of labelled data to achieve useful results. In this talk we present a novel methodology for coping with this issue by reducing the amount of training data fed to a classifier while maintaining acceptable classification performance. We investigated the correlation between the surprisal measurement and the classifier's learning curve, and developed a technique to filter non-informative samples from a dataset. Our methodology was tested on Bayesian classifiers, both in simulated trials and using a previously established action description technique. Results show that, under these conditions, the surprisal measurement can be used to drastically reduce the required training data while maintaining acceptable classifier performance.
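The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea: a Bayesian (here, categorical naive Bayes) classifier is trained incrementally, and a sample is used for training only when its surprisal, −log₂ P(x) under the current model, exceeds a threshold; low-surprisal (non-informative) samples are skipped. The class name, the threshold value, and the choice of naive Bayes with Laplace smoothing are all assumptions for illustration, not the seminar's actual method.

```python
import math
from collections import defaultdict


class SurprisalFilteredNB:
    """Hypothetical sketch: incremental categorical naive Bayes that trains
    on a sample only if its surprisal under the current model is high enough."""

    def __init__(self, threshold_bits=1.0, alpha=1.0):
        self.threshold = threshold_bits  # assumed surprisal cutoff, in bits
        self.alpha = alpha               # Laplace smoothing constant
        self.class_counts = defaultdict(int)
        # (feature_index, label) -> {feature_value: count}
        self.feat_counts = defaultdict(lambda: defaultdict(int))
        self.feat_values = defaultdict(set)  # feature_index -> observed values
        self.used = 0
        self.skipped = 0

    def _log_prob(self, x, label):
        # log P(label) + sum_i log P(x_i | label), with Laplace smoothing
        total = sum(self.class_counts.values())
        n_classes = max(len(self.class_counts), 1)
        lp = math.log((self.class_counts[label] + self.alpha)
                      / (total + self.alpha * n_classes))
        for i, v in enumerate(x):
            counts = self.feat_counts[(i, label)]
            denom = (self.class_counts[label]
                     + self.alpha * max(len(self.feat_values[i]), 1))
            lp += math.log((counts[v] + self.alpha) / denom)
        return lp

    def surprisal(self, x):
        # Surprisal in bits: -log2 P(x), marginalized over known classes.
        if not self.class_counts:
            return float("inf")  # an empty model finds everything surprising
        logps = [self._log_prob(x, c) for c in self.class_counts]
        m = max(logps)  # log-sum-exp for numerical stability
        log_px = m + math.log(sum(math.exp(lp - m) for lp in logps))
        return -log_px / math.log(2)

    def observe(self, x, label):
        # Train on (x, label) only when the sample is informative enough.
        if self.surprisal(x) >= self.threshold:
            self.class_counts[label] += 1
            for i, v in enumerate(x):
                self.feat_counts[(i, label)][v] += 1
                self.feat_values[i].add(v)
            self.used += 1
        else:
            self.skipped += 1

    def predict(self, x):
        return max(self.class_counts, key=lambda c: self._log_prob(x, c))
```

Fed a stream of redundant samples, such a filter trains on the first few and skips later duplicates once their surprisal drops below the threshold, so the model sees far fewer samples while keeping its predictions.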