Galois Tech Talk: An Analysis of Analysis
Presented by Charles Parker
A basic problem in computer science is binary classification, in which an algorithm assigns one of two labels to each data item based on the presence or absence of some phenomenon. Problems of this type abound in areas as diverse as computational biology, multimedia indexing, and anomaly detection. Evaluating the performance of a binary classifier is itself a complex task, and it often depends on a domain-specific notion of the relative cost of "false positives" versus "false negatives". Because these costs are often unavailable to researchers and engineers, a number of methods are used to provide a cost-independent analysis of performance. In this talk, I will examine several of these methods both theoretically and experimentally. The results suggest a set of best practices for evaluating binary classification algorithms, while calling into question whether a cost-independent analysis is even possible.
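To make the role of error costs concrete, here is a minimal sketch in Python. It is not taken from the talk: the two classifiers, their error counts, and the cost values are all hypothetical, chosen only to show that which classifier looks "better" can flip when the relative cost of false positives and false negatives changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Error counts for one classifier on a fixed test set (hypothetical numbers)."""
    false_positives: int
    false_negatives: int


def total_cost(c: Confusion, fp_cost: float, fn_cost: float) -> float:
    """Total misclassification cost: each error type weighted by its assumed cost."""
    return c.false_positives * fp_cost + c.false_negatives * fn_cost


def preferred(a: Confusion, b: Confusion, fp_cost: float, fn_cost: float) -> str:
    """Return "A" or "B" for the cheaper classifier, or "tie" if costs are equal."""
    cost_a = total_cost(a, fp_cost, fn_cost)
    cost_b = total_cost(b, fp_cost, fn_cost)
    if cost_a == cost_b:
        return "tie"
    return "A" if cost_a < cost_b else "B"


# Hypothetical classifiers: A raises many false alarms but misses little;
# B raises few false alarms but misses more.
classifier_a = Confusion(false_positives=40, false_negatives=5)
classifier_b = Confusion(false_positives=10, false_negatives=20)

# Equal costs: B makes fewer errors overall, so B is preferred.
print(preferred(classifier_a, classifier_b, fp_cost=1.0, fn_cost=1.0))

# A missed positive assumed to be ten times as costly as a false alarm: A is preferred.
print(preferred(classifier_a, classifier_b, fp_cost=1.0, fn_cost=10.0))
```

Under these assumed numbers, neither classifier dominates the other, so any ranking of the two implicitly commits to some cost ratio. That is the difficulty a cost-independent analysis has to contend with.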