Jan Hendrik Metzen

My personal blog on Python and machine learning

Probability calibration

This post summarizes the new feature for calibrating the predicted probabilities of binary and multi-class classifiers, which was added in the 0.16 release of scikit-learn. It gives several examples that illustrate both the different behaviour of under-confident and over-confident classifiers and how to calibrate them so that they become well-calibrated.
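A minimal sketch of what this calibration looks like in code, assuming scikit-learn's CalibratedClassifierCV with current module paths; the synthetic data and the Gaussian naive Bayes base estimator are illustrative choices, not taken from the post itself:

```python
# Sketch: calibrating an over-confident classifier with CalibratedClassifierCV.
# Dataset and base estimator are placeholders for illustration only.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uncalibrated classifier (Gaussian naive Bayes tends to be over-confident).
clf = GaussianNB().fit(X_train, y_train)
prob_uncalibrated = clf.predict_proba(X_test)[:, 1]

# Wrap the base estimator; its probabilities are calibrated via internal
# cross-validation, here with isotonic regression ("sigmoid" would select
# Platt scaling instead).
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=3)
calibrated.fit(X_train, y_train)
prob_calibrated = calibrated.predict_proba(X_test)[:, 1]
```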

Advice for applying Machine Learning

This post summarizes some recommendations on how to get started with machine learning on a new problem. This includes ways of visualizing your data, choosing a machine learning method suitable for the problem at hand, identifying and dealing with over- and underfitting, handling large (read: not very small) datasets, and the pros and cons of different loss functions.
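One of these points, diagnosing over- and underfitting, lends itself to a quick illustration. The sketch below uses scikit-learn's learning_curve helper with an arbitrary estimator and dataset; it is not taken from the post, just one common way to visualize the gap between training and validation scores:

```python
# Sketch: learning curves as an over-/underfitting diagnostic.
# A large, persistent gap between the two curves hints at overfitting;
# two low, converged curves hint at underfitting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

train_sizes, train_scores, valid_scores = learning_curve(
    SVC(gamma=0.001), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
print("train scores:", train_scores.mean(axis=1))
print("valid scores:", valid_scores.mean(axis=1))
```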

Compare Classifier Predictions using Reliability Diagrams

Reliability diagrams are useful for checking whether the predicted probabilities of a binary classifier are well calibrated. For perfectly calibrated predictions, the curve in a reliability diagram should be as close as possible to the diagonal/identity. This post compares the reliability diagrams of different classifiers on artificial data and discusses their respective pros and cons.
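A minimal sketch of plotting such a reliability diagram, assuming scikit-learn's calibration_curve helper and matplotlib; the data and the logistic regression classifier are placeholders, not the artificial data used in the post:

```python
# Sketch: reliability diagram for a single binary classifier.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

prob_pos = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

# Bin the predicted probabilities and compare them with the observed fraction
# of positives per bin; a well-calibrated classifier stays close to the diagonal.
frac_positives, mean_predicted = calibration_curve(y_test, prob_pos, n_bins=10)

plt.plot([0, 1], [0, 1], "k--", label="perfectly calibrated")
plt.plot(mean_predicted, frac_positives, "s-", label="LogisticRegression")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
```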