Anomaly Detection

Predicting Rare Events

So far, each of the predictions you have made deals with events or labels that are relatively common. But do traditional predictive models handle rare events as well as they handle common ones? Not really, and anomaly detection techniques have evolved to fill that gap.

A variety of methods have been used to identify uncommon labels or events. A common technique in the past was stratified sampling, which makes the uncommon event relatively "more common" in the dataset. For example, only a very small fraction of credit card transactions are fraudulent (typically much less than one percent), yet those that are fraudulent typically represent major losses to a company. To use the algorithms you learned in prior chapters, you would therefore need to generate a stratified sample in which fraudulent transactions represent about 5 percent of the dataset. More recent techniques, which we will cover, use more sophisticated prediction methods that do not rely on common occurrences of the dependent variable value.
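
The stratified sampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the function name stratified_downsample, the "fraud" field, and the synthetic transactions are all hypothetical, and the sketch takes one possible approach, keeping every rare row and drawing a random subset of the common rows until the rare rows make up roughly the target share.

```python
import random


def stratified_downsample(rows, is_rare, target_rate=0.05, seed=0):
    """Keep every rare row and sample common rows so that rare rows
    make up approximately target_rate of the result."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")

    rare = [r for r in rows if is_rare(r)]
    common = [r for r in rows if not is_rare(r)]

    # Solve len(rare) / (len(rare) + n_common) == target_rate for n_common.
    n_common = round(len(rare) * (1 - target_rate) / target_rate)
    # Never ask for more common rows than the dataset contains.
    n_common = min(n_common, len(common))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rare + rng.sample(common, n_common)
    rng.shuffle(sample)
    return sample


# Hypothetical data: 10,000 transactions, of which 0.5 percent are fraudulent.
transactions = [{"id": i, "fraud": i < 50} for i in range(10_000)]

balanced = stratified_downsample(transactions, lambda t: t["fraud"])
fraud_rate = sum(t["fraud"] for t in balanced) / len(balanced)
print(len(balanced), round(fraud_rate, 3))
```

Note that a sample built this way discards most of the common rows and no longer reflects the true fraud rate, so any probabilities a model learns from it would need to be interpreted with that shift in mind.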