What does Isolation Forest tell us about outliers?

Liger
2 min read · Jul 6, 2021



Ever been to a forest? One thing that has always intrigued me is what happens when someone gets lost in a dense forest. How do they find their way out? The deeper you are in the forest, the harder it is. What makes it difficult for an ordinary person with little to no training is that all the trees look alike, and nothing distinctive within their short field of view can help them find the way out.

Now, can we use the same logic to decide whether a given area, spot, or tree belongs to the forest? Say a person is in the middle of the forest in the first case and somewhere on the outskirts in the second. By the logic above, they would find the way out far more easily in the second case (mostly thanks to a signboard, a road, or a hut), so that spot is less likely to belong to the forest than a spot in the middle.

Where am I heading with this example? I am drawing an analogy to how the Isolation Forest algorithm finds outliers. If you think about it, an outlier is like someone on the outskirts of a forest: it is easier to tell apart from everything else. One more point worth making is that outliers are also few in number.

Thus, Isolation Forest finds the outliers in the data by measuring how easy it is to single out a record from the whole dataset. The easier a record is to single out, the more likely it is to be an outlier. An advantage of Isolation Forest is that you need not compute actual distances between records; you only need a sense of how easy or difficult each one is to isolate.
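To make "how easy it is to single out a record" a bit more concrete, here is a minimal sketch in plain Python. It is a simplified, one-dimensional illustration and not the full algorithm: the function names `isolation_depth` and `average_depth` are made up for this post, and the toy data (200 values around zero plus a single value at 8.0) is an assumption chosen only for the demo. The idea is to cut the data at random points again and again, keep only the side that contains our record, and count how many cuts it takes until the record stands alone.

```python
import random


def isolation_depth(point, data, rng, max_depth=50):
    """Count the random cuts needed until `point` is alone in its partition."""
    depth = 0
    current = list(data)
    while len(current) > 1 and depth < max_depth:
        lo, hi = min(current), max(current)
        if lo == hi:
            # Only identical values are left; no further cut is possible.
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the cut that still contains the point.
        if point < split:
            current = [x for x in current if x < split]
        else:
            current = [x for x in current if x >= split]
        depth += 1
    return depth


def average_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth over many independent random runs."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Toy data (an assumption for this demo): a dense "forest" plus one far-away record.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)] + [8.0]

outlier = 8.0
inlier = sorted(data)[len(data) // 2]  # the median sits deep inside the forest

outlier_depth = average_depth(outlier, data)
inlier_depth = average_depth(inlier, data)

print("average cuts to isolate the outlier:", outlier_depth)
print("average cuts to isolate the inlier: ", inlier_depth)
```

The record on the outskirts should need noticeably fewer cuts on average than the one in the middle, which is exactly the forest intuition above. The real Isolation Forest builds on the same idea but also picks a random feature for each cut, works on subsamples of the data, and turns the average path length into a normalised anomaly score.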


Liger

ML Engineer in the making. I have been part of the data domain for the past 6 years.