Wednesday, September 2, 2009

What's the relationship between Machine Learning and Data-Mining?

Machine Learning and Data-Mining are closely related, but the connection isn't clear to most people. I'll try to clarify the link in this short blog post.

Let's start with definitions:
  • Data-Mining (DM) is the process of extracting patterns from data. The main goal is to understand relationships, validate models, or identify unexpected relationships.
  • Machine Learning (ML) algorithms allow computers to learn from data. The learning process consists of extracting patterns, but the end goal is to use that knowledge to make predictions on new data.
In both ML and DM, we start by extracting patterns. In DM, the process ends there, by looking at the patterns. In ML, we reuse the learned patterns to make predictions.
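To make the shared first step concrete, here is a minimal sketch in plain Python. The toy data, the function names `extract_pattern` and `predict`, and the choice of "mean per class" as the pattern are all my own assumptions for illustration; real DM and ML tools extract much richer patterns.

```python
from statistics import mean

# Made-up toy data: (hours_studied, exam_result)
data = [
    (1.0, "fail"), (2.0, "fail"), (3.0, "fail"),
    (6.0, "pass"), (7.0, "pass"), (8.0, "pass"),
]

def extract_pattern(rows):
    """Shared step: summarize each class by the mean of its feature."""
    by_label = {}
    for x, label in rows:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(pattern, x):
    """ML-only step: label a new point with the class whose mean is closest."""
    return min(pattern, key=lambda label: abs(pattern[label] - x))

pattern = extract_pattern(data)

# DM stops here: a person reads the pattern and draws conclusions.
print(pattern)

# ML goes one step further and reuses the pattern on data it has never seen.
print(predict(pattern, 5.5))
```

The same `pattern` object serves both purposes; only what happens after extraction differs.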

One important difference in pattern extraction is that machine learning algorithms don't need a human-readable representation of the patterns, but data miners do. For example, it is hard to understand exactly what a neural network has learned, whereas decision trees are easy to understand and compare. On the other hand, comprehensible patterns allow machine learning practitioners to identify data problems and, by fixing them, improve the prediction accuracy of their models.
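As a rough illustration of what a readable pattern looks like, here is a sketch of the smallest possible decision tree: a single split, often called a decision stump. The data and the names `fit_stump` and `describe` are hypothetical, and this brute-force search is not how production tree learners are implemented.

```python
from itertools import permutations

# Made-up toy data: (hours_studied, exam_result)
data = [
    (1.0, "fail"), (2.0, "fail"), (3.0, "fail"),
    (6.0, "pass"), (7.0, "pass"), (8.0, "pass"),
]

def fit_stump(rows):
    """Try every midpoint between neighbouring values and keep the split
    that misclassifies the fewest training rows."""
    xs = sorted({x for x, _ in rows})
    labels = sorted({label for _, label in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        for below, above in permutations(labels, 2):
            errors = sum(
                (below if x <= threshold else above) != label
                for x, label in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best

def describe(stump):
    """Render the learned pattern as a rule a person can read."""
    errors, threshold, below, above = stump
    return f"if x <= {threshold}: {below}, else: {above} ({errors} training errors)"

stump = fit_stump(data)
print(describe(stump))
```

The whole model fits in one sentence, so a data miner can check it against domain knowledge, and a non-zero error count points directly at the rows worth inspecting. The weights of a neural network offer no such direct reading.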

So basically, the patterns mined by any machine learning algorithm are used to make predictions on new data.

Some people might simply say that they are the same and that the only difference is how you use the learned patterns: to understand or to predict.

Note:
Unsupervised learning can be considered data-mining because it doesn't involve prediction. To understand the differences between the discovered clusters, we can simply apply supervised learning to the dataset tagged with those clusters.
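Here is a small sketch of that idea in plain Python: cluster first, tag each row with its cluster, then fit a one-split classifier on the tags to see which feature separates the clusters. The customer data, the feature names, and the helpers `kmeans` and `best_split` are assumptions made up for this example, and the tiny k-means below is only meant to show the flow.

```python
from itertools import permutations
from statistics import mean

# Made-up customers: (age, monthly_spend)
points = [(22, 30.0), (25, 35.0), (24, 32.0), (51, 34.0), (55, 31.0), (53, 36.0)]
names = ("age", "monthly_spend")

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, iterations=10):
    """Unsupervised step: return a cluster tag for every point."""
    tags = []
    for _ in range(iterations):
        tags = [min(range(len(centers)), key=lambda k: dist2(p, centers[k]))
                for p in points]
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, t in zip(points, tags) if t == k]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(mean(col) for col in zip(*members)) if members else old
            )
        centers = new_centers
    return tags

def best_split(points, tags, names):
    """Supervised step: find the single feature threshold that best predicts the tags."""
    best = None
    for j, name in enumerate(names):
        values = sorted({p[j] for p in points})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for below, above in permutations(sorted(set(tags)), 2):
                errors = sum(
                    (below if p[j] <= threshold else above) != t
                    for p, t in zip(points, tags)
                )
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, below, above)
    return best

# Start from the first and last points as initial centers (an arbitrary choice).
tags = kmeans(points, centers=[points[0], points[-1]])
errors, name, threshold, below, above = best_split(points, tags, names)
print(f"cluster {below} if {name} <= {threshold}, else cluster {above}")
```

On this toy data the split lands on `age`, which tells us the two clusters differ by age rather than by spending.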