DataKRK #26: Online incremental learning on streams
This time we have a special guest, Christophe Salperwyck from ABB Kraków, who will describe the stream mining approach:
Statistical learning provides numerous algorithms for building predictive models from past observations, and these techniques have proved their ability to handle large-scale, real-world problems. However, new domains generate ever more data that can be seen only once and must be processed sequentially. These volatile data, known as data streams, come from telecommunication network management, social networks, ad servers, web mining and other sources. The challenge is to build new algorithms able to learn under these constraints.
First, the data stream context and its constraints will be presented. The rest of the presentation is divided into three parts:
- “Concept drift”: how to deal with distribution changes in streams
- Stream summaries: how to keep a summary of the past data distribution with a low CPU/memory footprint
- Online classifiers: how to build classifiers on data streams; naive Bayes and decision tree classifiers will be presented
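As a rough illustration of why naive Bayes suits streams, here is a minimal Python sketch of a categorical naive Bayes classifier that learns from one example at a time. It keeps only counts (a compact summary of the past data) and never stores the examples themselves. The class name `OnlineNaiveBayes`, the methods `learn_one`/`predict_one`, the Laplace smoothing parameter `alpha` and the toy weather data are all hypothetical choices made for this sketch, not the speaker's implementation.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Hypothetical sketch: categorical naive Bayes updated one example at a time.

    Only counts are stored, so memory depends on the number of distinct
    (feature, value, class) combinations, not on the number of examples seen.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant (assumed)
        self.n = 0                                # total examples seen so far
        self.class_counts = defaultdict(int)      # N(c)
        self.value_counts = defaultdict(int)      # N(feature index, value, class)
        self.feature_values = defaultdict(set)    # distinct values seen per feature

    def learn_one(self, x, y):
        """Update the counts with a single example x (tuple of values) and label y."""
        self.n += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.value_counts[(i, v, y)] += 1
            self.feature_values[i].add(v)

    def predict_one(self, x):
        """Return the class maximizing log P(c) + sum_i log P(x_i | c), or None if untrained."""
        best_class, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)
            for i, v in enumerate(x):
                k = len(self.feature_values.get(i, ()))
                # .get avoids inserting zero-count keys while predicting
                count = self.value_counts.get((i, v, c), 0)
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy stream: each example is seen once, used to update counts, then discarded.
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]
model = OnlineNaiveBayes()
for x, y in stream:
    model.learn_one(x, y)

print(model.predict_one(("rainy", "mild")))  # prints "yes"
```

Each update touches only a few counters, so the cost per example does not grow with the length of the stream. Numeric attributes would need a different per-class summary (for example histograms or Gaussian mean/variance estimates), which this sketch does not cover.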
Although supervised learning and classification will be briefly introduced, some basic prior knowledge of them is preferable. The most complicated formula will be the naive Bayes one, so no worries about the math :-).
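For reference, this is the standard textbook form of the naive Bayes decision rule; the notation may differ from the one used in the talk. It assumes that the $d$ attributes $x_1, \dots, x_d$ are conditionally independent given the class $c$, and in a stream setting $P(c)$ and $P(x_i \mid c)$ are typically estimated from running counts:

```latex
\hat{y} = \arg\max_{c} P(c \mid x_1, \dots, x_d)
        = \arg\max_{c} \frac{P(c) \prod_{i=1}^{d} P(x_i \mid c)}{P(x_1, \dots, x_d)}
        = \arg\max_{c} P(c) \prod_{i=1}^{d} P(x_i \mid c)
```

The denominator can be dropped because it does not depend on $c$.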