Large scale weak supervision & Scalable recommendations in a hybrid environment
- 18:00 Networking
- 18:30 Large Scale Weak Supervision with Snorkel and Apache Beam by Suneel Marthi
- 19:30 Break
- 19:45 Scalable recommendations in a hybrid environment by Mikolaj Kromka
1. Large Scale Weak Supervision with Snorkel and Apache Beam
The advent of Deep Learning models has led to massive growth in real-world machine learning. These models rely on massive hand-labeled training datasets, which are a bottleneck in developing and modifying machine learning models.
Most large-scale Machine Learning systems today, like Google's DryBell, use some form of Weak Supervision to construct lower-quality, large-scale training datasets that can be used to continuously retrain and deploy models in a real-world scenario.
The challenge with continuous retraining is that one needs to maintain prior state (e.g., the labeling functions in the case of Weak Supervision, or a pre-trained model such as BERT or Word2Vec for Transfer Learning) that is shared across multiple streams while continuously updating the model. Apache Beam's stateful stream processing capabilities are a perfect match here, providing the foundation for scalable Weak Supervision.
Attendees will come away with a better understanding of how Weak Supervision combined with Apache Beam's stateful stream processing can accelerate the labeling of training data and enable real-time training and updating of machine learning models.
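To make the core idea concrete, here is a minimal, library-free sketch of weak supervision: several noisy labeling functions vote on each example, and an aggregate of their votes produces a training label. This is illustrative only; Snorkel's actual API uses `@labeling_function` decorators and a learned `LabelModel` rather than a plain majority vote, and the functions below are invented examples.

```python
# Sentiment-style weak labeling with hand-written heuristics (hypothetical
# labeling functions, not Snorkel's API).
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_contains_great(text):
    # Heuristic: "great" suggests a positive example.
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text):
    # Heuristic: "terrible" suggests a negative example.
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN

def lf_exclamation(text):
    # Heuristic: enthusiasm often ends with "!".
    return POSITIVE if text.endswith("!") else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_great, lf_contains_terrible, lf_exclamation]

def weak_label(text):
    """Aggregate labeling-function votes by majority, ignoring abstentions."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

labels = [weak_label(t) for t in ["A great movie!", "Terrible plot", "Meh"]]
# labels -> [1, 0, -1]
```

In a streaming setting, the set of labeling functions is exactly the kind of shared prior state the abstract describes: a stateful Beam `DoFn` could hold it while unlabeled examples flow through, emitting weak labels for continuous retraining downstream.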
Suneel is a member of the Apache Software Foundation and a Committer and PMC member on Apache Mahout, Apache OpenNLP, and Apache Streams. He presently works as a Principal Technologist – AI/ML at Amazon Web Services. He has previously presented at Flink Forward, Hadoop Summit Europe, Berlin Buzzwords, the Machine Learning Conference, and Apache Big Data. He is based out of Dulles, Virginia, in the Washington, DC metro area.
2. Scalable recommendations in a hybrid environment
How do you develop Machine Learning projects on Big Data? Is it possible to scale Python code, and if so, at what cost? How do you set up cooperation between engineers and data scientists? Answers to these and many other questions will be provided in this presentation, based on experience building an analytical platform for one of the biggest retailers in the world, Tesco. The main theme will be a personalised product recommendation system serving millions of British customers in real time.
Mikołaj Kromka is a Software Development Manager at VirtusLab, currently involved in Machine Learning on Big Data projects, where he helps parallelize and run code in a hybrid cluster-cloud environment. He is a Spark and PySpark workshops trainer. Privately, he is a CS PhD student at AGH in Kraków, a climber, and an explorer of Kraków's museums.