DataKRK #43 | DuckDB, Polars and k8s for Big Data and ML workloads

Join us for the 43rd DataKRK: Tuesday, 25 Jun, 6PM @ HEVRE, Krakow
Discover the future of data analytics with presentations by Przemek Maciołek on DuckDB and Polars and by Karol Gongola on Kubernetes for batch processing. As a special treat, we'll tune into the Polish national team's match right after the presentations to catch the exciting finish together!
Agenda:
DuckDB and Polars - my laptop is faster than your data warehouse by Przemek Maciołek
It's a great time to work with data. We not only have some big, mature products available but also some exciting new projects that allow us to process large volumes of data very efficiently. Among them, DuckDB and Polars stand out. In his talk, Przemek will describe what these are, how they compare, and whether they can replace your Pandas, Snowflake or Databricks setup. A demonstration of running heavy analytics entirely in the web browser will follow.
Kubernetes Journey: From Microservices to Batch Processing (Big Data, ML) Platform by Karol Gongola
Originally designed for microservice architectures, Kubernetes has evolved into a robust platform that can also be used for batch processing, such as Big Data or ML workloads. This shift raises pivotal questions: Should Kubernetes be leveraged as a universal resource manager for various workloads? What considerations are essential for optimising batch processing on Kubernetes? This presentation will explore these questions, delving into crucial factors for running batch jobs efficiently and showcasing essential tools that facilitate this transition.
Speakers:
Przemek Maciołek | Founding Engineer at Motif Analytics
Przemek is a Founding Engineer at Motif Analytics (his 5th startup so far; the previous one was acquired by Sumo Logic). He has broad experience in architecting data platforms and believes in the value of selecting the right tool for the job.
Karol Gongola | Staff Data Engineer at VirtusLab
Karol is a seasoned data engineer at VirtusLab. He has extensive experience in using on-premises Spark clusters and orchestrating data workflows. His background includes spearheading a transition from Oozie to Airflow and migrating to Spark on Kubernetes. Passionate about infrastructure technologies, Karol continually experiments with systems like Kubernetes, Ceph, Kubeflow, and MLflow in his personal homelab, aiming to push the boundaries of what these tools can achieve in real-world scenarios.