#42 WDTT - Lightning Talks with GetInData
We are happy to invite you to the 42nd meetup of our group - Lightning Talks with GetInData and a guest from ING. The meeting will be held online via Zoom on May 25th at 18:00.
1. Big Data and AI engineering trends we work with - Krzysztof Zarzycki
2. Data Discovery with Apache Atlas and Amundsen - improving the productivity of users interacting with data.
We will talk about how Amundsen is being used to realise some data democratisation ideas and how user productivity can be improved by providing a central data discovery service.
Mariusz Górski - Senior Data Engineer, Open Source committer and Public Cloud enthusiast. Fan of knowledge sharing. Likes to experiment, break things and fix them - sometimes in a random order. Professionally, Tech Lead for the Data Assets team within ING WBAA, where he contributes to delivering solutions for data ingestion, analysis and discoverability. Amundsen (Linux Foundation) maintainer.
Dominik Choma - Data Engineer, fan of Open-Source solutions. Particularly interested in stream processing systems. Engaged in the provision and development of analytical solutions, mostly in telco and banking areas.
3. Open-source vs cloud-managed - data engineers' dilemmas in the cloud
During the presentation we will focus on a company's data migration process. We will compare two solutions for migrating on-premise MySQL data to the AWS cloud.
- open-source: MySQL -> Debezium -> Kafka -> Kafka Connect -> AWS RDS
- cloud-native: MySQL -> AWS DMS -> AWS RDS
We will try to answer what the advantages and disadvantages of both solutions are, and which one to use and when.
Marcin Kacperek - AWS Senior Big Data Engineer focused mainly on serverless data processing in the AWS cloud. Fan of easy solutions for complicated problems, working with data and DevOps principles. Old-school C64 and Amiga programmer and huge fan of computer games. Interested in salsa dancing and stock market trading... but not at the same time :)
4. Networks! project - real-time monitoring that controls 50% of the mobile network in Poland
The ability to analyse data in real time for mobile network monitoring is crucial for diagnostics and ensuring the quality of the service for end customers. During the talk we will show how we used Flink to build a stream analytics platform, this time on-premise.
Maciej Bryński - Big Data Architect, Apache Spark Instructor and Contributor. Big fan of other Big Data technologies including Apache Flink, Kafka, Cassandra and many more. In addition to designing Big Data systems and processes on a daily basis, Maciej also possesses hands-on expertise with tools needed to create those systems from scratch.
Rafał Małanij - a tech enthusiast who has always been amazed by data and the insights we can get out of it. He started his career in IT over 13 years ago as a data warehousing specialist, then moved to Project Management and Business Development. Currently he is involved in Data Analytics and Cloud projects.
5. Kedro, a data scientist’s Swiss Army knife
Every expert working in the Data Science area uses their own set of favorite libraries and methods, but we all share a common methodology - first you clean the data, then split it into training and test sets, and finally you fit the model and evaluate it. Kedro aims to structure the way ML code is written by providing an abstraction over pipeline construction and data access. During the short demo, I will show you how we adopted Kedro principles in several projects and extended its functionality by open-sourcing plugins that provide Kubeflow and Airflow support.
Mariusz Strzelecki - Machine-Learning-focused data engineer, wrangling small and huge datasets since 2015. Proud father of three :D
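For readers who want a feel for the idea before the Kedro talk, the clean / split / fit / evaluate methodology from the abstract can be sketched as a tiny pipeline of named steps reading from and writing to a shared data dictionary. This is only an illustration in plain Python: it is not Kedro's API, and the names `PIPELINE`, `run`, and the dataset keys are hypothetical, chosen for this sketch.

```python
import random
from statistics import mean


def clean(rows):
    """Drop rows that contain a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, test_ratio=0.25, seed=0):
    """Shuffle deterministically and split into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def fit(train):
    """Fit a least-squares line y = a * x + b to the training set."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def evaluate(model, test):
    """Return the mean absolute error of the model on the test set."""
    a, b = model
    return mean(abs(a * x + b - y) for x, y in test)


# Hypothetical pipeline description: (function, input names, output names).
PIPELINE = [
    (clean, ["raw"], ["cleaned"]),
    (split, ["cleaned"], ["train", "test"]),
    (fit, ["train"], ["model"]),
    (evaluate, ["model", "test"], ["error"]),
]


def run(pipeline, catalog):
    """Run each step in order, looking inputs up by name and storing outputs by name."""
    data = dict(catalog)
    for func, inputs, outputs in pipeline:
        result = func(*[data[name] for name in inputs])
        if len(outputs) == 1:
            result = (result,)
        data.update(zip(outputs, result))
    return data


# Toy data on the line y = 2x + 1, plus two incomplete rows to be cleaned away.
raw = [(x, 2 * x + 1) for x in range(20)] + [(None, 3), (5, None)]
results = run(PIPELINE, {"raw": raw})
print(results["model"], results["error"])
```

Separating the step functions from the description of how they connect is the point of the sketch: the same functions can be rewired or run by a different executor without being changed.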
6. Closing - Krzysztof Zarzycki