Dynamic Talks | Big Data meetup
Come to the Grid Dynamics office to hear Simon Calabrese present "From ∆ (delta) to λ (lambda), Using Snowflake" and Bartosz Marszalek present "V Is for Value: Building a Vector Search Engine with Minimum Resources".
Speakers: Simon Calabrese is a Big Data Tech Lead and SSL at Grid Dynamics.
The talk covers how a batch architecture built on Delta Lake was migrated to a Lambda architecture using Snowflake on Azure, in the food and beverage supply industry.
What we will discuss:
1. How the batch load was performed using:
- Talend for raw-data ingestion
- PySpark for transformations
- Azure Data Lake Storage and Delta Lake as the distributed file system
2. How PySpark was used to move data from Delta Lake to the data warehouse in Snowflake
Bartosz Marszalek is a Junior Big Data Developer at Grid Dynamics.
Big Data never starts with “Big”. This talk gives you an idea of how to get started with a vector search ML engine using a minimum of data: the data of the highest value.
What we will discuss:
1. 5 Vs — Value
2. Use case:
- Generations of search engines
- Vector Search
3. Data needed by vector search
- Where to get it from?
- What to do with it?
4. Data pipelines:
- Processing logic
- Modularity: how flexible can the data pipeline be?
- Some architecture details up to model training
Registration: https://bit.ly/3hOIePc