Mastering Polish LLM Development with SpeakLeash

17:00 – 17:15 Registration
17:15 – 18:00 Mastering Polish LLM Development with SpeakLeash: From Data Collection to Model Training
18:00 – 19:00 Networking
Join us for an engaging evening focused on the development of Polish Large Language Models (LLMs) with SpeakLeash, hosted at the Sabre Office in Krakow. This event is perfect for researchers, developers, and AI enthusiasts interested in the intricacies of creating impactful language models.
Abstract:
Building a Polish Large Language Model (LLM) demands a strategic approach from data collection to model training. Join us for a comprehensive overview of the essential steps involved in creating a Polish LLM. We will cover data sourcing, preprocessing techniques, model architecture selection, training methodologies, and evaluation metrics. Walking through each stage, we will share our experience and insights on building efficient data-gathering pipelines and on preparing the data so that it is suitable for training or fine-tuning. Whether for research, business, or societal applications, we will equip you with the knowledge needed to start developing impactful LLMs tailored to the Polish language and culture.
Speaker:
Krzysztof Wróbel, Master of Science in Computer Science, is an expert in Natural Language Processing (NLP) with extensive experience at companies such as SpeakLeash, NameHash Labs, Enelpol, and Cognitum. He has been a Research and Teaching Assistant at Jagiellonian University since 2013 and has contributed to numerous R&D projects funded by The National Centre for Research and Development.
Krzysztof has published widely on AI and NLP, presenting at conferences like FedCSIS and PolEval. He has developed popular open-source Polish language taggers (KRNNT, KFTT) and is the author of a top-performing language model in the KLEJ benchmark and the Open PL LLM Leaderboard on Hugging Face.
About SpeakLeash:
SpeakLeash /ˈspix.lɛʂ/, also known as Spichlerz, is a new initiative to create a Polish Large Language Model (LLM). Such models, based on the transformer architecture, have many applications and are used for generating and processing natural language.
Our goal is to build new datasets and catalog existing ones so that researchers can conduct state-of-the-art language modeling research. The datasets developed within the SpeakLeash framework come with manifests that describe their licensing and contain statistics, to ensure better alignment with ongoing research.
We look forward to seeing you at the Sabre Office in Krakow for an evening of learning, networking, and advancing your knowledge in Polish LLM development. Don’t miss this opportunity to connect with experts and like-minded professionals. See you there!