This event has already taken place.

[AI Alliance] Introducing GneissWeb: A State-Of-The-Art LLM Pre-training Dataset

Event: [AI Alliance] Introducing GneissWeb: A State-Of-The-Art LLM Pre-training Dataset
Event type: Meetup
Category: IT
Topics:
Date: 06.03.2025 (Thursday)
Time: 18:00
Language: English
Admission: Free
City:
Description:

Agenda

  • Quick intro about AI Alliance (5 mins)
  • GneissWeb presentation (40 mins)
  • Q&A (10 mins)
  • Wrapup


Session: Introducing GneissWeb - a state-of-the-art LLM pre-training dataset

At IBM, responsible AI implies transparency in training data. Introducing GneissWeb (pronounced “niceWeb”), a state-of-the-art LLM pre-training dataset of ~10 trillion tokens derived from FineWeb, with open recipes, results, and tools for reproduction!


In this session we will go over how we created GneissWeb and discuss the tools and techniques used. We will provide code examples that you can try at your leisure.

  • More than 2% average improvement in benchmark performance over FineWeb
  • Hugging Face page
  • Data Prep Kit detailed recipe
  • Data Prep Kit Bloom filter for quick reproduction
  • Recipe models for reproduction
  • Announcement
  • Paper
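
The Bloom filter item in the list above points at a general idea: a compact membership filter can select a subset of a larger source dataset without re-running the full recipe. What follows is only a minimal sketch of that idea in Python. It assumes the filter is built over document IDs; the `BloomFilter` class, the `keep_listed` helper, the `id` field, the sizes, and the sample documents are all hypothetical and do not reflect the Data Prep Kit API or the format of the released filter.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: a bit array probed at k hashed positions per item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def keep_listed(documents, id_filter):
    """Yield only the documents whose 'id' is (probably) in the filter."""
    for doc in documents:
        if doc["id"] in id_filter:
            yield doc


# Hypothetical IDs of documents that belong to the curated subset.
id_filter = BloomFilter(num_bits=8192, num_hashes=5)
for doc_id in ["doc-001", "doc-003"]:
    id_filter.add(doc_id)

# Hypothetical source documents standing in for the larger dataset.
source = [
    {"id": "doc-001", "text": "first"},
    {"id": "doc-002", "text": "second"},
    {"id": "doc-003", "text": "third"},
]
subset = list(keep_listed(source, id_filter))
print([doc["id"] for doc in subset])
```

A Bloom filter never reports a stored ID as missing, but it can occasionally accept an ID that was never added, so a subset recovered this way may contain a small number of extra documents depending on how the filter is sized.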


Session Type

Presentation


Audience

LLM app developers, data scientists, data engineers


Technical Level

Beginner – Intermediate


Prerequisites

None


Speaker: Shahrokh Daijavad, Research Scientist @ IBM Almaden Research Center

Shahrokh Daijavad, a distinguished Research Scientist in the Watsonx Data Engineering group at IBM Almaden Research Center, has a rich background in Edge Computing and Data Engineering. He earned his B.Eng. and Ph.D. in electrical engineering from McMaster University and spent years at IBM T. J. Watson Research Center. His recent research focuses on AI@Edge and Data Engineering for IBM Watsonx AI offerings.

