This event has already taken place.

[AI Alliance] Workshop: Preparing High Quality Datasets with Data Prep Kit

Event:
[AI Alliance] Workshop: Preparing High Quality Datasets with Data Prep Kit
Event type:
Meetup
Category:
IT
Topic:
Date:
27.03.2025 (Thursday)
Time:
17:00
Language:
English
Price:
Free
City:
Speakers:
Description:

Overview

When building machine learning and data applications, a significant portion of your time goes to data wrangling - from extracting content to filtering out problematic and low-quality data. In this hands-on session we will explore Data Prep Kit, an open source toolkit designed to streamline these essential tasks. Attendees will learn first-hand how to use Data Prep Kit to improve overall data quality by removing spam and low-quality documents, HAP (Hate Abuse Profanity) speech, and PII (Personally Identifiable Information), resulting in a higher-quality dataset.


Description

Join us for an interactive, hands-on session where you will learn to clean up data and prepare high-quality datasets.

In this workshop we will do the following:

  • Extract content from various documents (PDFs, HTML)
  • Clean up and remove markup
  • Detect and remove spam content
  • Score and remove low-quality documents
  • Identify and remove PII data
  • Detect and remove HAP (Hate Abuse Profanity) speech from documents
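
To give a feel for a few of the steps listed above, here is a minimal sketch in plain Python. It does not use Data Prep Kit and does not reflect its API; every function name here (`strip_markup`, `redact_emails`, `quality_score`, `prepare`) is hypothetical, and the heuristics are deliberately simplistic stand-ins: markup removal uses the standard library HTML parser, PII handling covers only email addresses, and the quality score is just the share of alphabetic words. Spam and HAP detection are not covered.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_markup(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumption: a simple email pattern stands in for real PII detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("<EMAIL>", text)


def quality_score(text):
    """Toy heuristic: fraction of words that are purely alphabetic."""
    words = text.split()
    if not words:
        return 0.0
    alpha = sum(1 for w in words if w.strip(".,;:!?").isalpha())
    return alpha / len(words)


def prepare(docs, min_score=0.6):
    """Strip markup, redact emails, and drop documents scoring below min_score."""
    cleaned = []
    for html in docs:
        text = redact_emails(strip_markup(html))
        if quality_score(text) >= min_score:
            cleaned.append(text)
    return cleaned


docs = [
    "<p>Contact <b>Ann</b> at ann@example.com for details.</p>",
    "<div>$$$ 777 !!! 1234 ###</div>",
]
print(prepare(docs))
```

The `min_score` threshold of 0.6 is an arbitrary illustrative value. A real pipeline would tune it against sample data and rely on dedicated detectors for each step.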


What do you need to participate in this workshop?

  • Comfort with the Python programming language
  • We will run the workshop code using Google Colab (free) - no other setup is needed!


Session Type

Workshop (hands-on)


Audience

LLM app developers, data scientists, data engineers


Technical Level

Beginner - Intermediate


Prerequisites

  • Comfort with the Python programming language
  • We will run the workshop using Google Colab (free to use) - no other setup is needed!


Duration

60 mins


Industry

Cross industry


About the AI Alliance

The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
