Clean up your data screening process with reporteR

Event:

Event type:

Meeting

Category:

IT

Topic:

Big Data

Date:

3 December 2020 (Thursday)

Time:

19:00

Language:

English

Admission:

Free

City:

On-line

Venue:

Online Event

Address:

On your computer


Description:

# Stream URL

- https://youtu.be/djSbNBa2S_c

# Talk

- Clean up your data screening process with _reporteR_

# Event Sponsored by Jumping Rivers

https://www.jumpingrivers.com/

# Details

- webinars http://whyr.pl/webinars/

- donate http://whyr.pl/donate/

- join Why R? Slack http://whyr.pl/slack/

- join Meetup http://tiny.cc/WarsawRUG

- format: 45 min talk + 15 min for Q&A

- comments: ask in the YouTube live chat

# Speakers

- University of Copenhagen

### Claus Ekstrøm PhD

is a professor in biostatistics at the University of Copenhagen, Denmark. He is the creator of, and a contributor to, a number of R packages (**reporteR**, **MESS**, **MethComp**, **SuperRanker**) and is the author of "The R Primer" book. He has previously given R tutorials at useR 2016, eRum 2018, and the ASA's Conference on Statistical Practice 2018, and won the C. Oswald George prize from Teaching Statistics in 2014.

### Anne Helby Petersen

is a PhD student in biostatistics at the University of Copenhagen, Denmark. She is the primary author of several R packages, including **reporteR**. She has taught statistics and R in numerous courses at the University of Copenhagen with students coming from a wide range of backgrounds, including science, medicine and mathematics.

# Talk description

## Clean up your data screening process with **reporteR**

Data cleaning and data validation are the first steps in practically any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data.

Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. Consequently, it is essential to enable topic experts who are knowledgeable about the context and data collection procedure to partake in the data quality assessment since they will be better at identifying potential problems in the data. However, they may not have the technical skills to work with the data themselves.

The reporteR package (formerly known as dataMaid) makes it easy to produce a document that serves two purposes: less R-savvy collaborators can read, understand and use it to decide “do these data look right?”, and it records which potential errors were considered. Both help ensure that subsequent data science is reproducible and that the data are documented at all stages of the quality assessment process.

The package includes both very user-friendly one-liner commands that auto-generate data overview reports and a highly customizable suite of data validation and documentation tools that can be moulded to fit most data validation needs. And, perhaps most importantly, it was specifically built to make sure that documentation and validation go hand in hand, so we can clean up any unstructured, messy data cleaning process.
