
Event Date: 10 Dec 2019

Berlin Women in Machine Learning & Data Science @ DH

Join us for our first Women in Machine Learning & Data Science Meetup at Delivery Hero. The topic of the night will be “Reproducibility in theory and in praxis”.

WiMLDS meetups aim to inspire and educate everyone, regardless of gender, and to support women and gender minorities in the field.

All genders may attend this meetup.


18:30 Arrival and networking
19:00 Introduction to WiMLDS activities
19:10 A talk by Katharina Rasch
19:40 Q&A
19:50 Open mic for women+ to share experiences and initiatives
20:00 Networking

Katharina Rasch is a computer scientist with a PhD from KTH Stockholm. From 2014 to 2017 she was a data scientist / computer vision researcher at Zalando. Now she is a freelance data scientist in Berlin. At the moment, Katharina is obsessed with professionalising AI development. Less chaos, please!

“As a data scientist I often feel envious of the tooling available to software engineers. Tools for build automation, continuous integration, code review, etc. help software engineers follow established best practices. In contrast, many of us data scientists have taken to building our own tools for managing experiments, tracking data, and enabling reproducibility. Of course, writing such tools is hard and takes a lot of effort.

The good news is: more and more software supporting data science best practices is becoming available to us. These tools range from stand-alone packages such as DVC and Polyaxon to Software-as-a-Service solutions such as FloydHub and Valohai. The bad news is: there really are a lot of these tools around, and it is hard to know which one to go with.

In this talk I want to show you how readily available tools can help you follow best practices in data science. I will focus on the model development phase of a data science project; I will not be talking about tooling for model deployment. I will start with an overview of available tools and then do a deep-dive comparison of 2-3 tools, showing how they support you with things like

– Versioning data
– Tracking which data / code / library versions / parameters are used in which experiment
– Easily comparing / visualising experiment results
– Enabling everybody in your team / future you to replicate experiments

I will also compare them on non-technical dimensions such as

– Ease of use / collaboration
– Price (especially for SaaS solutions)
– Vendor lock-in

After this talk you should have a good idea of which tools are already available and what you can and should look for when deciding whether a tool is right for your project.”