Workshop Machine Learning for Research 2020

For one full week we worked with 6 research teams from different disciplines to explore if and how machine learning could help them answer their research question(s). Here is a brief summary of what we did, how it went, and what we learned.

Published in Netherlands eScience Center

7 min read · Feb 10, 2020

The following blog post is split into two parts:
1) What was the aim and what did we work on?
2) Workshop evaluation… did it work?

Most of the mentors and participants of the workshop Machine Learning for Research 2020. In total there were 27 participants (six teams) and 13 mentors.

1) What was the aim and what did we work on?

Aim and expectation management

While machine learning and AI have become ubiquitous terms in many industries (and even seem to be moving past their peak), applications of machine learning remain the exception in numerous research fields. To some extent, this can be explained by an apparent mismatch of goals (e.g. focus on accuracy vs. focus on causal understanding). But to a large extent, the reason might simply be that researchers lack the training and experience to apply machine learning or to judge what it can and cannot do.

At the eScience Center we work on numerous scientific projects where we help implement machine-learning (ML) techniques. One of our main contributions here is typically that we (help to) find suitable machine-learning solutions that fit the data and research question. An important aspect of this is expectation management (see slides of my talk on this here). We have encountered cases where people were unnecessarily skeptical about ML at first, but where ML later turned out to be highly advantageous compared to “classical” approaches. But insufficient experience with machine learning also often leads to the other extreme: expecting ML (mostly deep learning, though) to magically transform data into brilliant results. Giving researchers a good understanding of the possibilities is therefore key to identifying exciting, realistic opportunities to apply machine learning in research.

Expectation management is essential when people are about to gain their first hands-on experience with machine learning.

The key aims of this workshop were:

To give researchers a deeper understanding of machine learning – including a better understanding of what ML can and cannot do.
To explore whether each team’s research question(s) can be tackled with the use of machine learning approaches on their data.
To give researchers a clear(er) idea of the strategies that are the most promising to continue with.
And finally, to have a fun week filled with plenty of interesting new experiences.

The format of the workshop

Most of the workshop week was reserved for actual hands-on work. Each team would work together with about 2 mentors from the eScience Center or SURF on their own data and research question(s).

This was complemented by a number of presentations on different aspects of machine learning. We had a general introduction to machine learning by Maxwell Cai (SURF, see his slides here) and a shorter lecture on the particular aspects of machine learning for research by me (see slides of my talk on this here). On Tuesday and Wednesday, invited speakers came to broaden the view on machine learning and to share their thoughts with us in Q&A sessions. You can find the workshop program here.

The teams and projects

We had received applications from 25 teams (101 researchers) from a large range of disciplines and research institutions across the Netherlands. We were really impressed by the quality of nearly all of the proposals, which also made it difficult to make a final selection. In the end, the following six teams were selected to participate in the workshop:

1
Humorously Evoking a Habermasian Public Sphere? Comparing the Engaging Potential and Dialogue Quality on the Social Network Pages.
Team from: University of Amsterdam (UvA).
Field: Communication Science.

This group, from the UvA’s department of Communication Science, was keen to further analyze online discussions to see whether online debate between citizens lives up to the democratic standards of a deliberative public sphere. The researchers brought data of user discussions on YouTube videos and aimed to classify the comments, i.e. to see when a debate is constructive and when it turns hostile. During the workshop, the team was able to apply and explore numerous machine learning and natural language processing (NLP) techniques, ranging from TF-IDF and classical ML models to current NLP workhorses such as fastText.
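For readers unfamiliar with TF-IDF, here is a minimal sketch in plain Python of how comments can be turned into weighted word vectors before they are passed to a classifier. This is not the team's code: the example comments and the function name `tfidf_vectors` are made up for illustration, and the weighting used (term frequency times log(N / document frequency)) is only one of several common variants.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


# Made-up example comments (hypothetical, not workshop data).
comments = [
    "great video thanks",
    "great point but i disagree",
    "thanks for the video",
]
vectors = tfidf_vectors(comments)
```

Words that occur in many comments (such as "great" here) receive a lower weight than words that are specific to a single comment (such as "disagree"), which is what makes the resulting vectors useful as classifier input.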

2
DIALECT, Diabetes and Lifestyle Cohort Twente
Team from: University of Twente and ZGT Almelo.
Field: Medicine

The team from the University of Twente and ZGT Almelo came to the workshop with a number of different data sets. The main work was done on data from patients with type 2 diabetes, containing details on food intake. During the workshop, the team managed to experiment with many key machine learning techniques for such types of data.

3
Detection/Prediction of Freezing of Gait in Parkinson’s Disease
Team from: University of Twente, Orikami, Radboud University
Field: Medicine

This team came with motion sensor data of patients with Parkinson’s disease. The data were recorded in a lab experiment during which patients experienced “freezing of gait”: a sudden, brief episode of ineffective stepping, also described as feeling “as if the feet are glued to the floor”. The aim was to classify epochs of motion sensor data as freezing of gait or not, in order to improve freezing of gait detection (and prediction) algorithms. The complex nature of the data gave the team the chance to build deep learning networks and work with techniques to handle unbalanced data sets, such as up-sampling.
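As an illustration of up-sampling, here is a minimal sketch using only the Python standard library. It is not the team's implementation: the helper `upsample`, the labels "normal" and "freeze", and the toy data are assumptions made for this example. The idea is to randomly duplicate samples of the rarer class until all classes are equally frequent.

```python
import random
from collections import Counter


def upsample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra samples with replacement to fill the gap to the largest class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy data: 8 "normal" windows and 2 "freeze" windows.
windows = list(range(10))
labels = ["normal"] * 8 + ["freeze"] * 2
balanced_windows, balanced_labels = upsample(windows, labels)
print(Counter(balanced_labels))
```

In practice, up-sampling would be applied to the training split only, so that duplicated samples do not leak into the validation or test data.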

4
Identification of Source Types from Measured Vibration Signals
Team from: TNO
Field: Earth Sciences, Physics

Presentation by Davide Moretti from TNO on how they had assembled a really nice test data set of vibration signals (left), and presentation by Sanne van den Boom from TNO on the impressive results they got from their deep learning classifier at the end of the workshop.

The team from TNO came with a really interesting data set of vibration signals. They had collected vibration signals from 20 different sources, including earthquakes, traffic, and drilling, but also a bouncing ball, a running washing machine, and jumping. Using a deep learning model, they were able to beat classical ML tools (random forests etc.) and in the end managed to classify the source types very well! From there the team started to explore Bayesian approaches as possible next steps.

5
Early prediction of psychiatric problems in developing twins
Team from: Leiden University
Field: Medicine

Presentation by Anna van Duijvenvoorde from Leiden University.

This team of researchers from Leiden University used the week to explore whether the features they had collected, from MRI scans among other sources, could be used to make predictions on psychiatric problems. During the workshop, the team managed to apply a wide range of different machine learning techniques, including unsupervised clustering approaches that revealed interesting correlations and aspects within their patient data.

6
Segmentation and Tracking of Single Cells from Live Cell Microscopy Images.
Team from: TU Delft, AMOLF, Wageningen University
Field: Life Sciences

This team was hoping to automate a very labor-intensive process which is an essential part of their research on living cells. The cells in question are E. coli bacteria, which grow and divide within microfluidic chambers. To follow individual bacteria over longer periods of time, it is necessary to properly segment the light microscopy images. The group brought a nice dataset of time series of bacteria with a large number of hand-corrected masks, which we then used as training labels. Using convolutional deep neural networks together with proper data augmentation techniques gave us very good results by the end of the workshop.

First results based on a CNN used for segmenting bacteria from phase-contrast microscopy images.
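For segmentation tasks, data augmentation has to transform each image and its mask in exactly the same way, otherwise the labels no longer match the pixels. The following is a minimal NumPy sketch of that idea using random flips and 90-degree rotations. The function name `augment_pair` and the toy data are assumptions for illustration; the augmentations actually used during the workshop are not described here.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    k = int(rng.integers(0, 4))  # number of 90-degree rotations (0 to 3)
    return np.rot90(image, k).copy(), np.rot90(mask, k).copy()


# Hypothetical toy data: a random 64x64 "image" with a thresholded mask.
rng = np.random.default_rng(42)
image = rng.random((64, 64))
mask = (image > 0.5).astype(np.uint8)
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Because both arrays go through identical transformations, the augmented mask still labels the same pixels in the augmented image.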

Thanks to:

All participants and teams! We really had a great week and it was wonderful to see so much enthusiasm, curiosity and interesting research questions.
All the mentors, who were really a driving force behind this workshop! Thanks to all of you for spending so much time mentoring, but also preparing the workshop in advance and reviewing the numerous applications.
Mentors were: Sonja Georgievska, Christiaan Meijer, Jaro Camphuijsen, Patrick Bos, Meiert Grootes, Cunliang Geng, Erik Tjong Kim Sang, Faruk Diblen, Dafne van Kuppevelt, Felipe, Bouwe Andela, Maxwell Cai, Florian Huber, Jisk Attema.
SURF, which provided key infrastructure for this workshop. A lot of the model training was done using Jupyter notebooks that were running on GPUs from LISA. Thanks also to Maxwell (SURF) for joining us as a mentor and giving a great first-day introduction to ML.
eScience Center and SURF staff for helping to set up and manage the workshop (special thanks to Sacha van Breugel, Mateusz Kuzak, Carlos Martinez, Tom Bakker, Frank Seinstra, Johan Rheeder, Kim-Anh Holthaus).
Peter Steinbach from HZDR, Germany, who gave me the idea to initiate this workshop. He generously shared his own experiences and helped to brainstorm. Here are details on their deep learning hackathon in 2019.

→ Go on to read part 2/2 on our workshop evaluation: how did it work?

Written by Florian Huber