Workshop Machine Learning for Research 2020
For one full week we worked with six research teams from different disciplines to explore if and how machine learning could help them answer their research question(s). Here is a brief summary of what we did, how it went, and what we learned.
The following blog post is split into two parts:
1) What was the aim and what did we work on?
2) Workshop evaluation… did it work?

1) What was the aim and what did we work on?
Aim and expectation management
While machine learning and AI have become ubiquitous terms in many industries (and even seem to be moving past their peak), applications of machine learning remain the exception in numerous research fields. To some extent, this can be explained by an apparent mismatch of goals (e.g. focus on accuracy vs. focus on causal understanding). But to a large extent, the reason might simply be that researchers lack the training and experience to apply machine learning or to judge what it can and cannot do.
At the eScience Center we work on numerous scientific projects in which we help implement machine learning (ML) techniques. One of our main contributions is typically that we (help to) find suitable machine learning solutions that fit the data and research question. An important aspect of this is expectation management (see slides of my talk on this here). We have encountered cases where people were unnecessarily skeptical about ML at first, but where ML later turned out to be highly advantageous compared to “classical” approaches. Insufficient experience with machine learning also often leads to the other extreme: expecting ML (mostly deep learning, though) to magically transform data into brilliant results. Giving researchers a good understanding of the possibilities is therefore key to identifying exciting, realistic opportunities to apply machine learning in research.

The key aims of this workshop were:
- To give researchers a deeper understanding of machine learning – including a better understanding of what ML can and cannot do.
- To explore whether each team’s research question(s) can be tackled with the use of machine learning approaches on their data.
- To give researchers a clear(er) idea of the strategies that are the most promising to continue with.
- And finally, to have a fun week filled with plenty of interesting new experiences.
The format of the workshop
Most of the workshop week was reserved for actual hands-on work. Each team worked together with about two mentors from the eScience Center or SURF on their own data and research question(s).
This was complemented by a number of presentations on different aspects of machine learning. We had a general introduction to machine learning by Maxwell Cai (SURF, see his slides here) and a shorter lecture on the particular aspects of machine learning for research by me (see slides of my talk on this here). On Tuesday and Wednesday, invited speakers broadened our view on machine learning and shared their thoughts with us in Q&A sessions. You can find the workshop program here.
The teams and projects
We had received applications from 25 teams (101 researchers) from a large range of disciplines and research institutions across the Netherlands. We were really impressed by the quality of nearly all of the proposals, which also made it difficult to make a final selection. In the end, the following six teams were selected to participate in the workshop:
1. Humorously Evoking a Habermasian Public Sphere? Comparing the Engaging Potential and Dialogue Quality on the Social Network Pages
Team from: University of Amsterdam (UvA).
Field: Communication Science.

This group, from the UvA’s department of Communication Science, was keen to further analyze online discussions to see whether online debate between citizens lives up to the democratic standards of a deliberative public sphere. The researchers brought data from user discussions on YouTube videos and were aiming to classify the comments, i.e. to see when a debate is constructive and when it turns hostile. During the workshop, the team was able to apply and explore numerous machine learning and natural language processing (NLP) techniques, ranging from TF-IDF to classical ML models to current NLP workhorses such as fastText.
2. DIALECT, Diabetes and Lifestyle Cohort Twente
Team from: University of Twente and ZGT Almelo.
Field: Medicine
The team from the University of Twente and ZGT Almelo came to the workshop with a number of different data sets. The main work was done on data from type 2 diabetes patients, which contained details on food intake. During the workshop, the team managed to experiment with many key machine learning techniques for this type of data.
3. Detection/Prediction of Freezing of Gait in Parkinson’s Disease
Team from: University of Twente, Orikami, Radboud University
Field: Medicine
This team came with motion sensor data of patients with Parkinson’s disease. The motion data were recorded during a lab experiment in which patients experienced “freezing of gait”: a sudden, brief episode of ineffective stepping, often described as feeling “as if the feet are glued to the floor”. The aim was to classify epochs of motion sensor data as freezing of gait or not, in order to improve freezing of gait detection (and prediction) algorithms. The complex nature of the data gave the team the chance to build deep learning networks and to work with techniques for handling unbalanced data sets, such as up-sampling.
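To illustrate what up-sampling means here, below is a minimal sketch in plain Python. It is not the team's actual code: the function name `upsample_minority`, the toy windows, and the label names are all made up for illustration. The idea is simply to duplicate randomly chosen samples of the rare class until every class is as large as the largest one.

```python
import random
from collections import Counter


def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical toy data: 8 "normal" windows and 2 "freeze" windows.
windows = [[0.1 * i, 0.2 * i] for i in range(10)]
labels = ["normal"] * 8 + ["freeze"] * 2

balanced_windows, balanced_labels = upsample_minority(windows, labels)
print(Counter(balanced_labels))
```

In practice, up-sampling like this should be applied to the training split only. Otherwise copies of the same sample can end up in both the training and the test data, which makes the evaluation look better than it is.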
4. Identification of Source Types from Measured Vibration Signals
Team from: TNO
Field: Earth Sciences, Physics


The team from TNO came with a really interesting data set of vibration signals. They had collected signals from 20 different sources, including earthquakes, traffic, and drilling, but also a bouncing ball, a running washing machine, and jumping. Using a deep learning model, they were able to beat classical ML tools (random forests etc.) and in the end managed to classify the source types very well! From there the team started to explore Bayesian approaches as possible next steps.
5. Early prediction of psychiatric problems in developing twins
Team from: Leiden University
Field: Medicine

This team of researchers from Leiden University used the week to explore whether the features they had collected, from MRI scans among other sources, could be used to predict psychiatric problems. During the workshop, the team managed to apply a wide range of different machine learning techniques, including unsupervised clustering approaches that revealed interesting correlations and aspects within their patient data.
6. Segmentation and Tracking of Single Cells from Live Cell Microscopy Images
Team from: TU Delft, AMOLF, Wageningen University
Field: Life Sciences
This team was hoping to automate a very labor-intensive process that is an essential part of their research on living cells. In their experiments, E. coli bacteria grow and divide within microfluidic chambers. To follow individual bacteria over longer periods of time, it is necessary to properly segment the light microscopy images. The group brought a nice dataset of time series of bacteria with a large number of hand-corrected masks, which we then used as training labels. Using convolutional deep neural networks together with proper data augmentation techniques gave us very good results by the end of the workshop.
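For readers who have not used data augmentation for segmentation before, here is a small sketch of the core idea in plain Python. It is an illustration under assumptions, not the pipeline used during the workshop: the helper names and the tiny toy image are invented. The important point is that every geometric transformation applied to an image must also be applied to its mask, so that the labels still line up with the pixels.

```python
import random


def flip_horizontal(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Mirror a 2D grid top to bottom."""
    return [row[:] for row in grid[::-1]]


def rotate_90(grid):
    """Rotate a 2D grid clockwise by 90 degrees."""
    return [list(row) for row in zip(*grid[::-1])]


def augment_pair(image, mask, rng):
    """Apply the same randomly chosen flips and rotation to image and mask."""
    for transform in (flip_horizontal, flip_vertical, rotate_90):
        if rng.random() < 0.5:
            # Both arrays go through the identical transform.
            image, mask = transform(image), transform(mask)
    return image, mask


# Hypothetical toy "microscopy image" and its binary mask (1 = cell pixel).
image = [
    [0, 0, 9, 0],
    [0, 8, 9, 0],
    [0, 0, 0, 0],
]
mask = [[1 if value > 0 else 0 for value in row] for row in image]

rng = random.Random(42)
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Real pipelines usually add further transformations (small shifts, intensity changes, elastic deformations) and use an array library rather than nested lists, but the pairing of image and mask stays the same.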

Thanks to:
- All participants and teams! We really had a great week and it was wonderful to see so much enthusiasm, curiosity and interesting research questions.
- All the mentors, who were really a driving force behind this workshop! Thanks to all of you for spending so much time mentoring, but also preparing the workshop in advance and reviewing the numerous applications.
Mentors were: Sonja Georgievska, Christiaan Meijer, Jaro Camphuijsen, Patrick Bos, Meiert Grootes, Cunliang Geng, Erik Tjong Kim Sang, Faruk Diblen, Dafne van Kuppevelt, Felipe, Bouwe Andela, Maxwell Cai, Florian Huber, Jisk Attema.
- SURF, which provided key infrastructure for this workshop. A lot of the model training was done using Jupyter notebooks that were running on GPUs from LISA. Also thanks to Maxwell (SURF) for joining us as a mentor and giving a great first-day introduction to ML.
- eScience Center and SURF staff for helping to set up and manage the workshop (special thanks to Sacha van Breugel, Mateusz Kuzak, Carlos Martinez, Tom Bakker, Frank Seinstra, Johan Rheeder, Kim-Anh Holthaus).
- Peter Steinbach from HZDR, Germany, who gave me the idea to initiate this workshop, generously shared his own experiences, and helped to brainstorm. Details on their deep learning hackathon in 2019 can be found here.