Internship: Improving Generalization Performance of Neural Networks under Multiple Data Distributions and Data Drift

Come work with us as a Research Intern

At BrainCreators, we're at the forefront of applied AI, with many years of successful research internship projects that combine cutting-edge science with the challenges of applying AI in the real world. The focus of this year’s AI research internship projects will be on the technical challenges at the heart of our Machine Learning platform, BrainMatter.

What we expect from you

  • A full-time commitment to the research internship project.
  • A solid background in the theoretical subjects relevant to your particular project, and ML coding skills in PyTorch.
  • Good communication and presentation skills, and a willingness to learn as much as possible in this exciting year.
  • Your project will have a scientific component, on which you are encouraged to work towards a publishable paper by the end of the year.
  • Your project will also have an applied component, the result of which is a functional, documented piece of cutting-edge software that can be integrated into BrainMatter.
  • Bachelor’s degree in Artificial Intelligence or related field.

What we can offer you

  • The opportunity to work in our research team as a full-time member.
  • A workplace in our Prinsengracht HQ with access to our compute cluster if required.
  • Support and supervision, including a weekly personal supervision meeting and research team group meeting as well as support for integration into our software stack when needed.
  • Internal weekly workshops about scientific and industrial progress.
  • Membership of a vibrant team of AI realists who know how to get things done.
  • Our best interns will be offered a full-time job after graduation.

Project overview

A common assumption in machine learning is that all samples in the training dataset stem from the same generative process, and that this process has no memory of past generation events. This is the ubiquitous “i.i.d. assumption”, which we know is often violated to some degree and is best treated as an idealization. Although there is exciting ongoing academic research aimed at overcoming this problem (see for example [1][2]), in Machine Learning practice it is all too common to ignore the assumption, train and deploy the models, and hope for the best. BrainCreators offers a research internship position to assist with our work on this challenge and to make the deployment of models in industrial settings safer and better informed for our clients. Central topics are Out-of-Distribution Generalization (OoDG) and the monitoring of, and response to, data drift during live deployments.

The research intern is invited to find overlap between these topics, select a number of relevant methods from the literature to approach the common challenges, and work towards integration of these methods into BrainMatter, our Machine Learning platform. In particular, a starting point could be to integrate into BrainMatter the methods of Invariant Risk Minimization [1] and Risk Extrapolation [2]. Other relevant areas of research involve the continuous monitoring of data drift during model deployment, methods to train models that are robust to drift, and methods for automatic response to drift. For a recent overview on learning under concept drift, see [3]. 
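To give a flavour of the second reference, Risk Extrapolation in its V-REx form [2] penalizes the variance of the per-environment training risks, pushing the model toward equal performance across environments. The following is a minimal pure-Python sketch of that objective (function name and numbers are illustrative only; a real implementation in BrainMatter would operate on PyTorch losses inside the training loop):

```python
from statistics import mean, pvariance

def vrex_objective(env_risks, beta=1.0):
    """V-REx training objective: average per-environment risk plus
    beta times the variance of those risks. Minimizing the variance
    term discourages solutions that trade one environment's accuracy
    for another's, which is the mechanism behind its OoD robustness.
    `env_risks` is one scalar empirical risk per training environment."""
    return mean(env_risks) + beta * pvariance(env_risks)

# With a large beta, two environments with balanced risks are preferred
# over a lower average risk achieved by sacrificing one environment:
balanced = vrex_objective([0.30, 0.32], beta=10.0)  # 0.31 + 10 * 0.0001
skewed   = vrex_objective([0.05, 0.50], beta=10.0)  # 0.275 + 10 * 0.0506
assert balanced < skewed
```

The hyperparameter `beta` trades off average performance against cross-environment stability; at `beta=0` the objective reduces to standard empirical risk minimization over the pooled data.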

Experiments performed by the intern will need to confirm the usability of these methods for our particular use cases in infrastructure, manufacturing, and real estate. BrainCreators typically works with multiple clients in each of these domains. Combining datasets from different clients in the same domain is attractive, but would inevitably run into violations of the i.i.d. assumption. Other applications could, for example, concern multiple production lines in the same factory, or frequently changing configurations and properties of material resources that influence the data. Finally, robustness of Machine Learning models in the face of changing sensors and camera types is an important strength we want our platform to have in order to scale our activities.

The practical goal of the project is to work towards a software deliverable that integrates one or more methods from this field into our Machine Learning platform, BrainMatter. The research intern will set up their own experimentation pipeline to assess the strengths and weaknesses of a selection of approaches; there is considerable academic freedom in making this selection. The software should be as modular as possible, housed in a Docker container, and integrated into the automated ML pipeline based on our existing KubeFlow/Kubernetes workflows.

One possible use would be for the end user to provide a partitioning of the training data into subsets assumed to be drawn from separate distributions, select one or more heuristics, and leave it to the platform to find the best possible way to exploit these assumptions for OoDG. Another would be fully automated methods that answer the questions of when to retrain during deployment and on which subset of data to retrain. The intern is encouraged to improve upon existing methods by designing their own novel heuristics. Finally, the research intern is encouraged to work towards a publishable academic paper at the end of the project.
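The "when to retrain" question is commonly approached by scoring the divergence between the feature distribution seen at training time and the one observed live. As one hedged illustration (this is a standard drift heuristic, not an existing BrainMatter component; names and the 0.2 threshold are conventional rather than prescribed), the Population Stability Index compares two histograms over the same bins and can serve as a simple retraining trigger:

```python
from math import log

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two histograms over the same
    bins (bin fractions summing to 1). A widely used rule of thumb is
    that PSI > 0.2 signals significant drift, i.e. a candidate trigger
    for retraining. `eps` guards against empty bins in the log term."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        score += (a - e) * log(a / e)
    return score

train_hist = [0.25, 0.25, 0.25, 0.25]  # feature distribution at training time
live_hist  = [0.10, 0.20, 0.30, 0.40]  # distribution observed in deployment
drifted = psi(train_hist, live_hist) > 0.2  # simple retraining trigger
assert drifted
```

A deployed monitor would recompute the live histogram over a sliding window of incoming samples; more principled alternatives from the concept-drift literature [3] replace the fixed threshold with sequential change-detection tests.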

[1] Arjovsky, M., Bottou, L., Gulrajani, I., Lopez-Paz, D. Invariant Risk Minimization. arXiv:1907.02893, 2019.
[2] Krueger, D., et al. Out-of-Distribution Generalization via Risk Extrapolation (REx). arXiv:2003.00688, 2020.
[3] Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., Zhang, G. Learning under Concept Drift: A Review. IEEE Transactions on Knowledge and Data Engineering, 2019.

Interested?

If you'd like to apply for this internship, send your CV and cover letter to our Head of Research, Maarten Stol.