Internship: Efficient Representation Learning for Video Similarity Ranking

Come work with us as a Research Intern

At BrainCreators, we work at the forefront of applied AI, with many years of successful research internship projects that combine cutting-edge science with the challenges of applying AI in the real world. This year’s AI research internship projects will focus on the technical challenges at the heart of our Machine Learning platform, BrainMatter.

What we expect from you

  • A full-time commitment to the research internship project.
  • A solid background in the theoretical subjects relevant to your particular project, and ML coding skills in PyTorch.
  • Good communication and presentation skills, and a willingness to learn as much as possible during this exciting year.
  • Your project will have a scientific component on which you are encouraged to work towards a publishable paper at the end of the year.
  • Your project will also have an applied component, the result of which is a functional and documented piece of cutting-edge software that can be integrated into BrainMatter.
  • Bachelor’s degree in Artificial Intelligence or related field.

What we can offer you

  • The opportunity to work in our research team as a full-time member.
  • A workplace in our Prinsengracht HQ with access to our compute cluster if required.
  • Support and supervision, including a weekly personal supervision meeting and a research team group meeting, as well as help integrating with our software stack when needed.
  • Internal weekly workshops about scientific and industrial progress.
  • A place in a vibrant team of AI realists who know how to get things done.
  • Our best interns will be offered a full-time job opportunity after graduation.

Project overview

Representation learning for video presents a different set of non-trivial challenges than representation learning for static images. Processing a video as an ordered set of static frames fed to an image classifier may fail if the class labels correspond to the dynamics of movement and action. The temporal aspect of the data makes it necessary to take the order and spread of information over multiple frames into account, in a way that is itself informed by the video content. However, the often much larger size of video datasets and samples does not imply that we also need larger representations. The redundancy within a video fragment is often high, and representation learning then becomes a matter of selecting the relevant aspects of the fragment so that they have a stronger impact on the embedding.

For our Intelligent Automation platform BrainMatter, we are interested in building a flexible and powerful toolset for video representation learning. Applications based on some form of similarity ranking include, but are not limited to, near-duplicate video retrieval [1], video recommendation, and efficient sorting and searching of video datasets.

For supervised learning in particular, we are interested in bulk labeling based on similarity rankings: given a single video with a known label, retrieve unlabeled but visually or semantically similar videos for human confirmation of that same label. This can greatly speed up annotation in use cases where labels are scarce or annotation effort is prohibitively expensive. Another relevant issue is how to combine predicted labels for video subfragments into a meaningful form of action recognition for the full video; one example of a potentially useful approach can be found in [4]. Since some of the use cases are likely to start with either very few labels or no labels at all, we are also interested in unsupervised or self-supervised representation learning [2][3].
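The bulk-labeling idea above can be sketched in a few lines of PyTorch: embed the labeled seed video and all unlabeled videos, rank the database by cosine similarity, and surface the top-k candidates for human confirmation. This is a minimal illustration with random embeddings standing in for real encoder outputs; the function name and shapes are our own, not part of BrainMatter.

```python
import torch

def rank_by_similarity(query_emb, db_embs, k=5):
    # Cosine similarity between the query embedding and every database embedding.
    q = torch.nn.functional.normalize(query_emb, dim=0)
    db = torch.nn.functional.normalize(db_embs, dim=1)
    sims = db @ q                       # (N,) similarity scores
    scores, idx = torch.topk(sims, k)   # top-k most similar videos
    return idx, scores

# Toy usage: 100 unlabeled video embeddings and one labeled seed video.
torch.manual_seed(0)
db_embs = torch.randn(100, 256)
seed_emb = db_embs[17] + 0.01 * torch.randn(256)  # near-duplicate of video 17
candidates, scores = rank_by_similarity(seed_emb, db_embs, k=5)
# The candidates would then be shown to an annotator to confirm the seed label.
```

As expected, the near-duplicate of the seed ranks first; in practice the embeddings would come from a learned video encoder rather than random tensors.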

The research intern is invited to help shape the relevant research questions in this internship project, and to work together with our team to formulate the video roadmap for BrainMatter. There is considerable academic freedom, as long as the research is based on recent machine learning results and is relevant to some of our current areas of industrial application. For this topic, most of our relevant use cases are in infrastructure, manufacturing, or real estate.

The practical goal of the project is to work towards a software deliverable that integrates one or more methods from this field into our Machine Learning platform, BrainMatter. The research intern will set up their own experimentation pipeline to assess the strengths and weaknesses of a selection of approaches. The end user should typically be able to provide a short video fragment as a query and efficiently retrieve a set of videos via a similarity ranking of the available videos in the database. The software should be as modular as possible, housed in a Docker container, and integrated into our automated ML pipeline based on existing KubeFlow/Kubernetes workflows.
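The query-to-ranking flow described above can be sketched end to end: encode each frame of a video, pool the frame embeddings over time into a single video embedding, and sort the database by similarity to the query. The class below is a hedged sketch under simplifying assumptions; the name `MeanPoolVideoEncoder` is hypothetical, and a plain linear layer stands in for a real per-frame backbone such as a pretrained CNN.

```python
import torch

class MeanPoolVideoEncoder(torch.nn.Module):
    """Embed a video by encoding each frame and mean-pooling over time."""
    def __init__(self, frame_dim=512, embed_dim=128):
        super().__init__()
        # Placeholder per-frame backbone; in practice e.g. a pretrained CNN.
        self.frame_encoder = torch.nn.Linear(frame_dim, embed_dim)

    def forward(self, frames):                    # frames: (T, frame_dim)
        per_frame = self.frame_encoder(frames)    # (T, embed_dim)
        video_emb = per_frame.mean(dim=0)         # temporal mean pooling
        return torch.nn.functional.normalize(video_emb, dim=0)

torch.manual_seed(0)
encoder = MeanPoolVideoEncoder()
query = encoder(torch.randn(30, 512))             # 30-frame query fragment
database = torch.stack([encoder(torch.randn(30, 512)) for _ in range(10)])
ranking = torch.argsort(database @ query, descending=True)  # most similar first
```

Mean pooling ignores frame order, which is exactly the limitation the project overview highlights; one direction for the internship is to replace it with temporally aware pooling or attention over frames.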

Finally, the research intern is encouraged to work towards a publishable academic paper at the end of the project. 

[1] Near-Duplicate Video Retrieval with Deep Metric Learning
[2] Unsupervised Learning from Video with Deep Neural Embeddings
[3] Evolving Losses for Unsupervised Video Representation Learning
[4] Video Representation Learning Using Discriminative Pooling

Interested?

If you'd like to apply for this internship, send your CV and cover letter to our Head of Research, Maarten Stol.