
25 Dec Using Monolingual Speech Data to Improve Multilingual Translation Models – Internship
Category: Internship
Start date: 1st semester of 2022
Duration: 6 months
Description
A large share of today’s 7,000+ languages have no writing system, and many more have only a very small amount of available textual data. For example, while Wikipedia exists in 264 languages, only 100 of those have more than 5,000 pages.
In this internship we plan to investigate how monolingual speech data can be leveraged to improve multilingual text translation systems. To do so, we will build on existing work in speech-to-text translation, namely models that start from large pre-trained models (e.g., [1, 2]). We will also draw on our previous experience with end-to-end speech translation models [3] and with using monolingual data to improve translation systems for unseen languages [4].
At NLE we pride ourselves on very close collaboration with our interns, including regular meetings and joint brainstorming and development. Interns are integrated into existing teams and participate actively in the scientific activities of the centre.
Required Skills
- enrolled in a PhD or research master’s programme in NLP, speech processing or applied machine learning
- experience in at least one of machine translation, ASR or multi-task learning
- good knowledge of TensorFlow or (preferably) PyTorch
- a track record of papers published at top-tier conferences is a plus
References
- [1] Tang, Yun, et al. “Improving speech translation by understanding and learning from the auxiliary text translation task.” arXiv preprint arXiv:2107.05782 (2021).
- [2] Li, Xian, et al. “Multilingual speech translation with efficient finetuning of pretrained models.” arXiv preprint arXiv:2010.12829 (2020).
- [3] Bérard, Alexandre, et al. “Listen and translate: A proof of concept for end-to-end speech-to-text translation.” arXiv preprint arXiv:1612.01744 (2016).
- [4] Üstün, Ahmet, et al. “Multilingual unsupervised neural machine translation with denoising adapters.” arXiv preprint arXiv:2110.10472 (2021).
Application Instructions
Please note that applicants must be registered students at a university or other academic institution and that this establishment will need to sign an ‘Internship Convention’ with NAVER LABS Europe before the student is accepted.
You can apply for this position online. Don’t forget to upload your CV and cover letter before you submit. Incomplete applications will not be accepted.
About NAVER LABS
NAVER LABS is a world-class team of self-motivated and highly engaged researchers, engineers and interface designers who collaborate to create next-generation ambient intelligence technology and services, grounded in an organic understanding of users, their contexts and their situations.
Since 2013, LABS has led NAVER’s technological innovation through products such as the AI-based translation app ‘Papago’, the omni-tasking web browser ‘Whale’, the virtual AI assistant ‘WAVE’, the in-vehicle infotainment system ‘AWAY’ and M1, the 3D indoor mapping robot.
The team in Europe is multidisciplinary and extremely multicultural, specializing in artificial intelligence, machine learning, computer vision, natural language processing, UX and ethnography. We collaborate with many partners in the European scientific community on R&D projects.
NAVER LABS Europe is located in Grenoble, in southeastern France. Grenoble is known for its exceptional natural environment and its scientific ecosystem, with 21,000 jobs in public and private research. It is home to MIAI (Multidisciplinary Institute in Artificial Intelligence), one of the 4 French national institutes in AI. It has a large student community (over 62,000 students) and is a lively, cosmopolitan place offering a host of leisure opportunities. Grenoble is close to both the Swiss and Italian borders and is the ideal place for skiing, hiking, climbing, hang gliding and all types of mountain sports.