I am a post-doctoral researcher working with Prof. Dr. Rico Sennrich in the SNSF project on better language understanding through multilingual and multi-task learning. I am also a lecturer in the bachelor's program in Computational Linguistics and Language Technology at the University of Zürich.
My research focuses on novel methods for multilingual language modeling and transfer learning that could reduce reliance on large amounts of training data and improve the generalization of generative models for low-resource languages. I am particularly interested in designing architectures for unsupervised representation learning, based on Bayesian latent-variable models and geometric deep learning, that can express universal structures in morphology and syntax.
- Spring 2020
521-006 Creation and Annotation of Linguistic Resources
- 2020, Vivien Angliker, MA in Multilingual Text Analysis
- 2020, Ariana Dragusha, MA in Multilingual Text Analysis
- Nov, 2020 The first Workshop on Multilingual Representation Learning, which I am co-organizing, has been accepted by ACL to take place at EMNLP 2021. The workshop aims to foster novel research on multilingual NLP models and their interpretation.
- Mar, 2020 I will be giving talks at the ICLR virtual conference in April and the Turkish National Linguistics Congress (dates TBD).
- Jan, 2020 I will be in Amsterdam on January 24th to give an invited talk at the Institute for Logic, Language and Computation (ILLC), University of Amsterdam.
- Dec, 2019 Our paper A Latent Morphology Model for Open-Vocabulary Neural Machine Translation has been accepted to ICLR as a spotlight presentation.
- Nov, 2019 I gave a talk at the Machine Translation Seminar of our department.
- Oct, 2019 Our paper On the Importance of Word Boundaries in Character-level Neural Machine Translation has been accepted to appear at the Workshop on Neural Generation and Translation at EMNLP. I will be in Hong Kong to present it.
If you are a bachelor's or master's student in linguistics, computational linguistics or computer science and would like to work with me on one of the following topics in your thesis or as a project:
- Building a new corpus for a low-resource language
- Developing linguistic tools (automatic annotation tools for linguistic features, e.g. POS/morphological taggers)
- Machine learning models for applications in natural language processing (machine translation, language modeling, text summarization, paraphrasing, information retrieval or question answering)
you can send me an email introducing yourself, the topics and languages you are interested in working on, and your expected timeline for the project or graduation.
- Ph.D. University of Trento, Italy (2019)
Information and Communication Technologies
Thesis: Learning Morphology for Open-Vocabulary Neural Machine Translation
- Ph.D. (Visitor) University of Edinburgh, UK (2018)
Informatics department, research with the statistical machine translation group
- M.Sc. KU Leuven, Belgium (2015)
Electrical Engineering, with a specialization in Embedded Systems and Multimedia
Thesis: Effects of Acoustics and Speaker Characteristics in EEG-based Auditory Attention Detection
- B.Sc. Middle East Technical University, Turkey (2013)
Electrical and Electronics Engineering, with a specialization in Computer Science
- Ataman, D., Aziz, W. and Birch, A. (2020) A Latent Morphology Model for Open-Vocabulary Neural Machine Translation. International Conference on Learning Representations (ICLR).
- Ataman, D., Firat, O., Di Gangi, M., Federico, M. and Birch, A. (2019) On the Importance of Word Boundaries in Character-level Neural Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT) at the Conference on Empirical Methods in Natural Language Processing (EMNLP). Hong Kong. pp. 187–196.
- Ataman, D. and Federico, M. (2018) Compositional Representation of Morphologically-Rich Input for Neural Machine Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). pp. 305–311.
- Ataman, D., Di Gangi, M. and Federico, M. (2018) Compositional Source Word Representations for Neural Machine Translation. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation (EAMT). Alacant, Spain. pp. 31–40.
- Ataman, D. (2018) Bianet: A Parallel News Corpus in Turkish, Kurdish and English. In Proceedings of the LREC Workshop MLP-Moment. pp. 14–17.
- Ataman, D. and Federico, M. (2018) An Evaluation of Two Vocabulary Reduction Methods for Neural Machine Translation. In Proceedings of the 13th Conference of The Association for Machine Translation in the Americas (AMTA). Boston, US. pp. 97–110.
- Ataman, D., Negri, M., Turchi, M. and Federico, M. (2017) Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English. The Prague Bulletin of Mathematical Linguistics 108, European Association for Machine Translation (EAMT), Prague, Czech Republic. pp. 331–342.
- Negri, M., Ataman, D., Sabet, M. J., Turchi, M. and Federico, M. (2017) Automatic Translation Memory Cleaning. Machine Translation. pp. 1–23.