Natural Language Processing
in a nutshell
Prof. Ivan Yamshchikov's laboratory works on aspects of artificial intelligence aligned with verbal cognition. This includes training and inference of large language models, as well as research on their applications.
The laboratory's work falls into three areas.
- Open and efficient language models
The laboratory partners with JetBrains on efficient tokenization algorithms that could improve LM performance for code generation. We also work closely with the French startup Pleias, with whom we contributed to the creation of Common Corpus, the largest dataset for LLM pretraining published under a permissive license.
- AI Safety
NLP@CAIRO works on LLM safety, covering both system alignment and the evaluation of potentially harmful bias that could affect people who use LLMs for their daily needs.
- AI and empathy
The third, more recent direction of the laboratory focuses on the questions that arise when humans interact with language models. We try to understand how LLMs affect human behavior, both at the level of individual decisions and at the level of the social fabric.
current project(s)
ERIC — Efficient Representations for Intelligent Coding
| project title | ERIC — Efficient Representations for Intelligent Coding |
| summary | The project focuses on creating new tokenization algorithms that could improve the efficiency of generative models for code and could also have a positive impact on a broader set of NLP tasks, especially for low-resource languages. |
| key words | tokenization, generative models for code |
| collaborator | JetBrains |
| funding | JetBrains |
| duration | 3 years |
AIOLIA
| project title | AIOLIA |
| summary | AIOLIA gives a robust 3-tier response to the complex challenges posed by the need to operationally interpret the EU AI Act and global AI regulation. Resolutely European, AIOLIA's vision extends beyond the EU, embracing global cooperation with leading universities and think tanks in China, South Korea, Japan, and Canada. Using the UNESCO platform, with its reach into Africa and South Asia, AIOLIA’s guidelines evolve into an analytic toolbox for key international AI dialogues and processes. This global perspective ensures that AIOLIA's impact is not only significant but also sustainable, contributing to fair scientific cooperation and providing concrete and culturally informed ethics instruments to shape the next generation of AI systems. |
| key words | AI ethics |
| collaborators | French Alternative Energies and Atomic Energy Commission (CEA), Research Institutes of Sweden (RISE), Karlsruher Institut für Technologie (KIT), Center for Research and Technology Hellas (CERTH), CENTRIC, Amsterdam University Medical Centers, Center for European Policy Studies (CEPS), European Network of Research Ethics Committees (EUREC), Euractiv, European Research Consortium for Informatics and Mathematics (ERCIM), AI Data Robotics Association (ADRA), Afliant, Oxipit, NIT Institute, McGill University, Chinese Academy for Science and Technology for Development (CASTED), ETICAS.AI, University of Osaka, Science and Technology Policy Institute (STEPI) |
| funding | European Commission Grant Agreement 101187937 |
| duration | 3 years |
| website | aiolia.eu |
