About me
Hello
My name is Igor Buyanov. I am an ML engineer specializing in NLP. I work at MTS AI as a senior developer. My main task is developing classification models for the MTS voice chatbot. I also spent two years on the data labeling team, where I built the entire technical side of the process. In addition, I am a PhD student at the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (FRC CSC RAS). There I study how NLP can help address mental health problems, particularly depression and suicide.
My current résumé is available on LinkedIn.
My contacts:
- Email: buyanov.igor.o@yandex.ru
- Telegram: @Astromis
My papers
The dataset for presuicidal signals detection in text and its analysis
Abstract: The paper presents a dataset for presuicidal signal detection in Russian social media posts. To the best of our knowledge, it is the first dataset of this type for this language. We develop a collection methodology and conduct a linguistic analysis of the completed dataset. We also build a classification baseline with machine learning models to solve the detection task.
Cite:
@article{Buyanov2022TheDF,
title={The dataset for presuicidal signals detection in text and its analysis},
author={Igor Buyanov and Ilya Sochenkov},
journal={Computational Linguistics and Intellectual Technologies},
year={2022},
month={June},
number={21},
pages={81--92},
url={https://api.semanticscholar.org/CorpusID:253195162},
}
Who is answering to whom? Modeling reply-to relationships in Russian asynchronous chats
Abstract: The study highlights the asynchronous nature of modern group chats and the problems it causes, such as retrieving information relevant to a question and understanding reply-to relationships. In this work, we formalize the reply recovery task as a building block toward solving these problems. Using simple heuristics, we try to apply the resulting reply recovery model to a thread reconstruction problem. We show that modern pre-trained models such as BERT achieve great results on the reply recovery task compared to simpler models, though they cannot be applied to thread reconstruction with simple heuristics alone. In addition, experiments show that model performance depends on the chat domain. We open-source a model that automatically predicts which message a particular reply responds to and provide a representative Russian dataset built from Telegram chats of different domains. We also provide a test set for the thread reconstruction task.
(paper, doi, code, model, dataset for the reply recovery, dataset for the thread reconstruction)
Cite:
@article{Buyanov2023WhoIA,
title={Who is answering to whom? Modeling reply-to relationships in Russian asynchronous chats},
author={Igor Buyanov and Darya Yaskova and Ilya Sochenkov},
journal={Computational Linguistics and Intellectual Technologies (Supplementary volume)},
year={2023},
month={June},
number={22},
pages={1052--1060},
url={https://www.dialog-21.ru/media/5871/buyanoviplusetal046.pdf}
}
Neural network methods of vector compression for approximate nearest neighbor search
Abstract: The paper examines the hypothesis that neural autoencoders are applicable as a vector compression method for approximate nearest neighbor search. The evaluation was conducted on several large datasets using various autoencoder architectures and indexes. It showed that, although none of the combinations of autoencoders and indexes can fully outperform pure solutions, in some cases they can be useful. We also identified some empirical relationships between the optimal dimensionality of the hidden layer and the intrinsic dimensionality of the datasets. It was also shown that the loss function is a determining factor for compression quality.
Cite:
@ARTICLE{Buyanov2024-ps,
title = "Neural vector compression in approximate nearest neighbor search
on large datasets",
author = "Buyanov, Igor Olegovich and Yadrinsev, Vasiliy Vladimirovich and
Sochenkov, Ilia Vladimirovich",
abstract = "The paper examines the hypothesis of the applicability of neural
autoencoders as a method of vector compression in the pipeline
of approximate nearest neighbor search. The evaluation was
conducted on several large datasets using various autoencoder
architectures and indexes. It has been demonstrated that,
although none of the combinations of autoencoders and indexes
can fully outperform pure solutions, in some cases, they can be
useful. Additionally, we have identified some empirical
relationships between the optimal dimensionality of the hidden
layer and the internal dimensionality of the datasets. It has
also been shown that the loss function is a determining factor
for compression quality.",
journal = "Proc. Inst. Syst. Program. RAS",
publisher = "Institute for System Programming of the Russian Academy of
Sciences",
volume = 36,
number = 1,
pages = "7--22",
year = 2024
}