ZIL.digital Publications
Research papers on microservices, cloud platforms, and high-load systems
Smart Microservice Orchestration in Kubernetes
Ровнягин М.М., Федоров К.А., Синельников Д.М., Ерошев А.А., Ровнягина Т.А., Тихомиров А.В., Яковенко И.А.
Most modern cloud Platform as a Service (PaaS) systems are built on the Docker application containerization technology. The server application (microservice) is “packaged” together with all its dependencies into a Docker container, which simplifies its transfer from server to server. The servers operate in cluster mode under the control of the Kubernetes container orchestration system. The Kubernetes cluster includes a scheduler that distributes containers across cluster nodes. The schedulers currently used do not have intelligent algorithms for placing containers on nodes; they work on the principle of placing in the “first suitable slot”. This paper presents the architecture, technical implementation and experimental results of a “smart” scheduler that places containers while maintaining locality of interaction between microservices, which has a positive effect on interaction latency.
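A locality-aware placement like the one the abstract describes could be sketched as a node-scoring function: a node earns a bonus for every already-placed microservice that the new container talks to. This is an illustrative reconstruction, not the paper's actual algorithm; all names and weights are assumptions.

```python
# Hypothetical locality-aware node scoring: prefer nodes that already host
# services the new microservice communicates with heavily.
def score_node(node_name, free_cpu, new_service, placements, traffic,
               cpu_weight=1.0, locality_weight=10.0):
    """placements: {node: set of services}; traffic: {(svc_a, svc_b): calls/s}."""
    peers = placements.get(node_name, set())
    locality = sum(rate for (a, b), rate in traffic.items()
                   if (a == new_service and b in peers)
                   or (b == new_service and a in peers))
    return cpu_weight * free_cpu + locality_weight * locality

def pick_node(nodes, new_service, placements, traffic):
    """nodes: {name: free CPU cores}. Return the highest-scoring node."""
    return max(nodes, key=lambda n: score_node(n, nodes[n], new_service,
                                               placements, traffic))
```

With two equally loaded nodes, the one already hosting a chatty peer wins, which is exactly the locality effect that reduces interaction latency.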
Optimizing Cache Memory Usage Methods for Chat LLM-models in PaaS Installations
Ровнягин М.М., Синельников Д.М., Ерошев А.А., Ровнягина Т.А., Тихомиров А.В.
Recently, LLM models have become widespread in industry. They are used as the basis for voice assistants, troubleshooting systems, chatbots and much more. LLMs are based on the transformer neural network architecture: text (for example, a text chat) is supplied as input, and the model, estimating character-by-character probabilities, returns the result also in the form of text. The model itself does not retrain, while the context constantly accumulates. This paper proposes two ways to reduce the memory allocated for storing the text chat context. The first method is to periodically launch additional training in order to embed the chat context into the core of the model itself; the article discusses the pros and cons of this approach. The second method is to keep the chat context in the cache only for those users for whom this context has already been formed. The article describes the setup for the experiment, provides the results of the experimental study and describes a method for assessing the “maturity” of the chat correspondence context.
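The second method hinges on deciding when a chat's context is "mature" enough to be worth caching. The heuristic below (enough messages, and recent messages adding little new context relative to the total) is purely our own illustrative assumption; the paper's actual maturity metric may differ.

```python
# Sketch of maturity-gated context caching: only chats whose context has
# stopped growing quickly are kept in the cache. The maturity test is an
# assumed stand-in for the paper's metric.
from collections import deque

class ChatContextCache:
    def __init__(self, min_messages=10, growth_threshold=0.1):
        self.min_messages = min_messages
        self.growth_threshold = growth_threshold
        self.history = {}   # chat_id -> deque of recent message lengths
        self.cache = {}     # chat_id -> cached context (mature chats only)

    def is_mature(self, chat_id):
        lens = self.history.get(chat_id, deque())
        if len(lens) < self.min_messages:
            return False
        # Mature if the last few messages add little relative to the total.
        recent = sum(list(lens)[-3:])
        return recent / sum(lens) < self.growth_threshold

    def add_message(self, chat_id, text):
        self.history.setdefault(chat_id, deque(maxlen=100)).append(len(text))
        if self.is_mature(chat_id):
            self.cache[chat_id] = self.cache.get(chat_id, "") + text
        # Immature chats are not cached; their context is rebuilt on demand.
```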
Intelligent Docker Container Orchestration for Low Scheduling Latency and Fast Migration in PaaS
Ровнягин М.М., Синельников Д.М., Варыханов С.С., Магазов Т.Р., Киямов А.А., Широких Т.А.
Most modern cloud Platform as a Service (PaaS) systems are built on application containerization technology. The containerization schedulers currently in use do not have intelligent algorithms for placing containers across nodes; they work on the principle of placing in the “first suitable slot”. This article discusses two issues related to intelligent decision-making about moving containers between servers: the latency of obtaining metrics for monitoring the current state of the platform, and the influence of container deployment parameters on the speed of such transformations. The article proposes a way to determine the necessary and sufficient number of monitoring services for latency management. The paper also considers the influence of microservice launch parameters and container image size on the time of transferring a container between nodes, and proposes a model for assessing the complexity of transferring containers between nodes.
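A minimal version of a transfer-cost model like the one mentioned above could estimate migration time from the image layers still missing on the target node, the link bandwidth, and a restart overhead. The formula and coefficients are illustrative assumptions, not the paper's model.

```python
# Back-of-the-envelope model of container migration time between nodes:
# pull the layers the target node lacks, then restart the container.
def transfer_time_s(image_size_mb, layers_cached_mb, bandwidth_mbps,
                    startup_overhead_s=2.0):
    """Estimated seconds to move a container to a node that already caches
    layers_cached_mb of the image; bandwidth is the inter-node link speed."""
    to_pull_mb = max(image_size_mb - layers_cached_mb, 0)
    pull_s = to_pull_mb * 8 / bandwidth_mbps   # MB -> Mbit over the link
    return pull_s + startup_overhead_s
```

Such a model makes explicit why layer caching matters: a fully cached 1 GB image migrates in the startup overhead alone, while a cold pull dominates the total time.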
Data Exchange Acceleration Methods in a Decentralized File System
Ровнягин М.М., Синельников Д.М., Варыханов С.С., Худоярова А.М., Яковенко И.А., Широких Т.А.
The efficiency of storing and transmitting information in decentralized systems depends on many factors. This article presents methods for accelerating data exchange in a decentralized file system by choosing the optimal file compression algorithm and the optimal server for initializing a new client's connection. Lossless compression is one way to reduce the load on the channel. This paper considers adaptive compression: a special algorithm that, depending on the input data, adjusts the algorithm used to achieve optimal performance in terms of time and compression ratio. To maintain a certain level of quality of service and optimize the use of server resources, it is necessary to adequately predict the workload, assess the current state of the servers, and choose the right balancing strategy. In this article, the ELECTRA IS algorithm is applied to select the optimal server when initializing the connection.
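One simple way to realize adaptive compression is to try each candidate codec on a small sample of the data and pick the best performer. The sketch below selects by compression ratio only, for brevity; the paper's algorithm also weighs compression time, so this is a simplified stand-in.

```python
# Adaptive compression sketch: probe candidate codecs on a sample of the
# stream and use the winner for the full payload. Ratio-only selection is a
# simplification of the time-and-ratio criterion described in the abstract.
import bz2, lzma, zlib

CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
DECODERS = {"zlib": zlib.decompress, "bz2": bz2.decompress,
            "lzma": lzma.decompress}

def pick_codec(sample: bytes) -> str:
    best_name, best_size = None, float("inf")
    for name, compress in CODECS.items():
        size = len(compress(sample))
        if size < best_size:
            best_name, best_size = name, size
    return best_name

def compress_adaptive(data: bytes, sample_size=4096):
    """Return (codec name, compressed bytes) chosen from a data sample."""
    name = pick_codec(data[:sample_size])
    return name, CODECS[name](data)
```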
Approach of Program's Concurrency Evaluation in PaaS Cloud Infrastructure
Мингажитдинова Э.Ф., Синельников Д.М., Одинцев В.В., Ровнягин М.М., Варыханов С.С.
In most cases, programs have different levels of concurrency. According to Amdahl's law, increasing the amount of resources may not give a gain in computational efficiency. The goal the authors set for this article is to find the balance between increasing the amount of resources and the efficiency gain that can be obtained for computation in a K8s cluster. To address the task, the authors explore the correlation between the number of pods and the processing time of Spark programs (data-intensive and compute-intensive computations) in a local minikube, for subsequent deployment in K8s.
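Amdahl's law itself is easy to state in code, and the diminishing marginal gain it predicts is the quantity that matters when deciding whether another pod is worth adding. This is a generic illustration of the law, not the paper's measurement setup.

```python
# Amdahl's law: speedup from n workers when a fraction p of the work
# parallelizes. The marginal gain of one more pod shrinks toward zero.
def amdahl_speedup(p: float, n_pods: int) -> float:
    """Predicted speedup with n_pods workers, 0 <= p <= 1 parallelizable."""
    return 1.0 / ((1.0 - p) + p / n_pods)

def marginal_gain(p: float, n_pods: int) -> float:
    """Extra speedup from adding one more pod: a stopping criterion for
    scaling out a Spark job in K8s."""
    return amdahl_speedup(p, n_pods + 1) - amdahl_speedup(p, n_pods)
```

For example, with p = 0.9 the speedup can never exceed 10 no matter how many pods are added, which is why compute-intensive and data-intensive jobs scale so differently.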
Methods for Speeding Up the Retraining of Neural Networks
Варыханов С.С., Синельников Д.М., Одинцев В.В., Ровнягин М.М., Мингажитдинова Э.Ф.
Nowadays, machine learning is widespread and models are becoming more complex. Developing and debugging neural networks is becoming more and more time-consuming. Distributed solutions are often used to speed up the learning process, but they do not solve the problem of retraining the model from scratch if training fails. This paper presents a new approach to training models on large datasets, which can save time and resources during development. The approach splits the model's learning process into separate layers; each of these layers can be modified and reused for the next layers. The implementation of this approach is based on transfer learning and distributed machine learning techniques. To create reusable network layers, it is proposed to use the methods of automating code parallelization for hybrid computing systems described in the article: tracking readiness and dependencies in the data, speculative execution at the kernel level, and creating a DSL.
Algorithm of ML-based Re-scheduler for Container Orchestration System
Ровнягин М.М., Дмитриев С.О., Храпов А.С., Козлов В.К.
Due to the gradual growth of the number of companies that use cloud technologies, an increasing number of enterprises deploy and use an internal private cloud. This trend drives interest in various technologies that ensure the efficiency of the cloud infrastructure. One of them is orchestration, the core of which is a scheduler: a special component that efficiently distributes virtualized entities with running tasks across computational nodes. However, schedulers usually only plan the placement of tasks that have not started yet; they rarely make changes to the arrangement of already running entities. To plan changes to the state of already running tasks, deschedulers and reschedulers are additionally used. This article proposes a solution using a Reinforcement Learning based rescheduler and an algorithm for its preparation.
Database Storage Format for High Performance Analytics of Immutable Data
Ровнягин М.М., Дмитриев С.О., Храпов А.С., Максутов А.А., Туровский И.А.
Most modern database management systems offer a set of data manipulation operations that strictly limits the available methods of data storage optimization. This article describes a database storage format that provides low-latency access to stored data with a highly optimized sequential data extraction process by prohibiting any data modification after the data is initially loaded. The current study aims to develop a database management system that is suitable for high-performance analytics of immutable data and performs better than database management systems with wider applicability. The paper includes the developed data storage formats, data load and extraction algorithms, and performance measurements.
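The core idea (sort once at load, then serve reads and sequential scans cheaply, and reject any mutation) can be shown in a few lines. This is a toy illustration of the immutability trade-off, not the paper's storage format.

```python
# Toy immutable store: rows are sorted once at load time, point lookups use
# binary search, range scans are contiguous slices, and writes are rejected.
import bisect

class ImmutableStore:
    def __init__(self, rows):
        self._rows = sorted(rows)                 # (key, value) pairs, frozen
        self._keys = [k for k, _ in self._rows]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i][1]
        raise KeyError(key)

    def range_scan(self, lo, hi):
        """Sequential extraction of all rows with lo <= key < hi."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return self._rows[i:j]

    def put(self, key, value):
        raise TypeError("store is immutable after the initial load")
```

Forbidding `put` is what buys the optimization: with no in-place updates, the sorted layout never fragments and sequential reads stay contiguous.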
Orchestration of CPU and GPU Consumers for High-Performance Streaming Processing
Ровнягин М.М., Гуков А.Д., Тимофеев К.В., Храпов А.С., Митенков Р.А.
In the modern world, there are many systems that use streaming data processing. Often, these systems use both CPU and GPU devices in their calculations. Such systems can also fail for various reasons. Therefore, to optimize throughput, system designers need to determine in advance how many CPUs and GPUs to configure the system with. In this article, we present a possible architecture of such a system and describe methods that can be used to calculate the number of CPUs and GPUs that yields optimal throughput while taking into account other factors, for example, the cost of devices and the failure rate of the environment.
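A crude version of that sizing question can be posed as: maximize availability-discounted throughput within a hardware budget. The rates, availabilities, and costs below are synthetic placeholders, and brute force stands in for whatever optimization the article actually uses.

```python
# Illustrative CPU/GPU sizing: expected throughput discounted by device
# availability, brute-forced over all mixes that fit the budget.
def expected_throughput(n_cpu, n_gpu, cpu_rate, gpu_rate,
                        cpu_avail=0.99, gpu_avail=0.95):
    """Events/s a mixed pool sustains, accounting for device failure rates."""
    return n_cpu * cpu_rate * cpu_avail + n_gpu * gpu_rate * gpu_avail

def best_mix(budget, cpu_cost, gpu_cost, cpu_rate, gpu_rate):
    """Exhaustively find (n_cpu, n_gpu, throughput) maximizing throughput
    without exceeding the budget."""
    best = (0, 0, 0.0)
    for n_gpu in range(budget // gpu_cost + 1):
        n_cpu = (budget - n_gpu * gpu_cost) // cpu_cost
        t = expected_throughput(n_cpu, n_gpu, cpu_rate, gpu_rate)
        if t > best[2]:
            best = (n_cpu, n_gpu, t)
    return best
```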
Burrows–Wheeler Transform in Lossless Data Compression Problems on Hybrid Computing Systems
Ровнягин М.М., Варыханов С.С., Синельников Д.М., Одинцев В.В.
Currently, hybrid computing systems and clusters based on them are used to solve an increasing number of tasks. This article addresses the issue of lossless data compression on hybrid computing systems. It contains sections describing the development and implementation of a stack of lossless data compression algorithms based on the Burrows–Wheeler transform (BWT), as well as a section on data sorting on hybrid computing systems as one of the BWT steps. The article concludes with test results for the proposed algorithms.
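For reference, the BWT itself is a sort of all rotations of the input (which is exactly why the sorting step dominates and is worth offloading to a GPU). The naive O(n² log n) version below is a didactic sketch; production implementations, including the hybrid one discussed here, build suffix arrays instead.

```python
# Didactic Burrows–Wheeler transform and inverse. Sorting the rotations is
# the step that hybrid/GPU implementations accelerate with suffix arrays.
def bwt(s: str, eos: str = "\0") -> str:
    s = s + eos                          # unique end-of-string sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def ibwt(t: str, eos: str = "\0") -> str:
    table = [""] * len(t)
    for _ in range(len(t)):              # repeatedly prepend and re-sort
        table = sorted(t[i] + table[i] for i in range(len(t)))
    row = next(r for r in table if r.endswith(eos))
    return row[:-1]
```

The transform clusters equal characters together, which is what makes the downstream move-to-front and entropy-coding stages of a BWT stack effective.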
Caching and Storage Optimizations for Big Data Streaming Systems
Ровнягин М.М., Козлов В.К., Митенков Р.А., Гуков А.Д., Яковлев А.А.
Data processing is one of the most important processes in Big Data systems. In this paper, we propose a method, and a performance model for it, for data deduplication in distributed event-driven software systems using Kafka Streams and the Apache Ignite cache, which reduces network and memory consumption. The article also considers a way to optimize data storage systems using the example of Apache Cassandra. The experiments showed that choosing compression algorithms for different kinds of data with the help of a neural network can help find the balance between memory usage and read speed from the database.
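The deduplication idea can be sketched with a bounded seen-keys cache standing in for the distributed Ignite cache: each event key is admitted once, and the least-recently-seen keys are evicted to cap memory. This is a single-process analogue of the streaming setup, with all structure our own assumption.

```python
# Stream deduplication with a bounded LRU set of seen keys, a local stand-in
# for the Kafka Streams + Apache Ignite cache described in the abstract.
from collections import OrderedDict

class Deduplicator:
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.seen = OrderedDict()        # event key -> None, in LRU order

    def accept(self, key) -> bool:
        """True for first-seen events, False for duplicates."""
        if key in self.seen:
            self.seen.move_to_end(key)
            return False
        self.seen[key] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)    # evict least-recently-seen key
        return True

def dedup_stream(events, key_fn=lambda e: e, capacity=100_000):
    d = Deduplicator(capacity)
    return [e for e in events if d.accept(key_fn(e))]
```

Bounding the cache is the memory/accuracy trade-off: an evicted key that reappears is treated as new, which is the price of a fixed memory footprint.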
Distributed Fault-tolerant Platform for Web Applications
Ровнягин М.М., Синельников Д.М., Одинцев В.В., Варыханов С.С.
Web applications are software applications, services or microservices that run on a remote server. The problem of downtime for web applications is important and, in some cases, critical for business. Nowadays, cluster solutions are often used to provide fault tolerance for applications, but they do not solve the problem of downtime if all instances of an application are down. This paper presents a complex approach that provides fault tolerance for web applications even if all instances of the application in the cluster are down. The approach is based on long-polling and request-queueing methods. In this work, Apache Kafka and Google Protocol Buffers have been used as the core of the fault-tolerant platform.
ML-based Heterogeneous Container Orchestration Architecture
Ровнягин М.М., Храпов А.С., Гуминская А.В., Орлов А.П.
In recent years, the popularity of containerization technologies has been growing. Computational tasks are placed in lightweight containers that can be easily moved between different computing nodes; containerization using Docker is especially popular at the moment. These solutions open up enormous opportunities for building distributed and cluster computing systems. To maintain the operability of such systems, special tools are used, one of which is an orchestrator. However, existing orchestrators are focused on not-so-large computing systems in which performance can be maintained by simply moving computational tasks from non-working nodes to working ones. In large systems with many nodes and a huge number of computational tasks, it is also necessary to take into account the uneven consumption of resources by various tasks. This article proposes a system architecture that solves the problem of container orchestration using machine learning methods, taking into account the uneven consumption of resources by different tasks.
Presentation of the PaaS System State for Planning Containers Deployment Based on ML-Algorithms
Ровнягин М.М., Храпов А.С.
In the modern world, one of the most important technologies is virtualization, and one of its most promising types is OS-level virtualization, also known as containerization. Its use greatly simplifies the task of deploying stable computing system services on suitable hardware depending on the current situation. Various additional tools are used to automate the management of container placement. However, most existing container management tools provide only the simplest behaviors. One of the more complex tasks that such tools cannot solve can be stated as follows: there are several virtualized entities (containers) that can be executed on cluster nodes; each entity contains a task that consumes a certain amount of computing resources; it is necessary to distribute the entities among the nodes in such a way that each of them has enough resources. This paper proposes a methodology that solves this service management problem using machine learning methods.
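The placement problem stated in the abstract is a bin-packing problem, and the classical baseline against which an ML scheduler would be compared is first-fit decreasing. The sketch below shows that baseline only; it is not the paper's ML method.

```python
# First-fit-decreasing baseline for the stated placement problem: put each
# container (largest demand first) on the first node with enough free CPU.
def place_containers(containers, nodes):
    """containers: {name: cpu_demand}; nodes: {name: cpu_capacity}.
    Returns {container: node}; raises if some container does not fit."""
    free = dict(nodes)
    placement = {}
    for c, demand in sorted(containers.items(), key=lambda kv: -kv[1]):
        node = next((n for n, f in free.items() if f >= demand), None)
        if node is None:
            raise RuntimeError(f"no node has {demand} free CPU for {c}")
        free[node] -= demand
        placement[c] = node
    return placement
```

An ML-based planner improves on this by predicting each task's actual (often uneven) consumption instead of trusting the declared demand.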
Cloud Computing Architecture for High-volume ML-based Solutions
Ровнягин М.М., Тимофеев К.В., Еленкин А.А., Шипугин В.А.
A large number of modern projects use machine learning to perform a variety of business calculations. There are two main ways to integrate machine learning models into the logic of industrial applications. The first is to rewrite models from a data analysis language (for example, R or Python) into an industrial development language (for example, Java, Go or Scala). The second is to equip models with a web interface and integrate them into the calculation. In this article, we explore the second method. A deployment architecture for machine learning in the cloud is proposed, and the scaling possibilities of the proposed scheme are described. Examples of practical use of the proposed architecture for organizing data storage with compression are also given.
Cloud computing architecture for high-volume monitoring processing
Ровнягин М.М., Одинцев В.В., Федин Д.Ю., Кузьмин А.В.
Monitoring systems have historically been used to obtain, process and display metrics coming from applications or devices. Providers are now massively reorienting toward SDN/NFV infrastructure, and an increasing number of applications are built on a microservice architecture. Thus, in addition to the natural evolutionary growth in the number of collected metrics, there is an abrupt increase caused by the change of technology. There is also a need for correlation analysis, which prevents old monitoring systems from scaling through fragmentation. This article proposes a linearly scalable architecture for a monitoring system. The proposed approach makes it possible to increase the reliability of metric processing and to perform cross-domain correlation analysis using computable metrics. The features of building a real-time control system on the proposed architecture are also described.
NFV chaining technology in hybrid computing clouds
Ровнягин М.М., Варыханов С.С., Маслов Ю.В., Ряховская Ю.С., Мыльцин О.В.
Currently, the network functions virtualization technology (NFV) is becoming more widespread. Hardware devices are replaced with software-based cloud solutions. One way to increase the performance of virtual network functions (VNF) is to use hybrid computing technologies. In this paper, we propose ways to implement the most popular networking functions. The architecture of the orchestration system for hybrid systems and related features is explained. Also, the article presents the experimental results.
Using the ML-based architecture for adaptive containerized system creation
Ровнягин М.М., Гуминская А.В., Плюхин А.А., Орлов А.П., Чернилин Ф.Н., Храпов А.С.
Most of today's applications are built on a microservice architecture, where one large application is divided into several functionally distinct parts. Each part is packaged in an isolated container with its own file system, process space and IP address. Contemporary container management tools implement only the simplest strategies for placing containers in cluster systems. In this paper, we propose a method for constructing adaptive containerized infrastructures using machine learning methods. The paper also describes a methodology for designing distributed systems with containers of different types: data-intensive and compute-intensive.
Application of hybrid computing technologies for high-performance distributed NFV systems
Ровнягин М.М., Кузнецов А.А.
Currently, network functions virtualization (NFV) is widely used in data transmission and distributed processing. A key feature of these solutions is that they can be used in the cloud. Modern cloud infrastructure often includes GPGPU coprocessors for acceleration purposes. There are a number of problems where hybrid computing technology can speed up NFV: encryption, pattern matching, and media content processing (compression, audio/video processing, etc.). In this paper, we propose a new NFV-GPGPU architecture and provide the results of its study in the case of accelerating a virtualized AES network function.
Benchmarking of high performance computing clusters with heterogeneous CPU/GPU architecture
Сухарев П.В., Васильев Н.П., Ровнягин М.М., Дурнов М.А.
Most modern supercomputers are clusters of computing nodes with a heterogeneous architecture: such a node contains both classical CPUs and specialized computing coprocessors (GPU- or MIC-based). Modern benchmark tests are oriented either toward CPU or toward specific coprocessor calculations, but not both together. This article describes an approach to building a unified performance test for a heterogeneous HPC cluster with CPU/GPU computing nodes.
Deep Learning Approach for QRS Wave Detection in ECG Monitoring
Митрохин Максим, Кузьмин Андрей, Митрохина Наталья, Захаров Сергей, Ровнягин Михаил
The paper describes a deep learning approach to QRS wave detection for use in mobile heart monitoring systems. The authors analyze the deep learning approach and its advantages in the field of feature extraction and detection, as well as deep network architecture. Two different variants of a deep network are proposed. An ECG data processing scheme that includes a neural network is described; it presumes preprocessing, filtering, windowing of the ECG signal, buffering, QRS wave detection and analysis. The network training process is mathematically grounded. Two variants of the neural network are experimentally tested. Training and test sets are obtained from the free ECG data bank PhysioNet.org. Experimental results show that a network with a decreasing number of neurons in the hidden layers has better generalization capability. Next steps of the research will include experiments with the training set size and determining its influence on the quality of detection.
Intelligent data processing scheme for mobile heart monitoring system
Кузьмин А., Митрохин М., Митрохина Н., Ровнягин М., Алимурадов А.
The paper describes some aspects of data processing in mobile heart monitoring systems. The authors highlight the important cardiological and social problem of arrhythmic pathologies of the heart and the possibilities of new medical equipment to detect arrhythmia. The architecture and features of up-to-date monitoring systems are investigated. The specificity of long-term ECG records is examined using the example of the PAF Prediction Challenge Database from physionet.org. A scheme for an open-architecture modular ECG device is developed for experimental research on HRV analysis and the prediction of arrhythmia paroxysms. A data processing scheme is proposed that enables the design of a portable monitoring system for detecting signs of arrhythmia and predicting arrhythmia paroxysms. A set of required ECG signal processing methods and algorithms is chosen.
Methods for implementation of Pseudo-Random Number Generator based on GOST R 34.12-2015 in hybrid CPU/GPU/FPGA high-performance systems
Скитев А.А., Ровнягин М.М., Мартынова Е.Н., Звягина М.И., Шелопугин К.Д., Чернова А.А.
The architecture of high-performance data storage and processing systems has changed considerably. Modern cloud computing systems are often not only hybrid but also support hardware acceleration. The paper describes the scope of information security protocols based on PRNGs in industrial systems. The work provides a method for implementing a GOST R 34.12-2015 based pseudo-random number generator in hybrid systems. The description and results of studies of parallel CUDA and FPGA versions of the algorithm for use in hybrid data centers are presented.
Modeling NoSQL Systems in Many-nodes Hybrid Environments
Ровнягин М.М., Чернилин Ф.Н., Гуминская А.В., Кинаш В.М., Мыльцин О.В., Орлов А.П., Кузьмин А.В.
Data search is one of the most important problems in the field of computer science. Classical relational DBMSs (RDBMSs), unfortunately, are not suitable as data storage systems for Big Data. Therefore, the NoSQL concept is now widespread. A common feature of such systems is high throughput and linear scalability, depending on the number of storage servers used. One of the most performant NoSQL systems at the moment is Apache Cassandra. In this paper, we suggest ways to model the performance of such systems in hybrid computing environments.
Increasing the functionality of the modern NoSQL-systems with GPGPU-technology
Козлов А.А., Алешина А.А., Каменских И.С., Ровнягин М.М., Синельников Д.М., Шульга Д.А.
The object of research is the distributed database management system Apache Cassandra and methods to improve its functionality by applying hybrid computing technologies. The encryption standard GOST 28147-89 has great potential for parallelization on the graphics core and can be used as an expansion module. Development was carried out in Java using the open source Apache Cassandra project and the JCuda parallelization library for GPGPU systems. The article contains an analysis of Apache Cassandra's architectural features and a review of possibilities for connecting expansion modules. The article presents a mathematical model of the proposed solution and experimental results (graphs of the data write rate to the database: in standard mode, with CPU encryption, and with GPU encryption).
Software development framework for a distributed storage and GPGPU data processing infrastructure
Каменских И.С., Синельников Д.М., Калинцев Д.С., Козлов А.А., Ровнягин М.М., Шульга Д.А.
The problem of choosing a cluster or a cluster node for task execution is important for the overall performance of a distributed system. This paper presents a complex approach to planning computations on heterogeneous distributed systems: a set of clusters and NoSQL storage systems. The dynamic scheduling algorithm depends on the inter-cluster network parameters, the characteristics of the cluster interconnect, compute node utilization, coprocessor computing capabilities, etc. In this work, Hadoop YARN, CUDA technology and the NoSQL system Apache Cassandra have been used as the experimental platform.
Software platform VAR for heterogeneous GPGPU-systems
Васильев Н.П., Ровнягин М.М.
Currently, new data is being generated at an avalanche-like rate. Classic RDBMSs are of little use as storage systems for Data Mining applications, and the concept of NoSQL (not only SQL) storage systems has become widespread. This article discusses some of the issues related to organizing storage and search in multicomputer systems. The modern principles of NoSQL systems are described, and opportunities to accelerate such systems through GPGPU technologies are examined. We present the results of comparative testing of the NoSQL-VAR system for high-performance search and secure data storage against Apache Cassandra.
The scheduling based on machine learning for heterogeneous CPU/GPU systems
Шульга Д.А., Капустин А.А., Козлов А.А., Козырев А.А., Ровнягин М.М.
Efficient use of all available computing devices is an important issue for heterogeneous computing systems. The ability to choose a CPU or GPU for a specific task has a positive impact on the performance of GPGPU systems: it helps to reduce the total processing time and to achieve uniform system utilization. In this paper, we propose a scheduler that selects the executing device, after prior training, based on the size of the input data. The article also contains plots and time characteristics that demonstrate the improvement in overall execution time depending on the input data. The program modules were developed in C++ using CUDA libraries.
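The training step the abstract mentions can be reduced, in the simplest case, to learning an input-size threshold from profiled runs: the GPU typically wins only once its launch overhead is amortized over a large enough input. The threshold rule and the synthetic timings below are our illustrative assumptions (the paper's modules are in C++/CUDA).

```python
# Size-threshold device selection learned from profiled (size, cpu, gpu)
# timings: a minimal stand-in for the trained scheduler in the abstract.
def learn_threshold(samples):
    """samples: list of (input_size, cpu_time, gpu_time). Return the smallest
    observed size at which the GPU was faster, or None if it never was."""
    gpu_wins = sorted(size for size, cpu_t, gpu_t in samples if gpu_t < cpu_t)
    return gpu_wins[0] if gpu_wins else None

def choose_device(size, threshold):
    """Route small inputs to the CPU, large ones to the GPU."""
    return "cpu" if threshold is None or size < threshold else "gpu"
```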
Cloud computing architectures for mobile robotics
Дюмин А.А., Пузиков Л.А., Ровнягин М.М., Урванов Г.А., Чугунков И.В.
In the last decade, classic IT infrastructures in modern enterprises have been changed significantly by cloud computing. However, the cloud approach can be used effectively in other applications as well. In this paper, we explore the applicability of the cloud paradigm in mobile robotics. The first approach is to use classic cloud architectures, such as Infrastructure-as-a-Service, Platform-as-a-Service and Software-as-a-Service, with a heterogeneous distributed mobile robotic system; for example, this can be used to offload time-consuming computations from mobile platforms to remote nodes or services. The second approach is to use special “robotic” types of clouds, such as Robot-as-a-Service and Function-as-a-Service, to expose the abilities of a single robotics platform in a mobile robotic system as services to the system's users or to other mobile robotic platforms. We explore both approaches and give architectural examples for each.
Evaluation of statistical properties of a modified Bloom filter for heterogeneous GPGPU-systems
Дюмин А.А., Кузнецов А.А., Ровнягин М.М.
The object of the research is a modified Bloom filter with counters, a probabilistic data structure that contains information about the items in a data store. As a result of filter operation, false-negative responses are excluded; however, false positives are possible. The system is developed in Java using the jCuda library to parallelize the algorithm on GPGPU systems. The article presents a mathematical evaluation of the probability of false positives for the modified Bloom filter with counters. A statistical analysis has been made for runs with different parameters: the number of hash functions, the counter length, the filter size, and the number of added elements. The article presents the corresponding graphs of the probability distribution of false positives.
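For orientation, a Bloom filter with counters replaces each bit with a small counter, which makes deletion possible while keeping the no-false-negatives guarantee; the textbook false-positive estimate for k hash functions, m counters and n inserted items is (1 - e^(-kn/m))^k. The sketch below is a generic single-threaded illustration, not the paper's parallel Java/jCuda system.

```python
# Generic counting Bloom filter: counters instead of bits allow removal;
# membership can yield false positives but never false negatives.
import hashlib, math

class CountingBloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # k independent positions derived from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):
        for p in self._positions(item):
            self.counters[p] = max(self.counters[p] - 1, 0)

    def __contains__(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))

def false_positive_rate(m, k, n):
    """Classical estimate of the false-positive probability."""
    return (1 - math.exp(-k * n / m)) ** k
```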
Three-dimensional data stochastic transformation algorithms for hybrid supercomputer implementation
Чугунков И.В., Иванов М.А., Ровнягин М.М., Скитев А.А., Спиридонов А.А., Васильев Н.А., Мулейс Р.Б., Михайлов Д.М.
This paper describes new three-dimensional algorithms of stochastic data transformation, offering a solution for information security problems. The most important feature of these algorithms is a high degree of parallelism at the level of elementary operations. In this paper we present a new 3D stochastic transformation called DOZEN, inspired by AES cipher, and two new constructions of S-box, called 2D and 3D S-boxes respectively.