
CN113641467A - Distributed block storage implementation method of virtual machine - Google Patents

Distributed block storage implementation method of virtual machine

Info

Publication number
CN113641467A
CN113641467A
Authority
CN
China
Prior art keywords
node
data storage
fragment
virtual disk
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111213142.2A
Other languages
Chinese (zh)
Other versions
CN113641467B (en)
Inventor
张吉祥
程行峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Youyun Technology Co ltd
Original Assignee
Hangzhou Youyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Youyun Technology Co ltd filed Critical Hangzhou Youyun Technology Co ltd
Priority to CN202111213142.2A priority Critical patent/CN113641467B/en
Publication of CN113641467A publication Critical patent/CN113641467A/en
Application granted granted Critical
Publication of CN113641467B publication Critical patent/CN113641467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45562Creating, deleting, cloning virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45579I/O management, e.g. providing access to device drivers or storage

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed block storage implementation method for a virtual machine, applied to a distributed storage system comprising one main control node, several client nodes, and several data storage nodes. The client nodes map virtual disks to virtual machines and send virtual disk operation requests to the main control node; the main control node processes and stores the information of the virtual disks; and the data storage nodes provide the physical storage space for the virtual disks. The method comprises the following steps: the main control node receives a command to create a virtual disk, selects a data storage node on which to create a fragment according to a preset rule, and sends the corresponding data storage node address to the client node; the client node then creates the fragment on that data storage node based on the received address, the fragment file being in Qcow2 format. By virtualizing multiple Qcow2 files into one large virtual disk, the method achieves balanced I/O without hot spots, high performance, and simple, easily maintained code.

Description

Distributed block storage implementation method of virtual machine
Technical Field
The invention relates to the field of distributed storage of virtual machines, in particular to a distributed block storage implementation method of a virtual machine.
Background
With the rapid development of cloud computing technology, Infrastructure as a Service (IaaS) is becoming more and more important as the foundation of cloud computing. Virtual machine services are the core of IaaS, so the importance of, and the requirements on, the storage services provided for virtual machines keep growing.
The virtual machines run by a cloud operator need storage that is highly reliable, scalable, and cheap. Traditional virtual machine storage falls into three major categories: open-system Direct Attached Storage (DAS), Network Attached Storage (NAS), and Storage Area Network (SAN). These conventional options struggle to meet the storage requirements of virtual machines in an IaaS scenario: they are hard to scale out indefinitely and offer insufficient reliability, and all three are typically closed-source vendor technologies that are expensive and cannot be operated and maintained by the cloud operator itself.
There is some open-source distributed block storage software available today, such as the commonly used Sheepdog, developed by NTT Laboratories in Japan in 2009 to provide distributed block storage for virtual machines (QEMU/KVM virtual machines). It is easy to deploy, and its code is simple and easy to maintain. The Sheepdog architecture is shown in FIG. 1: the I/O of a virtual machine is forwarded by the Qemu process to a gateway process, which then forwards it over the network to the object manager processes on other nodes.
However, such open-source distributed storage technology has the following disadvantages, taking Sheepdog as an example:
1. Sheepdog uses a consistent hashing algorithm, which splits the block data into small pieces and spreads them evenly across all nodes. This brings several drawbacks:
1) Data placement cannot be fully controlled. For example, to improve reliability data is usually stored in three replicas, with one replica required to sit on SSD (to improve read performance) and the other two on mechanical hard disks; Sheepdog cannot express such a policy. 2) When the number of storage nodes is small, the hashing algorithm easily produces data imbalance (the amount of data stored on each node differs greatly). 3) When one node goes down, the load on its neighboring nodes increases.
2. The huge number of fragments leads to poor storage performance. Each fragment is a small file on the storage back end. To make snapshots easy, Sheepdog combines 4 MB fragments into one large virtual disk, so a virtual disk consists of an enormous number of fragments; a full 8 TB disk would contain roughly two million of them. With so many fragments the operating system cannot keep all the file handles open, so every read or write has to go through open, read/write, and close in sequence. On open, the kernel must locate the fragment in the file system; on close, the operating system must flush the fragment's (file's) cache to the physical disk, so performance is low.
3. The large number of fragments makes management complex. When some hosts fail and the software cannot recover automatically, manual recovery is practically impossible in the face of such a massive number of fragments.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a distributed block storage implementation method for a virtual machine that virtualizes multiple Qcow2 files into one large virtual disk and simultaneously achieves balanced reads and writes without hot spots, high performance, and simple, easily maintained code.
The technical scheme adopted by the invention to overcome the above technical problems is as follows: a distributed block storage implementation method for a virtual machine, applied to a distributed storage system comprising at least one main control node, several client nodes, and several data storage nodes, wherein the client nodes map virtual disks to virtual machines and send virtual disk operation requests to the main control node, the main control node acts as the central node that processes and stores the information of the virtual disks, and the data storage nodes provide the physical storage space for the virtual disks. The method at least comprises the following steps: after receiving a command to create a virtual disk, the main control node selects, based on a preset rule, the data storage node on which the first fragment will be created and sends the corresponding data storage node address to the client node; the client node creates the fragment on that data storage node based on the received address, the fragment file being in Qcow2 format.
Further, the method for implementing distributed block storage of the virtual machine further includes: if the address of the data that the client node continues to write does not fall within the first fragment, the client node sends a command to create a second fragment to the main control node; the main control node selects a data storage node for storing the second fragment according to the preset rule, sends the address of the data storage node to which the second fragment belongs to the client node, and records it; the client node connects to the corresponding data storage node based on the received address, sends a message to create the new fragment, and then continues writing data; the data storage node creates the corresponding second fragment based on the received message and writes the data.
If the client keeps writing more data, new fragments continue to be created in the same way.
Further, the virtual disk comprises a plurality of groups, each group comprises a plurality of fragments, and the addresses of the data blocks at corresponding positions of the fragments within each group are interleaved.
Further, the configuration information for the number of fragments per group, the fragment size, and the data block size of the virtual disk is stored in the first fragment of the virtual disk.
The number of fragments in a group, the fragment size, and the block size can all be configured; placing these settings in the extended attributes of the first fragment allows the data distribution of each virtual disk to differ.
Further, the fragments of the virtual disk may be distributed on the same data storage node or on different data storage nodes.
Further, the client node is at least used for deploying the distributed block storage driver, and the data storage node is at least used for deploying the agent process; the distributed block storage driver is used for mapping the virtual disk to the virtual machine, sending a virtual disk operation request to the main control node, receiving corresponding fragment information, calculating a fragment address to be operated based on data distribution of the virtual disk, and forwarding data to agent processes of different data storage nodes based on the fragment address to be operated and an operation command of the virtual disk;
the agent process is used for operating the fragments of the data storage nodes based on the request of the distributed block storage driver.
Further, the method for implementing distributed block storage of the virtual machine further includes: if the client node requests to open the virtual disk, the client node requests all fragment information of the virtual disk from the main control node and, based on the obtained addresses of the data storage nodes to which the fragments belong, sends commands to those data storage nodes in turn to open the corresponding fragments; each data storage node returns the handle of the opened fragment to the client node, which stores it; if the client later requests a read or write, the fragment handle is sent to the data storage node, and the data storage node reads or writes the fragment data directly using the received handle.
Further, the method for implementing distributed block storage of the virtual machine further includes: if the client node requests to read or write the virtual disk, the fragments to be accessed are calculated based on the data distribution of the virtual disk, and the read/write requests are sent to the corresponding fragments based on the data storage node addresses of those fragments, which are queried from the main control node.
Further, the method for implementing distributed block storage of the virtual machine further includes: if the client node deletes the virtual disk, the client node requests all fragment information of the virtual disk from the main control node and sends a message to the data storage nodes based on the obtained fragment information, so that the data storage nodes delete the fragments corresponding to the virtual disk.
Further, the method for implementing distributed block storage of the virtual machine further includes: if the client node carries out snapshot on the virtual disk, the distributed block storage driver of the client node requests all fragment information of the virtual disk from the main control node to obtain data storage node addresses of all fragments of the virtual disk, and the distributed block storage driver of the client node sends snapshot information to corresponding data storage nodes based on the received data storage node addresses, so that the agent process of the data storage nodes carries out snapshot on the fragments corresponding to the virtual disk.
The invention has the beneficial effects that:
1. Data distribution is fully controlled: the placement algorithm is flexible and the fragment positions are completely controllable.
2. The open-source standard Qcow2 format is used as the storage back end, and Qcow2 natively supports snapshots. The method is therefore simple to implement, the amount of code is small, and the maintenance cost of the software is greatly reduced.
3. The storage back end uses a "fragmentation + interleaving" layout, so reads and writes are spread evenly across different hosts, achieving read-write balance and avoiding read-write hot spots.
4. Large fragments of at least 1 GB each are used, which reduces the number of fragments in the system and allows all fragment positions to be recorded directly in the main control node. When a cluster failure occurs, it is easy to determine which virtual disks have damaged fragments, so manual repair becomes feasible; moreover, the data node holding a piece of data does not have to be recalculated on every read or write.
5. The entire client read-write path runs in user mode. The path is short and efficient, and performs even better with newer storage such as SSD and NVMe; in addition, a user-mode program is easy to upgrade and maintain, whereas upgrading a kernel-mode program is very troublesome.
Drawings
FIG. 1 is a Sheepdog architecture diagram;
FIG. 2 is a diagram illustrating roles of a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed block store I/O flow according to an embodiment of the present invention;
fig. 4 is a schematic diagram of virtual disk data distribution according to an embodiment of the present invention.
Detailed Description
For further understanding of the invention, some of the terms mentioned in the present application will first be explained:
and (3) block storage: all data in the block device is partitioned into blocks of a fixed size, each block being assigned a number for addressing. The block store, which is typically a hard disk, may have non-sequential access to the stored data.
Distributed block storage: single-machine block storage is limited by the capacity of one machine, cannot be expanded without bound, and is vulnerable to single-machine failures. Cloud computing therefore generally adopts distributed storage, in which data is stored in multiple copies, each copy on a different host node, improving reliability and expansion capability.
Fragmentation: to implement functions such as I/O balancing and unlimited capacity expansion, distributed block storage generally divides a virtual disk by address offset into many small fragments for storage.
User mode/kernel mode: the operating system is divided into user mode and kernel mode. To improve reliability, a user-mode program cannot perform privileged operations directly; privileged operations such as I/O go through interfaces provided by the kernel. When user mode calls a kernel interface, the CPU switches modes to protect the kernel, and switches back to user mode once execution finishes. The cost of this switching is reduced performance.
Qemu process: Qemu is a widely used piece of open-source software that can emulate a wide variety of hardware environments, for example emulating a hard disk, and then run the guest virtual machine's operating system on the emulated hardware.
: the Qcow2 image format is one of the disk image formats supported by the open source software Qemu. It is a file format for emulating a hard disk to a virtual machine, which supports snapshot operations.
The invention is described in further detail below with reference to the figures and the specific embodiments, which are only exemplary and do not limit the scope of the invention.
The method for implementing distributed block storage of the virtual machine is applied to a distributed storage system which at least comprises 1 main control node, a plurality of client nodes and a plurality of data storage nodes, as shown in figure 2.
The three node roles (main control node, client node, and data storage node) can be deployed separately on different machines or together on the same machine; there is only one main control node, while the client nodes and data storage nodes can be scaled out horizontally. The main control node corresponds to the Master in FIG. 2, the client node to the Client, and the data storage node to the Chunk Server.
In some embodiments, the main control node may also be designed as a master-slave structure, and the client nodes and data storage nodes may scale to thousands of machines.
FIG. 3 is a diagram illustrating a flow of distributed block store I/O according to an embodiment of the present invention.
The distributed block storage driver is deployed on the client node, where operations such as creating a virtual disk, deleting a virtual disk, expanding a virtual disk, and taking a snapshot can be performed. For each of these operations the client node sends a request to the main control node, which processes and stores the information of the virtual disk.
The deployed distributed block storage driver is used for mapping the virtual disk to the virtual machine, sending a virtual disk operation request to the main control node, receiving corresponding fragment information, calculating a to-be-operated fragment address based on data distribution of the virtual disk, and forwarding data to agent processes of different data storage nodes based on the to-be-operated fragment address and an operation command of the virtual disk.
The main control node is the collector of the positions and state information of all fragments of the virtual disks. It keeps in memory which fragments each virtual disk is composed of, which data storage node each fragment resides on, and so on.
In some embodiments, the fragment information held by the main control node is uploaded to it by the data storage nodes at startup, and the main control node assigns the positions of newly created fragments to different data storage nodes according to a set rule.
The data storage node runs the agent process and is the real storage back end for the fragments of the virtual disk. A virtual disk is formed by combining multiple fragments according to a certain rule. The agent process handles the various commands sent by the client, such as open, read, write, close, and delete.
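As an illustration of the agent's role, the following Python sketch shows how such a process might dispatch the client's commands and hand fragment handles back to the caller. It treats each fragment as an opaque file, whereas the real agent stores the data in Qcow2 format; the class and method names are assumptions for this sketch, not the patent's API:

    import os

    class FragmentAgent:
        """Per-node agent sketch: serves open/read/write/close/delete on fragment files."""

        def __init__(self, data_dir):
            self.data_dir = data_dir
            self.handles = {}                 # handle id -> open file object

        def open_fragment(self, frag_name):
            f = open(os.path.join(self.data_dir, frag_name), "r+b")
            self.handles[id(f)] = f
            return id(f)                      # handle returned to and cached by the client

        def read(self, handle, offset, length):
            f = self.handles[handle]
            f.seek(offset)
            return f.read(length)

        def write(self, handle, offset, data):
            f = self.handles[handle]
            f.seek(offset)
            f.write(data)

        def close(self, handle):
            self.handles.pop(handle).close()

        def delete(self, frag_name):
            os.remove(os.path.join(self.data_dir, frag_name))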
In one embodiment of the invention, each shard is a Qcow2 file with capacity exceeding 1GB in a file system.
The method for implementing distributed block storage of a virtual machine disclosed by the invention at least comprises the following steps: after the main control node receives a command to create a virtual disk, it selects, based on a preset rule, the data storage node on which the first fragment will be created and sends the corresponding data storage node address to the client node; the client node creates the first fragment on that data storage node based on the received address, the fragment file being in Qcow2 format.
In an embodiment of the present invention the fragment file is stored in the Qcow2 format, which, unlike the fragment formats used by other distributed storage systems, has the following advantages:
1) Qcow2 is a mature, reliable storage format widely used in private clouds. Its structure is simple and easy to implement in code, and in some embodiments existing open-source code can be reused directly. 2) Because Qcow2 supports snapshots, snapshotting a virtual disk stored in this format is very simple: all fragments of the virtual disk are simply snapshotted at the same time. 3) Qcow2 itself supports thin allocation, so no additional thin-allocation algorithm needs to be implemented.
In addition, the preset rule may select the data storage node on which to create the fragment based on read-write performance, remaining capacity, or degree of idleness.
The following describes a distributed block storage implementation method of a virtual machine by using a process of creating a virtual disk according to an embodiment of the present invention.
The main control node receives the new-virtual-disk command from the client node and, based on the remaining capacity of the data storage nodes, selects the data storage node with the largest remaining capacity to store the new fragment. The main control node records the created virtual disk and the data storage node address of the new fragment, and sends the address of the data storage node to which the new fragment belongs to the client node.
Based on the received address of the data storage node to which the new fragment belongs, the client node connects to that data storage node when data is written for the first time, and the data storage node automatically creates the first fragment.
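A minimal sketch of such a placement rule follows, assuming each data storage node reports its remaining capacity to the main control node; the StorageNode fields, the optional disk-type filter, and all names are illustrative assumptions rather than anything specified in the patent:

    from dataclasses import dataclass

    @dataclass
    class StorageNode:
        address: str      # e.g. "10.0.0.12:7000" (hypothetical address format)
        free_bytes: int   # remaining capacity reported by the node
        disk_type: str    # e.g. "ssd" or "hdd"

    def pick_node(nodes, want_type=None):
        """Choose the data storage node for a new fragment: optionally filter by disk
        type, then take the node with the largest remaining capacity."""
        candidates = [n for n in nodes if want_type is None or n.disk_type == want_type]
        return max(candidates, key=lambda n: n.free_bytes)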
If the client node continues to write data but the data address is not in the first fragment, for example because the address falls in the second fragment, the client node sends a command to create the second fragment to the main control node. The main control node again selects a data storage node, according to the degree of idleness of the data storage nodes, to store the second fragment, sends the address of that data storage node to the client node, and records it. The client node connects to the corresponding data storage node using that address, sends a create-fragment message, and then continues writing data; the data storage node creates the corresponding second fragment based on the received message and writes the data.
When the client node writes data, the distributed block storage driver checks, according to the virtual disk's preset data distribution in the embodiment of the invention, whether the fragment covering the write position already exists, and creates it if it does not. When the subsequent write messages arrive at the data storage node, its agent process stores the data in Qcow2 format.
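The client-side write path just described might look like the Python sketch below; master.create_fragment, the agents mapping, and the agent methods are assumed stand-ins for the real messages exchanged between the nodes, and locate_fragment is the FIG. 4 address-mapping helper sketched after the data-distribution description further below:

    def client_write(master, agents, vdisk, offset, data):
        """Illustrative write path: find the target fragment, ask the main control node
        to create it if it does not exist yet, then forward the write to the agent of
        the owning data storage node."""
        frag_index, frag_offset = locate_fragment(offset)        # FIG. 4 mapping, sketched below
        frag_name = f"{vdisk.name}.{frag_index}"
        node_addr = vdisk.frag_addr.get(frag_index)              # None if the fragment was never created
        if node_addr is None:
            # assumed master RPC: picks a data storage node, records it, returns its address
            node_addr = master.create_fragment(vdisk.name, frag_index)
            vdisk.frag_addr[frag_index] = node_addr
            agents[node_addr].create_fragment(frag_name)         # assumed agent RPC: new Qcow2 fragment
        agents[node_addr].write_fragment(frag_name, frag_offset, data)   # assumed agent RPC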
In some embodiments, when a disk is created the main control node fully controls the placement of all fragments; the distribution of the virtual disk's fragments is determined entirely by a program in the main control node, which may use an I/O balancing algorithm or a special policy, for example based on capacity or on disk type.
The data distribution of the virtual disk in the embodiment of the present invention is shown in fig. 4. In the embodiment of the invention, the virtual disk is formed by combining a plurality of fragments, namely, a virtual disk is formed by combining a plurality of Qcow2 files.
In the embodiment shown in FIG. 4, the virtual disk is divided into N fragments by address offset and constructed in a "multi-fragment + interleaved distribution" manner. Four fragments form a group, and the data within each fragment is divided into 8 MB blocks. Data at virtual-disk offsets 0-8 MB is stored at offsets 0-8 MB of fragment 0, data at 8-16 MB at offsets 0-8 MB of fragment 1, data at 32-40 MB at offsets 8-16 MB of fragment 0, and so on, so the file storage of the virtual disk is interleaved. The configuration of the number of fragments per group, the fragment size, and the data block size is stored in the extended attributes of fragment 0, so the data distribution of each virtual disk can differ. Each fragment may reside on the same or a different data storage node, and the number of fragments per group, the fragment size, and the block size can all be configured according to actual requirements.
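The interleaving just described can be captured in one short mapping function. The Python sketch below uses the FIG. 4 parameters as defaults (groups of four fragments, 8 MB blocks, 1 GB fragments); the function name and structure are assumptions, but the two assertions reproduce the worked example given later in the description (a read-write position of 1 MB maps to fragment 0, and 42 MB maps to fragment 1 at offset 10 MB):

    MB = 1024 * 1024
    GB = 1024 * MB

    def locate_fragment(offset, frags_per_group=4, block_size=8 * MB, frag_size=1 * GB):
        """Map a virtual-disk byte offset to (fragment index, offset within that fragment)."""
        block = offset // block_size                          # global block number
        blocks_per_group = (frag_size // block_size) * frags_per_group
        group, in_group = divmod(block, blocks_per_group)     # which group of fragments
        row, frag_in_group = divmod(in_group, frags_per_group)
        frag_index = group * frags_per_group + frag_in_group
        frag_offset = row * block_size + offset % block_size
        return frag_index, frag_offset

    assert locate_fragment(1 * MB) == (0, 1 * MB)      # 1 MB  -> fragment 0, offset 1 MB
    assert locate_fragment(42 * MB) == (1, 10 * MB)    # 42 MB -> fragment 1, offset 10 MB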
The "first fragment" and "second fragment" described in the embodiments of the present invention do not refer to fixed fragments; in the virtual-disk creation flow described above, the first fragment corresponds to fragment 0 and the second fragment to fragment 1.
Constructing the virtual disk with multiple fragments and an interleaved distribution has the following advantages. Reads and writes are often sequential, that is, the virtual machine accesses consecutive offsets. If, for example, 0-32 MB is read sequentially, the read requests can be served simultaneously by the four fragments of the first group shown in FIG. 4, which avoids concentrating all requests on one fragment and making the I/O of its data storage node busy while the other data storage nodes stay idle. Because each read or write request can be split into several smaller requests completed in parallel by several machines, the read-write throughput of the system is improved. In addition, each fragment is made large, at least 1 GB, so reads and writes do not need to go through open, read/write, and close every time; once a fragment has been opened it does not need to be closed, which improves performance. Open-source software such as Sheepdog cannot do this: with 4 MB fragments, keeping everything open would require on the order of a million file handles for 4 TB of data, and when a handle is closed the operating system forces the cached writes out to the hard disk; since these flushes are not sequential writes, they consume a great deal of resources. With the technical scheme of the invention, 4 TB of data needs only about 4000 file handles, so fragments never need to be closed after being opened; every read or write goes straight to read/write without closing handles or flushing caches, and performance is higher. Nor could Sheepdog simply adopt files as large as 1 GB as fragments: writing data into a fragment after a fragment snapshot would then incur a very large delay, because it must copy a complete replica of the fragment as the snapshot before writing into it (with copy-on-write, only a record is made at snapshot time, and the data backup is deferred until the data is actually written), and copying such a large file takes a long time, producing very large I/O latency. Sheepdog therefore can only use small files such as 4 MB as fragments to avoid this huge delay.
In an embodiment of the present invention, when a client node requests to open a virtual disk for I/O, the distributed block storage driver on the client node requests all fragment information of the virtual disk from the main control node and, based on the obtained information, sends open requests to the agent processes of the data storage nodes where the fragments are located, as shown in FIG. 3. All fragments of the virtual disk are opened and their handles are stored, so that later read and write requests from the client use the stored handles directly.
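A sketch of this client-side open flow, using the same assumed master and agent interfaces as the earlier sketches (not the patent's actual API):

    def client_open(master, agents, vdisk_name):
        """Illustrative open flow: fetch the fragment list from the main control node,
        open every fragment on its data storage node, and cache the returned handles so
        that later reads and writes skip the per-I/O open and close."""
        fragments = master.list_fragments(vdisk_name)    # assumed RPC: [(frag_index, node_addr), ...]
        handles = {}
        for frag_index, node_addr in fragments:
            # assumed agent RPC: opens the Qcow2 fragment and returns a handle
            handles[frag_index] = agents[node_addr].open_fragment(f"{vdisk_name}.{frag_index}")
        return handles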
In an embodiment of the present invention, when reading or writing the virtual disk, all fragment locations and handles of the virtual disk are already stored from the open operation, so the distributed block storage driver only has to compute, from the position to be accessed and the data mapping rule of FIG. 4, which fragment the read or write request must be forwarded to. For example, if the read-write position is 1 MB, the request goes directly to fragment 0; if the read-write position is 42 MB, the request is forwarded to fragment 1 and the position within the fragment becomes 10 MB.
In an embodiment of the present invention, when a client node deletes a virtual disk, the distributed block storage driver on the client node requests all fragment information of the virtual disk from the main control node to obtain the data storage node addresses of all its fragments, and then sends a delete message to each corresponding data storage node so that its agent process deletes the fragments belonging to the virtual disk.
In another embodiment of the present invention, when the client node snapshots the virtual disk, the distributed block storage driver on the client node requests all fragment information of the virtual disk from the main control node to obtain the data storage node addresses of all its fragments, and then sends a snapshot message to each corresponding data storage node so that its agent process snapshots the fragments belonging to the virtual disk.
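A sketch of this snapshot fan-out under the same assumed interfaces; each agent could, for example, implement the per-fragment snapshot with "qemu-img snapshot -c" on its Qcow2 fragment file, although the patent does not prescribe the mechanism:

    def snapshot_vdisk(master, agents, vdisk_name, snap_name):
        """Illustrative snapshot: ask the main control node where every fragment lives,
        then tell each data storage node's agent to snapshot its Qcow2 fragment."""
        for frag_index, node_addr in master.list_fragments(vdisk_name):   # assumed master RPC
            # assumed agent RPC wrapping the per-fragment snapshot operation
            agents[node_addr].snapshot_fragment(f"{vdisk_name}.{frag_index}", snap_name)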
The foregoing merely illustrates the principles and preferred embodiments of the invention; many variations and modifications may be made by those skilled in the art in light of the foregoing description, all of which fall within the scope of the invention.

Claims (10)

1. A distributed block storage implementation method of a virtual machine is applied to a distributed storage system at least comprising 1 main control node, a plurality of client nodes and a plurality of data storage nodes, wherein the client nodes are used for mapping virtual disks to the virtual machine and sending requests for operating the virtual disks to the main control nodes, the main control nodes are used as central nodes for processing and storing information of the virtual disks, and the data storage nodes are used for providing physical storage space for the virtual disks, and the distributed block storage implementation method of the virtual machine at least comprises the following steps:
after receiving the command of creating the virtual disk, the master control node selects a data storage node for creating the first fragment based on a preset rule and sends a corresponding data storage node address to the client node;
the client node creates a first shard at the data storage node based on the received data storage node address, wherein the shard file is in a Qcow2 format.
2. The method for implementing distributed block storage of a virtual machine according to claim 1, further comprising: if the data address continuously written by the client node is not in the first fragment, the client node sends a command for creating a second fragment to the main control node;
the master control node selects a data storage node according to a preset rule for storing the second fragment, sends the data storage node address of the corresponding second fragment to the client node, and records the data storage node address of the second fragment;
the client node establishes connection with the corresponding data storage node based on the received data storage node address to which the second fragment belongs, sends a message for creating a new fragment to the data storage node, and then continues to write data;
the data storage node creates a corresponding second slice based on the received message and writes the data.
3. The method for implementing distributed block storage of the virtual machine according to any one of claims 1 to 2, wherein the virtual disk includes a plurality of groups, each group includes a plurality of fragments, and addresses of data blocks in corresponding positions of each fragment in each group are distributed in a staggered manner.
4. The method according to claim 3, wherein configuration information of the number of fragments, the size of the fragments, and the size of the data blocks in each group of the virtual disk is stored in an extended attribute of a first fragment of the virtual disk.
5. The method of claim 4, wherein each fragment of the virtual disk is distributed on the same or different data storage nodes.
6. The distributed block storage implementation method of the virtual machine according to claim 3, wherein the client node is at least used for deploying the distributed block storage driver, and the data storage node is at least used for deploying the agent process;
the distributed block storage driver is used for mapping the virtual disk to the virtual machine, sending a virtual disk operation request to the main control node, receiving corresponding fragment information, calculating a fragment address to be operated based on data distribution of the virtual disk, and forwarding data to agent processes of different data storage nodes based on the fragment address to be operated and an operation command of the virtual disk;
the agent process is used for operating the fragments of the data storage nodes based on the request of the distributed block storage driver.
7. The method of claim 6, further comprising:
if the client node requests to open the virtual disk, the client node requests all fragment information of the virtual disk from the main control node, and sequentially sends commands to the data storage node based on the address of the data storage node to which the obtained fragments belong so as to open the corresponding fragments, the data storage node returns the opened fragment handle to the client node, the client node stores the received fragment handle, if the client requests to read and write again, the fragment handle is sent to the data storage node, and the data storage node directly reads and writes fragment data based on the received fragment handle.
8. The method of claim 6, further comprising:
if the client node requests to read and write the virtual disk, the fragments needing to be read and written are calculated based on the data distribution of the virtual disk, and the read and write requests are sent to the corresponding fragments based on the data storage node addresses where the fragments to be read and written are inquired from the main control node.
9. The method of claim 6, further comprising:
if the client node deletes the virtual disk, the client node requests all fragment information of the virtual disk from the main control node and sends a message to the data storage nodes based on the obtained fragment information, so that the data storage nodes delete the fragments corresponding to the virtual disk.
10. The method of claim 6, further comprising:
if the client node carries out snapshot on the virtual disk, the distributed block storage driver of the client node requests all fragment information of the virtual disk from the main control node to obtain data storage node addresses of all fragments of the virtual disk, and the distributed block storage driver of the client node sends snapshot information to corresponding data storage nodes based on the received data storage node addresses, so that the agent process of the data storage nodes carries out snapshot on the fragments corresponding to the virtual disk.
CN202111213142.2A 2021-10-19 2021-10-19 Distributed block storage implementation method of virtual machine Active CN113641467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111213142.2A CN113641467B (en) 2021-10-19 2021-10-19 Distributed block storage implementation method of virtual machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111213142.2A CN113641467B (en) 2021-10-19 2021-10-19 Distributed block storage implementation method of virtual machine

Publications (2)

Publication Number Publication Date
CN113641467A true CN113641467A (en) 2021-11-12
CN113641467B CN113641467B (en) 2022-02-11

Family

ID=78427365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111213142.2A Active CN113641467B (en) 2021-10-19 2021-10-19 Distributed block storage implementation method of virtual machine

Country Status (1)

Country Link
CN (1) CN113641467B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114647388A (en) * 2022-05-24 2022-06-21 杭州优云科技有限公司 High-performance distributed block storage system and management method
CN115146318A (en) * 2022-09-02 2022-10-04 麒麟软件有限公司 Virtual disk safe storage method
CN117130980A (en) * 2023-10-24 2023-11-28 杭州优云科技有限公司 Virtual machine snapshot management method and device
CN117591246A (en) * 2024-01-18 2024-02-23 杭州优云科技股份有限公司 Method and device for realizing WEB terminal of KVM (keyboard video mouse) virtual machine

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103516755A (en) * 2012-06-27 2014-01-15 华为技术有限公司 Virtual storage method and equipment thereof
CN103761059A (en) * 2014-01-24 2014-04-30 中国科学院信息工程研究所 Multi-disk storage method and system for mass data management
WO2018054079A1 (en) * 2016-09-23 2018-03-29 华为技术有限公司 Method for storing file, first virtual machine and namenode
CN112148206A (en) * 2019-06-28 2020-12-29 北京金山云网络技术有限公司 Data reading and writing method and device, electronic equipment and medium
CN112527492A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Data storage method and device in distributed storage system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114647388A (en) * 2022-05-24 2022-06-21 杭州优云科技有限公司 High-performance distributed block storage system and management method
CN115146318A (en) * 2022-09-02 2022-10-04 麒麟软件有限公司 Virtual disk safe storage method
CN115146318B (en) * 2022-09-02 2022-11-29 麒麟软件有限公司 Virtual disk safe storage method
WO2024045407A1 (en) * 2022-09-02 2024-03-07 麒麟软件有限公司 Virtual disk-based secure storage method
CN117130980A (en) * 2023-10-24 2023-11-28 杭州优云科技有限公司 Virtual machine snapshot management method and device
CN117130980B (en) * 2023-10-24 2024-02-27 杭州优云科技有限公司 Virtual machine snapshot management method and device
CN117591246A (en) * 2024-01-18 2024-02-23 杭州优云科技股份有限公司 Method and device for realizing WEB terminal of KVM (keyboard video mouse) virtual machine
CN117591246B (en) * 2024-01-18 2024-05-03 杭州优云科技股份有限公司 Method and device for realizing WEB terminal of KVM (keyboard video mouse) virtual machine

Also Published As

Publication number Publication date
CN113641467B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN113641467B (en) Distributed block storage implementation method of virtual machine
US10346081B2 (en) Handling data block migration to efficiently utilize higher performance tiers in a multi-tier storage environment
CN107122127B (en) Storage operation offload to storage hardware
US9934108B2 (en) System and method for optimizing mirror creation
US8504670B2 (en) Virtualized data storage applications and optimizations
US9348842B2 (en) Virtualized data storage system optimizations
US8799557B1 (en) System and method for non-volatile random access memory emulation
EP2905709A2 (en) Method and apparatus for replication of files and file systems using a deduplication key space
US20220083247A1 (en) Composite aggregate architecture
US20050071560A1 (en) Autonomic block-level hierarchical storage management for storage networks
US11860791B2 (en) Methods for managing input-output operations in zone translation layer architecture and devices thereof
US20180260154A1 (en) Selectively storing data into allocations areas using streams
WO2012114338A1 (en) Cloud storage arrangement and method of operating thereof
JP2003162377A (en) Disk array system and method for taking over logical unit among controllers
CN109313538A (en) Inline duplicate removal
EP3322155B1 (en) Virtual disk processing method and apparatus
CN111164584B (en) Method for managing distributed snapshots for low latency storage and apparatus therefor
US20110088029A1 (en) Server image capacity optimization
JP4226350B2 (en) Data migration method
US11614879B2 (en) Technique for replicating oplog index among nodes of a cluster
CN117348968A (en) Cache data acceleration method, device and equipment of virtual disk
WO2016088258A1 (en) Storage system, backup program, and data management method
US12032849B2 (en) Distributed storage system and computer program product
JP5278254B2 (en) Storage system, data storage method and program
WO2024061212A1 (en) Data storage method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310000 room 611-612, Zhuoxin building, No. 3820, South Ring Road, Puyan street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Youyun Technology Co.,Ltd.

Country or region after: China

Address before: 310053 room 611-612, Zhuoxin building, 3820 South Ring Road, Puyan street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: Hangzhou Youyun Technology Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address