US20220100669A1 - Smart storage device - Google Patents
- Publication number
- US20220100669A1 (application US 17/403,862)
- Authority
- US
- United States
- Prior art keywords
- data
- smart
- memory
- protocol
- computation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/28—DMA
Definitions
- the present disclosure relates to a storage device, and more particularly, to a storage device using a Compute Express Link (CXL) interface.
- Modern storage devices are capable of storing larger amounts of data and are equipped to operate at faster speeds.
- a smart storage device includes a smart interface connected to a host device, an accelerator circuit connected to the smart interface through a data bus conforming to compute express link (CXL).cache protocol and a CXL.mem protocol, and configured to perform acceleration computation in response to a computation command of the host device and a storage controller connected to the smart interface through a data bus conforming to CXL.io protocol and configured to control a data access operation for a storage device in response to a data access command of the host device.
- the accelerator circuit may directly access the storage device through an internal bus connected directly to the storage controller.
- a smart storage device includes a smart interface connected to a host device, a memory controller circuit connected to the smart interface through a data bus conforming to CXL.cache protocol and a CXL.mem protocol, and configured to control a first access operation for a memory device.
- a storage controller is connected to the smart interface through a data bus conforming to CXL.io protocol and configured to control a second access operation for a storage device.
- the smart interface includes an internal connection that directly connects the data bus conforming to the CXL.mem protocol and the data bus conforming to the CXL.io protocol, so that the memory controller and the storage controller may be directly accessed.
- a smart storage device includes a smart interface connected to a host device.
- An accelerator circuit is connected to the smart interface through a data bus conforming to CXL.cache protocol and CXL.mem protocol, and configured to perform acceleration computation in response to a computation command of the host device.
- a storage controller is connected to the smart interface through a data bus conforming to a CXL.io protocol and configured to control a data access operation for a storage device in response to a data access command of the host device.
- An accelerator memory controller circuit is connected to the smart interface through the data bus conforming to the CXL.cache protocol and the CXL.mem protocol, and configured to control a second access operation for an accelerator memory device.
- the storage controller may be directly accessed by the accelerator circuit and the accelerator memory controller circuit.
- a method of operating a smart storage device includes receiving a command from a host device, transmitting the command to an accelerator circuit through a compute express link (CXL) interface, requesting, by the accelerator circuit, data access from a storage controller through an internal bus based on computation information extracted by decoding the command, accessing, by the storage controller, data from a storage device according to the request and receiving, by the accelerator circuit, a data access result received from the storage device to perform acceleration computation based on the command.
- FIG. 1 is a block diagram illustrating a smart storage device in accordance with example embodiments of the present disclosure
- FIG. 2 is a block diagram illustrating the smart storage device of FIG. 1 ;
- FIG. 3 is a block diagram illustrating the accelerator circuit of FIG. 2 ;
- FIG. 4 is a block diagram illustrating the storage controller of FIG. 2 ;
- FIGS. 5 and 6 are flowcharts illustrating a method of operating the smart storage device of FIG. 2 ;
- FIG. 7 is a block diagram illustrating the smart storage device of FIG. 1 ;
- FIG. 8 is a block diagram illustrating the smart interface of FIG. 1 ;
- FIGS. 9 to 11 are flowcharts illustrating a method of operating the smart storage device of FIG. 7 .
- FIG. 1 is a block diagram illustrating a smart storage device according to embodiments of the present disclosure.
- a host device 10 may correspond to a central processing unit (CPU), a graphic processing unit (GPU), a neural processing unit (NPU), a field-programmable gate array (FPGA), a processor, a microprocessor, an application processor (AP), or the like.
- the host device 10 may be implemented as a system-on-a-chip (SoC).
- the host device 10 may be a mobile system such as a portable communication terminal (mobile phone), a smart phone, a tablet computer, a wearable device, a healthcare device, an Internet of Things (IoT) device, a personal computer, a laptop/notebook computer, a server, a media player, or an automotive device such as a satellite navigation system.
- the host device 10 may include a communication device configured to transmit and receive signals between other devices outside the host device 10 according to various communication protocols.
- the communication device is a device that connects the host device 10 to a wired or wireless connection, and may include, for example, an antenna, a transceiver, and/or a modem.
- the host device 10 may be connected to, for example, an Ethernet network or may be connected to a wireless network through the communication device.
- the host device 10 may include a host processor 11 and a host memory 12 .
- the host processor 11 may control the overall operation of the host device 10
- the host memory 12 is a working memory and may store instructions, programs, data, or the like, that may be necessary for the operation of the host processor 11 .
- a smart storage device 1000 may be a data center or an artificial intelligence learning data device according to embodiments of the present disclosure.
- the smart storage device 1000 may be a semiconductor device capable of performing computations and storing data, such as processing-in-memory (PIM) or computing-in-memory (CIM).
- the smart storage device 1000 may include a smart interface 100 , an accelerator circuit 200 , a storage controller 300 , and a memory controller 400 .
- according to various embodiments, the smart storage device 1000 may include the smart interface 100, the accelerator circuit 200, and the storage controller 300; the smart interface 100, the storage controller 300, and the memory controller 400; or the smart interface 100, the accelerator circuit 200, the storage controller 300, and the memory controller 400.
- the smart storage device 1000 illustrated in FIG. 1 is a semiconductor device using a Compute Express Link (CXL) interface according to some embodiments.
- the smart interface 100 uses the CXL interface according to some embodiments.
- the CXL interface is a computer device interconnector standard, and is an interface that may reduce the overhead and waiting time of the host device and the smart storage device 1000 and may allow the storage space of the host memory and the memory device to be shared in a heterogeneous computing environment in which the host device 10 and the smart storage device 1000 operate together.
- the smart storage device 1000 of the present specification is based on the CXL standard.
- the host device 10 may be connected to at least one of the accelerator circuit 200 , the storage controller 300 , or the memory controller 400 through the smart interface 100 to control the overall operation of the smart storage device 1000 .
- the smart interface 100 is configured to utilize CXL sub-protocols such as CXL.io, CXL.cache, and CXL.mem.
- the CXL.io protocol is a PCIe-based transaction layer, which is used in the system for device discovery, interrupt management, register access, initialization processing, signal error handling, or the like.
- the CXL.cache protocol may be used when the accelerator circuit 200 accesses the host memory 12 of the host device.
- the CXL.mem protocol may be used when the host device 10 accesses an accelerator memory 290 of the accelerator circuit 200 (see FIG. 2 ) or the memory device 490 connected to the memory controller 400 (see FIG. 7 ).
- the accelerator circuit 200 may perform an acceleration computation according to a computation command of the host device 10 .
- the accelerator circuit 200 may be a neural network processing unit, an AI accelerator, a CPU, a graphical processing unit (GPU), a digital signal processing unit (DSP), a neural processing unit (NPU), a coprocessor, or another suitable processor.
- the storage controller 300 may be connected to at least one storage device 390 to control an operation of the storage device 390 .
- such control may include access operations such as reading, writing, or deleting data stored in the storage device 390.
- the at least one storage device 390 may include a non-volatile memory device (for example, NAND memory device) or some other suitable form of memory.
- the memory controller 400 may be connected to at least one memory device 490 (see FIG. 7 ) to control an operation of the memory device 490 .
- such control may include access operations such as reading, writing, or deleting data stored in the memory device 490.
- At least one storage device 390 connected to the storage controller 300 and at least one memory device 490 connected to the memory controller 400 may be included in the smart storage device 1000 , may be embedded, or may be implemented to be detachable. A detailed description is provided below.
- the memory controller 400 may maintain data coherence between the memory device 490 and the host memory 12 of the host device 10 with a very high bandwidth through the host device 10 and the CXL interface.
- the host device 10 may use the memory included in the smart storage device 1000 as a working memory supporting cache coherence, and may access data through memory load/store commands.
- Data coherence may be maintained by, for example, coherence processing according to the MESI protocol.
- the MESI protocol may define an inter-memory state between the memory device and the host device by including an invalid state, a shared state, a modified state, and an exclusive state, and may perform the coherence operation according to the defined state.
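The four-state coherence behavior described above can be illustrated with a minimal Python sketch of one agent's copy of a cache line; the transition rules below are textbook MESI behavior, and the class and method names are illustrative rather than taken from the disclosure.

```python
# Minimal MESI bookkeeping for one agent's copy of a cache line.
# Illustrative sketch only; names are not taken from the disclosure.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class CacheLine:
    def __init__(self):
        self.state = INVALID

    def local_read(self, other_has_copy: bool) -> str:
        # Reading an invalid line fetches it; the resulting state depends
        # on whether another agent also holds a copy of the line.
        if self.state == INVALID:
            self.state = SHARED if other_has_copy else EXCLUSIVE
        return self.state

    def local_write(self) -> str:
        # A local write gains exclusive ownership and dirties the line.
        self.state = MODIFIED
        return self.state

    def snoop_read(self) -> str:
        # Another agent's read demotes a modified/exclusive copy to shared.
        if self.state in (MODIFIED, EXCLUSIVE):
            self.state = SHARED
        return self.state

    def snoop_write(self) -> str:
        # Another agent's write invalidates this copy.
        self.state = INVALID
        return self.state
```

For example, an exclusive read followed by a local write, a snooped read, and a snooped write walks the line through the E, M, S, and I states in turn.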
- the smart storage device 1000 may perform direct access through an internal connection between the accelerator circuit 200 and the storage controller 300 , or between the storage controller 300 and the memory controller 400 , without the intervention of the host device 10 .
- FIG. 2 is a block diagram showing the smart storage device of FIG. 1 according to some embodiments.
- FIG. 3 is a block diagram illustrating the accelerator circuit of FIG. 2 according to some embodiments, and
- FIG. 4 is a block diagram illustrating the storage controller of FIG. 2 according to some embodiments.
- the accelerator circuit 200 may be connected to the host device 10 through the CXL.cache protocol and the CXL.mem protocol of the smart interface 100 .
- the accelerator circuit 200 may transmit and receive a command (A.CMD) and computation data (A.cache/mem) to and from the host device 10, and depending on which party sends the data, may transmit and receive the data by selecting either the CXL.cache protocol or the CXL.mem protocol.
- in this specification, references to CXL sub-protocols may refer to a data bus conforming to the respective CXL sub-protocol.
- for example, when the accelerator circuit 200 is described as connected to the host device 10 through the CXL.cache protocol and the CXL.mem protocol, it may be understood that the accelerator circuit 200 is connected to the host device 10 through a data bus that operates pursuant to the CXL.cache and CXL.mem sub-protocols of the CXL protocol.
- the accelerator circuit 200 may include at least one accelerator memory 290 .
- the accelerator memory 290 of the accelerator circuit 200 may be dedicated to the accelerator circuit 200, meaning that the accelerator memory 290 is accessible only by the accelerator circuit 200 and not by any other device independent of the accelerator circuit 200; that is, the accelerator memory 290 is not shared memory.
- the accelerator memory 290 may be a non-volatile memory or a volatile memory according to various embodiments.
- the accelerator memory 290 as a working memory may be a volatile memory such as dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM) according to some embodiments, or may be at least one of non-volatile memories according to some embodiments.
- the accelerator memory 290 may be implemented by being embedded in the accelerator circuit 200 according to various embodiments, may be electrically connected by being disposed outside the accelerator circuit 200 , or may be implemented as a detachable/removable memory to the accelerator circuit 200 .
- the storage controller 300 may be connected to the host device 10 through the CXL.io protocol of the smart interface 100 .
- the host device 10 and the storage controller 300 may transmit and receive a data access request (S.CMD) and data (S.Data) through the CXL.io protocol of the smart interface 100 .
- the storage controller 300 may include at least one storage device 390 .
- the storage device 390 may be a non-volatile memory device, and the non-volatile memory may include, for example, a flash memory (e.g., NAND flash or NOR flash, or the like), a hard drive, or a solid state drive (SSD) or other storage technology.
- the storage device 390 may perform delete, write, or read operation, or the like of data under the control of the storage controller 300 .
- the storage device 390 receives a command CMD and an address ADDR from the storage controller 300 through an input/output line, and transmits and receives data DATA for a program operation or a read operation to and from the storage controller 300 .
- the storage device 390 may receive a control signal CTRL through the control line, and the storage device 390 may receive power PWR from the storage controller 300 .
- the accelerator circuit 200 and the storage controller 300 may be connected to each other through an internal bus Ipath 1 .
- the accelerator circuit 200 may directly access the storage controller 300 through the internal bus Ipath 1 .
- the accelerator circuit 200 may directly request access to data of the storage device 390 without intervention of the host device 10 .
- the accelerator circuit 200 may include a command decoder circuit 210, a coherency engine 220 (which may also be referred to as a coherence engine), a direct memory access (DMA) engine 230, an accelerator memory controller 240, and a computation module 250 according to some embodiments, and the respective components may be electrically connected to each other through an accelerator system bus 201.
- the term “engine” may refer to a logic circuit that executes commands to perform a particular function.
- the command decoder circuit 210 When receiving a command, for example, a computation command from the host device 10 , the command decoder circuit 210 decodes the received computation command to extract computation information.
- the computation information may include, for example, a computation type, an address of data to be computed, or the like.
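Since the disclosure does not define a wire format for the computation command, the following Python sketch of a command decoder is purely hypothetical: it assumes a toy encoding of one opcode byte followed by a 4-byte big-endian data address, and the opcode table is an assumption.

```python
import struct

# Toy opcode table; the disclosure does not define actual opcodes.
OPCODES = {0x01: "CONV", 0x02: "MATMUL"}

def decode_computation_command(cmd: bytes) -> dict:
    """Extract computation information (a computation type and the address
    of the data to be computed) from an assumed 5-byte command layout."""
    opcode, addr = struct.unpack(">BI", cmd[:5])  # 1-byte opcode, 4-byte address
    return {"computation_type": OPCODES.get(opcode, "UNKNOWN"),
            "data_address": addr}
```

A decoded result such as `{"computation_type": "CONV", "data_address": 0x1000}` would then drive the subsequent data access request.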
- the coherency engine 220 maintains coherency between the data stored in the accelerator memory 290 of the accelerator circuit 200 and the data in the memory 12 of the host device 10 .
- coherence processing is performed so that the host device 10 uses the data stored in the accelerator memory 290 of the accelerator circuit 200 as a host-attached memory.
- the coherency engine 220 may perform coherence processing through the CXL.cache protocol to store the computation data also in the memory 12 of the host device in the same manner.
- the host device 10 may perform coherence processing for sharing data in the memory 12 of the host device to the accelerator memory 290 through the CXL.mem protocol.
- the DMA engine 230 may be connected to the internal bus Ipath 1 and may directly access the storage controller 300 . When it is necessary to write or read data to or from the storage device 390 according to the request of the computation module 250 or the host device 10 , the DMA engine 230 may request data access to the storage controller 300 .
- the accelerator memory controller 240 may control an operation of the accelerator memory 290 . For example, control may be performed so that computation data stored in the accelerator memory 290 is read or deleted, or new computation data is written.
- the computation module 250 may perform acceleration computation according to the decoded computation command. Acceleration computation may include signal processing and image signal processing according to some embodiments, as well as computation processing based on various types of networks such as neural processing, for example, convolution neural network (CNN), region with convolution neural network (R-CNN), region proposal network (RPN), recurrent neural network (RNN), stacking-based deep neural network (S-DNN), state-space dynamic neural network (S-SDNN), deconvolution network, deep belief network (DBN), restricted Boltzmann machine (RBM), fully convolutional network, long short-term memory (LSTM) network, classification network, or the like.
- the storage controller 300 may include a scheduler 310 , a control unit 320 , an internal memory 330 , and a non-volatile memory controller 340 according to some embodiments, and the respective components may be electrically connected to each other through an internal system bus 301 .
- the scheduler 310 may be connected to each of the internal bus Ipath 1 and the smart interface 100 , and may schedule the operation sequence according to a preset policy when receiving an access request from the host device 10 and an access request from the accelerator circuit 200 .
- the preset policy may be to give priority to an access request from the accelerator circuit 200 over an access request from the host device 10 according to some embodiments. Alternatively, priority may be given to process the urgent request of the host device 10 before other requests that have already been ordered.
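The preset policy described above (accelerator requests served before host requests, arrival order preserved within each class) can be sketched as a small priority queue; the class name and priority values below are illustrative, not taken from the disclosure.

```python
import heapq
import itertools

# Priority classes; a lower value is served first. Values are illustrative.
ACCELERATOR, HOST = 0, 1

class RequestScheduler:
    """Orders access requests per the preset policy: accelerator requests
    take priority over host requests; arrival order is kept within a class."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, source: int, request: str) -> None:
        heapq.heappush(self._heap, (source, next(self._seq), request))

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]
```

An urgent-host-first policy, as mentioned above, could be modeled the same way by assigning the urgent request a still-lower priority value.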
- the control unit 320 may control the overall operation of the storage controller 300 , and may perform, for example, data access operations such as writing, reading, or deleting data in the storage device 390 and the internal operation of the storage device 390 , or the like.
- the internal memory 330 may be a working memory of the storage controller 300 and may store operation data generated while the storage controller 300 is driven.
- the non-volatile memory controller 340 may control at least one non-volatile memory device 390 connected to the storage controller 300 .
- FIGS. 5 and 6 are flowcharts illustrating an operating method of the smart storage device of FIG. 2 .
- the host device 10 transmits a command to the smart storage device 1000 (step S 10 ).
- the smart interface 100 of the smart storage device 1000 checks which component the command is for, and selects and transmits the protocol of the corresponding component (step S 11). For example, when the host device 10 sends a computation command, the smart interface 100 connects through a protocol (CXL.cache or CXL.mem) for the accelerator circuit 200.
- the accelerator circuit 200 extracts computation information by decoding a received computation command CMD 1 (step S 12 ).
- the computation information may include, for example, a computation type, an address of data necessary for the computation, or the like.
- the computation command may include at least one operation to be performed by the accelerator circuit 200 .
- the computation command CMD 1 indicates a case where acceleration computation is performed based on data of the storage device 390 .
- the accelerator circuit 200 transmits a data access request to the storage controller 300 (step S 13 ).
- the access request may be directly requested to the storage controller 300 through the internal bus Ipath 1 without intervention of the host device 10 .
- the storage controller 300 When receiving the access request from the accelerator circuit 200 (step S 14 ), the storage controller 300 performs an operation according to the access request on the storage device 390 in an operation order determined according to a preset policy (step S 15 ). For example, the storage controller 300 schedules a plurality of access requests according to a preset policy through a scheduler to determine an operation order.
- the control unit 320 and the non-volatile memory controller 340 perform an access operation on the non-volatile memory device 390 according to an order determined by the scheduler 310 .
- the storage controller 300 transmits the performance result of the access to the accelerator circuit 200 (step S 16 ). For example, in the case of a data read request, the read data (hereinafter, first data) is returned, and in the case of a data write or deletion request, the performance completion is returned.
- the accelerator circuit 200 When receiving a performance result, for example, the read first data (step S 17 ), the accelerator circuit 200 performs coherence processing with the host device 10 to store the data in the accelerator memory 290 (step S 18 ). At this time, coherence processing may be performed through the CXL.cache protocol. The coherence processing may be performed by the coherence-related component on the side of the host device 10 and the coherency engine 220 , and after the coherency engine 220 confirms completion of the coherence processing from the host device 10 , the first data may be stored in the accelerator memory 290 through the accelerator memory controller 240 (step S 19 ).
- the accelerator circuit 200 reads the first data stored in the accelerator memory 290 as a subsequent operation and performs a computation (step S 20 ). In this case, the computation may be based on the type of computation included in the computation information.
- the accelerator circuit 200 performs coherence processing with the host device 10 to store the second data generated by performing the computation in the accelerator memory 290 (step S 21 ). At this time, coherence processing may be performed through the CXL.cache protocol.
- the accelerator memory controller 240 stores the second data in the accelerator memory 290 (step S 22 ).
- the accelerator circuit 200 transmits a completion message to the host device 10 through the smart interface 100 (step S 23 ).
- the completion message may include the second data or a value set based on the second data.
- the completion message is thereafter received by the host device 10 (step S 25 ).
- the above-described embodiment assumes a case where an acceleration computation is performed using data stored in the storage device 390, but embodiments of the present disclosure are not limited thereto, and the acceleration computation may be performed based on initial data in the accelerator memory 290 or in the memory 12 of the host device.
- sharing the acceleration computation result with the host device 10 may be performed as in the steps S 19 to S 25, but the steps S 13 to S 17 might not be performed depending on the location of the initial data to be read.
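The FIG. 5 flow (steps S 12 through S 23) can be summarized in a hedged Python sketch in which the storage device, accelerator memory, and computation module are stand-in objects; coherence processing (steps S 18 and S 21) is elided for brevity, and all names are illustrative.

```python
# Hedged sketch of the FIG. 5 flow (steps S12-S23), assuming stand-ins:
# `storage` models the storage device 390, `accelerator_memory` models the
# accelerator memory 290, and `compute` models the computation module 250.
def handle_computation_command(cmd, storage, accelerator_memory, compute):
    # S12: decode the command to extract computation information.
    info = {"type": cmd["type"], "addr": cmd["addr"]}
    # S13-S17: request data access directly from the storage controller
    # over the internal bus, without host intervention.
    first_data = storage[info["addr"]]
    # S19: store the fetched first data in the accelerator memory.
    accelerator_memory[info["addr"]] = first_data
    # S20: perform the acceleration computation on the first data.
    second_data = compute(first_data)
    # S22: store the computation result (second data).
    accelerator_memory["result"] = second_data
    # S23: return a completion message including the second data.
    return {"status": "complete", "result": second_data}
```

When the initial data already resides in the accelerator memory 290 or the host memory 12, the storage access step would simply be skipped, as noted above.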
- the smart storage device 1000 checks, at the smart interface 100, which component the command is for, and selects and transmits the protocol of the corresponding component (step S 31). For example, when the host device 10 requests data access, the smart interface 100 connects to the storage controller 300 through the CXL.io protocol.
- the scheduler 310 determines an operation sequence according to a preset policy.
- the control unit 320 and the non-volatile memory controller 340 perform a data access operation according to an order determined by the scheduler 310 (step S 32 ).
- the storage controller 300 transmits the performance result of the step S 32 to the host device 10 (step S 33 ). For example, when the command CMD 2 is a data read request, the read data is transmitted to the host device 10 , and when it is a data write or deletion request, the performance completion is transmitted to the host device 10 .
- the host device 10 receives the performance result through the storage controller 300 and the CXL.io protocol (step S 34 ).
- FIG. 7 is a block diagram showing the smart storage device of FIG. 1 according to some embodiments.
- FIG. 8 is a block diagram showing the smart interface of FIG. 1 according to some embodiments.
- the smart storage device 1000 may transform a signal received from the host device 10 into a signal of the CXL.mem protocol, the CXL.io protocol, or the CXL.cache protocol in the smart interface 100 , and may transmit the signal to each of the components 200 , 300 , and 400 .
- the smart interface 100 may include a plurality of layers to communicate with the host device 10 .
- Each layer may interpret the electrical signal transmitted and received based on a preset definition, and may transform the signal into a signal for operating each of the components (e.g., 200 , 300 , and 400 ) in the smart storage device 1000 .
- the smart interface 100 may include a physical layer 110, an arbiter 120, a link layer 130, and a transaction layer 140, and each layer operates based on the CXL interface standard.
- the smart interface 100 may further include various other communication layers.
- the physical layer 110 interprets an electrical signal transmitted to the host device 10 (TX) or received from the host device 10 (RX).
- the arbiter 120 may multiplex to decide which sub-protocol is used to send the signal outputted from the physical layer 110 .
- when a signal is for the accelerator circuit 200, it is output to a CXL.cache or CXL.mem link layer 131.
- when a signal is for the memory device 490, the storage device 390, or a heterogeneous device using a PCIe interface, it is output to a CXL.io link layer 132 or a PCIe link layer 133.
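The arbiter's sub-protocol selection can be modeled as a simple demultiplexer; the target labels below are illustrative stand-ins, and the mapping follows the routing described in the surrounding text rather than anything mandated by the CXL standard.

```python
# Demultiplexer sketch of the arbiter 120's sub-protocol selection.
# Target labels are illustrative, not part of the CXL specification.
def select_link_layer(target: str) -> str:
    if target == "accelerator":
        return "CXL.cache/CXL.mem link layer"   # accelerator circuit 200
    if target in ("memory_device", "storage_device"):
        return "CXL.io link layer"               # devices 490 and 390
    return "PCIe link layer"                     # heterogeneous PCIe devices
```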
- the transaction layer 140 receives a signal transmitted through the CXL.cache or CXL.mem link layer 131 , the CXL.io link layer 132 , or the PCIe link layer 133 through transaction layers 141 , 142 , and 143 corresponding to each protocol, and generates an output.
- the smart interface 100 includes an internal connection Ipath 2 directly connecting the CXL.mem protocol and the CXL.io protocol, and the internal connection Ipath 2 directly connects data access between the memory controller 400 and the storage controller 300 .
- the CXL.cache or CXL.mem link layer 131 and the CXL.io link layer 132 may be directly connected to each other through an internal connection bus IPath 2 .
- the storage controller 300 may be connected to the host device 10 through the CXL.io protocol of the smart interface 100 .
- the memory controller 400 may be connected through the CXL.mem protocol or the CXL.io protocol, and the storage controller 300 may be connected through the CXL.io protocol.
- the smart storage device 1000 may further include a router 500 , a memory protocol handler 700 , and a storage protocol handler 600 for more efficient data access among the components 200 , 300 , and 400 .
- the router 500 may be connected to the CXL.io transaction layer 142 and may route a signal received from the transaction layer to the memory controller 400 or the storage controller 300 .
- the router 500 may be disposed within the smart interface 100 according to some embodiments, or may be disposed and implemented separately from each of the smart interface 100, the storage controller 300, and the memory controller 400 according to some embodiments.
- the memory protocol handler 700 may be connected between the CXL.mem transaction layer 141 or the router 500 and the memory controller 400, may receive and transfer a data access request for the memory device 490 to the memory controller 400, and may return a request result from the memory controller 400 to the transaction layer 141 or the router 500.
- the memory protocol handler 700 may be disposed within the smart interface 100 according to some embodiments, may be disposed and implemented separately from the smart interface 100 and the memory controller 400 according to some embodiments, or may be disposed within the memory controller 400 according to some embodiments.
- the storage protocol handler 600 may be connected between the router 500 and the storage controller 300 , may receive and transfer a data access request for the storage device 390 to the storage controller 300 , and may return the request result to the transaction layer 142 .
- the storage protocol handler 600 may be disposed within the smart interface 100 according to some embodiments, may be disposed and implemented separately from the smart interface 100 and the storage controller 300 according to some embodiments, or may be disposed within the storage controller 300 according to some embodiments.
- the storage protocol handler 600 parses the access command to check the address of the requested data.
- the access command may include an operation type, an address of data, or data.
- when the address belongs to the storage device 390, the storage protocol handler 600 transfers the access command to the storage controller 300.
- when the address belongs to the memory device 490, the storage protocol handler 600 transfers the access command to the memory protocol handler 700 through the internal connection IPath 2.
- the memory controller 400 may perform a data access operation for the memory device 490 based on an access command transferred through the memory protocol handler 700 and notify the storage protocol handler 600 of the performance result.
- the storage protocol handler 600 may notify the host device 10 of the performance completion through the smart interface 100 .
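The address check described above can be sketched as follows. This is a toy model, not the patent's implementation: the address ranges, the `AccessCommand` field names, and the returned labels are invented for illustration.

```python
# Hypothetical sketch of the address check performed by the storage
# protocol handler 600: commands whose address falls in the storage range
# go to the storage controller 300, while addresses belonging to the
# memory device 490 are forwarded over the internal connection IPath 2
# to the memory protocol handler 700. Address ranges are invented.

from dataclasses import dataclass

STORAGE_RANGE = range(0x0000, 0x8000)   # assumed storage device 390 addresses
MEMORY_RANGE = range(0x8000, 0x10000)   # assumed memory device 490 addresses

@dataclass
class AccessCommand:
    op: str          # operation type, e.g. "read" or "write"
    addr: int        # address of the data
    data: bytes = b""

def route(cmd: AccessCommand) -> str:
    """Return which component should service the parsed access command."""
    if cmd.addr in STORAGE_RANGE:
        return "storage controller 300"
    if cmd.addr in MEMORY_RANGE:
        return "memory protocol handler 700 (via IPath2)"
    raise ValueError(f"address {cmd.addr:#x} outside known ranges")
```

The design point is that the host issues a single CXL.io command and the handler, not the host, decides which backing device is involved.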
- the memory controller 400 may control an operation of the memory device 490 . For example, control may be performed so that computation data stored in the memory device 490 may be read or deleted, or new computation data may be written.
- the memory device 490 may be a volatile memory such as dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM) according to some embodiments, or may be at least one of non-volatile memories according to some embodiments.
- a nonvolatile memory may be implemented as at least one of, for example, one time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash ROM, flash memory (e.g., NAND flash or NOR flash, or the like), a hard drive, or a solid state drive (SSD).
- the memory device 490 may also be implemented as a removable memory such as a memory card (e.g., compact flash (CF), secure digital (SD), micro secure digital (Micro-SD), mini secure digital (Mini-SD), extreme digital (xD), or multi-media card (MMC)) or an external memory connectable through a USB port (e.g., a USB memory).
- the memory device 490 may be, for example, a working memory of the smart storage device 1000 .
- the memory device 490 may be implemented in the form of an embedded memory according to the purpose of storing data, or may be implemented in the form of a removable memory.
- data for driving the smart storage device 1000 may be stored in a memory embedded in the smart storage device 1000
- data for an extended function of the smart storage device 1000 may be stored in a memory that is removable from the smart storage device 1000.
- FIGS. 9 to 11 are flowcharts illustrating an operating method of the smart storage device of FIG. 7 .
- when the host device 10 transmits a command (step S 100 ), the smart interface 100 checks a protocol based on a target to which the command is to be transmitted (step S 101 ).
- the storage protocol handler 600 checks the address information of the data to be requested from the parsed command (step S 111 ), and when the address information (Storage ID) belongs to the storage device 390 , the data access command CMD 1 is transmitted to the storage controller 300 .
- the storage controller 300 receives the data access command CMD 1 (step S 112 ), reads the first data based on the address information (step S 113 ), and transmits the first data to the storage protocol handler 600 .
- the storage protocol handler 600 notifies the host device 10 of the performance completion by sending a completion message (step S 119 ) that is received by the host device 10 (step S 120 ).
- the storage protocol handler 600 transfers the data access command CMD 2 and the first data to the memory protocol handler 700 through the internal connection IPath 2 .
- the memory protocol handler 700 receives the data access command CMD 2 and first data (step S 115 ), and requests data access to the memory device 490 from the memory controller 400 (step S 116 ).
- the memory controller 400 writes the first data to the memory device 490 (step S 117 ) and transmits an access completion message to the memory protocol handler 700 .
- the memory protocol handler 700 notifies the storage protocol handler 600 of the performance completion (step S 118 ), and the storage protocol handler 600 finally notifies the host device 10 of the performance completion (step S 120 ).
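The sequence of steps S 111 to S 120 amounts to a storage-to-memory copy that completes without host involvement. A toy sketch follows, in which dictionaries stand in for the storage device 390 and the memory device 490 and the addresses are invented for the example.

```python
# Sketch of the flow above: data read from the storage device is handed
# to the memory side over the internal connection IPath 2 and written to
# the memory device, with only a completion message returned to the host.
# All objects here are stand-ins for the hardware blocks.

storage_device = {0x10: b"first-data"}   # assumed contents of storage device 390
memory_device = {}                        # memory device 490 (initially empty)

def transfer_storage_to_memory(src_addr: int, dst_addr: int) -> str:
    data = storage_device[src_addr]   # steps S 112-S 113: storage controller 300 reads
    memory_device[dst_addr] = data    # steps S 115-S 117: memory controller 400 writes (via IPath 2)
    return "performance completion"   # steps S 118-S 120: completion reported to host device 10
```

Note that the host sees only the final completion message; the data itself never crosses the host interface.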
- the storage controller 300 receives the data access command CMD 2 from the storage protocol handler 600 (step S 130 ) and generates a request for data (step S 131 ).
- the storage protocol handler 600 transfers the data access command CMD 3 to the memory protocol handler 700 through the internal connection IPath 2 .
- the memory protocol handler 700 receives the data access command CMD 3 (step S 133 ), and requests data access to the memory device 490 from the memory controller 400 (step S 134 ).
- the memory controller 400 reads the third data from the memory device 490 (step S 135 ).
- the storage controller 300 writes the third data received through the internal connection Ipath 2 to the storage device 390 (step S 136 ) and transmits a completion message to the memory protocol handler 700 (step S 137 ).
- the storage protocol handler 600 finally notifies the host device 10 of the performance completion (step S 138 ).
- the command CMD sent from the host device 10 is transmitted to the memory protocol handler 700 (step S 140 ).
- the memory protocol handler 700 transmits an access request according to a command CMD 4 to the memory controller 400 (step S 141 ), and the memory controller 400 performs an operation corresponding to the request for the memory device 490 (step S 142 ), and then notifies the memory protocol handler 700 of the performance result.
- the memory protocol handler 700 transfers the performance result to the host device 10 (step S 143 ) and the host 10 receives the performance result (step S 144 ).
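This direct path from the host to the memory device through the memory protocol handler 700 can be sketched with two stand-in classes. The class and method names are invented for illustration; they model steps S 140 to S 144 only in outline.

```python
# Minimal sketch of the direct memory path: a host command reaches the
# memory protocol handler 700 and is serviced by the memory controller 400,
# whose result is returned straight to the host. Names are stand-ins.

class MemoryController:
    """Stand-in for the memory controller 400 and memory device 490."""
    def __init__(self):
        self.cells = {}
    def access(self, op, addr, data=None):
        if op == "write":                 # step S 142: perform the requested operation
            self.cells[addr] = data
            return "write-done"
        if op == "read":
            return self.cells[addr]
        raise ValueError(f"unsupported op: {op}")

class MemoryProtocolHandler:
    """Stand-in for the memory protocol handler 700."""
    def __init__(self, controller):
        self.controller = controller
    def handle(self, op, addr, data=None):
        # steps S 141 and S 143: forward to the controller, return result to host
        return self.controller.access(op, addr, data)
```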
- the above-described smart storage device 1000 may allow the host device 10 to treat at least two constituents among the components of the smart storage device 1000, for example, the accelerator circuit, the storage device, and the memory device, as separate devices or as a single device through a single smart interface. Accordingly, the latency overhead incurred when the host device 10 must intervene to transfer data may be reduced, and since a physical connection between the components is provided, software overhead for maintaining data coherence may be reduced.
Description
- This application claims priority under 35 U.S.C. 119 to Korean Patent Application No. 10-2020-0126199, filed on Sep. 28, 2020, and Korean Patent Application No. 10-2021-0007897, filed on Jan. 20, 2021, the contents of which are herein incorporated by reference in their entirety.
- The present disclosure relates to a storage device, and more particularly, to a storage device using a Compute Express Link (CXL) interface.
- Modern storage devices are capable of storing larger amounts of data and are equipped to operate at faster speeds.
- However, host devices, such as central processing units (CPUs) and graphics processing units (GPUs), are most often connected to semiconductor devices, such as memory devices, through data buses operating pursuant to peripheral component interconnect express (PCIe) protocols. Data buses such as PCIe have relatively low bandwidth and long delays, and problems related to coherency and memory sharing may commonly occur with semiconductor devices arranged in this manner.
- A smart storage device includes a smart interface connected to a host device; an accelerator circuit connected to the smart interface through a data bus conforming to a compute express link (CXL) CXL.cache protocol and a CXL.mem protocol, and configured to perform acceleration computation in response to a computation command of the host device; and a storage controller connected to the smart interface through a data bus conforming to a CXL.io protocol and configured to control a data access operation for a storage device in response to a data access command of the host device. The accelerator circuit is directly accessible to the storage device through an internal bus connected directly to the storage controller.
- A smart storage device includes a smart interface connected to a host device, a memory controller circuit connected to the smart interface through a data bus conforming to CXL.cache protocol and a CXL.mem protocol, and configured to control a first access operation for a memory device. A storage controller is connected to the smart interface through a data bus conforming to CXL.io protocol and configured to control a second access operation for a storage device. The smart interface includes an internal connection directly connecting the data bus conforming to the CXL.mem protocol and the CXL.io protocol to directly access the memory controller and the storage controller.
- A smart storage device includes a smart interface connected to a host device. An accelerator circuit is connected to the smart interface through a data bus conforming to CXL.cache protocol and CXL.mem protocol, and configured to perform acceleration computation in response to a computation command of the host device. A storage controller is connected to the smart interface through a data bus conforming to a CXL.io protocol and configured to control a data access operation for a storage device in response to a data access command of the host device. An accelerator memory controller circuit is connected to the smart interface through the data bus conforming to the CXL.cache protocol and the CXL.mem protocol, and configured to control a second access operation for an accelerator memory device. The storage controller is directly accessible to the accelerator circuit and the accelerator memory controller circuit.
- A method of operating a smart storage device includes receiving a command from a host device, transmitting the command to an accelerator circuit through a compute express link (CXL) interface, requesting, by the accelerator circuit, data access from a storage controller through an internal bus based on computation information extracted by decoding the command, accessing, by the storage controller, data from a storage device according to the request and receiving, by the accelerator circuit, a data access result received from the storage device to perform acceleration computation based on the command.
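The claimed method can be illustrated end to end with a toy model. This is a hedged sketch: the command format, the fake storage contents, and the "computation" (summing bytes) are all invented for the example and stand in for the real command decoding and acceleration computation.

```python
# Illustrative walk-through of the claimed operating method: receive a
# command, decode it in the accelerator, fetch data from storage over the
# internal bus, and perform the acceleration computation. All names and
# the command format are stand-ins, not the patent's actual encoding.

storage = {0x40: bytes([1, 2, 3, 4])}   # assumed contents of the storage device

def decode(command: dict) -> tuple:
    """Extract computation information (computation type, data address)."""
    return command["op"], command["addr"]

def run_command(command: dict) -> int:
    op, addr = decode(command)   # accelerator circuit decodes the command
    data = storage[addr]         # storage controller services the internal-bus request
    if op == "sum":              # accelerator performs the acceleration computation
        return sum(data)
    raise ValueError(f"unsupported op: {op}")
```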
- The above and other aspects and features of the present disclosure will become more apparent by describing in detail various embodiments thereof with reference to the attached drawings, in which:
- FIG. 1 is a block diagram illustrating a smart storage device in accordance with example embodiments of the present disclosure;
- FIG. 2 is a block diagram illustrating the smart storage device of FIG. 1;
- FIG. 3 is a block diagram illustrating the accelerator circuit of FIG. 2;
- FIG. 4 is a block diagram illustrating the storage controller of FIG. 2;
- FIGS. 5 and 6 are flowcharts illustrating a method of operating the smart storage device of FIG. 2;
- FIG. 7 is a block diagram illustrating the smart storage device of FIG. 1;
- FIG. 8 is a block diagram illustrating the smart interface of FIG. 1; and
- FIGS. 9 to 11 are flowcharts illustrating a method of operating the smart storage device of FIG. 7.
- Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings.
- FIG. 1 is a block diagram illustrating a smart storage device according to embodiments of the present disclosure.
- In some embodiments, a
host device 10 may correspond to a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), a field-programmable gate array (FPGA), a processor, a microprocessor, an application processor (AP), or the like. According to some embodiments, the host device 10 may be implemented as a system-on-a-chip (SoC). For example, the host device 10 may be a mobile system such as a portable communication terminal (mobile phone), a smart phone, a tablet computer, a wearable device, a healthcare device, an Internet of Things (IoT) device, a personal computer, a laptop/notebook computer, a server, a media player, or an automotive device such as a satellite navigation system. In addition, the host device 10 may include a communication device configured to transmit and receive signals to and from other devices outside the host device 10 according to various communication protocols. The communication device connects the host device 10 to a wired or wireless network and may include, for example, an antenna, a transceiver, and/or a modem. The host device 10 may be connected to, for example, an Ethernet network, or may be connected to a wireless network through the communication device.
- The host device 10 may include a host processor 11 and a host memory 12. The host processor 11 may control the overall operation of the host device 10, and the host memory 12 is a working memory that may store instructions, programs, data, or the like, that may be necessary for the operation of the host processor 11.
- A smart storage device 1000 may be a data center or an artificial intelligence learning data device according to embodiments of the present disclosure. The smart storage device 1000 may be a semiconductor device capable of performing computations and storing data, such as a processing-in-memory (PIM) or computing-in-memory (CIM) device.
- The smart storage device 1000 may include a smart interface 100, an accelerator circuit 200, a storage controller 300, and a memory controller 400. The smart storage device 1000 may include the smart interface 100, the accelerator circuit 200, and the storage controller 300 according to some embodiments, may include the smart interface 100, the storage controller 300, and the memory controller 400 according to some embodiments, or may include the smart interface 100, the accelerator circuit 200, the storage controller 300, and the memory controller 400 according to some embodiments.
- The smart storage device 1000 illustrated in FIG. 1 is a semiconductor device using a Compute Express Link (CXL) interface according to some embodiments. The smart interface 100 uses the CXL interface according to some embodiments.
- The CXL interface is a computer device interconnect standard, and is an interface that may reduce the overhead and waiting time of the host device 10 and the smart storage device 1000 and may allow the storage space of the host memory and the memory device to be shared in a heterogeneous computing environment in which the host device 10 and the smart storage device 1000 operate together. For example, the host device 10 may directly communicate and share memory with a system-on-chip or GPU that performs complex computations, or with an acceleration module such as a field-programmable gate array (FPGA). The smart storage device 1000 of the present specification is based on the CXL standard.
- The host device 10 may be connected to at least one of the accelerator circuit 200, the storage controller 300, or the memory controller 400 through the smart interface 100 to control the overall operation of the smart storage device 1000.
- The smart interface 100 is configured to utilize CXL sub-protocols such as CXL.io, CXL.cache, and CXL.mem. The CXL.io protocol is a PCIe-based transaction layer, which is used in the system for device discovery, interrupt management, providing access by registers, initialization processing, signal error processing, or the like. The CXL.cache protocol may be used when the accelerator circuit 200 accesses the host memory 12 of the host device. The CXL.mem protocol may be used when the host device 10 accesses an accelerator memory 290 of the accelerator circuit 200 (see FIG. 2) or the memory device 490 connected to the memory controller 400 (see FIG. 7).
- The
accelerator circuit 200 may perform an acceleration computation according to a computation command of the host device 10. According to some embodiments, the accelerator circuit 200 may be a neural network processing unit, an AI accelerator, a CPU, a graphics processing unit (GPU), a digital signal processing unit (DSP), a neural processing unit (NPU), a coprocessor, or another suitable processor.
- The storage controller 300 may be connected to at least one storage device 390 to control an operation of the storage device 390. For example, the operation may include an access operation such as reading or deleting data stored in the storage device 390 or writing data. The at least one storage device 390 may include a non-volatile memory device (for example, a NAND memory device) or some other suitable form of memory.
- The memory controller 400 may be connected to at least one memory device 490 (see FIG. 7) to control an operation of the memory device 490. For example, the operation may include an access operation such as reading or deleting data stored in the memory device 490 or writing data.
- According to some embodiments, at least one storage device 390 connected to the storage controller 300 and at least one memory device 490 connected to the memory controller 400 may be included in the smart storage device 1000, may be embedded, or may be implemented to be detachable. A detailed description is provided below.
- The memory controller 400 may maintain data coherence between the memory device 490 and the host memory 12 of the host device 10 with a very high bandwidth through the host device 10 and the CXL interface. For example, the host device 10 may use the memory included in the smart storage device 1000 as a working memory that supports cache coherence, and may access data in the memory through a load/store memory command. Data coherence may be maintained by, for example, coherence processing according to the MESI protocol. The MESI protocol defines the inter-memory state between the memory device and the host device as one of an invalid state, a shared state, a modified state, and an exclusive state, and the coherence operation is performed according to the defined state.
- When performing data access among the accelerator circuit 200, the storage controller 300, and the memory controller 400, the smart storage device 1000 may perform direct access through an internal connection between the accelerator circuit 200 and the storage controller 300, or between the storage controller 300 and the memory controller 400, without the intervention of the host device 10.
-
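The MESI-based coherence processing mentioned above tracks each shared line in one of four states. The following is a generic textbook-style MESI transition table, offered as an illustration of the state bookkeeping rather than the patent's specific coherence engine; the event names and the simplifying assumptions are noted in the comments.

```python
# Hedged sketch of MESI-style state bookkeeping: each cache line is in one
# of the Modified / Exclusive / Shared / Invalid states, and local or
# remote reads and writes drive the transitions. This is a generic MESI
# table with simplifying assumptions, not the patent's implementation.

MESI_STATES = {"M", "E", "S", "I"}

def next_state(state: str, event: str) -> str:
    """Return the line's next MESI state; unknown pairs leave it unchanged."""
    transitions = {
        ("I", "local_read"): "E",    # assume no other sharer holds the line
        ("I", "local_write"): "M",
        ("E", "local_write"): "M",
        ("E", "remote_read"): "S",
        ("S", "local_write"): "M",   # after invalidating the other sharers
        ("S", "remote_write"): "I",
        ("M", "remote_read"): "S",   # after writing the dirty line back
        ("M", "remote_write"): "I",
    }
    return transitions.get((state, event), state)
```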
FIG. 2 is a block diagram showing the smart storage device of FIG. 1 according to some embodiments. FIG. 3 is a block diagram illustrating the accelerator circuit of FIG. 2 according to some embodiments, and FIG. 4 is a block diagram illustrating the storage controller of FIG. 2 according to some embodiments.
- Referring to FIGS. 2 to 4, according to some embodiments, the accelerator circuit 200 may be connected to the host device 10 through the CXL.cache protocol and the CXL.mem protocol of the smart interface 100. The accelerator circuit 200 may transmit and receive a command (A.CMD) and computation data (A.cache/mem) to and from the host device 10, and depending on the subject sending the data, may transmit and receive data by selecting one of the CXL.cache protocol or the CXL.mem protocol.
- As described herein, reference to the various CXL sub-protocols might be used to refer to a data bus conforming to the respective CXL sub-protocol. Thus, when it is said that the accelerator circuit 200 is connected to the host device 10 through the CXL.cache protocol and the CXL.mem protocol, it may be understood that the accelerator circuit 200 is connected to the host device 10 through a data bus that operates pursuant to the CXL.cache and CXL.mem sub-protocols of the CXL protocol.
- The accelerator circuit 200 may include at least one accelerator memory 290. The accelerator memory 290 may be dedicated to the accelerator circuit 200, which may be understood to mean that the memory 290 is only accessible by the accelerator circuit 200 and is not accessible by any other device independent of the accelerator circuit 200. Thus, the accelerator memory is not shared memory. The accelerator memory 290 may be a non-volatile memory or a volatile memory according to various embodiments. As a working memory, the accelerator memory 290 may be a volatile memory such as dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM) according to some embodiments, or may be at least one of non-volatile memories according to some embodiments.
- The
accelerator memory 290 may be embedded in the accelerator circuit 200 according to various embodiments, may be electrically connected while disposed outside the accelerator circuit 200, or may be implemented as a detachable/removable memory with respect to the accelerator circuit 200.
- According to some embodiments, the storage controller 300 may be connected to the host device 10 through the CXL.io protocol of the smart interface 100. The host device 10 and the storage controller 300 may transmit and receive a data access request (S.CMD) and data (S.Data) through the CXL.io protocol of the smart interface 100.
- The storage controller 300 may be connected to at least one storage device 390. The storage device 390 may be a non-volatile memory device, and the non-volatile memory may include, for example, a flash memory (e.g., NAND flash or NOR flash, or the like), a hard drive, a solid state drive (SSD), or other storage technology.
- The storage device 390 may perform a delete, write, or read operation, or the like, of data under the control of the storage controller 300. To this end, the storage device 390 receives a command CMD and an address ADDR from the storage controller 300 through an input/output line, and transmits and receives data DATA for a program operation or a read operation to and from the storage controller 300. In addition, the storage device 390 may receive a control signal CTRL through a control line, and the storage device 390 may receive power PWR from the storage controller 300.
- According to some embodiments, the accelerator circuit 200 and the storage controller 300 may be connected to each other through an internal bus Ipath1. The accelerator circuit 200 may directly access the storage controller 300 through the internal bus Ipath1. For example, the accelerator circuit 200 may directly request access to data of the storage device 390 without intervention of the host device 10.
- As is shown in
FIG. 3, the accelerator circuit 200 may include a command decoder circuit 210, a coherency engine 220 (which may also be referred to as a coherence engine), a direct memory access (DMA) engine 230, an accelerator memory controller 240, and a computation module 250 according to some embodiments, and the respective components may be electrically connected to each other through an accelerator system bus 201. As used herein, the term "engine" may refer to a logic circuit executing commands to perform a particular function.
- When receiving a command, for example, a computation command from the host device 10, the command decoder circuit 210 decodes the received computation command to extract computation information. The computation information may include, for example, a computation type, an address of data to be computed, or the like.
- The coherency engine 220 maintains coherency between the data stored in the accelerator memory 290 of the accelerator circuit 200 and the data in the memory 12 of the host device 10. For example, coherence processing is performed so that the host device 10 may use the data stored in the accelerator memory 290 of the accelerator circuit 200 as a host-attached memory. For example, when new computation data is stored in the accelerator memory 290, the coherency engine 220 may perform coherence processing through the CXL.cache protocol so that the computation data is also stored in the memory 12 of the host device in the same manner. Similarly, the host device 10 may perform coherence processing for sharing data in the memory 12 of the host device with the accelerator memory 290 through the CXL.mem protocol.
- The DMA engine 230 may be connected to the internal bus Ipath1 and may directly access the storage controller 300. When it is necessary to write or read data to or from the storage device 390 according to a request of the computation module 250 or the host device 10, the DMA engine 230 may request data access from the storage controller 300.
- The accelerator memory controller 240 may control an operation of the accelerator memory 290. For example, control may be performed so that computation data stored in the accelerator memory 290 is read or deleted, or new computation data is written.
- The computation module 250 may perform acceleration computation according to the decoded computation command. Acceleration computation may include signal processing and image signal processing according to some embodiments, as well as computation processing based on various types of neural networks such as, for example, a convolutional neural network (CNN), a region with convolutional neural network (R-CNN), a region proposal network (RPN), a recurrent neural network (RNN), a stacking-based deep neural network (S-DNN), a state-space dynamic neural network (S-SDNN), a deconvolution network, a deep belief network (DBN), a restricted Boltzmann machine (RBM), a fully convolutional network, a long short-term memory (LSTM) network, a classification network, or the like.
- As is shown in
FIG. 4, the storage controller 300 may include a scheduler 310, a control unit 320, an internal memory 330, and a non-volatile memory controller 340 according to some embodiments, and the respective components may be electrically connected to each other through an internal system bus 301.
- The scheduler 310 may be connected to each of the internal bus Ipath1 and the smart interface 100, and may schedule the operation sequence according to a preset policy when receiving an access request from the host device 10 and an access request from the accelerator circuit 200. The preset policy may be to give priority to an access request from the accelerator circuit 200 over an access request from the host device 10 according to some embodiments. Alternatively, priority may be given to process an urgent request of the host device 10 before other requests that have already been ordered.
- The control unit 320 may control the overall operation of the storage controller 300, and may perform, for example, data access operations such as writing, reading, or deleting data in the storage device 390, the internal operation of the storage device 390, or the like.
- The internal memory 330 may be a working memory of the storage controller 300 and may store operation data generated while the storage controller 300 is driven.
- The non-volatile memory controller 340 may control at least one non-volatile memory device 390 connected to the storage controller 300.
-
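The preset scheduling policy of the scheduler 310 (accelerator requests served before host requests, with an escape hatch for urgent host requests) might be modeled as a priority queue. This is a sketch under stated assumptions: the priority values and request-class names are invented for the example.

```python
# Illustrative sketch of the scheduler 310's preset policy: requests from
# the accelerator circuit (arriving over the internal bus Ipath1) are
# served before ordinary host requests, while an urgent host request may
# jump the queue. Priority values and class names are assumptions.

import heapq
import itertools

class Scheduler:
    PRIORITY = {"urgent_host": 0, "accelerator": 1, "host": 2}

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break within a class

    def submit(self, source: str, request: str) -> None:
        """Enqueue a request from `source` according to the preset policy."""
        heapq.heappush(self._heap, (self.PRIORITY[source], next(self._order), request))

    def pop(self) -> str:
        """Dequeue the next request to service."""
        return heapq.heappop(self._heap)[2]
```

The counter guarantees that two requests of the same class are serviced in arrival order, which matches the idea of "requests that have already been ordered" above.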
FIGS. 5 and 6 are flowcharts illustrating an operating method of the smart storage device of FIG. 2.
- Referring to FIG. 5, first, the host device 10 transmits a command to the smart storage device 1000 (step S 10 ). The smart interface 100 of the smart storage device 1000 checks which constituent the command is for and selects and transmits the protocol of the corresponding component (step S 11 ). For example, when the host device 10 sends a computation command, the smart interface 100 connects through a protocol (CXL.cache or CXL.mem) for the accelerator circuit 200.
- The accelerator circuit 200 extracts computation information by decoding a received computation command CMD 1 (step S 12 ). The computation information may include, for example, a computation type, an address of data necessary for the computation, or the like. According to some embodiments, the computation command may include at least one operation to be performed by the accelerator circuit 200. In the embodiment described below, it is assumed that the computation command CMD 1 indicates a case where acceleration computation is performed based on data of the storage device 390.
- The accelerator circuit 200 transmits a data access request to the storage controller 300 (step S 13 ). In this case, the access request may be made directly to the storage controller 300 through the internal bus Ipath1 without intervention of the host device 10.
- When receiving the access request from the accelerator circuit 200 (step S 14 ), the storage controller 300 performs an operation according to the access request on the storage device 390 in an operation order determined according to a preset policy (step S 15 ). For example, the storage controller 300 schedules a plurality of access requests according to a preset policy through the scheduler to determine an operation order. The control unit 320 and the non-volatile memory controller 340 perform an access operation on the non-volatile memory device 390 according to the order determined by the scheduler 310.
- The
storage controller 300 transmits the performance result of the access to the accelerator circuit 200 (step S16). For example, in the case of a data read request, the read data (hereinafter, first data) is returned, and in the case of a data write or deletion request, the performance completion is returned. - When receiving a performance result, for example, the read first data (step S17), the
accelerator circuit 200 performs coherence processing with the host device 10 to store the data in the accelerator memory 290 (step S18). At this time, coherence processing may be performed through the CXL.cache protocol. The coherence processing may be performed by the coherence-related component on the side of the host device 10 and the coherency engine 220, and after the coherency engine 220 confirms completion of the coherence processing from the host device 10, the first data may be stored in the accelerator memory 290 through the accelerator memory controller 240 (step S19). - The
accelerator circuit 200 reads the first data stored in the accelerator memory 290 as a subsequent operation and performs a computation (step S20). In this case, the computation may be based on the computation type included in the computation information. The accelerator circuit 200 performs coherence processing with the host device 10 to store the second data, generated by performing the computation, in the accelerator memory 290 (step S21). At this time, coherence processing may be performed through the CXL.cache protocol. When the coherency engine 220 confirms completion of the coherence processing from the host device 10, the accelerator memory controller 240 stores the second data in the accelerator memory 290 (step S22). - When all of one or more operations according to the computation command CMD1 are completed, the
accelerator circuit 200 transmits a completion message to the host device 10 through the smart interface 100 (step S23). In this case, the completion message may include the second data or a value set based on the second data. The completion message is thereafter received by the host device 10 (step S25). - The above-described embodiment assumes a case where an acceleration computation is performed using data stored in the
storage device 390, but the embodiment of the present disclosure is not limited thereto, and the acceleration computation may be performed based on the accelerator memory 290 or the initial data of the memory 12 of the host device. In this case, sharing the acceleration computation result with the host device 10 may be performed as in steps S19 to S25, but steps S13 to S17 might not be performed, depending on the position of the initial data to be read. - Meanwhile, referring to
FIG. 6, when the host device 10 transmits a command CMD2 (step S30), the smart storage device 1000 checks at the smart interface 100 which constituent is targeted, and selects and transmits the protocol of the corresponding component (step S31). For example, when the host device 10 requests data access, the smart interface 100 connects to the storage controller 300 through the CXL.io protocol. - When the
storage controller 300 receives the command CMD2 from the host device 10, the scheduler 310 determines an operation sequence according to a preset policy. The control unit 320 and the non-volatile memory controller 340 perform a data access operation according to an order determined by the scheduler 310 (step S32). - The
storage controller 300 transmits the result of step S32 to the host device 10 (step S33). For example, when the command CMD2 is a data read request, the read data is transmitted to the host device 10, and when it is a data write or deletion request, a completion notification is transmitted to the host device 10. The host device 10 receives the result through the storage controller 300 and the CXL.io protocol (step S34).
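The protocol-selection step of the two flows above (steps S11 and S31) can be sketched, for purposes of illustration only, as a simple dispatch table. The Python class, method, and mapping below are illustrative assumptions and do not appear in the disclosure; they merely show how a command's target component could determine the CXL sub-protocol used to forward it.

```python
# Illustrative sketch (not the patented implementation): the smart
# interface checks which component a host command targets and selects
# the CXL sub-protocol used to forward it, as described for FIGS. 5-6.

class SmartInterface:
    # Hypothetical mapping from command target to CXL sub-protocol.
    PROTOCOLS = {
        "accelerator": "CXL.cache/CXL.mem",  # computation commands (step S11)
        "storage": "CXL.io",                 # data-access commands (step S31)
        "memory": "CXL.mem",                 # memory-device access
    }

    def select_protocol(self, command):
        # Check which constituent the command is for, then pick a protocol.
        target = command["target"]
        try:
            return self.PROTOCOLS[target]
        except KeyError:
            raise ValueError(f"unknown command target: {target}")

iface = SmartInterface()
proto = iface.select_protocol({"target": "accelerator", "op": "compute"})
```

Under these assumptions, a computation command is routed over the accelerator-side protocols while a plain data-access command stays on CXL.io.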
FIG. 7 is a block diagram showing the smart storage device of FIG. 1 according to some embodiments. FIG. 8 is a block diagram showing the smart interface of FIG. 1 according to some embodiments. - Referring to
FIGS. 7 and 8, the smart storage device 1000 may transform a signal received from the host device 10 into a signal of the CXL.mem protocol, the CXL.io protocol, or the CXL.cache protocol in the smart interface 100, and may transmit the signal to each of the components. - The
smart interface 100 may include a plurality of layers to communicate with the host device 10. Each layer may interpret the electrical signals transmitted and received based on a preset definition, and may transform a signal into a signal for operating each of the components (e.g., 200, 300, and 400) in the smart storage device 1000. - The
smart interface 100 may include a physical layer 110, an arbiter 120, a link layer 130, and a transaction layer 140, each of which may operate based on the CXL interface standard. In addition, the smart interface 100 may further include various other communication layers. - The
physical layer 110 interprets an electrical signal transmitted to the host device 10 (TX) or received from the host device 10 (RX). The arbiter 120 may multiplex to decide which sub-protocol is used for the signal output from the physical layer 110. For example, a signal for the accelerator circuit 200 is output to the CXL.cache or CXL.mem link layer 131, and a signal for the memory device 490, the storage device 390, or a heterogeneous device using a PCI interface is output to the CXL.io link layer 132 or the PCIe link layer 133. - The
transaction layer 140 receives the signals transmitted through the CXL.cache or CXL.mem link layer 131, the CXL.io link layer 132, or the PCIe link layer 133 through the transaction layers 141, 142, and 143 corresponding to each protocol, and generates an output. - The
smart interface 100 includes an internal connection Ipath2 directly connecting the CXL.mem protocol and the CXL.io protocol, and the internal connection Ipath2 directly connects data accesses between the memory controller 400 and the storage controller 300. According to some embodiments, the CXL.cache or CXL.mem link layer 131 and the CXL.io link layer 132 may be directly connected to each other through an internal connection bus IPath2. - According to some embodiments, the
storage controller 300 may be connected to the host device 10 through the CXL.io protocol of the smart interface 100. - For example, the
memory controller 400 may be connected through the CXL.mem protocol or the CXL.io protocol, and the storage controller 300 may be connected through the CXL.io protocol. - According to some embodiments, the
smart storage device 1000 may further include a router 500, a memory protocol handler 700, and a storage protocol handler 600 for more efficient data access among the components. - The
router 500 may be connected to the CXL.io transaction layer 142 and may route a signal received from the transaction layer to the memory controller 400 or the storage controller 300. The router 500 may be disposed within the smart interface 100 according to some embodiments, and may be separately disposed and implemented with respect to each of the smart interface 100, the storage controller 300, and the memory controller 400 according to some embodiments. - The
memory protocol handler 700 may be connected between the CXL.mem transaction layer 141 and the router 500 on one side and the memory controller 400 on the other; it may receive and transfer a data access request for the memory device 490 to the memory controller 400, and may return a request result from the memory controller 400 to the transaction layer 141 or the router 500. The memory protocol handler 700 may be disposed within the smart interface 100 according to some embodiments, may be separately disposed and implemented with respect to the memory controller 400 according to some embodiments, or may be disposed within the memory controller 400 according to some embodiments. - The
storage protocol handler 600 may be connected between the router 500 and the storage controller 300, may receive and transfer a data access request for the storage device 390 to the storage controller 300, and may return the request result to the transaction layer 142. The storage protocol handler 600 may be disposed within the smart interface 100 according to some embodiments, may be separately disposed and implemented with respect to the storage controller 300 according to some embodiments, or may be disposed within the storage controller 300 according to some embodiments. - When receiving a data access command from the
router 500, the storage protocol handler 600 parses the access command to check the address of the requested data. In this case, the access command may include an operation type, an address of data, or data. - When the address of the data parsed from the access command belongs to the
storage device 390, the storage protocol handler 600 transfers the access command to the storage controller 300. - When the address of the data parsed from the access command belongs to the
memory device 490, the storage protocol handler 600 transfers the access command to the memory protocol handler 700 through the internal connection Ipath2. The memory controller 400 may perform a data access operation on the memory device 490 based on an access command transferred through the memory protocol handler 700 and notify the storage protocol handler 600 of the result. When the operation corresponding to the parsed access command is completed, the storage protocol handler 600 may notify the host device 10 of the completion through the smart interface 100. - The descriptions of the
storage controller 300 and the non-volatile memory device 390 overlap those of FIG. 2; to the extent that descriptions of various elements are omitted, those elements may be assumed to be at least similar to corresponding elements described elsewhere within the instant disclosure. - The
memory controller 400 may control an operation of the memory device 490. For example, it may control the memory device 490 so that stored computation data may be read or deleted, or new computation data may be written. - The
memory device 490 may be a volatile memory such as dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM) according to some embodiments, or may be at least one of various non-volatile memories according to some embodiments. A non-volatile memory may be implemented as at least one of, for example, one-time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash ROM, flash memory (e.g., NAND flash, NOR flash, or the like), a hard drive, or a solid state drive (SSD). Alternatively, it may be implemented in a form such as a memory card (e.g., compact flash (CF), secure digital (SD), micro secure digital (Micro-SD), mini secure digital (Mini-SD), extreme digital (xD), multi-media card (MMC), or the like) or an external memory (e.g., USB memory) that may be connected to a USB port. - The
memory device 490 may be, for example, a working memory of the smart storage device 1000. The memory device 490 may be implemented in the form of an embedded memory according to the purpose of storing data, or may be implemented in the form of a removable memory. For example, data for driving the smart storage device 1000 may be stored in a memory embedded in the smart storage device 1000, and data for an extended function of the smart storage device 1000 may be stored in a memory that is removable from the smart storage device 1000.
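The address-based routing performed by the storage protocol handler of FIG. 7 can be sketched as follows. This is an illustrative sketch only: the address ranges, function name, and return strings are assumptions introduced for illustration, not part of the disclosure. It shows a command being parsed and forwarded either to the storage controller directly or, when the parsed address belongs to the memory device, toward the memory protocol handler over the internal connection Ipath2.

```python
# Illustrative sketch: route an access command by its parsed address,
# as the storage protocol handler 600 is described as doing in FIG. 7.
# The address ranges below are hypothetical.

STORAGE_RANGE = range(0x0000, 0x8000)   # assumed storage-device addresses
MEMORY_RANGE = range(0x8000, 0x10000)   # assumed memory-device addresses

def route_access_command(cmd):
    """Return the component that handles the command (a behavior sketch)."""
    # The access command may include an operation type, an address, or data.
    addr = cmd["address"]
    if addr in STORAGE_RANGE:
        return "storage_controller"          # transferred directly
    if addr in MEMORY_RANGE:
        return "memory_protocol_handler"     # via the internal connection Ipath2
    raise ValueError(f"address {addr:#x} maps to no device")
```

The point of the design, as the description emphasizes, is that this routing decision is made inside the device, so memory-bound accesses arriving on CXL.io never need to bounce back through the host.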
FIGS. 9 to 11 are flowcharts illustrating an operating method of the smart storage device of FIG. 7. - Referring to
FIG. 9, when the host device 10 transmits a command (step S100), the smart interface 100 checks a protocol based on the target to which the command is to be transmitted (step S101). - When the command is for the
storage device 390, the CXL.io protocol is selected (Yes in step S102), and the storage protocol handler parses the command (step S110). The storage protocol handler 600 checks the address information of the requested data from the parsed command (step S111), and when the address information (Storage ID) belongs to the storage device 390, the data access command CMD1 is transmitted to the storage controller 300. - The
storage controller 300 receives the data access command CMD1 (step S112), reads the first data based on the address information (step S113), and transmits the data to the storage protocol handler 600. - On the other hand, when there is no additional operation to be performed based on the parsed command (step S114), the
storage protocol handler 600 notifies the host device 10 of the completion by sending a completion message (step S119), which is received by the host device 10 (step S120). - On the other hand, when it is necessary to write the first data to the
memory device 490 based on the parsed command (step S114 in FIG. 9), the storage protocol handler 600 transfers the data access command CMD2 and the first data to the memory protocol handler 700 through the internal connection IPath2. The memory protocol handler 700 receives the data access command CMD2 and the first data (step S115), and requests data access to the memory device 490 from the memory controller 400 (step S116). - The
memory controller 400 writes the second data to the memory device 490 (step S117) and transmits an access completion message to the memory protocol handler 700. The memory protocol handler 700 notifies the storage protocol handler 600 of the completion (step S118), and the storage protocol handler 600 finally notifies the host device 10 of the completion (step S120). - On the other hand, as is shown in
FIG. 10, the storage controller 300 receives the data access command CMD2 from the storage protocol handler 600 (step S130) and generates a request for data (step S131). When it is necessary to read the third data from the memory device 490 based on the parsed command (step S114 in FIG. 10), the storage protocol handler 600 transfers the data access command CMD3 to the memory protocol handler 700 through the internal connection IPath2. The memory protocol handler 700 receives the data access command CMD3 (step S133), and requests data access to the memory device 490 from the memory controller 400 (step S134). - The
memory controller 400 reads the third data from the memory device 490 (step S135). According to some embodiments, the storage controller 300 writes the third data received through the internal connection Ipath2 to the storage device 390 (step S136) and transmits a completion message to the memory protocol handler 700 (step S137). The storage protocol handler 600 finally notifies the host device 10 of the completion (step S138). - In
FIG. 11, when the protocol checked by the smart interface 100 is the CXL.mem protocol (step S102), the command CMD sent from the host device 10 is transmitted to the memory protocol handler 700 (step S140). The memory protocol handler 700 transmits an access request according to a command CMD4 to the memory controller 400 (step S141), and the memory controller 400 performs an operation corresponding to the request on the memory device 490 (step S142), and then notifies the memory protocol handler 700 of the result. The memory protocol handler 700 transfers the result to the host device 10 (step S143), and the host device 10 receives the result (step S144). - The above-described
smart storage device 1000 may allow the host device 10 to treat at least two constituents among the components of the smart storage device 1000, for example, an accelerator circuit, a storage device, and a memory device, as separate devices or as a single device through a single smart interface. Accordingly, the latency overhead incurred when the host device 10 must intervene to transfer data may be reduced, and since a physical connection between the components is provided, software overhead for maintaining data coherence may be reduced.
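The FIG. 9 flow described above can be condensed into a short sketch: the handler parses the command, reads the first data from the storage device, and, when the command asks for it, writes the data to the memory device over the internal connection IPath2, with the host only receiving a completion. The function name, the dictionaries standing in for the storage device 390 and memory device 490, and the command fields are all assumptions made for illustration; this is not the patented implementation.

```python
# Illustrative sketch of the FIG. 9 flow: a read from the storage device
# followed by an optional device-internal write to the memory device,
# without the host device moving the data itself.

def handle_io_command(cmd, storage, memory):
    """Steps S110-S120 sketched: parse, read, optional internal write, complete."""
    addr = cmd["address"]                  # S111: check address information
    first_data = storage[addr]             # S112-S113: storage controller reads
    if cmd.get("write_to_memory"):         # S114: additional operation needed?
        # S115-S117: transfer over IPath2 and write via the memory controller.
        memory[cmd["memory_address"]] = first_data
    return {"status": "complete", "data": first_data}  # S119-S120: notify host

storage = {0x10: b"first-data"}   # stand-in for the storage device 390
memory = {}                       # stand-in for the memory device 490
result = handle_io_command(
    {"address": 0x10, "write_to_memory": True, "memory_address": 0x80},
    storage, memory,
)
```

Under these assumptions, the storage-to-memory copy happens entirely inside the device, which is the latency and coherence benefit the closing paragraph claims.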
Claims (21)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20200126199 | 2020-09-28 | ||
KR10-2020-0126199 | 2020-09-28 | ||
KR10-2021-0007897 | 2021-01-20 | ||
KR1020210007897A KR20220042991A (en) | 2020-09-28 | 2021-01-20 | Smart storage device |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220100669A1 true US20220100669A1 (en) | 2022-03-31 |
Family
ID=80624245
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/403,862 Pending US20220100669A1 (en) | 2020-09-28 | 2021-08-16 | Smart storage device |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220100669A1 (en) |
CN (1) | CN114328306A (en) |
DE (1) | DE102021121105A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230297520A1 (en) * | 2022-03-21 | 2023-09-21 | Micron Technology, Inc. | Compute express link memory and storage module |
CN116880773A (en) * | 2023-09-05 | 2023-10-13 | 苏州浪潮智能科技有限公司 | Memory expansion device and data processing method and system |
CN117112466A (en) * | 2023-10-25 | 2023-11-24 | 浪潮(北京)电子信息产业有限公司 | Data processing method, device, equipment, storage medium and distributed cluster |
EP4287029A1 (en) * | 2022-05-31 | 2023-12-06 | Samsung Electronics Co., Ltd. | Storage system and operation method therefor |
WO2024129541A1 (en) * | 2022-12-12 | 2024-06-20 | Micron Technology, Inc. | Data storage device with memory services based on storage capacity |
WO2024182188A1 (en) * | 2023-02-27 | 2024-09-06 | Micron Technology, Inc. | Data storage devices with file system managers |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117493237B (en) * | 2023-12-29 | 2024-04-09 | 苏州元脑智能科技有限公司 | Computing device, server, data processing method, and storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4935866A (en) * | 1981-04-03 | 1990-06-19 | Compagnie Industrial Des Telecommunications Cit-Alcatel | Multiprocessor control system |
US20170109286A1 (en) * | 2012-10-22 | 2017-04-20 | Intel Corporation | High performance interconnect coherence protocol |
US20190042518A1 (en) * | 2017-09-01 | 2019-02-07 | Intel Corporation | Platform interface layer and protocol for accelerators |
US20200012604A1 (en) * | 2019-09-19 | 2020-01-09 | Intel Corporation | System, Apparatus And Method For Processing Remote Direct Memory Access Operations With A Device-Attached Memory |
US20200081660A1 (en) * | 2011-08-09 | 2020-03-12 | Seagate Technology Llc | I/o device and computing host interoperation |
US20200136996A1 (en) * | 2018-06-29 | 2020-04-30 | Intel Corporation | Offload of storage node scale-out management to a smart network interface controller |
US20200133909A1 (en) * | 2019-03-04 | 2020-04-30 | Intel Corporation | Writes to multiple memory destinations |
US10698842B1 (en) * | 2019-04-10 | 2020-06-30 | Xilinx, Inc. | Domain assist processor-peer for coherent acceleration |
US20200226018A1 (en) * | 2019-11-27 | 2020-07-16 | Intel Corporation | Multi-protocol support on common physical layer |
US20210014324A1 (en) * | 2020-09-24 | 2021-01-14 | Intel Corporation | Cache and memory content management |
US20210042228A1 (en) * | 2019-07-17 | 2021-02-11 | Intel Corporation | Controller for locking of selected cache regions |
US20210073129A1 (en) * | 2020-10-30 | 2021-03-11 | Intel Corporation | Cache line demote infrastructure for multi-processor pipelines |
US20210373951A1 (en) * | 2020-05-28 | 2021-12-02 | Samsung Electronics Co., Ltd. | Systems and methods for composable coherent devices |
US20220197831A1 (en) * | 2019-05-23 | 2022-06-23 | Hewlett Packard Enterprise Development Lp | System and method for facilitating efficient host memory access from a network interface controller (nic) |
2021
- 2021-08-13 DE DE102021121105.0A patent/DE102021121105A1/en active Pending
- 2021-08-16 US US17/403,862 patent/US20220100669A1/en active Pending
- 2021-09-28 CN CN202111144318.3A patent/CN114328306A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN114328306A (en) | 2022-04-12 |
DE102021121105A1 (en) | 2022-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220100669A1 (en) | Smart storage device | |
US10296217B2 (en) | Techniques to configure a solid state drive to operate in a storage mode or a memory mode | |
JP6431536B2 (en) | Final level cache system and corresponding method | |
CN112783818B (en) | On-line upgrading method and system of multi-core embedded system | |
KR20200030325A (en) | Storage device and system | |
JP5681782B2 (en) | On-die system fabric block control | |
CN115495389B (en) | Memory controller, calculation memory device, and operation method of calculation memory device | |
US20120054380A1 (en) | Opportunistic improvement of mmio request handling based on target reporting of space requirements | |
US12056066B2 (en) | System, device, and method for accessing memory based on multi-protocol | |
US9864687B2 (en) | Cache coherent system including master-side filter and data processing system including same | |
CN117546149A (en) | System, apparatus, and method for performing shared memory operations | |
CN110781107B (en) | Low-delay fusion IO control method and device based on DRAM interface | |
EP4105771A1 (en) | Storage controller, computational storage device, and operational method of computational storage device | |
US10909056B2 (en) | Multi-core electronic system | |
US20220114099A1 (en) | System, apparatus and methods for direct data reads from memory | |
CN114078497A (en) | System, apparatus and method for memory interface including reconfigurable channel | |
US11822474B2 (en) | Storage system and method for accessing same | |
US9411725B2 (en) | Application-reserved cache for direct I/O | |
KR20220042991A (en) | Smart storage device | |
CN106325377B (en) | The data processing method of Principle of External Device Extension card and I/O peripheral equipment | |
US20220147458A1 (en) | Semiconductor device | |
KR101041838B1 (en) | Mobile storage control device and method | |
CN118363914B (en) | Data processing method, solid state disk device and host | |
US20240241850A1 (en) | Interface device and method, data computing device and data processing system including the same | |
CN115687195A (en) | Hybrid system architecture for implementing host operating system and real-time operating system within a system-on-a-chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHOE, HYEOK JUN;JEON, YOUN HO;YOO, YOUNG GEON;AND OTHERS;SIGNING DATES FROM 20210804 TO 20210811;REEL/FRAME:057195/0820
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED