
CN112579342B - Memory error correction method, memory controller and electronic equipment - Google Patents


Info

Publication number
CN112579342B
Authority
CN
China
Prior art keywords
memory
error correction
particles
channel
redundant
Prior art date
Legal status
Active
Application number
CN202011461460.6A
Other languages
Chinese (zh)
Other versions
CN112579342A (en)
Inventor
周鹏
Current Assignee
Hygon Information Technology Co Ltd
Original Assignee
Hygon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hygon Information Technology Co Ltd
Priority to CN202011461460.6A
Publication of CN112579342A
Application granted
Publication of CN112579342B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • For Increasing The Reliability Of Semiconductor Memories (AREA)

Abstract

The application provides a memory error correction method, a memory controller, and an electronic device. The method comprises: reading and writing multiple channels of a memory in lockstep (precise synchronization) mode, the memory having a redundancy ratio of at least 1:4; reading the data of each memory chip in each channel; and, when the ECC error correction algorithm identifies a faulty memory chip from the read data, replacing the faulty chip with a redundant memory chip and retiring (disabling) the faulty chip. Because the redundant chip takes over the faulty chip's work, the channel continues to operate normally. Because every channel has a redundant chip available for substitution, multiple faulty memory chips can be corrected.

Description

Memory error correction method, memory controller and electronic equipment
Technical Field
The present disclosure relates to the field of memory technologies, and in particular, to a memory error correction method, a memory controller, and an electronic device.
Background
In computers, memory is one of the most important components. Programs execute from memory, which temporarily holds the CPU's (Central Processing Unit's) working data and the data exchanged with external storage such as hard disks. While the computer runs, the CPU loads the data it needs into memory, operates on it there, and then outputs the result. The reliability of data in memory therefore directly affects the operation and performance of the whole system.
For this reason, memory error correction is generally implemented with ECC (Error Correction Code) technology.
Although ECC provides some error detection and correction capability, it can only recover the correct data when a single memory chip is faulty. If two or more memory chips are faulty, the ECC flow cannot recover the correct data. This limited correction capability cannot meet the reliability requirements of practical applications.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a memory error correction method, a memory controller and an electronic device, so as to solve the above-mentioned problems.
An embodiment of the application provides a memory error correction method. The method comprises:
- reading and writing multiple channels of a memory in lockstep mode, the memory having a redundancy ratio of at least 1:4;
- reading the data of each memory chip in each channel, the data read satisfying the minimum data requirement of the ECC error correction algorithm; and
- when the ECC algorithm identifies a faulty memory chip from the read data, replacing the faulty chip with a redundant memory chip and retiring the faulty chip.
In this implementation, the memory has a redundancy ratio of at least 1:4 and is operated in lockstep mode. Each channel therefore contains a redundant memory chip in addition to the data chips that store data and the ECC chips that store the ECC check codes. When a faulty memory chip is found, the redundant chip replaces it and the faulty chip is retired, so the channel continues to operate normally. Because every channel has a redundant chip, multiple faulty memory chips can be corrected, which better meets the reliability requirements of practical applications.
Further, the method comprises: when the ECC algorithm identifies a faulty memory chip and no redundant memory chip remains, correcting the faulty chip with the ECC algorithm.
In this implementation, a faulty chip can still be corrected by the ECC algorithm when no redundant chip remains (that is, when the redundant chips are exhausted). This increases the number of faulty chips that can be corrected and better meets the reliability requirements of practical applications.
Further, when the ECC algorithm identifies a faulty memory chip, the method comprises, before replacing the faulty chip with a redundant chip: determining that the ECC algorithm has already been used for error correction.
In this implementation, the first faulty chip found is corrected by the ECC algorithm, and faulty chips found afterwards are replaced with redundant chips. ECC can correct one chip, and redundant-chip substitution handles the chips found later. Combining the two increases the number of correctable faulty chips and better meets the reliability requirements of practical applications.
Further, replacing the faulty memory chip with a redundant memory chip comprises: when the ECC algorithm identifies a faulty chip, replacing it with a redundant chip located in the same channel as the faulty chip.
Because the replacement chip sits in the same channel as the remaining healthy chips, the channel-identification logic for the data need not be changed.
Further, replacing the faulty chip with a redundant chip also comprises: when the channel containing the faulty chip has no redundant chip left, replacing the faulty chip with a redundant chip from another of the channels that are read and written together in lockstep.
In this implementation, redundant chips may replace faulty chips across channels, so every redundant chip can be used for error correction. Even if the channel containing a faulty chip has no redundant chip left, a redundant chip from another channel can replace it, which improves adaptability to different failure scenarios.
Further, the memory has a data bit width of 32+8 bits and each memory chip is 4 bits wide. The minimum data unit of the ECC algorithm is 128+16 bits.
An example of such a memory is DDR5 (fifth-generation Double Data Rate synchronous dynamic random-access memory). In that case, the minimum data unit of the ECC algorithm is 128+16 bits and each channel has one redundant memory chip for error correction. In the two-channel configuration, up to three faulty memory chips can therefore be corrected: two by redundant-chip substitution and one by the ECC algorithm. Compared with using ECC alone, this increases the number of correctable faulty chips and better meets the reliability requirements of practical applications.
An embodiment of the application further provides a memory controller comprising an ECC error correction circuit and a lockstep-mode circuit.
- The lockstep-mode circuit reads and writes multiple channels of the memory in lockstep mode. The memory has a redundancy ratio of at least 1:4.
- The ECC error correction circuit reads each memory chip in each channel. The data read must satisfy the minimum data requirement of the ECC algorithm. When the ECC algorithm identifies a faulty memory chip from the read data, the ECC circuit replaces the faulty chip with a redundant memory chip and retires the faulty chip.
This memory controller corrects faulty memory chips by using the redundant chips in the channels. Because every channel has a redundant chip available for substitution, it can correct multiple faulty chips. This improves memory reliability and better meets the reliability requirements of practical applications.
Further, the memory chips include data chips and redundant chips, and the memory controller further comprises one gate (selector) per channel.
- Each gate is connected to the data-chip interfaces and the redundant-chip interface of its channel.
- When the ECC circuit finds a faulty data chip in the channel, the gate disconnects the path to that chip's data-chip interface and connects the path to the redundant-chip interface instead.
- A data-chip interface is an interface through which the memory controller accesses a data chip. A redundant-chip interface is an interface through which the memory controller accesses a redundant chip.
In this implementation, the per-channel gates let the memory controller switch individual memory chips in each channel on or off, so that redundant chips can replace faulty ones. The circuit is simple, requires few changes to the structure of existing memory controllers, and is broadly applicable.
Further, each gate is also connected to the redundant-chip interfaces of the other channels.
This provides cross-channel substitution of redundant chips. Even if the channel containing a faulty chip has no redundant chip left, a redundant chip in another channel can replace the faulty chip, which improves adaptability to different failure scenarios.
An embodiment of the application further provides an electronic device comprising a memory and any of the memory controllers described above. The memory has a redundancy ratio of at least 1:4, and the memory controller performs any of the memory error correction methods described above to correct memory errors.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly described below. These drawings show only some embodiments of the present application and should not be regarded as limiting its scope. A person skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a memory error correction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the basic structure of a memory controller according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a gate connection structure of a memory controller according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a gate connection structure of another memory controller according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the initial state of a DDR5 memory according to an embodiment of the present application;
Fig. 7 is a schematic diagram of the channel states of the DDR5 memory when a faulty memory chip appears on each of its two channels;
Fig. 8 is a schematic diagram of the channel states when 3 faulty memory chips appear on channel 0 of the DDR5 memory according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Embodiment one:
An embodiment of the present application provides a memory error correction method. As shown in Fig. 1, the method includes the following steps.
S101: Read and write multiple channels of the memory in lockstep (precise synchronization) mode.
The precise synchronization mode is also called lockstep mode (LockStep Channel Mode). In this mode, identical, redundant hardware components process the same instruction at the same time, so that the data of one CPU cache line is distributed across several memory channels.
In lockstep mode, multiple channels together store the data of the same CPU cache line. This makes it possible for memory chips to be shared across channels.
S102: Read the data of each memory chip in each channel.
In this embodiment, the data read must satisfy the minimum data requirement of the ECC error correction algorithm. This allows the algorithm to operate normally and detect errors in the data, and thereby to determine whether a faulty memory chip exists and, if so, which chip is faulty.
S103: When the ECC error correction algorithm identifies a faulty memory chip from the read data, replace the faulty chip with a redundant memory chip and retire the faulty chip.
In this embodiment, the replacement of faulty memory chips by redundant memory chips may be implemented with a Chipkill algorithm.
In addition, the ECC error correction algorithm may be, but is not limited to, the RS (Reed-Solomon) (144, 128) algorithm.
In this embodiment, the memory must have a redundancy ratio of at least 1:4. The data bit width of a memory has two parts: the width of the data bits and the width of the redundant bits. For example:
- Conventional DDR4 memory has a data bit width of 64+8 bits, that is, 64 data bits plus 8 redundant bits.
- Conventional DDR5 memory has a data bit width of 32+8 bits, that is, 32 data bits plus 8 redundant bits.
The redundancy ratio is the ratio of the redundant-bit width to the data-bit width. DDR4 memory therefore has a redundancy ratio of 1:8, and DDR5 memory has a ratio of 1:4.
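The redundancy-ratio check above is simple arithmetic. The sketch below illustrates it and is not part of the disclosed method. The names `redundancy_ratio` and `MIN_RATIO` are hypothetical, and the bit widths are the DDR4 and DDR5 figures quoted in the text.

```python
from fractions import Fraction

# Minimum redundancy ratio required by the method (1:4).
MIN_RATIO = Fraction(1, 4)


def redundancy_ratio(data_bits: int, redundant_bits: int) -> Fraction:
    """Ratio of redundant-bit width to data-bit width (hypothetical helper)."""
    return Fraction(redundant_bits, data_bits)


# Bus widths quoted in the text: DDR4 is 64+8 bits, DDR5 is 32+8 bits.
DDR4_RATIO = redundancy_ratio(64, 8)
DDR5_RATIO = redundancy_ratio(32, 8)

for name, ratio in (("DDR4", DDR4_RATIO), ("DDR5", DDR5_RATIO)):
    eligible = ratio >= MIN_RATIO
    print(f"{name}: {ratio.numerator}:{ratio.denominator}, eligible={eligible}")
```

`Fraction` keeps the ratio exact, so 8/64 reduces to 1:8 and the comparison against 1:4 involves no floating-point rounding.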
The minimum data unit of the ECC algorithm (hereinafter the ECC word) is 128+16 bits. DDR4 memory must therefore devote all of its redundant bits to ECC check data, which leaves no redundant memory chip.
For memory with a redundancy ratio of 1:4, such as DDR5, each channel still has one redundant memory chip after the ECC check data has been stored. This is why the memory in this embodiment must have a redundancy ratio of at least 1:4.
In this embodiment, the data read from the memory chips of the channels must together fill an ECC word so that the ECC algorithm can process it.
For example, suppose the ECC word is 128+16 bits and the memory is a two-channel DDR5 memory built from 4-bit-wide chips. Reading each of the two channels twice yields 128+32 bits of data, which satisfies the 128+16-bit ECC word. Each of the two channels then has one redundant memory chip, so the memory has two redundant chips in total.
Likewise, for a four-channel DDR5 memory built from 4-bit-wide chips, reading each of the 4 channels once yields 128+32 bits of data, which satisfies the 128+16-bit ECC word. Each of the 4 channels then has one redundant memory chip, so the DDR5 memory has 4 redundant chips in total.
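The read counts and spare-chip counts in the examples above can be reproduced with a short calculation. `ecc_word_layout` and `EccWordLayout` are hypothetical names. The sketch assumes that every lockstep channel is read the same number of times and that the ECC check bits and spare chips are spread evenly over the channels; the patent does not specify the exact bit layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EccWordLayout:
    reads_per_channel: int        # reads each lockstep channel needs to fill one ECC word
    spare_chips_total: int        # whole chips left over after storing the ECC check bits
    spare_chips_per_channel: int  # assumes spares are spread evenly over channels


def ecc_word_layout(channels: int, data_bits: int = 32, redundant_bits: int = 8,
                    chip_width: int = 4, word_data_bits: int = 128,
                    word_check_bits: int = 16) -> EccWordLayout:
    """Hypothetical helper: how many reads and spare chips one ECC word implies."""
    bits_per_read = channels * data_bits
    if word_data_bits % bits_per_read:
        raise ValueError("the ECC word's data bits do not fill a whole number of reads")
    reads = word_data_bits // bits_per_read

    # Redundant bits delivered while filling one ECC word, minus the check bits it needs.
    spare_bits = channels * reads * redundant_bits - word_check_bits
    if spare_bits < 0:
        raise ValueError("not enough redundant bits to hold the ECC check bits")

    # A chip supplies chip_width bits on every read, so one whole spare chip
    # accounts for chip_width * reads bits of the ECC word.
    spare_total = spare_bits // (chip_width * reads)
    return EccWordLayout(reads, spare_total, spare_total // channels)


print(ecc_word_layout(channels=2))                # two-channel DDR5, x4 chips
print(ecc_word_layout(channels=4))                # four-channel DDR5, x4 chips
print(ecc_word_layout(channels=1, data_bits=64))  # DDR4, 64+8 bits
```

For the one-channel DDR4 case (64+8 bits, read twice), the sketch returns zero spare chips. This matches the statement that DDR4 must spend all of its redundant bits on ECC check data.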
In this embodiment, redundant-chip substitution may also be combined with the ECC algorithm's own ability to correct one faulty memory chip, which further improves the error correction capability of the memory.
For example, consider two-channel DDR5 memory with 4-bit-wide chips. After both channels are read and written together in lockstep mode, each channel has one redundant chip. Replacing faulty chips with these redundant chips corrects two faulty chips, and the ECC algorithm can correct one more.
In one possible implementation, the ECC algorithm corrects the first faulty chip found, and faulty chips found later are replaced with redundant chips.
In another possible implementation, faulty chips are first corrected by replacing them with redundant chips. Once the redundant chips are used up, a newly found faulty chip is corrected by the ECC algorithm.
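The two orderings can be compared with a small policy model. This sketch makes several assumptions:
- chips are tracked only by name;
- redundant chips form a single pool, regardless of channel;
- the ECC algorithm corrects at most one faulty chip, as in the two-channel DDR5 example.

`handle_failures` and the chip labels are hypothetical.

```python
def handle_failures(failed_chips, spare_chips, ecc_first):
    """Decide how each newly detected faulty chip is handled (hypothetical model).

    spare_chips: number of unused redundant chips (pooled across channels here).
    ecc_first:   True  -> ECC absorbs the first faulty chip, later ones use spares;
                 False -> spares are used first, ECC absorbs the next faulty chip.
    The ECC algorithm is assumed to correct at most one faulty chip.
    """
    ecc_used = False
    actions = []
    for chip in failed_chips:
        if not ecc_used and (ecc_first or spare_chips == 0):
            ecc_used = True
            actions.append((chip, "ecc"))
        elif spare_chips > 0:
            spare_chips -= 1  # spares are consumed once used
            actions.append((chip, "spare"))
        else:
            actions.append((chip, "report_failure"))
    return actions


failures = ["chip A", "chip B", "chip C", "chip D"]
print(handle_failures(failures, spare_chips=2, ecc_first=True))
print(handle_failures(failures, spare_chips=2, ecc_first=False))
```

With two redundant chips, both orderings correct three faulty chips and report a failure on the fourth. The only difference is which chip the ECC algorithm covers.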
In this implementation, the controller may be configured to replace a faulty chip only with the redundant chip in its own channel, so that each channel corrects errors with its own redundant chip.
In that case, suppose the redundant chip in a channel has already been used (so none remains) and another faulty chip is then found in that channel. If the ECC algorithm has not yet been used to correct another chip, it can correct this faulty chip. If the ECC algorithm has already been used, the faulty chip cannot be corrected, even if another channel still has a redundant chip. A memory failure may then be reported so that an engineer can service or replace the memory.
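The in-channel-only configuration can be traced with the following sketch. It includes the case where a failure is reported even though another channel still has a spare. The sketch assumes spare-first ordering, one redundant chip per channel, and a single ECC-correctable chip in total. `in_channel_policy` and chip labels such as `D1` are hypothetical.

```python
def in_channel_policy(failures, spares_per_channel):
    """Spare-first handling where a faulty chip may only use its own channel's spare.

    failures:           list of (channel, chip) in the order they are detected.
    spares_per_channel: dict mapping channel -> number of unused redundant chips.
    Assumes the ECC algorithm can absorb one faulty chip in total (hypothetical model).
    """
    spares = dict(spares_per_channel)  # copy: spares are consumed as they are used
    ecc_used = False
    log = []
    for channel, chip in failures:
        if spares.get(channel, 0) > 0:
            spares[channel] -= 1
            log.append((channel, chip, "spare"))
        elif not ecc_used:
            ecc_used = True
            log.append((channel, chip, "ecc"))
        else:
            # Another channel may still hold a spare, but it is not reachable here.
            log.append((channel, chip, "report_failure"))
    return log, spares


log, remaining = in_channel_policy([(0, "D1"), (0, "D2"), (0, "D3")], {0: 1, 1: 1})
for entry in log:
    print(entry)
print("unused spares:", remaining)
```

The third faulty chip on channel 0 triggers a failure report while channel 1's redundant chip stays unused. This is the limitation that cross-channel substitution addresses.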
Alternatively, in this implementation, the controller may be configured to allow cross-channel substitution. When the channel containing the faulty chip has no redundant chip left, a redundant chip in another of the channels read and written together in lockstep replaces the faulty chip. All redundant chips can then be fully used, so the scheme can handle a wider variety of memory chip failures.
For example, as shown in Fig. 8, once the redundant chip in channel 0 has been used, the redundant chip in channel 1 can replace a faulty chip in channel 0.
When cross-channel substitution is allowed, the redundant chip in the faulty chip's own channel may be used first. A redundant chip in another channel is used only when the faulty chip's own channel has none left.
However, the order in which redundant chips are used need not be restricted. Even when the faulty chip's own channel still has a redundant chip, a redundant chip in another channel may be used to replace it.
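Cross-channel substitution can be sketched by letting a faulty chip borrow a spare from any lockstep channel. `cross_channel_policy` and the chip labels are hypothetical. The `prefer_local` flag switches between the preferred own-channel-first order and one example of an unrestricted order (other channels first). As before, ECC is assumed to absorb one faulty chip in total.

```python
def cross_channel_policy(failures, spares_per_channel, prefer_local=True):
    """Spare-first handling in which a faulty chip may borrow another channel's spare.

    prefer_local=True  -> try the faulty chip's own channel first;
    prefer_local=False -> try other channels first (one possible unrestricted order).
    Assumes the ECC algorithm can absorb one faulty chip in total (hypothetical model).
    """
    spares = dict(spares_per_channel)
    ecc_used = False
    log = []
    for channel, chip in failures:
        others = [c for c in sorted(spares) if c != channel]
        order = [channel] + others if prefer_local else others + [channel]
        # First channel in the chosen order that still has an unused spare, if any.
        donor = next((c for c in order if spares.get(c, 0) > 0), None)
        if donor is not None:
            spares[donor] -= 1
            log.append((channel, chip, "spare", donor))
        elif not ecc_used:
            ecc_used = True
            log.append((channel, chip, "ecc", None))
        else:
            log.append((channel, chip, "report_failure", None))
    return log


# Three faulty chips on channel 0 and one spare per channel, similar to the Fig. 8 case.
for entry in cross_channel_policy([(0, "D1"), (0, "D2"), (0, "D3")], {0: 1, 1: 1}):
    print(entry)
```

In this scenario, channel 0's spare replaces the first faulty chip, channel 1's spare replaces the second, and the ECC algorithm covers the third. Under the in-channel-only configuration, the third chip would have caused a failure report.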
Once a redundant chip replaces a faulty chip, it takes over the faulty chip's function and is no longer counted as redundant. The faulty chip is retired (for example, marked as faulty or unavailable in the memory controller) and is thus discarded; it does not become a redundant chip after being replaced. Redundant chips are therefore "consumable" in this embodiment: a chip loses its "redundant" label once it is used.
An embodiment of the application further provides a memory controller, shown in Fig. 2. The memory controller may include an ECC error correction circuit and a lockstep-mode circuit, where:
- the lockstep-mode circuit may be configured to read and write multiple channels of the memory in lockstep mode;
- the ECC error correction circuit may be configured to read each memory chip in each channel and, when the ECC algorithm identifies a faulty chip from the read data, replace the faulty chip with a redundant chip and retire the faulty chip.
In the memory, the memory chips include data chips that store data, ECC chips that store the ECC check data, and redundant chips, which are the chips left over besides the data chips and the ECC chips.
In this embodiment, to support the replacement of faulty chips by redundant chips, the memory controller further includes one gate per channel, as shown in Fig. 3.
Each gate is connected to the data-chip interfaces and the redundant-chip interface of its channel. When the ECC circuit finds a faulty data chip in the channel, the gate disconnects the path to that chip's data-chip interface and connects the path to the redundant-chip interface instead.
In most cases, the memory and the memory controller are not directly connected. Whatever circuits lie between them, data can still be exchanged between the two, so each memory chip is still accessed by the memory controller. In this embodiment, the interface through which the memory controller accesses a data chip is called a data-chip interface, and the interface through which it accesses a redundant chip is called a redundant-chip interface.
Each gate may be connected only to the data-chip interfaces and the redundant-chip interface of its own channel, and not to the redundant-chip interfaces of other channels, as shown in Fig. 3.
Alternatively, each gate may also be connected to the redundant-chip interfaces of the other channels, in addition to its own channel's data-chip and redundant-chip interfaces, as shown in Fig. 4. When a faulty chip is found in one channel, the memory controller can then replace it with a redundant chip from another channel.
In this embodiment, the ECC error correction circuit can also execute the ECC algorithm itself to correct memory errors.
The ECC error correction circuit and the lockstep-mode circuit may be implemented with existing ECC and lockstep-mode circuits. Each gate may be implemented with an ordinary multiplexer circuit.
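The gate behaviour described for Figs. 3 and 4 can be modelled in software as a routing table. `ChannelGate` is a hypothetical name, and the interface names `D0`..`D7` and `R` are illustrative. The choice of 8 data lanes per channel (32 data bits divided by 4-bit-wide chips) is an assumption, and the ECC chips are omitted. The real gate is a hardware multiplexer; this sketch only mirrors its switching decisions.

```python
class ChannelGate:
    """Hypothetical model of one channel's gate (multiplexer).

    Each logical data lane of the channel is routed to a physical chip
    interface, identified as (channel, interface_name).
    """

    def __init__(self, channel, data_lanes, wired_spares):
        self.channel = channel
        # Initially every lane is routed to its own data-chip interface.
        self.route = {lane: (channel, f"D{lane}") for lane in range(data_lanes)}
        # Redundant-chip interfaces this gate is physically wired to.
        self.wired_spares = list(wired_spares)

    def replace(self, lane, free_spares):
        """Disconnect the lane's faulty interface and connect a free wired spare.

        free_spares is a set shared by all gates; a spare is removed once used.
        Returns the spare interface used, or None if no wired spare is free.
        """
        for spare in self.wired_spares:
            if spare in free_spares:
                free_spares.remove(spare)
                self.route[lane] = spare
                return spare
        return None


# Fig. 3-style wiring: gate 0 reaches only its own channel's redundant chip.
free = {(0, "R"), (1, "R")}
gate0 = ChannelGate(0, data_lanes=8, wired_spares=[(0, "R")])
print(gate0.replace(2, free), gate0.replace(5, free))

# Fig. 4-style wiring: gate 0 is also wired to channel 1's redundant chip.
free = {(0, "R"), (1, "R")}
gate0 = ChannelGate(0, data_lanes=8, wired_spares=[(0, "R"), (1, "R")])
print(gate0.replace(2, free), gate0.replace(5, free))
```

With Fig. 3-style wiring, the second failure on channel 0 finds no reachable spare. With Fig. 4-style wiring, the same failure is routed to channel 1's redundant chip.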
In an embodiment of the present application, an electronic device is further provided, which may be seen in fig. 5, and includes a memory and a memory controller.
The memory is a memory with a redundancy ratio of 1 to 4. For example, it may be DDR5 memory.
The memory controller can adopt the memory controller, so that the memory error correction can be realized according to the memory error correction method provided by the embodiment of the application.
In the embodiment of the present application, the electronic device may be an electronic device such as a mobile phone, a computer, or a server, which has a memory and a memory controller.
It should be understood that the memory with the highest redundancy ratio in the market at present is the DDR5 memory, and the redundancy ratio is 1 to 4. If a memory with larger redundancy is developed in the future, the scheme provided in the embodiments of the present application may also be used to implement the method.
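The claims require a redundancy ratio of at least 1 to 4. Reading the ratio as redundant bits to data bits per channel (an interpretation consistent with the 32+8-bit DDR5 width given in embodiment two), the eligibility check is simple arithmetic. The helper names below are hypothetical. For comparison, a conventional 72-bit DDR4 ECC module carries 8 check bits per 64 data bits, a ratio of 1 to 8, which falls below the threshold.

```python
from fractions import Fraction


def redundancy_ratio(data_bits: int, ecc_bits: int) -> Fraction:
    """Ratio of redundant (ECC) bits to data bits on one channel."""
    return Fraction(ecc_bits, data_bits)


def meets_claim_threshold(data_bits: int, ecc_bits: int) -> bool:
    """True if the channel has at least 1 redundant bit per 4 data bits."""
    return redundancy_ratio(data_bits, ecc_bits) >= Fraction(1, 4)
```

`Fraction` keeps the comparison exact, so a 32+8-bit channel lands exactly on 1/4 rather than on a rounded floating-point value.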
According to the memory error correction method, the memory controller and the electronic device, the memory particles with redundancy in the channels can be used for correcting errors of the memory particles with redundancy, and as each channel is provided with the substitution of the memory particles with redundancy, the error correction of the memory particles with multiple errors can be achieved, so that the error correction capability of the memory particles with multiple errors is achieved, the memory reliability can be improved, and the requirement of the actual application on the memory reliability is further met.
Embodiment two:
This embodiment illustrates the application using a specific memory error correction process for DDR5 memory as an example.
The DDR5 memory is uniformly addressed in precise synchronization mode: the 2 channels (channel 0 and channel 1) are read and written together, and data is distributed across both channels.
The ECC error correction algorithm is RS(144, 128), whose ECC word is 128+16 bits (128 data bits and 16 check bits).
The data bit width of each DDR5 channel is 32+8 bits. Each of the 2 channels is read twice, giving 128+32 bits in total. The ECC error correction algorithm uses only 16 of the 32 redundant bits, so the 2 channels leave 16 redundant bits unused, 8 bits per channel. That is, each channel has one spare redundant memory granule.
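Assuming 8-bit symbols (consistent with the x8 chipkill algorithm named in embodiment two, and with each 4-bit granule being read twice), RS(144, 128) measured in bits is a Reed-Solomon code of 18 symbols carrying 16 data symbols over GF(2^8). The standard Reed-Solomon bound then gives its correction capability:

```latex
\begin{aligned}
n &= \frac{144}{8} = 18 \ \text{symbols}, \qquad k = \frac{128}{8} = 16 \ \text{symbols}, \qquad n \le 2^{8} - 1 = 255,\\
t &= \left\lfloor \frac{n-k}{2} \right\rfloor = \left\lfloor \frac{18-16}{2} \right\rfloor = 1 \ \text{symbol}.
\end{aligned}
```

Since each granule contributes exactly one 8-bit symbol (4 bits × 2 reads) to the ECC word, t = 1 means the code alone can correct one failed granule per ECC word, which matches the single-granule capability attributed to a conventional ECC-only scheme.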
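The bit budget above can be checked mechanically. The constants come from the text; the constant and function names are illustrative only.

```python
CHANNELS = 2          # channel 0 and channel 1, accessed together
READS_PER_WORD = 2    # each channel is read twice per ECC word
DATA_BITS = 32        # data width per channel per read
ECC_BITS = 8          # redundant width per channel per read
GRANULE_BITS = 4      # each memory granule is 4 bits wide
RS_CHECK_BITS = 16    # check bits actually used by RS(144, 128)


def redundancy_budget() -> dict[str, int]:
    """Work through the bit budget for one ECC word."""
    data = CHANNELS * READS_PER_WORD * DATA_BITS          # 128 data bits
    redundant = CHANNELS * READS_PER_WORD * ECC_BITS      # 32 redundant bits
    spare = redundant - RS_CHECK_BITS                     # 16 bits left unused
    spare_per_channel = spare // CHANNELS                 # 8 bits per channel
    bits_per_granule = GRANULE_BITS * READS_PER_WORD      # 8 bits per granule per word
    return {
        "data_bits": data,
        "redundant_bits": redundant,
        "spare_bits": spare,
        "spare_bits_per_channel": spare_per_channel,
        "spare_granules_per_channel": spare_per_channel // bits_per_granule,
    }
```

Calling `redundancy_budget()` yields 128 data bits, 32 redundant bits, 16 spare bits, 8 spare bits per channel, and therefore exactly one spare granule per channel.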
In the initial state, as shown in fig. 6, the 9 memory granules D0, D1, D2, D3, D4, D5, D6, D7 and C0 of each channel (18 granules of 4 bits each across the two channels) are read twice to form a complete ECC word. The C1 memory granule in each channel is a redundant memory granule.
The RS(144, 128) x8 chipkill algorithm is used (x8 chipkill is a chipkill algorithm operating on 8-bit units). Suppose the algorithm finds an erroneous memory granule in channel 0, for example the D0 memory granule of channel 0 as illustrated in fig. 7. The C1 memory granule of channel 0 can then replace the erroneous D0 memory granule: all subsequent read and write commands for the D0 memory granule are redirected to the C1 memory granule, and the D0 memory granule is no longer used.
The ECC word then consists of D1, D2, D3, D4, D5, D6, D7, C0 and C1 of channel 0, and D0, D1, D2, D3, D4, D5, D6, D7 and C0 of channel 1.
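The initial layout can be written down as a list of (channel, granule) pairs. This is a sketch under assumptions: the helper names are hypothetical, and since the text does not say which granules carry data bits and which carry check bits, the sketch only tracks which granules take part in the ECC word.

```python
GRANULE_BITS = 4       # width of one memory granule
READS_PER_WORD = 2     # each channel is read twice per ECC word
CHANNEL_GRANULES = [f"D{i}" for i in range(8)] + ["C0", "C1"]
SPARE = "C1"           # redundant granule of each channel in the initial state


def initial_ecc_word(channels: int = 2) -> tuple[list[tuple[int, str]], list[tuple[int, str]]]:
    """Return (granules forming the ECC word, spare granules) as (channel, name) pairs."""
    word = [(ch, g) for ch in range(channels) for g in CHANNEL_GRANULES if g != SPARE]
    spares = [(ch, SPARE) for ch in range(channels)]
    return word, spares


def word_bits(word: list[tuple[int, str]]) -> int:
    """Each granule contributes GRANULE_BITS on each of the two reads."""
    return len(word) * GRANULE_BITS * READS_PER_WORD
```

The 18 participating granules give `word_bits(word) == 144`, the full RS(144, 128) word, while the two C1 granules stay outside it as spares.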
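Redirecting every command for a retired granule to its spare can be modelled as a small lookup table. The sketch below is hypothetical (`RemapTable`, `retire`, `resolve`, `ecc_word` are illustrative names) and models only the address redirection; it does not model how the former contents of D0 come to reside on C1.

```python
Granule = tuple[int, str]                        # (channel, granule name)
SLOTS = [f"D{i}" for i in range(8)] + ["C0"]     # ECC-word slots in each channel


class RemapTable:
    """Hypothetical controller table that redirects commands away from retired granules."""

    def __init__(self) -> None:
        self._redirect: dict[Granule, Granule] = {}

    def retire(self, failed: Granule, spare: Granule) -> None:
        # From now on, every read/write aimed at `failed` is sent to `spare`.
        self._redirect[failed] = spare

    def resolve(self, target: Granule) -> Granule:
        # Granules that were never retired are accessed directly.
        return self._redirect.get(target, target)


def ecc_word(table: RemapTable, channels: int = 2) -> list[Granule]:
    """Physical granules that currently make up the ECC word."""
    return [table.resolve((ch, slot)) for ch in range(channels) for slot in SLOTS]
```

After `retire((0, "D0"), (0, "C1"))`, channel 0 contributes D1 to D7, C0 and C1 to the ECC word, and channel 1 is unchanged, matching the composition described above.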
It should be appreciated that fig. 7 illustrates only the case of an erroneous D0 memory granule. In fact, if any of the 8 memory granules D0, D1, D2, D3, D4, D5, D6, D7 of channel 0 fails, the C1 memory granule can be used to replace it.
With the new ECC word, suppose channel 1 is also found to have an erroneous memory granule, for example the D3 memory granule of channel 1 as illustrated in fig. 7. The C1 memory granule of channel 1 can replace the erroneous D3 memory granule: all subsequent read and write commands for the D3 memory granule of channel 1 are redirected to the C1 memory granule of channel 1, and the D3 memory granule of channel 1 is no longer used.
The ECC word then consists of D1, D2, D3, D4, D5, D6, D7, C0 and C1 of channel 0, and D0, D1, D2, D4, D5, D6, D7, C0 and C1 of channel 1.
With this new composition, the ECC word can still correct one further erroneous memory granule using the ECC error correction algorithm, whether that granule is on channel 0 or channel 1.
In the extreme case, all three erroneous memory granules may be on the same channel, for example all on channel 0 as shown in fig. 8. With the scheme of this embodiment, the first erroneous memory granule is replaced by the C1 memory granule of channel 0. When a second erroneous memory granule is found on channel 0, the controller checks whether the C1 memory granule of channel 1 is already in use. In this example it is not, so the C1 memory granule of channel 1 replaces the second erroneous memory granule on channel 0. The third erroneous memory granule is then corrected by the ECC error correction algorithm.
With the scheme of this embodiment, DDR5 memory in precise synchronization mode can correct up to 3 erroneous memory granules per 128 bits of data, across a variety of error scenarios. A conventional scheme that relies only on the ECC error correction algorithm can correct just 1 memory granule, so this scheme significantly improves error correction capability and the reliability of DDR memory.
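The replacement order described in this embodiment (own channel's C1 first, then the other channel's C1, then the ECC algorithm) can be simulated end to end. This is a sketch under assumptions: `SparingPolicy` and `on_error` are hypothetical names, the ECC capacity of one failed granule per word follows from treating RS(144, 128) as an 8-bit-symbol code, and the alternative ordering in claim 2 is not modelled.

```python
from typing import Optional


class SparingPolicy:
    """Hypothetical model of embodiment two's policy for two channels in lockstep.

    Order of preference for a newly failed granule:
      1. the redundant C1 granule of its own channel (claim 3),
      2. an unused C1 granule of another channel (claim 4),
      3. the ECC algorithm itself, assumed to absorb one failed granule per word.
    """

    def __init__(self, channels: int = 2, ecc_capacity: int = 1) -> None:
        self.spare_free = {ch: True for ch in range(channels)}
        self.ecc_capacity = ecc_capacity  # failed granules the RS code can still absorb
        self.remap: dict[tuple[int, str], tuple[int, str]] = {}

    def _free_spare(self, channel: int) -> Optional[int]:
        if self.spare_free[channel]:
            return channel
        return next((ch for ch, free in sorted(self.spare_free.items()) if free), None)

    def on_error(self, channel: int, granule: str) -> str:
        """Handle one newly failed granule and report how it was covered."""
        spare_ch = self._free_spare(channel)
        if spare_ch is not None:
            self.spare_free[spare_ch] = False
            self.remap[(channel, granule)] = (spare_ch, "C1")
            return f"replaced by C1 of channel {spare_ch}"
        if self.ecc_capacity > 0:
            self.ecc_capacity -= 1
            return "corrected by ECC"
        return "uncorrectable"
```

For the fig. 8 case, three failures on channel 0 (D0, D3 and D5 here, chosen purely for illustration) are handled as "replaced by C1 of channel 0", "replaced by C1 of channel 1" and "corrected by ECC"; a fourth failure is reported as uncorrectable, which matches the limit of 3 granules stated for this scheme.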
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Further, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated into a single part, each module may exist alone, or two or more modules may be integrated into a single part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Herein, a plurality refers to two or more.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (9)

1. A memory error correction method, comprising:
reading and writing a plurality of channels of the memory together in a precise synchronization mode, wherein the memory has a redundancy ratio greater than or equal to 1 to 4, and each channel has redundant memory granules and a plurality of memory granules;
reading the data of each memory granule in each channel, wherein the read data meets the minimum data requirement of an ECC error correction algorithm; and
when an erroneous memory granule is found by the ECC error correction algorithm from the read data, replacing the erroneous memory granule with a redundant memory granule of the channels, and ceasing to use the erroneous memory granule.
2. The memory error correction method of claim 1, further comprising:
when an erroneous memory granule is found by the ECC error correction algorithm and no redundant memory granule is available, correcting the erroneous memory granule by using the ECC error correction algorithm;
or, when an erroneous memory granule is found by the ECC error correction algorithm, before the redundant memory granule is used to replace the erroneous memory granule, the method further comprises: determining that the ECC error correction algorithm has been used for error correction.
3. The memory error correction method of claim 1, wherein replacing the erroneous memory granule with a redundant memory granule comprises:
when a redundant memory granule is available in the channel containing the erroneous memory granule, replacing the erroneous memory granule with the redundant memory granule of that channel.
4. The memory error correction method of claim 1, wherein replacing the erroneous memory granule with a redundant memory granule comprises:
when no redundant memory granule is available in the channel containing the erroneous memory granule, replacing the erroneous memory granule with a redundant memory granule of another of the jointly read and written channels.
5. The memory error correction method as claimed in any one of claims 1 to 4, wherein the memory has a data bit width of 32+8 bits and a storage size of 4 bits per memory granule, and the minimum data processing unit of the ECC error correction algorithm is 128+16 bits.
6. A memory controller, comprising: an ECC error correction circuit and a precise synchronization mode circuit;
the precise synchronization mode circuit is configured to read and write a plurality of channels of the memory together in a precise synchronization mode, wherein the memory has a redundancy ratio greater than or equal to 1 to 4, and each channel has redundant memory granules and a plurality of memory granules;
the ECC error correction circuit is configured to read each memory granule in each channel and, when an erroneous memory granule is found by an ECC error correction algorithm from the read data, to replace the erroneous memory granule with a redundant memory granule of the channels and cease using the erroneous memory granule, wherein the read data meets the minimum data requirement of the ECC error correction algorithm.
7. The memory controller of claim 6, wherein the memory granules comprise data granules and redundant memory granules, and the memory controller further comprises gates, one assigned to each channel;
each gate is connected to the data granule interfaces and the redundant granule interface of its channel and, when the ECC error correction circuit finds an erroneous data granule in the channel, disconnects the path to the data granule interface corresponding to the erroneous data granule and connects that path to the redundant granule interface; wherein:
a data granule interface is an interface in the memory controller for accessing a data granule, and a redundant granule interface is an interface in the memory controller for accessing a redundant memory granule.
8. The memory controller of claim 7, wherein each gate is further connected to the redundant granule interfaces of the channels other than its own.
9. An electronic device, comprising: a memory and a memory controller;
the memory has a redundancy ratio greater than or equal to 1 to 4;
the memory controller is configured to perform the memory error correction method according to any one of claims 1 to 5 to correct errors in the memory.
CN202011461460.6A 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment Active CN112579342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011461460.6A CN112579342B (en) 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment

Publications (2)

Publication Number Publication Date
CN112579342A 2021-03-30
CN112579342B 2024-02-13

Family

ID=75131557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011461460.6A Active CN112579342B (en) 2020-12-07 2020-12-07 Memory error correction method, memory controller and electronic equipment

Country Status (1)

Country Link
CN (1) CN112579342B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114518972B (en) * 2022-02-14 2024-06-18 Hygon Information Technology Co., Ltd. Memory error processing method and device, memory controller and processor
CN116820830A (en) * 2022-03-22 2023-09-29 Huawei Technologies Co., Ltd. Data writing method and processing system
CN116954982A (en) * 2022-04-19 2023-10-27 Huawei Technologies Co., Ltd. Data writing method and processing system
CN117238356A (en) * 2022-06-08 2023-12-15 Chengdu Huawei Technologies Co., Ltd. Memory module and electronic equipment
CN118733312A (en) * 2023-03-31 2024-10-01 Huawei Technologies Co., Ltd. Memory error correction method, memory bank, memory controller and processor

Citations (7)

Publication number Priority date Publication date Assignee Title
JP2001297594A (en) * 2000-04-06 2001-10-26 Hewlett Packard Co <Hp> Method for providing capability of reprogramming memory redundancy in field
CN102332302A (en) * 2011-07-19 2012-01-25 Beijing Shidai Quanxin Technology Co., Ltd. Phase change memory and redundancy replacing method for same
CN103295649A (en) * 2013-04-28 2013-09-11 Shanghai Hongli Semiconductor Manufacturing Co., Ltd. Method for improving reliability of a nonvolatile memory
CN109328340A (en) * 2017-09-30 2019-02-12 Huawei Technologies Co., Ltd. Detection method, device and the server of memory failure
CN111294059A (en) * 2019-12-26 2020-06-16 Chengdu Haiguang Integrated Circuit Design Co., Ltd. Encoding method, decoding method, error correction method and related device
CN111312321A (en) * 2020-03-02 2020-06-19 University of Electronic Science and Technology of China Memory device and fault repairing method thereof
CN111459712A (en) * 2020-04-16 2020-07-28 Shanghai Anlu Information Technology Co., Ltd. SRAM type FPGA single event upset error correction method and single event upset error correction circuit

Non-Patent Citations (1)

Title
In-memory computing to break the memory wall; Huang Xiaohe et al.; Chinese Physics B; pp. 1-10 *

Also Published As

Publication number Publication date
CN112579342A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112579342B (en) Memory error correction method, memory controller and electronic equipment
US7017017B2 (en) Memory controllers with interleaved mirrored memory modes
US7130229B2 (en) Interleaved mirrored memory systems
EP1984822B1 (en) Memory transaction replay mechanism
US8341499B2 (en) System and method for error detection in a redundant memory system
US8103900B2 (en) Implementing enhanced memory reliability using memory scrub operations
US6981173B2 (en) Redundant memory sequence and fault isolation
US9262284B2 (en) Single channel memory mirror
US7587658B1 (en) ECC encoding for uncorrectable errors
US8566672B2 (en) Selective checkbit modification for error correction
US20130339820A1 (en) Three dimensional (3d) memory device sparing
US20040237001A1 (en) Memory integrated circuit including an error detection mechanism for detecting errors in address and control signals
US12032443B2 (en) Shadow DRAM with CRC+RAID architecture, system and method for high RAS feature in a CXL drive
US20210406126A1 (en) Low latency availability in degraded redundant array of independent memory
US7428689B2 (en) Data memory system and method for transferring data into a data memory
US11520659B2 (en) Refresh-hiding memory system staggered refresh
CN115729746A (en) Data storage protection method based on CRC and ECC
CN111142797B (en) Solid state disk refreshing method and device and solid state disk
US11030061B2 (en) Single and double chip spare
CN109753239B (en) Semiconductor memory module, semiconductor memory system, and method of accessing the same
CN116263643A (en) Storage class memory, data processing method and processor system
US8848470B2 (en) Memory operation upon failure of one of two paired memory devices
US20240086090A1 (en) Memory channel disablement
CN118838738A (en) Memory error correction method, memory bank, memory controller and processor
JPH0922387A (en) Memory unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant