CN112866154B

CN112866154B - Method and apparatus for finding codeword decoding order in a serial interference cancellation receiver using reinforcement learning

Info

Publication number: CN112866154B
Application number: CN202010391970.4A
Authority: CN
Inventors: K·R·帕萨德
Original assignee: Marvell Asia Pte Ltd
Current assignee: Marvell Asia Pte Ltd
Priority date: 2019-05-04
Filing date: 2020-05-11
Publication date: 2024-04-05
Anticipated expiration: 2040-05-11
Also published as: CN112866154A; US10771122B1

Abstract

Embodiments of the present disclosure relate to methods and apparatus for finding codeword decoding order in a Serial Interference Cancellation (SIC) receiver using reinforcement learning. In one embodiment, a method for decoding codewords in a multiple-input multiple-output (MIMO) communication network is provided. The method comprises the following steps: determining a decoding order based on the state space and the decoding strategy; decoding the selected codeword based on the decoding order; updating a decoding strategy based on the decoding result and the state space; updating the state space based on the decoding result; and updating the decoding order based on the state space and the decoding strategy.

Description

Method and apparatus for finding codeword decoding order in a serial interference cancellation receiver using reinforcement learning

Cross Reference to Related Applications

The present application claims the benefit of priority from U.S. patent application Ser. No. 16/681,622, entitled "Method and Apparatus for Finding Codeword Decoding Order in a SIC Receiver Using Reinforcement Learning," filed on November 12, 2019, which is incorporated herein by reference in its entirety.

Technical Field

Exemplary embodiments of the present invention relate to telecommunications networks. More particularly, exemplary embodiments of the present invention relate to receiving and processing data streams via a wireless communication network.

Background

With the rapidly growing trend of mobile and remote data access over high-speed communication networks such as LTE or 5G cellular services, accurately delivering and decoding data streams has become increasingly challenging. High-speed communication networks capable of communicating information include, but are not limited to, wireless networks, cellular networks, wireless personal area networks ("WPANs"), wireless local area networks ("WLANs"), wireless metropolitan area networks ("MANs"), and the like.

A system providing high-speed communication over a multiple-input multiple-output (MIMO) network may use a Serial Interference Cancellation (SIC) receiver to equalize the MIMO channels. For example, the SIC receiver receives codewords transmitted over the MIMO channels, performs interference cancellation, and then uses a decoder to decode the received codewords.

Accordingly, it is desirable to decode received codewords efficiently and to improve interference cancellation and decoding accuracy in MIMO systems.

Disclosure of Invention

In various embodiments, methods and apparatus for finding a codeword decoding order in a SIC receiver are provided. For example, a plurality of codewords are received on a plurality of channels at a plurality of antennas of a MIMO receiver. A decoding order determination circuit obtains state information about the channels and codewords, together with a decoding strategy, and generates a decoding order. The decoding strategy is learned using reinforcement learning based on a set of state criteria and on rewards derived from the decoding results. In each iteration, the decoding order determination circuit determines which codewords are decoding candidates in that iteration, and a plurality of decoders decode the candidate codewords according to the decoding order. At the end of decoding, rewards are calculated and the policy is updated accordingly. The state information is updated to reflect the success or failure of each decoding attempt. This process repeats until all codewords have been decoded or another decoding condition is met. As codewords are decoded, the channels associated with them are removed from the channel equalization process, allowing codewords on weaker channels to be decoded accurately.

In one embodiment, a method for decoding codewords in a multiple-input multiple-output (MIMO) communication network is provided. The method comprises the following steps: determining a decoding order based on the state space and the decoding strategy; decoding the selected codeword based on the decoding order; updating a decoding strategy based on the decoding result and the state space; updating the state space based on the decoding result; and updating the decoding order based on the state space and the decoding strategy.

In one embodiment, an apparatus for decoding codewords in a multiple-input multiple-output (MIMO) communication network is provided. The apparatus comprises: a decoding order determination circuit that determines a decoding order for decoding codewords based on the state space and the decoding strategy; and a reward determination circuit that receives the decoded codewords and determines a reward based on the decoding result. The apparatus further comprises: a policy updating circuit that updates the decoding policy based on the decoding result and the state space; and a state interface that updates the state space based on the decoding result. The decoding order determination circuit also generates an updated decoding order based on the state space and the decoding policy.

Additional features and benefits of exemplary embodiments of the present invention will become apparent from the detailed description, figures, and claims set forth below.

Drawings

Exemplary aspects of the present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.

Fig. 1 illustrates a MIMO communication network having a transceiver that includes decoding order determination circuitry operative to determine a decoding order for received codewords.

Fig. 2 illustrates a detailed exemplary embodiment of the second MIMO transceiver illustrated in fig. 1.

Fig. 3 illustrates an exemplary embodiment of the decoding order determination circuit illustrated in fig. 2.

FIG. 4 illustrates an exemplary embodiment of a state space for use with the DODC shown in FIG. 3.

FIG. 5 illustrates an exemplary embodiment of a decoding strategy for use with the DODC shown in FIG. 3.

Fig. 6 illustrates an exemplary method for determining a decoding order according to an exemplary embodiment of the present invention.

Fig. 7 illustrates an exemplary apparatus for determining a decoding order according to an exemplary embodiment of the present invention.

Detailed Description

The following detailed description is intended to provide an understanding of one or more embodiments of the invention. Those of ordinary skill in the art will realize that the following detailed description is illustrative only and is not intended to be in any way limiting. Other embodiments will be apparent to those skilled in the art from consideration of the present disclosure and/or description.

In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. Of course, it will be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions may be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of embodiments of this disclosure.

The various embodiments of the invention shown in the figures may not be drawn to scale. On the contrary, the dimensions of the various features may be exaggerated or reduced for clarity. In addition, some of the figures may be simplified for clarity. Accordingly, the figures may not depict all of the components of a given apparatus (e.g., device) or method. The same reference indicators will be used throughout the drawings and the following detailed description to refer to the same or like parts.

The term "system" or "device" is used generically herein to describe any number of components, elements, subsystems, devices, packet switching elements, packet switches, access switches, routers, networks, modems, base stations, eNBs (eNodeBs), computers and/or communication devices or mechanisms, or combinations thereof. The term "computer" includes a processor, memory, and buses capable of executing instructions, where a computer refers to a single computer or a cluster of computers, personal computers, workstations, mainframes, or combinations thereof.

In various embodiments, methods and apparatus are provided for finding codeword decoding order in a serial interference cancellation ("SIC") receiver using a reinforcement learning process.

Fig. 1 shows a MIMO communication network 100 having a transceiver that includes a Decoding Order Determination Circuit (DODC) for determining a decoding order of received codewords. The network 100 includes a first MIMO transceiver 102 and a second MIMO transceiver 104. The first MIMO transceiver 102 is coupled to a plurality of antennas 106 for transmitting information to/receiving information from the second MIMO transceiver 104.

The second MIMO transceiver 104 is coupled to a plurality of antennas 108. The MIMO transceiver 104 receives, via the antennas 108, codewords transmitted from the first transceiver 102. Transceiver 104 includes codeword decoder 110 and Decoding Order Determination Circuit (DODC) 112. Codewords received through antennas 108 are processed by transceiver 104 and decoded by decoder 110. The Decoded Codewords (DCW) 114 are output to other entities coupled to transceiver 104.

During operation, the decoding order of the codewords affects MIMO channel equalization. For example, decoding the codewords received on strong channels first allows those channels to be removed from the equalization process, which in turn allows codewords received on weaker channels to be decoded successfully. DODC 112 operates to determine a decoding order for which MIMO channel equalization yields higher decoding accuracy.
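A simplified numerical illustration of why the order matters is sketched below. It assumes a scalar power model in which every undecoded stream adds its full received power as interference and a decoded stream is cancelled perfectly; real MIMO equalization is spatial, so the powers and results are hypothetical and only illustrate the trend.

```python
import math

def sinr_db(signal: float, interference: float, noise: float) -> float:
    """SINR in dB for one stream, given linear powers."""
    return 10 * math.log10(signal / (interference + noise))

# Hypothetical linear powers: one strong and one weak stream, unit noise power.
P_STRONG, P_WEAK, NOISE = 100.0, 10.0, 1.0

# Strong stream decoded first: it sees the weak stream as interference.
strong_first = sinr_db(P_STRONG, P_WEAK, NOISE)   # about 9.59 dB
# Weak stream before cancellation: the strong stream dominates the interference.
weak_before = sinr_db(P_WEAK, P_STRONG, NOISE)    # about -10.04 dB
# Weak stream after the strong codeword is decoded and cancelled (assumed perfect).
weak_after = sinr_db(P_WEAK, 0.0, NOISE)          # 10.0 dB
```

In this toy model, decoding the strong stream first raises the weak stream's SINR by about 20 dB, whereas attempting the weak stream first would face a negative SINR.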

In one embodiment, DODC 112 uses an iterative process, referred to as reinforcement learning (RL), to determine the codeword decoding order to be used by decoder 110. DODC 112 and its operation are described in more detail below.

Fig. 2 illustrates a detailed exemplary embodiment of the second MIMO transceiver 104 illustrated in fig. 1. The MIMO transceiver 104 includes a MIMO equalizer 202 coupled to the antennas 108. MIMO equalizer 202 operates to equalize the channels received from antennas 108. The equalized channels output from MIMO equalizer 202 are input to codeword decoder 110, which includes a plurality of decoders (1-n). The decoders operate to decode codewords received by transceiver 104, and their output is input to an output circuit 204. Output circuit 204 provides decoded codewords 206 to other processing elements of the MIMO transceiver. The decoded codewords 206 are also input to decoding order determination circuit 112, which holds state information 208 and policy information 210. During operation, state information 208 is updated to include information about the received MIMO channels, noise levels, interference, signal-to-noise ratio, and/or other state information about the received transmission, channel conditions, and antennas. Policy information 210 is used to determine the decoding order and is updated based on decoding success.

During operation, decoding order determination circuit 112 uses policy information 210 and state information 208 to determine the decoding order to be used by decoder 110. For example, DODC 112 outputs decoder control signal 218, which enables or disables one or more of decoders 110. Thus, DODC 112 determines the decoding order and generates decoder control signal 218 to control decoder 110 to implement that order.

In one embodiment, output circuit 204 outputs Decoded Codewords (DCW) 206, which are input to DODC 112. DODC 112 tests each codeword to determine whether it has been successfully decoded. For example, a Cyclic Redundancy Check (CRC) is performed on the decoded codeword to determine whether decoding was successful. Internal metrics, including mutual information based on an extrinsic information transfer (EXIT) chart, may be used together with the CRC to determine the reward. Based on the success of the decoding operation, DODC 112 determines rewards, which are stored and used to update the policy information. The updated policy is used to determine the subsequent decoding order.
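The CRC-based success test can be sketched as follows. The source does not name a CRC polynomial; this sketch assumes the 24-bit CRC24A generator used for LTE/NR transport blocks (0x864CFB), computed bitwise over bytes with a zero initial register and no final XOR. The helper names are hypothetical. Appending the CRC makes the CRC of the whole block zero, so a nonzero residue flags a decoding failure.

```python
# CRC24A generator: x^24+x^23+x^18+x^17+x^14+x^11+x^10+x^7+x^6+x^5+x^4+x^3+x+1
CRC24A_POLY = 0x864CFB  # the x^24 term is implicit

def crc24a(data: bytes) -> int:
    """Bitwise CRC-24A, MSB first, zero initial register, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 16  # align the byte with the top of the 24-bit register
        for _ in range(8):
            if crc & 0x800000:
                crc = ((crc << 1) ^ CRC24A_POLY) & 0xFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFF
    return crc

def attach_crc(payload: bytes) -> bytes:
    """Transmitter side (hypothetical helper): append the 3-byte CRC."""
    return payload + crc24a(payload).to_bytes(3, "big")

def crc_passes(block: bytes) -> bool:
    """Receiver side: a correctly decoded block leaves a zero residue."""
    return crc24a(block) == 0

block = attach_crc(b"codeword payload")
corrupted = bytes([block[0] ^ 0x01]) + block[1:]  # flip one bit
# crc_passes(block) is True; crc_passes(corrupted) is False
```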

In one embodiment, DODC 112 also updates state information 208 with the results of the decoding operation and other state parameters, such as updated channel estimates and signal-to-noise ratio (SNR) information. DODC 112 then uses the updated state information 208 and the updated policy information to determine a subsequent decoding order. Successfully decoded codewords 216 are input to MIMO equalizer 202 so that the channels associated with those codewords can be eliminated from the equalization process. DODC 112 operates iteratively until all received codewords have been successfully decoded or other decoding metrics have been met.
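The cancellation step can be sketched with NumPy under simplifying assumptions: a noise-free linear model y = Hx, zero-forcing (pseudo-inverse) equalization in place of whatever equalizer an implementation would actually use (MMSE is common), and perfect re-modulation of each decoded codeword to its transmitted symbol. Once a stream is decoded, its contribution is subtracted from y and its column is dropped from H, so the remaining streams are equalized against fewer interferers. The function names are hypothetical.

```python
import numpy as np

def zf_equalize(y: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Zero-forcing estimate of all remaining streams: x_hat = pinv(H) @ y."""
    return np.linalg.pinv(H) @ y

def cancel_decoded(y: np.ndarray, H: np.ndarray, decoded: dict):
    """Subtract decoded streams and drop their columns from H.

    decoded maps stream index -> re-modulated symbol of the decoded codeword.
    Returns the residual y, the reduced H, and the remaining stream indices."""
    remaining = [k for k in range(H.shape[1]) if k not in decoded]
    y_res = y.copy()
    for k, x_k in decoded.items():
        y_res = y_res - H[:, k] * x_k
    return y_res, H[:, remaining], remaining

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))  # 4 rx, 3 streams
x = np.array([1 + 1j, -1 + 1j, 1 - 1j]) / np.sqrt(2)                # one QPSK symbol each
y = H @ x                                                            # noise-free for clarity

# Stream 0 is assumed to have passed its CRC, so it is cancelled.
y_res, H_res, remaining = cancel_decoded(y, H, {0: x[0]})
x_rest = zf_equalize(y_res, H_res)  # matches x[remaining] up to rounding
```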

Fig. 3 illustrates an exemplary embodiment of the decoding order determination circuit 112 illustrated in fig. 2. The circuit 112 includes a decoding order determiner (DOD) 302, a memory 304, a reward determination circuit 306, a policy update circuit 308, a state interface 310, an equalizer interface 312, and a decoder interface 314, all coupled to communicate via a bus 316. Memory 304 includes any suitable memory, such as RAM, and stores rewards 318, policy 210, state 208, and decoding order 324. In other embodiments, DOD 302 may be implemented in programmable logic or a neural network.

In one embodiment, DOD 302 includes at least one of a processor, programmable logic, a state machine, firmware, logic, and discrete components. During operation, DOD 302 obtains state information 208 and policy information 210 from memory 304 and determines decoding order 324, which is stored back in memory 304. Decoding order 324 is also provided to decoder interface 314, which generates decoder control signals 214 that control the operation of decoder 110 to implement the decoding order. In one embodiment, DOD 302 receives a policy target 326 from another entity at the transceiver. Policy target 326 is used to update policy 210 and/or to configure how policy 210 is applied.

After the decoding process, the decoded codewords are input to reward determination circuit 306. In one embodiment, reward determination circuit 306 comprises at least one of a processor, programmable logic, a state machine, firmware, logic, and discrete components. Reward determination circuit 306 generates a reward based on the success of the decoding process. For example, the decoded codeword may be analyzed using a CRC check, an EXIT chart, and/or other information to calculate a reward. A reward 318 is generated for successfully decoded codewords and stored in memory 304.

In one embodiment, policy update circuit 308 includes at least one of a processor, programmable logic, a state machine, firmware, logic, and discrete components. Policy update circuit 308 obtains rewards 318 from memory 304 and processes them to update policy 210 in memory 304.

In one embodiment, state interface 310 receives state information from various entities at the transceiver. For example, state interface 310 receives MIMO channel information, antenna information, noise levels, signal-to-noise ratios, and other information, which is combined to form state 208 stored in memory 304. State 208 also includes a list of the codewords and an indication of whether each codeword has been successfully decoded. In one embodiment, state 208 is continuously updated as new information becomes available.

Decoder interface 314 interfaces with decoder 110 to provide decoder control signal 214, which implements the determined decoding order. For example, decoder control signal 214 enables or disables selected decoders to decode selected codewords according to the decoding order.
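One way a decoder interface could turn the current decoding order into per-decoder control is sketched below. The assumptions are hypothetical: each of `num_decoders` decoders takes at most one codeword per iteration, the head of the current decoding order is scheduled, and any decoder left without a codeword is disabled.

```python
from typing import Dict, List, Tuple

def decoder_control(decoding_order: List[int],
                    num_decoders: int) -> Tuple[Dict[int, int], List[bool]]:
    """Map the head of the decoding order onto decoders for one iteration.

    Returns (assign, enable): assign maps decoder index -> codeword id, and
    enable[i] is True only for decoders that received a codeword."""
    scheduled = decoding_order[:num_decoders]
    assign = {dec: cw for dec, cw in enumerate(scheduled)}
    enable = [dec in assign for dec in range(num_decoders)]
    return assign, enable

assign, enable = decoder_control([2, 0], num_decoders=3)
# assign == {0: 2, 1: 0}; enable == [True, True, False]
```

Because the order is recomputed in every iteration, only the head of the current order needs to be mapped onto decoders.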

Equalizer interface 312 interfaces with MIMO equalizer 202 to provide EQ candidate signal 212, which indicates codewords that have been successfully decoded and can therefore be removed from the equalization process. In one aspect, a SIC receiver that provides MIMO channel equalization decodes codewords iteratively and then performs interference cancellation using the successfully decoded codewords. In one embodiment, DOD 302 controls equalizer interface 312 to output, based on state information 208, a new list of equalizer candidates reflecting the codewords on particular channels that have been successfully decoded.

Reinforcement learning

In various embodiments, a reinforcement learning process is used to find the best order in which to decode the codewords that are decoding candidates in each iteration. In one embodiment, the problem may be formulated as an iterative decision process, such as a Markov Decision Process (MDP). For example, the following formulation uses a state-action-reward procedure, defined as follows.

State space

The state space 208 is stored in the memory 304 and includes at least the following.

1. A set containing all undecoded codewords that are serial candidates (s-cand)

2. A set containing all successfully decoded codewords (s-success)

3. A set of criteria reflecting the channel conditions seen by each codeword. This may include the raw channel estimates, the post-processing SINR from the MIMO equalizer, functions of these, or any criterion that reflects a codeword's decodability (i.e., its ability to be successfully decoded).

4. The coding rate of each codeword and the modulation order of the data carried on each codeword
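The four state components listed above might be held in a structure like the following. This is a sketch under stated assumptions: the class and field names are hypothetical, the channel-condition criteria are reduced to one post-processing SINR per codeword, and the modulation order is stored as bits per symbol.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

@dataclass
class CodewordCriteria:
    post_sinr_db: float    # post-processing SINR from the MIMO equalizer
    code_rate: float       # coding rate of the codeword
    bits_per_symbol: int   # modulation order, e.g. 2 = QPSK, 6 = 64QAM

@dataclass
class SicState:
    s_cand: Set[int] = field(default_factory=set)      # undecoded serial candidates
    s_success: Set[int] = field(default_factory=set)   # successfully decoded codewords
    criteria: Dict[int, CodewordCriteria] = field(default_factory=dict)

    def mark_decoded(self, cw: int) -> None:
        """Move a codeword from s-cand to s-success after it passes its CRC."""
        self.s_cand.discard(cw)
        self.s_success.add(cw)

    def key(self) -> Tuple[FrozenSet[int], FrozenSet[int]]:
        """Hashable snapshot, e.g. for indexing a tabular policy."""
        return frozenset(self.s_cand), frozenset(self.s_success)

state = SicState(
    s_cand={0, 1},
    criteria={0: CodewordCriteria(12.5, 0.6, 4), 1: CodewordCriteria(3.0, 0.8, 6)},
)
state.mark_decoded(0)  # state.s_cand == {1}, state.s_success == {0}
```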

Action space

The action space consists of selections of one or more codewords from the set of codewords yet to be decoded. The selected codewords are scheduled for decoding.
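Under the hypothetical assumption that one action schedules at most as many codewords as there are decoders, the action space for a given state can be enumerated as all non-empty groups drawn from s-cand, as in this sketch:

```python
from itertools import combinations
from typing import Iterable, List, Tuple

def action_space(s_cand: Iterable[int], num_decoders: int) -> List[Tuple[int, ...]]:
    """All non-empty groups of undecoded codewords that fit on the decoders."""
    cands = sorted(s_cand)
    actions: List[Tuple[int, ...]] = []
    for k in range(1, min(num_decoders, len(cands)) + 1):
        actions.extend(combinations(cands, k))
    return actions

actions = action_space({0, 1, 2}, num_decoders=2)
# actions == [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

The action space shrinks as codewords move from s-cand to s-success, which keeps later iterations cheap to search.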

Rewards

Successful decoding of a codeword results in a positive reward; unsuccessful decoding results in a negative or zero reward. If a codeword consists of a plurality of code blocks, successful decoding of individual code blocks also contributes to the reward. In addition, a successful decoding allows the corresponding channel to be removed from the MIMO equalization process. In one embodiment, an inner decoder criterion, such as extrinsic information (e.g., from an EXIT chart), may be used together with the CRC check to determine the reward.
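A reward of the kind described above can be sketched as follows. The weights are hypothetical tuning parameters, and the EXIT-based input is assumed to be a mutual-information value in [0, 1]; a failed codeword receives a small soft credit for nearly converging, but with these weights its reward stays negative.

```python
from typing import Optional, Sequence

def compute_reward(cb_crc_ok: Sequence[bool], exit_mi: Optional[float] = None,
                   w_success: float = 1.0, w_fail: float = -1.0,
                   w_mi: float = 0.5) -> float:
    """Reward for one decoding attempt of a codeword made of code blocks.

    cb_crc_ok: per-code-block CRC results (True = pass).
    exit_mi: optional inner-decoder metric in [0, 1] (assumed EXIT-style)."""
    if not cb_crc_ok:
        raise ValueError("need at least one code block result")
    n_ok = sum(cb_crc_ok)
    n_fail = len(cb_crc_ok) - n_ok
    # Successful code blocks add, failed ones subtract (normalized per block).
    reward = (w_success * n_ok + w_fail * n_fail) / len(cb_crc_ok)
    if exit_mi is not None and n_ok == 0:
        # Soft credit for a fully failed attempt that was close to converging.
        reward += w_mi * exit_mi
    return reward
```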

It should be noted that the optimal policy for the MDP, which yields the policy for the codeword decoding order, may be learned using any of a variety of reinforcement learning algorithms.
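Tabular Q-learning is one such algorithm; it is shown here only as an example, since the source does not fix the algorithm (claim 12 mentions a neural network, which could replace the table as a function approximator). In this sketch the state key is any hashable snapshot of the state space and an action is a tuple of codeword ids; the class name and parameter values are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

Action = Tuple[int, ...]

class QPolicy:
    """Epsilon-greedy tabular Q-learning over (state key, action) pairs."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: Optional[int] = None):
        self.q = defaultdict(float)          # (state, action) -> value estimate
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state: Hashable, actions: List[Action]) -> Action:
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state: Hashable, action: Action, reward: float,
               next_state: Hashable, next_actions: List[Action]) -> None:
        # Terminal states (no remaining actions) contribute no future value.
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

policy = QPolicy(alpha=0.5, gamma=0.0, epsilon=0.0)
s0 = frozenset({0, 1})
policy.update(s0, (0,), 1.0, frozenset({1}), [(1,)])  # codeword 0 decoded: reward +1
```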

Fig. 4 illustrates an exemplary embodiment of a state space 400. For example, state space 400 is suitable for use as state 208 shown in FIG. 3. The state space 400 includes: a first portion 402 comprising codeword state information; a second portion 404 comprising channel state information; and a third portion 406 that includes antenna status information. It should be noted that state space 400 is exemplary and not exhaustive of all state information that may be used to form a state space.

The first portion 402 includes a codeword identifier 408, a decoded indicator 410, an undecoded indicator 412, a post-processing SINR 414, and an effective code rate 416. The second portion 404 includes a channel identifier 418 and a channel estimate 420. The third portion 406 includes a receive antenna indicator 422 and an SNR value 424.

Other state information may also be associated with each codeword. When an undecoded codeword is successfully decoded, state space 400 is updated to identify the decoded codeword, and other portions of the state space are updated accordingly. Thus, as the decoding process continues, the undecoded codewords are systematically decoded according to the decoding order and marked as decoded until all codewords have been decoded or another decoding condition is reached, such as a decoding iteration timeout. The decoded codewords are provided to the MIMO equalizer, which determines the channels that can be eliminated from the equalization process.

Fig. 5 illustrates an exemplary embodiment of a decoding strategy 500. For example, decoding strategy 500 is suitable for use as strategy 210 shown in fig. 3. The decoding strategy is used together with the state space 208 to determine the decoding order, and it is updated based on the rewards determined from the decoding process. Because the decoding strategy is updated at each iteration of the decoder loop, it is dynamic and may change as the transmission environment changes. In one embodiment, the decoding strategy uses (1) the undecoded codewords, (2) the post-processing SINR, (3) channel estimates, (4) the effective coding rate, (5) rewards, and (6) SNR. It should be noted that the decoding strategy may use other parameters.
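A very simple decoding strategy over these parameters is sketched below: a linear score per undecoded codeword, with codewords ranked from most to least likely to decode. The feature summary, the weights, and the linear form are all hypothetical; rewards are assumed to act through updates of the weights rather than as a per-codeword feature.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CwFeatures:
    post_sinr_db: float    # post-processing SINR of the codeword
    chan_gain_db: float    # hypothetical channel-estimate summary (column norm in dB)
    eff_code_rate: float   # effective coding rate
    antenna_snr_db: float  # receive-antenna SNR

# Hypothetical weights; in the scheme described, they would be adapted from rewards.
WEIGHTS = {"post_sinr_db": 1.0, "chan_gain_db": 0.5,
           "eff_code_rate": -10.0, "antenna_snr_db": 0.2}

def score(f: CwFeatures, w: Dict[str, float] = WEIGHTS) -> float:
    """Linear decodability score: higher means decode earlier."""
    return sum(w[name] * getattr(f, name) for name in w)

def decoding_order(undecoded: Dict[int, CwFeatures]) -> List[int]:
    """Rank the undecoded codewords by score, best candidate first."""
    return sorted(undecoded, key=lambda cw: score(undecoded[cw]), reverse=True)

order = decoding_order({
    0: CwFeatures(post_sinr_db=5.0, chan_gain_db=0.0, eff_code_rate=0.5, antenna_snr_db=10.0),
    1: CwFeatures(post_sinr_db=15.0, chan_gain_db=3.0, eff_code_rate=0.5, antenna_snr_db=10.0),
})
# order == [1, 0]
```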

Fig. 6 illustrates an exemplary method 600 for determining a decoding order according to an exemplary embodiment of the present invention. For example, method 600 is suitable for use with DODC 112 shown in FIG. 3.

At block 602, a plurality of codewords are received on a plurality of channels at a plurality of MIMO antennas. For example, the codeword is received at antenna 108 shown in fig. 2.

At block 604, MIMO channel equalization is performed to equalize the channels of the received codewords. For example, MIMO equalizer 202 performs equalization.

At block 606, a state space is determined. For example, DODC 112 determines the state space from received state information 328. The state space includes information about decoded and undecoded codewords, channel estimates, SINR, and other parameters, as shown in state space 400 of fig. 4.

At block 608, a decoding order for decoding one or more codewords is determined based on the state space 208 and the decoding strategy 210. For example, decoding order 324 indicates the order in which codewords are decoded based on current state 208 and policy 210. In one embodiment, DOD 302 determines decoding order 324 based on state 208 and policy 210.

At block 610, one or more decoders are enabled to decode codewords according to the determined decoding order. In one embodiment, DOD 302 determines decoding order 324 and controls decoder interface 314 to enable selected decoders 110 to decode codewords on selected channels. The decoded codewords are returned to DODC 112 and received by reward determination circuit 306.

At block 612, a reward is calculated based on the decoding result. For example, reward determination circuit 306 performs a CRC check on the decoding result and calculates a numerical reward based on the successfully decoded codewords.

At block 614, the decoding policy is updated. For example, reward determination circuit 306 calculates reward 318 based on decoding success or failure, and reward 318 is stored in memory 304. Policy update circuit 308 obtains reward 318 from memory 304 and updates policy 210 based on it. In an exemplary embodiment, at iteration (n), policy 210 is updated based on the reward from iteration (n) and past state information (i.e., the state at iteration (n-1)). In response to the decoding result in iteration (n), the state is updated for iteration (n+1). At iteration (n+1), the updated policy is used together with the state at iteration (n+1) to determine a new decoding order.

At block 616, the state space is updated. For example, DOD 302 updates state 208 with the decoding result and the received state information: successfully decoded codewords are marked in state 208, and the channel and antenna parameters are updated based on information received by state interface 310.

At block 618, DOD 302 determines whether all codewords have been successfully decoded. If so, the method ends. In another embodiment, the method ends when certain decoding conditions are met; for example, the method ends if a timeout occurs. If not all codewords have been successfully decoded, the method proceeds to block 620.

At block 620, the candidates for interference cancellation are updated and sent to the equalizer. For example, the channels associated with successfully decoded codewords are signaled to MIMO equalizer 202 through equalizer interface 312 so that these channels can be removed from the equalization process. The method then returns to block 608 to determine the next decoding order for the remaining undecoded codewords.

Thus, the method 600 is used to determine a decoding order of codewords received in a MIMO system. It should be noted that the operations of method 600 are exemplary and that changes, modifications, additions, and deletions may be made within the scope of the embodiments.
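The loop of blocks 608 to 620 can be simulated end to end as follows. This is a toy sketch under stated assumptions: a scalar power model replaces the MIMO equalizer, a decoding "succeeds" when a codeword's SINR clears a hypothetical threshold (standing in for the CRC check), one codeword is decoded per iteration, rewards are +1/-1, and the policy is an action-value table moved toward each observed reward (a bandit-style rule chosen for brevity; the source leaves the reinforcement learning algorithm open). Powers, thresholds, and learning parameters are hypothetical.

```python
import math
import random
from collections import defaultdict

# Toy model (an assumption, not from the source): codeword k arrives with power P[k];
# undecoded codewords interfere with one another at full power, and a decoded
# codeword is cancelled perfectly. Decoding succeeds when SINR >= THRESH_DB[k].
P = {0: 1.0, 1: 20.0, 2: 200.0}
THRESH_DB = {0: 0.0, 1: 0.0, 2: 0.0}
NOISE = 0.1

def sinr_db(k, undecoded):
    interference = sum(P[j] for j in undecoded if j != k)
    return 10 * math.log10(P[k] / (interference + NOISE))

def run_episode(q, rng, epsilon=0.1, alpha=0.2, max_iters=10):
    """One pass of blocks 608-620; q maps (state, codeword) -> value estimate."""
    undecoded = set(P)                          # s-cand
    order, attempts = [], 0
    while undecoded and attempts < max_iters:   # block 618: done or timeout
        state = frozenset(undecoded)
        cands = sorted(undecoded)
        # Block 608: epsilon-greedy choice of the next codeword to decode.
        if rng.random() < epsilon:
            cw = rng.choice(cands)
        else:
            cw = max(cands, key=lambda c: q[(state, c)])
        # Blocks 610-612: decode and compute the reward.
        attempts += 1
        ok = sinr_db(cw, undecoded) >= THRESH_DB[cw]
        reward = 1.0 if ok else -1.0
        # Block 614: move the value estimate toward the observed reward.
        q[(state, cw)] += alpha * (reward - q[(state, cw)])
        # Blocks 616 and 620: update the state; cancel the decoded codeword.
        if ok:
            undecoded.discard(cw)
            order.append(cw)
    return order, not undecoded, attempts

rng = random.Random(0)
q = defaultdict(float)
for _ in range(200):                                       # learning episodes
    run_episode(q, rng)
order, done, attempts = run_episode(q, rng, epsilon=0.0)   # greedy run
```

In this model only the order 2, 1, 0 decodes every codeword, and after training the greedy policy reaches it in three attempts with no failed decodes, while early episodes spend extra attempts on failures.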

Fig. 7 illustrates an exemplary apparatus 700 for determining a decoding order according to an exemplary embodiment of the present invention. For example, apparatus 700 is suitable for use as DODC 112 shown in FIG. 3.

The apparatus includes means (702) for determining a decoding order based on a state space and a decoding strategy, which in one embodiment includes DOD 302. The apparatus further includes means (704) for decoding selected codewords based on the decoding order, which in one embodiment includes decoder interface 314. The apparatus further includes means (706) for updating the decoding strategy based on the decoding result and the state space, which in one embodiment includes policy update circuit 308. The apparatus further includes means (708) for updating the state space based on the decoding result, which in one embodiment includes state interface 310. The apparatus further includes means (710) for updating the decoding order based on the state space and the decoding strategy, which in one embodiment includes DOD 302.

While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from these exemplary embodiments and their broader aspects. The appended claims are therefore intended to cover all such changes and modifications as fall within the true spirit and scope of these exemplary embodiments of the invention.

Claims

1. A method for decoding codewords in a multiple-input multiple-output, MIMO, communication network, the method comprising:

determining a decoding order based on the state space and the decoding strategy;

decoding the selected codeword based on the decoding order;

updating the decoding strategy based on decoding results and the state space;

updating the state space based on the decoding result; and

updating the decoding order based on the state space and the decoding strategy.

2. The method of claim 1, further comprising repeating the operations of claim 1 until a decoding completion criterion is met.

3. The method of claim 1, wherein the decoding comprises activating one or more decoders based on the decoding order to decode one or more codewords.

4. The method of claim 1, wherein updating the decoding policy further comprises:

calculating a reward based on the decoding result and an internal metric; and

updating the decoding strategy based on the rewards.

5. The method of claim 4, wherein calculating the reward comprises:

assigning a positive value to the reward for a successfully decoded codeword; and

assigning a negative value to the reward for an unsuccessfully decoded codeword.

6. The method of claim 5, further comprising checking a cyclic redundancy check, CRC, value of a decoded codeword to determine decoding success.

7. The method of claim 5, wherein calculating the reward comprises determining the reward from a combination of one or more of a number of successfully decoded codewords or code blocks within a codeword, a number of unsuccessfully decoded codewords or code blocks within a codeword, and an inner decoding metric comprising extrinsic information related to the codeword.

8. The method of claim 1, further comprising receiving the codeword on a plurality of channels at a plurality of antennas.

9. The method of claim 8, further comprising equalizing the plurality of channels based on the decoding result.

10. The method of claim 1, wherein updating the state space comprises updating the state space based on one or more of successfully decoded codewords, unsuccessfully decoded codewords, post-processing signal-to-interference-plus-noise ratio SINR, coding rate, channel estimation, and antenna SNR.

11. The method of claim 1, further comprising updating the decoding policy based on a policy objective.

12. The method of claim 4, wherein the updating operation comprises updating the decoding policy based on the reward using a neural network.

13. An apparatus for decoding codewords in a multiple-input multiple-output, MIMO, communication network, the apparatus comprising:

a decoding order determining circuit that determines a decoding order for decoding codewords based on the state space and the decoding strategy;

a reward determination circuit that receives the decoded codeword and determines a reward based on the decoding result;

a policy updating circuit that updates the decoding policy based on the decoding result and the state space;

a state interface that updates the state space based on the decoding result; and

wherein the decoding order determination circuit generates an updated decoding order based on the state space and the decoding policy.

14. The apparatus of claim 13, further comprising one or more decoders activated in the decoding order to decode one or more codewords.

15. The apparatus of claim 13, wherein the policy updating circuit updates the decoding policy by calculating a reward based on the decoding result and an internal metric and updating the decoding policy based on the reward.

16. The apparatus of claim 15, wherein the policy updating circuit calculates the reward by assigning positive values to the reward for successfully decoded codewords and negative values to the reward for unsuccessfully decoded codewords.

17. The apparatus of claim 16, wherein the policy update circuit calculates the reward by determining the reward from a combination of one or more of a number of successfully decoded codewords or code blocks within codewords, a number of unsuccessfully decoded codewords or code blocks within codewords, and an inner decoding metric comprising extrinsic information related to the codewords.

18. The apparatus of claim 13, further comprising an equalizer interface that equalizes a plurality of channels conveying the codeword based on the decoding result.

19. The apparatus of claim 13, wherein the state interface updates a state space based on one or more of successfully decoded codewords, unsuccessfully decoded codewords, post-processing signal-to-interference-plus-noise ratio SINR, coding rate, channel estimation, and antenna SNR.

20. An apparatus for decoding information in a multiple-input multiple-output, MIMO, communication network, the apparatus comprising:

means for determining a decoding order based on the state space and the decoding strategy;

means for decoding selected codewords based on the decoding order;

means for updating the decoding strategy based on the decoding result and the state space;

means for updating the state space based on decoding results; and

means for updating the decoding order based on the state space and the decoding policy.