PULSE AMPLITUDE MODULATION DRIVERS AND TRANSMITTERS WITH REDUCED POWER CONSUMPTION
DESCRIPTION
Background Art
High-speed interconnect links have been widely used in high-speed network switching, local area networks, memory buses and multi-processor interconnection networks. Many high-speed digital signals are transmitted between analog and digital chips and between field programmable gate array (FPGA) chips in a system. High-speed inter-chip links can significantly reduce the total number of signal traces between chips in a system. For example, 500 signals at 100 Mb/s between two FPGA chips can also be transmitted by a 10 Gb/s interconnect with 5 channels. In such applications, overall system performance increases as the communication speed between chips in the system increases. For a given data rate, multi-level signaling can be used to reduce the channel symbol rate. Multi-level signaling also lowers the required maximum on-chip clock frequency, intersymbol interference (ISI), and crosstalk.
"Pulse amplitude modulation" or "PAM" signaling is an important kind of multi-level signaling. PAM signaling schemes are specified in part by the number of levels used to represent symbols. A "4-PAM" system uses 4 levels, a "6-PAM" system uses 6 levels, etc. While PAM signaling has many benefits, as described above, it requires that transmitted power be high enough to compensate for the impact that multi-level signaling has on bit error rate (BER). Since there are typically several drivers in a parallel bus signaling system, the power dissipation of the drivers can be significant. It is therefore desirable with PAM drivers and transmitters to reduce power consumption as much as possible while maintaining the maximum possible speed and performance.
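The trace-count example and the symbol-rate reduction described above can be checked with a short sketch. The function names are illustrative only and are not part of the disclosure. The sketch assumes ideal channels with no coding overhead, and assumes that an N-level symbol carries log2(N) bits.

```python
import math

def channels_needed(num_signals, signal_rate_bps, channel_rate_bps):
    """Serial channels needed to carry the aggregate traffic (no overhead assumed)."""
    aggregate_bps = num_signals * signal_rate_bps
    return math.ceil(aggregate_bps / channel_rate_bps)

def symbol_rate(bit_rate_bps, levels):
    """Symbol rate of an N-level signal, assuming log2(N) bits per symbol."""
    return bit_rate_bps / math.log2(levels)

# 500 signals at 100 Mb/s carried over 10 Gb/s channels
print(channels_needed(500, 100e6, 10e9))
# Symbol rate of a 10 Gb/s link using 2 levels versus 4 levels
print(symbol_rate(10e9, 2), symbol_rate(10e9, 4))
```

Under these assumptions, moving from 2 levels to 4 levels halves the symbol rate for the same bit rate, which is the source of the lower on-chip clock frequency mentioned above.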
Disclosure of Invention
The multi-level PAM driver disclosed here is power-efficient while maintaining high-speed capability through use of a look-ahead technique involving the pre-switching of current sources. A transmitter architecture based on a 4-PAM embodiment of the driver is also disclosed. In some embodiments of the transmitter, using multiplexers to serialize the input data prior to the PAM driver can minimize the number of on-chip blocks that have to work at full speed.
In example embodiments, a multi-level pulse amplitude modulation (PAM) driver includes at least one, and possibly multiple, drive units. At least one of the drive units includes at least one tail current source, which can be powered off when not needed for signal transmission if there are multiple drive units. In some embodiments, at least two tail current sources are present. These current sources may be implemented by current sourcing transistors. A plurality of current steering transistors is connected to the tail current source(s) to form a differential drive circuit. A pre-switcher is connected to the tail current source(s). When a signal transition is upcoming which will switch the driver to a state where a tail current source is needed, the pre-switcher provides a way to switch the tail current source, which in some cases can include powering it on, in advance of the signal transition to provide a specified settling time prior to the signal transition. In this way, the current sources of a driver can be switched in such a way that there is no settling wait, thus maintaining the high-speed capability of the driver. In a multiple drive unit arrangement, the current sources can be switched off when not needed to conserve power. In a single drive unit arrangement, power can be conserved simply by eliminating a pre-driver.
The pre-switcher can be implemented by traditional circuit elements, such as a plurality of switching transistors and a plurality of delay elements connected to the current steering transistors and the switching transistors. In such an embodiment, inverters can serve as delay elements. Alternatively, the pre-switcher could be implemented by a software or firmware programmed device such as an FPGA or digital signal processor.
An apparatus for transmitting pulse amplitude modulation (PAM) signals can be a single transmitter, or multiple transmitters on a semiconductor chip, either alone or with other components. In some embodiments, a PAM transmitter built around the driver includes a data input, a differential output, and at least one encoder connected to the data input. The driver is connected to the encoder or encoders. One or more multiplexers can be connected between the encoder(s) and the driver, to serialize the output of the encoder(s) to increase the speed of the data input to the driver. In this way, other on-chip blocks can operate at slower speeds.
Brief Description of The Drawings
FIG. 1 is a schematic diagram of a unipolar PAM drive unit.
FIG. 2 is a schematic diagram of a bipolar PAM drive unit.
FIG. 3 is a simplified schematic diagram of a 4-PAM, bipolar driver according to embodiments of the invention.
FIG. 4 is a more detailed schematic diagram of one of the drive units of the 4-PAM, bipolar driver of FIG. 3. FIG. 4 is shown in three parts for convenience, Figures 4A, 4B, and 4C.
FIG. 5 shows voltage waveform diagrams as FIGs. 5A and 5B that illustrate some of the characteristics of a 2-PAM driver according to the invention.
FIG. 6 is a simplified schematic diagram of a 4-PAM, unipolar driver according to embodiments of the invention.
FIG. 7 is a block diagram of a PAM transmitter according to embodiments of the invention.
FIG. 8 is a logic diagram of a portion of the PAM transmitter illustrated in FIG. 7.
FIG. 9 is a combination logic and block diagram of another portion of the PAM transmitter illustrated in FIG. 7. FIG. 9 is presented in two parts, Figures 9A and 9B.
FIG. 10 is a combination logic and block diagram of another portion of the PAM transmitter illustrated in FIG. 7.
FIG. 11 is another combination logic and block diagram of another portion of the PAM transmitter illustrated in FIG. 7. FIG. 11 is presented in two parts, Figures 11A and 11B.
Best Mode(s) for Carrying Out the Invention
The present invention will now be described in terms of specific, example embodiments. It is to be understood that the invention is not limited to the example embodiments disclosed. It should also be understood that not every feature of the circuits and/or devices described is necessary to implement the invention as claimed in any particular one of the appended claims. Various elements and features of various embodiments of devices and processes are described to fully enable the invention.
Additionally, references may be made comparing various types of circuits and devices in order to assist the reader in understanding the invention. Such references are not to be
construed to mean that any particular combination of the circuits discussed is required to implement the invention. It should also be understood that throughout this disclosure, where a process or method is shown or described, the steps of the method may be performed in any order or simultaneously, unless it is clear from the context that one step depends on another being performed first. It should be noted that much of what is disclosed in this application is also disclosed in commonly owned, United States provisional patent application serial number 60/408,574, filed September 6, 2002, which is incorporated by reference.
It should also be pointed out that references are made throughout this disclosure to figures and descriptions using terms such as top, bottom, above, beneath, below, within, on, at, vertical, horizontal, upper, lower, right, left, etc. These terms are used merely for convenience and refer only to the relative position of features as shown from the perspective of the reader. An element that is placed on or above another element in the context of this disclosure, or otherwise described as "upper" or "lower," can be found at any position in an actual product.
As shown in Figures 1 and 2, there are generally two different architectures for PAM drivers: unipolar and bipolar. An explanation of these basic architectures and their operation will serve to aid the reader in understanding the terminology conventions that apply to this disclosure. A unipolar architecture basic drive unit, 100, is illustrated in FIG. 1. In unipolar architectures, current I is steered into either the right or the left branch of the circuit. Transistors 102 and 103 are referred to as steering transistors, or current steering transistors, and the two branches of the circuit are referred to as differential branches. The transistors are also sometimes referred to as the current-steering differential pair. The steering transistors are gated by the input data, labeled Bit1 in this example, and its complement, as indicated, so as to form a differential drive circuit. The outputs of the drive unit, Out- and Out+, are differential and are produced between the transistors and resistors R, which are connected to VDD. The current source, sometimes referred to as a "tail" current source, is disposed between the transistors and ground. The tail current source is often a single current sourcing transistor, but a multiple transistor circuit or some other current supplying device can be used. In this case, single-ended output voltages would be either VDD or VDD-RI. Therefore, the differential swing is RI.
A bipolar drive unit, 200, is illustrated in FIG. 2. With a bipolar architecture, there is an upper tail current source, 202, and a lower tail current source, 204. In this case, two current steering transistors are used in each differential branch, one an NMOS transistor, such as transistors 206 and 208, and one a PMOS transistor, such as transistors 210 and 212. In a differential drive circuit like that of FIG. 2, the output voltages would be either VDD/2+RI or VDD/2-RI, which makes the differential swing equal to 2RI. Since the power dissipation in both architectures can be represented the same way, VDD • I, the bipolar architecture is more power-efficient; for a given swing, it needs half the power of unipolar architectures.
As is known in the art, drive units such as those shown in Figures 1 and 2 are connected together to increase the number of signal levels in a PAM system. In a 4-PAM system, the first drive unit is said to be driven by Bit1 and the second drive unit is said to be driven by Bit2. For power sensitive applications it is known that tail current sources for all but the first drive unit can be shut off when the level equivalent to +RI is generated, that is, when Bit1 is '0' and Bit2 is '1'. Since the driver power dissipation is directly proportional to the total current (P = VDD • Itotal), the driver power consumption can be reduced by a factor of 1.5 for 4-PAM. This power saving is even more significant in multi-level PAM schemes that have more than 4 levels. For example, in an 8-PAM system, in which a driver may include 4 drive units, the above factor would be 1.75. However, in the present art, this technique is restricted to relatively slow-speed systems, since settling times for switching the current sources limit circuit transmission speed.
FIG. 3 shows the basic architecture of a bipolar driver, 300, according to some embodiments of the invention. In this example, the driver is a 4-PAM driver and is thus composed of two basic drive units.
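The power-saving factors of 1.5 and 1.75 quoted above for 4-PAM and 8-PAM can be reproduced with a simple model. The sketch below is an illustration under stated assumptions, not the disclosed circuit: symbols are taken to be equiprobable, the output levels are taken to be odd multiples of RI, an always-on driver is taken to draw (N-1)·I, and a driver that shuts off unneeded current sources is taken to draw only the current proportional to the magnitude of the level being sent. The function name is hypothetical.

```python
from fractions import Fraction

def power_saving_factor(levels):
    """Ratio of always-on driver current to average switched driver current.

    Assumes equiprobable symbols and level magnitudes 1, 3, ..., N-1 (in units of RI),
    with the switched driver drawing current proportional to the magnitude sent.
    """
    if levels < 2 or levels % 2 != 0:
        raise ValueError("levels must be an even number of at least 2")
    magnitudes = range(1, levels, 2)              # 1, 3, ..., N-1
    always_on = Fraction(levels - 1)              # every unit conducts for every symbol
    switched_avg = Fraction(sum(magnitudes), len(magnitudes))
    return always_on / switched_avg

for n in (4, 6, 8):
    print(n, float(power_saving_factor(n)))
```

Under this model the factor works out to 2(N-1)/N, which approaches 2 as the number of levels grows.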
To reduce the power consumption, the right unit, 302, can be turned off whenever ±RI is to be transmitted. This can be done by proper selection of its inputs, which are labeled B2n1, B2n2, B2p1, and B2p2, and are the gates of current steering transistors 304, 306, 308, and 310, respectively. The labeling convention for the inputs will become clear upon reading the explanation of FIG. 4, below. The top current source, 312, can be turned off by pulling up both the B2p1 and B2p2 inputs, while the bottom current source, 314, can be turned off by pulling down the B2n1 and B2n2 signals. A pre-switcher, 316, is connected to the tail current sources and the inputs to accomplish the powering off and powering on of the current sources. Some of the connections are
omitted in FIG. 3 for clarity, but will be detailed below. This architecture has the advantage of modularity. This 4-PAM topology can be easily changed to 6-PAM by just adding another basic unit. Therefore this architecture can be used as a general power-efficient architecture for multi-level PAM drivers. For drivers of 6-PAM and above, each basic drive unit can have its own pre-switcher, or a more complex pre-switcher can serve all the pre-switched drive units in the driver. A pre-switcher according to embodiments of the invention can also be used with a 2-PAM driver having a single drive unit. In such a case, current sources are not necessarily switched off when not needed, but power can be conserved by eliminating a pre-driver, as will be discussed in more detail below.
An architecture like that illustrated in FIG. 3 can significantly reduce the power by simply switching current sources off when not needed, and switching them on when they become needed at a signal transition. However, switching current sources in such a manner reduces the maximum operating speed of the driver since current sources need some time to settle when switched. Therefore, a data look-ahead technique is used by the pre-switcher.
FIG. 4 illustrates the detail of drive unit 302 of FIG. 3 and the pre-switcher and their method of operation. Like reference numbers are used for like structures in FIG. 3 and FIG. 4. In the example of FIG. 4, current sourcing transistors 402 and 404 serve as upper and lower tail current sources, respectively. These current sourcing transistors are biased with voltages Vbiasp and Vbiasn, respectively. In some embodiments the bias may be applied in such a way as to maintain like characteristics among current sources. Four branches for each drive unit are used to pre-switch the current sources. These branches are shown connected to the drive unit in FIG. 4A.
The two branches that switch current sourcing transistor 402 are made up of NMOS switching transistors 404, 406, 408, and 410. The two branches that switch current sourcing transistor 404 are made up of PMOS switching transistors 412, 414, 416, and 418. The notation used for the gate inputs can now be discussed. The first two characters in the notation correspond to the bit, such as, in this example, "B2" for Bit2, as previously discussed. A signal identified with a p in the next position is associated with a PMOS steering transistor (notwithstanding that it may be connected to an NMOS switching transistor as well). Likewise, a signal identified with an n in the third position is associated with an NMOS steering transistor. The fourth character is a number, where 1 means the signal is associated with the left differential branch of the drive circuit, and 2 means it is associated with the right differential branch. These
numbers represent an arbitrarily chosen convention that could be easily changed. Finally, all timings are referenced to a steering transistor gate signal, whose notation has no further characters. If an a appears after the branch designation in a signal notation, that signal transitions in advance of the reference transition, and if a d appears after the branch designation, the signal transitions with a delay.
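The gate-signal naming convention just described can be summarized as a small decoder. This is only a reading aid: the class and function names are hypothetical, and the pattern covers only the pre-switched names of the form B2n1, B2n1a, or B2n1d, not every signal name used in the figures.

```python
import re
from dataclasses import dataclass

# B<bit><p|n><branch 1|2>[a|d], following the convention described in the text
_SIGNAL_PATTERN = re.compile(r"^B(\d)([pn])([12])([ad]?)$")

@dataclass(frozen=True)
class GateSignal:
    bit: int             # which data bit drives the unit (e.g. 2 for Bit2)
    steering_type: str   # "PMOS" or "NMOS" steering transistor
    branch: str          # "left" (1) or "right" (2) differential branch
    timing: str          # "reference", "advance" (a) or "delayed" (d)

def parse_signal(name):
    """Decode a gate-signal name such as 'B2n1a' into its fields."""
    match = _SIGNAL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized gate-signal name: {name!r}")
    bit, kind, branch, suffix = match.groups()
    return GateSignal(
        bit=int(bit),
        steering_type={"p": "PMOS", "n": "NMOS"}[kind],
        branch={"1": "left", "2": "right"}[branch],
        timing={"": "reference", "a": "advance", "d": "delayed"}[suffix],
    )

print(parse_signal("B2n1a"))
print(parse_signal("B2p2"))
```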
FIG. 4B illustrates a plurality of delay elements, 430, which can be inverters, which in this example are connected to differential branch 1 of the drive unit. These are organized into two chains of three elements each, connected according to the signal notations shown. Transitions propagate through a series of delay elements in the direction of the arrow. FIG. 4C likewise shows the plurality of delay elements, 450, organized the same way, which are connected to differential branch 2 of the drive unit. Again, transitions propagate in the direction of the arrow.
Thus current is supplied from a current source and steered between differential branches of the drive unit substantially in response to input data. A current source can be powered off when not needed for the PAM transmission, and powered on in advance of a signal transition to provide a specified settling time. The settling time required will vary depending on the characteristics of the transistors used in the circuits, but can be readily established as a design parameter by one of ordinary skill in the art. Delay elements can be added to or removed from each chain accordingly.
The mechanism can be further described with an example of the timing for a portion of the circuit. Assume that both current sources in FIG. 4 are off and signal B2n1 is about to go from low to high. Since signal B2n1 is the output of an inverter whose input signal is B2n1a, signal B2n1a goes low before signal B2n1 goes high (by one inverter delay, in this example roughly 40 ps). Since signal B2n1d is the delayed version of signal B2n1 (in this example by a two-inverter delay, roughly 80 ps), at the transition of B2n1a from high to low B2n1d is still low. Thus transistors 412 and 414 turn on, which then turns on transistor 404. By the time that signal B2n1 goes from low to high, after one inverter delay, the current in transistor 404 has settled.
The current source is turned on slightly before it is actually needed by the transition of signal B2n1. Two inverter delays after the transition of signal B2n1, signal B2n1d goes high and turns off the 412-414 branch.
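The timing example above can be laid out as a small timeline calculation. This is a simplified sketch that assumes ideal step edges and identical inverter delays of roughly 40 ps, as in the example; it models only when the 412-414 branch conducts (both B2n1a and B2n1d low), not the analog settling of the current source. The function and key names are hypothetical.

```python
INVERTER_DELAY_PS = 40  # approximate per-inverter delay used in the example

def preswitch_timeline(t_advance_fall_ps=0, inverter_delay_ps=INVERTER_DELAY_PS):
    """Event times (ps) for a low-to-high transition of B2n1.

    B2n1a falls first, B2n1 rises one inverter delay later, and B2n1d rises
    two inverter delays after B2n1. The PMOS branch 412-414 conducts while
    both B2n1a and B2n1d are low.
    """
    t_ref_rise = t_advance_fall_ps + inverter_delay_ps
    t_delayed_rise = t_ref_rise + 2 * inverter_delay_ps
    return {
        "branch_on_ps": t_advance_fall_ps,      # B2n1a low while B2n1d is still low
        "steering_edge_ps": t_ref_rise,         # B2n1 goes high, current is needed
        "branch_off_ps": t_delayed_rise,        # B2n1d goes high, branch turns off
        "settling_lead_ps": t_ref_rise - t_advance_fall_ps,
    }

for event, t in preswitch_timeline().items():
    print(f"{event}: {t} ps")
```

Adding or removing delay elements in each chain, as the text describes, corresponds to changing the lead time that this sketch reports as settling_lead_ps.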
In addition to saving power in the driver itself in some embodiments, this architecture can eliminate the need for a pre-driver. Pre-drivers are sometimes connected to PAM drivers to switch the gate voltages on the current-steering transistors so that current steers smoothly from one output to the other. To achieve this smooth steering, tail current sourcing transistors should stay in the saturation region. Thus, driver inputs should switch between VDD and a voltage slightly less than the sum of the voltage at the tail transistor and its threshold voltage. In transmitters without a pre-driver or a pre-switcher, the crossover voltage of the inputs of the driver, Bit1 and its complement, cannot hold both steering devices on, and thus the tail transistor will fall out of saturation. This falling out of saturation not only reduces the speed of the driver, but also creates overshoot and undershoot in the output. The use of the pre-switcher can alleviate this problem, eliminating the need for a pre-driver. This effect is illustrated in FIG. 5. FIGs. 5A and 5B compare two voltage waveforms for the output of bipolar, 2-PAM drivers. Waveform 502 is for a driver using the pre-switching technique. Waveform 504 is for a driver that does not use a pre-switcher or a pre-driver. Eliminating the pre-driver from a transmitter also reduces power consumption, since a high-speed pre-driver is typically a power-hungry block.
FIG. 6 illustrates how the same type of pre-switching can be used in an embodiment of a 4-PAM driver, 600, that uses unipolar architecture. In this example, drive unit 602 includes two NMOS steering transistors, 604 and 606, and one tail current source, 608. Pre-switcher 610 is connected to the current source. Again, connections to the steering transistors are omitted for clarity. The operation of this pre-switcher is the same as the operation of the pre-switcher illustrated in FIG. 4, except that the pre-switcher in FIG. 6 only includes half the branches of the one in FIG. 4.
FIG. 7 illustrates an example block diagram of a transmitter, 700, which makes use of a 4-PAM bipolar driver, 702, according to embodiments of the invention. Each of four substantially identical encoders, 704, has six outputs, which correspond to the driver inputs. The encoders receive input data from input block 706. In this example, four encoders work in parallel to generate the data at a quarter of driver speed, in this case, 1.25 Gb/s. This lower speed makes the design of the encoder units simpler. 4:1 multiplexers,
708, are used for serializing data from the outputs of the encoders. Therefore, the outputs of the multiplexers (inputs of the driver) are at full speed, in this example, 5 Gb/s. Note
that all the signal inputs to the driver use designations with an a on the end, including the Bit1 input, B1a, and its complement, _B1a. As previously discussed, since timings are referenced to steering transistor transitions, all input bits to the driver in this example transition in advance of the time reference. Other signal designations are as previously discussed. Note that an alternate design can have encoders encoding at full speed, in which case the multiplexers are not needed.
The transmitter includes additional blocks in support of the operation of the driver. As shown in FIG. 7, two differential clock signals are received at 1.25 GHz, where one is 90 degrees phase shifted. These clock signals are converted from differential to single-ended at blocks 710, and a doubler, 712, also produces a 2.5 GHz clock. These clock signals serve the encoders and multiplexers. Replica bias block 714 serves to equalize the characteristics of current sourcing transistors in the transmitter. A differential output for the transmitter is labeled X1 and _X1. Output resistances 716 and 718 relative to VDD/2 could in theory be fixed, but in the transmitter they are made variable and are controlled with a control voltage so that proper values can be achieved despite process variation.
FIG. 8 shows an example circuit design for one of the encoders of FIG. 7. As shown in this figure, the circuitry for each encoder consists of three inverters, 802, 803, and 804, two NAND gates, 806 and 808, and six flip-flops, 810, 812, 814, 816, 818, and 820, one for each of the driver input signals. Each flip-flop is clocked with the main clock input signal. The inputs to the encoder are designated IN1 and IN2.
An example multiplexer architecture for the transmitter of FIG. 7 is shown in FIG. 9. Each 4:1 multiplexer consists of two stages and each stage doubles the data rate. The inputs to a 4:1 multiplexer are labeled D0, D1, D2, and D3. The first stage includes two 2:1 multiplexers, 902 and 904 of FIG. 9A. The two clocks with 90-degree phase shift, CLK1 and CLK2, are used by these multiplexers. These clock signals are differential and include both the clock signal and its complement. Multiplexer 906 of FIG. 9A doubles the rate again to complete the serialization of the data. This multiplexer is clocked by the high-frequency clock. The basic architecture of each 2:1 multiplexer in this example is illustrated in FIG. 9B, and is similar to a dual-edge-triggered flip-flop. This particular 2:1 multiplexer is used in the first stage of the 4:1 multiplexer, as can be appreciated from the fact that it has D0 and D1 as inputs. In this design, switches at the
inputs of inverters 908, 910, and 912 only use NMOS transistors, such as transistors 914, 916, 918, and 920 to reduce the switch parasitic capacitance and therefore to increase the speed. Threshold voltages of the inverters that are right after the switches are designed to be less than VDD/2 by careful sizing. This sizing compensates for the inefficiency of these switches in passing a "high" signal. Moreover, the on-chip termination reduces the ringing due to package parasitics. Additional inverters 922 and 924, and PMOS transistors 926 and 928 complete the circuit.
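The two-stage structure of the 4:1 multiplexer can be illustrated behaviorally. The sketch below is a data-flow model only, with hypothetical function names: each 2:1 stage is modeled as a simple interleave of two equal-rate streams, and D0 is paired with D1 in the first stage as in FIG. 9B. The order in which bits leave the second stage depends on clock phasing that the text does not spell out, so the output order shown here is an assumption of this model rather than a property of the disclosed circuit.

```python
def mux2(a, b):
    """Model a 2:1 multiplexer: interleave two equal-length streams (a0, b0, a1, b1, ...)."""
    if len(a) != len(b):
        raise ValueError("input streams must have equal length")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out

def mux4(d0, d1, d2, d3):
    """Model a 4:1 multiplexer as two 2:1 stages, each doubling the data rate."""
    stage1_a = mux2(d0, d1)   # first-stage multiplexer, e.g. 1.25 Gb/s to 2.5 Gb/s
    stage1_b = mux2(d2, d3)   # second first-stage multiplexer
    return mux2(stage1_a, stage1_b)  # final stage, e.g. 2.5 Gb/s to 5 Gb/s

serial = mux4(["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"])
print(serial)
print(len(serial))  # four times the per-input length
```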
A pseudo-NMOS XOR, as shown in FIG. 10, is used as a frequency doubler to generate the high frequency clock. Four taps of the lower speed clock that are 90 degrees apart, covering phase 1, phase 2, phase 3, and phase 4 of a clock cycle, feed NMOS transistors 1002, 1004, 1006, and 1008. The output is generated at PMOS transistor 1010, and inverter 1012 separates the high frequency clock signal from its complement.
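The principle behind using an XOR as a frequency doubler can be shown with ideal sampled square waves. This sketch is a logic-level idealization, not a model of the pseudo-NMOS circuit: it assumes a 50% duty cycle and uses only the two quadrature taps (0 and 90 degrees), since under that assumption the other two taps are simply their complements. The names and the sampling resolution are arbitrary choices for illustration.

```python
SAMPLES_PER_PERIOD = 8  # samples per period of the lower speed clock

def square_wave(n, phase_samples=0):
    """Ideal 50% duty-cycle clock sampled at index n, delayed by phase_samples."""
    return 1 if (n - phase_samples) % SAMPLES_PER_PERIOD < SAMPLES_PER_PERIOD // 2 else 0

def doubled_clock(num_samples):
    """XOR of the 0-degree and 90-degree taps of the lower speed clock."""
    quarter = SAMPLES_PER_PERIOD // 4  # 90 degrees expressed in samples
    return [square_wave(n) ^ square_wave(n, quarter) for n in range(num_samples)]

base = [square_wave(n) for n in range(16)]
fast = doubled_clock(16)
print(base)
print(fast)  # repeats every 4 samples, i.e. at twice the base frequency
```

If the duty cycle of the input clock departs from 50%, the XOR output in this model no longer has uniform pulse widths, which is consistent with the duty-cycle sensitivity of doubled clocks in general.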
FIG. 11 shows example circuitry for the transmitter termination. The driver, 702 from FIG. 7, is shown in FIG. 11A. The rest of the transmitter components are omitted for clarity and are indicated by ellipsis dots. Resistances 716 and 718 from FIG. 7 are also shown in FIG. 11A. A stable reference voltage VDD/2 is created by operational amplifier
1101 from supply voltage VDD/2. A factor in transmitter design is the variation of differential termination and single-ended termination with frequency. Ideally, with perfect matching, only differential termination is important. However, due to mismatches between the two differential outputs, there can be some small single-ended reflections as well. Therefore, single-ended termination is also important. Ideally in FIG. 11, there would be no current going to the VDD/2 power supply and ideally the differential termination is always 100 Ω in parallel with the differential output impedance of the driver. Therefore, this structure shows small differential-termination variation for changes in the input frequency. However, in this architecture, single-ended termination can vary significantly with frequency. At high frequency, capacitance 1102 at node 1104 of FIG. 11A acts as a short circuit and therefore the single-ended termination would be roughly 50 Ω. However, at low frequency there are some impedances between node 1104 and ground, and this changes the single-ended termination at each output. Increasing capacitance 1102 at node 1104 can alleviate this situation. Fortunately, increasing the capacitance at node
1104 can also decouple the noise of the VDD/2 reference voltage. In a chip embodying a transmitter, an external (off-chip) capacitance can be used for this purpose to reduce the
chip area at the cost of adding one pin to the chip. Another method that can be used to solve this problem is to increase the gain of the OP AMP, 1101, used for generating VDD/2, although at a cost of increased power consumption by the OP AMP.
FIG. 11B shows how the termination resistances 716 and 718 are implemented in the example embodiment of a transmitter. The external control voltage can be used to adjust the resistance of NMOS transistor 1120, which works in triode. The transistor is connected in parallel with 25 ohm resistor 1122, and the resulting transistor/resistor combination is connected in series with 25 ohm resistor 1124. This structure is used to improve the termination resistor linearity since the nonlinearity of the NMOS transistor does not directly affect the termination resistor linearity. The structure may also be needed for compensation for process variation in the case of mass-produced chips containing PAM transmitters according to this embodiment.
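The adjustable termination described above can be estimated with a simple resistive model. This sketch treats the triode-region NMOS transistor as a linear resistor whose value is set by the control voltage, which is an idealization; the function names and the transistor resistance values used in the example are hypothetical and do not come from the disclosure.

```python
def parallel(r1_ohm, r2_ohm):
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)

def termination_resistance(r_nmos_ohm, r_parallel_ohm=25.0, r_series_ohm=25.0):
    """Series resistor plus (transistor in parallel with the second resistor)."""
    return r_series_ohm + parallel(r_nmos_ohm, r_parallel_ohm)

# Sweep a few illustrative transistor on-resistances (hypothetical values)
for r_nmos in (25.0, 75.0, 225.0):
    print(r_nmos, termination_resistance(r_nmos))
```

With nominal 25 ohm resistors, this model keeps the termination between 25 ohms and 50 ohms regardless of the transistor resistance, so the transistor only trims part of the total value.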
As previously discussed, a PAM transmitter according to example embodiments of the invention may be provided in the form of a semiconductor device. Such a device may include multiple transmitter units that can be used together or independently.
Additionally, such a device may include the transmitter and other circuitry that is used to provide other functions related to signal transmission, or processing functions from which the information to be transmitted results.
A pseudo random bit sequence (PRBS) unit consisting of two 2^7-1 PRBS generators has been used to produce random test data for a sample 4-PAM transmitter embodiment of the invention made using 0.18 μm standard digital CMOS technology. The sample 4-level PAM transmitter draws 39 mA from a 1.7 V power supply. The driver and VDD/2 reference generator consume roughly 12.5 mW at 7 Gb/s. The transmitter has a random output jitter of 22 ps (peak to peak) at 7 Gb/s, which is small compared to the eye opening of an eye diagram generated by measuring its output. A better jitter benchmark for multi-level signaling is the eye opening. An eye diagram for the sample transmitter has a maximum eye height of 140 mV and an eye width of 200 ps over a 0.8 m cable and 3 cm printed circuit board (PCB) traces at 7 Gb/s. An eye opening of 200 ps at the 3.5 GS/s rate corresponds to a 70% eye opening. As the speed is increased from 7 Gb/s toward 10 Gb/s with a 1.7 V power supply, the eye opening gets smaller. This problem can be resolved by increasing the power supply voltage from 1.7 V to 2 V to increase the input reference current by 25%. With this
configuration, the sample transmitter at 8 Gb/s exhibited an eye diagram with a maximum eye height of 140 mV and an eye width of 160 ps. This 160 ps eye opening at the 4 GS/s rate corresponds to a 64% eye opening. The eye diagram at 10 Gb/s is open, but with the duty cycle of the clock less than 50%, eyes with different widths may be produced. A duty cycle correction circuit can be used to address this situation, or a delay locked loop (DLL) can be used to generate four different quadrature clocks.
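The relationship between eye width, bit rate, and percentage eye opening used above can be written out as a short calculation. The sketch assumes that an N-level symbol carries log2(N) bits, so that a 4-PAM link at 7 Gb/s runs at 3.5 GS/s, and it expresses the eye opening as the eye width divided by the symbol period. The function names are illustrative only.

```python
import math

def symbol_period_ps(bit_rate_gbps, levels=4):
    """Symbol period in picoseconds for an N-level link at the given bit rate."""
    bits_per_symbol = math.log2(levels)
    symbol_rate_gsps = bit_rate_gbps / bits_per_symbol
    return 1000.0 / symbol_rate_gsps

def eye_opening_fraction(eye_width_ps, bit_rate_gbps, levels=4):
    """Eye width as a fraction of the symbol period."""
    return eye_width_ps / symbol_period_ps(bit_rate_gbps, levels)

# 200 ps eye width measured at 7 Gb/s (3.5 GS/s for 4-PAM)
print(round(symbol_period_ps(7.0), 1))
print(round(100 * eye_opening_fraction(200.0, 7.0), 1))
```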
As frequency increases with a fixed termination resistor, the swing decreases due to the parasitic capacitance at the output. Thus, to get the same swing the input biasing current can be increased. Power dissipation therefore increases accordingly with frequency. The following table shows the measured driver power and total transmitter power at different frequencies, when the input reference current is fixed and the sample chip is supplied by a 2 V power supply. Since the power supply voltage and the input reference current would likely be increased by 20% and 25% respectively in an actual application, the power dissipation can be expected to increase roughly by a factor of 1.5, but would still be low compared to a typical driver with no pre-switcher.
To test the performance of the sample driver in the presence of power-supply noise, a 0.45 V peak-to-peak noise was added to the driver power supply when operating the test device at 1.8 V. Channel attenuation caused ISI, and imperfect termination resulted in reflections, when the power supply noise was added. For testing purposes, the interconnect between the PAM transmitter and the test equipment included circuit board traces, connectors, and cables, and these connection discontinuities were responsible for some of the ISI. However, these results show that on-chip decoupling may need to be used in embodiments of the invention where power supply noise will be encountered.
Specific embodiments of an invention have been herein described. One of ordinary skill in the electronics arts will quickly recognize that the invention has numerous other embodiments. In fact, many implementations are possible. The following claims are in no way intended to limit the scope of the invention to the specific embodiments described.