# A 10.8pJ/bit Pulse-Position Inductive Transceiver for Low-Energy Wireless 3D Integration

Benjamin J. Fletcher, Dept. of Electronics and Computer Science University of Southampton, UK. bjf1g13@ecs.soton.ac.uk Shidhartha Das, Arm Ltd Cambridge, UK. shidhartha.das@arm.com Terrence Mak, Dept. of Electronics and Computer Science University of Southampton, UK. tmak@ecs.soton.ac.uk

Abstract—This paper presents a low-energy die-to-die inductive transceiver for use within a stacked 3D-IC. The design is implemented in a 2-tier 0.35 µm CMOS test chip and demonstrates vertical communication at a rate of 133Mbps/channel, across a distance of 110 µm, whilst consuming only 10.8pJ per transmitted bit. This represents a $5.3 \times$ improvement when compared to state-of-the-art inductive transceivers, achieved by combining: (1) 3-ary pulse-position modulation, to encode data in terms of the latency between sequential pulses (rather than using one-to-one pulse-code mappings), and (2) a tunable current driver circuit to adjust the transmit current dynamically based on the quality of the stacked die assembly.

## I. INTRODUCTION

Wireless (or *contactless*) 3D integration using inductive coupling links (ICLs) is a promising low-cost alternative to through-silicon vias (TSVs) for heterogeneous integration of multiple dies fabricated in diverse process technologies. ICLs allow data to be communicated vertically through the 3D stack, purely by electromagnetic (EM) coupling between planar inductors fabricated in the back-end-of-line (BEOL) interconnect layers of each die. This means that, when opting for ICLs, existing fabrication processes can be used without alteration, thereby avoiding the inflated design, fabrication and testing costs associated with TSVs [1]. This makes ICLs an attractive *low-cost* 3D integration option, particularly for cost-sensitive Internet-of-Things (IoT) applications.

Despite the obvious economic advantages of wireless 3D integration, ICLs suffer from poor energy-efficiency when compared with TSVs [2], especially when communicating over a long distance (*e.g.* through many stacked dies). This is because the transmit current, required to form a magnetic field that penetrates the silicon substrate, is an exponential function of the communication distance, X.

In this paper, we address the energy overheads inherent in wireless 3D integration. We make the following contributions:

- 1) A novel inductive transceiver for energy-efficient communication across long-range channels ( $X \ge 100 \,\mu\text{m}$ ) that maintains low power consumption by encoding data in terms of the latency between pulses using pulse-position modulation (PPM). The presented design also includes a dynamically tunable current driver to ensure that the transmit current is minimized for a given scenario.
- 2) A simulation framework to analyze and dimension inductive-link design parameters. In particular, we analyze inter-tier link misalignment and its impact on signal integrity in a multi-tiered stacking configuration.
- 3) Measurement results from a chip that incorporates the presented transceiver and successfully demonstrates 133Mbps inter-tier communication at 10.8pJ/bit across a 110 $\mu$m channel. This is an improvement of 5.3× compared to previously reported implementations.

Fig. 1. Illustration of a multi-tier wireless 3D-IC with a vertical system bus implemented using near-field inductive coupling links (ICLs). The highlighted die-to-die transceiver element forms the focus of this paper.

The remainder of the paper is structured as follows. Section II provides an overview of prior works implementing ICLs for 3D integration. Section III presents the proposed low-energy die-to-die transceiver, before measurement results and conclusions are presented in Sections IV and V respectively.

## II. 3D INTEGRATION INDUCTIVE COUPLING LINKS

Fig. 1 shows an example of a 4-tier 3D-IC using wireless 3D integration. Here, the stacked dies are connected by a vertical system bus, implemented using ICLs. Within each ICL, data packets are communicated by encoding the bit-stream as a series of current pulses. These pulses are then driven through a planar transmitting (Tx) inductor, forming a magnetic field within the die stack. Through EM induction, corresponding voltage pulses can be detected in similar inductors, fabricated in neighboring stacked receiving (Rx) dies. These pulses can in turn be used to decode the transmitted bit-stream.

All previous works focusing on 3D integration using ICLs ([1], [3]–[5]) use the inductive non-return-to-zero (NRZ) signalling scheme proposed in [1] by Miura *et al.* Here, each rising/falling data edge is encoded as a current pulse with corresponding positive/negative polarity, in a 1-to-1 mapping. This is a robust solution that allows data to be simply encoded using a delay buffer and H-Bridge transmitter, and decoded using just a sense amplifier (SA) and set-reset (SR) latch [1], [5]. However, inductive NRZ encoding requires, on average, one Tx pulse per transmitted bit, resulting in poor energy efficiency when communicating over large distances.

Fig. 2. Multi-channel 3D-IC using the proposed pulse-position modulation (PPM) inductive transceiver, consisting of: the PPM encoding logic, tunable current driver circuitry, the inductively coupled channel, Rx sense amplifier, and PPM decoding logic.

Fig. 3. Illustration of *N*-ary pulse position modulation (PPM), used in the presented die-to-die transceiver, for N=3. Here, *N* data bits are represented by an $I_{\text{Tx}}$ pulse occurring at the COUNT position corresponding to their value.

## III. DIE-TO-DIE INDUCTIVE TRANSCEIVER

To address this challenge, we present a novel low-energy inductive transceiver. Fig. 2 shows schematically the 8-channel 3D-IC fabricated for testing in this paper. Each channel uses the proposed die-to-die inductive transceiver consisting of: PPM modulation/demodulation circuits, a tunable current driver, and the inductive EM channel. The design of each of these components is elaborated in the following sub-sections.

### A. Pulse-Position Modulator/Demodulator

Fig. 3 illustrates the *N*-ary pulse position modulation (PPM) scheme adopted in this work. Here, *N* binary bits (in this case N=3) are represented by a single current pulse, denoted by its time-domain position within the data-frame. For example, 'b011 is represented by a pulse in position COUNT=3, 'b110 by a pulse when COUNT=6, *etc.* Whilst this reduces the number of pulses required for a given bit stream by  $N\times$ , a trade-off clearly exists, as the frequency (and hence energy contribution of the digital encoding logic) increases proportionally to  $2^N$ . For this reason, *N* should be carefully selected to best exploit the energy trade-off, depending on the communication distance *X*.
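The mapping between bit groups and pulse positions described above can be sketched in a few lines of Python. This is a minimal behavioural illustration only, not the authors' hardware implementation: the function names (`ppm_encode`, `ppm_frames`, `ppm_decode`) are hypothetical, and each frame is modelled as a list of $2^N$ time slots in which exactly one slot carries a Tx pulse.

```python
def ppm_encode(bits, n=3):
    """Split a bit string into n-bit symbols and return each symbol's pulse position."""
    if len(bits) % n != 0:
        raise ValueError("bit-stream length must be a multiple of n")
    return [int(bits[i:i + n], 2) for i in range(0, len(bits), n)]


def ppm_frames(positions, n=3):
    """Expand pulse positions into frames of 2**n slots (1 = Tx pulse, 0 = idle)."""
    return [[1 if count == pos else 0 for count in range(2 ** n)]
            for pos in positions]


def ppm_decode(frames, n=3):
    """Recover the bit string from the slot index of the single pulse in each frame."""
    return "".join(format(frame.index(1), f"0{n}b") for frame in frames)


# Example from the text: 'b011 -> COUNT=3 and 'b110 -> COUNT=6.
positions = ppm_encode("011110")
frames = ppm_frames(positions)
```

In this sketch, six data bits are carried by two pulses spread over two frames of eight slots each, which reflects both sides of the trade-off: fewer pulses, but more time slots for the counter logic to step through.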

Fig. 2 shows the implementation of the *N*-ary PPM modulator and demodulator circuits used in the proposed transceiver, incorporating *N*-bit counters (for the COUNT signal) and XOR-based match logic. On the Rx-side, a sense amplifier (shown in Fig. 4) is used because of the low amplitude of the Rx voltage signal (determined by $M \cdot dI_{\text{Tx}}/dt$ ). The biasing resistors, $R_p$ (*c.f.* Fig. 2), are selected to be $1.5 \text{ k}\Omega$ to provide a high input impedance, and their layout is considered carefully to ensure precise matching. To minimize the power consumption of the system, a separate supply domain is used for implementing the modulation logic, where near-threshold voltage scaling is applied (V<sub>DD</sub>=1.8V).

Fig. 4. Buffered sense-amplifier (SA) adopted in the proposed transceiver. Here, the differential Rx signal is applied across $IN_N/IN_P$ and the sensitivity of the SA is adjusted by carefully sizing the differential pair MN4.

Fig. 5. (a) Dynamically controlled current driver circuits, and (b) transmit current control register tuning algorithm.

### B. Tunable Current Driver Circuits

Prior works (such as [1]) use static H-Bridge drivers, where the widths of the driving transistors are fixed. Whilst this is an adequate solution, the transmit current must be sized to meet the *worst-case* scenario in terms of die-to-die stacking alignment, substrate thinning variation, and epoxy thickness. These factors will vary slightly between ICs, meaning that fixed drivers are unlikely to yield *optimal* efficiency.

Fig. 6. Scatter plot showing the simulated efficiency vs. area trade-off, including pareto-optimal frontier (only a small sample of trialed layouts are presented for clarity). The $250\,\mu\text{m}\times250\,\mu\text{m}$ square geometry used for practical measurement results in Section IV is highlighted.

Fig. 7. (a) Layout parameters of selected channel inductors. (b) EM simulation setup and (c) simulated M when mapped to the equivalent circuit channel model in Fig. 2.

To address this, we use the dynamically adjustable current driver architecture (based on that presented in [4]) shown in Fig. 5 (a). Here, multiple driver stages are connected in parallel, each consisting of an NMOS of width *w* and a PMOS of width 2*w*. The value of *w* varies between stages (as shown) and each stage can be enabled independently using the control register ITX\_CTRL\_REG. This allows the Tx current to be finely tuned depending on the channel quality. Fig. 5 (b) presents the algorithm used to set ITX\_CTRL\_REG. Initially, the Tx die sends a pilot pulse and begins the PPM COUNT timer. If no acknowledgement has been received before the timer expires, the register value is incremented, and the process is repeated. Meanwhile, if the Rx die is in calibration mode, it immediately returns an acknowledgement pulse upon receiving a pulse (using the maximum ITX\_CTRL\_REG setting) to signal that the link is operational. Once this process has been completed, a small offset is added as a hysteresis margin.
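The tuning procedure for ITX\_CTRL\_REG can be summarized as a simple search loop. The Python sketch below is a behavioural illustration under stated assumptions, not the on-chip implementation: `link_ack` is a hypothetical callback standing in for "send a pilot pulse, start the COUNT timer, and report whether an acknowledgement arrived in time", and the starting code, the register range (`max_code=31`) and the size of the hysteresis offset (`margin=1`) are assumed values that the text does not specify.

```python
def calibrate_itx(link_ack, start=1, max_code=31, margin=1):
    """Return an ITX_CTRL_REG setting found by the pilot/acknowledge loop.

    link_ack(code) -> bool is a hypothetical stand-in for one pilot attempt:
    True if the Rx die acknowledged before the COUNT timer expired.
    start, max_code and margin are assumptions made for this sketch.
    """
    code = start
    while code <= max_code:
        if link_ack(code):
            # Link is operational: add a small offset as a hysteresis margin,
            # without exceeding the largest register setting.
            return min(code + margin, max_code)
        code += 1  # timer expired with no acknowledgement: raise the drive
    raise RuntimeError("no acknowledgement at any drive setting")


# Toy channel model: pulses are only detected from an arbitrary threshold upwards.
tuned = calibrate_itx(lambda code: code >= 17)
```

The threshold of 17 in the usage line is an arbitrary value chosen for the toy channel model; in practice the first acknowledged setting depends on the stacking alignment, substrate thickness, and adhesive thickness of the individual assembly.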

### C. Inductive Link

Fig. 8. Micrograph of (a) the 2-tier stacked IC with wire-bonded power, reset and debug pins, and (b) a single die layout, showing the 8 ICL channels with varying inductor geometries. Presented measurement results are based on the best-performing $250 \,\mu\text{m}$ square channel (highlighted).

Fig. 9. (a) Variation in link energy consumption in Tx and Rx dies, as N varies. (b) Energy breakdown for presented transceiver for N=3 (the optimal case) in terms of circuit elements.

To maintain low power consumption in the transceiver, it is important that the inductor layouts (used for forming the wireless link) are optimized. To ensure this, a range of inductor geometries were simulated with varying layout parameters (diameter, number of turns, track-width and track-spacing) and shapes (square and octagon). Fig. 6 shows a scatter plot of link efficiency ( $V_{\text{Rx}}/V_{\text{Tx}}$, generated by the COIL-3D tool [6]) versus diameter, for a selection of these geometries. As can be observed, a strong trade-off exists; therefore, 8 different pareto-optimal layouts (with diameters between 100 µm and 250 µm) were used, one for each of the 8 separate channels on the test-chip. In simulation, the 250 µm channel on the 'knee' of the pareto curve performed optimally (in terms of energy/area) and hence this layout, with parameters shown in Fig. 7 (a), was selected for experimental validation<sup>1</sup>. In the transceiver, both Tx and Rx coils are fabricated in the top M4 layer with matching layouts, in an area where dummy-fill and extraneous circuitry are excluded. EM simulations suggest the channel achieves a mutual inductance of 1.45nH, corresponding to a coupling coefficient *k* of 1.26 (*c.f.* Fig. 7 (c)) when mapped to the equivalent model in Fig. 2.
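To give a feel for what this mutual inductance implies at the receiver, the relation $V_{\text{Rx}} \approx M \cdot dI_{\text{Tx}}/dt$ from Section III-A can be evaluated numerically. The sketch below is a rough first-order estimate only: the mutual inductance is the simulated 1.45nH quoted above, whereas the current step (10 mA) and edge time (100 ps) are hypothetical values chosen for illustration and are not taken from the measured design.

```python
def rx_peak_voltage(mutual_h, delta_i_a, edge_time_s):
    """First-order estimate of the induced Rx voltage, V = M * dI/dt.

    Approximates dI/dt by a linear current ramp of delta_i_a amps
    over edge_time_s seconds.
    """
    return mutual_h * delta_i_a / edge_time_s


M_SIMULATED = 1.45e-9      # mutual inductance from the EM simulation (henries)
DELTA_I_ASSUMED = 10e-3    # hypothetical Tx current step (amps)
EDGE_ASSUMED = 100e-12     # hypothetical current edge time (seconds)

v_rx = rx_peak_voltage(M_SIMULATED, DELTA_I_ASSUMED, EDGE_ASSUMED)
```

Under these assumed drive conditions the estimate comes out at roughly 0.145 V, which illustrates why the received signal scales directly with both the transmit current and its edge rate.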

## IV. MEASUREMENT RESULTS

The proposed transceiver, combining these 3 elements, was fabricated in the 2-tier 3D stacked 0.35 µm CMOS IC shown in Fig. 8 (a). Before stacking, each die was thinned to a height of 100 µm and attached using epoxy adhesive with 10 µm thickness. The dies were stacked in a face-to-back (F2B) arrangement resulting in a total communication distance of 110 µm through the silicon substrate, BEOL, and adhesive layers. In the context of state-of-the-art die thinning capabilities (which can achieve thicknesses less than 15 µm per die [4]), this is equivalent to communicating through 7 F2B stacked silicon dies. Fig. 8 (b) shows the 8 ICL EM channels (each trialing different inductor geometries from Section III-C). Concordant with simulation results, the square 250 µm channel (highlighted in Fig. 8 (b)) performed best, and hence is selected for subsequent measurements in this section. Including Tx and Rx inductors, the proposed transmitter and receiver occupy 0.0855mm<sup>2</sup> and 0.0894mm<sup>2</sup> respectively.

<sup>1</sup>This is supported by practical test-chip measurements in Section IV.

Fig. 10. Measured link BER and energy-per-bit as $I_{Tx}$ control register varies. At the tuned operating point (ITX\_CTRL\_REG=18) the link achieves a BER of 1E-5 at 10.8pJ/bit.

TABLE I. MEASURED PERFORMANCE OF THE PROPOSED INDUCTIVE TRANSCEIVER.

| Evaluation Metric         | Measured Performance                                                   |
|---------------------------|------------------------------------------------------------------------|
| Technology                | 2-tier stacked 0.35 µm CMOS                                            |
| Communication Distance    | $110 \mu\text{m}$ (100 $\mu\text{m}$ chip + 10 $\mu\text{m}$ adhesive) |
| Average Energy Per Bit    | 10.8pJ/bit                                                             |
| Average Bit Error Rate    | 1E-5                                                                   |
| Channel Area              | $250\,\mu\text{m} \times 250\,\mu\text{m}$ (0.063mm<sup>2</sup>)       |
| Transceiver Circuits Area | Tx: 0.0225mm<sup>2</sup>, Rx: 0.0264mm<sup>2</sup>                     |
| Maximum Data Rate         | 133Mbps/channel                                                        |

(a) Parameter tuning: To determine the optimal N value (corresponding to the trade-off between transmit energy and digital processing), N was varied between 1 and 6, and the transceiver energy measured. Fig. 9 shows the results of these measurements including the Tx/Rx energy consumption breakdown. As shown, 3-ary PPM is the best performing encoding approach for the 110 µm channel considered here.
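The shape of this trade-off can be illustrated with a simple first-order model. The sketch below is an assumption-laden toy model, not a fit to the measurements in Fig. 9: it assumes one Tx pulse per *N*-bit frame and a digital-logic cost proportional to the $2^N$ COUNT slots in each frame, and the energy values passed in (`e_pulse`, `e_slot`) are placeholder numbers in arbitrary units chosen only to show how an optimum emerges.

```python
def energy_per_bit(n, e_pulse, e_slot):
    """Toy per-bit energy for N-ary PPM (arbitrary units).

    Assumes one Tx pulse of energy e_pulse per frame plus e_slot of digital
    energy for each of the 2**n COUNT slots, shared across n data bits.
    """
    return (e_pulse + e_slot * 2 ** n) / n


def best_n(e_pulse, e_slot, candidates=range(1, 7)):
    """Return the candidate N that minimizes the toy per-bit energy."""
    return min(candidates, key=lambda n: energy_per_bit(n, e_pulse, e_slot))


# Placeholder ratios only: a moderate and a much larger pulse-to-logic energy ratio.
n_moderate = best_n(e_pulse=10.0, e_slot=1.0)
n_long_range = best_n(e_pulse=100.0, e_slot=1.0)
```

With these placeholder values the model favours a small N when the pulse energy is comparable to the logic energy and a larger N when the pulse energy dominates, which is consistent with the earlier observation that N should be selected according to the communication distance *X*.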

Using 3-ary PPM, the performance of the proposed transceiver was evaluated whilst transmitting a pseudo-random binary bit sequence. Fig. 10 shows the measured BER and energy-per-bit as a function of the ITX\_CTRL\_REG register. At the smallest settings (1, 2, 3) the Tx current is low, and hence the Tx pulses are not detected. As the ITX\_CTRL\_REG register is incremented further, the link reaches its optimal operating conditions (ITX\_CTRL\_REG = 18) and a BER in the order of 1E-5 is achieved at 10.8pJ/bit. The link bandwidth is 133Mbps/channel, as summarized in Table I.

(b) Comparison with the state-of-the-art: Fig. 11 compares the proposed design with leading published research. Works [7] and [8] implement near-field *capacitive* communication, and [1], [3]–[5] use *inductive* communication (as adopted in this paper). Fig. 11 plots the *energy-per-bit* against *communication distance* for each approach. When compared to prior art, results indicate a $5.3 \times$ reduction in energy consumption for wireless 3D integration across the $X = 110 \,\mu\text{m}$ channel.

(c) Tolerance to misalignment: Finally, the tolerance of the proposed transceiver to lateral die-to-die stacking misalignment was explored. Whilst it is not possible to vary the alignment once the physical test-chip has been assembled, Fig. 12 presents simulation results illustrating the effect of alignment accuracy on channel coupling, k. As shown, the channel will tolerate up to $40 \,\mu\text{m}$ of die-to-die misalignment in both x and y directions (a total diagonal offset of 56 $\mu\text{m}$) whilst maintaining performance within 10% of the optimum.

## V. CONCLUSIONS

This paper has presented a low-energy ICL transceiver for inter-tier wireless bus communication within a stacked 3D-IC. The proposed transceiver uses 3-ary PPM combined with an adjustable current feedback driver to achieve low energy transmission (10.8pJ/bit across a 110 $\mu$m channel) whilst maintaining a high inter-tier bandwidth of 133Mbps/channel in 0.35 $\mu$m technology. This represents a 5.3× energy improvement when compared to the state-of-the-art. The design also demonstrated high tolerance to stacking misalignment, making it ideally suited for realising low-power 3D-ICs with very low cost.

Fig. 11. Comparison of presented transceiver with prior art, demonstrating a $5.3 \times$ improvement in terms of energy-per-bit/micron for the 110 µm channel considered in this case.

Fig. 12. Simulated channel performance with respect to x and y die-to-die stacking misalignment (in terms of coupling coefficient, k).

## REFERENCES

- [1] N. Miura *et al.*, "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect," *IEEE J. of Solid-State Circuits*, vol. 40(4), pp. 829–37, 2005.
- [2] B. J. Fletcher *et al.*, "Design and optimization of inductive-coupling links for 3-D-ICs," *IEEE Trans. VLSI*, vol. 27(3), pp. 711–23, 2018.
- [3] S. W. Han, "Wireless interconnect using inductive coupling in 3D-ICs," Ph.D. dissertation, Dept. Elect. Eng., Univ. Michigan, 2012.
- [4] N. Miura *et al.*, "A 195Gb/s 1.2W 3D-stacked inductive inter-chip wireless superconnect with transmit power control scheme," in *IEEE Int. Solid-State Circuits Conf.*, Feb 2005, pp. 264–597 Vol. 1.
- [5] D. Mizoguchi *et al.*, "A 1.2Gb/s/pin wireless superconnect based on inductive inter-chip signaling (IIS)," in *IEEE Int. Solid-State Circuits Conf.*, Feb 2004, pp. 142–517.
- [6] B. J. Fletcher *et al.*, "A high-speed design methodology for inductive coupling links in 3D-ICs," in *Proc. Conf. Design, Automation and Test in Europe*, Mar 2018.
- [7] A. Fazzi *et al.*, "3D capacitive interconnections with mono- and bidirectional capabilities," in *IEEE Int. Solid-State Circuits Conf.*, Feb 2007, pp. 356–608.
- [8] Q. Gu *et al.*, "Two 10Gb/s/pin low-power interconnect methods for 3D ICs," in *IEEE Int. Solid-State Circuits Conf.*, Feb 2007, pp. 448–614.