TWI326997B

TWI326997B - A working method of multiplexer

Info

Publication number: TWI326997B
Application number: TW95126150A
Authority: TW
Inventors: Chauchin Su; Hung Wen Lu; guan yu Chen
Original assignee: Univ Nat Chiao Tung
Priority date: 2006-07-17
Filing date: 2006-07-17
Publication date: 2010-07-01
Also published as: TW200807943A

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to a multiplexer architecture, and in particular to a multiplexer architecture that reduces the power consumption and area of high-speed transmission.

[Prior Art]

A multiplexer, also called a serializer, sequentially transmits several lower-speed parallel input data streams as a single high-speed output, as shown in Figure 1. Most high-speed transmission systems use this module to convert data into a high-speed output. The multiplexing ratio handled by most multiplexers is a power of two, such as 2, 4, 8, or 16. Some systems additionally encode the data on output, so the ratio becomes some other number; for example, 8B/10B encoding requires a 10-to-1 multiplexer.

Multiplexer circuit architectures fall into three main types: the shift-register type, the single-stage type, and the tree type, shown in Figures 2, 3, and 4 respectively. The operation, advantages, and disadvantages of each type are briefly introduced below.

The main circuit of the shift-register multiplexer shown in Figure 2 is divided into a parallel-load part and a serial-shift part, whose registers operate at different frequencies. The low-speed parallel-load part uses the low-speed clock CLK2 to load the parallel input data. The high-speed serial-shift part uses the high-speed clock CK1, so the multiplexer sends out data sequentially at the frequency of CK1. When all the data in the serial-shift DFFs has been shifted out, CK3 switches the input path of the serial-shift DFFs so that all the data held in the parallel-load part is loaded into them; this operation can be followed in the corresponding timing diagram. Because the externally supplied clock is usually only the fastest one, CK1, the additional clocks CLK2 and CLK3 must be generated by a module similar to a frequency divider.

Figure 3 shows the single-stage multiplexer (conventional technique 1). This multiplexer requires reference clocks at the same frequency as the parallel input data, with as many phases as there are parallel inputs, as shown on the right of Figure 3. In operation, the overlap of specific pairs of phases creates the conduction path from each data input to the output; the overlapping intervals are shown as gray regions in the timing diagram. For example, D0 is output during the overlap from the rising edge of CK0 to the falling edge of CK5, D1 is output during the overlap from the rising edge of CK1 to the falling edge of CK6, and the remaining data are sent in turn on the same principle, which is not detailed here.

Figure 4 shows an 8-to-1 tree multiplexer (conventional technique 2), built from three stages of 2-to-1 multiplexers. In each 2-to-1 operation, the two input data streams are first retimed using CLK90 so that they have a 180-degree phase difference, and CLK90 is then used to send the data out at different times, so that both data streams have sufficient setup time and hold time.
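The tree arrangement described above can be illustrated with a small behavioral model. This is only a sketch: it ignores clocks, retiming, and setup/hold timing entirely, and the function names `mux2` and `tree_serialize`, as well as the choice to pair input i with input i + N/2 at each stage, are assumptions made for illustration and are not taken from the patent figures.

```python
def mux2(a, b):
    """Interleave two equal-rate streams into one stream at twice the rate."""
    if len(a) != len(b):
        raise ValueError("both inputs of a 2-to-1 stage must have the same length")
    out = []
    for x, y in zip(a, b):
        out.append(x)  # first half of the stage clock period
        out.append(y)  # second half of the stage clock period
    return out


def tree_serialize(lanes):
    """Serialize 2**k parallel lanes through k stages of 2-to-1 multiplexers.

    lanes[i] is the list of values seen on parallel input Di over time.
    Pairing lane i with lane i + half at each stage is an assumption chosen
    so that the serial output order is D0, D1, ..., D(N-1).
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("number of lanes must be a power of two")
    streams = [list(lane) for lane in lanes]
    while len(streams) > 1:
        half = len(streams) // 2
        streams = [mux2(streams[i], streams[i + half]) for i in range(half)]
    return streams[0]


if __name__ == "__main__":
    # Two 8-bit words: lane i carries bit i of word 0, then bit i of word 1.
    lanes = [[i, i + 8] for i in range(8)]
    print(tree_serialize(lanes))
```

With eight lanes the model passes through three 2-to-1 stages, matching the three-stage structure of the 8-to-1 tree multiplexer; each stage doubles the data rate of the stream it produces.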
Figure 5 compares the three types of multiplexer. The advantage of the single-stage architecture is that it can be paired with a phase-locked loop built on a ring oscillator; that is, the required clock needs to run at only 1/N of the data rate, and N can be chosen freely, although the number of ring-oscillator stages changes accordingly. However, the output node of a single-stage multiplexer carries a large parasitic capacitance, which severely limits the bandwidth of this architecture. The tree multiplexer is divided into multiple stages, so each stage multiplexes fewer inputs, the parasitic capacitance at the output is lower, and the operating frequency is much higher. Its disadvantage is that it needs a fairly high-speed clock: with 2-to-1 multiplexers at every stage, the supplied clock must run at one half of the data rate.

When a chip communicates with the outside world, the input/output interface is an important factor in whether data is transmitted and received successfully between chips. As process technology scales down, on-chip operating frequency and circuit complexity increase, and the on-chip data volume and processing speed keep rising. With a limited number of transmission channels, however, the bandwidth between chips cannot rise correspondingly, so the speed of the input/output interface becomes the bottleneck that limits overall system performance. To analyze this bottleneck, consider the operation of the traditional tree multiplexer, which is divided into three phases:

(1) Clock generation: the input clock (CLK) is divided by two and four different phases are generated (CK0, CK90, CK180, CK270).
(2) Input data phase difference: one data stream is sampled on the rising edge and the other on the falling edge, giving the two streams a 180-degree phase difference.

(3) Data switching and retiming: after the data has been resampled by CK0 and CK180, the output switch controlled by CK90 and CK270 changes the conduction path and sends the data out. Ideally this operation gives the data a setup time and a hold time of 1/4 clock period, so that the data is not mishandled. Some designs resample once more at the output with a high-frequency clock to reduce output jitter, at the cost of an extremely high-speed clock generator and registers with an extremely high sampling rate.

Figure 6 shows the detailed architecture of a traditional 8-to-1 tree multiplexer. Each 2-to-1 multiplexer submodule needs three registers, all of which exist to give the data a phase difference. This large number of registers dominates the power consumption and area of the multiplexer. The motivation of the present invention is to modify the timing and the multiplexing scheme so as to reduce the power consumption and hardware area of this tree multiplexer.

The search of prior patents found the following. US Patent No. 4270204, "Clock and data recovery method and apparatus", also uses multi-phase sampling. US Patent No. 4789984, "High Speed Multiplexer Circuit", was the first to propose a basic tree serializer, but with limited improvement; the present invention improves on this tree serializer. US Patent No. 5724361, "High Performance N:1 Multiplexer with Overlap Control of Multi-Phase Clocks", implements a multiplexer by overlapping pairs of clock phases that are 0, 90, 180, and 270 degrees apart, together with a reference comparison circuit that adjusts the clock output level. US Patent No. 5726990, "Multiplexer and Demultiplexer", is also a tree multiplexer, but it does not use multiple phases and does not retime the inputs of the MUX at each stage. US Patent No. 5805089, "Time-Division Data Multiplexer with Feedback for Clock Cross-over Adjustment", uses multiple phases and includes a clock cross-over adjustment function.

However, none of the patent documents cited above has perfected the high-speed transmission interface, the reduction of power consumption, or the reduction of occupied area, and room for improvement remains.

[Summary of the Invention]

To overcome the shortcomings of the prior art, one object of the present invention is to use a multi-phase approach as its core technique while multiplexing in a power-of-two-to-1 tree, so that the parasitic capacitance at the output of the 2-to-1 multiplexer in each stage is reduced and the circuit operates faster. In addition, by pairing each stage with suitable clock phases, the present invention allows the D flip-flops originally used for data retiming to be omitted, which saves considerable power and area.

[Embodiment]

To achieve the above object, the architecture proposed by the present invention centers on the principle of redistributing the clocks.
The control clocks of a single 2-to-1 multiplexer are introduced first, followed by example clock configurations for 4-to-1 and 8-to-1 multiplexers. Finally, practical design considerations are added and the circuit of each submodule is described in detail.

The left side of Figure 7 shows the operating clocks and module of a traditional tree multiplexer submodule, and the right side shows the submodule proposed by the present invention (in which the data-skew function is embedded in the preceding multiplexer cell). As shown, the present invention redistributes the clocks so that the 2-to-1 submodule of each stage, besides multiplexing, also has built in the equivalent of the data phase-difference function required by the next stage. No registers for generating a phase difference are needed, which achieves the circuit savings that the invention aims for.

Figure 8 shows an example 4-to-1 multiplexer architecture and its clocks arranged by the proposed method, and Figure 9 shows an example 8-to-1 multiplexer architecture and its clocks. Both consider the ideal case in which the delay of every multiplexer stage is zero. Arranged this way, each stage ideally has a setup time equal to 1/2 of that stage's input clock period, and a hold time of 0.

The examples above are timing diagrams that ignore the delay of each stage. In a 0.13-micron process, however, a simple inverter driving a fan-out-of-4 load has a delay of about 60 ps. Applying this delay to an 8-to-1, 2.5 Gbps multiplexer example gives the actual clocks shown in Figure 10. Pn[1] and Pn[1]b are 1.25 GHz clocks; Pn[2], Pn[2]b, Pn[3], and Pn[3]b are the four-phase 625 MHz reference clocks generated by the first-stage divider; and Pn[4], Pn[4]b, Pn[5], Pn[5]b, Pn[6], Pn[6]b, Pn[7], and Pn[7]b are the four-phase 312.5 MHz reference clocks generated by the second-stage divider. There is a 60 ps delay between the 1.25 GHz, 625 MHz, and 312.5 MHz clocks, and each multiplexer likewise has a single-stage delay. The data at each 2-to-1 multiplexer therefore has a setup time of about (1/2)Tp - T1 and a hold time of about T1, as shown in Figure 11, where Tp is the input clock period of the multiplexer at that stage and T1 is the propagation delay of one DFF.

Figure 12 shows the detailed architecture of clock generation and of the distribution of control clocks to the multiplexers. The circuits of each multiplexer and of the frequency divider are shown in Figure 13. To reduce power and hardware cost and to increase operating speed, the present invention implements all logic in pseudo-PMOS style. For the 2-to-1 multiplexer, D0 is output when CK is 0 and D1 is output when CK is 1. With a threshold voltage of about 0.35 V and VDD of 1.2 V, the high and low output levels of this logic are 1.2 V and 0.2 V respectively. The D-type register uses a differential architecture because four different phases (0, 90, 180, and 270 degrees) must be generated.

To compare the area and power required at the same speed by three architectures (the single-stage serializer, the new tree serializer proposed by the present invention, and the traditional tree serializer), the inventors made the series of comparisons in Figures 14 to 25, in which the curves labeled "Tree Type" represent the new tree serializer proposed by the present invention. These comparisons are intended to show that the proposed architecture is the best in every respect compared.

Figures 14 and 15 show the steps by which the single-stage serializer and the new tree serializer, respectively, are optimized. The optimization steps for the single-stage serializer are: select the size of the multiplexer to meet the rise-time specification; select the size of the data-skew DFFs to maintain the rise-time specification; select the size of the clock generation circuit to maintain the rise-time specification.

The optimization steps for the new tree serializer are: select the size of the first-level multiplexer (the 1 multiplexer in the figure) to meet the rise-time specification; select the size of the second-level multiplexers (the 2 multiplexers in the figure) to meet the rise-time specification; select the size of the third-level multiplexers (the 4 multiplexers in the figure) to meet the rise-time specification; select the size of the data-skew DFFs to maintain the rise-time specification; select the size of the clock generation circuit to maintain the rise-time specification.

Optimizing the different parts of the architecture in this order ensures that area and power have been minimized while the specification is still met, which is what makes the comparison objective. The steps for the traditional tree serializer are similar to those of Figure 14 and are not repeated here.

In Figures 16 and 17, each part of the three architectures is scaled down proportionally and it is recorded whether the specification is still met; this reveals the maximum scale-down ratio and simplifies the optimization. Figure 18 shows HSPICE simulation results in which the transistor sizes of the three architectures are gradually reduced and the corresponding rise time is recorded. Because rise time represents the bandwidth of an architecture, the results show that at the same speed the new tree serializer requires the least area. When the size is increased beyond a certain point the speed saturates, because the circuit is then limited by its own parasitic capacitance.

For Figures 19, 20, 21, and 22, nine different rise times were chosen for analysis and comparison: 300 ps, 275 ps, 250 ps, 225 ps, 200 ps, 175 ps, 150 ps, 125 ps, and 100 ps. The results, presented as tables and charts, show that the new tree serializer proposed by the present invention consumes the least power and area. Figures 23, 24, and 25 multiply area by power and present the product as tables and a chart; by this measure the improvement of the new tree serializer (labeled "Tree type" in the figures) in power and area is even more pronounced.

Figure 26 shows the output of the multi-phase generator, which produces the 1.25 GHz clock Pn[1]; the 625 MHz clocks Pn[2] and Pn[3], 90 degrees apart; and the 312.5 MHz clocks Pn[4], Pn[5], Pn[6], and Pn[7], 400 ps apart. These clocks drive the 8-to-1 serializer and agree with the clock plan of Figure 9. Figure 27 shows the simulation results of the 8-to-1 serializer, including the 625 Mbps data outputs net4, net1, net2, and net3, the 1.25 Gbps data outputs net5 and net6, and the 2.5 Gbps data output at out, in agreement with the clock plan of Figure 10.

The preferred embodiment above only illustrates the best mode of carrying out the present invention; the scope of the patent remains as defined by the claims.

[Brief Description of the Drawings]

Figure 1: Functional diagram of a conventional multiplexer.
Figure 2: Architecture and timing diagram of the shift-register-type multiplexer.
Figure 3: A single-stage 8-to-1 multiplexer and the clocks it requires.
Figure 4: A tree-type 8-to-1 multiplexer and the clocks it requires.
Figure 5: Comparison of three different traditional multiplexers.
Figure 6: Architecture of a traditional 8-to-1 tree serializer.
Figure 7: Basic principle of the architecture proposed by the present invention.
Figure 8: Architecture and timing diagram of the improved 4-to-1 tree multiplexer (new tree serializer) with ideal clocks.
Figure 9: Architecture and timing diagram of the improved 8-to-1 tree multiplexer (new tree serializer) with ideal clocks.
Figure 10: Timing diagram of the improved tree multiplexer (new tree serializer) of the present invention with propagation delay included.
Figure 11: Effect of time delay on setup time and hold time.
Figure 12: Architecture of the improved tree multiplexer (new tree serializer) of the present invention.
Figure 13: Circuit diagrams of the 2-to-1 multiplexer and the differential D-type register.
Figure 14: Optimization steps and schematic of the single-stage serializer.
Figure 15: Optimization steps and schematic of the new tree serializer.
Figure 16: Comparison chart of proportionally scaling down the area of the single-stage and new tree-type architectures.
Figure 17: Comparison chart of proportionally scaling down the area of the traditional serializer (basic tree type).
Figure 18: Area versus rise time for the three architectures.
Figure 19: Table comparing area and power of the three architectures at nine rise times.
Figure 20: Table comparing area and power of the single-stage architecture at four rise times.
Figure 21: Chart comparing the power of the three architectures at nine rise times.
Figure 22: Chart comparing the area of the three architectures at nine rise times.
Figure 23: Table comparing power multiplied by area for the three architectures at nine rise times.
Figure 24: Table comparing power multiplied by area for the single-stage architecture at four rise times.
Figure 25: Chart comparing power multiplied by area for the three architectures at nine rise times.
Figure 26: Output of the multi-phase generator.
Figure 27: Clock and data output waveforms of the 8-to-1 serializer.

(Amended replacement page dated 22 April of ROC year 99, i.e. 2010.)

[Description of Main Component Symbols]
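The per-stage timing relation stated in the embodiment (setup time of about (1/2)Tp - T1 and hold time of about T1, with the stage clock halving at each stage back from the output) can be tabulated with a few lines of arithmetic. This is a rough sketch only: the function name `stage_timing` is hypothetical, and using 60 ps for T1 is an assumption made for illustration, since the description quotes 60 ps for an inverter with a fan-out-of-4 load and does not give a numeric DFF delay.

```python
def stage_timing(data_rate_bps, stages, t1_s):
    """Return (clock_hz, setup_s, hold_s) per 2-to-1 stage, output stage first.

    Uses the relation from the description: setup = Tp / 2 - T1, hold = T1,
    where Tp is the input clock period of the stage. The output stage runs at
    half the data rate, and each earlier stage runs at half the clock of the
    stage after it.
    """
    rows = []
    clock_hz = data_rate_bps / 2.0
    for _ in range(stages):
        tp = 1.0 / clock_hz
        rows.append((clock_hz, tp / 2.0 - t1_s, t1_s))
        clock_hz /= 2.0
    return rows


if __name__ == "__main__":
    # 8-to-1 serializer at 2.5 Gbps; T1 = 60 ps is an assumed value.
    for clock_hz, setup_s, hold_s in stage_timing(2.5e9, 3, 60e-12):
        print(f"{clock_hz / 1e6:8.1f} MHz  setup {setup_s * 1e12:7.1f} ps  "
              f"hold {hold_s * 1e12:5.1f} ps")
```

For the 2.5 Gbps example this yields stage clocks of 1.25 GHz, 625 MHz, and 312.5 MHz, the same three frequencies named for Pn[1] through Pn[7] in the description.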


Claims

X. Claims

1. A method of operating a multiplexer architecture, which uses multiple phases, at least the four different phases 0, 90, 180, and 270 degrees, and multiplexes in a power-of-two-to-1 tree so as to reduce the parasitic capacitance at the output of the 2-to-1 multiplexer in each stage and thereby raise the operating speed of the circuit; wherein the multiplexer architecture redistributes the clocks by pairing the stages with various clock phases, and uses the phase difference between the clocks to produce the phase difference between the input data directly, thereby omitting the D flip-flops used for data retiming and reducing power and area consumption.

2. The method of operating a multiplexer architecture of claim 1, wherein the circuit is implemented in pseudo-PMOS style.

3. The method of operating a multiplexer architecture of claim 1, wherein the circuit of each 2-to-1 multiplexer of the multiplexer generates the data phase difference in a built-in manner.

4. The method of operating a multiplexer architecture of claim 1, wherein the multiplexer architecture is suitable for high-speed transmission interfaces.

5. The method of operating a multiplexer architecture of claim 1, wherein the multiplexer architecture is applicable to 2-to-1, 4-to-1, 8-to-1, and 16-to-1 multiplexer architectures.