1326997

IX. Description of the Invention:

[Technical Field of the Invention]

The present invention relates to a multiplexer architecture, and in particular to a multiplexer architecture that reduces the power consumption and area consumption of high-speed transmission.

[Prior Art]

A multiplexer, also called a serializer, sends several lower-speed parallel input data streams out in sequence as a single high-speed output, as shown in Figure 1. Most high-speed transmission systems use this module to convert data into a high-speed output. The multiplexing ratio handled by most multiplexers is a power of two, such as 2, 4, 8, or 16. Some systems also encode the data before output, so the ratio becomes some other number; 8B/10B encoding, for example, requires a 10-to-1 multiplexer.

Multiplexer circuit architectures fall into three main types: the shift register type, the single stage type, and the tree type, shown in Figures 2, 3, and 4 respectively. The operation, advantages, and disadvantages of each type are outlined below.

The main circuit of the shift register type multiplexer in Figure 2 is divided into a parallel load part and a serial shift part, whose registers operate at different frequencies. The low-speed parallel load part uses the low-speed clock CLK2 to load the parallel input data signals. The high-speed serial shift part uses the high-speed clock CK1, so the multiplexer sends out data in sequence at the frequency of CK1. When all the data held in the DFFs of the serial shift part have been output serially, CK3 switches the input path of those DFFs so that all the data in the parallel load part are loaded into the serial shift DFFs. This operation can be followed on the corresponding timing diagram. Because the externally supplied clock is usually only the fastest one, CK1, a divider-like module must generate the additional clocks CLK2 and CLK3.

Figure 3 shows the single-stage multiplexer (prior art 1). It requires reference clocks at the same frequency as the parallel input data, with as many phases as there are parallel inputs, as shown on the right of Figure 3. In operation, the overlap of specific pairs of phases opens the path from each data input to the output; the overlapping clock intervals are shown as the gray regions in the timing diagram. For example, D0 is output during the overlap from the rising edge of CK0 to the falling edge of CK5, and D1 during the overlap from the rising edge of CK1 to the falling edge of CK6. The remaining data are sent out in order on the same principle and are not described further here.

Figure 4 shows an 8-to-1 tree multiplexer (prior art 2), built from three levels of 2-to-1 multiplexers. In a single 2-to-1 operation, CLK90 is first used to retime the two input data streams and give them a 180-degree phase difference; CLK90 is then used to send the data out at different times, so that both streams have sufficient setup time and hold time.
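As a rough behavioral illustration of the tree type described above, the sketch below models each 2-to-1 stage as a simple interleaver and checks that three such levels emit D0 through D7 in order. It ignores clocks, retiming, and all analog effects. The function names `mux2` and `tree_serialize`, and the choice to pair inputs that sit half the lane count apart, are assumptions made for this illustration; the text does not specify how the inputs are wired to the first level.

```python
def mux2(a, b):
    """Model one 2-to-1 stage: interleave two equal-rate streams as a0, b0, a1, b1, ..."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def tree_serialize(words, n=8):
    """Serialize n-wide parallel words through log2(n) levels of 2-to-1 stages.

    words: list of parallel words, each a sequence of n symbols (D0..D(n-1)).
    The input pairing (lane i with lane i + half) is an illustrative assumption.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    # One low-speed stream per parallel input lane.
    lanes = [[w[i] for w in words] for i in range(n)]
    # Each pass through the loop is one level of the tree; the stream count halves.
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [mux2(lanes[i], lanes[i + half]) for i in range(half)]
    return lanes[0]


if __name__ == "__main__":
    words = [list(range(8)), list(range(8, 16))]
    print(tree_serialize(words))
```

Each level doubles the symbol rate of its output stream, which is why the final 2-to-1 stage is the only one that has to run at the full output rate.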
Figure 5 compares the three multiplexer types. The advantage of the single-stage multiplexer is that it can be paired with a phase-locked loop built on a ring oscillator: the clock only needs to run at 1/N of the data rate, and N can be chosen freely, although the number of ring-oscillator stages required changes with it. However, the output node of a single-stage multiplexer carries a large parasitic capacitance, which sharply limits the bandwidth of this architecture. The tree multiplexer is split into several stages, so the multiplexing ratio of each stage is smaller, the parasitic capacitance at the output is lower, and the operating frequency is much higher. Its drawback is that it needs a fairly high-speed clock: when every stage is a 2-to-1 multiplexer, the supplied clock must run at one half of the data rate.

When a chip communicates with the outside world, the input/output interface is an important factor in whether data are transmitted and received successfully between chips. As process technology keeps shrinking, on-chip operating frequency and circuit complexity increase, and the amount and speed of on-chip data processing keep rising. With a limited number of transmission channels, however, the bandwidth between chips cannot rise to match, so the transfer speed of the input/output interface becomes the bottleneck that limits overall system performance.

To analyze this bottleneck, consider the following. The operation of a conventional tree multiplexer is divided into three phases:

(1) Clock generation: the input clock (CLK) is divided by 2 in frequency, producing four different phases (CK0, CK90, CK180, CK270).
(2) Input data phase offset: one data stream is sampled on the rising edge and the other on the falling edge, giving the two streams a 180-degree phase difference.

(3) Data switching and retiming: after the data are resampled by CK0 and CK180, CK90 and CK270 control the output switch, changing the conduction path to send the data out. Ideally this allows the data a setup time and a hold time of 1/4 clock period each, so that the data are not corrupted.

Some designs resample the output once more with a high-frequency clock to reduce output clock jitter, at the cost of a very high-speed clock generator and registers with a very high sampling rate.

Figure 6 shows the detailed architecture of a conventional 8-to-1 tree multiplexer. Each 2-to-1 multiplexer submodule needs three registers, all of which exist to give the data a phase difference. This large number of registers dominates the power consumption and area of the multiplexer. The motivation of the present invention is to modify the timing and the multiplexing scheme so as to reduce the power consumption and hardware area of the tree multiplexer.

Among the prior patents found by search, US Patent No. 42702 04, entitled "Clock and data recovery method and apparatus", also uses a multi-phase sampling approach.

US Patent No. 4789984, entitled "High Speed Multiplexer Circuit", first proposed the most basic tree serializer, but with limited refinement. The present invention improves on this tree serializer.

US Patent No. 5724361, entitled "High Performance N:1 Multiplexer with Overlap Control of Multi-Phase Clocks", implements a multiplexer by pairwise overlap of clock phases 0, 90, 180, and 270 degrees apart, and uses a reference comparison circuit to adjust the clock output level.

US Patent No. 5726990, entitled "Multiplexer and Demultiplexer", is likewise a tree multiplexer in architecture, but it does not use a multi-phase approach and does not retime the inputs of the MUX at each stage.

US Patent No. 5805089, entitled "Time-Division Data Multiplexer with Feedback for Clock Cross-over Adjustment", uses a multi-phase approach and includes a clock cross-over adjustment function.

None of the patent documents cited above, however, fully addresses high-speed transmission interfaces together with reduced power consumption and reduced area, and room for improvement remains.

[Summary of the Invention]

To overcome the shortcomings of the prior art, one object of the present invention is to use a multi-phase approach as its core technique while building the multiplexer as a power-of-two-to-1 tree, so that the parasitic capacitance at the output of the 2-to-1 multiplexer in each stage is reduced and the circuit can operate faster. In addition, by pairing the stages with suitable clock phases, the D flip-flops originally used for data retiming can be omitted, which saves considerable power and area.

[Embodiment]

To achieve the above object, the architecture proposed by the present invention centers on the principle of redistributing the clocks.
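To give a sense of scale for the registers and clocks discussed in the background above, the following sketch tallies, for a conventional N-to-1 tree of 2-to-1 submodules, the number of submodules in each stage, the clock each stage would need if the output stage runs at half the data rate and each earlier stage at half again, and the register count at three registers per submodule as described for Figure 6. The function name `conventional_tree_plan`, its parameters, and the 2.5 Gbps example rate are illustrative assumptions, not part of the original disclosure.

```python
def conventional_tree_plan(n_inputs, data_rate_hz, regs_per_cell=3):
    """Tally cells, clocks, and registers for an N-to-1 tree of 2-to-1 cells.

    regs_per_cell=3 follows the register count the text gives for Figure 6;
    everything else here is a simplified, assumed model.
    """
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two, at least 2")
    n_stages = n_inputs.bit_length() - 1  # log2 of a power of two
    stages = []
    for k in range(n_stages):  # k = 0 is the stage that drives the output
        stages.append({
            "cells": 2 ** k,
            "clock_hz": data_rate_hz / 2 ** (k + 1),
        })
    total_regs = regs_per_cell * (n_inputs - 1)  # a binary tree has N - 1 cells
    return stages, total_regs


if __name__ == "__main__":
    stages, regs = conventional_tree_plan(8, 2.5e9)
    for s in stages:
        print(s["cells"], "cell(s) at", s["clock_hz"] / 1e6, "MHz")
    print("registers:", regs)
```

For eight inputs this model counts seven submodules and 21 registers, which illustrates why the phase-offset registers weigh so heavily on the power and area of the conventional tree.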
The control clocks of a single 2-to-1 multiplexer are introduced first, followed by example clock arrangements for the 4-to-1 and 8-to-1 cases. Practical design considerations are then added, and the circuit of each submodule is described in detail.

The left side of Figure 7 shows the operating clocks and module of a conventional tree multiplexer submodule; the right side shows the submodule proposed by the present invention, in which the data skew function is embedded in the preceding multiplexer cell. As shown, the present invention redistributes the clocks so that the 2-to-1 submodule of each stage not only multiplexes but also builds in the data phase offset that the next stage needs. No registers are required to create the phase difference, which achieves the circuit omission that is the object of the invention.

Figure 8 shows an example 4-to-1 multiplexer architecture and clocks arranged in the proposed manner, and Figure 9 shows an 8-to-1 example. Both assume the ideal case in which each multiplexer stage has zero delay. Arranged this way, each stage ideally has a setup time equal to 1/2 of that stage's input clock period and a hold time of 0.

The examples above are timing diagrams that ignore the delay of each stage. In a 0.13 micron process, however, a simple inverter with a fan-out of 4 has a delay of about 60 ps. Applying this delay to an 8-to-1, 2.5 Gbps multiplexer example gives the actual clocks shown in Figure 10. Pn[1] and Pn[1]b are the 1.25 GHz clocks. Pn[2], Pn[2]b, Pn[3], and Pn[3]b are the four-phase 625 MHz reference clocks produced by the first divider stage. Pn[4], Pn[4]b, Pn[5], Pn[5]b, Pn[6], Pn[6]b, Pn[7], and Pn[7]b are the four-phase 312.5 MHz reference clocks produced by the second divider stage. There is a 60 ps delay between the 1.25 GHz, 625 MHz, and 312.5 MHz clocks, and each multiplexer likewise has a single-stage delay of its own. The data at each 2-to-1 multiplexer therefore have a setup time of about (1/2)Tp - T1 and a hold time of about T1, as shown in Figure 11, where Tp is the input clock period of that multiplexer stage and T1 is the propagation delay of one DFF.

Figure 12 is the detailed architecture of clock generation and of the distribution of control clocks to the multiplexers. The circuits of each multiplexer and of the divider are shown in Figure 13. To reduce power and hardware cost and to raise operating speed, the present invention implements the logic in pseudo PMOS style. In the 2-to-1 multiplexer, D0 is output when CK is 0 and D1 is output when CK is 1. With a threshold voltage of about 0.35 V and VDD of 1.2 V, the output high and low levels of this logic are 1.2 V and 0.2 V respectively. The D-type register uses a differential architecture because four different phases, 0, 90, 180, and 270 degrees, must be generated.

To compare the area and power required at the same speed by the three architectures, namely the single-stage serializer, the new tree serializer proposed by the present invention, and the conventional tree serializer, the inventors made the series of comparisons shown in Figures 14 through 25 (the curves labeled Tree Type represent the new tree serializer proposed by the present invention), to show that the proposed architecture comes out best in each comparison. First, Figures 14 and 15 show the steps by which the single-stage serializer and the new tree serializer are optimized. The optimization steps for the single-stage serializer are:

Select the size of the multiplexer to meet the rise time specification;
Select the size of the data skew DFFs to maintain the rise time specification;
Select the size of the clock generation to maintain the rise time specification.

The optimization steps for the new tree serializer are:

Select the size of the first-stage multiplexer (the 1 multiplexer in the figure) to meet the rise time specification;
Select the size of the second-stage multiplexers (the 2 multiplexers in the figure) to meet the rise time specification;
Select the size of the third-stage multiplexers (the 4 multiplexers in the figure) to meet the rise time specification;
Select the size of the data skew DFFs to maintain the rise time specification;
Select the size of the clock generation to maintain the rise time specification.

Only by optimizing the different parts of the architecture in order can one be sure that area and power have been minimized while still meeting the specification; this is what makes the comparison objective. The steps for the conventional tree serializer are similar to those of Figure 14 and are not repeated here. In Figures 16 and 17, every part of each of the three architectures is scaled down in equal proportion and it is recorded whether the specification is still met; this reveals the largest possible scaling and makes optimization easier. Figure 18 uses HSPICE simulation results: the transistor sizes of the three architectures are gradually reduced and the corresponding rise time is recorded. Because rise time reflects the bandwidth of an architecture, the results show that at the same speed the new tree serializer needs the least area. Once the size grows beyond a certain point the speed saturates, because the circuit is then limited by its own parasitic capacitance.

For Figures 19, 20, 21, and 22, nine rise times were chosen for analysis and comparison: 300 ps, 275 ps, 250 ps, 225 ps, 200 ps, 175 ps, 150 ps, 125 ps, and 100 ps. The results, presented as tables and plots, show that the new tree serializer proposed by the present invention consumes the least power and the least area. Figures 23, 24, and 25 multiply area by power and present the product as tables and a plot; there the improvement of the new tree serializer (labeled Tree type in the figures) in power and area is even more pronounced.

Figure 26 shows the output of the multi-phase generator, which produces the 1.25 GHz clock Pn[1]; the 625 MHz clocks Pn[2] and Pn[3], 90 degrees apart in phase; and the 312.5 MHz clocks Pn[4], Pn[5], Pn[6], and Pn[7], 400 ps apart in phase. These clocks drive the 8-to-1 serializer and agree with the clock plan of Figure 9. Figure 27 shows the simulation results of the 8-to-1 serializer, including the 625 Mbps data outputs net4, net1, net2, and net3, the 1.25 Gbps data outputs net5 and net6, and the 2.5 Gbps data output to out, in agreement with the clock plan of Figure 10.

The preferred embodiments above only illustrate the best mode of carrying out the present invention; the scope of the patent is still defined by the claims.

[Brief Description of the Drawings]

Figure 1: Functional diagram of a conventional multiplexer.
Figure 2: Architecture and timing diagram of the shift register type multiplexer.
Figure 3: Single-stage 8-to-1 multiplexer and the clocks it requires.
Figure 4: Tree 8-to-1 multiplexer and the clocks it requires.
Figure 5: Comparison of the three conventional multiplexer types.
Figure 6: Architecture of a conventional 8-to-1 tree serializer.
Figure 7: Basic principle of the architecture proposed by the present invention.
Figure 8: Architecture and timing diagram of the improved 4-to-1 tree multiplexer (new tree serializer) with ideal clocks.
Figure 9: Architecture and timing diagram of the improved 8-to-1 tree multiplexer (new tree serializer) with ideal clocks.
Figure 10: Timing diagram of the improved tree multiplexer (new tree serializer) of the present invention with propagation delay included.
Figure 11: Effect of delay on setup time and hold time.
Figure 12: Architecture of the improved tree multiplexer (new tree serializer) of the present invention.
Figure 13: Circuit diagrams of the 2-to-1 multiplexer and the differential D-type register.
Figure 14: Optimization steps and schematic for the single-stage serializer.
Figure 15: Optimization steps and schematic for the new tree serializer.
Figure 16: Comparison under proportional area scaling of the single stage and new tree type architectures.
Figure 17: Comparison under proportional area scaling of the conventional serializer (basic tree type).
Figure 18: Area versus rise time for the three architectures.
Figure 19: Table comparing area and power of the three architectures at nine rise times.
Figure 20: Table comparing area and power of the single stage architecture at four rise times.
(Amended replacement page dated April 22, year 99 of the ROC calendar.)
Figure 21: Plot comparing power of the three architectures at nine rise times.
Figure 22: Plot comparing area of the three architectures at nine rise times.
Figure 23: Table comparing power multiplied by area of the three architectures at nine rise times.
Figure 24: Table comparing power multiplied by area of the single stage architecture at four rise times.
Figure 25: Plot comparing power multiplied by area of the three architectures at nine rise times.
Figure 26: Output of the multi-phase generator.
Figure 27: Clock and data output waveforms of the 8-to-1 serializer.

[Description of Main Component Symbols]