The Proposed DSIP Architecture Template for the Wireless Communication Domain

Energy-Efficient Communication Processors

Abstract

This chapter explains the proposed Domain Specific Instruction set Processor (DSIP) architecture template and the corresponding design approach, which target emerging wireless communication systems. Section 3.1 defines the considered domain and explains the proposed DSIP platform design approach. Section 3.2 describes the design approach applied for defining the template and summarizes the layout constraints considered to ensure scalability in advanced Deep Deep Sub-Micron (DDSM) technologies. Section 3.3 analyzes domain-specific algorithm characteristics. Section 3.4 motivates the choice of the employed architectural concepts. Section 3.5 explains the proposed architecture template, and Sect. 3.6 reviews its scalability. Section 3.7 summarizes the main concepts of the proposal and their main differences compared to conventional approaches. Finally, Sect. 3.8 concludes this chapter. The chapter also includes an appendix that presents the concepts/ideas of the proposed back-end semi-custom design approach.

References

  1. Ahn, J.H., Erez, M., Dally, W.J.: Tradeoff between Data-, Instruction-, and thread-level parallelism in stream processors. In: ACM International Conference on Supercomputing (ICS), vol. 21 (2007)

  2. Anjum, O., Ahonen, T., Garzia, F., Nurmi, J., Brunelli, C., Berg, H.: State of the art baseband DSP platforms for Software Defined Radio: A survey. EURASIP J. Wirel. Commun. Netw. 1, 5 (2011). doi:10.1186/1687-1499-2011-5

  3. Auger, F., Lou, Z., Feuvrie, B., Li, F.: Multiplier-free divide, square root, and log algorithms [DSP Tips and Tricks]. IEEE Signal Proc. Magaz. 28(4), 122–126 (2011). doi:10.1109/MSP.2011.941101

  4. Implementing SIMD in Software. BDTI (2006). http://www.bdti.com/InsideDSP/

  5. Bi, X., Weldon, M.A., Li, H.: STT-RAM Designs Supporting Dual-port Accesses. In: Design, Automation and Test in Europe (DATE) (2013)

  6. Bougard, B., De Sutter, B., Rabou, S., Novo, D., Allam, O., Dupont, S., Van der Perre, L.: A coarse-grained array based baseband processor for 100Mbps+ software defined radio. In: Design, Automation and Test in Europe (DATE), pp. 716–721. IMEC (2008). doi:10.1109/DATE.2008.4484763

  7. Encounter Digital Implementation System, Cadence. http://www.cadence.com/

  8. Carvalho, E.L.d.S., Calazans, N.L., Moraes, F.G.: Dynamic Task Mapping for MPSoCs. IEEE Design Test Comput. 27(5), 26–35 (2010). doi:10.1109/MDT.2010.106

  9. Catthoor, F., Danckaert, K., Wuytack, S., Dutt, N.: Code transformations for data transfer and storage exploration preprocessing in multimedia processors. IEEE Design Test Comput. 18(3), 70–82 (2001)

  10. Catthoor, F., Raghavan, P., Lambrechts, A., Jayapala, M., Kritikakou, A., Absar, J.: Ultra-Low Energy Domain-specific Instruction-set Processors, 1st edn. Springer, New York (2010)

  11. Chang, M.F., Chen, P.C.: Embedded non-volatile memory circuit design technologies for mobile low-voltage SoC and 3D-IC. In: IEEE International Conference on Solid-state and Integrated Circuit Technology (ICSICT), pp. 13–16 (2010). doi:10.1109/ICSICT.2010.5667868

  12. Cheng, Y.: A glance of technology efforts for design-for-manufacturing in nano-scale CMOS processes. Sci. China Ser F: Inform. Sci. 51(6), 807–818 (2008). doi:10.1007/s11432-008-0054-9

  13. Chinnery, D., Keutzer, K.: Closing the Power Gap between ASIC & Custom: Tools and Techniques for Low Power Design. Springer, New York (2007)

  14. Dally, W.J., Balfour, J., Black-Shaffer, J.C., Harting, R.C., Parikh, V., Park, J., Sheffield, D.: Efficient embedded computing. Computer 41(7), 27–32 (2008)

  15. Fan, X.: A VLSI-oriented FFT algorithm and its pipelined design. In: International Conference on Signal Processing (ICSP), pp. 414–417. IEEE (2008). doi:10.1109/ICOSP.2008.4697159

  16. Fasthuber, R.: Efficient Implementation of Multiplications (Slide Set). Technical Report, IMEC (2011)

  17. Fasthuber, R., Raghavan, P., Catthoor, F.: An enhancement for enabling variable multiplications on a general shift-add/sub datapath. In preparation (2013)

  18. Fort, A., Weijers, J.W., Derudder, V., Eberle, W., Bourdoux, A.: A performance and complexity comparison of auto-correlation and cross-correlation for OFDM burst synchronization. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p. II-341-4. IMEC (2003) doi:10.1109/ICASSP.2003.1202364

  19. Ghosh, A., Ratasuk, R., Mondal, B., Mangalvedhe, N., Thomas, T.: LTE-advanced: Next-generation wireless broadband technology. IEEE Wirel. Commun. 17(3), 10–22 (2010)

  20. Glossner, J., Iancu, D., Hokenek, E., Moudgill, M.: A software-defined communications baseband design. IEEE Commun. Magaz. 41(1), 120–128 (2003). doi:10.1109/MCOM.2003.1166669

  21. Gupta, R.: The Variability Expeditions: Exploring the Software Stack for Underdesigned Computing Machines. Qualcomm, UCSD (2011)

  22. IRC: ITRS Roadmap on Interconnect (2009)

  23. Jayapala, M., Barat, F., Vander Aa, T., Catthoor, F., Corporaal, H., Deconinck, G.: Clustered loop buffer organization for low energy VLIW embedded processors. IEEE Trans. Comput. 54(6), 672–683 (2005)

  24. Kantabutra, V.: On hardware for computing exponential and trigonometric functions. IEEE Trans. Comput. 45(3), 328–339 (1996)

  25. Karuri, K., Leupers, R., Ascheid, G., Meyr, H., Kedia, M.: Design and implementation of a modular and portable IEEE 754 compliant floating-point unit. In: Design, Automation and Test in Europe (DATE), vol. 2, RWTH Aachen (2006)

  26. Kelley, B.: Software defined radio for broadband OFDM protocols. In: International Conference on Systems, Man and Cybernetics (ICSMC), pp. 2309–2314. IEEE (2009). doi:10.1109/ICSMC.2009.5345986

  27. Kin, J., Gupta, M., Mangione-Smith, W.: Filtering memory references to increase energy efficiency. IEEE Trans. Comput. 49(1), 1–15 (2000). doi:10.1109/12.822560

  28. Komalan, M., Hartmann, M., Gomez Perez, J.I., Tenllado, C., Artes Garcia, A., Catthoor, F.: System level exploration of Resistive-RAM (ReRAM) based hybrid instruction memory organization. In: Memory Architecture and Organization Workshop (MeAOW) (2012)

  29. Koren, I.: Computer Arithmetic Algorithms, 2nd edn. A.K. Peters/CRC Press, Natick (2002)

  30. Kunze, S., Matus, E., Fettweis, G.: ASIP decoder architecture for convolutional and LDPC codes. In: IEEE International Symposium on Circuits and Systems (ISCAS), i, pp. 2457–2460 (2009) doi:10.1109/ISCAS.2009.5118298

  31. Lambrechts, A., Raghavan, P., Novo, D., Ramos, E.R., Jayapala, M., Catthoor, F., Verkest, D.: Enabling wordWidth aware energy and performance optimizations for embedded processors. In: Workshop on Optimizations for DSP and Embedded Systems (FlexWare). IMEC (2007)

  32. Lee, D.: Reconfigurable and area-efficient architecture for symmetric FIR filters with powers-of-Two coefficients. In: Conference on Innovations in Information Technologies (IIT), pp. 287–291. IEEE (2007). doi:10.1109/IIT.2007.4430440

  33. Li, M.: Algorithm and Architecture Co-design For Software Defined Radio Baseband. Ph.D. thesis, KU Leuven (2010)

  34. Li, M., Amin, A., Appeltans, R., Torrea, R., Cappelle, H., Fasthuber, R., Dejonghe, A., Van der Perre, L.: Instruction set support and algorithm-architecture for fully parallel multi-standard soft-output demapping on baseband processors. In: IEEE Workshop on Signal Processing System (SIPS), pp. 140–145. IMEC (2010). doi:10.1109/SIPS.2010.5624777

  35. Li, M., Bougard, B., Xu, W., Novo, D., Van Der Perre, L., Catthoor, F.: Optimizing Near-ML MIMO detector for SDR baseband on parallel programmable architectures. In: Design, Automation and Test in Europe (DATE), pp. 444–449 (2008). doi:10.1109/DATE.2008.4484721

  36. Li, M., Fasthuber, R., Novo, D., Van Der Perre, L., Catthoor, F.: Algorithm-architecture co-design of soft-output ML MIMO detector for parallel application specific instruction set processors. In: Design, Automation and Test in Europe (DATE), pp. 1608–1613. IMEC (2009)

  37. Liu, D., Nilsson, A., Tell, E., Wu, D., Eilert, J.: Bridging dream and reality: Programmable baseband processors for software-defined radio. IEEE Commun. Magaz. 47(9), 134–140 (2009). doi:10.1109/MCOM.2009.5277467

  38. Mansour, M., Shanbhag, N.: High-throughput LDPC decoders. IEEE Trans. Very Large Scale Integr. Syst. 11(6), 976–996 (2003). doi:10.1109/TVLSI.2003.817545

  39. Markovic, D., Brodersen, R.W.: DSP Architecture Design Essentials. Springer, New York (2012)

  40. Miniskar, N.R., Hammari, E., Munaga, S., Mamagkakis, S., Kjeldsberg, P.G., Catthoor, F.: Scenario based mapping of dynamic applications on MPSoC: A 3D graphics case study. In: International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 48–57 (2009). doi:10.1007/978-3-642-03138-0_6

  41. Nahm, S., Han, K., Sung, W.: A CORDIC-based digital quadrature mixer: Comparison with a ROM-based architecture. In: IEEE International Symposium on Circuits and Systems (ISCAS), pp. 385–388 (1998). doi:10.1109/ISCAS.1998.698871

  42. Nigam, T.: Scaling to the final frontier: Reliability challenges in sub 20 nm technologies. In: IEEE International Integrated Reliability Workshop (IIRW), pp. xi–xi (2011). doi:10.1109/IIRW.2011.6142574

  43. Novo, D., Li, M., Fasthuber, R., Raghavan, P., Catthoor, F.: Exploiting finite precision information to guide data-flow mapping. In: Design Automation Conference (DAC), pp. 248–253 (2010)

  44. Okada, K., Kousai, S.: Digitally-Assisted Analog and RF CMOS Circuit Design for Software-Defined Radio, 1st edn. Springer, New York (2011)

  45. Panda, P.R., Nicolau, A., Dutt, N.: Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration. Kluwer Academic Publishers, Norwell (1998)

  46. Psychou, G., Fasthuber, R., Hulzink, J., Husiken, J., Catthoor, F.: Subword handling in data-parallel mapping. In: Parallel Programming and Run-Time Management Techniques for Many-core Architectures (PARMA) (2012)

  47. Rabaey, J.: Low-power silicon architectures for wireless communications. In: Design Automation Conference (DAC), pp. 377–380 (2000). doi:10.1109/ASPDAC.2000.835128

  48. Rabaey, J., Abnous, A., Ichikawa, Y., Seno, K., Wan, M.: Heterogeneous reconfigurable systems. In: IEEE Workshop on Signal Processing System (SIPS), pp. 24–34 (1997). doi:10.1109/SIPS.1997.625684

  49. Raghavan, P., Lambrechts, A., Jayapala, M., Catthoor, F., Verkest, D.: Distributed loop controller for multi-threading in uni-threaded ILP architectures. IEEE Trans. Comput. 58(3), 311–321 (2009)

  50. Raghavan, P., Lambrechts, A., Jayapala, M., Catthoor, F., Verkest, D., Corporaal, H.: Very Wide Register: An asymmetric register file organization for low power embedded processors. In: Design, Automation and Test in Europe (DATE). IMEC (2007)

  51. Ramacher, U.: Software-defined radio prospects for multistandard mobile phones. Computer 40(10), 62–69 (2007)

  52. Salmela, P., Happonen, A., Burian, A., Takala, J.: Several approaches to fixed-point implementation of matrix inversion. International Symposium on Signals, Circuits and Systems (ISSCS), vol. 2(2), 497–500 (2005)

  53. Samaras, K., Fasthuber, R., Agrawal, P., Catthoor, F.: Code Profiling for 60 GHz Baseband Processing. Technical Report, IMEC, SSET-CSI (2012)

  54. Sasanka, R., Li, M.L., Adve, S.V., Chen, Y.K., Debes, E.: ALP: Efficient support for All levels of parallelism for complex media applications. ACM Trans. Architect Code Optim. 4(1) (2007)

  55. Sheu, S.S., Cheng, K.H., Chang, M.F., Chiang, P.C., Lin, W.P., Lee, H.Y., Chen, P.S., Chen, Y.S., Chen, F.T., Tsai, M.J.: Fast-write resistive RAM (RRAM) for embedded applications. IEEE Design Test Comput. 28(1), 64–71 (2011). doi:10.1109/MDT.2010.96

  56. Tran, A.T., Truong, D.N., Baas, B.: A reconfigurable source-synchronous on-chip network for GALS many-core platforms. IEEE Trans. Comput. Aided Design 29(6), 897–910 (2010). doi:10.1109/TCAD.2010.2048594

  57. Woh, M., Mahlke, S., Mudge, T., Chakrabarti, C.: Mobile supercomputers for the next-generation cell phone. Computer 43(1), 81–85 (2010). doi:10.1109/MC.2010.16

  58. Woh, M., Sangwon, S., Mahlke, S., Mudge, T., Chakrabarti, C., Flautner, K.: AnySP: anytime anywhere anyway signal processing. IEEE Micro 30(1), 81–91 (2010)

  59. Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: ACM Conference on Programming Language Design and Implementation, pp. 30–44. Stanford University, Stanford (1991)

  60. Xie, Y.: Future Memory and Interconnect Technologies. In: Design, Automation and Test in Europe (DATE) (2013)

  61. Xu, W., Richter, M., Sauermann, M., Capar, F., Grassmann, C.: Efficient baseband implementation on an SDR platform. In: International Conference on ITS, Telecommunications, pp. 794–799 (2011). doi:10.1109/ITST.2011.6060163

  62. Yoshizawa, S., Miyanaga, Y.: Use of a variable wordlength technique in an OFDM receiver to reduce energy dissipation. IEEE Trans. Circ. Syst. (TCAS) 55(9), 2848–2859 (2008). doi:10.1109/TCSI.2008.920098

  63. Zhang, W., Hu, J.S., Degalahal, V., Kandemir, M., Vijaykrishnan, N., Irwin, M.J.: Reducing instruction cache energy consumption using a compiler-based strategy. ACM Trans. Archit. Code Optim. 1(1), 3–33 (2004)

Correspondence to Robert Fasthuber.

Appendix: Proposed Back-End Semi-custom Design Approach

This appendix provides concepts/ideas for a back-end design approach that leads to a significantly higher implementation efficiency than the conventional standard-cell flow (especially for DDSM technologies), without significantly increasing design time. To enable this, the proposed approach is limited to architecture templates, such as the one proposed in this book. Preliminary experimental results based on this proposal are shown in the appendix of Chap. 5. It should be emphasized that the content of this appendix is mostly theoretical at this stage and that more effort is needed to develop, evaluate and refine this ongoing work.

3.1.1 Motivation

As explained in Sect. 2.7.2, three types of design flow can generally be applied to implement a hardware block: (1) the conventional standard-cell design flow, (2) a semi-custom design flow (based only on standard cells or also on custom cells) and (3) a full-custom design flow. These flows essentially offer a trade-off between design time and the resulting implementation efficiency (i.e. area and energy of the design/layout). For both criteria, the differences among the design flows can amount to several orders of magnitude [13]. Because of tight time-to-market requirements and strict limits on development cost, the conventional standard-cell design flow is the one mostly applied today. In this flow only (generic) standard cells are used, and the wires on each routing layer are typically routed either horizontally or vertically. Among other drawbacks, the conventional standard-cell design flow cannot sufficiently cope with the increasing influence of wires in DDSM technologies [22]. Thus, even after migrating to a newer technology, the benefit, i.e. the reduction in area and energy, may be strongly limited. This clearly creates a dilemma. Applying a conventional semi-custom design flow is essentially not a viable solution either, since it usually has a strong negative impact on design time because a new cell library has to be developed and characterized for each design. In addition, many current semi-custom design flows focus only on the layout generation of the arithmetic datapath and neglect that the data movement from/to the periphery (e.g. memories, arithmetic datapaths of other cores) contributes significantly to the overall design efficiency. Therefore, to achieve a high overall implementation efficiency, the complete floorplan needs to be taken into account. Consequently, if hard-macro blocks are used, they should be available in a form that allows their physical shape to be changed, and that property should be exploited. Furthermore, most proposals are based on semi-custom designed hard-macro cells, which cannot “easily” be reused across different design instances. Hence, technology dependency and limited reusability are major issues. These observations make clear that a new back-end design approach is needed, one that targets high implementation efficiency in combination with sufficiently low design time.

3.1.1.1 Overall Proposed Back-End Design Approach

To summarize, the two main criteria that the new back-end design approach has to fulfill are: (1) acceptable design time (comparable to the conventional standard-cell design flow) and (2) the highest possible implementation efficiency (comparable to existing semi-custom design flows and ideally close to that of full-custom design). Unfortunately, these two criteria imply contradictory constraints. The overall proposed back-end design approach, depicted in Fig. A.1, offers a reasonable compromise. It combines the following main concepts/ideas, which should ensure that both criteria are fulfilled in the best possible way:

Fig. A.1
Overview of the overall proposed hierarchical back-end design approach. A clear distinction between library design and instance design is made. The library design involves two design flows: (1) Top-down propagation of requirements from the architecture level to the technology level. In this flow the design space is built up, i.e. the possible design options for implementing the required architecture/component are “collected”. (2) Bottom-up characterization and pareto-optimal filtering from the technology level to the architecture level. In this flow the previously “collected” design options are implemented and characterized, e.g. in terms of delay and energy, and design options that are not pareto-optimal are discarded at each level. The results of this flow are then stored in libraries, which are used during the design of template instances.

  • Enable a large layout/design optimization space by removing restrictions of the conventional standard-cell flow: For instance, by allowing custom cells and by removing the restriction of equal cell height, the design can be optimized significantly further (increases efficiency). As a negative side effect, however, the degree of automation decreases (increases design time). Fortunately, the following concepts largely compensate for that.

  • Customize only parts of the architecture/circuits that have a significant impact on efficiency: To increase the efficiency of the design significantly, the most influential and critical parts of the architecture/circuit need to be optimized (e.g. by using custom cells). However, parts of the architecture/circuit that have little overall impact and are less critical (e.g. certain control logic, or parts of the datapath that are seldom activated and are not in the critical path) are not customized (i.e. standard cells are used), since customizing them would essentially only increase design time. Thus, a “hybrid” design approach, which combines custom cells and standard cells, is proposed.

  • Limit the design space at the (micro-)architectural level by using an architecture template: The conventional standard-cell design flow can be applied to obtain the layout for any type of digital circuit description/architecture. However, to enable this generality while keeping the number of different cells in the library acceptable (important for automation), only cells with primitive functionality are present (i.e. less customization and efficiency). Because an architecture template limits the number of architectural options significantly, the high degree of generality offered by the conventional standard-cell design flow is clearly not needed. Thus, the proposed design approach supports only a limited design space (decreases design time). In general, to determine the architecture design space that is actually required, i.e. the DSIP architecture template parameter space (see Sect. 3.6.2), requirements/limitations from the algorithm side as well as the technology side have to be considered.

  • Reduce the “final” design space at component/cell level by applying a combination of top-down and bottom-up flow with pareto-optimal filtering: As mentioned above, the DSIP architecture template defines the design space at the (micro-)architectural level. Based on this information, a top-down flow is applied in which all feasible/relevant design options/implementations at the different levels of the design hierarchy are “collected”. This involves two sub-steps. First, all possible design options for a particular component/cell are determined. For instance, to implement the adder that is present in the BAU, all different types of adder topologies, such as the carry-ripple adder and the Brent-Kung adder, are initially considered. Second, all design options that would clearly not fulfill the requirements are discarded right away. For instance, the BAU is generally the most utilized unit and should therefore determine the maximal clock frequency, i.e. the adder used within this unit should have a very low delay. Thus, utilizing a “slow” carry-ripple adder in the BAU is not feasible, and this implementation option can already be filtered out. After applying this top-down flow, which is generally technology-independent (across technology nodes with similar back-end of line options), all feasible/relevant design options at each level of the design hierarchy (component/cell levels) are known. Next, a bottom-up flow is applied, in which all previously determined feasible/relevant design options are implemented, characterized and again filtered at each level of the design hierarchy. As mentioned further below, instead of making a manual layout for each design option, the designs are described in a more general/scalable way, which enables a high degree of reuse across technology nodes. The characterization and the filtering of each design option nevertheless have to be technology-specific, but it is assumed that this task can largely be automated. The filtering is done by discarding all design options that are not pareto-optimal (e.g. in terms of the delay-energy trade-off). This ensures a reduction of the design space (decreases design time) in an optimal multi-objective way. Note that because the proposed DSIP architecture template is composed in a strongly hierarchical way (see Fig. A.2 for an example), this concept can be applied well.

  • Describe design options in a scalable/reusable way by using (propagated) relative placement information at all design levels: Instead of designing fixed cells and/or hard-macro blocks, the logical and physical structures of design options (circuits) are described in a parametrizable format, which enables high scalability/reusability. A parameter of the logic description will usually be the word length; a parameter of the physical description could be the edge on which the data input pins are located. To ensure that the most important wires in the design (the most actively used wires and the wires in the critical path) are short, which is a necessity for DDSM technology (increases efficiency), the relative placement of components/cells in the architecture template is pre-defined at design time and propagated between hierarchy levels. In general, the architecture template is the enabler of this approach. Note that the relative placement information is largely technology-independent and therefore highly reusable. With the a priori specification of the relative placement, the general design space is reduced. Thus, automation again becomes more feasible (decreases design time). In comparison, traditional hard macros offer no or only a very limited degree of flexibility, and traditional soft macros do not include physical/layout information.

  • Leverage a design library at all levels to enable a high reuse of design effort: The DSIP architecture template ensures that the actual designs (instances) are always composed of the same components/cells. Therefore, the components/cells need to be designed only once and can then be reused. To leverage this property, a clear distinction between library design and instance design is made. The effort for designing and optimizing the library can be averaged over all instances (decreases design time). By employing an individual design library at each hierarchy level, maximal reusability is ensured.

  • Provide meaningful/rather accurate information to architecture and even algorithm designers: As mentioned in Sect. 1.2.4, in practice both the efficiency of the design implementation and the design time suffer because of the cultural gap: the former mainly because of over-design, the latter mainly because of design iterations. Both negative effects are caused by a lack of information. By applying the proposed back-end design approach, a strong link between the technology/layout level and the architecture level is established. Thus, a significant part of the cultural gap can be bridged, which results in fewer design iterations (decreases design time) and in designs that fulfill exactly the requirements/specifications (increases efficiency).

  • Automate as much as possible to increase productivity: The top-down flow of the library design can probably not be automated very well and requires, at least initially, manual effort. However, this initial effort is highly reusable because (1) the architecture template is domain-specific and the requirements (optimization goals) for a domain essentially do not change and (2) the space of design options grows only very slowly. For instance, a new and better adder topology is found only rarely. In the bottom-up flow of the library design, the design options need to be described, the actual layouts need to be made and characterized, and the pareto-optimal filtering needs to be applied. Except for the first step, a high degree of automation seems feasible. The first step requires manual effort, but it is a task that is largely technology-independent and highly reusable. Since it is applied most often, the most crucial design flow in terms of design time is the library instantiation design flow. However, because the design library captures essentially all possible “layouts” within the DSIP template design space “in advance”, the application of this design flow is rather straightforward and can therefore be highly automated (decreases design time). Large parts of the proposed approach can already be automated today with existing EDA tools.
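The combination of top-down collection and bottom-up pareto-optimal filtering described in the list above can be illustrated with a minimal Python sketch. It is not the authors' tool flow: the class and function names, the use of logic depth as a technology-independent top-down criterion, and all delay/energy numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topology:
    """Technology-independent description of a design option (hypothetical)."""
    name: str
    logic_depth: int  # rough depth for a 16-bit adder, illustrative value

@dataclass(frozen=True)
class Characterized:
    """Technology-specific characterization result (hypothetical numbers)."""
    name: str
    delay_ns: float
    energy_pj: float

def top_down_filter(topologies, max_logic_depth):
    """Discard options that clearly cannot meet the requirement."""
    return [t for t in topologies if t.logic_depth <= max_logic_depth]

# Stand-in for automated layout generation and characterization.
# Only options that survive the top-down step ever need an entry here.
MADE_UP_RESULTS = {
    "brent_kung": (0.8, 1.6),
    "kogge_stone": (0.6, 2.4),
    "carry_select": (0.9, 2.0),
}

def characterize(topology):
    delay_ns, energy_pj = MADE_UP_RESULTS[topology.name]
    return Characterized(topology.name, delay_ns, energy_pj)

def dominates(a, b):
    """a dominates b: no worse in both metrics, strictly better in one."""
    no_worse = a.delay_ns <= b.delay_ns and a.energy_pj <= b.energy_pj
    better = a.delay_ns < b.delay_ns or a.energy_pj < b.energy_pj
    return no_worse and better

def pareto_filter(options):
    """Keep only options that no other option dominates."""
    return [o for o in options if not any(dominates(p, o) for p in options)]

adders = [
    Topology("carry_ripple", 16),
    Topology("brent_kung", 7),
    Topology("kogge_stone", 4),
    Topology("carry_select", 8),
]

# Top-down: the BAU adder must be fast, so deep topologies are dropped.
feasible = top_down_filter(adders, max_logic_depth=8)
# Bottom-up: characterize the survivors and keep the pareto-optimal ones.
library = pareto_filter([characterize(t) for t in feasible])
print([o.name for o in library])  # -> ['brent_kung', 'kogge_stone']
```

In this toy example the carry-ripple adder is removed before any characterization effort is spent on it, and the carry-select option is later discarded because the Brent-Kung option is both faster and lower in energy under the assumed numbers.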

In the following we propose an initial library instantiation design flow, which is still only based on the standard-cell layout structure, i.e. it does not include custom cells.

3.1.1.2 Proposed Back-End Library Instantiation Design Flow

Figure A.3a depicts the main steps of the proposed back-end library instantiation design flow. As a reference, the conventional standard-cell design flow is shown in Fig. A.3b. To enable a reasonable degree of automation with existing EDA tools, the layout resulting from the proposed flow is still compliant with the conventional standard-cell layout structure. Thus, essentially the only way to obtain higher efficiency is to reduce the length of important wires (i.e. wires in the critical path and highly active wires). It is important to emphasize that this limitation will not be present in the actual targeted flow described above.

Fig. A.2
Design hierarchy present in typical DSIP template instances. Here shown in the case of the HardSIMD FIR DSIP instance which will be explained in Chap. 5. The strong hierarchy ensures high scalability and reusability

Fig. A.3
Comparison of (a) the proposed back-end design flow and (b) the conventional standard-cell flow, both of which can be used for the template instantiation

In the following, the three main differences compared to the conventional standard-cell design flow are highlighted:

  • The architecture design, the logic synthesis and the “local” layouting (which refers to the relative placement of cells) steps are combined. This seems feasible because this combined step essentially only selects among pre-computed pareto-optimal design options. As mentioned earlier, design options are described in a scalable and, as far as possible, technology-independent way. For instance, VHDL supports generic parameters, and an abstraction layer can be added to decouple the logic gates from the standard-cell technology library. Existing layout description languages, such as Cadence Structured DataPath (SDP) [7], already make it possible to describe the relative placement of standard cells in a parametrizable manner. Optional pin position and routing constraints can typically be provided to layout tools in the form of scripts. The selection of pareto-optimal design options, given the specific requirements for the instance, can largely be automated. After this combined step the design is represented in a generic gate-level HDL with relative placement information of cells and optional routing constraints. Generic means that certain parameters, such as the drive strength of cells, can still be changed later in the design flow (e.g. for the most critical locations).

  • Semi-custom placement of cells. All cells for which relative placement information is present are placed first. Note that this typically affects all the important cells of the datapath and the data memories (data plane). Only after this a priori placement has been completed are all other cells (which are typically part of the control plane and have little overall impact) placed automatically. An incremental optimization step can refine the positions of the cells, but the ones with relative placement information can only be moved within a certain radius.

  • Optional routing constraints are applied. When deciding on the relative placement of cells, the relative length of the wires and potentially also the targeted routing layers are considered. The information about the targeted routing layers can be provided in this step and would ensure that the resulting routing corresponds to the targeted one.
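The parametrizable relative placement and the semi-custom placement order described above can be sketched as follows. This is a hypothetical Python illustration and not the syntax of Cadence SDP or of any other EDA tool; the `Cell` class, the function names, the stage names and the bit-slice grid (one row per bit, one column per datapath stage) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Cell:
    name: str
    # Relative (row, column) position; None means "no relative placement",
    # i.e. the cell is left to the automatic placer (hypothetical model).
    rel_pos: Optional[Tuple[int, int]] = None

def bit_slice_placement(word_length: int, stages: List[str]) -> List[Cell]:
    """Generate relative positions, parametrized by the word length."""
    return [
        Cell(f"{stage}_bit{bit}", (bit, col))
        for bit in range(word_length)
        for col, stage in enumerate(stages)
    ]

def placement_order(cells: List[Cell]) -> List[Cell]:
    """Cells with relative placement first, all other cells afterwards."""
    constrained = [c for c in cells if c.rel_pos is not None]
    free = [c for c in cells if c.rel_pos is None]
    return constrained + free

# Data plane: described once, scaled by the word length parameter.
datapath = bit_slice_placement(word_length=4, stages=["mux", "add", "reg"])
# Control plane: no relative placement information.
control = [Cell("fsm_state0"), Cell("decoder0")]

order = placement_order(control + datapath)
print(order[0].name, order[-1].name)  # -> mux_bit0 decoder0
```

Changing `word_length` regenerates the relative positions without touching the description itself, which mirrors the reuse argument made for parametrizable descriptions. Absolute coordinates, drive strengths and routing layers are deliberately left out of this sketch.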

3.1.1.3 Implementation of the Proposed Back-End Library Instantiation Design Flow and Experimental Results

A variant of the proposed back-end library instantiation design flow has been implemented and applied to the following two designs: (1) the DSIP architecture template instance of the FIR filter (see case study 2) and (2) a standard-cell memory. The preliminary experimental results, which are shown in the appendix of Chap. 5, are promising.

3.1.1.4 Conclusions

In this appendix, concepts/ideas for a new (more holistic) back-end design approach were proposed, and for the template instantiation part a more concrete design flow variant was introduced. In contrast to the conventional standard-cell design flow, the proposed approach establishes a strong link between architecture and technology. This link relies on the assumption that an architecture template is utilized, which holds for the targeted DSIP design approach proposed in this book. To increase the implementation efficiency, certain design/layout limitations that are present in the conventional standard-cell flow have been removed. In general this has a negative impact on automation and design time. Nevertheless, because the components of the library are reused often, the effort for designing and optimizing these components can be averaged over many template instances, which in turn decreases the average design time. The design flow that is applied most often, i.e. the template instantiation flow, can largely be automated, because most design decisions have already been considered during the library design. Thus, the proposal combines the advantages of conventional semi-custom design flows in terms of implementation efficiency with the advantages of the conventional standard-cell design flow in terms of design time.
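The averaging argument can be made explicit with a simple model. The symbols below are not from the chapter; they are assumptions introduced only for illustration: $T_{\mathrm{lib}}$ is the one-time library design effort, $T_{\mathrm{inst}}$ the effort for one template instantiation, $N$ the number of instances that reuse the library, and $T_{\mathrm{semi}}$ the per-design effort of a conventional semi-custom flow.

```latex
% Average design time per instance when the library is reused N times
\bar{T}(N) = \frac{T_{\mathrm{lib}}}{N} + T_{\mathrm{inst}},
\qquad
\lim_{N \to \infty} \bar{T}(N) = T_{\mathrm{inst}}

% Break-even against a per-design semi-custom flow (assuming T_semi > T_inst)
\bar{T}(N) < T_{\mathrm{semi}}
\;\Longleftrightarrow\;
N > \frac{T_{\mathrm{lib}}}{T_{\mathrm{semi}} - T_{\mathrm{inst}}}
```

Under these assumptions the library effort dominates for small $N$, while for many instances the average design time approaches that of the largely automated instantiation flow.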

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Fasthuber, R., Catthoor, F., Raghavan, P., Naessens, F. (2013). The Proposed DSIP Architecture Template for the Wireless Communication Domain. In: Energy-Efficient Communication Processors. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4992-8_3

  • DOI: https://doi.org/10.1007/978-1-4614-4992-8_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-4991-1

  • Online ISBN: 978-1-4614-4992-8

  • eBook Packages: Engineering, Engineering (R0)
