Tkool Electronics


Custom datapath designers working on microprocessors, digital signal processors (DSPs) and graphics processors have to meet aggressive performance, power and area (PPA) targets. They use datapath techniques such as tiling to lay out highly regular structures for building blocks commonly used in processor designs, such as adders, multipliers, coders and decoders. In tiling, the designer partitions specific functions and arranges the library cells for those functions into rows and columns. Each cell is placed relative to the other cells; for example, cell A is on the same row as cell B and to its left. Tiling enables custom layouts that can meet rigorous PPA targets. However, tiling requires a lot of manual work to make sure that the structure is regular, that the cells are optimal, and that the layout addresses multi-mode multi-corner, signal integrity (SI) and multi-voltage requirements. Creating tiled structures and keeping them optimal throughout the design flow is time consuming and lengthens design cycles. At the same time, designers of mobile and multimedia SoCs at 40 nm and below are also using processor cores. These “mainstream” designers would like to achieve the same custom performance without the penalty of longer design cycles and extensive manual intervention. Additionally, datapath techniques are being extended to structures such as register banks, clock trees and multiplexers. In summary, mainstream designers need an automated datapath solution that lets them meet custom design objectives on shorter, more predictable schedules.
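To make the row-and-column idea concrete, here is a minimal Python sketch of how a tiled arrangement could be turned into coordinates. It is an illustration under stated assumptions, not any tool's algorithm: the `TileCell` and `tile` names are hypothetical, the widths and heights are made-up values in microns, and each column is assumed to be as wide as its widest cell (each row as tall as its tallest cell).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileCell:
    name: str
    column: int    # tiling column index (0 = leftmost)
    row: int       # tiling row index (0 = bottom)
    width: float   # hypothetical cell width, in microns
    height: float  # hypothetical cell height, in microns


def tile(cells, origin=(0.0, 0.0)):
    """Map each cell name to the (x, y) of its lower-left corner.

    Each column is as wide as its widest cell and each row as tall as its
    tallest cell, so cells that share a column stay left-aligned and cells
    that share a row stay bottom-aligned.
    """
    ncols = max(c.column for c in cells) + 1
    nrows = max(c.row for c in cells) + 1
    col_w = [0.0] * ncols
    row_h = [0.0] * nrows
    for c in cells:
        col_w[c.column] = max(col_w[c.column], c.width)
        row_h[c.row] = max(row_h[c.row], c.height)
    # Prefix sums give the x offset of each column and the y offset of each row.
    col_x = [sum(col_w[:i]) for i in range(ncols)]
    row_y = [sum(row_h[:j]) for j in range(nrows)]
    ox, oy = origin
    return {c.name: (ox + col_x[c.column], oy + row_y[c.row]) for c in cells}


# Cell A sits on the same row as cell B and to its left; cell C sits above A.
cells = [
    TileCell("A", column=0, row=0, width=2.0, height=1.5),
    TileCell("B", column=1, row=0, width=3.0, height=1.5),
    TileCell("C", column=0, row=1, width=1.0, height=1.5),
]
print(tile(cells))  # {'A': (0.0, 0.0), 'B': (2.0, 0.0), 'C': (0.0, 1.5)}
```

Because positions are derived only from column and row indices, swapping a cell for a narrower variant never disturbs its neighbors; a wider variant shifts every column to its right, which is one reason resizing cells inside a tiled structure takes care.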

What are custom datapath designs?
The logic in digital designs is typically classified into two categories: control blocks and datapath blocks. Control blocks are random in nature and are handled well by standard synthesis and place-and-route tools. Datapath blocks perform Boolean (AND, OR, and XOR) or arithmetic (ADD, SHIFT, and MULTIPLY) operations. In datapath designs, these operations are performed bitwise and in parallel on each bit of a bus. Each operation corresponds to a dedicated function, such as an adder, multiplier, register or multiplexer. Figure 1 shows an example of a datapath block.
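The bit-parallel nature of a datapath can be sketched with a small behavioral model in Python. This is an assumed illustration: a ripple-carry adder is used only because it shows one identical full-adder slice per bus bit, which is the regularity that datapath placement exploits; production datapath adders often use faster carry structures. The function names are hypothetical.

```python
from __future__ import annotations


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One bit slice: return (sum, carry_out) for single-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two width-bit buses by chaining `width` identical full-adder slices.

    Returns (sum modulo 2**width, carry out of the last slice).
    """
    result, carry = 0, 0
    for i in range(width):  # slice i processes bit i of both buses
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 0b10001
```

Widening the bus from 4 to 64 bits only changes `width`: the same slice is repeated, which is why such blocks map naturally onto one row or column per bit.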


Flow complexity
It is common for datapath designers to implement datapath structures with their own custom flows. Such a flow, typically a combination of a proprietary language and a GUI, helps meet PPA targets but is hard to share across teams or to deliver as IP to external customers. This is because combining a standalone custom datapath stage with a traditional place-and-route (P&R) tool yields a complex and inefficient flow. Designers first create a datapath structure and then must ensure that it is preserved through the full place-and-route flow in the context of the rest of the design. As shown in Figure 3, this flow requires many optimization loops to meet the targeted PPA. Cell-selection changes made in one part of the design affect other parts of the design and may, in turn, alter the structure of the physical datapath. Picking cells with the correct drive strength and size (height/width) to fix timing or logic design rule constraints without disturbing the datapath structure requires extensive design and tool knowledge. The bottom line is that in the custom flow, the cost of design changes is very high.

Design knowledge requirements
Custom datapath designers need detailed knowledge of the datapath blocks, which requires access to the design early in the flow. Not every function or block is suitable for structured placement. Designers need to know the dataflow, input-output connections and loads when making major cell placement decisions. Datapath designs typically have wide buses (64, 128 or 256 bits), so knowing the fan-in logic, the input and output loads, and the bus widths is important from a routing-resources standpoint.
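Why bus width matters for routing can be shown with a first-order estimate. The sketch below is hypothetical: it assumes one routing track per bit, a uniform track pitch (0.1 µm in the usage line, an illustrative value rather than a process number), and bits split evenly across the available routing layers, ignoring shielding, spacing rules, vias and blockages.

```python
import math


def bus_channel_width_um(bits: int, track_pitch_um: float, layers: int = 1) -> float:
    """First-order estimate of the channel width a bus consumes, in microns.

    Assumes one routing track per bit, bits spread evenly over `layers`
    routing layers that run in the bus direction, and ignores shielding,
    spacing rules, vias and blockages.
    """
    tracks_per_layer = math.ceil(bits / layers)
    return tracks_per_layer * track_pitch_um


# Hypothetical 0.1 um track pitch on two routing layers.
for bits in (64, 128, 256):
    print(bits, bus_channel_width_um(bits, track_pitch_um=0.1, layers=2))
```

Under these assumptions the channel width grows linearly with bus width, so a 256-bit bus needs four times the tracks of a 64-bit bus, before counting the fan-in and load connections that also compete for the same routing resources.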


Project schedules
Implementing a high-performance custom datapath design is time consuming, primarily because of the handcrafted placement stage and the iterations between the custom datapath flow and traditional place and route. Depending on the design complexity, the number and size of the datapath blocks, and the technology node, development schedules for custom datapath teams are long and hard to predict.

What do designers want?
Designers developing cores for processors, DSPs or other applications know that their IP will be used in multiple products at different process nodes. These designers want IP that is portable, easy to use and predictable, and that meets the PPA targets of end users. Mainstream designers, especially those developing mobile and multimedia applications, require powerful processing engines and the capacity to handle increasing graphics content. To meet these requirements, they use on-chip processor cores, DSP cores and small memories. Mainstream designers want a solution that helps them create datapath structures for these IP blocks while seamlessly handling the standard cells in the rest of the design. They also want a solution that helps them create more efficient register banks, clock structures, multiplexers, crossbar switches and similar structures that help meet PPA objectives. In summary, designers want an automated solution based on traditional place-and-route tools that delivers custom PPA on a shorter, predictable design schedule.

Automated datapath solution
Physical datapath technology in IC Compiler provides logic designers with a predictable, production-proven flow in a single unified environment. With the support of Design Compiler during front-end synthesis, RTL designers can create datapath structures and then pass them to IC Compiler for place and route. Designers can also create datapath structures directly in IC Compiler, using simple built-in Tcl commands to specify the relative column and row positions of instances. These commands define relative placement (RP) constraints. During placement, legalization and optimization, the datapath structures are physically preserved and are placed as single entities in the context of the rest of the design.


Relative placement is usually applied to datapath blocks and registers but can also be applied to any cells in the design that require some regularity. Examples of commonly used datapath structures are adders, register banks, coders, decoders and multiplexers. Figure 4 displays the design matrix as an RP group after placement.
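The way an RP group is kept intact during legalization can be modeled in a few lines of Python. This sketch is hypothetical and is not the IC Compiler Tcl interface: `RPGroup`, `add` and `legalize` are invented names, the column and row pitches are assumed uniform and to be multiples of the placement site grid, and only the group origin is snapped to that grid so that every cell keeps its relative offset.

```python
from dataclasses import dataclass, field


@dataclass
class RPGroup:
    """Hypothetical model of a relative-placement (RP) group.

    NOT the IC Compiler Tcl interface: it only mimics the idea that cells
    are pinned to (column, row) slots and the group moves as one unit.
    Column and row pitches are assumed uniform and to be multiples of the
    placement site width and row height.
    """
    name: str
    col_pitch: float
    row_pitch: float
    slots: dict = field(default_factory=dict)  # (column, row) -> cell name

    def add(self, cell: str, column: int, row: int) -> None:
        if (column, row) in self.slots:
            raise ValueError(f"{self.name}: slot ({column}, {row}) is taken")
        self.slots[(column, row)] = cell

    def legalize(self, x: float, y: float, site_w: float, row_h: float) -> dict:
        """Snap only the group origin to the site/row grid, then place every
        cell at its fixed offset, so the relative structure is preserved."""
        ox = round(x / site_w) * site_w
        oy = round(y / row_h) * row_h
        return {
            cell: (ox + col * self.col_pitch, oy + row * self.row_pitch)
            for (col, row), cell in self.slots.items()
        }


# Two register bits stacked in column 0, with a mux beside bit 0 in column 1.
bank = RPGroup("reg_bank", col_pitch=1.0, row_pitch=2.0)
bank.add("q_reg_0", column=0, row=0)
bank.add("q_reg_1", column=0, row=1)
bank.add("mux_0", column=1, row=0)
print(bank.legalize(10.2, 4.9, site_w=0.5, row_h=2.0))
# {'q_reg_0': (10.0, 4.0), 'q_reg_1': (10.0, 6.0), 'mux_0': (11.0, 4.0)}
```

Because only the origin moves, the register bank keeps its column-and-row shape wherever the placer puts it, which is the property the text describes as placing the structure as a single entity.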

ARM cores
ARM cores are used extensively in high-performance mobile and communication products. The ARM Cortex-A8 processor’s NEON unit processes multimedia workloads such as video encode/decode, 2D/3D graphics, gaming, audio processing and image processing. In the NEON unit, the 19 blocks highlighted in Figure 5 were implemented as physical datapaths using IC Compiler RP constraints, which reduced total negative slack (TNS) by 20x.
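Total negative slack is the sum of the negative slacks over all timing endpoints, so it measures how much failing timing remains across the whole design rather than on a single worst path. A minimal Python sketch of that definition follows; the function name is hypothetical and the slack values are invented for illustration, not data from the NEON implementation.

```python
def total_negative_slack(slacks_ns):
    """Total negative slack: the sum of all negative endpoint slacks.

    Endpoints that meet timing (slack >= 0) contribute nothing, so a design
    that meets timing everywhere has TNS = 0.
    """
    return sum(s for s in slacks_ns if s < 0.0)


# Hypothetical endpoint slacks in nanoseconds (not NEON data).
before = [-0.40, -0.25, 0.10, -0.35]
after = [-0.10, 0.02, 0.10, 0.0]
print(total_negative_slack(before), total_negative_slack(after))
print(total_negative_slack(before) / total_negative_slack(after))  # improvement ratio
```

In these terms, a 20x TNS reduction means the summed negative slack shrinks to one twentieth of its earlier magnitude.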

Power density is not the only issue in today’s data centers; operating cost has become another hot topic as the number of servers has grown in recent years. Many techniques, such as power capping, have been introduced to increase energy efficiency in data centers, but the ability to measure power accurately remains the key factor. This paper compares the area, cost, power savings and other benefits of an integrated solution for protection and power measurement in servers with those of discrete implementations.


The significance of power-measurement error in servers is well known. Recently published studies have estimated a cascade effect in which every watt saved at the server component level saves 2.84 watts of facility power consumption (see Reference). For example, in a 600 W server a 5% measurement error represents 30 W, which translates into an unfavorable 85.2 W (30 W × 2.84) impact on facility power consumption. In a typical data center with 1,000 servers (Reference), this adds up to 85.2 kW of wasted power. Reducing the measurement error to 2% cuts the waste to about 34 kW, a 60% reduction.
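The arithmetic in this example can be reproduced with a short Python sketch. The 2.84 cascade factor, the 600 W server, the 5% and 2% error levels and the 1,000-server count all come from the text; the function and constant names are hypothetical.

```python
CASCADE_FACTOR = 2.84  # facility watts per server-component watt, from the cited study


def facility_impact_w(server_w, error_fraction, servers=1, cascade=CASCADE_FACTOR):
    """Facility-level power attributable to a power-measurement error."""
    error_w = server_w * error_fraction  # e.g. 600 W * 5% = 30 W per server
    return error_w * cascade * servers


at_5pct = facility_impact_w(600, 0.05, servers=1000)
at_2pct = facility_impact_w(600, 0.02, servers=1000)
print(f"5% error: {at_5pct / 1000:.1f} kW")       # 5% error: 85.2 kW
print(f"2% error: {at_2pct / 1000:.1f} kW")       # 2% error: 34.1 kW
print(f"reduction: {1 - at_2pct / at_5pct:.0%}")  # reduction: 60%
```

The 2% case evaluates to about 34.1 kW, which the text rounds to 34 kW; the 60% reduction follows directly from the ratio of the two error levels (2/5 = 0.4).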

With the number of servers in data centers rising, the financial impact of this inaccuracy can quickly add up to millions of dollars in extra utility bills. The annual cost of operating data centers in the U.S. is estimated to have reached as much as $3.3 billion (Reference).

