.PH ""
.ce 3
.B
.S 14
Synopsys Notes
.S
.R
Matt Biggio
June 4, 1993
.H 1 "Introduction"
I used Synopsys and DMoct to generate the layout for the controller.
After I wrote an initial Verilog model, in which I attempted to follow the
advice of the handout on \fIVerilog Coding Styles\fR, Synopsys generated
453 cells.
My estimate for our chip at the time was that the controller would need to
be about 150 cells to fit comfortably with our datapath.
Instead of giving up entirely, I decided it would be instructive to see
what I could do to get Synopsys to generate a more compact design.
In the end, I was able to get Synopsys to generate a design with almost
identical functionality to the 453 cell design in about 180 cells.
Since I made notes of the steps I took to achieve these results, I thought
I should pass along this information.
.H 1 "Information from Synopsys"
The first source of information is the reporting features of Synopsys and
the general output it produces as it performs its operations.
When a Verilog file is first read into Synopsys, it reports on the inferred
latches and tri-states, as well as the types of \fIcase\fR statements that
it found in the file.
The report of inferred latches should be used first to make sure that only
the intended latches were inferred.
If unintentional latches are being created, the most likely cause is that
some path through the Verilog code leaves an output unassigned.
In any \fIalways\fR block, every combinational (non-latched) output must be
assigned in every path through the block.
For example, if an output \fBout1_s2\fR is intended to be combinational,
then it must be set in both the \fIif\fR and the \fIelse\fR of an
\fIif-then-else\fR block.
The output must also be set for \fBevery\fR case of a \fIcase\fR block.
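.P
One common way to satisfy this rule, sketched here under the assumption
that the output has a sensible default value, is to assign that default at
the top of the \fIalways\fR block.
Every path then sets the output, even when a later \fIif\fR or \fIcase\fR
leaves a branch out.
The module name \fBdefault_assign\fR and the input \fBsel\fR are
hypothetical and are not taken from the controller.
.in +5
.CW
.nf
// Hypothetical sketch: a default assignment avoids an inferred latch
module default_assign (sel, out1_s2);
    input  [1:0] sel;
    output       out1_s2;
    reg          out1_s2;

    always @(sel)
    begin
        out1_s2 = 1'b0;             // default covers every path
        case (sel)
            2'd0: out1_s2 = 1'b1;
            2'd3: out1_s2 = 1'b1;
        endcase                     // sel = 1 and 2 keep the default
    end
endmodule
.in -5
.fi
.R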
.P
Examples of combinational and latched forms:
.in +5
.CW
.nf
// Using if-then-else statements
//
always @(enable)        // Combinational Logic
    if (enable == 1'b1)
        out = 1'b1;
    else
        out = 1'b0;

always @(enable)        // Latch: enable is the clock, out holds when enable is 0
    if (enable == 1'b1)
        out = 1'b1;

// Using case statements
reg [1:0] state;

always @(state)         // Combinational Logic
    case (state)
        2'd0: out = 1'b1;
        2'd1: out = 1'b0;
        2'd2: out = 1'b0;
        2'd3: out = 1'b1;
    endcase

always @(state)         // Combinational Logic
    case (state)
        2'd0: out = 1'b1;
        2'd1: out = 1'b0;
        default: out = 1'b0;
    endcase

always @(state)         // Latch: state 1 is missing, so out must hold its value
    case (state)
        2'd0: out = 1'b1;
        2'd2: out = 1'b0;
        2'd3: out = 1'b0;
    endcase
.in -5
.fi
.R
.P
Another potential source of unwanted latches is assigning an output in more
than one \fIalways\fR block.
It should be possible to have an output controlled completely by a single
\fIalways\fR block.
If the output depends on more than one signal, put all of these signals in
the @ list of the \fIalways\fR block and use \fIif\fR and \fIcase\fR
statements to specify the output.
.P
For example, it would seem to be easier to reset the state bits of a state
machine in its own \fIalways\fR block.
However, Synopsys will generate two latches for each signal and then use
combinational logic to combine the latch outputs.
Example:
.in +5
.CW
.nf
// Reset Logic for state bits in separate always block
//
always @(Reset)         // Infers a Reset latch
    if (Reset == 1'b1)
        state = 3'b0;

// Normal next state logic
//
always @(Phi1 or opcode or state)
    if (Phi1)
        case (state)
            `STATE1: state = `STATE2;
            ...
            ...
.in -5
.fi
.R
.P
The above code will generate a Phi1 latch and a Reset latch.
The outputs of the two latches will then be combined using ANDs or ORs to
define the state output.
Instead, the following form should be used:
Example:
.in +5
.CW
.nf
// Next state logic, with reset folded into the same always block
//
always @(Reset or Phi1 or opcode or state)
    if (Reset == 1'b1)
        state = 3'b0;
    else if (Phi1)
        case (state)
            `STATE1: state = `STATE2;
            ...
            ...
.in -5
.fi
.R
.P
The information about the types of \fIcase\fR statements is helpful as well.
By default, Synopsys will assume that all \fIcase\fR statements are
\fBnot\fR parallel and \fBnot\fR full.
The parallel-case directive may be used in any \fIcase\fR statement where
you know for certain that only one of the possible cases will be true at a
time.
This is typically true for a state-machine based design.
The parallel-case directive is easy to include: just add a comment at the
top of the \fIcase\fR as follows:
Example:
.in +5
.CW
.nf
// Parallel case directive example
//
    ...
    case (state)   // synopsys parallel_case
        `STATE1: ....
.in -5
.fi
.R
.P
In my design, the state machine had to be in only one state at a time, and
was always executing one instruction.
Therefore, all of my \fIcase\fR statements could be declared to be parallel.
This does not prevent you from having several cases share an identical
execution block.
In fact, it is good to identify where cases are identical and structure the
code accordingly.
Verilog has no fall-through between case items, so cases that share a block
are listed together, separated by commas.
Example:
.in +5
.CW
.nf
case (state)
    `STATE1, `STATE2:   // both states share this block
        begin
            ...
        end
    `STATE3:            // This block is different from the above.
        begin
            ...
        end
    ...
.in -5
.fi
.R
.P
It is also possible to use the full-case directive, but more care must be
taken.
If every possible case is specifically included in the \fIcase\fR statement,
then the full-case directive can be used.
If every possible case is specified, then \fBno\fR default case should be
included.
Synopsys will generate somewhat fewer gates when full-case is used.
However, the benefit was only about 10 gates for me.
Therefore, if not every case is needed, it does not make sense to add the
missing ones just to be able to use full-case.
In many cases, though, full-case is what you need.
In my controller, I had 16 states, which covered every case of a 4 bit state
variable.
The full-case directive is added in the same way as the parallel-case
directive, and both can be used together.
Example:
.in +5
.CW
.nf
// Full case directive examples
//
    ...
    // Full Case: (for state being 2 bits)
    case (state)   // synopsys full_case
        `STATE0: ....
        `STATE1: ....
        `STATE2: ....
        `STATE3: ....
    endcase
    ...
    // Parallel and Full case: (for state being 2 bits)
    case (state)   // synopsys parallel_case full_case
        `STATE0: ....
        `STATE1: ....
        `STATE2: ....
        `STATE3: ....
    endcase
.in -5
.fi
.R
.P
The schematic generated by Synopsys can be used to try to understand what it
is doing, but other things should be tried first.
For any reasonably sized design, the generated schematic looks like a rat's
nest.
If you do look at the schematic, you can zoom in on various points and
select the nets you want to follow.
The real obstacle is that all of the internal nets have names like
\fBn1254\fR, which makes the schematic hard to follow.
One thing that can be usefully gleaned from the schematic is where any
constant zero or constant one assignments are being made.
The standard cell library cannot handle these cases and will give errors.
The problem can usually be eliminated by identifying where the constants are
being generated and restructuring that part of the code.
In fact, a generated constant is usually an indication of a bug.
.H 1 "Information to Synopsys"
Synopsys will generate a much more efficient mapping if you give it more
information.
Synopsys is very good at cell reduction, but it does not use any high-level
algorithms to simplify the design.
Instead, Synopsys will implement exactly what you told it to.
Therefore, an attempt should be made to simplify the design before passing
it to Synopsys.
A state-machine design is a good example.
If you have a 24 state machine in which 8 of the states are actually copies
of other states, Synopsys will not reduce the state machine to 16 states.
It will generate gates for the 24 state machine.
The total number of states is an important factor in the size of the
controller, so it pays to do some paper design work to try to get an
efficient state machine.
.P
Similarly, if you are using the current state or current instruction to
conditionally set outputs, the most obvious thing to do in the Verilog code
is something like:
Example:
.in +5
.CW
.nf
if (state == 4'b0101)
    ...
else
    ...
.in -5
.fi
.R
Synopsys will generate a more efficient design if you are able to give it
more information.
If you know that, for the particular state or instruction you are checking,
only a single bit (or a subset of bits) is needed to set the output, write
this explicitly in the Verilog.
If, in the above example, only the state[2] bit is needed, then the
following code is more efficient:
Example:
.in +5
.CW
.nf
if (state[2] == 1'b1)
    ...
else
    ...
.in -5
.fi
.R
.P
It is generally true that if you can eliminate unneeded wires and registers
in the Verilog by reusing signals that are already required, then Synopsys
will generate a more efficient design.
For example, if you know that some signal is just the inverse of another
signal, eliminate one of them and check for the opposite condition.
That is, use != instead of ==, or vice versa.
.P
Another area where you can help Synopsys is state assignment.
The normal procedures should be used to try to reduce the amount of logic
needed to generate the next-state information.
In cases where the state machine is simply looping, a shift-register design
for the state bits is more efficient.
The state assignment issue is not a simple one, and may require some
trial-and-error experimentation.
The state assignments can also dramatically affect your ability to use
single state bits for \fIif\fR statements as described above.
.P
Obviously, if the states can be encoded in 4 bits instead of 5 bits, fewer
gates will be required.
That is, if a state machine definition requires 17 states, some effort
should be given to eliminating one state so that 4 bits suffice.
.P
Always give size information on constants.
Synopsys will assume 32 bit comparisons if no size information is given.
These extra comparisons will sometimes be optimized back out, but it is
better to explicitly size constants to the number of bits that are actually
needed.
The bit definitions in Verilog make the code less readable in my opinion, so
I chose to use \fBdefine\fR statements for most constants.
Some examples:
.in +5
.CW
.nf
`define LOW    1'b0
`define HIGH   1'b1
`define STATE0 3'd0
`define STATE1 3'd1
`define STATE2 3'd2
`define STATE3 3'd3
.in -5
.fi
.R
.P
The last input to Synopsys that can have a huge impact on the number of
cells generated is the set of constraints that are given.
Synopsys will do a very good job of reducing the number of cells if you
specify that the "Smallest" design should be generated.
Unfortunately, the fewest cells does not always produce the smallest layout,
since the number of connections may go up as the cell count goes down.
However, this is still a very productive tool.
Also, the first time the optimization step is performed by Synopsys, it does
not reduce the number of cells to the minimum.
The algorithm used seems to be affected by the starting point, and
therefore improved results can be achieved by running the optimizer several
times.
In my experience, the size generated after two optimizations worked out to
be the best, but this could change based on the specific design.
To keep comparisons between design changes consistent, I generally ran each
version through two optimizations.
.H 1 "General Strategies"
Another area where some logic can be saved is by cheating on the timing
types.
If you know what you are doing, you can identify specific instances where
you can use the wrong timing type and save a latch.
In particular, if something is latched, and then not used until several
cycles later, it is OK to use the output of the single latch.
This was the case for the Instruction Register (IR) in my design.
The IR was latched by Phi2, generating a stable 1 signal.
The controller needed to use the IR bits during Phi2 as well, so
theoretically a Phi1 latch is needed to generate the stable 2 signal.
However, I knew that the IR would not be latched again before the states
that used it.
That is, the IR bits were actually stable for many cycles.
.P
Sometimes you can make a design optimization that reduces the number of
cells needed without really changing the overall functionality.
For example, perhaps you only need a particular signal to be latched under a
few conditions.
However, if there is no harm in always latching the signal, then remove all
of the conditional statements on the latch.
While the value of this signal will not always be "correct", it only needs
to be correct at the times you actually look at it.
.P
Another example of removing unnecessary logic is in the inputs to the
controller.
For example, if you have an 8-bit instruction, but the controller only uses
a subset of the bits, only latch the bits that are needed.
.P
The design complexity increases with the number of outputs that you are
controlling.
In particular, in a state-machine design, if there are 10 outputs of the
state machine, then the logic to set these 10 outputs must be included for
every state.
Synopsys will do as much gate-level optimization as it can, but eliminating
outputs altogether will usually result in a savings.
One way to do this is to encode outputs.
This is normally done for register select lines and specifications, but can
also be used to encode ALU operations and other things.
.H 1 "Lager and the Standard Cell Router"
The standard cell router will default to a certain number of rows.
This can be easily changed to fit the aspect ratio required.
However, Lager will always generate rows horizontally.
You can get a short, wide layout by specifying only a few rows, but this is
very inefficient.
Lager will generate a more compact design if a greater number of rows is
given.
Therefore, a tall, skinny layout should be generated and then rotated to
fit.
Also, the outputs from the standard cell block can be assigned to a specific
side.
This does not seem to affect the overall area very much, but can greatly
reduce the wiring overhead.
Good luck...