Synopsys Notes
Matt Biggio
June 4, 1993

I used Synopsys and DMoct to generate the layout for the controller. After writing an initial Verilog model, in which I attempted to follow the advice of the handout "Verilog Coding Styles," Synopsys generated 453 cells. My estimate for our chip at the time was that the controller would need to be about 150 cells to fit comfortably with our datapath. Instead of giving up entirely, I decided it would be instructive to see what I could do to get Synopsys to generate a more compact design. In the end, I was able to get Synopsys to generate a design with almost the same functionality as the 453-cell design in about 180 cells. Since I made notes of the steps I took to achieve these results, I thought I should pass this information along.

The first source of information was the reporting features of Synopsys and the general output it produced as it performed its operations. When a Verilog file is first read into Synopsys, it reports on the inferred latches and tri-states, as well as the types of case statements that it found in the file. The report of inferred latches should be used first to make sure that only the intended latches were inferred. If unintentional latches are being created, the most likely cause is that there is some path through the Verilog code in which an output is assigned only conditionally. In any always block, every combinational (non-latched) output must be assigned on every path through the block. For example, if an output out1_s2 is intended to be combinational, then it must be set in both the if and the else branches of an if-then-else block. The output must also be set for every case of a case block.
Examples:

```verilog
// Using if-then-else statements

always @(enable)        // Combinational logic: out is assigned on both paths
  if (enable == 1'b1)
    out = 1'b1;
  else
    out = 1'b0;

always @(enable)        // Latch: enable acts as the clock; out holds when enable is 0
  if (enable == 1'b1)
    out = 1'b1;

// Using case statements

reg [1:0] state;

always @(state)         // Combinational logic: every value of state is listed
  case (state)
    2'b00: out = 1'b1;
    2'b01: out = 1'b0;
    2'b10: out = 1'b0;
    2'b11: out = 1'b1;
  endcase

always @(state)         // Combinational logic: default covers the remaining values
  case (state)
    2'b00:   out = 1'b1;
    2'b01:   out = 1'b0;
    default: out = 1'b0;
  endcase

always @(state)         // Latch: out is not assigned when state is 2'b01
  case (state)
    2'b00: out = 1'b1;
    2'b10: out = 1'b0;
    2'b11: out = 1'b0;
  endcase
```

Another potential source of unwanted latches is making an assignment to an output in more than one always block. It should be possible to have an output controlled completely by a single always block. If the output depends on more than one signal, put all of these signals in the event list (the @ line) of the always block and use if and case statements to specify the output. For example, it might seem easier to reset the state bits of a state machine in their own always block. However, Synopsys will then generate two latches for each signal and use combinational logic to combine the latch outputs.

Example:

```verilog
// Reset logic for the state bits in a separate always block
always @(Reset)                  // Infers a Reset latch on state
  if (Reset == 1'b1)
    state = 3'b0;

// Normal next-state logic
always @(Phi1 or opcode or state)
  if (Phi1)
    case (state)
      `STATE1: state = `STATE2;
      ...
      ...
```

The above code will generate a Phi1 latch and a Reset latch. The outputs of the two latches will then be combined using ANDs or ORs to define the state output. Instead, the following form should be used:

Example:

```verilog
// Next-state logic, with reset folded into the same always block
always @(Reset or Phi1 or opcode or state)
  if (Reset == 1'b1)
    state = 3'b0;
  else if (Phi1)                 // Phi1 still gates the next-state update
    case (state)
      `STATE1: state = `STATE2;
      ...
      ...
```
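The reset guideline above can also be packaged as a small, self-contained module. This is only a sketch: the module name state_latch and the signals next_s1 and state_s2 are hypothetical (they borrow the _s1/_s2 naming used in these notes), and it assumes the next-state value is computed elsewhere and is stable while Phi1 is high, so the latch is never transparent to its own output. Because Reset and the Phi1 enable share one always block, it should map to one latch per state bit rather than separate Reset and Phi1 latches.

```verilog
// Hypothetical state code, sized to 3 bits as these notes recommend.
`define STATE0 3'b000

// Sketch: Phi1 state latch with reset folded into the same always block.
module state_latch (Reset, Phi1, next_s1, state_s2);
  input        Reset;     // active-high reset
  input        Phi1;      // phase-1 clock, used as the latch enable
  input  [2:0] next_s1;   // next state, assumed stable during Phi1
  output [2:0] state_s2;  // latched state, stable during Phi2
  reg    [2:0] state_s2;

  // Every signal that controls state_s2 is in this single event list.
  always @(Reset or Phi1 or next_s1)
    if (Reset == 1'b1)
      state_s2 = `STATE0;        // reset takes priority over the clock
    else if (Phi1 == 1'b1)
      state_s2 = next_s1;        // transparent while Phi1 is high
    // No final else: state_s2 holds its value, so a latch is inferred.
endmodule
```

Keeping the next-state logic that produces next_s1 outside this block is one way to avoid a combinational loop through a transparent latch; the reset still lives in the same always block as the clock, which is the point of the guideline.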
The information about the types of case statements is helpful as well. By default, Synopsys assumes that all case statements are neither parallel nor full. The parallel-case directive may be used in any case statement where you know for certain that only one of the possible cases will be true. This is typically true for a state-machine-based design. The parallel-case directive is easy to include; just add a comment at the top of the case, as follows:

Example:

```verilog
// Parallel case directive example
...
case (state) // synopsys parallel_case
  `STATE1: ....
```

In my case, the state machine could be in only one state at a time and was always executing one instruction. Therefore, all of my case statements could be declared parallel. This does not prevent you from having several cases share an identical execution block. In fact, it is good to identify where cases are identical and structure the code accordingly.

Example:

```verilog
case (state)
  `STATE1, `STATE2:   // Verilog has no fall-through: list shared states together
    begin
      ...
    end
  `STATE3:            // This block is different from the one above
    begin
      ...
    end
  ...
```

It is also possible to use the full-case directive, but more care must be taken, because it tells Synopsys to treat any value not listed in the case statement as a don't care. If every possible case is specifically included in the case statement, then the full-case directive can be used safely, and no default case should be included. Synopsys will generate somewhat fewer gates in this situation. However, the benefit was only about 10 gates for me. Therefore, if not every case is needed, it doesn't make sense to add them just to be able to use full-case. In many designs, though, a full case is exactly what you need: my controller had 16 states, which covered every value of a 4-bit state variable. The full-case directive is added in the same way as the parallel-case directive, and both can be used together.

Example:

```verilog
// Full case directive example
...
// Full case (for state being 2 bits):
case (state) // synopsys full_case
  `STATE0: ....
  `STATE1: ....
  `STATE2: ....
  `STATE3: ....
endcase
...
// Parallel and full case (for state being 2 bits):
case (state) // synopsys parallel_case full_case
  `STATE0: ....
  `STATE1: ....
  `STATE2: ....
  `STATE3: ....
endcase
```

The schematic generated by Synopsys can be used to try to understand what it is doing, but other things should be tried first. For any reasonably sized design, the generated schematic looks like a rat's nest. If you do look at the schematics, you can zoom in on various points and select the nets you want to follow. The real obstacle is that all of the internal nets have names like n1254, which makes the schematic hard to follow. One thing that can usefully be gleaned from the schematic is where any constant-zero or constant-one assignments are being made. The standard cell library cannot handle these cases and will give errors. The problem can usually be eliminated by identifying where the constants are being generated and restructuring that part of the code. In fact, a constant output is usually an indication of a bug.

Synopsys will generate a much more efficient mapping if you give it more information. Synopsys is very good at cell reduction, but it does not use any high-level algorithms to simplify the design; it implements exactly what you told it to. Therefore, an attempt should be made to simplify the design before passing it to Synopsys. One example is a state-machine design. If you have a 24-state machine in which 8 of the states are actually copies of other states, Synopsys will not reduce it to a 16-state machine; it will generate gates for all 24 states. The total number of states is an important factor in the size of the controller, so it pays to do some paper design work to get an efficient state machine.
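As a small illustration of that kind of paper reduction, consider a hypothetical controller (every state and signal name below is invented for the example) in which DECODE branches to EXEC_A or EXEC_B on one opcode bit, but both states assert the same output and go to the same next state. The two states are equivalent, so they can be merged by hand, which, per the note above, Synopsys will not do on its own. The five-state version needs a 3-bit state variable plus a default item for the unused codes; the merged four-state version fits in 2 bits with every code used, so the full-case directive is safe there.

```verilog
// Unreduced: 5 states need a 3-bit state variable.
`define U_FETCH  3'd0
`define U_DECODE 3'd1
`define U_EXEC_A 3'd2
`define U_EXEC_B 3'd3
`define U_WRITE  3'd4

module next_unreduced (state, opcode, next_state, alu_en);
  input  [2:0] state;
  input        opcode;
  output [2:0] next_state;
  output       alu_en;
  reg    [2:0] next_state;
  reg          alu_en;

  always @(state or opcode)
    case (state)
      `U_FETCH:  begin next_state = `U_DECODE; alu_en = 1'b0; end
      `U_DECODE: begin next_state = opcode ? `U_EXEC_B : `U_EXEC_A; alu_en = 1'b0; end
      `U_EXEC_A: begin next_state = `U_WRITE;  alu_en = 1'b1; end
      `U_EXEC_B: begin next_state = `U_WRITE;  alu_en = 1'b1; end  // copy of EXEC_A
      `U_WRITE:  begin next_state = `U_FETCH;  alu_en = 1'b0; end
      default:   begin next_state = `U_FETCH;  alu_en = 1'b0; end  // codes 5-7 unused
    endcase
endmodule

// Reduced: EXEC_A and EXEC_B merged into EXEC, so 4 states fit in 2 bits.
`define R_FETCH  2'd0
`define R_DECODE 2'd1
`define R_EXEC   2'd2
`define R_WRITE  2'd3

module next_reduced (state, opcode, next_state, alu_en);
  input  [1:0] state;
  input        opcode;      // no longer needed by this machine
  output [1:0] next_state;
  output       alu_en;
  reg    [1:0] next_state;
  reg          alu_en;

  // All four 2-bit codes are listed, so the case really is full.
  always @(state)
    case (state) // synopsys parallel_case full_case
      `R_FETCH:  begin next_state = `R_DECODE; alu_en = 1'b0; end
      `R_DECODE: begin next_state = `R_EXEC;   alu_en = 1'b0; end
      `R_EXEC:   begin next_state = `R_WRITE;  alu_en = 1'b1; end
      `R_WRITE:  begin next_state = `R_FETCH;  alu_en = 1'b0; end
    endcase
endmodule
```

Simulating both versions side by side on the same opcode sequence and checking that alu_en matches on every step is one quick way to confirm that a hand reduction like this preserved the machine's behavior.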
Similarly, if you are using the current state or current instruction to conditionally set outputs, the most obvious thing to do in the Verilog code is something like:

Example:

```verilog
if (state == 4'b0101)
  ...
else
  ...
```

Synopsys will generate a more efficient design if you give it more information. If you know that, in the particular state or instruction you are checking, only a single bit (or a subset of bits) is needed to set the output, write this explicitly in the Verilog. If, in the above example, only the state[2] bit is needed, then the following code is more efficient:

Example:

```verilog
if (state[2] == 1'b1)
  ...
else
  ...
```

It is generally true that if you can eliminate unneeded wires and registers in the Verilog by using other necessary elements, Synopsys will generate a more efficient design. For example, if you know that some signal is just the inverse of another signal, eliminate one of them and check for the opposite condition; that is, use != instead of ==, or vice versa.

Another area where you can help Synopsys is state assignment. The normal procedures should be used to try to reduce the amount of logic needed to generate the next-state information. In cases where the state machine is simply looping, a shift-register design for the state bits is more efficient. The state assignment issue is not a simple one and may require some trial-and-error experimentation. The state assignments can also dramatically affect your ability to use single state bits in if statements as described above. Obviously, if the states can be encoded in 4 bits instead of 5 bits, fewer gates will be required. That is, if a state-machine definition requires 17 states, some effort should be spent eliminating one more state, since 17 states need 5 bits but 16 states fit in 4.

Always give size information on constants. Synopsys will assume 32-bit comparisons if no size information is given.
These extra comparisons will sometimes be optimized back out, but it is better to explicitly size constants to the number of bits that are actually needed. In my opinion the bit-width syntax makes Verilog code less readable, so I chose to use define statements for most constants. Some examples:

```verilog
`define LOW    1'b0
`define HIGH   1'b1
`define STATE0 3'b000
`define STATE1 3'b001
`define STATE2 3'b010
`define STATE3 3'b011
```

The last input to Synopsys that can have a huge impact on the number of cells generated is the set of constraints you give it. Synopsys will do a very good job of reducing the number of cells if you specify that the "Smallest" design should be generated. Unfortunately, the design with the fewest cells is not always the smallest in area, since the number of connections may go up as the cell count goes down. However, this is still a very productive tool. Also, the first time Synopsys performs the optimization step, it does not reduce the number of cells to the minimum. The algorithm seems to be affected by its starting point, so improved results can be achieved by running the optimizer several times. In my experience, the result after two optimization passes was the best, but this could change based on the specific design. To be consistent, I generally ran each design through two optimizations when comparing design changes.

Another area where some logic can be saved is by cheating on the timing types. If you know what you are doing, you can find specific instances where you can use the "wrong" timing type and save a latch. In particular, if something is latched and then not used until several cycles later, it is OK to use the output of the single latch. This was the case for the Instruction Register (IR) in my design. The IR was latched by Phi2, generating a stable-1 signal. The controller needed to use the IR bits during Phi2 as well, so in theory a Phi1 latch was needed to generate the stable-2 signal.
However, I knew that the IR would not be latched again before the states that used it; that is, the IR bits were actually stable for many cycles.

Sometimes you can make a design optimization that reduces the number of cells needed but does not really change the overall functionality. For example, perhaps you only need a particular signal to be latched under a few conditions. If there is no harm in always latching the signal, remove all of the conditional statements on the latch. While the value of this signal will not always be "correct," you only need it to be correct when you actually look at it. Another example of removing unnecessary logic is in the inputs to the controller. If you have an 8-bit instruction but the controller only uses a subset of the bits, latch only the bits that are needed.

The design complexity increases with the number of outputs that you are controlling. In particular, in a state-machine design with 10 outputs, the logic to set all 10 outputs must be included for every state. Synopsys will do as much gate-level optimization as it can, but eliminating outputs altogether will usually result in a savings. One way to do this is to encode outputs. This is normally done for register select lines and specifications, but it can also be used to encode ALU operations and other things.

The standard cell router will default to a certain number of rows. This can easily be changed to fit the required aspect ratio. However, Lager always generates rows horizontally. You can get a short, wide layout by specifying only a few rows, but this is very inefficient; Lager will generate a more compact design if a greater number of rows is given. Therefore, a tall, skinny layout should be generated and then rotated to fit. Also, the outputs from the standard cell block can be assigned to a specific side.
This does not seem to affect the overall area very much, but it can greatly reduce the wiring overhead.

Good luck...