I am new to FPGA programming and Verilog. I know how to draw the state machine for a piece of code and design a datapath for it. I am currently working on such a project, but I am facing a problem that I have never encountered before.
My design was running at 65 MHz until this morning, when I added a few more states (summarized below). The design has only one multiplier and one adder. Both are 64-bit signed and not pipelined, and each has a 64-bit 8×1 multiplexer on both of its inputs. The multiplexers assign their values in an always@(*) block. The block memory has 512 words with a 64-bit word size.
s0: select the a and b values in the multiplexer, then calculate a * b and write the result to the x register
s1: select the c and d values in the multiplexer, then calculate c + d and write the result to the y register
s2: select the x and y values in the multiplexer, then calculate x + y and write the result to the z register
s3: set the block memory's address field to z // this is the 2-port RAM I got from the IP Catalog; I did not check the read output option when I first added it, so it gives its output in the next state; let's call that output "q"
s4: select the q and f values in the multiplexer, then calculate q * f and write the result to the w register; the w register is directly connected to the block memory's data_b input
s5: set the block memory's address field to z again and set the write enable signal to 1
then go back to s0; this is a for loop
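To make the description above concrete, here is a minimal Verilog sketch of the six states as I understand them. It is not my actual code: the module name, port names, reset, state encoding and the use of z[8:0] as the 9-bit address are all assumptions made for illustration, only the mux inputs that the listed states use are shown (the real multiplexers are 8×1), and the RAM itself is left outside the module.

```verilog
// Hypothetical sketch of the loop body; every name here is an assumption.
module loop_sketch (
    input  wire               clk,
    input  wire               rst,
    input  wire signed [63:0] a, b, c, d, f,
    input  wire signed [63:0] q,        // RAM read data, valid one state after the address is set
    output wire        [8:0]  ram_addr, // 512 words -> 9 address bits (assumed to be z[8:0])
    output wire               ram_we,
    output wire signed [63:0] ram_data  // w register, wired to the RAM's data_b input
);
    localparam S0 = 3'd0, S1 = 3'd1, S2 = 3'd2,
               S3 = 3'd3, S4 = 3'd4, S5 = 3'd5;

    reg        [2:0]  state;
    reg signed [63:0] x, y, z, w;
    reg signed [63:0] mul_a, mul_b, add_a, add_b;

    // One shared multiplier and one shared adder, neither pipelined.
    wire signed [63:0] prod = mul_a * mul_b; // result truncated to 64 bits
    wire signed [63:0] sum  = add_a + add_b;

    assign ram_addr = z[8:0];
    assign ram_we   = (state == S5);
    assign ram_data = w;

    // Combinational operand multiplexers, selected by the current state.
    always @(*) begin
        mul_a = a; mul_b = b;
        add_a = c; add_b = d;
        case (state)
            S2: begin add_a = x; add_b = y; end
            S4: begin mul_a = q; mul_b = f; end // RAM output feeds the multiplier directly
            default: ;
        endcase
    end

    // One task per state, then loop back to S0.
    always @(posedge clk) begin
        if (rst) begin
            state <= S0;
        end else begin
            case (state)
                S0: begin x <= prod; state <= S1; end
                S1: begin y <= sum;  state <= S2; end
                S2: begin z <= sum;  state <= S3; end
                S3: begin            state <= S4; end // RAM captures address z here
                S4: begin w <= prod; state <= S5; end
                S5: begin            state <= S0; end // RAM writes w at address z
                default: state <= S0;
            endcase
        end
    end
endmodule
```

In this sketch the path that exists only because of s4 runs from the RAM's read data, through the operand multiplexer and the unpipelined 64-bit multiplier, into the w register, all within one clock cycle.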
In simulation everything looks correct, but I have the problem below.
FMax and other values when the design is like this:
FMax: 3.98 MHz
Setup Summary
- Clock: clk
- Slack: -250.096
- End Point TNS: -26732.101
Minimum Pulse Width Summary
- Clock: clk
- Slack: -3.166
- End Point TNS: -1840.877
FMax and other values when s4 and s5 states are deleted:
FMax: 67.99 MHz
Setup Summary
- Clock: clk
- Slack: -13.708
- End Point TNS: -8226.240
Minimum Pulse Width Summary
- Clock: clk
- Slack: -2.636
- End Point TNS: -1498.869
As far as I understand, the problem occurs when I connect the output of the block RAM to the multiplexer at the input of the multiplier. Why is there such a drop in frequency even though I did each step in a separate state? There is similar code in other parts of my design, except for the write to block memory. For example, I take two outputs from the 2-port RAM at the same time, multiply them in the next state and write the result to a register, and that does not cause such a drop. What should I do to fix this problem?
Here is the report for the top failing paths:
I thought I had distributed the work across the states well, assigning one task to each state, but the critical path became unexpectedly long and reduced FMax a lot.