VHDL coding tips and tricks: VHDL: A synthesisable-friendly 'for' loop? Must Read!

Monday, March 8, 2010

 THIS ARTICLE WAS THOROUGHLY UPDATED ON 3rd MAR 2024!

     Engineers often encounter the need to implement intricate algorithms using VHDL, many of which involve for or while loops. While translating such algorithms into high-level languages might seem straightforward, those who have attempted it in VHDL understand its inherent complexity.

    If the code is intended solely for functional simulation, leveraging VHDL's for loops presents minimal challenge, as they closely resemble those found in other high-level languages. However, when the code needs to be synthesized for hardware, a significantly deeper level of consideration is warranted in the design process. As the adage goes, one must adopt a mindset of 'thinking hardware' rather than 'thinking software'.

    Let's explore this idea through a series of VHDL examples.

Code 1:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

ENTITY tb_for_loop IS
END tb_for_loop;

ARCHITECTURE behavior OF tb_for_loop IS

signal clk : std_logic := '0';
signal output : integer := 0;

BEGIN

clock_gen: process
begin
    clk <= '0';
    wait for 100 ns;
    clk <= '1';
    wait; -- a single rising edge, then the clock stops.
end process;

-- What we want to do here is to sum the integers 0 to 3,
-- so output should be 0+1+2+3=6.
-- Let's see what happens when we simulate the code.
for_loop: process(clk)
begin
    if (rising_edge(clk)) then
        for x in 0 to 3 loop
            output <= output + x;
        end loop;
    end if;
end process;
END;

    Anyone familiar with software programming would anticipate the output signal to be 6 after completing the for loop. However, during simulation, the output was observed to be 3.

What caused this discrepancy?

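    output is declared as a signal in the architecture. In VHDL, a signal assignment inside a process does not take effect immediately; it only schedules a new value, which is applied when the process suspends (here, at the end of the process). Every iteration of the loop therefore reads the old value of output, which is 0, and each new assignment replaces the one scheduled before it. Only the last assignment, 0 + 3 (the value of x in the final iteration), takes effect, giving the final result of 3.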
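
The scheduling behaviour described above can be checked in isolation. The following is a minimal simulation-only sketch, not part of the original example; the entity name tb_signal_update and the signal s are hypothetical names chosen for illustration.

```vhdl
-- Simulation-only sketch; tb_signal_update and s are illustrative names.
entity tb_signal_update is
end tb_signal_update;

architecture sim of tb_signal_update is
    signal s : integer := 0;
begin
    demo: process
    begin
        -- Three assignments in the same simulation cycle. Each one reads the
        -- old value of s (0) and replaces the value scheduled before it.
        s <= s + 1;
        s <= s + 2;
        s <= s + 3;

        -- Nothing has been applied yet: the process has not suspended.
        assert s = 0
            report "s changed before the process suspended" severity error;

        wait for 0 ns;  -- suspend for one delta cycle so the update is applied

        -- Only the last scheduled value (0 + 3) is visible.
        assert s = 3
            report "expected the last assignment to win" severity error;
        report "s = " & integer'image(s);  -- reports "s = 3"

        wait;  -- stop the process for good
    end process;
end sim;
```

Replacing the three signal assignments with assignments to a variable would give 6 instead, because each variable assignment is visible to the statement that follows it.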

    Let's attempt a different approach to achieve the expected functionality. In the second version, we substitute the for_loop process with the one provided below.

Code 2:


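for_loop: process(clk)
    variable temp : integer := 0;
begin
    if (rising_edge(clk)) then
        temp := 0; -- restart the sum on every clock edge; a variable keeps its value between process runs.
        for x in 0 to 3 loop
            temp := temp + x; -- variables update immediately, so each iteration sees the new value.
        end loop;
        output <= temp; -- hand the finished sum over to the signal.
    end if;
end process;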

    What distinguishes the code above is the utilization of variables. In contrast to signals, variables undergo immediate updates. Upon completing all calculations, the value stored in the variable is assigned to the output signal. Consequently, as anticipated, we obtain a value of 6 at the output.

A hidden issue:

    However, an issue arises here. A simple loop with four additions might not pose a problem, but if the loop body contains resource-intensive mathematical operations, synthesizing the design could prove challenging. Why? The final result must be available within one clock cycle, so all iterations of the loop have to happen within that cycle. The synthesis tool therefore unrolls the loop and replicates the hardware: instead of using a single adder four times, it instantiates four adders chained one after another. This costs area, and the chained logic also lengthens the combinational path that has to settle within one clock period. Clearly, this approach is impractical for resource-intensive operators.
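
To make the unrolling concrete, here is a hand-written sketch of what the loop in Code 2 amounts to once every iteration is laid out side by side. It illustrates the idea and is not the output of any particular tool; the entity name unrolled_sum and the 0 to 15 range are assumptions made for this example. Because the operands in this toy case are constants, a real synthesis tool would most likely fold the whole chain into the constant 6; the adder chain survives when the operands are data-dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative entity; the name and the integer range are assumptions.
entity unrolled_sum is
    port (
        clk    : in  std_logic;
        output : out integer range 0 to 15
    );
end unrolled_sum;

architecture rtl of unrolled_sum is
begin
    unrolled: process(clk)
        variable temp : integer range 0 to 15;
    begin
        if rising_edge(clk) then
            temp := 0;
            temp := temp + 0;  -- iteration x = 0: first adder
            temp := temp + 1;  -- iteration x = 1: second adder, fed by the first
            temp := temp + 2;  -- iteration x = 2: third adder, fed by the second
            temp := temp + 3;  -- iteration x = 3: fourth adder, fed by the third
            output <= temp;    -- all four additions complete in one clock cycle
        end if;
    end process;
end rtl;
```

Each statement consumes the result of the one before it, which is why the hardware forms a chain rather than four independent adders.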

So what can be done now?

    We need to ensure that each iteration occurs in a distinct clock cycle, allowing one adder to be reused over 4 clock cycles. This optimization helps conserve valuable resources. How can we achieve this? Let's examine the following code.

Code 3:


LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

ENTITY tb_for_loop IS
END tb_for_loop;

ARCHITECTURE behavior OF tb_for_loop IS

signal clk : std_logic := '0';
signal output : integer := 0;
signal x : integer := 0;

BEGIN

clock_gen: process
begin
    clk <= '0';	
    wait for 100 ns;
    clk <= '1';	
    wait for 100 ns;
end process;

-- process for summing integers from 0 to 3.
for_loop: process(clk)
begin
    if (rising_edge(clk)) then -- at every positive edge of clk.
        if(x <= 3) then -- for values less than or equal to 3.
            output <= output + x; -- add current value of output to current value of x.
            x <= x+1; -- increment x.
        end if;
    end if;
end process;
END;

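    On simulation, the third version produced the expected output of 6 after 4 clock cycles. Because only one addition to output happens per clock cycle, the summation needs a single adder that is reused on every cycle; the only extra hardware is the small incrementer and comparator for the index x.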

Figure: Simulation waveform for the synthesisable for loop implemented using an if statement.


    This concept can also be extended to nested for loops. In such instances, we simply employ nested if statements to increment the indices effectively.
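
As a rough illustration of the nested case, the sketch below replaces a two-level loop with two index signals and nested if statements. It is one possible arrangement under stated assumptions, not the only way to do it: the names tb_nested_loop, I_MAX, J_MAX, acc and done, the loop bounds, and the loop body acc + i + j are all invented for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative testbench; every name and bound here is an assumption.
entity tb_nested_loop is
end tb_nested_loop;

architecture behavior of tb_nested_loop is
    constant I_MAX : integer := 2;  -- outer loop runs i = 0 to 2
    constant J_MAX : integer := 3;  -- inner loop runs j = 0 to 3

    signal clk  : std_logic := '0';
    signal i    : integer range 0 to I_MAX := 0;
    signal j    : integer range 0 to J_MAX := 0;
    signal acc  : integer range 0 to 63 := 0;
    signal done : std_logic := '0';
begin
    clock_gen: process
    begin
        clk <= '0';
        wait for 100 ns;
        clk <= '1';
        wait for 100 ns;
    end process;

    -- Equivalent of:
    --   for i in 0 to I_MAX loop
    --       for j in 0 to J_MAX loop
    --           acc := acc + i + j;
    nested_loop: process(clk)
    begin
        if rising_edge(clk) then
            if done = '0' then
                acc <= acc + i + j;      -- loop body, one iteration per clock
                if j < J_MAX then
                    j <= j + 1;          -- advance the inner index
                else
                    j <= 0;              -- inner loop finished: wrap around
                    if i < I_MAX then
                        i <= i + 1;      -- and advance the outer index
                    else
                        done <= '1';     -- both loops finished
                    end if;
                end if;
            end if;
        end if;
    end process;

    -- Self-check: the 12 iterations of (i + j) add up to 30.
    check: process
    begin
        wait until done = '1';
        assert acc = 30
            report "unexpected sum: " & integer'image(acc) severity error;
        wait;
    end process;
end behavior;
```

The inner index wraps back to 0 each time it reaches its limit, and only then does the outer index advance. A third loop level follows the same pattern: one more index signal and one more if statement nested in the innermost else branch.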


5 comments:

  1. Very informative and helpful! I'm doing FPGA programming and was frustrated about the loop.

    Thanks!

  2. Can you please tell me how to do it for three nested loops?

  3. Hi mate, thanks for the tutorial! It was very insightful. This solution will create inferred latches though, right? For an FSM, is there no way to avoid it?
