VHDL coding tips and tricks: July 2011

Friday, July 29, 2011

Some tips on reducing power consumption in Xilinx FPGAs

   Power estimation and power reduction are an important part of any design. In wireless devices especially, reducing power is a very important factor. In this article I will note down some points on how to reduce power consumption in Xilinx-based designs.

1)BRAM Enable signal:
  Every BRAM has an enable signal, which by default is always high. Most HDL coders never bother to disable it, even when the BRAM is not in use. But while this enable signal is ON, the BRAM consumes a lot of power, whether or not you change the address or write any data. So always include control logic that drives the BRAM enable signal.
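
As a minimal sketch of this idea, the memory below is only accessed on cycles where an enable input is high. The entity name bram_en, the port names and the 1024 x 8 size are all assumptions made for illustration, and whether your synthesis tool maps en onto the BRAM enable pin should be checked in the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity bram_en is
port(   Clk  : in std_logic;
        en   : in std_logic; --drive high only on cycles that access the memory.
        we   : in std_logic; --write enable, honoured only when en is high.
        addr : in std_logic_vector(9 downto 0);
        din  : in std_logic_vector(7 downto 0);
        dout : out std_logic_vector(7 downto 0)
        );
end bram_en;

architecture Behavioral of bram_en is

--hypothetical 1024 x 8 memory, large enough to be a BRAM candidate.
type ram_type is array (0 to 1023) of std_logic_vector(7 downto 0);
signal ram : ram_type := (others => (others => '0'));

begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        if(en = '1') then --nothing inside the memory toggles when en is low.
            if(we = '1') then
                ram(to_integer(unsigned(addr))) <= din;
            end if;
            dout <= ram(to_integer(unsigned(addr))); --synchronous read.
        end if;
    end if;
end process;

end Behavioral;
```

The surrounding logic would then raise en only when a read or write is actually needed, instead of tying it high.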

2)Low power option in coregen for BRAMs:
   You can create a BRAM entity file using Xilinx's Core Generator software. Coregen offers several options to help you achieve what you want. For low power designs, select the "Low power" option in coregen.

3)Decide on LUT or BRAM:
   Suppose you want to instantiate a memory in your design. Rather than going straight for BRAM or LUT, give it some thought. Xilinx says that for small memory blocks (less than 4 Kbits) LUT-RAM consumes less power than BRAM. Similarly, for large memory blocks (more than 4 Kbits) BRAM uses less power than LUT-RAM. So from design to design, switch between LUT-RAM and BRAM depending on the size of the memory block.
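
One way to steer this choice from the HDL is the ram_style synthesis attribute, which XST accepts with values such as "block" and "distributed". The sketch below requests LUT-RAM for a small 32 x 8 (256 bit) memory. The entity name small_ram and its ports are assumptions for illustration, and other tools may ignore the attribute.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity small_ram is
port(   Clk  : in std_logic;
        we   : in std_logic;
        addr : in std_logic_vector(4 downto 0);
        din  : in std_logic_vector(7 downto 0);
        dout : out std_logic_vector(7 downto 0)
        );
end small_ram;

architecture Behavioral of small_ram is

--32 x 8 = 256 bits, well below the 4 Kbit mark mentioned above.
type ram_type is array (0 to 31) of std_logic_vector(7 downto 0);
signal ram : ram_type := (others => (others => '0'));

--ask the synthesis tool for LUT-RAM; use "block" to request a BRAM instead.
attribute ram_style : string;
attribute ram_style of ram : signal is "distributed";

begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        if(we = '1') then
            ram(to_integer(unsigned(addr))) <= din;
        end if;
    end if;
end process;

--asynchronous read, which LUT-RAM supports.
dout <= ram(to_integer(unsigned(addr)));

end Behavioral;
```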

4)Global reset:
   All FPGA devices have an internal global reset path. When the device is switched OFF and then ON, all the flip flops and memories are reset to their initial state. But when we define one more reset signal in the HDL code, the Xilinx tools create a second reset. This second reset is relatively slow and hence not recommended. But if you still want to use one, make sure it is synchronous, so that the number of control signals in your design stays low.
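
If a reset is still needed, a synchronous one could look like the sketch below. The entity name sync_rst_reg and its ports are hypothetical. The point is only that rst is tested inside the rising_edge branch and is not in the sensitivity list.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity sync_rst_reg is
port(   Clk : in std_logic;
        rst : in std_logic; --synchronous reset, sampled on the clock edge.
        d   : in std_logic_vector(7 downto 0);
        q   : out std_logic_vector(7 downto 0)
        );
end sync_rst_reg;

architecture Behavioral of sync_rst_reg is
begin

process(Clk) --rst is deliberately left out of the sensitivity list.
begin
    if(rising_edge(Clk)) then
        if(rst = '1') then
            q <= (others => '0');
        else
            q <= d;
        end if;
    end if;
end process;

end Behavioral;
```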

5)Initialization of Registers:
   It is recommended that we initialize the registers in our design. Normally we do this for safe simulation. But during synthesis, these initialization values are also applied to the INIT value of the flip flops, so the registers power up in a known state. Remember that this will work only for bits and bit vectors. It will not work for integer or natural types.
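
A small sketch of this, using hypothetical names: the register below has no reset at all and relies only on its declared initial value, which uses a bit-vector type as the tip requires.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity init_reg is
port(   Clk  : in std_logic;
        load : in std_logic;
        d    : in std_logic_vector(3 downto 0);
        q    : out std_logic_vector(3 downto 0)
        );
end init_reg;

architecture Behavioral of init_reg is

--the initial value is used in simulation and as the power-up value after synthesis.
signal q_reg : std_logic_vector(3 downto 0) := "1010";

begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        if(load = '1') then
            q_reg <= d; --no reset branch: the power-up value comes from the declaration.
        end if;
    end if;
end process;

q <= q_reg;

end Behavioral;
```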

6)DSP slice Utilization:
   Depending on how large your FPGA is, it will have some number of DSP slices. These components are highly efficient, with low power consumption and high speed. All the DSP blocks in Xilinx FPGAs are synchronous. So when we describe asynchronous behavior for these operations, XST can't implement it using DSP slices. This will decrease the efficiency of your design.
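
For example, a registered multiplier with a synchronous reset, as sketched below, leaves the tool free to use a DSP slice. The entity name mult_reg and the 18-bit operand width are assumptions. Whether a DSP slice is actually used depends on the device and tool settings, so check the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity mult_reg is
port(   Clk : in std_logic;
        rst : in std_logic; --synchronous reset only.
        a,b : in signed(17 downto 0);
        p   : out signed(35 downto 0)
        );
end mult_reg;

architecture Behavioral of mult_reg is
begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        if(rst = '1') then
            p <= (others => '0'); --cleared on a clock edge, never asynchronously.
        else
            p <= a * b; --18 x 18 signed product is 36 bits wide.
        end if;
    end if;
end process;

end Behavioral;
```

Writing the same register with rst in the sensitivity list and tested before rising_edge(Clk) would describe an asynchronous reset, which is the case the tip warns about.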



Note:- I picked up these tips after watching a Xilinx tutorial video recently. You can watch it too here.


Delay in VHDL without using a 'wait for' statement!

   Introducing a delay in VHDL is pretty easy with a wait for statement. But it has the disadvantage that it is not synthesisable. Most practical designs therefore require another way to introduce a delay.
   In this article I will use a counter and a state machine to introduce the delay. We have an input signal, and we want to assign it to the output only after (say) 100 clock cycles. With this objective in mind, I first drew a state machine.

There are two states in the state machine: idle and delay_c.
When the system is in the idle state, it waits for a valid input bit at the port data_in. Whenever the data is valid, valid_data will go high. Upon receiving valid data, the system moves to the delay_c state, where a counter, c, is incremented every clock cycle until it reaches the maximum delay value (here it is 100). Once the max count is reached, the system goes back to the idle state and the input data is assigned to the output. This process goes on and on.

I have written the VHDL code and the testbench code for the above state machine. The simulated result below shows how it works:

    The VHDL code is given below. It is well commented, so I won't be explaining it any further.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity delay is
port(   Clk : in std_logic;
        valid_data : in std_logic; -- goes high when the input is valid.
        data_in : in std_logic; -- the data input
        data_out : out std_logic --the delayed input data.
        );
end delay;

architecture Behavioral of delay is

constant d : integer := 100; --number of clock cycles by which input should be delayed.
signal c : integer range 0 to d := 0; --delay counter, bounded so that synthesis sizes it to 7 bits.
signal data_temp : std_logic := '0'; --holds the captured input during the delay.
type state_type is (idle,delay_c); --definition of state machine type
signal next_s : state_type := idle; --the state register (holds the current state).

begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        case next_s is
            when idle =>
                if(valid_data = '1') then
                    next_s <= delay_c;
                    data_temp <= data_in; --register the input data.
                    c <= 1;
                end if;
            when delay_c =>
                if(c = d) then
                    c <= 1; --reset the count
                    data_out <= data_temp; --assign the output
                    next_s <= idle; --go back to idle state and wait for another valid data.
                else
                    c <= c + 1;
                end if;
            when others =>
                NULL;
        end case;
    end if;
end process;

end Behavioral;

The following testbench code was used to test the code.


LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS

   signal Clk : std_logic := '0';
   signal valid_data : std_logic := '0';
   signal data_in,data_out : std_logic := '0';
   constant Clk_period : time := 5 ns;

BEGIN

    -- Instantiate the Unit Under Test (UUT)
   uut: entity work.delay PORT MAP (
          Clk => Clk,
          valid_data => valid_data,
          data_in => data_in,
          data_out => data_out
        );

   -- Clock process definitions
   Clk_process :process
   begin
        Clk <= '0';
        wait for Clk_period/2;
        Clk <= '1';
        wait for Clk_period/2;
   end process;
   -- Stimulus process
   stim_proc: process
   begin       
      wait for 100 ns; 
        valid_data <= '1';
        data_in <= '1';
      wait;
   end process;

END;

The above design was successfully synthesized in Xilinx ISE software. Note that there are a lot of different situations in which a delay may be needed. But understanding the above concept well will help you in most cases.

There is one limitation to this design. If the input value keeps changing before the delay count is reached, the design will only take the first valid value. If you want a delay pipeline, then you have to implement a FIFO whose size depends on the amount of delay. In our case we will need a FIFO size of 100 bits. I will try to cover this in another article.
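
Until then, here is a rough sketch of one way such a pipeline could be built for a single-bit input. It uses a plain 100-bit shift register rather than a true FIFO, so every input bit comes out d clock cycles later. The entity name delay_line and the generic d are assumptions for illustration, and this is not the design described above.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity delay_line is
generic( d : positive := 100 ); --delay in clock cycles (this sketch assumes d >= 2).
port(   Clk : in std_logic;
        data_in : in std_logic;
        data_out : out std_logic
        );
end delay_line;

architecture Behavioral of delay_line is

--one storage bit per cycle of delay.
signal sr : std_logic_vector(d-1 downto 0) := (others => '0');

begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        sr <= sr(d-2 downto 0) & data_in; --shift by one position every clock cycle.
    end if;
end process;

data_out <= sr(d-1); --the oldest bit, captured d clock edges ago.

end Behavioral;
```

Unlike the state machine version, this one has no valid_data input: it delays whatever is on data_in at every clock edge.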

Some of these articles may be helpful:
Delay generator using a counter.
Clock frequency converter in VHDL.

Tuesday, July 5, 2011

Simple vending machine using state machines in VHDL

   A state machine is a model of behavior composed of a finite number of states, transitions between those states, and actions. It is like a "flow graph" where we can see how the logic runs when certain conditions are met. State machines are used to solve complicated problems by breaking them into many simple steps.
   There are many articles available on the web regarding this topic. I found sequential circuit design pretty useful. If you are new to state machines, I suggest you read that article before proceeding here.
   That article explains a simple vending machine problem and how to create a state machine diagram to solve it. I am not going much into the theory behind it. I have written the VHDL code for the Mealy model they have given. Refer to Fig 11.10 for this.


The VHDL code is given below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity vend_mach is
port(   Clk : in std_logic;
        x,y : out std_logic;
        i,j : in std_logic
    );     
end vend_mach;

architecture Behavioral of vend_mach is

--type of state machine and signal declaration.
type state_type is (a,b,c);
signal next_s : state_type;

begin

process(Clk)
begin
    if(rising_edge(Clk)) then
        case next_s is
            when a =>
                if(i='0' and j='0') then
                    x <= '0';
                    y <= '0';
                    next_s <= a;
                elsif(i='1' and j='0') then
                    x <= '0';
                    y <= '0';
                    next_s <= b;
                elsif(i='1' and j='1') then
                    x <= '0';
                    y <= '0';
                    next_s <= c;
                end if;
            when b =>
                if(i='0' and j='0') then
                    x <= '0';
                    y <= '0';
                    next_s <= b;
                elsif(i='1' and j='0') then
                    x <= '0';
                    y <= '0';
                    next_s <= c;
                elsif(i='1' and j='1') then
                    x <= '1';
                    y <= '0';
                    next_s <= a;
                end if;
            when c =>
                if(i='0' and j='0') then
                    x <= '0';
                    y <= '0';
                    next_s <= c;
                elsif(i='1' and j='0') then
                    x <= '1';
                    y <= '0';
                    next_s <= a;
                elsif(i='1' and j='1') then
                    x <= '1';
                    y <= '1';
                    next_s <= a;
                end if;        
        end case;      
    end if;
end process;
   
end Behavioral;

The testbench code tests the functionality of the code:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS
 
signal Clk,x,y,i,j : std_logic := '0';
constant Clk_period : time := 10 ns;

BEGIN

    -- Instantiate the Unit Under Test (UUT)
   uut: entity work.vend_mach PORT MAP (
        Clk => Clk,
        x => x,
        y => y,
        i => i,
        j => j
        );

   -- Clock process definitions
   Clk_process :process
   begin
        Clk <= '0';
        wait for Clk_period/2;
        Clk <= '1';
        wait for Clk_period/2;
   end process;
   
   -- Stimulus process(applying inputs 'i' and 'j').
   stim_proc: process
   begin      
    wait for Clk_period*2;
    i <= '0';j <= '0'; wait for Clk_period*2;
    i <= '0';j <= '1'; wait for Clk_period*1;
    i <= '1';j <= '0'; wait for Clk_period*1;
    i <= '1';j <= '1'; wait for Clk_period*1;
         
    i <= '0';j <= '0'; wait for Clk_period*1;
    i <= '0';j <= '1'; wait for Clk_period*2;
    i <= '1';j <= '1'; wait for Clk_period*1;
    i <= '1';j <= '0'; wait for Clk_period*1;
         
    i <= '0';j <= '0'; wait for Clk_period*1;
    i <= '0';j <= '1'; wait for Clk_period*1;
    i <= '1';j <= '1'; wait for Clk_period*1;
    i <= '1';j <= '1'; wait for Clk_period*1;
         
    i <= '1';j <= '0'; wait for Clk_period*2;
    i <= '1';j <= '1'; wait for Clk_period*1;
    wait;
   end process;

END;

The simulation waveform is attached below. Carefully go through the various signal values to see how the flow works.

The code is synthesisable. To get a clear understanding of the concepts, take another state machine problem from the web and write the VHDL code for it.

I have written some other articles related to state machines. You can browse through them here.