The Anatomy of an FPGA
In my last post I discussed which Field Programmable Gate Arrays (FPGAs) are best suited to hobbyist and maker projects where you want to combine them with a microcontroller. Some of the feedback asked whether I could give an overview of what an FPGA contains, so that a beginner coming to the subject can get a basic understanding of what these devices have to offer. So before I begin the series on the development of an FPGA shield for an Arduino, I thought I would post an introduction to the subject for someone who is a relative beginner to FPGA programming.
If you read my previous post then you will have noticed that although I compared FPGAs from three different manufacturers, they shared a number of common features. These elements are the fundamental building blocks of all FPGA technologies and I’ll start with the two general purpose elements: the Look-Up Tables (LUTs) and the Registers.
LUTs can be considered to be 1-bit-wide memories that are initialized as ROMs during the configuration of the FPGA. These typically have 4 inputs, which can be thought of as address lines (the Spartan-6 FPGA family from Xilinx has 6-input LUTs). Four inputs give 16 addresses, so any logical combination of the four input bits can be mapped into a single bit result. LUTs can therefore model all of the common logic functions, or any combination thereof, across the input signals to define a single bit output.
This concept can best be illustrated by the use of a truth table with a simple example. If we start with a 4-input AND gate then this maps into the LUT so that only the entry addressed when all four inputs are 1 holds a 1; the other 15 entries hold a 0.
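The AND-gate mapping can be sketched in VHDL by modelling the LUT as a 16 x 1-bit ROM. This is only an illustration of the idea: the entity name lut4_and, the port names and the INIT constant are hypothetical, and in a real design the synthesis tools work out these contents for you.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a 4-input LUT configured as an AND gate.
entity lut4_and is
  port (
    a, b, c, d : in  std_logic;
    y          : out std_logic
  );
end entity lut4_and;

architecture rtl of lut4_and is
  -- ROM contents, entry 15 on the left and entry 0 on the right.
  -- Only entry 15 (a = b = c = d = '1') holds a '1'.
  constant INIT : std_logic_vector(15 downto 0) := x"8000";
  signal addr : unsigned(3 downto 0);
begin
  -- The four inputs act as the address lines of the ROM.
  addr <= d & c & b & a;
  y    <= INIT(to_integer(addr));
end architecture rtl;
```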
Of course you do not need to use all four inputs into the LUT, as it is possible to have 1-, 2-, 3- or 4-input logic combinations, although a 1-input function would be rather a waste. A 3-input XOR gate, for example, maps to a truth table whose output is 1 whenever an odd number of the inputs are 1.
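The 3-input XOR truth table, and the LUT contents it produces, can be sketched as follows. The entity name lut_xor3 and the INIT constant are hypothetical names used for illustration; the table itself follows directly from the definition of XOR.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a LUT configured as a 3-input XOR gate.
entity lut_xor3 is
  port (
    a, b, c : in  std_logic;
    y       : out std_logic
  );
end entity lut_xor3;

architecture rtl of lut_xor3 is
  --  c b a | y
  --  0 0 0 | 0
  --  0 0 1 | 1
  --  0 1 0 | 1
  --  0 1 1 | 0
  --  1 0 0 | 1
  --  1 0 1 | 0
  --  1 1 0 | 0
  --  1 1 1 | 1
  -- Entry 7 is on the left and entry 0 on the right.
  constant INIT : std_logic_vector(7 downto 0) := "10010110";
  signal addr : unsigned(2 downto 0);
begin
  addr <= c & b & a;
  y    <= INIT(to_integer(addr));
end architecture rtl;
```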
These are simple examples to illustrate the concept but the same principle applies to all LUT applications within an FPGA. It should be clear now that many of the logic functions offered by the 74-series devices can be implemented within the LUT structure, and this is why small Complex Programmable Logic Devices (CPLDs) and small FPGAs are popularly used to implement ‘glue logic’ in larger designs. The previous examples were relatively simple, but a LUT can also model multiple combinations of signals: an expression like E = AC’ + AB’ + BCD’ + AD’, where the ’ character indicates the NOT function, can be implemented within a single LUT.
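In practice you would write the equation rather than the table. As a minimal sketch (the entity name sop_example is hypothetical), the expression for E could be written as a single concurrent assignment. Because it depends on only four signals, it should fit within one 4-input LUT, although the final mapping is the tool's decision.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sop_example is
  port (
    a, b, c, d : in  std_logic;
    e          : out std_logic
  );
end entity sop_example;

architecture rtl of sop_example is
begin
  -- E = AC' + AB' + BCD' + AD'
  e <= (a and not c) or (a and not b) or (b and c and not d) or (a and not d);
end architecture rtl;
```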
A feature of FPGAs is that the LUT outputs can be fed into the inputs of other LUTs, which means that you can implement logic functions with more than 4 input signals. An example of where this might occur would be a compare operation where you want to trigger an event when two 8-bit signals are the same; in VHDL this might be statements like…
if (A = B) then
  statement a ;
  statement b ;
end if ;
Where A and B are 8-bit values.
We can evaluate this statement using LUTs by feeding two bits from A and the corresponding two bits from B into each of four separate LUTs. Each LUT generates a 1 if its two pairs of bits are the same.
We then take the four outputs from these LUTs and feed them into a fifth LUT that implements a 4-input AND function. We now have two levels of cascaded LUTs that will produce a value of ‘1’ if the two input signals A and B are the same and a ‘0’ for all other values.
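The two levels can be written out explicitly in VHDL, as in the sketch below. The entity and signal names (compare8, pair_eq) are hypothetical, and a synthesis tool is free to arrange the logic differently; normally you would simply write A = B and leave the partitioning to the tools.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity compare8 is
  port (
    a, b  : in  std_logic_vector(7 downto 0);
    equal : out std_logic
  );
end entity compare8;

architecture two_level of compare8 is
  signal pair_eq : std_logic_vector(3 downto 0);
begin
  -- First level: four LUTs, each with 4 inputs
  -- (two bits of A and the matching two bits of B).
  gen_pairs : for i in 0 to 3 generate
    pair_eq(i) <= '1' when a(2*i + 1 downto 2*i) = b(2*i + 1 downto 2*i)
                  else '0';
  end generate gen_pairs;

  -- Second level: a fifth LUT implementing a 4-input AND.
  equal <= pair_eq(3) and pair_eq(2) and pair_eq(1) and pair_eq(0);
end architecture two_level;
```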
If you are wondering how you derive the values that are loaded into the LUTs during configuration then you don’t need to worry about this. This task is performed by your development tools in a process called synthesis. When you have completed your Hardware Description Language (HDL) design it is analysed, elaborated and synthesized by your tool set. During these processes your HDL is decomposed into a series of logic equations which are then partitioned into the LUTs for you.
The LUTs are one of the essential implementation elements of your design, so the more LUTs you have the more complex your design can be. The LUT count is a common metric used by the FPGA vendors to allow you to assess the capabilities of their devices.
It is useful to have a top level understanding of the synthesis process as the way you write your HDL impacts on the number, and more importantly the depth, of your LUT chains. The more LUTs your logic needs to pass through in order to get to the answer, the slower your overall design will be. When designers look to make their FPGAs run faster they frequently refer to ‘pipelining’, which is a technique of breaking down these long LUT chains into shorter chains that can each be resolved within your clock period. The intermediate values are stored in single bit registers before being fed into the next chain, in a synchronous design approach.
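As a purely illustrative sketch of pipelining, the 8-bit comparison from earlier in this post can be split into two registered stages. The names used here (compare8_pipelined, pair_eq) are hypothetical, and a chain this short would not normally need pipelining; the point is only to show where the registers go.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity compare8_pipelined is
  port (
    clk   : in  std_logic;
    a, b  : in  std_logic_vector(7 downto 0);
    equal : out std_logic
  );
end entity compare8_pipelined;

architecture rtl of compare8_pipelined is
  signal pair_eq : std_logic_vector(3 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: one level of LUTs, results stored in registers.
      for i in 0 to 3 loop
        if a(2*i + 1 downto 2*i) = b(2*i + 1 downto 2*i) then
          pair_eq(i) <= '1';
        else
          pair_eq(i) <= '0';
        end if;
      end loop;

      -- Stage 2: combines the values registered on the previous edge.
      equal <= pair_eq(3) and pair_eq(2) and pair_eq(1) and pair_eq(0);
    end if;
  end process;
end architecture rtl;
```

The price of the shorter chains is latency: the result for a given A and B appears on equal two clock edges later.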
Before we look at the registers in more detail I want to highlight one common problem LUTs can cause in a design, which frequently catches out beginners using FPGAs (and experienced designers too).
The output from a LUT is asynchronous so when the input changes there will be a short period when the output transitions to the new value. If more than one input signal changes then minor variations in input timing may cause the output to generate the wrong signal for a very short period of time before settling to the correct value. LUTs are very fast at generating their output so these errors can be very short in length and are commonly called glitches. These glitches can cause havoc in some designs so you should be aware that they can arise if your design is asynchronous. Glitches become increasingly likely when you have cascaded LUTs as these introduce further delays as the signals propagate through the routing fabric of the FPGA from one LUT to the next; increasing the possibility of significant signal skewing. If you are familiar with combinatorial logic designs in your projects then you are probably already very familiar with these types of problems.
One reason that many designers fail to take account of the possibility of glitches within their CPLD and FPGA designs is the toolchain that we use in developing these devices. An essential part of the design process is to simulate a design, and every FPGA vendor provides a tool within its tool set to allow you to do this. These simulations can be run in two ways: the first, and most common, is a functional simulation, while the second is a gate-level simulation.
This latter type of simulation extracts the timing information from your design once it has been placed and routed within the FPGA. As it is based upon the actual FPGA implementation it can simulate the delays associated with the LUTs and routing paths and so will show you glitches within the simulation. These types of simulation however take far longer to run and require much more processing power; so they are not commonly used.
With a functional simulation there is no timing information within the model unless you specifically add it in your HDL. These types of simulations are faster and use less processing power, so they are the preferred method during the design phases of an FPGA, but they are less likely to catch glitches that may occur on the real hardware.
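To give a feel for what adding timing to your HDL means, here is a small simulation-only sketch. The 1 ns inverter delay is an assumed figure chosen for illustration, not a value from any data sheet, and the entity name glitch_demo is hypothetical. The output y is logically always '0', yet with the delay in place a simulator should show it pulsing high for 1 ns when a rises.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity glitch_demo is
end entity glitch_demo;

architecture sim of glitch_demo is
  signal a   : std_logic := '0';
  signal n_a : std_logic;
  signal y   : std_logic;
begin
  -- Assumed 1 ns inverter delay. The 'after' clause is honoured by the
  -- simulator but ignored by synthesis.
  n_a <= not a after 1 ns;

  -- a and (not a) should never be '1', but the delayed inverter lets
  -- both terms be '1' for 1 ns after a rises.
  y <= a and n_a;

  stimulus : process
  begin
    wait for 10 ns;
    a <= '1';
    wait for 10 ns;
    a <= '0';
    wait;
  end process stimulus;
end architecture sim;
```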
These glitches can be eliminated from your design by using a synchronous design technique, which brings us conveniently to the next main FPGA element: registers.
The registers are another essential element of an FPGA fabric and these are used to store the ‘state’ of the FPGA at a point in time defined by the clock driving the register. These are implemented as D-type flip-flops and can be considered as a memory cell, a zero-order hold or a delay line; they can also normally be forced to be set or reset.
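A single register with a reset can be described in a few lines of VHDL. This is a minimal sketch with hypothetical names (dff, rst), using an asynchronous reset as one example of forcing the register to a known value; whether you use a synchronous or asynchronous reset depends on your device and design style.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (
    clk, rst, d : in  std_logic;
    q           : out std_logic
  );
end entity dff;

architecture rtl of dff is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= '0';               -- forced to '0' regardless of the clock
    elsif rising_edge(clk) then
      q <= d;                 -- sample d on the rising clock edge
    end if;
  end process;
end architecture rtl;
```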
Registers allow you to create synchronous designs that perform operations at regular intervals. If you have no experience of what this means in an FPGA then this can be considered to be similar to the operation of a microcontroller where your software instructions are executed every N clock cycles.
You typically get an FPGA register associated with each LUT in a device although some of the Lattice Semiconductor devices have 6 registers for every 8 LUTs.
Registers are the essential component for building all kinds of logic constructs, in particular counters, which are very common in FPGA designs. Because of the prevalence of counters in HDL, the FPGA vendors include special routing resources within their devices for the connection of a carry signal between registers to build faster counters. As with LUTs you don’t need to concern yourself too much with the details of these interconnects, as they are handled by your tool chain when it synthesizes, elaborates, places and routes your design.
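A simple counter might be written as below. The entity name counter8 and its 8-bit width are assumptions for illustration; whether the carry routing is actually used is decided by the tools, not by anything in the code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter8 is
  port (
    clk, rst : in  std_logic;
    count    : out std_logic_vector(7 downto 0)
  );
end entity counter8;

architecture rtl of counter8 is
  signal count_reg : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count_reg <= (others => '0');   -- synchronous reset
      else
        count_reg <= count_reg + 1;     -- wraps from 255 back to 0
      end if;
    end if;
  end process;

  count <= std_logic_vector(count_reg);
end architecture rtl;
```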
The use of a register can eliminate the effect of glitches from your LUTs, as it acts as a zero-order hold that samples the LUT output at a particular point in time, normally the rising edge of a clock signal. If you compare the two VHDL sections, the first asynchronous code is potentially glitch prone while the second synchronous code, with the register added, will not glitch the output.
Asynchronous code snippet
E <= A or B or C or D ;
Synchronous code snippet
process (clk)
begin
  if rising_edge(clk) then
    E <= A or B or C or D ;
  end if ;
end process ;
Registers do have their own set of parameters that you need to consider in your designs, and these affect the time you have available for a signal to propagate between registers. These are the set-up and hold times, which limit the number of LUTs you can have between your registers or limit the maximum clock rate of your design. If one of your inputs changes during the set-up and hold window around the clock edge then this may cause the output to behave in an unstable fashion, known as metastability, and it may not settle on the expected value!
The Other Bits!
The LUTs and Registers are the fundamental building blocks of an FPGA, but if you look at the specification of a device you will also find a whole range of additional elements. These have been added by the FPGA suppliers to improve the efficiency of the devices when implementing different design features. I will provide a quick overview of the main elements along with a brief explanation of their purpose, although these are fairly self-evident.
Memory
When we were discussing the Registers I indicated that they are effectively a 1-bit memory, but they are a very inefficient way of implementing memory within a design. To overcome this, FPGA manufacturers embed dedicated memory within the device that is much more efficient. This normally takes the form of either a small number of large memory blocks of between 9 kbits and 18 kbits, or a large number of smaller distributed memory blocks of between 3-500 bits; sometimes these smaller blocks can be derived by consuming some of the LUTs (which are memories). Each vendor has their own method of creating the distributed memory, but they all use larger memory blocks embedded within the FPGA fabric. These dedicated memory blocks provide two benefits: the first is that they are very efficient in terms of power and resource, and the second is that they are very fast.
Memories can normally be configured with a number of ports and can model either RAM or ROM; they are also used to implement FIFOs within a design. With memories you can either write your HDL to infer a memory and allow the synthesis tools to resolve the implementation, or you can use IP provided by the FPGA manufacturers to specifically instantiate a memory through an interactive software Wizard.
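As a rough sketch of the inference route, the VHDL below describes a small single-port RAM in a style that synthesis tools commonly recognise. The entity name ram_256x8 and the 256 x 8 size are assumptions for illustration, and whether this lands in a dedicated memory block or in LUT-based memory depends on your vendor's tools and coding guidelines.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_256x8 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(7 downto 0);
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity ram_256x8;

architecture rtl of ram_256x8 is
  type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read. On a simultaneous write to the same address
      -- this returns the old contents.
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```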
If you have a specific project in mind then it is worthwhile thinking about the amount of memory you are likely to need when you are selecting the appropriate FPGA to use.
Phase Locked Loops
All of the registers within an FPGA require a clock signal, and FPGAs typically include a PLL either to condition the clock or to allow you to generate different clock frequencies for use within your design. This can be useful if your project runs at a master frequency and you want your FPGA to run at a multiple of that rate, giving it a number of clock cycles to complete a sequence of operations for every system clock cycle.
I would warn the beginner against having more than one clock operating within your FPGA until you have become more familiar with FPGAs. Clock domain crossing of signals is a topic for the experienced designer and you really need to know what you are doing when you have multiple clock domains within a device. If you do need to run things at different rates then try to do this by using a clock enable signal to reduce a master clock to effectively an fclk/N rate.
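The clock enable approach can be sketched as follows. The names (clock_enable_div, tick_count, enable) and the default N = 4 are hypothetical. The key point is that every register is still clocked by the single master clock; the slower logic simply ignores most of the edges.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity clock_enable_div is
  generic (
    N : positive := 4   -- assumed divide ratio, for illustration
  );
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity clock_enable_div;

architecture rtl of clock_enable_div is
  signal tick_count : natural range 0 to N - 1 := 0;
  signal enable     : std_logic := '0';
begin
  -- Generate an enable pulse one clock wide every N cycles of clk.
  process (clk)
  begin
    if rising_edge(clk) then
      if tick_count = N - 1 then
        tick_count <= 0;
        enable     <= '1';
      else
        tick_count <= tick_count + 1;
        enable     <= '0';
      end if;
    end if;
  end process;

  -- Still clocked by clk, but only updates when enabled, so this
  -- register effectively runs at fclk/N.
  process (clk)
  begin
    if rising_edge(clk) then
      if enable = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```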
PLLs are typically instantiated within a design using a software Wizard where you set the various clock rates required. It is also worth checking that your input clock meets the minimum input clock rate of the PLL you are using, as these do have constraints that will impact on your design choices.
Multipliers
FPGAs are frequently used to implement Digital Signal Processing (DSP) applications that require the use of a multiplier. To increase the efficiency of the devices the vendors again provide you with hardware implemented multipliers within the FPGA fabric. Some of these are implemented in special DSP blocks that allow easy implementation of Multiply and Accumulates (MACs) which are an important element of digital filter implementations. As with memories these can either be inferred within your HDL or they can be deliberately instantiated using a Wizard. These multipliers will typically allow the multiplication of two 16-bit values (either signed or unsigned). If you are likely to be designing digital filters then you will want an FPGA with at least some multipliers in the fabric. You should also check how many clock cycles they take to produce a result as some of them are pipelined and take several cycles to produce the answer.
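A multiply and accumulate written for inference might look like the sketch below. The entity name mac16, the 40-bit accumulator width and the clear input are all assumptions made for illustration; whether the tools map this onto a hardware multiplier or DSP block depends on the device and the tool settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac16 is
  port (
    clk, clear : in  std_logic;
    a, b       : in  signed(15 downto 0);
    acc        : out signed(39 downto 0)
  );
end entity mac16;

architecture rtl of mac16 is
  signal product : signed(31 downto 0) := (others => '0');
  signal acc_reg : signed(39 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: 16 x 16 signed multiply gives a 32-bit product.
      product <= a * b;

      -- Stage 2: accumulate the registered product. The 8 extra bits
      -- are an assumed amount of headroom against overflow.
      if clear = '1' then
        acc_reg <= (others => '0');
      else
        acc_reg <= acc_reg + resize(product, acc_reg'length);
      end if;
    end if;
  end process;

  acc <= acc_reg;
end architecture rtl;
```

Because the product is registered before it is accumulated, each pair of inputs affects acc two clock edges later, which is the kind of pipeline latency worth checking for.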
Hard IP
Beyond these simple elements you will then find a plethora of other Hard IP within an FPGA that has been added by the manufacturers to simplify the implementation of different functions and interfaces. I have listed a few below, but these are advanced subjects and if you are looking at these in your projects then you probably feel comfortable with FPGAs and can refer to the supplier’s documentation.
- PCIe cores
- Memory Controllers
- ARM cores
IO
The last subject I would like to touch on in this post is the IO within an FPGA and the signalling standards that can be applied. This is of particular interest if you are thinking of combining an FPGA with an Arduino board, as the Uno and Leonardo are 5V devices, which can cause interfacing issues with FPGAs.
The IO on an FPGA is typically configured into Banks which each have their own power supply level. In most of the devices that you are likely to be using for your projects these will be operating at 3.3V and not 5V as most devices simply do not support 5V as an IO voltage. This means that you need to be careful when you are connecting a microcontroller to your FPGA as it is likely that you will need to use some form of level translation to protect your FPGA inputs.
The inputs to most FPGA IO banks have a clamp diode to the bank’s VCC supply, and if the input voltage rises far enough above that supply for the diode to conduct then this will inject current into the FPGA, potentially damaging the device. There are a small number of FPGAs that are 5V tolerant when operating with 3.3V on their IO banks, but always check the data sheet for the device as this is a horrible thing to get wrong.
With 3.3V IO you can drive the microcontroller with LVTTL signalling, as a ‘high’ is still > 2.4V and a ‘low’ is still < 0.8V. You can also use an open drain signal, but this would need to be pulled up to 3.3V and not 5V.
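An open drain style output is usually described in VHDL by only ever driving the pin low or releasing it to high impedance. The sketch below uses hypothetical names (open_drain_out, drive_low) and assumes an external pull-up resistor to 3.3V on the pin.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity open_drain_out is
  port (
    drive_low : in  std_logic;
    pin       : out std_logic
  );
end entity open_drain_out;

architecture rtl of open_drain_out is
begin
  -- Pull the pin low or release it; never drive it high.
  -- The external pull-up (assumed to go to 3.3V) provides the high level.
  pin <= '0' when drive_low = '1' else 'Z';
end architecture rtl;
```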
I hope this post gives you an insight into the elements of an FPGA so that you begin to understand what elements you should be looking for in a device for your first, or next, FPGA project. In my next post I am going to start the development of an FPGA shield for the Leonardo and in this I will explain why I selected the FPGA I plan to use.