A Simple Microcontroller Coprocessing Encryption Engine (Part 3)

Implementation

In the last posting I covered how we are going to use the Encryption Engine to secure our network. So within this posting I want to start looking at the implementation of the Encryption Engine within the CPLD.

Before I start with the design details I want to explain the toolchain I will be using for this design. In an earlier post I indicated that I was going to target my initial design on a Lattice Semiconductor MachXO2 device. So I will be using the Lattice Diamond Software version 3.7 for the latter stages of the design. You can get a copy of this software from www.latticesemi.com and then you simply request a free license.

Bundled with the free software is the Aldec Active-HDL Lattice Edition II Mixed Language simulation tool. My version of this is 10.2 but yours may be different. This is where I will be doing the majority of the design work. Once the simulations are all working correctly I will migrate the files over into the Lattice Diamond Software for synthesis and place and route.

This isn’t going to be a lesson in Active-HDL as I’m more interested in exploring the design of the Encryption Engine than the creation process. So if you are not familiar with the tools then you can access the guide from the start page of Lattice Diamond. I will provide some screenshots to clarify points as I go through the design, and I will also include some screenshots of the simulation outputs.

Lattice Diamond Project Setup

I have initially created a Lattice Diamond Project that I will use to encompass the design. I have set the following parameters in this project:

Lattice Diamond Project Summary

I have chosen to use the MachXO2-1200-HC in the 100 pin TQFP for two reasons. The first is that this is the part I am planning to use on the Arduino Shield I’m going to design at a later stage. The second is that I just happen to have a board I previously designed with this device fitted. Conveniently, this board also has an SPI interface to the CPLD.

From within this project I launched Active-HDL and created a new workspace and a new VHDL design targeted at Lattice MachXO2 technology.

Implementing the MX Function of the Encryption Algorithm

So, I now want to look at the XXTEA algorithm. I want to look at how I am going to break it down into individual processing steps. Remember from my previous posts I explained that we only need to implement the encryption part of the algorithm.

I have repeated this below so you don’t need to refer back to it.

Encryption Algorithm

We can simplify our implementation of this algorithm by using the fact that we are only encoding an eight-byte counter value. This means that the value of n is always 2 and rounds (6 + 52/n) is always 32. So the inner loop, indexed by p, only executes for p = 0. Looking through the structure of the loop we can now see that the first time we calculate MX, y = v[1] and z = v[1], and the next time we calculate MX, y = v[0] and z = v[0]. This is repeated for each round of the outer ‘do’ loop.

Therefore we can simplify our application to dispense with the variable y and simply use z. This approach will save us CPLD resources by not requiring registers to store y.
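To check this simplification in software before committing it to VHDL, here is a minimal C sketch. It assumes the standard XXTEA reference code (the MX expression and the DELTA constant 0x9e3779b9 are taken from that reference, not from my CPLD design), and the function names are my own. It places the generic encoder next to a two-word version that has no y variable at all. Note that the reference loop exits with p equal to 1, so the second update in each round selects key[1 ^ e].

```c
#include <stdint.h>

#define XXTEA_DELTA 0x9e3779b9u  /* standard XXTEA constant (assumed) */

/* MX written as a function; k is the key word already selected. */
static uint32_t mx(uint32_t y, uint32_t z, uint32_t sum, uint32_t k)
{
    return (((z >> 5) ^ (y << 2)) + ((y >> 3) ^ (z << 4)))
           ^ ((sum ^ y) + (k ^ z));
}

/* Generic reference encoder for n 32-bit words (n > 1). */
static void xxtea_encrypt_ref(uint32_t *v, unsigned n, const uint32_t key[4])
{
    uint32_t y, z, sum = 0;
    unsigned p, e, rounds = 6 + 52 / n;

    z = v[n - 1];
    do {
        sum += XXTEA_DELTA;
        e = (sum >> 2) & 3;
        for (p = 0; p < n - 1; p++) {
            y = v[p + 1];
            z = v[p] += mx(y, z, sum, key[(p & 3) ^ e]);
        }
        y = v[0];
        z = v[n - 1] += mx(y, z, sum, key[(p & 3) ^ e]);
    } while (--rounds);
}

/* Two-word version: y is gone, z always holds the last word written. */
static void xxtea_encrypt_2w(uint32_t v[2], const uint32_t key[4])
{
    uint32_t z = v[1], sum = 0;
    unsigned e, rounds = 32;

    do {
        sum += XXTEA_DELTA;
        e = (sum >> 2) & 3;
        z = v[0] += mx(z, z, sum, key[e]);      /* inner loop pass, p = 0 */
        z = v[1] += mx(z, z, sum, key[1 ^ e]);  /* after the loop, p = 1  */
    } while (--rounds);
}
```

Running both functions on the same counter value and key should produce identical output, which is a cheap way to gain confidence in the reduced structure before simulating the hardware.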

MX Function

The secret key we use for encryption is 128 bits in length and is broken down into four 32-bit values. So as not to use up too many registers within the CPLD holding fixed values, I am going to store the secret key in volatile memory. This will be a small array of 32-bit values which I will call key for consistency with the algorithm.

The array index, (p&3)^e, is the value of p AND’ed with 3 and then XOR’ed with e. With n fixed at 2, p is 0 for the MX calculation inside the inner loop, so (p&3) is 0, and zero XOR’ed with e simply gives e. Note that the loop then exits with p equal to 1, so the second MX calculation in each round uses the index 1 XOR’ed with e, which is e with its least significant bit inverted.

Looking in more detail at the value of e, we can see that it is equal to the value of sum right shifted by two bits and AND’ed with 3. This simply means that e is equal to the third and fourth bits of sum, i.e. e = sum(3 downto 2).
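The bit selection can be confirmed with a small C sketch. The function names are hypothetical and exist only for illustration. The first function computes e as the algorithm writes it, the second picks out bits 3 and 2 individually to mirror the VHDL slice, and the third forms the full key index so the p = 0 case can be checked.

```c
#include <stdint.h>

/* e as written in the algorithm: shift right by two, keep two bits. */
static unsigned key_select_e(uint32_t sum)
{
    return (sum >> 2) & 3u;
}

/* The same value picked out bit by bit, mirroring sum(3 downto 2). */
static unsigned key_select_bits(uint32_t sum)
{
    unsigned bit3 = (sum >> 3) & 1u;
    unsigned bit2 = (sum >> 2) & 1u;
    return (bit3 << 1) | bit2;
}

/* Full key index from the algorithm; with p = 0 it reduces to e. */
static unsigned key_index(unsigned p, uint32_t sum)
{
    return (p & 3u) ^ key_select_e(sum);
}
```

For example, with sum = 0x9e3779b9 the low byte is 0xb9 (binary 1011 1001), so bits 3 and 2 are "10" and e is 2.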

The memory array storing the key value will be implemented without a registered output to keep the number of clock cycles required to calculate the value of MX to a minimum.

So with the information we have discussed so far we can create a VHDL process that implements the MX calculation. I have added some ‘extra’ elements within the process that I will explain below, but here is my first pass:

VHDL Implementation

The value of MX will only be calculated when the signal en_encrypt (enable encryption) is asserted. Also the indexed value of key has been replaced with the signal key, which will be the output from the key memory.
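The enable behaviour can be modelled in C as a rough software analogue of the clocked process. This is only a sketch under assumptions: the struct and function names are mine, the MX expression is the standard XXTEA one with y replaced by z, and each call to the function stands for one rising clock edge. The key argument represents whatever word is currently on the output of the key memory.

```c
#include <stdint.h>
#include <stdbool.h>

/* Models the register that holds MX. */
typedef struct {
    uint32_t mx;
} mx_reg_t;

/* One rising clock edge of the (assumed) MX process. */
static void mx_clock(mx_reg_t *r, bool en_encrypt,
                     uint32_t z, uint32_t sum, uint32_t key)
{
    if (en_encrypt) {
        r->mx = (((z >> 5) ^ (z << 2)) + ((z >> 3) ^ (z << 4)))
                ^ ((sum ^ z) + (key ^ z));
    }
    /* When en_encrypt is low the register keeps its previous value. */
}
```

With z = 1, sum = 0 and key = 0 the two sums are 20 and 2, so an enabled clock edge loads 22 into the register, while a disabled edge leaves it untouched.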

Pipelining?

This implementation uses a large number of signals to generate the value of MX. This means there will be several layers of LUTs (Look Up Tables) between the source registers and the MX registers. These layers will increase the signal propagation delays and reduce the maximum clock rate of the system. As we are running at relatively slow data rates this will probably not be an issue, but if we wanted to run faster we could pipeline the design, which allows a higher clock rate at the cost of more clock cycles per result.

You may wonder why we would want to run at a higher clock rate but take more clock cycles. Well, this may arise when the encoder is only one part of the CPLD and other functions require a faster clock rate, but you still need the whole design to meet timing.
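To illustrate the trade-off, here is a minimal C model of one possible two-stage pipeline for MX. The split point is purely my assumption for illustration and is not taken from the actual design: stage 1 registers the two additions and stage 2 registers their XOR. The names are hypothetical and each call represents one clock edge.

```c
#include <stdint.h>

typedef struct {
    uint32_t a, b;  /* stage 1 registers: the two partial sums */
    uint32_t mx;    /* stage 2 register: the final MX value    */
} mx_pipe_t;

/* One rising clock edge of the two-stage pipeline. */
static void mx_pipe_clock(mx_pipe_t *p, uint32_t z, uint32_t sum, uint32_t key)
{
    /* Stage 2 samples the stage 1 registers before they are updated,
       as flip-flops sharing a clock edge would. */
    p->mx = p->a ^ p->b;

    /* Stage 1 captures the two shorter logic paths. */
    p->a = ((z >> 5) ^ (z << 2)) + ((z >> 3) ^ (z << 4));
    p->b = (sum ^ z) + (key ^ z);
}
```

Each stage now contains less logic between registers, so the clock can run faster, but a result needs two clock edges to appear instead of one.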

In my next posting I will look at the heart of the encryption algorithm and explain how we can combine MX into the Encryption Engine.