A Simple Microcontroller Coprocessing Encryption Engine (Part 5)

Defining the Encryption Entity

In this post I am going to combine the two previous elements discussed to create a complete encryption engine entity. I will then explain how I have created a VHDL test-bench to simulate the design for validation purposes. But before that a little explanation is in order…

A confession!

Before I go into this design in more detail I have to confess to a degree of confusion on my part. In an earlier post I listed the GitHub version of the C implementation as the source of the test vectors I would be using. Once I started coding the VHDL it quickly became apparent that my results were inconsistent with these vectors, and a quick online search revealed that I was not alone in this. In summary, the result you get from the C code depends upon your compiler and also on the endianness you apply to the data supplied. So with the first test vector, all zeroes for both data and key, you can get a result which has the right bytes in the wrong order! This makes sense once you consider that an all-zero key and counter are identical for either endianness, so only the byte order of the result can differ.
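To make the byte-order point concrete, here is a minimal C sketch of how the same four bytes turn into two different 32-bit words depending on the endianness applied when they are loaded. The helper names load_be32, load_le32, swap32 and endian_self_check are hypothetical, chosen for illustration only, and are not taken from the GitHub code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers (hypothetical names, not from the reference code). */

/* Interpret four bytes as a 32-bit word, first byte most significant. */
static uint32_t load_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret the same four bytes with the first byte least significant. */
static uint32_t load_le32(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
           ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Self-check: both loads see the same bytes, only the order differs. */
static void endian_self_check(void)
{
    const uint8_t bytes[4] = { 0x05, 0x37, 0x04, 0xAB };

    assert(load_be32(bytes) == 0x053704ABu);
    assert(load_le32(bytes) == 0xAB043705u);
    assert(swap32(load_be32(bytes)) == load_le32(bytes));
}
```

Calling endian_self_check() from your own main() exercises the helpers. An all-zero word loads identically either way, which is why the all-zero vector can only expose the ordering of the result bytes.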

A C implementation

I wanted to be able to dig into the interim results of the processing loops to validate the VHDL coding. So I needed to get a C implementation of the code to run on my PC using my Windows compiler. Therefore I have provided this C version along with how I interpreted the counter and key values.

#include <stdio.h>
#include <stdlib.h>
#define MX (((z >> 5) ^ (z << 2 )) + ((z >> 3) ^ (z << 4))) ^ ((sum ^ z) + (k[e] ^ z))

void xxtea_enc(unsigned long* v, unsigned long* k)
{
  unsigned long z, sum, e, DELTA=0x9e3779b9;
  unsigned int q;
  q = 32; //6+52/2
  sum = 0;
  z = v[1];
  do {
    sum += DELTA;
    e = (sum >> 2) & 3;
    z = v[0] += MX;
    z = v[1] += MX;
    printf("%2u,%08lX,%08lX:\n",q,v[0],v[1]);
  } while (--q) ;
}

void xxtea_dec(unsigned long* v, unsigned long* k)
{
  unsigned long z,sum, e, DELTA=0x9e3779b9;
  unsigned int q;
  q = 32; //6+52/2
  sum = q * DELTA;
  while (sum != 0)
  {
    e = (sum >> 2) & 3;
    z = v[0];
    v[1] -= MX;
    z = v[1];
    v[0] -= MX;
    sum -= DELTA;
  }
}

int main()
{
  unsigned long v[2], k[4];
//  v[0] = 0x00000000;
//  v[1] = 0x00000000;
//  v[0] = 0xffffffff;
//  v[1] = 0xffffffff;
  v[0] = 0xfffefcf8;
  v[1] = 0xf0e0c080;
//  k[0] = 0;
//  k[1] = 0;
//  k[2] = 0;
//  k[3] = 0;
  k[0] = 0x01020408;
  k[1] = 0x10204080;
  k[2] = 0xfffefcf8;
  k[3] = 0xf0e0c080;
//  k[0] = 0x9e3779b9 ;
//  k[1] = 0x9b9773e9 ;
//  k[2] = 0xb979379e ;
//  k[3] = 0x6b695156 ;

  xxtea_enc(v,k);
  printf("Encode => %08lX,%08lX\n",v[0],v[1]);
  xxtea_dec(v,k);
  printf("Decode => %08lX,%08lX\n",v[0],v[1]);
  return 0 ;
}

I compiled this code on my PC running Windows 10 using the Code::Blocks IDE 16.01, which uses the GNU GCC compiler. The various combinations of key and counter generate the sets of results shown below. I have validated these against another online encoding I found, so I am happy that with Windows and my compiler I am generating a consistent set of results.

  v[0] = 0x00000000;
  v[1] = 0x00000000;
  k[0] = 0x00000000;
  k[1] = 0x00000000;
  k[2] = 0x00000000;
  k[3] = 0x00000000;
Encode => 053704AB,575D8C80

  v[0] = 0x00000000;
  v[1] = 0x00000000;
  k[0] = 0x01020408;
  k[1] = 0x10204080;
  k[2] = 0xfffefcf8;
  k[3] = 0xf0e0c080;
Encode => 2FF05E3A,48DCA976

  v[0] = 0xffffffff;
  v[1] = 0xffffffff;
  k[0] = 0x9e3779b9 ;
  k[1] = 0x9b9773e9 ;
  k[2] = 0xb979379e ;
  k[3] = 0x6b695156 ;
Encode => C01402E9,1BF08FF6

  v[0] = 0xfffefcf8;
  v[1] = 0xf0e0c080;
  k[0] = 0x01020408;
  k[1] = 0x10204080;
  k[2] = 0xfffefcf8;
  k[3] = 0xf0e0c080;
Encode => 706126E2,6F21599F

These revised test vectors are the set I will be using to validate the behaviour of my VHDL. I have also enclosed a full copy of the C programme below so you can validate the design on your own system.

C File PDF

main.c
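One compiler dependency in the listing above is the use of unsigned long: it is 32 bits wide with GCC on Windows but typically 64 bits wide on 64-bit Linux, where the sums and shifts would no longer wrap at 32 bits. As a hedged sketch, the same two-word loops can be written with the fixed-width uint32_t type from stdint.h. The names mx, xxtea_enc64, xxtea_dec64 and xxtea_self_check are hypothetical, and the self-check only confirms that decoding undoes encoding; it does not reproduce the vectors listed above.

```c
#include <assert.h>
#include <stdint.h>

#define XXTEA_DELTA  0x9e3779b9u
#define XXTEA_ROUNDS 32u              /* 6 + 52/2 for a two-word block */

/* Same mixing expression as the MX macro, written as a function. */
static uint32_t mx(uint32_t z, uint32_t sum, uint32_t key)
{
    return (((z >> 5) ^ (z << 2)) + ((z >> 3) ^ (z << 4)))
         ^ ((sum ^ z) + (key ^ z));
}

/* Encode one 64-bit block held as two 32-bit words. */
static void xxtea_enc64(uint32_t v[2], const uint32_t k[4])
{
    uint32_t sum = 0;

    for (unsigned q = 0; q < XXTEA_ROUNDS; q++) {
        sum += XXTEA_DELTA;                    /* wraps modulo 2^32 */
        uint32_t key = k[(sum >> 2) & 3u];     /* k[e] in the listing */
        v[0] += mx(v[1], sum, key);
        v[1] += mx(v[0], sum, key);
    }
}

/* Undo xxtea_enc64 by running the rounds in reverse order. */
static void xxtea_dec64(uint32_t v[2], const uint32_t k[4])
{
    uint32_t sum = XXTEA_ROUNDS * XXTEA_DELTA; /* wraps modulo 2^32 */

    for (unsigned q = 0; q < XXTEA_ROUNDS; q++) {
        uint32_t key = k[(sum >> 2) & 3u];
        v[1] -= mx(v[0], sum, key);
        v[0] -= mx(v[1], sum, key);
        sum -= XXTEA_DELTA;
    }
}

/* Round-trip check using one of the key and counter pairs from the post. */
static void xxtea_self_check(void)
{
    const uint32_t k[4] = { 0x01020408u, 0x10204080u,
                            0xfffefcf8u, 0xf0e0c080u };
    uint32_t v[2] = { 0xfffefcf8u, 0xf0e0c080u };

    xxtea_enc64(v, k);
    xxtea_dec64(v, k);
    assert(v[0] == 0xfffefcf8u && v[1] == 0xf0e0c080u);
}
```

To compare against the vectors above, call xxtea_enc64 from your own main() and print v[0] and v[1] with the PRIX32 format macro from inttypes.h.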

A VHDL implementation

With that revised explanation out of the way I will now show you the design of the VHDL that will generate the same results.

I am going to use the first iteration of the design to prove the VHDL implementation at a functional level. Once I have got this version working correctly I will then look at targeting it to run in the selected FPGA. As the first version will only operate in the simulation environment it is not going to be resource constrained.

The Entity

Building upon the previous work the entity declaration of the component will be as follows:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all ;
entity xxtea_encrypt is
  port
  (
    --------------------------------------------------------------
    -- Inputs
    --------------------------------------------------------------
    clk                       : in  std_logic ;
    en_encrypt                : in  std_logic ;
    encryption_key            : in  unsigned(127 downto  0) ;
    counter                   : in  unsigned( 63 downto  0) ;
    --------------------------------------------------------------
    -- Outputs
    --------------------------------------------------------------
    cypher                    : out unsigned( 63 downto  0)
  ) ;
end entity xxtea_encrypt ;

There are only four inputs to the entity and a single output.

clk – this is the master clock that controls the overall operation of the entity and defines the overall encryption rate.

en_encrypt – this is the control signal which when asserted causes the component to encrypt the counter value using the encryption_key.

encryption_key – this is the 128-bit secret key value that is used in the encryption process.

counter – this is the 64-bit counter value that will be encrypted to generate the cypher text.

cypher – this is the 64-bit result of the encryption that is used to encode the plaintext within a system.

As the arithmetic within the algorithm is 32-bit unsigned, the three vector values are all declared as unsigned, and the package ieee.numeric_std is included in the file to support the arithmetic functions.

The Architecture

Once the entity declaration is complete I have defined the architecture that is implemented within this entity. Before I define the individual processes I have declared the constants and signals used within the architecture. I have also defined an alias e for bits 3 down to 2 of the sum signal, which makes it simpler to select the appropriate key segment to use at the various stages of calculating MX.

------------------------------------------------------------------
architecture rtl of xxtea_encrypt is
------------------------------------------------------------------
-- CONSTANTS
------------------------------------------------------------------
constant DELTA      : unsigned(31 downto  0) := x"9e3779b9" ;
------------------------------------------------------------------
-- SIGNALS
------------------------------------------------------------------
signal MX           : unsigned(31 downto  0) := (others => '0') ;
signal key          : unsigned(31 downto  0) := (others => '0') ;
signal q            : integer range 0 to 32  := 0 ;
signal sum          : unsigned(31 downto  0) := (others => '0') ;
signal v0           : unsigned(31 downto  0) := (others => '0') ;
signal v1           : unsigned(31 downto  0) := (others => '0') ;
signal z            : unsigned(31 downto  0) := (others => '0') ;
signal clk_count    : integer range 0 to 7   := 0 ;
------------------------------------------------------------------
-- ALIAS
------------------------------------------------------------------
alias e             : unsigned( 1 downto  0) is sum(3 downto  2) ;
------------------------------------------------------------------
begin

Processes

I have split the design into four separate processes that implement the various stages. The first one selects the key segment based upon the value of e. I want this to be a simple multiplexer that cuts the 128-bit key into four 32-bit segments, so I have implemented it as an asynchronous (combinational) process. As the encryption key is fixed, a change in e has a full clock period to settle before the value is used.

In the real system this will be less than a clock period, to account for set-up and hold timing within the FPGA, but at this stage I’m only interested in the simulated functional operation.

------------------------------------------------------------------
-- Within the encryption process we use different 32-bit slices of
-- the 128-bit encryption key. The selection of the appropriate 
-- key slice is based upon the value of e for the unique case of 
-- only encoding a 64-bit counter value.
------------------------------------------------------------------
key_selector : process (encryption_key, e) is
begin
  case (e) is
    when b"11" =>
      key               <= encryption_key( 31 downto  0) ;
    when b"10" =>
      key               <= encryption_key( 63 downto 32) ;
    when b"01" =>
      key               <= encryption_key( 95 downto 64) ;
    when others =>
      key               <= encryption_key(127 downto 96) ;
  end case ;
end process key_selector ;
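The multiplexer above can be related back to the k[e] lookup in the C code with a small sketch. It assumes the 128-bit key is held as four 32-bit words with key_words[0] carrying encryption_key(127 downto 96), which is how the key values in the C test vectors line up with the 128-bit hex literals. The names e_from_sum, select_key_slice and key_select_self_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the VHDL alias: e is bits 3 down to 2 of sum. */
static unsigned e_from_sum(uint32_t sum)
{
    return (unsigned)((sum >> 2) & 3u);
}

/* Mirror of the key_selector case statement.
 * Assumption: key_words[0] holds encryption_key(127 downto 96). */
static uint32_t select_key_slice(const uint32_t key_words[4], unsigned e)
{
    switch (e & 3u) {
    case 3:  return key_words[3];   /* encryption_key( 31 downto  0) */
    case 2:  return key_words[2];   /* encryption_key( 63 downto 32) */
    case 1:  return key_words[1];   /* encryption_key( 95 downto 64) */
    default: return key_words[0];   /* encryption_key(127 downto 96) */
    }
}

/* Self-check: the case statement is equivalent to indexing with e. */
static void key_select_self_check(void)
{
    const uint32_t key_words[4] = { 0x01020408u, 0x10204080u,
                                    0xfffefcf8u, 0xf0e0c080u };

    for (unsigned e = 0; e < 4; e++) {
        assert(select_key_slice(key_words, e) == key_words[e]);
    }
    /* After the first round sum equals DELTA, whose bits 3..2 are "10". */
    assert(e_from_sum(0x9e3779b9u) == 2u);
}
```

Under that word ordering the case statement reduces to key_words[e], so the VHDL selector and the C array index pick the same 32-bit slice.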

The next two processes are the MX_calculator and encryption_loop which I have previously covered. So the final process is the output of the cypher text once the encryption is complete. I have chosen to implement this as a clocked process to ensure that none of the intermediate values of the calculation are made visible at the outputs while the calculation is being made. This will help improve the overall security of the system as visibility of these values would compromise the encryption.

------------------------------------------------------------------
-- Once the encryption is complete then output the cypher value.
------------------------------------------------------------------
cypher_output : process (clk) is
begin
  if rising_edge(clk) then
    if (q = 32) then
      cypher            <= v0 & v1 ;
    end if ;
  end if ;
end process cypher_output ;
------------------------------------------------------------------
end architecture rtl ;
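For comparing simulation results with the C output, the 64-bit vectors can be related to the two 32-bit words with a short sketch. The join mirrors the concatenation v0 & v1 in cypher_output. The split assumes that v0 is loaded from counter(63 downto 32) and v1 from counter(31 downto 0), which is inferred from the test vectors because the loading code is not shown here. The names split_counter, join_cypher and pack_self_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mapping: v[0] = counter(63 downto 32), v[1] = counter(31 downto 0). */
static void split_counter(uint64_t counter, uint32_t v[2])
{
    v[0] = (uint32_t)(counter >> 32);
    v[1] = (uint32_t)(counter & 0xffffffffu);
}

/* Equivalent of the VHDL concatenation v0 & v1. */
static uint64_t join_cypher(uint32_t v0, uint32_t v1)
{
    return ((uint64_t)v0 << 32) | (uint64_t)v1;
}

/* Self-check using values that appear in the test vectors. */
static void pack_self_check(void)
{
    uint32_t v[2];

    split_counter(UINT64_C(0xfffefcf8f0e0c080), v);
    assert(v[0] == 0xfffefcf8u);
    assert(v[1] == 0xf0e0c080u);
    assert(join_cypher(0x706126E2u, 0x6F21599Fu)
           == UINT64_C(0x706126E26F21599F));
}
```

With this mapping, a C result printed as two words reads left to right as the single 64-bit cypher value.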

I have embedded the full file in PDF format below so you can download it and take a look at the full thing. With this now completed I need to test it by creating a test-bench that will load the test vectors.

VHDL File PDF

xxtea_encrypt.vhd

The testbench

Having created the entity I now want to wrap this up in a testbench that I can use to simulate the design. I’m going to take a couple of shortcuts with the testbench so it will not be self-testing. This means that I’ll validate the design by manually checking the outputs in the simulation waveform window. I’ll do this because, at this point, I’m still exploring the overall functionality and I’ll only convert the test bench to be self-testing with a pass/fail output once I’ve a complete system solution.

I’ve created the shell of the testbench using the generation tool in Active-HDL and I’ve edited this to provide the necessary drivers.

As you would expect from a testbench, the top entity declaration has no ports. The component declaration defines the xxtea_encrypt entity with its associated ports as discussed above. I’ve defined a constant called CLK_PERIOD to allow me to create a clock signal for the whole simulation running at 10 MHz. As this is only a functional simulation I don’t need to concern myself with the actual timing characteristics of the FPGA.

There are three processes in the testbench to create the stimulus for the unit under test. The first is a simple clock generator with a 50% mark-space ratio, derived from the constant CLK_PERIOD.

------------------------------------------------------------------
-- Create the system clock running at 10MHz.
------------------------------------------------------------------
system_clk : process is
begin
  clk                         <= '0' ;
  wait for CLK_PERIOD/2 ;
  clk                         <= '1' ;
  wait for CLK_PERIOD/2 ;
end process system_clk ;

So the second process generates the encryption enable signal en_encrypt that will run the encryption with the various counter and key values. This signal has short periods when it is set low to restart the encryption process for each combination of inputs.

------------------------------------------------------------------
-- Create the enable signal after a delay of 5 system clock cycles
------------------------------------------------------------------
enabler : process is
begin
  en_encrypt                  <= '0' ;
  wait for CLK_PERIOD*5 ;
  en_encrypt                  <= '1' ;
  wait for 30 us ;
  en_encrypt                  <= '0' ;
  wait for CLK_PERIOD*5 ;
  en_encrypt                  <= '1' ;
  wait for 30 us ;
  en_encrypt                  <= '0' ;
  wait for CLK_PERIOD*5 ;
  en_encrypt                  <= '1' ;
  wait for 30 us ;
  en_encrypt                  <= '0' ;
  wait for CLK_PERIOD*5 ;
  en_encrypt                  <= '1' ;
  wait for 30 us ;
  en_encrypt                  <= '0' ;
  wait ;
end process enabler ;

The en_encrypt signal is asserted for 30 µs, which is more than the 196 clock cycles the full encryption process takes: at 10 MHz each clock cycle is 100 ns, so 196 cycles last 19.6 µs.

And the last process generates the stimulus of the encryption_key and counter values with timing approximately aligned with the enable signal but slightly in advance. I’ve done this to ensure that I don’t get any odd effects arising from delta delays in the simulation.

------------------------------------------------------------------
--  Create a sequence of test vector pairs
------------------------------------------------------------------
test_vector : process is
begin
  encryption_key          <= x"00000000000000000000000000000000" ;
  counter                 <= x"0000000000000000" ;
  wait for 30 us ;
  encryption_key          <= x"0102040810204080fffefcf8f0e0c080" ;
  counter                 <= x"0000000000000000" ;
  wait for 30 us ;
  encryption_key          <= x"9e3779b99b9773e9b979379e6b695156" ;
  counter                 <= x"ffffffffffffffff" ;
  wait for 30 us ;
  encryption_key          <= x"0102040810204080fffefcf8f0e0c080" ;
  counter                 <= x"fffefcf8f0e0c080" ;
  wait ;
end process test_vector ;
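The relative timing of the enabler and test_vector processes can be checked with a little arithmetic, sketched here in C. It assumes CLK_PERIOD is 100 ns, matching the stated 10 MHz clock, since the constant's declaration is not shown. All names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CLK_PERIOD_NS   100u                  /* assumed: 10 MHz clock   */
#define ENABLE_LOW_NS   (5u * CLK_PERIOD_NS)  /* wait for CLK_PERIOD*5   */
#define ENABLE_HIGH_NS  30000u                /* wait for 30 us          */
#define VECTOR_STEP_NS  30000u                /* test_vector wait, 30 us */
#define ENCRYPT_CYCLES  196u                  /* quoted in the post      */

/* Time at which en_encrypt rises for run n (n = 0..3). */
static uint32_t enable_rise_ns(unsigned n)
{
    return ENABLE_LOW_NS + n * (ENABLE_LOW_NS + ENABLE_HIGH_NS);
}

/* Time at which the key and counter for run n are applied. */
static uint32_t vector_apply_ns(unsigned n)
{
    return n * VECTOR_STEP_NS;
}

/* How far ahead of the enable edge the inputs change. */
static uint32_t vector_lead_ns(unsigned n)
{
    return enable_rise_ns(n) - vector_apply_ns(n);
}

/* Duration of one encryption at the simulated clock rate. */
static uint32_t encrypt_time_ns(void)
{
    return ENCRYPT_CYCLES * CLK_PERIOD_NS;
}

/* Self-check: inputs always lead the enable, and 30 us is long enough. */
static void timing_self_check(void)
{
    for (unsigned n = 0; n < 4; n++) {
        assert(vector_lead_ns(n) == 500u * (n + 1u));
    }
    assert(encrypt_time_ns() < ENABLE_HIGH_NS);
}
```

Under these assumptions the inputs lead the enable edge by 0.5 µs on the first run, growing to 2 µs by the fourth, and each 19.6 µs encryption fits inside its 30 µs enable window.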

The last couple of elements in the testbench instantiate the unit under test at the end of the architecture and then configure the various testbench elements.

------------------------------------------------------------------
-- Unit under test
------------------------------------------------------------------
UUT : xxtea_encrypt
port map
  (
    clk                       => clk ,
    en_encrypt                => en_encrypt ,
    encryption_key            => encryption_key ,
    counter                   => counter ,
    cypher                    => cypher
  ) ;
end architecture TB_ARCHITECTURE;
------------------------------------------------------------------
-- Configure the testbench instances
------------------------------------------------------------------
configuration TESTBENCH_FOR_xxtea_encrypt of xxtea_encrypt_tb is
  for TB_ARCHITECTURE
    for UUT : xxtea_encrypt
      use entity work.xxtea_encrypt(rtl) ;
    end for ;
  end for ;
end configuration TESTBENCH_FOR_xxtea_encrypt ;

Again I’ve embedded the PDF so you can see the whole thing.

VHDL File PDF

xxtea_encrypt_TB.vhd

Simulation

Now that I have created the testbench it is a simple matter of running this in Active-HDL and then examining the waveform window. And from this I was able to see that the simulated data I generated from the testbench agreed with the test vector values I generated from my ‘C’ version above. So check out the waveforms in the attached PDF.

Simulation Waveform File PDF

xxtea_encrypt_waveform.pdf

Conclusions

So I’ve now shown that my VHDL code will correctly encode the counter and key values to produce the cypher text as predicted. This component now needs to be embedded into a system that I can use to create the encryption co-processing engine. I will start taking a look at how this will be done in my next post.
