



## **Agenda**

- Introduction: FPGAs for Embedded Systems
- The C2H Compiler
  - Overview
  - How the C2H compiler works
  - Optimising performance/size
- More information







## **Today's FPGA Devices Meet Embedded System Requirements**

- Wide range of fast I/O
- High-performance Digital Signal Processing (DSP) blocks
- Abundant logic
- Substantial embedded memory
- Low Cost FPGA and Structured ASIC families
- Soft Processor cores













# LG Electronics 3G Base Station



"Altera HardCopy Stratix devices provide a low-risk, cost-optimized, high-volume solution for our next generation 3G base station, eliminating the need for us to use an ASIC or standard product. By offering industry-leading density and a seamless migration path from Stratix FPGAs to HardCopy devices, Altera improves our time-to-market and lowers our costs, enabling us to penetrate new markets."

-Bong-Bin Park, Senior Vice President, CDMA Research Lab

#### **Application:**

3G Base Station

#### **Industry**:

Wireless Communications

#### **Altera Value Proposition:**

- Highest FPGA Capacity Enabled One-Chip Solution
- FPGA Flexibility Delivered Short Time-to-Market
- HardCopy Structured ASIC Provided Path to Cost Reduction

#### **Altera Products Chosen:**







# Toolrama DiabloSport Predator



#### **Application:**

**Automotive Diagnostic Tool** 

#### **Industry**:

**Automotive** 

#### **Altera Value Proposition:**

- Nios Processor + SOPC Builder + Low Cost Cyclone FPGA = Perfect Microprocessor Solution
- FPGA Flexibility Enables Acceleration via Custom Peripherals

"In our Predator product, the industry-leading low cost and flexibility of the Altera solution enabled us to replace an off-the-shelf processor, add features, and increase performance. Based on our successful experience with Cyclone devices, we will be adopting the Cyclone II family for all of our new product development, which will enable us to deliver even greater functionality at lower cost."

-Ivan Kotzig, Chief Engineer

**Altera Products Chosen:** 



Nios<sup>®</sup>II



## **Tait TM8100, TM8200, TM9100, and TP9100 Radios**



"Altera products provide us with the flexibility to configure combinations of complex embedded and signal processing functions that are not available in a single off-the-shelf processor solution. With Cyclone devices we create the exact combination of peripherals and functions we require, including complex multirate channel filters, automatic gain control, frequency control loops, and complex high performance modems."

-Tony Berggren, Radio Architectures Technology Leader

**Application:** 

Digital Portable Radio

**Industry:** 

Wireless Communications

#### **Altera Value Proposition:**

- Flexible Platform for Rapid Development of Multiple Product Lines
- Customization Not Possible with Standard Products
- Adopting Nios II Processor Reduces Power Consumption

**Altera Products Chosen:** 



Nios<sup>®</sup>II



## Navman TRACKFISH 6600



"Replacing an off-the-shelf processor with a Cyclone series device running a Nios processor enabled us to achieve the highest visual quality, as well as bring these improvements to a wide variety of display sizes. These benefits, combined with power consumption and cost savings, have led us to adopt the Nios processor as our preferred embedded solution."

-Shane Dooley, Marine GPS Product Manager

#### **Application:**

Marine Navigation Instrument

#### **Industry:**

**Digital Consumer** 

#### **Altera Value Proposition:**

- Reduced Component Cost and Power Consumption by Integrating Off-the-Shelf Processor
- Nios-Based Custom Microcontrollers Address Specific Design Requirements

#### **Altera Products Chosen:**



Nios<sup>®</sup>II



## Intevac NightVista



**Application:** 

Night Vision Camera

**Industry:** 

Industrial

#### **Altera Value Proposition:**

- Lower Cost Solution than DSP Processor
- High Integration, Small Form Factor
- Low Power

"Replacing an off-the-shelf DSP processor allowed us to reduce five separate boards of components to a single board occupied mainly by a Cyclone device running an embedded Nios processor. The Altera-based approach reduced our component costs by at least 20 percent and decreased our power consumption to a fifth of what it had been."

-David Main, Engineering Group Manager

**Altera Products Chosen:** 



Nios<sup>®</sup> II

# Sanyo PLV-Z5 Home Theater Projector









"By using the Stratix FPGA plus HardCopy structured ASIC solution for the PLV-Z5, we can design new features and functions more quickly and cost-effectively than alternative silicon solutions allow."

-Kazuto Sugimura, General Manager of Technology Unit, Projector Central Business Unit

#### **Application:**

**LCD Projector** 

#### **Industry**:

**Digital Consumer** 

#### **Altera Value Proposition:**

- Rapid, Cost-Effective Product Development Not Possible with Alternative Solutions
- Altera Device + Nios II Implements Image Enhancement Functions Resulting in Award-Winning Product

#### **Altera Products Chosen:**





# Loewe Spheros and Xelos LCD Televisions



"For several years Loewe has leveraged the flexibility of FPGAs, enabling us to respond quickly to differing display requirements without major PCB modifications. The density of the Cyclone II family now enables us to integrate our Image+ picture improvement algorithms at an attractive cost. Cyclone II's timescales matched our development schedule and Altera's on time delivery allowed us to meet our launch target for the Image+ equipped TV sets."

-Roland Bohl, Director R&D

#### **Application:**

LCD TVs

#### **Industry**:

**Digital Consumer** 

#### **Altera Value Proposition:**

- Cyclone Series FPGAs Deliver Low-Cost Image Enhancement Functions
- Flexible LCD Timing Broadens Pool of Available Panels, Reducing Manufacturing Costs

#### **Altera Products Chosen:**





## Blaupunkt TravelPilot Rome Car Audio / Sat.Nav. Unit



"The combination of Altera's programmable solutions for the automotive industry and excellent development support shortened our design time by six months. We were able to reduce design complexity by replacing multiple standard components with a single Cyclone device hosting a Nios II embedded processor, which eased our development effort and increased our product quality and reliability."

-Georg Sandhaus, Director of System Engineering

**Application:** 

**Auto Navigation** 

**Industry:** 

**Automotive** 

#### **Altera Value Proposition:**

- Low-Cost, Flexible Graphics Processing and Control Functions
- Shortened Design Time by Six Months
- Cyclone + Nios II Processor Enable Platform for Rapid Product Development

**Altera Products Chosen:** 





## Host Automation H2-EBC100 and H2-ECOM100





"Utilizing the Nios processor and Cyclone FPGA approach, we can get the exact mix of peripherals we need, in a package that we need, at a reasonable cost. In addition, we can reduce the number of unique parts in our inventory by using the same hardware platform for all of our designs."

-Bob Palermo, Senior Design Engineer

#### **Application:**

100 Base-T Ethernet Controllers for a PLC

#### **Industry**:

Industrial Automation

#### **Altera Value Proposition:**

- Nios Processor + Low Cost Cyclone FPGA = Perfect Microprocessor Solution
- FPGA Flexibility Enables Connection to Proprietary PLC Backplane and Custom Microcontroller Peripheral Set

#### **Altera Products Chosen:**



Nios<sup>®</sup>II



### Phoenix Contact ILC150 PLC



"Phoenix Contact have been using Altera and the Nios soft CPU since 2002 to develop a scalable hardware platform used in products like the ILC 350 ETH and ILC 390 PN 2TX-IB. We have also been able to develop a new, highly compact generation of controller called the ILC 150 ETH. The combination of an entirely FPGA-based platform with the Nios soft CPU has enabled us to deliver a very small but powerful controller with integrated Ethernet and INTERBUS interfaces at an extremely competitive market price."

-Roland Bent, Marketing and Development, Phoenix Contact

**Application:** 

**Programmable Logic Controllers** 

**Industry:** 

Industrial

#### **Altera Value Proposition:**

- Scalable Platform
- Low Cost
- Obsolescence-Free

**Altera Products Chosen:** 



Nios<sup>®</sup> II



© 2007 Altera Corporation

## Siemens AG SIMATIC MV220



"For the colour area Sensor SIMATIC MV220 we needed to integrate a complete image processing system into a compact form factor, presenting serious performance, heat dissipation and cost challenges. Using the flexible combination of Cyclone series FPGAs with the Nios II embedded processor we were able to achieve our goals and deliver a high performance, highly integrated and cost effective solution."

-Jens Hauffe, Product Manager, and Dr. Peter Thamm, Project Leader, Siemens AG

#### **Application:**

Color area sensor for manufacturing and packaging applications

#### **Industry**:

Industrial

#### **Altera Value Proposition:**

- High Performance
- Low Cost
- Flexibility

#### **Altera Products Chosen:**







### **Reducing System Costs - Integration**



## Replace External Devices with Programmable Logic





## **FPGA Provides Hybrid Approach**



Each function is implemented in the most appropriate location:

- External CPU
- FPGA based CPU(s)
- FPGA Logic



## Differentiate Your System With FPGA Based Functionality













## **SOPC Builder System Design**

#### 1. Select & Configure IP

#### 2. Select Connections









## **Cuts Weeks Off Development Time**



## the embedded masterclass

## **Traditional System Design**







## **SOPC Builder Integration**







### **Nios II Processor Overview**

- Family of configurable 32-bit RISC processors
- Automated processor configuration and integration of peripherals via SOPC Builder
- Integrate custom logic to add custom features and boost performance
- Performance up to 300 DMIPS (Nios II/f on Stratix III)
- Logic cost as low as 25¢ (Nios II/e on Cyclone III)









#### **Traditional Processor Acceleration**



Potential Issues With Power, Board Layout, Memory Speed, Device Cost & Availability





## **Accelerate Only What's Needed**



## Transferring Processing to Hardware Is Highly Effective, with Minimal Impact on the System

(E.g., image rotation: a 95 MHz Nios II with a C2H accelerator matches the performance of a 1.4 GHz PowerPC.)



## **Accelerating Software in FPGA**

- Add custom instruction
  - Ideal for discrete operations
- Add hardware accelerator
  - Processor & accelerator can run concurrently
  - More work per clock
  - Lower f<sub>MAX</sub>, power, cost
  - Ideal for block operations





Altera, Stratix, Cyclone, MAX, HardCopy, Nios, Quartus, and MegaCore are trademarks of Altera Corporation








## **SW->HW Accelerator Integration**

#### **Standard Flow**

- Profile Code
- Identify Bottlenecks
- Re-partition Memory
- Create an Accelerator
  - HDL
    - Design
    - Simulate
    - Integrate
  - Software driver
    - Write function
    - Integrate
    - Re-build 'C' code
- Verify
- Repeat based upon results

#### **Altera Flow**

- Profile Code
- Identify Bottlenecks
- Re-partition Memory
- Create an Accelerator
  - HDL
    - Design
  - Software driver
    - Write function
- Integrate HW and SW with SOPC Builder
- Verify
- Repeat based upon results





## Altera C2H – The SW/HW Accelerator Solution

 Generates a custom hardware accelerator from an ANSI C function.





## "Secret Sauce" Behind C2H Compiler

#### SOPC Builder

- Automatically connects CPU, accelerator & memory
- Understands memory latencies (for HW scheduling)
- Memory connection gives accelerator access to variable data and allows de-referencing of 'C' pointers

#### Avalon System Interconnect Fabric

- Automatically generated
- High bandwidth custom interconnect
- Master-Slave based design
- Dedicated connections with slave-side arbitration
- Any slave can connect to any master




## C2H Leverages Over 30 Man-Years of Investment in SOPC Builder & Avalon




## **C2H Design Flow**







- All data types
- All operators
- All control-flow constructs
- All looping constructs
- Macros
- Function-calls
- Pointer and array access
- Exceptions (not supported):
  - Floating point
  - Recursion
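Because recursion and floating point are the exceptions, a function headed for acceleration usually has to be rewritten without them first. The sketch below is a hypothetical illustration (the function names are ours, not from the C2H documentation): a recursive sum becomes a plain loop, and a floating-point gain of 0.75 becomes a Q8 fixed-point multiply and shift.

```c
/* Not acceleratable: uses recursion. */
int sum_to_recursive(int n)
{
    return (n <= 0) ? 0 : n + sum_to_recursive(n - 1);
}

/* C2H-friendly rewrite: the same result computed with a loop.
   Declarations are kept at the top of the block (ANSI C style). */
int sum_to_iterative(int n)
{
    int i;
    int total = 0;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Not acceleratable: floating-point arithmetic. */
float scale_float(float x)
{
    return x * 0.75f;
}

/* Fixed-point replacement: 0.75 in Q8 format is 192/256.
   Assumes non-negative x small enough that x * 192 does not overflow. */
int scale_q8(int x)
{
    return (x * 192) >> 8;
}
```

The fixed-point version trades some precision for integer-only hardware (one multiplier and a constant shift, which is just wiring); the choice of Q8 here is arbitrary.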







### C2H Works Best With...

- Time consuming loops (block based data)
  - Take data from buffer/s
  - Process (maths/computation intensive)
  - Write data to buffer/s
- Systems with multiple memories
  - Access to only one external RAM can become a bottleneck
- Use custom instructions for HW operations that do not involve blocks of data
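A function with the block-based shape described above might look like the following sketch (a hypothetical example, not one of the Altera design examples): it streams through two input buffers, does integer arithmetic, and writes an output buffer. Since pointer and array accesses become memory master interfaces, placing `src_a`, `src_b` and `dst` in different memories could let the accelerator read and write without all traffic queuing on a single external RAM.

```c
/* Hypothetical block operation: element-wise absolute difference.
   Reads from two buffers, computes, writes one buffer. */
void abs_diff_block(const int *src_a, const int *src_b, int *dst, int len)
{
    int i;
    for (i = 0; i < len; i++) {
        int d = src_a[i] - src_b[i];   /* one subtractor */
        dst[i] = (d < 0) ? -d : d;     /* compare and negate */
    }
}
```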





### C2H Is Not A Good Fit For...

- Developers with no intention of using Nios II
- Applications that need the very highest performance (development effort/time not an issue)
- Applications that require smallest implementation (number of LEs)
- Exception:
  - C2H can be used to confirm a bottleneck analysis, with the accelerator later replaced by hand-coded HDL





### Hand Crafted Accelerator vs. C2H

#### **Standard Flow**

- Profile Code
- Identify Bottlenecks
- Re-partition Memory
- Create/Modify an Accelerator
  - HDL
    - Design
    - Simulate
    - Integrate
  - Software driver
    - Write function
    - Integrate
    - Re-build 'C' code
  - Import into SOPC Builder
- Repeat based upon results

#### **C2H Flow**

- Profile Code
- Identify Bottlenecks
- Re-partition Memory
- Select C function
  - Right click to accelerate
  - Build
- Repeat based upon results







### **Dramatic Performance Boost**





# Step 1: Identify Software Bottlenecks

```
int main(void)
{
    /* ...variable declarations... */
    init();
    while (!error && got_data()) {
        do_user_interface();
        gather_statistics();
        if (got_new_data())
            d_transform(in_buf, out_buf);   /* block-processing call: candidate bottleneck */
        check_for_errors();
    }
    cleanup();
    return 0;
}
```








### Step 2: Right-Click to Accelerate

[Figure: Nios II IDE source editor showing the `main()` code from Step 1 with the right-click context menu open; the last menu entry is **Accelerate with the Nios II C2H Compiler**]

Right-click the function to accelerate (here, `d_transform()`) and select **Accelerate with the Nios II C2H Compiler**.










#### **Build Hardware Unit for Function**

```
void d_transform(int *t, int *p)
{
    int i;
    /* ...setup code... */
    for (i = 0; i < Buf_Size; i++) {
        /* ...loop overhead... */
        *t = transform(*p);   /* this function becomes a dedicated hardware accelerator */
        t++;
        p++;
    }
    /* ...exit code... */
}
```





#### **Integrate Into System**



1. Generate SOPC Component
2. Integrate into HW system
3. Generate SW control function
4. Integrate into SW build
5. Build Software and/or Hardware









#### **Select C2H Build and Run Options**

- Accelerated functions tab gets created
  - Select desired options then Build project









#### **Automated Acceleration With C2H**



# HW Accelerator Software Wrapper File

In Application or Release directory in Nios II IDE

```
#ifdef CHAC_THIS
#else
#include "stdio.h"
#include "io.h"

#ifndef ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE
#define ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE 0x008108E0
#endif

inline volatile int run_filter(int *dest_ptr, int *source_ptr, int length)
{
    /* Write the function arguments to the accelerator's argument registers */
    IOWR_32DIRECT(ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE, (4), (int)(dest_ptr));
    IOWR_32DIRECT(ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE, (8), (int)(source_ptr));
    IOWR_32DIRECT(ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE, (12), (int)(length));
    /* Write 1 to address 0 starts the accelerator */
    IOWR_32DIRECT(ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE, (0 * sizeof(int)), 1);
    /* Poll. When read from address 0 returns 1, the accelerator is done */
    while (IORD_32DIRECT(ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE, (0 * sizeof(int))) == 0)
        ;   /* empty loop body: busy-wait */
    /* Read back the function's return value */
    return *(volatile int *)(ACCELERATOR_C2H_DEMO_RUN_FILTER_CPU_INTERFACE0_BASE + (1 * sizeof(int)));
}
#endif /* CHAC_THIS */
```
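The wrapper drives the accelerator through a small register protocol: write the arguments to offsets 4, 8 and 12, write 1 to offset 0 to start, poll offset 0 until it reads non-zero, then read the result from offset 4. The sketch below models that handshake in plain C so it can run on a host. The register model, the `reg_write`/`reg_read` helpers and the stand-in computation `a + b * c` are all hypothetical; they are not part of the generated code or the Nios II HAL, where `IOWR_32DIRECT`/`IORD_32DIRECT` access real hardware.

```c
#include <stdint.h>

static uint32_t arg_regs[4];  /* write side; index = byte offset / 4 (1..3 used) */
static uint32_t done_reg;     /* read side of offset 0 */
static uint32_t result_reg;   /* read side of offset 4 */

/* Stand-in for the hardware: computes a + b * c and flags completion at once. */
static void model_accelerator_run(void)
{
    result_reg = arg_regs[1] + arg_regs[2] * arg_regs[3];
    done_reg = 1;
}

/* Hypothetical substitute for IOWR_32DIRECT(base, offset, value). */
void reg_write(unsigned offset, uint32_t value)
{
    if (offset == 0) {
        done_reg = 0;                 /* a new run clears the done flag */
        if (value == 1)
            model_accelerator_run();  /* writing 1 starts the accelerator */
    } else if (offset / 4 < 4) {
        arg_regs[offset / 4] = value;
    }
}

/* Hypothetical substitute for IORD_32DIRECT(base, offset). */
uint32_t reg_read(unsigned offset)
{
    if (offset == 0)
        return done_reg;
    if (offset == 4)
        return result_reg;
    return 0;
}

/* Same sequence as the generated wrapper: arguments, start, poll, result. */
int call_accelerator(int a, int b, int c)
{
    reg_write(4, (uint32_t)a);
    reg_write(8, (uint32_t)b);
    reg_write(12, (uint32_t)c);
    reg_write(0, 1);
    while (reg_read(0) == 0)
        ;                             /* poll until the done flag reads 1 */
    return (int)reg_read(4);
}
```

In this model the "hardware" finishes synchronously, so the poll loop exits immediately; on a real system that loop is where the CPU waits for the accelerator.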





#### **Logic Generation**

- "What you type is what you get"
- Every arithmetic operator you type...
  - Makes equivalent hardware unit in accelerator

- Control-flow statements generate a control state machine
- Pointer and array references create memory master interfaces
- Results strongly depend on target algorithm and coding style
  - Some algorithms accelerate better than others
  - Coding style impacts size and speed of accelerator





#### **Example:**

```
long long MAC(int *a, int *b, int len)
{
    long long result = 0;
    while (len > 0) {
        result += *a++ * *b++;   /* multiply, then 64-bit accumulate */
        len--;
    }
    return result;
}
```

- 1 32x32 multiplier
- 3 32-bit incrementers
- 1 64-bit adder
- 1 32-bit comparator
- 2 read-masters
- Nominal control logic



Every converted software function has its own dedicated hardware state machine





#### **Assignments**

#### General rule:

- "=" operator translates into a registered HW operation
- Calculation of the value takes one clock cycle



#### Two exceptions:

- Assignments that require zero logic elements in hardware (0 cycles)
- Assignments that use complex arithmetic, which requires logic that can take multiple cycles (>1 cycle)





- Certain logical and bitwise operations involving constants are trivial and require no logic
  - In hardware, they are performed by manipulating wires
  - If an assignment consists solely of such operations (see table below),
     then its result is not registered

**Operators That Can Result in Unregistered Assignments**

| Operator | Description          | Required Condition              |
|----------|----------------------|---------------------------------|
| `>>`     | Right bitwise shift  | Right-hand side is constant     |
| `<<`     | Left bitwise shift   | Right-hand side is constant     |
| `&`      | Bitwise AND          | Either operand is constant      |
| `\|`     | Bitwise inclusive OR | Either operand is constant      |
| `^`      | Bitwise exclusive OR | Either operand is constant      |
| `~`      | Bitwise inversion    | Right-hand side is unregistered |
| `(type)` | Type cast            | Right-hand side is unregistered |
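The rules in the table can be illustrated with two small functions. The cycle comments apply the rules stated above (constant shifts, constant masks and casts are wiring; an adder's result is registered) and are an expectation rather than actual C2H output; the compiler's performance report is the authority. The function names are hypothetical.

```c
#include <stdint.h>

/* Constant shift, constant mask and a cast only:
   expected to be pure wiring (unregistered, 0 cycles). */
uint32_t extract_field(uint32_t word)
{
    uint32_t field = (uint32_t)((word >> 8) & 0xFFu);
    return field;
}

/* The masks are wiring, but the addition needs an adder,
   so this assignment is expected to be registered (1 cycle). */
uint32_t add_fields(uint32_t a, uint32_t b)
{
    uint32_t sum = (a & 0xFu) + (b & 0xFu);
    return sum;
}
```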






#### **Mapping Software To Hardware**

- Software operations are assigned to hardware states
- Multiple software operations can be assigned to single hardware state (executed in parallel)
- Parallel execution is limited by data dependencies
  - E.g., if operation B depends on a value calculated in operation A, then B cannot execute until A has completed







#### Introduction To Data Dependencies

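```
int foo(int a, int b, int c)
{
    int x, y, z;
    x = a * b;    /* independent of y: can share a state with y */
    y = b * c;
    z = x + y;    /* depends on x and y: must wait for both multiplies */
    return z;
}
```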

States assigned directly from dependency graph









#### Performance/Resource Report

- C2H compiler results:
  - Logic created
    - Masters
    - Multipliers
  - Mapping of 'C' code to hardware states
  - Loop latency
  - Clocks per loop iteration (CPLI)





#### **Looping Data Dependencies**

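```
int foo(int a, int b, int c)
{
    int x, y;     /* unused in this example */
    int i = 0;
    while (i < 5) {
        a = a + 5;    /* loop-carried dependency: each iteration needs the previous a */
        i++;
    }
    return a;
}
```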

States assigned directly from dependency graph









## **CPLI = Clocks Per Loop Iteration**




## **Hardware Pipelines And CPLI**
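As a rough first-order model (our simplification, assuming the loop is fully pipelined and never stalls on memory), a loop of $N$ iterations with loop latency $L$ finishes in about

$$T \approx L + (N - 1)\cdot \mathrm{CPLI}$$

cycles. For the five-iteration example loop in this section, the C2H report figures (latency 4 and CPLI 3 before optimisation, latency 3 and CPLI 2 after) give

$$4 + (5 - 1)\cdot 3 = 16 \quad\text{versus}\quad 3 + (5 - 1)\cdot 2 = 11 \text{ cycles},$$

and as $N$ grows the CPLI term dominates, which is why the optimisations below target CPLI.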





#### **Optimising Data Dependencies**

```
int foo(int a, int b, int c)
{
    int x, y;
    int i = 0;
    while (i < 5) {
        x = a * b;
        y = b * c;
        a += x + y;
        a = a + 5;    /* second dependent update of a adds a stage */
        i++;
    }
    return a;
}
```

# Change code to eliminate last stage




## **CPLI Optimised**







- CPLI = 2
- 2 Multipliers
- 3 Adders




#### **Optimising CPLI**



CPLI = 3 (original code):

```
int foo_new(int a, int b, int c)
{
    int x, y;
    int i = 0;

    while (i < 5) {
        x = a * b;      /* line 41 */
        y = b * c;      /* line 42 */
        a += x + y;     /* line 43 */
        a = a + 5;      /* line 44 */
        i++;
    }
    return a;
}
```

C2H report:

```
line:40 Loop CPLI=3
    Loop latency: 4
    Cycles per loop iteration (CPLI): 3
    Critical loop variable: a
    Assignments in critical path
        line 41: x = (a * b): state 0 ---> 1
        line 42: y = (b * c): state 0 ---> 1
        line 43: a += (x + y): state 1 ---> 2
        line 44: a = (a + 5): state 2 ---> 3
```

CPLI = 2 (the two updates of `a` merged):

```
int foo_new(int a, int b, int c)
{
    int x, y;
    int i = 0;

    while (i < 5) {
        x = a * b;          /* line 41 */
        y = b * c;          /* line 42 */
        a += x + y + 5;     /* line 43 */
        // a = a + 5;       /* line 44, removed */
        i++;
    }
    return a;
}
```

C2H report:

```
The accelerated function contains 1 loop.
file:../demo.c line:40 Loop CPLI=2
    Loop latency: 3
    Cycles per loop iteration (CPLI): 2
    Critical loop variable: a
    Assignments in critical path
        line 41: x = (a * b): state 0 ---> 1
        line 42: y = (b * c): state 0 ---> 1
        line 43: a += ((x + y) + 5): state 1 ---> 2
```



#### **Optimising Hardware Resource**

```
int foo(int a, int b, int c)
{
    int x, y;
    int i = 0;
    while (i < 5) {
        x = a * b;    /* multiplier 1 */
        y = b * c;    /* multiplier 2 */
        a += x + y;
        a = a + 5;
        i++;
    }
    return a;
}
```

# Change code to reduce number of multipliers

$$(a * b) + (b * c) = b * (a + c)$$
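The rewrite relies on the distributive law, which also holds exactly for fixed-width hardware arithmetic that wraps modulo $2^{32}$. The sketch below checks this with unsigned 32-bit values (unsigned, because signed overflow is undefined in C even though the hardware simply wraps); the function names are hypothetical.

```c
#include <stdint.h>

/* Left-hand side: two multiplies and an add. */
uint32_t two_multipliers(uint32_t a, uint32_t b, uint32_t c)
{
    return (a * b) + (b * c);
}

/* Right-hand side: an add and a single multiply. */
uint32_t one_multiplier(uint32_t a, uint32_t b, uint32_t c)
{
    return b * (a + c);
}
```

Even when `a + c` wraps (for example `a = 0xFFFFFFFF, b = 3, c = 2`), both functions return 3.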






# **Resource Optimised**












Before (2 multipliers):

```
int foo_new(int a, int b, int c)
{
    int x, y;
    int i = 0;

    while (i < 5) {
        x = a * b;
        y = b * c;          /* line 47 */
        a += x + y + 5;
        i++;
    }
    return a;
}
```

```
Build report > Resources
    The accelerated function requires 2 Multipliers.
        line 47: y = (b * c)
```

After (1 multiplier, using b * (a + c)):

```
int foo_new(int a, int b, int c)
{
    int x, y;
    int i = 0;

    while (i < 5) {
        x = a + c;
        y = b * x;          /* line 51 */
        a += y + 5;
        i++;
    }
    return a;
}
```

```
Build report > Resources
    The accelerated function requires 1 Multiplier.
        line 51: y = (b * x)
```



#### Removing the Accelerator







#### **Benefits of C2H Compiler**

#### Improves productivity:

- Automated process
- Hit the performance target quicker
- Get more value out of the FPGA
- Finish designs sooner
- Accelerate legacy systems that are struggling to support new features

C2H Compiler: High Productivity Hardware Acceleration



#### **More Information on C2H**



- www.altera.com/C2H
  - C2H Overview
  - C2H White paper
  - Online demonstration
  - C2H User Guide
  - Image rotation and FIR design examples
- 1-hour tutorial "Nios II C2H Compiler Fundamentals" www.altera.com/training
- AN 420: Optimizing Nios II C2H Compiler Results (including design files)
- AN 417: Accelerating Functions with the C2H Compiler: Scatter-Gather DMA with Checksum (including design files)



Nios II processor. The second becomes a hardware accelerator, replacing the C code with equivalent logic in the FPGA. The speedup for the hardware version is typically more than two orders of magnitude. The exact speedup depends on the target FPGA.

