Firmware Development Kit
The ExaNIC FPGA development kit unlocks the FPGA technology within the ExaNIC, allowing customers to develop applications that run directly within the network card firmware. This allows for a number of interesting applications, some of which are demonstrated in examples provided with the development kit. The following examples come with the development kit, including the requisite source code for each:
- A 'fast responder' example shows how to pre-load the card with a reply ahead of time, and send it based on a simple mask/pattern match over received frames.
- A 'ping example' demonstrates various functionality, including sending frames directly from the card, making use of hardware timestamping, and using custom frames to communicate with software.
- A 'steering example' demonstrates how to perform user-defined flow steering. A simple destination IP based flow steering example is provided, which can easily be modified to perform steering based on application layer information.
- A 'bridging example' demonstrates how to bridge two ports together, such that traffic received on one port is transmitted out of another.
- A 'loopback example' allows users to benchmark the MAC latency of the ExaNIC. This example simply sends a response packet on receipt of the first byte off the wire.
Note that this document relates to the firmware development kit for the Ultrascale series of ExaNICs. It differs in several respects from the older 7-series FPGA FDK (i.e. for the ExaNIC X2 and X4). Please contact Exablaze for FDK documentation on this older series of cards.
A mailing list is available to be notified when updates to the ExaNIC FDK are released. Please feel free to add yourself here
Prior to using the development kit, customers must install Xilinx Vivado 2015.4 or later, which can be obtained from the Xilinx website. Note that Exablaze has observed instances where incorrect logic is synthesized using Vivado 2016.1 and 2016.2, so these versions are not recommended. Good results have been achieved with later versions, e.g. 2016.4 and 2017.1.
Xilinx added support for the XCKU035 FPGA used in the ExaNIC X10 and X40 to the free WebPACK license, so a paid Vivado license is not required to use the FDK on these cards.
The ExaNIC development kit ships as a tar file that contains a project directory structure. Untar the project directory structure to a convenient location.
The ExaNIC development kit ships with a build system for five fully functional example
applications. The build system consists of a Makefile and a Vivado TCL script (
The Makefile launches Vivado and instructs it to run the TCL script. All project source files and
the Vivado build process are defined within the TCL script.
The Makefile defines a number of targets and hardware platforms, which can be set by passing variables into the make command like so:
$ make TARGET=trigger PLATFORM=x10
The valid targets are
loopback and the two hardware platforms are
x40. In addition to the targets and platforms, there are also two build options:
- NOREBOOT=1 will disable FPGA reload when the PCIe reset line is asserted. This is useful if you want to load a bitstream onto the FPGA via JTAG, then perform a system reboot.
- NOTANDEM=1 will disable Xilinx tandem boot mode. Tandem has occasionally shown issues that result in systems failing power-on self-test for some builds. If you encounter any such issues, building with this option may help.
The build system will generate a number of files in the
outputs/ directory, including a standard
ExaNIC firmware image with a .fw extension that can be
flashed to an ExaNIC with the
exanic-config after flashing & rebooting will result in something similar to the following:
$ exanic-config
Device exanic0:
  Hardware type: ExaNIC X10
  Board ID: 0x00
  Temperature: 70.6 C
  VCCint: 0.94 V
  VCCaux: 1.86 V
  Function: customer application
  Firmware date: 20170106 (Fri Jan 6 01:30:05 2017)
  Customer version: 1485732321 (588f3d31)
Every FDK built by Exablaze is unique and watermarked for the customer it was built for. The Firmware date listed above is the date this FDK was built by Exablaze. The Customer version is the date/time the customer built this image. The date command can be used to convert this number to a human readable form if required:
$ date -d @1485732321
Mon Jan 30 10:25:21 AEDT 2017
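The same conversion can be done from a script. The following is a minimal Python sketch (Python is not part of the FDK, and the function name is hypothetical) that interprets the Customer version as a UNIX timestamp and prints it in UTC rather than in the local timezone.

```python
from datetime import datetime, timezone

def customer_version_to_utc(version: int) -> datetime:
    """Interpret an FDK 'Customer version' value as a UNIX timestamp (UTC)."""
    return datetime.fromtimestamp(version, tz=timezone.utc)

# Decimal value taken from the exanic-config output shown earlier.
built = customer_version_to_utc(1485732321)
print(built.isoformat())
```

The printed time is in UTC; 23:25:21 UTC on 29 January corresponds to 10:25:21 AEDT (UTC+11) on 30 January, which agrees with the date output.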
Transmit Buffer Size
The "base" logic provided with the FDK (e.g. PCS/MAC) uses a fixed portion of the FPGA's resources.
For example, the trigger_example built for the X10 uses approximately 10% of total BRAMs, and when built for the X40
this increases to approximately 20%.
These BRAMs are used, in part, as transmit buffers for host software where packets are staged in the FPGA prior to transmission onto the network.
The default (per port) transmit buffer size is as follows:
- ExaNIC X10 stock firmware: 128 kByte
- ExaNIC X40 stock firmware: 64 kByte
- ExaNIC X10 FDK: 32 kByte
- ExaNIC X40 FDK: 32 kByte
The transmit buffer size is reported by exanic-config when passing the verbose (
$ exanic-config exanic0 -v
Device exanic0:
  Hardware type: ExaNIC X10
  Board ID: 0x00
  Temperature: 50.0 C
  VCCint: 0.95 V
  VCCaux: 1.85 V
  Function: network interface
  Firmware date: 20170116 (Mon Jan 16 22:01:01 2017)
  PPS out: disabled
  Port 0:
    Interface: enp1s0
    Port speed: 10000 Mbps
    Port status: enabled, no SFP, no signal, no link
    MAC filters: 64 IP filters: 128
    TX buffer size: 128kB
    MAC address: 64:3f:5f:01:29:32
    RX packets: 31019943 ignored: 0 error: 0 dropped: 0
    TX packets: 3000605026
These transmit buffer sizes cannot be changed by customers using the FDK; however, Exablaze can rebuild the FDK with smaller or larger sizes on request. For example, if you need more BRAMs for your custom logic and are prepared to have smaller TX buffers available for host software, we can shrink the buffers down to 16 kByte (per port). For architectural reasons it is not possible to reduce them below 16 kByte.
The ExaNIC development kit can ship as a time-limited demonstration version. After two hours of operation, features of the ExaNIC will progressively shut down and stop working. The host can then be rebooted or power cycled to reset the two hour timer. Should you wish to purchase a full license and have this time limitation removed, contact the Exablaze sales team to discuss licensing options.
The ExaNIC development kit provides full access to all network transmit and receive datapaths,
as well as a register and memory space that can be accessed by the user's software application.
At the top level of the design hierarchy, exanic_devkit.v wraps both the ExaNIC IP core netlist
and the custom user application. This top level wrapper provides connections between the
ExaNIC netlist and the user's application. The example designs provide these connections and
can be used as a starting point for adding further functionality.
The user interface has the following clocking, reset and status signals as inputs:
- clk_net (1 bit), a 161 MHz clock generated from the Ethernet transmit clock. All signals with the _net suffix are synchronous to this clock.
- clk_host (1 bit), a 250 MHz clock generated from the PCIe bus clock. All signals with the _host suffix are synchronous to this clock.
- rst_n (1 bit), an enable line asserted soon after the clock is valid and present.
- hw_time_net (32 bit), a counter that is shared with the timestamp counter for received packets, having 6.2 ns resolution. This counter is synchronous to the
- hw_time_host (32 bit), this is simply hw_time_net crossed into the host clock domain (
- devkit_version (32 bit) provides a UNIX timestamp indicating the time at which the ExaNIC netlist was built at the factory.
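Because the hardware time counter is only 32 bits wide, software that compares two counter values has to allow for wraparound. The following Python sketch illustrates one way to do that. It treats the quoted 6.2 ns resolution as an exact tick period, which is an assumption made for illustration; the function and constant names are hypothetical and not part of the FDK.

```python
TICK_NS = 6.2        # nominal resolution quoted above (assumed exact here)
COUNTER_BITS = 32    # width of hw_time_net / hw_time_host

def elapsed_ticks(start: int, end: int) -> int:
    """Ticks from start to end, allowing for a single wrap of the counter."""
    return (end - start) & ((1 << COUNTER_BITS) - 1)

def elapsed_ns(start: int, end: int) -> float:
    """Elapsed time in nanoseconds under the nominal tick period."""
    return elapsed_ticks(start, end) * TICK_NS

# A measurement that straddles the wrap point still yields a small delta.
print(elapsed_ticks(0xFFFFFFF0, 0x00000010))
```

Under this assumption the counter wraps roughly every 26.6 seconds, so intervals longer than that cannot be measured from two counter samples alone.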
The user register interface allows the user application to implement up to 2048 readable and/or
writeable 32 bit registers. On this interface, reads and writes happen a full 32 bit word at a time,
with no individual byte enables. All signals in this section are synchronous to
interface is implemented using the following signals:
- reg_w_en (1 bit), asserted on the same cycle as reg_w_data to indicate a register write request from the host.
- reg_w_data (32 bit), the write data from the host.
- reg_w_addr (11 bit), the address of the register the host wants to write to. This address increments for each 32 bit word, and is not a byte offset.
- reg_r_addr (11 bit), the address of the register the host wishes to read. As with the write address, this address increments for each 32 bit word.
- reg_r_en (1 bit), a read enable signal asserted with reg_r_addr that indicates the address is valid.
- reg_r_data (32 bit) is the data for the register selected by reg_r_addr. Data must be provided when
- reg_r_ack (1 bit) must be asserted when reg_r_data is valid in response to a read. The user logic has 16 cycles in which to assert reg_r_ack in response to reg_r_en before the read times out. The PCIe logic will reply with an unsupported request TLP on timeout.
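The word addressing and the read timeout described above can be captured in a few lines. The following Python sketch is a software model only; the helper names are hypothetical, and treating an acknowledge on exactly the 16th cycle as acceptable is an assumption, since the text does not spell out the boundary.

```python
NUM_REGS = 2048            # 11 bit word address
ACK_TIMEOUT_CYCLES = 16    # budget for asserting reg_r_ack after reg_r_en

def reg_addr_to_byte_offset(reg_addr: int) -> int:
    """Convert a reg_r_addr/reg_w_addr word address to a byte offset."""
    if not 0 <= reg_addr < NUM_REGS:
        raise ValueError("register address out of range")
    return reg_addr * 4    # one address step per 32 bit word

def read_times_out(ack_latency_cycles: int) -> bool:
    """True if reg_r_ack arrives later than the allowed budget (assumed inclusive)."""
    return ack_latency_cycles > ACK_TIMEOUT_CYCLES

print(reg_addr_to_byte_offset(4), read_times_out(1))
```

The example designs acknowledge reads on the cycle after reg_r_en, which is comfortably inside this budget.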
The user memory interface allows the application to implement a write-only (for the host)
memory space. Reading back of this memory by the host is not supported. This can be useful for
the implementation of transmit buffers and maps well to block memories. All writes are
performed synchronous to clk_host. This interface is implemented using the following signals:
- mem_w_en (32 bit), a write byte enable, asserted for each byte offset from mem_w_addr that the host wishes to write to. The LSB (bit 0) of the write enable signal refers to the byte at offset 0 from the write address.
- mem_w_addr (19 bit), the memory offset at which the host intends to write. This is the DWORD (32 bit) offset from the development kit region in BAR2.
- mem_w_data (256 bit), up to 32 bytes of data (selected by the write enables) that the host wishes to write.
Note that the memory interface is always 'address aligned'. This means that in practice,
mem_w_addr[2:0] is always zero, and the byte enables must be used to determine which bytes
will be written to.
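To make the addressing and byte enable rules concrete, the following Python sketch applies a single write beat to a byte-addressed model of the user memory. It is an illustration, not FDK code: the function name is hypothetical, and the mapping of byte lane i to bits [8*i+7 : 8*i] of mem_w_data is an assumption that mirrors the stated meaning of bit 0 of mem_w_en.

```python
def apply_mem_write(mem: bytearray, mem_w_addr: int, mem_w_en: int, mem_w_data: int) -> None:
    """Apply one write beat to a byte-addressed model of the user memory."""
    if mem_w_addr & 0x7:
        raise ValueError("mem_w_addr[2:0] is expected to be zero")
    base = mem_w_addr * 4                    # DWORD offset -> byte offset
    for lane in range(32):                   # one enable bit per byte lane
        if (mem_w_en >> lane) & 1:
            # Assumption: lane i occupies bits [8*i+7 : 8*i] of mem_w_data.
            mem[base + lane] = (mem_w_data >> (8 * lane)) & 0xFF

mem = bytearray(64)
data = int.from_bytes(bytes(range(32)), "little")
apply_mem_write(mem, mem_w_addr=8, mem_w_en=0b110, mem_w_data=data)
print(mem[32:36].hex())
```

Here only lanes 1 and 2 are enabled, so a host write of two bytes to byte offsets 33 and 34 is modelled while the neighbouring bytes are left untouched.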
The user application can monitor all received frames and tag, modify, or drop them prior to
them being transferred to the host. All received data is provided synchronous to
first byte seen by the user application will be the first byte of the destination MAC address - the
SFD and preamble are not provided. The receive data is provided via the following signals:
- rx_data_net (64 bits per port), 8 data bytes as received from the wire. Byte 0 (bits 7 to 0) is the first to be received off the wire.
- rx_sof_net (1 bit per port), asserted on the same cycle as the first 8 bytes received from the wire. The 64 bits of data will contain the 6 byte destination MAC address and the first 2 bytes of the source MAC address.
- rx_eof_net (1 bit per port), asserted on the very last cycle of a received frame. The last bytes seen will include the four byte received CRC.
- rx_len_net (3 bit per port), valid on the same cycle as the EOF, indicates how many bytes in the final data signal are valid. As an example, if this reads 1, then only the bottom byte of data (bits 7 to 0) is valid.
- rx_vld_net (1 bit per port), asserted to indicate that receive data is valid. Due to the overhead of the 64b/66b encoding used in 10G ethernet, there may be cycles intra-frame that do not contain valid data.
- rx_err_net (1 bit per port), asserted to indicate an abnormal frame termination condition. This can occur when the sender aborts the frame early, or if the link is lost in the middle of a frame. If a frame is terminated with rx_err_net, there will be no rx_eof_net for that frame.
- rx_crc_fail_net (1 bit per port), asserted on the same cycle as EOF to indicate that the CRC does not match the received data.
- rx_timestamp_net (32 bit per port), a counter that serves as the timestamp for the first byte of the received frame, with 6.2 ns resolution.
Note that the width of each of the above signals scales with the number of ports. To select the
set of signals for a given port, use bit slicing. For example, RX data for port 0 will occupy
rx_data_net[63:0], and RX data for port 1 will occupy rx_data_net[127:64]. The example designs
show how to perform this bit slicing or indexing for each of the signals on this bus. This note
applies to all of the ethernet frame interfaces in the FPGA development kit.
A minimum of 1 spare cycle is required between EOF and SOF being asserted. Normally frames coming off the wire will have at least this (even at full line rate, due to Ethernet's minimum inter-frame gap). However, if the user application is generating frames to send up to the host in addition to those coming off the wire, then this requirement must be observed by the user logic.
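The per-port bit slicing described above can be modelled in software by treating the whole bus as one large integer. The following Python sketch is an illustration only; the function names are hypothetical and the bus values are made up for the example.

```python
def port_slice(bus: int, port: int, width: int) -> int:
    """Return the width-bit field belonging to the given port of a concatenated bus."""
    return (bus >> (port * width)) & ((1 << width) - 1)

def beat_bytes(rx_data: int) -> bytes:
    """Split one 64 bit beat into bytes in wire order (byte 0 is bits 7 to 0)."""
    return rx_data.to_bytes(8, "little")

# Two ports' worth of rx_data_net: port 0 in bits 63:0, port 1 in bits 127:64.
port0 = int.from_bytes(bytes.fromhex("643f5f0100000011"), "little")
port1 = 0xAABBCCDDEEFF0011
bus = (port1 << 64) | port0
print(beat_bytes(port_slice(bus, 0, 64)).hex())
```

The same helper works for the single-bit flags by passing width=1, and for rx_len_net by passing width=3.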
The receive net signals described above are all inputs, but the user application has a duplicate
set of signals for the receive path that are outputs. These signals mirror the above signal names,
but are suffixed with _host instead of _net. These host signals are synchronous to
outputs map directly to the ethernet frame DMA engines for each port. To allow the port to
function as a regular ethernet interface, the user can simply connect the net interface to its
corresponding host interface using the provided asynchronous FIFO. There are two additional
- rx_match_host (8 bit per port), allows the user application to tag frames with an 8 bit code with application specific meaning. This code will be provided in the information section of each chunk of the frame that is transferred to the host.
- rx_buffer_host (6 bit per port), allows the user application to steer frames to different userspace buffers on the host system. For more information on custom flow steering, see the flow steering example design section of this document.
The example in the figure below shows a frame as it traverses the card frame interfaces. This
frame has a destination MAC of
64:3F:5F:01:00:00 and a source MAC of
the ethertype value, 0x0800, indicates that it is an IP frame. Notice that the SOF flag is asserted
only on the first beat, and the EOF and LEN flags are set on the last beat of the frame. The LEN
field indicates that 3 bytes of the frame are valid on the last beat.
An important change relative to the X2 and X4 FPGA development kits is the interpretation of len = 0 during the end of frame. This condition now means that no bytes are valid. This is different from the X2 and X4 kits, where this condition was used to signal that 8 bytes were valid.
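The beat format, including the changed meaning of len = 0 at end of frame, can be modelled from the receiving side. The following Python sketch rebuilds the bytes of a frame from a list of (data, vld, eof, len) tuples for one port. It is a software illustration with hypothetical names; the legacy_len switch models the older X2/X4 rule purely for comparison.

```python
def reassemble(beats, legacy_len=False):
    """Rebuild frame bytes from (data, vld, eof, length) beats of one port.

    legacy_len=True models the X2/X4 rule where len == 0 at EOF meant 8 bytes.
    """
    frame = bytearray()
    for data, vld, eof, length in beats:
        if not vld:
            continue                      # idle cycle inside the frame
        if eof:
            valid = 8 if (legacy_len and length == 0) else length
        else:
            valid = 8                     # every non-final valid beat is full
        frame += data.to_bytes(8, "little")[:valid]
        if eof:
            break
    return bytes(frame)

beats = [
    (int.from_bytes(b"ABCDEFGH", "little"), 1, 0, 0),
    (0, 0, 0, 0),                                       # gap: vld low
    (int.from_bytes(b"IJK\0\0\0\0\0", "little"), 1, 1, 3),
]
print(reassemble(beats))
```

With the current rule, an EOF beat carrying len = 0 contributes no bytes, whereas the same beat under the legacy rule would contribute all eight.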
The host transmit interface is also exposed to the ExaNIC development kit application. The user
application can monitor and modify frames that are being transmitted by the host, as well as
transmit frames of its own. Ethernet frames transmitted by the user application must start with
the first byte of the destination MAC address, and end at the last byte of the payload. Logic
within the ExaNIC automatically calculates, appends and transmits the CRC. The FPGA
application has the following signals which connect through to the ethernet transmission logic,
all of which are synchronous to
- tx_data_net (64 bits per port), the data to be transmitted, 8 bytes at a time. The first byte that will be placed on the wire is located at bits 7 to 0.
- tx_sof_net (1 bit per port), to be asserted on the same cycle as the first 64 bits of data.
- tx_eof_net (1 bit per port), to be asserted on the same cycle as the last 64 bits of transmitted data.
- tx_len_net (3 bits per port), to be set on the same cycle as EOF, indicating the number of bytes of data valid in the last cycle.
- tx_vld_net (1 bit per port), asserted by the user application when the transmit data output is valid. Once tx_sof_net is asserted, this signal must remain high until the end of frame.
- tx_ack_net (1 bit per port), acknowledge signal provided to the user application. The ExaNIC can be considered to have read transmit data from the application on any rising clock edge during which 'ack' and 'vld' are simultaneously high.
- tx_err_net (1 bit per port), is an output for the user application. This signal allows the user application to send a frame with an incorrect FCS. If this signal and tx_ack_net are high for one clock cycle while transmitting a frame, the frame will have an incorrect FCS.
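The valid/ack handshake on the transmit interface means that the user logic must hold each beat stable until it is acknowledged. The following Python sketch is a cycle-by-cycle software model of that rule, with hypothetical names; it is not a description of how the ExaNIC generates tx_ack_net.

```python
def drive_frame(beats, ack_pattern):
    """Model a source that holds each beat, with vld high, until ack is also high.

    Returns (cycles_used, beats_accepted_in_order).
    """
    accepted = []
    idx = 0
    cycles = 0
    for ack in ack_pattern:
        if idx == len(beats):
            break                     # frame finished; vld would be dropped here
        cycles += 1
        vld = True                    # vld stays high from SOF until EOF is taken
        if vld and ack:
            accepted.append(beats[idx])
            idx += 1                  # advance only on a cycle with vld and ack
    return cycles, accepted

print(drive_frame(["beat0", "beat1", "beat2"], [0, 1, 1, 0, 1, 1]))
```

In this run the three beats are accepted on cycles 2, 3 and 5; the source simply repeats the pending beat on cycles where ack is low.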
A corresponding set of signals provides frames transmitted by the host. These have the same
names as the above signals, but with _net replaced with _host. The host transmit signals are
driven by the ExaNIC transmit engine, and provide access to frames sent from software. The
host transmit signals are synchronous to clk_host. A port will act as a normal ethernet interface
if the net signals are directly connected to the corresponding host signals using the provided
Normally, the ExaNIC transmit engine will pad frames sent down from host software that are below
the minimum frame size (<64 bytes). The user application can elect to disable this padding on a
per port and per frame basis by asserting the disable_tx_padding flag. The flag is sampled at
The example design provides code that shows how to multiplex FPGA generated frames with the host data path. It also provides code for a state machine that conforms to this interface, reading frames from block RAM.
The user application can interface with software via its address space, as well as by modifying and tagging received packets prior to them being transferred to the host. In the first instance, pointers to the register and memory address space can be obtained using libexanic, calling:
- exanic_get_devkit_registers() to get a pointer to unsigned 32 bit values in the register space, and
- exanic_get_devkit_memory() to get a pointer to byte values in the memory space.
The value and meaning of the registers and memory in these address spaces are dependent on the user's FPGA application.
Utilities for reading and writing to the user register space are provided in the
For example, in trigger_example.v of the FDK, the registers are defined like so:
/* Register reads. */
always @ (posedge clk_host)
begin
    reg_r_ack <= reg_r_en;
    case (reg_r_addr)
        'h0: reg_r_data <= FIRMWARE_ID;
        'h1: reg_r_data <= VERSION;
        'h2: reg_r_data <= armed;
        'h4: reg_r_data <= match_length;
        ...
        ...
FIRMWARE_ID is defined to be 32'hEB000001, so reading register 0 yields:
$ ./exanic-devkit-register-read exanic0 0
0x000: 0xEB000001 (-352321535)
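The value in parentheses in that output is the same 32 bit word interpreted as a signed integer. The following Python sketch shows the reinterpretation and a simple firmware ID check of the kind the example software performs. The function names are hypothetical; only the FIRMWARE_ID value comes from the example above.

```python
FIRMWARE_ID = 0xEB000001   # value of register 0 in trigger_example.v

def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32 bit register value as two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def is_expected_firmware(reg0: int, expected: int = FIRMWARE_ID) -> bool:
    """Compare the word read from register 0 against the expected firmware ID."""
    return (reg0 & 0xFFFFFFFF) == expected

print(hex(FIRMWARE_ID), to_signed32(FIRMWARE_ID), is_expected_firmware(0xEB000001))
```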
The user application can also communicate with the host via dummy ethernet frames. An example of this is shown in the ping example application, where a dummy frame with a custom ethertype is DMA transferred to the host. This frame is received using libexanic and contains user-defined data.
TCP stack integration
The ExaNIC driver package includes support for exasock extensions. These extensions allow applications to obtain the next set of TCP headers for a particular socket. When used in conjunction with the development kit, these functions allow the host to manage TCP state (through transparently bypassed kernel sockets, via exasock) and allow the card to send 'fast' responses in response to user defined events.
Within the driver source tree, the exasock-tcp-responder-example.c example application
shows how to use these functions with the trigger example firmware. This example shows how
normal UNIX socket calls can be used to make a TCP connection to a server, with the card
sending a TCP reply in response to a received UDP packet.
The ExaNIC FPGA development kit ships with source code for IP cores that are useful for performing common tasks.
Field extract (field_extract.v)
The field extract core can be used to extract an arbitrary length field from received frames. To use the core, instantiate it by specifying the following two parameters:
- BYTES: The byte width of the field to extract.
- OFFSET: The offset in bytes of the field in the frame, measured from the start of the frame.
The core can be wired directly to the development kit frame interfaces via its
inputs. The field extract core will strobe the field_vld output for one clock cycle when the field
output contains the value of the field in the currently received frame.
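The effect of the BYTES and OFFSET parameters can be illustrated with a software model. The following Python sketch returns the extracted field together with the index of the 8 byte beat that carries its last byte, which is the earliest point at which the field could be complete. This is a behavioural illustration with hypothetical names; the actual latency of field_vld in the core is not stated here and is not modelled.

```python
def field_extract(frame: bytes, offset: int, nbytes: int):
    """Return (field, completing_beat), or None if the frame is too short.

    completing_beat is the index of the 8 byte beat holding the last byte
    of the field, counting the SOF beat as beat 0.
    """
    end = offset + nbytes
    if end > len(frame):
        return None
    return frame[offset:end], (end - 1) // 8

frame = bytes(range(64))
print(field_extract(frame, offset=12, nbytes=2))   # ethertype position in an untagged frame
```

A field that straddles a beat boundary, such as 4 bytes at offset 30, only becomes complete on the later of the two beats.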
Frame mux (frame_mux.v)
The frame mux core provides a way to share a single frame output interface (for example,
tx_usr) between two sources of frames. It provides buffering so that interfaces that
cannot be 'stalled', such as the receive interface, can be arbitrated without loss of data. A
typical application is shown in the ping example application, where the frame mux is used to
share the host DMA datapath between received frames and FPGA generated frames.
The frame mux also allows two ports to be 'bridged' together, much like the ExaNIC bridging functionality. As an example, the frame mux can be used to connect port 0 receive to port 1 transmit, whilst also allowing the host to transmit via port 1. In this mode of operation, the frame mux has an optional FCS removal mode. This is required because received packets are provided to the user application with the FCS present, whereas the transmit interfaces expect frames without an FCS and append it themselves.
The frame mux core has the following parameters:
- DEPTH: The total buffering depth of the two FIFOs contained within the frame mux. This is the maximum number of QWORDs that the frame mux can store.
- IN0_DELAY, IN1_DELAY: The amount of 'prebuffering' to apply to a particular input of the mux, prior to providing it to the output. This is useful when connecting the receive of one port to the transmit of another, since Ethernet clock mismatch may result in transmitter starvation unless enough of the packet is available in a buffer prior to beginning the transmit process.
- STRIP_FCS0, STRIP_FCS1: selects whether to remove the last 4 bytes from a particular input. Useful for removing the FCS from a received packet prior to transmitting it out another port.
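The reason FCS stripping matters when bridging can be seen in a small software model. The following Python sketch appends and removes the 4 byte Ethernet FCS using the standard CRC-32 from Python's zlib module, with the FCS emitted least significant byte first. The function names are hypothetical and this is not the FDK implementation.

```python
import zlib

def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append the 4 byte Ethernet FCS (CRC-32, least significant byte first)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + fcs.to_bytes(4, "little")

def strip_fcs(frame_with_fcs: bytes) -> bytes:
    """Drop the trailing 4 bytes, as STRIP_FCS0/STRIP_FCS1 are described as doing."""
    if len(frame_with_fcs) < 4:
        raise ValueError("frame shorter than an FCS")
    return frame_with_fcs[:-4]

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Check that the trailing 4 bytes match the CRC-32 of the rest of the frame."""
    return append_fcs(strip_fcs(frame_with_fcs)) == frame_with_fcs

received = append_fcs(bytes(range(60)))      # what the receive interface would present
to_transmit = strip_fcs(received)            # what the transmit interface expects
print(len(received), len(to_transmit), fcs_ok(received))
```

If a received frame were passed to the transmit interface without stripping, the transmit logic would append a second FCS after the first.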
Valid/ack bus mux (vabus_mux.v)
The valid/ack bus mux core provides the same functionality as the frame mux core but without any buffering or registering delays. This is useful where latency is important. A typical application is the muxing of custom transmit logic together with the normal ExaNIC transmit logic. This use case is shown in both the trigger and ping examples.
Custom framegen (custom_framegen.v)
The custom framegen core generates a custom broadcast ethernet frame that contains 4 QWORDs set by inputs to the module. This is useful for generating packets on the card and sending them to the host application. An example of this is shown in the ping example application, where the custom framegen core is used to send timestamps to the host.
The CUSTOM_ETHERTYPE parameter to the module allows the user to specify the ethertype of the frame. Setting the ethertype to a non-standard value will result in normal kernel processes safely ignoring the packet.
Asynchronous FIFO (async_fifo.v)
The asynchronous FIFO provides fast clock domain crossing between two domains. Data is
written into the FIFO synchronous to
wren is asserted by the user, provided
the FIFO is not asserting
Data is read from the FIFO synchronous to
clk_read, on any cycle when
rden are both
Flag Synchronizer (flag_sync.v)
The flag sync module is used to cross a single bit flag between two asynchronous clock domains. The flag should be asserted for a single cycle in the input clock domain. Internal logic will then safely cross this flag such that it will then be asserted for a single cycle in the output clock domain. Note that this module assumes that the flag will be asserted relatively infrequently in the input clock domain.
The full source code is provided for all of the example applications described in this section. In all of the following examples a convention is used whereby register zero (0) in the development kit register address space reports a 'firmware ID'. This firmware ID is read by the software side of the example to verify that the correct firmware is running on the ExaNIC.
The trigger example application allows users to pre-load the card with a pattern, mask and reply frame. The application performs a match on port 0 of any incoming frame against the pattern and mask, and if a match occurs the application will transmit the reply frame. This application can be used as a starting point for more advanced custom logic.
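A pattern and mask match of this kind can be expressed compactly in software. The following Python sketch assumes that a set mask bit means "compare this bit" and that matching starts at the first byte of the frame; both are assumptions made for illustration, and the trigger example firmware may use a different polarity or match length rule. The function name is hypothetical.

```python
def masked_match(frame: bytes, pattern: bytes, mask: bytes) -> bool:
    """True if frame agrees with pattern on every bit where mask is 1.

    Assumption: a set mask bit means 'compare this bit'.
    """
    if len(pattern) != len(mask) or len(frame) < len(pattern):
        return False
    # zip stops at the shortest input, so trailing frame bytes are ignored.
    return all((f & m) == (p & m) for f, p, m in zip(frame, pattern, mask))

pattern = bytes([0x08, 0x00])
print(masked_match(bytes([0x08, 0x00, 0x45]), pattern, bytes([0xFF, 0xFF])))
print(masked_match(bytes([0x08, 0x06]), pattern, bytes([0xFF, 0x00])))
```

Clearing mask bits turns the corresponding positions into "don't care" bits, which is why the second call matches even though the second byte differs.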
All source code for this application is included in the src/trigger_example directory of the
development kit package. The files include:
- ram_256_64.v, which implements a block RAM interface compatible with the development kit memory addressing scheme.
Two sample C applications for interfacing with this application are provided in the ExaNIC driver
examples/devkit/. One example,
libexanic-responder-example, shows how
to use the low level API to preload the card with a frame. The other,
shows how to use Exasock extensions to integrate the host TCP state with the FPGA
In both of these applications, the software primes the FPGA trigger to match on incoming IP frames, and loads in a dummy reply. The application reports any time the FPGA logic has triggered. The libexanic application can be started using:
$ ./libexanic-responder-example exanic0
The exasock example will attempt to connect to the specified TCP address/port combination. Once a connection has been established, any UDP packet that is received on the UDP port will trigger a 'hello world' packet to be sent via the TCP connection. The exasock application can be started using:
$ exasock ./exasock-tcp-responder-example <udp-port> <tcp-addr> <tcp-port>
Note that the example application is only implemented on the FPGA for port 0, and all ports operate as normal network interfaces.
The ping example uses an ICMP echo request to perform a hardware timestamped ping. The firmware takes a source IP address and destination IP address. This triggers a state machine to start by checking an ARP table for an entry that resolves the remote IP to a MAC address. If no entry for the IP address is found, the hardware sends an ARP request for the IP out on the wire and waits for a reply. When an ARP reply is received, an entry is inserted into the ARP table and the ARP table lookup performed again. The hardware then sends an ICMP echo request, filling the body of the request with a hardware timestamp, then waits for a reply. When the reply is received, the hardware sends a custom frame to the software application that contains the transmit and receive timestamps. Both ICMP and ARP requests have timeouts of 1 second associated with them, and will result in an error message sent to the host on timeout.
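The sequence described above can be summarised as a small decision flow. The following Python sketch walks through it and returns the list of actions taken. It is a software paraphrase of the description with hypothetical names and simplified inputs, not a model of the actual state machine in the firmware.

```python
TIMEOUT_S = 1.0   # both ARP and ICMP requests time out after 1 second

def ping_sequence(dst_ip, arp_table, arp_reply=None, icmp_reply_after=None):
    """Walk the described sequence and return the list of actions taken.

    arp_reply: MAC returned by the network for dst_ip, or None for no reply.
    icmp_reply_after: seconds until the echo reply arrives, or None for no reply.
    """
    actions = []
    if dst_ip not in arp_table:
        actions.append("send ARP request")
        if arp_reply is None:
            return actions + ["report ARP timeout to host"]
        arp_table[dst_ip] = arp_reply          # insert, then look up again
    actions.append("send ICMP echo request with TX timestamp")
    if icmp_reply_after is None or icmp_reply_after > TIMEOUT_S:
        return actions + ["report ICMP timeout to host"]
    return actions + ["send custom frame with TX/RX timestamps to host"]

table = {}
for step in ping_sequence("10.0.0.2", table, arp_reply="64:3f:5f:01:00:00", icmp_reply_after=0.001):
    print(step)
```

A second ping to the same address skips the ARP request, because the entry inserted by the first run is found in the table.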
The ping example demonstrates the following functionality within the devkit:
- Sending pre-defined packets with values of certain fields substituted with values calculated in the FPGA. This is demonstrated in the ARP framegen and ICMP framegen modules.
- Parsing received packets and extracting information from them. The ICMP echo parse and ARP parse modules demonstrate this functionality.
- Communicating with the software application by sending a custom frame from the FPGA via the DMA interface. The custom framegen module and frame mux modules demonstrate how to interleave custom frames with frames that are received from the wire.
- Basic lookup table example (ARP table).
- Use of hardware timestamping functionality.
To run the ping example, use:
$ ./ping-example <device> <dst-ip> <src-ip>
This will send ARP and ICMP packets originating from
src-ip to the host at
device must be an ExaNIC with the ping example firmware loaded.
Flow steering example
The devkit can be used to perform flow steering based on any field within an ethernet frame. The raw frame API, libexanic, can be used to allocate DMA buffers, each of which is automatically assigned a unique ID. This ID can be passed to the card and provided to the RX host interface in conjunction with the frame in order to steer the frame to that buffer. Applications include per-symbol filtering of market data or more advanced, stateful filtering.
The flow steering example provided in the devkit demonstrates how to use this functionality to steer IP packets destined for a particular IP address to a designated buffer. Applications that monitor this buffer will only see packets that are destined for this IP address. Users can adapt this application to their requirements.
The rx_buffer_host port in the devkit can be used to pick the host receive buffer that the current
frame is sent to. The value applied by the user application to this port must be ready at the
same time as the 15th valid data beat is applied to the corresponding rx_host interfaces, or at
the end of frame, whichever occurs first. Once set, this value must remain the same for the
duration of the frame until EOF+2 cycles.
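The steering decision made by the destination IP example can be sketched in software as follows. This Python model assumes an untagged Ethernet II frame carrying IPv4, so the destination address sits at byte offset 30; the use of buffer 0 as the default and all function and parameter names are assumptions made for illustration.

```python
import ipaddress

ETHERTYPE_OFFSET = 12        # untagged Ethernet II
IPV4_DST_OFFSET = 14 + 16    # Ethernet header + destination address offset in IPv4

def pick_rx_buffer(frame: bytes, steer_ip: str, steer_buffer: int, default_buffer: int = 0) -> int:
    """Return the buffer ID to present on rx_buffer_host for this frame."""
    if len(frame) < IPV4_DST_OFFSET + 4:
        return default_buffer
    if frame[ETHERTYPE_OFFSET:ETHERTYPE_OFFSET + 2] != b"\x08\x00":
        return default_buffer                      # not IPv4
    dst = frame[IPV4_DST_OFFSET:IPV4_DST_OFFSET + 4]
    if dst == ipaddress.IPv4Address(steer_ip).packed:
        return steer_buffer
    return default_buffer

frame = bytes(12) + b"\x08\x00" + bytes(16) + bytes([192, 168, 1, 10]) + bytes(20)
print(pick_rx_buffer(frame, "192.168.1.10", steer_buffer=5))
```

Under these assumptions the destination address is complete by the fifth 8 byte beat, well ahead of the 15th valid beat by which rx_buffer_host must be ready.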
The bridging example demonstrates the use of the frame mux for bridging of two ports on the card. Bridging involves looping back any received data on one port to the transmit datapath on another port.
The loopback example demonstrates the latency of the ExaNIC MAC layer. It does this by sending a packet out of port 0 as soon as the start of frame is seen on the RX datapath of port 0. Note that this demo logic just sends a small frame of all 0xFF's (plus CRC) - it does not loopback the incoming frame.
Testbench and functional model
The ExaNIC development kit is provided with a full functional model for all of the individual
interfaces. This can be found in the tb/ directory of the package. The testbench consists of the
- bench.v, the top level harness that wraps the various modules contained in the functional simulation.
- address_access.v, contains tasks that simulate access to the BAR0 and BAR2 memory spaces in the development kit (for example, register access and memory copies).
- control.v, contains various control tasks and generates the timestamp counter.
- dma_sim.v, simulates the ExaNIC frame DMA interface. Will log frames that have been transferred successfully, and indicate error conditions.
- transmit_sim.v, simulates the ExaNIC ethernet transmit interface. Will log frames that have been transferred successfully, and indicate error conditions.
- frame_sim.v, simulates either host frame transmission or frames received from the wire.
- test_cases.v, container for user test cases. Users can add their own simulation directives here.
- bench.prj, a project file for the Xilinx simulator that lists all files that make up the simulation. New files for a project should be added here to make sure they are picked up by the simulator.
- start_sim.sh, a shell script that starts the Xilinx simulator in console mode. To start in graphical mode, use the switch -gui.
The example in test_cases.v shows how users can exercise the various elements of the
functional model, and provides a test case for the example design. Users can add their own test
cases to this file as necessary.
Users can start the example testbench by running:
This will cause the testbench to be compiled and xsim to start in command line mode. From the xsim prompt, the simulation can be run for 10 microseconds by entering:
% run 10us
Debugging with Vivado
You can use Xilinx Chipscope Pro Integrated Logic Analyzer (ILA) to debug your FPGA designs. Xilinx documentation on how to use Chipscope for debugging can be found here.
The default build of the trigger_example includes a Chipscope core to probe the following signals:
The definition of any signals to be probed must include the tag
(* mark_debug="true" *).
A TCL script can be used to insert the ILA core to the netlist. In the given example,
is sourced to insert the core. The signals will be captured with respect to a clock which is specified by the user. This clock is
specified with the command
connect_debug_port (line 13 of
debug.tcl file). The signals that need to
be captured should be listed in the TCL script using
connect_debug_port (line 16 of
In the given example, the
compile.tcl file is modified such that
debug.tcl will be sourced after synthesis. The command
which is required for the probes file to be written is at the end of
The default behaviour of the ExaNIC is to reconfigure the FPGA when the host is reset. When loading on an image via JTAG (as is the case when using Chipscope), it's important to disable this automatic reboot mechanism, otherwise the image that's in flash will be reloaded into the FPGA.
This is done by adding the NOREBOOT=1 flag when building the image:
make PLATFORM=x40 TARGET=trigger NOREBOOT=1
After you have built an image, you should connect the Xilinx Platform Cable to the JTAG connector of the ExaNIC. This is the small edge connector in the top right corner of the ExaNIC. An adapter cable can be supplied by Exablaze to connect the ExaNIC to the Xilinx Platform Cable.
More information on the Xilinx Platform Cable can be found here. The Platform Cable should be connected to the machine running Vivado.
You should then open the Vivado Hardware Manager.
Start a Hardware Server session by entering connect_hw_server in the Tcl Console, or by selecting Open Hardware Manager from the Flow menu.
Then click on Open target and then Auto Connect.
Right click on the Xilinx device and click Program Device. In the pop-up menu, enter the bitstream file and the probes file.
Now, you should be able to probe the nets specified during the creation of the debug core in debug.tcl.
All ExaNICs come with a recovery flash image for cases where a corrupt flash image has been written to the card. To start the card in recovery mode, hold down the small button marked 'recovery' located on the top edge of the card during a reboot of the host system. The red LED on the rear panel of the ExaNIC will then be lit. When in recovery mode, the corrupt flash image can be overwritten by using the
- Supports firmware update/reload without reboot (except eval FDK)
- Add Chipscope core to trigger example default build
- Fix bug where NOREBOOT=1 was ignored for user FDK builds
- Add optional support for iPXE with FDK, contact Exablaze for more info
- Fix bug in bridging example where frames could be corrupted if the host was sending a frame at the same time as a frame was being bridged from the other port
- Fix bug in testbench (frame_sim.v) where ACK was not properly processed
- Added support for synth with Vivado 2017.3
exanic_x40_devkit modules has been renamed
exanic_*_devkit is instantiated from the exanic_devkit top level, which has been updated accordingly, but if you are using a modified top level then it will need to be updated.
- Change to transceiver setting in PCIe core to improve compatibility with some systems. As a result, Vivado 2016.4 or later must be used.
- Fix bug in tandem logic placement constraints
- Fix bug in X40 FDK where QSFP status could be read incorrectly from host
- Improve Flash programming settings to address occasional programming failure
- Fix bug in tx_disable_padding where padding was disabled for the frame after the next frame, not the next frame
- NEW SIGNAL: tx_disable_padding: per-port option to disable padding of <64 byte frames received from host software (safe to leave unconnected)
- Added ability to reduce size of TX buffers available for host use, in order to free up more BRAMs for user logic. Contact Exablaze for more info.
- Added buildlog to catch how/when the FDK was built
- Added support for synth with Vivado 2017.2
- Restored ability to build images that do not use TANDEM boot (NOTANDEM=1)
- Fix bug where HW_TIME register could read incorrectly from software
- Fix bug where transmit timestamps could occasionally be incorrect
- Fixed bug in frame_sim.v where EOF was set incorrectly
- NEW SIGNAL: tx_err_net: allows the user to intentionally corrupt the FCS (safe to leave unconnected)
- Add support to FDK for synth with Vivado 2017.1
- Add false paths for ExaNIC FDK logic to assist in timing closure
- Add support to FDK for synth with Vivado 2016.3
- Fix bug in flow steering logic for X40 - DMA address decode now applies to upper 4 ports as well.
- Added synthesis support for Vivado 2016.1, 2016.2, was previously 2015.4 only. Note that Exablaze has observed instances where incorrect logic is synthesized using Vivado 2016.1 and 2016.2
- Fixed bug in loopback_example.v (
Changes versus 7 series development kit
The ExaNIC Ultrascale series development kit includes a number of changes relative to the previous generation 7 series development kits (ExaNIC X2 and X4). These changes have been made to reduce the latency of the ExaNIC MAC layer and increase the bandwidth of the PCIe interface.
The following are changes that will affect user designs that were targeted to the X2 and X4 development kits:
- Inclusion of a network clock domain (clk_net), to which the entire MAC layer is synchronous. Users who wish to bring across existing applications can make use of the included asynchronous FIFO to perform clock domain crossing.
- Memory bus is now address aligned. This reduces the amount of logic required to decode addresses on the memory bus, and ensures that a write to a byte at a specific address will always have data located in the same byte lane.
- Wider memory bus (32 bytes). The memory bus width has expanded to 32 bytes wide, with a corresponding increase in the write enable bus width.
- In the Ethernet framing interface, the case EOF & LEN = 0 now means no bytes are valid on that beat. In the previous generation of development kit, this condition meant 8 bytes were valid on that data beat. This change more closely reflects the 10G Ethernet end of frame condition and results in slightly lower latency.
- _usr signals have been renamed to _net to clarify that they are synchronous to the network clock domain (clk_net).
- CRC fail signal is now valid on the same cycle as EOF. In the previous generation devkits, this signal was valid on the cycle after the EOF.
- tx_ interfaces are now provided as wide arrays rather than as separate interfaces for each port. For example, the 64-bit data path for each port in the network domain is concatenated together. This means that port 0 is rx_data_net[63:0] and port 1 is rx_data_net[127:64].
- A new build system that allows users to specify the target and hardware platform, as described above.
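Two of the changes listed above, the concatenated per-port buses and the new meaning of EOF with LEN = 0, can be sketched in a few lines of Verilog. This is an illustration under stated assumptions rather than FDK code: only rx_data_net and the port 0 slice [63:0] come from the text. The module name, the parameters, the sideband names rx_eof_net and rx_len_net, the 3-bit length width, and the rule that beats without EOF carry 8 valid bytes are all assumptions made for the example.

```verilog
// Hypothetical sketch: module, parameter and sideband names are assumptions.
// Only rx_data_net and its [63:0] slice for port 0 come from the FDK notes.
module port_slice_sketch #(
    parameter NUM_PORTS = 2,   // number of ports concatenated on the bus
    parameter PORT      = 1    // port this instance extracts
) (
    input  wire [64*NUM_PORTS-1:0] rx_data_net,  // all ports, concatenated
    input  wire [NUM_PORTS-1:0]    rx_eof_net,   // assumed: one EOF bit per port
    input  wire [3*NUM_PORTS-1:0]  rx_len_net,   // assumed: 3-bit length per port
    output wire [63:0]             port_data,
    output wire [3:0]              port_valid_bytes
);
    // Indexed part-select: PORT = 0 selects [63:0], PORT = 1 selects [127:64].
    assign port_data = rx_data_net[64*PORT +: 64];

    wire       eof = rx_eof_net[PORT];
    wire [2:0] len = rx_len_net[3*PORT +: 3];

    // On an EOF beat the length field gives the number of valid bytes, so
    // EOF with len = 0 means the beat carries no data. Beats without EOF
    // are assumed here to be full (8 valid bytes).
    assign port_valid_bytes = eof ? {1'b0, len} : 4'd8;
endmodule
```

Using the indexed part-select keeps the slice arithmetic in one place, so the same module can be instantiated once per port by changing only the PORT parameter.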