The ExaNIC Sockets acceleration library allows applications to benefit from the low latency of direct access to the ExaNIC without requiring modifications to the application. This is achieved by intercepting calls to the Linux socket APIs.

While ExaNIC Sockets should be compatible with most applications using Linux socket APIs, there are some cases where programs may not work as expected. Feedback and bug reports would be greatly appreciated (contact us at https://exablaze.com/support).

Software installation

Build the ExaNIC driver and libraries as per the ExaNIC Installation and Configuration Guide. ExaNIC Sockets is built and installed as a standard component, and the exasock kernel module is loaded automatically when an ExaNIC interface is brought up.

Usage

First ensure that the application works without ExaNIC Sockets. All IP addresses should be configured as if you were running the application through the normal Linux network interface corresponding to the ExaNIC.

Then, to accelerate the application, simply prefix it with the exasock command. For example, to run the UNIX netcat (nc) utility to listen for UDP datagrams on port 1234:

$ exasock nc -u -l 1234

Another simple example application that receives and sends UDP multicast datagrams is located in the ExaNIC source code distribution (examples/exasock/multicast-echo.c). Note that this is a normal Linux sockets application that can be run either with or without the ExaNIC Sockets acceleration library.
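As a rough illustration of what "a normal Linux sockets application" means here, the sketch below opens a UDP socket and echoes one datagram back to its sender. It is not the multicast-echo.c example from the distribution; the function names open_udp_listener and echo_one_datagram are made up for this sketch, and nothing in it refers to ExaNIC-specific APIs.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket bound to INADDR_ANY on the given port (host byte
 * order). Returns the socket descriptor, or -1 on error. */
int open_udp_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait for one datagram and send it straight back to its sender.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t echo_one_datagram(int fd)
{
    char buf[2048];
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof(peer);

    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&peer, &peer_len);
    if (n < 0)
        return -1;
    return sendto(fd, buf, (size_t)n, 0,
                  (struct sockaddr *)&peer, peer_len);
}
```

A program built around these two functions (say, a hypothetical binary called udp-echo that calls open_udp_listener(1234) and then loops on echo_one_datagram()) would be started as ./udp-echo for the kernel stack or as exasock ./udp-echo for the accelerated path, with no change to the source.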

Sometimes it can be difficult to determine if the kernel bypass is functioning correctly. Setting the EXASOCK_DEBUG environment variable prints extra debugging information that can help. For example:

$ EXASOCK_DEBUG=1 exasock nc -u -l 1234
exasock: enabled bypass on fd 4

In this case, the message exasock: enabled bypass on fd 4 indicates that kernel bypass has been enabled for the socket associated with file descriptor 4.

Displaying ExaNIC Sockets accelerated connections

The exasock-stat utility provides insight into the current ExaNIC accelerated socket connections. By default, running exasock-stat displays all accelerated UDP and TCP sockets, both listening and connected.

The exasock-stat application was introduced with the v2.0.0 ExaNIC driver and software package and can be found within the util directory. It is built as part of the utils build. However, it depends on the libnl3 development package, which is not present by default; if the package is missing, building exasock-stat is skipped. To ensure it can be built, first install the missing package (e.g. sudo apt-get install libnl-3-dev or sudo yum install libnl3-devel.x86_64), then re-run make && make install.

Then when running:

exasock-stat

you should see a table similar to the following:

Active ExaNIC Sockets accelerated connections (servers and established):
  Proto | Recv-Q   | Send-Q   | Local Address            | Foreign Address    | State
  UDP   | 0        | 0        | 192.168.10.10:12345      | *:*                | -

The columns shown are:

Proto:
   The protocol used by the socket (TCP or UDP)
Recv-Q:
   Connected: The count of bytes not copied by the user program connected to this socket
   Listening: The count of connections waiting to be accepted by the user program
Send-Q:
   Connected: The count of bytes not acknowledged by the remote host
   Listening: N/A
Local Address:
   Address and port number of the local end of the socket
Foreign Address:
   Address and port number of the remote end of the socket
State:
   The state of the socket


Extended output columns (not shown above; displayed only when -e/--extend is enabled):
User:
   The username or the user id (UID) of the owner of the socket
PID:FD:
   PID of the process that owns the socket and value of the socket's file descriptor
Program:
   Process name of the process that owns the socket

Exactly what the application displays can be controlled by providing arguments on the command line. To see the available arguments, run exasock-stat --help

Disabling acceleration per-socket

By default, when an application runs with the ExaNIC Sockets acceleration library, each socket bound to either an ExaNIC interface or a wildcard address (INADDR_ANY) is accelerated automatically (i.e. the kernel is bypassed to allow direct access to the ExaNIC).

As of exasock version 2.0.0, it is possible to disable this default acceleration on a given socket that would otherwise be accelerated (a socket that is bound to an ExaNIC interface or a wildcard address, or that joins a multicast group with an ExaNIC interface).

To use this feature, the application must include the <exasock/socket.h> header file and disable acceleration on each socket as required. This is done either by setting the exasock private SO_EXA_NO_ACCEL socket option or by calling the exasock_disable_acceleration() helper function.

Disabling acceleration on a socket is not allowed if the socket has already been accelerated (either by binding it to an ExaNIC interface or joining a multicast group with an ExaNIC interface).

Once acceleration has been disabled on a socket, it cannot be re-enabled.

Documentation for both the exasock private socket option and the helper function can be found in the header file <exasock/socket.h>.
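The ordering constraint above (disable acceleration before the socket is bound or joins a group) can be sketched as follows. This is only an outline under assumptions: the signature used for exasock_disable_acceleration() (taking the socket descriptor and returning 0 on success) is assumed rather than taken from this guide, so check <exasock/socket.h> before relying on it. The stand-in definition exists purely so that the sketch compiles on a machine without the ExaNIC headers, and open_unaccelerated_udp_socket is a made-up name.

```c
#include <sys/socket.h>
#include <unistd.h>

#if defined(__has_include)
#  if __has_include(<exasock/socket.h>)
#    include <exasock/socket.h>
#    define HAVE_EXASOCK_SOCKET_H 1
#  endif
#endif

#ifndef HAVE_EXASOCK_SOCKET_H
/* Stand-in used only when the ExaNIC headers are not installed.
 * ASSUMPTION: the real helper takes the socket descriptor and returns
 * 0 on success and -1 on error. Verify against <exasock/socket.h>. */
static int exasock_disable_acceleration(int fd)
{
    (void)fd;
    return 0;
}
#endif

/* Create a UDP socket that should stay on the kernel network stack even
 * if it is later bound to an ExaNIC interface or a wildcard address.
 * Returns the socket descriptor, or -1 on error. */
int open_unaccelerated_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Must happen before bind() or IP_ADD_MEMBERSHIP: once a socket has
     * been accelerated, acceleration can no longer be disabled on it. */
    if (exasock_disable_acceleration(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller would then bind() the returned descriptor as usual. Setting SO_EXA_NO_ACCEL directly with setsockopt() is the documented alternative, but the option level it requires is not stated in this guide, so the helper is used here.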

Multicast sockets

Versions of exasock older than 2.0.0 automatically accelerate each socket bound to a multicast address. Newer versions (2.0.0 and later) accelerate a multicast socket only if it has joined a multicast group (via the IP_ADD_MEMBERSHIP socket option) with an ExaNIC interface. With exasock 2.0.0 or later, an accelerated socket receives multicast packets only from the interface with which it joined the multicast group.

Warning exasock 2.0.0 and later: A socket bound to a wildcard address (INADDR_ANY) is always accelerated, which matters if it is used for receiving multicast traffic. Multicast packets are discarded on such a socket unless it has joined the given multicast group with the IP_ADD_MEMBERSHIP option and the packets arrive through the ExaNIC interface specified in that IP_ADD_MEMBERSHIP call.

Warning exasock 2.0.0 and later: A socket that is bound to a multicast address but not associated with an ExaNIC interface through the IP_ADD_MEMBERSHIP option is not accelerated. It receives multicast packets through the native kernel networking stack instead.

exasock extension API

exasock 1.7.0 and later include a library called exasock_ext that allows user applications to detect that they are running under exasock and access functionality beyond the standard Linux socket calls. In particular, the current version provides a TCP acceleration feature that allows programmers to achieve even lower TCP latencies than possible with the normal send/sendto/sendmsg APIs.

Using this TCP acceleration feature, an application can construct partial or complete TCP packets ahead of time. These pre-built packets can then be transmitted through the lower level libexanic library, or can even be pushed to a user FPGA application on the ExaNIC card for ultra-low latency responses to triggers. Documentation for the extension functions is in the header file <exasock/extensions.h>, and an example program is located in the ExaNIC source distribution (examples/exasock/tcp-raw-send.c).

Warning If you are using the exasock TCP extension then you need to ensure that you do not modify the packet between exasock_tcp_calc_checksum() and exanic_transmit_frame() calls.

Warning If you are using the exasock TCP extension then you need to ensure that if you have prepared a TCP frame for transmission, that it is the next frame transmitted. If, for example, another TCP frame is sent via exasock using send(), then the prepared TCP frame will have the incorrect sequence number etc., and so must be discarded.

Known issues and limitations

  • Each thread that makes a blocking I/O call - e.g. select(), poll(), epoll_wait(), recv(), read() or accept() - will spin waiting on data. This normally provides optimal latency but can cause performance problems if there are more threads than available CPUs. Other blocking modes will be provided in the future.
  • If a socket is bound to a wildcard address (INADDR_ANY), it will only receive packets that arrive on ExaNIC interfaces when run with the acceleration library.
  • If a socket is bound to a multicast address (exasock versions older than 2.0.0), or has joined a multicast group with an ExaNIC interface via the IP_ADD_MEMBERSHIP socket option (exasock 2.0.0 or newer), it will only receive packets that arrive on ExaNIC interfaces when run with the acceleration library.
  • Connecting to an accelerated socket from the same host is not supported (for example, if a socket is bound to 192.168.1.1:80, then it is not possible to connect to 192.168.1.1:80 from the local host).
  • Transmitted multicast datagrams are not looped back to local sockets.
  • The MSG_WAITALL flag to recv() is not currently supported (to be resolved).
  • No support for recursive addition of epoll file descriptors to epoll sets.
  • No support for IP fragmentation.
  • Sockets may not be correctly maintained across fork() or execve().
  • Sockets cannot be transferred to other processes with sendmsg().
  • Some third party libraries, such as Google's tcmalloc, can crash when used with exasock. These crashes can occur when such libraries assume that they are the only library being preloaded with LD_PRELOAD, or that they are the last code to execute before the application's initializers and main() are called. A workaround is to build the application with the third party library linked statically, rather than dynamically. Alternatively many of these crashes can be avoided by manually ensuring that exasock is loaded after such libraries. For example, to run a program abc.bin with both exasock accelerated sockets and tcmalloc's fast malloc(), instead of using the exasock script, simply invoke the program as:
$ LD_PRELOAD="/path/to/libtcmalloc.so /path/to/libexasock_preload.so" abc.bin
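For the MSG_WAITALL limitation listed above, one possible workaround is to loop on plain recv() until the requested number of bytes has arrived. The sketch below shows that pattern with standard socket calls only; recv_exact is a made-up helper name, not part of exasock.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes from a stream socket without MSG_WAITALL.
 * Returns len on success, a smaller count if the peer closed the
 * connection first, or -1 on error. */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n == 0)
            break;              /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The same loop works with or without the acceleration library, so replacing recv(fd, buf, len, MSG_WAITALL) with recv_exact(fd, buf, len) should not change behaviour on the kernel stack for blocking sockets.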

Tips for best performance

  • Wherever possible, do not mix accelerated sockets with non-accelerated sockets and other file descriptors in select() and poll() calls.
  • For the best possible performance, pin threads to CPU cores in the CPU socket directly connected to the ExaNIC.
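The thread pinning tip can be applied from inside the application with the Linux affinity API. The following is a minimal sketch that assumes Linux with glibc; pin_current_thread_to_cpu is an illustrative name, and choosing the CPU number is left to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* required for cpu_set_t and CPU_* macros */
#endif
#include <sched.h>

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error or if cpu is out of range. */
int pin_current_thread_to_cpu(int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 applies the mask to the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each latency-sensitive thread would call this once at start-up with a core on the CPU socket attached to the ExaNIC. On most Linux systems the NUMA node of a PCI network device can be read from /sys/class/net/<interface>/device/numa_node, which may help in choosing that core.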