The ExaNIC Sockets acceleration library allows applications to benefit from the low latency of direct access to the ExaNIC without requiring modifications to the application. This is achieved by intercepting calls to the Linux socket APIs.
While ExaNIC Sockets should be compatible with most applications using Linux socket APIs, there are some cases where programs may not work as expected. Feedback and bug reports would be greatly appreciated (contact us at https://exablaze.com/support).
Software installation
Build the ExaNIC driver and libraries as per the ExaNIC Installation and Configuration Guide. ExaNIC Sockets is built and installed as a standard component, and the exasock kernel module is loaded automatically when an ExaNIC interface is brought up.
Usage
First ensure that the application works without ExaNIC Sockets. All IP addresses should be configured as if you were running the application through the normal Linux network interface corresponding to the ExaNIC.
Then, to accelerate the application, simply prefix it with the exasock command. For example, to run the UNIX netcat (nc) utility to listen for UDP datagrams on port 1234:
$ exasock nc -u -l 1234
Another simple example application that receives and sends UDP multicast datagrams is located in the ExaNIC source code distribution (examples/exasock/multicast-echo.c). Note that this is a normal Linux sockets application that can be run either with or without the ExaNIC Sockets acceleration library.
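To illustrate what "a normal Linux sockets application" means here, the following is a minimal sketch of a UDP echo helper written only against the standard socket API. It is not the multicast-echo.c example from the distribution; the function names open_udp_listener and echo_one_datagram are hypothetical and chosen for this illustration. Nothing in it refers to exasock, which is the point: the same code should run unchanged with or without the exasock prefix.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to INADDR_ANY on `port`.
 * Returns the socket file descriptor, or -1 on error. */
int open_udp_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: receive one datagram and send it back to its sender.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t echo_one_datagram(int fd)
{
    char buf[2048];
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof(peer);

    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&peer, &peer_len);
    if (n < 0)
        return -1;
    return sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&peer, peer_len);
}
```

A program built around these helpers would call open_udp_listener(1234) once and then loop on echo_one_datagram(). Started normally it uses the kernel stack; started with the exasock prefix, the same binary would be expected to use the accelerated path.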
Sometimes it can be difficult to determine if the kernel bypass is functioning correctly. Setting the EXASOCK_DEBUG environment variable prints extra debugging information that can help. For example:
$ EXASOCK_DEBUG=1 exasock nc -u -l 1234
exasock: enabled bypass on fd 4
In this case, the message exasock: enabled bypass on fd 4 indicates that kernel bypass has been enabled for the socket associated with file descriptor 4.
Displaying ExaNIC Sockets accelerated connections
To provide insight into the current ExaNIC accelerated socket connections, the exasock-stat utility is provided. By default, running exasock-stat displays all accelerated UDP and TCP sockets, both listening and connected.
The exasock-stat application was introduced with the v2.0.0 ExaNIC driver and software package and can be found within the util directory.
This application is built as part of the utils build. However, it depends on the libnl3 development package, which is not present by default; if the package is missing, building exasock-stat is skipped.
To ensure it can be built, first install the missing package with the appropriate command for your distribution, e.g. sudo apt-get install libnl-3-dev or sudo yum install libnl3-devel.x86_64, then re-run make && make install.
Then when running:
exasock-stat
you should see a table similar to the following:
Active ExaNIC Sockets accelerated connections (servers and established):
Proto | Recv-Q | Send-Q | Local Address | Foreign Address | State
UDP | 0 | 0 | 192.168.10.10:12345 | *:* | -
The columns shown are:
Proto:
The protocol used by the socket (TCP or UDP)
Recv-Q:
Connected: The count of bytes not copied by the user program connected to this socket
Listening: The count of connections waiting to be accepted by the user program
Send-Q:
Connected: The count of bytes not acknowledged by the remote host
Listening: N/A
Local Address:
Address and port number of the local end of the socket
Foreign Address:
Address and port number of the remote end of the socket
State:
The state of the socket
Extended output, not shown above (enabled with -e/--extend):
User:
The username or the user id (UID) of the owner of the socket
PID:FD:
PID of the process that owns the socket and value of the socket's file descriptor
Program:
Process name of the process that owns the socket
Exactly what the application displays can be controlled by providing arguments on the command line. To see the available arguments, run exasock-stat --help.
Disabling acceleration per-socket
If only the ExaNIC Sockets acceleration library is used, then each socket bound to either an ExaNIC interface or to a wildcard address (INADDR_ANY) is automatically accelerated (i.e. the kernel is bypassed to allow direct access to the ExaNIC).
As of exasock version 2.0.0 it is possible to disable default acceleration on a given socket, even if bound to an ExaNIC interface (or bound to a wildcard address or joined a multicast group with an ExaNIC interface).
To use this feature, the application must include the <exasock/socket.h> header file and disable acceleration as needed for each socket. This is done either by setting the exasock private SO_EXA_NO_ACCEL socket option or by calling the exasock_disable_acceleration() helper function.
Disabling acceleration on a socket is not allowed if the socket has already been accelerated (either by binding it to an ExaNIC interface or joining a multicast group with an ExaNIC interface).
Once acceleration has been disabled on a socket, it can no longer be re-enabled.
Documentation for both the exasock private socket option and the helper function can be found in the header file <exasock/socket.h>.
Multicast sockets
Versions of exasock older than 2.0.0 automatically accelerate each socket bound to a multicast address. Newer versions (2.0.0 and beyond) accelerate a multicast socket only if it has joined a multicast group (via the IP_ADD_MEMBERSHIP socket option) with an ExaNIC interface. For any accelerated socket, exasock version 2.0.0 (or later) receives multicast packets only from the interface with which the socket has joined the multicast group.
Warning
exasock 2.0.0 and later:
If a socket bound to a wildcard address (INADDR_ANY) is to be used for receiving multicast traffic, keep in mind that it will always be accelerated. Multicast packets will be discarded on this socket unless it has joined the given multicast group with the IP_ADD_MEMBERSHIP option and the packets arrive through the ExaNIC interface specified in the IP_ADD_MEMBERSHIP configuration.
Warning
exasock 2.0.0 and later:
If a socket is bound to a multicast address but is not associated with an ExaNIC interface through the IP_ADD_MEMBERSHIP option, it will not be accelerated. It will receive multicast packets through the native kernel networking stack instead.
exasock extension API
exasock 1.7.0 and later include a library called exasock_ext that allows user applications to detect that they are running under exasock and access functionality beyond the standard Linux socket calls. In particular, the current version provides a TCP acceleration feature that allows programmers to achieve even lower TCP latencies than possible with the normal send/sendto/sendmsg APIs.
Using this TCP acceleration feature, an application can construct partial or complete TCP packets ahead of time. These pre-built packets can then be transmitted through the lower level libexanic library, or can even be pushed to a user FPGA application on the ExaNIC card for ultra-low latency responses to triggers. Documentation for the extension functions is in the header file <exasock/extensions.h>, and an example program is located in the ExaNIC source distribution (examples/exasock/tcp-raw-send.c).
Warning
If you are using the exasock TCP extension then you need to ensure that you do not modify the packet between the exasock_tcp_calc_checksum() and exanic_transmit_frame() calls.
Warning
If you are using the exasock TCP extension and have prepared a TCP frame for transmission, you need to ensure that it is the next frame transmitted. If, for example, another TCP frame is sent via exasock using send(), then the prepared TCP frame will carry stale header fields such as the sequence number, and so must be discarded.
Known issues and limitations
- Each thread that calls a blocking I/O call (e.g. select(), poll(), epoll_wait(), recv(), read() or accept()) will spin waiting on data. This normally provides optimal latency but can induce performance problems if there are more threads than available CPUs. Other blocking modes will be provided in the future.
- If a socket is bound to a wildcard address (INADDR_ANY), it will only receive packets that arrive on ExaNIC interfaces when run with the acceleration library.
- If an exasock version older than 2.0.0 is used and a socket is bound to a multicast address, or exasock version 2.0.0 or newer is used and a socket has joined a multicast group (IP_ADD_MEMBERSHIP socket option) with an ExaNIC interface, it will only receive packets that arrive on ExaNIC interfaces when run with the acceleration library.
- Connecting to an accelerated socket from the same host is not supported (for example, if a socket is bound to 192.168.1.1:80, then it is not possible to connect to 192.168.1.1:80 from the local host).
- Transmitted multicast datagrams are not looped back to local sockets.
- The MSG_WAITALL flag to recv() is not currently supported (to be resolved).
- No support for recursive addition of epoll file descriptors to epoll sets.
- No support for IP fragmentation.
- Sockets may not be correctly maintained across fork() or execve().
- Sockets cannot be transferred to other processes with sendmsg().
- Some third party libraries, such as Google's tcmalloc, can crash when used with exasock. These crashes can occur when such libraries assume that they are the only library being preloaded with LD_PRELOAD, or that they are the last code to execute before the application's initializers and main() are called. A workaround is to build the application with the third party library linked statically, rather than dynamically. Alternatively, many of these crashes can be avoided by manually ensuring that exasock is loaded after such libraries. For example, to run a program abc.bin with both exasock accelerated sockets and tcmalloc's fast malloc(), instead of using the exasock script, simply invoke the program as:
$ LD_PRELOAD="/path/to/libtcmalloc.so /path/to/libexasock_preload.so" abc.bin
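Regarding the MSG_WAITALL limitation listed above, one common way to get the same behaviour on a stream socket is to call recv() in a loop until the requested number of bytes has arrived. The sketch below is a generic sockets idiom rather than anything specific to exasock, and the function name recv_exact is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read exactly `len` bytes from a stream socket by
 * calling recv() in a loop, without relying on MSG_WAITALL.
 * Returns `len` on success, a smaller count if the peer closed the
 * connection early, or -1 on error (errno is set by recv()). */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n == 0)
            break;              /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A caller that previously used recv(fd, buf, len, MSG_WAITALL) could call recv_exact(fd, buf, len) instead and treat a short return value as an early close by the peer.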
Tips for best performance
- Wherever possible, do not mix accelerated sockets with non-accelerated sockets and other file descriptors in select() and poll() calls.
- For the best possible performance, pin threads to CPU cores in the CPU socket directly connected to the ExaNIC.
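As an illustration of the pinning tip, the following minimal sketch pins the calling thread to one core using the Linux-specific sched_setaffinity() call. The function name pin_current_thread_to_cpu is hypothetical, and the sketch does not work out which cores belong to the CPU socket attached to the ExaNIC; that has to be determined separately for the machine in question.

```c
#define _GNU_SOURCE             /* needed for cpu_set_t and the CPU_* macros */
#include <sched.h>

/* Hypothetical helper: pin the calling thread to a single CPU core.
 * Returns 0 on success, -1 if `cpu` is out of range or
 * sched_setaffinity() fails. */
int pin_current_thread_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each latency-sensitive thread would call this once at start-up, before entering its receive loop, with a different core number. Since blocking calls spin under exasock, giving each such thread its own core also avoids the oversubscription problem noted under known issues.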
This page was last updated on Apr-12-2018.