This tutorial gives some background for the Corsaro plugin architecture and describes how one could go about designing and implementing a new plugin.
Because Corsaro has been designed to make adding a new plugin as easy as possible, there is a minimal API for writing a plugin that can process packets and write output. In addition, some plugins will also implement the Corsaro-In plugin API, allowing the data that they generate to be de-serialized and used by other programs.
An example of a plugin that implements both the Corsaro-Out and Corsaro-In plugin APIs is the FlowTuple plugin included in the Corsaro distribution. The FlowTuple plugin extracts a key composed of several values from the packet header and maintains pseudo-flows of packets that match this key. The optimized binary output generated by FlowTuple uses as little storage space as possible, but at the expense of human readability. For this reason, the FlowTuple plugin also implements the Corsaro-In API to allow other tools to further process FlowTuple data. An example of such a tool is cors-ft-aggregate, which can re-aggregate FlowTuple distributions based on sub-keys.
Because of the bidirectional nature of Corsaro plugins (Corsaro-Out and Corsaro-In), this tutorial is split into three sections: the first describes the general process for creating and bootstrapping a new plugin, the second describes the API to be implemented to process packets and write output (the Corsaro-Out plugin API), and the third describes the API for reading existing Corsaro plugin output data (the Corsaro-In API).
Bootstrapping a new plugin generally consists of two steps: preparing the new plugin's source files, then integrating the plugin into the Corsaro build.
To ready a plugin for analysis code to be added, several steps must be followed:
Each Corsaro plugin must have both interface (.h) and implementation (.c) files. Currently, the convention is for these files to be placed in the libcorsaro/plugins/ directory to keep them separate from the base Corsaro code.
There is also a strict naming scheme which allows the plugin macros to function correctly. That is, a plugin has both a name and an ID. The name is used for assigning a namespace to the plugin and also for output file names, logs, etc. The ID is used to allow the plugin manager to maintain state about each plugin currently in operation.
Each plugin is free to choose any name provided it does not conflict with existing plugins. We suggest keeping names as brief as possible while conveying the purpose of the plugin (remember, this is how you will identify the file(s) each plugin has generated). For example, we use the name 'flowtuple' for our plugin which generates distributions of key fields (tuples) in the packet headers.
As mentioned earlier, the plugin name provides a unique namespace for each plugin. To this end, the source files for each plugin must be named corsaro_<plugin_name>.c and corsaro_<plugin_name>.h.
Building on our previous example, the FlowTuple code is located in libcorsaro/plugins/corsaro_flowtuple.c and libcorsaro/plugins/corsaro_flowtuple.h.
The simplest of the two files needed for each plugin is the interface (.h) file. A minimal plugin needs only a single line in the header file:
The CORSARO_PLUGIN_GENERATE_PROTOS macro is located in corsaro_plugin.h and will expand to all of the required function prototypes to satisfy the plugin API. This is another example of why the plugin name is important - this macro assumes that the functions which implement the plugin API are prefixed with the name of the plugin. This can be seen when we look at the function prototypes that would be generated (by the pre-processor) for the FlowTuple plugin:
These prototypes specify the names of the functions that the implementation file must contain.
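To make the name-prefixing idea concrete, here is a minimal, self-contained sketch of a prototype-generating macro. It is not the real CORSARO_PLUGIN_GENERATE_PROTOS: the macro DEMO_GENERATE_PROTOS, the types demo_corsaro_t and demo_packet_t, and the plugin name corsaro_example are all hypothetical stand-ins, and only three of the API functions are shown.

```c
/* Hypothetical stand-ins for the real Corsaro types. */
typedef struct demo_corsaro { int packets_seen; } demo_corsaro_t;
typedef struct demo_packet { int len; } demo_packet_t;

/* Illustrative macro only: pastes the plugin name onto each API function
 * name, in the spirit of CORSARO_PLUGIN_GENERATE_PROTOS. The real macro
 * generates a longer list of prototypes. */
#define DEMO_GENERATE_PROTOS(plugin)                                   \
  int plugin##_init_output(demo_corsaro_t *corsaro);                   \
  int plugin##_close_output(demo_corsaro_t *corsaro);                  \
  int plugin##_process_packet(demo_corsaro_t *corsaro,                 \
                              demo_packet_t *packet);

/* This single line is what would normally live in corsaro_example.h. */
DEMO_GENERATE_PROTOS(corsaro_example)

/* The implementation file must then define functions with exactly the
 * generated names, otherwise the build fails at link time. */
int corsaro_example_init_output(demo_corsaro_t *corsaro)
{
  corsaro->packets_seen = 0;
  return 0;
}

int corsaro_example_close_output(demo_corsaro_t *corsaro)
{
  (void)corsaro; /* nothing to free in this sketch */
  return 0;
}

int corsaro_example_process_packet(demo_corsaro_t *corsaro,
                                   demo_packet_t *packet)
{
  (void)packet;
  corsaro->packets_seen++;
  return 0;
}
```

Because the macro builds every function name from its argument, renaming the plugin means renaming every function that implements the API.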
If you look at the actual header file (corsaro_flowtuple.h) for the FlowTuple plugin, you will find a good deal of additional code. This is used for the Corsaro-In API and is discussed in the Implementing the Corsaro-In API section.
Moving to the implementation (.c) file, there are two data structures that we recommend you define before implementing the API functions. These provide some state for the plugin (because there can potentially be multiple instances of Corsaro used simultaneously) and helper macros for accessing that state.
The first structure that is needed is an instance of corsaro_plugin which describes the plugin. This will be copied by the plugin manager when an instance of the plugin is created, so it should be declared as static. The plugin name and magic number are at the discretion of the plugin so long as they are unique across plugins. The plugin ID must match the one listed in corsaro_plugin.h (adding a new ID is described in the Integrating a New Plugin section). The remaining fields can be filled in using the CORSARO_PLUGIN_GENERATE_PTRS macro. The following is the corsaro_plugin definition for the FlowTuple plugin:
The second structure holds any per-instance state needed by the plugin, for example pointers to output files or data structures for analysis. This structure can be in any format because the plugin manager will cast the pointer to a void pointer, but using a well-structured name will allow the use of the convenience macros described below for retrieving the state from the plugin manager. The following is the definition of the state structure for the FlowTuple plugin:
If implementing the Corsaro-In API, the plugin will also require a state structure to use in Corsaro-In mode. While these two may be the same structure, we suggest keeping them separate to avoid confusion. The following is the Corsaro-In state structure for the FlowTuple plugin:
As mentioned earlier, there are two macros provided by the plugin manager (corsaro_plugin.h) which assist with the retrieval of plugin and state structures from the plugin manager. These are:
Note that the type field assumes a state structure which is named corsaro_<type>_state_t.
While these can be used alone to retrieve state and plugin structures from the plugin manager, it might be useful to wrap them in additional macros which are specific to the plugin being created. The core plugins included in the Corsaro distribution all define additional macros: one to retrieve the plugin structure, one to retrieve the Corsaro-Out state structure, and optionally one to retrieve the Corsaro-In state structure. The FlowTuple plugin defines all three of these as follows:
Each of these simply takes a pointer to the corsaro (or corsaro_in) structure and returns the appropriate pointer.
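As an illustration of how such accessor macros can work, the following self-contained sketch uses token pasting to build the state type name from a short type argument. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in (the real macros live in corsaro_plugin.h and their exact definitions are not reproduced here); only the corsaro_<type>_state_t naming convention is taken from the text above.

```c
#include <stddef.h>

/* Hypothetical plugin IDs for this sketch. */
enum { DEMO_PLUGIN_ID_EXAMPLE = 1, DEMO_PLUGIN_ID_MAX = 1 };

typedef struct demo_plugin { const char *name; int id; } demo_plugin_t;

/* Stand-in plugin manager: one plugin slot and one opaque (void *) state
 * slot per plugin ID. */
typedef struct demo_manager {
  demo_plugin_t *plugins[DEMO_PLUGIN_ID_MAX + 1];
  void *states[DEMO_PLUGIN_ID_MAX + 1];
} demo_manager_t;

typedef struct demo_corsaro { demo_manager_t *plugin_manager; } demo_corsaro_t;

/* The state structure follows the corsaro_<type>_state_t convention. */
typedef struct corsaro_example_state { int packet_cnt; } corsaro_example_state_t;

/* Generic accessors: the state macro pastes `type` into the struct name and
 * casts the manager's void pointer back to that type. */
#define DEMO_PLUGIN_STATE(corsaro, type, id) \
  ((corsaro_##type##_state_t *)(corsaro)->plugin_manager->states[(id)])
#define DEMO_PLUGIN_PLUGIN(corsaro, id) \
  ((corsaro)->plugin_manager->plugins[(id)])

/* Plugin-specific wrappers, so analysis code never repeats the ID. */
#define STATE(corsaro) \
  (DEMO_PLUGIN_STATE(corsaro, example, DEMO_PLUGIN_ID_EXAMPLE))
#define PLUGIN(corsaro) \
  (DEMO_PLUGIN_PLUGIN(corsaro, DEMO_PLUGIN_ID_EXAMPLE))

/* Example use inside an API function: bump and return a per-instance count. */
static int count_packet(demo_corsaro_t *corsaro)
{
  STATE(corsaro)->packet_cnt++;
  return STATE(corsaro)->packet_cnt;
}
```

If the state structure did not follow the naming convention, the cast generated by the state macro would refer to a type that does not exist and the code would not compile.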
Once these structures and macros have been defined, stub API functions defined by the CORSARO_PLUGIN_GENERATE_PROTOS macro should be created, and then the plugin can be integrated into Corsaro.
There are 4 files that must be updated in order to include a new corsaro plugin:
configure.ac
libcorsaro/plugins/Makefile.am
libcorsaro/corsaro_plugin.c
libcorsaro/corsaro_plugin.h
configure.ac contains a list of plugins that are available for compilation into Corsaro. Each plugin that is to be made available must be declared using the ED_WITH_PLUGIN macro. This macro takes four arguments: the full name of the plugin (e.g. corsaro_pcap), the short name of the plugin (e.g. pcap), the 'macro' name for the plugin (e.g. PCAP), and whether the plugin should be enabled by default (yes or no). The Raw pcap plugin is declared as follows:
Note that the order in which plugins are declared in this file is the default order in which they will be run. That is, a plugin declared after all others will be given a packet to process only once all other plugins have processed it.
Using the values defined by the ED_WITH_PLUGIN macro declaration in the configure.ac file, add the plugin to libcorsaro/plugins/Makefile.am as follows:
The corresponding Makefile.am entry for the Raw pcap plugin ED_WITH_PLUGIN example given above is:
The code that needs to be added to libcorsaro/corsaro_plugin.c is very similar to that which was added to libcorsaro/plugins/Makefile.am in the previous section. The format is as follows:
Using the Raw pcap example again, the entry would be:
The final change that must be made is to add a unique ID for the plugin to the corsaro_plugin_id enum in libcorsaro/corsaro_plugin.h.
The ID value can be any number not already taken, but the plugin manager allocates memory to hold plugins for every possible value up to the maximum, so we suggest keeping this number reasonably small. The ID value does not affect the order in which plugins are run; this is (currently) determined either by the order of the ED_WITH_PLUGIN macros in configure.ac or at run-time by using corsaro_enable_plugin. The ID value defined for a plugin in this list must be the same value given in the corsaro_plugin structure described in the Implementation Code section.
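The two points above (memory grows with the largest ID, and run order is independent of ID) can be sketched with a toy plugin manager. This is an assumption-laden illustration, not the Corsaro implementation: the enum values, demo_manager_t, and the demo_manager_* functions are all invented for the example.

```c
#include <stdlib.h>

/* Hypothetical IDs; the real values live in the corsaro_plugin_id enum. */
typedef enum demo_plugin_id {
  DEMO_PLUGIN_ID_PCAP = 1,
  DEMO_PLUGIN_ID_EXAMPLE = 7, /* deliberately sparse to show the cost */
  DEMO_PLUGIN_ID_MAX = DEMO_PLUGIN_ID_EXAMPLE
} demo_plugin_id_t;

#define DEMO_MAX_ENABLED 2

typedef struct demo_manager {
  void **slots;     /* indexed directly by plugin ID */
  size_t slot_cnt;  /* always DEMO_PLUGIN_ID_MAX + 1 */
  demo_plugin_id_t run_order[DEMO_MAX_ENABLED]; /* order plugins were enabled */
  size_t run_cnt;
} demo_manager_t;

/* Allocate one slot for every possible ID up to the maximum. */
static int demo_manager_init(demo_manager_t *m)
{
  m->slot_cnt = (size_t)DEMO_PLUGIN_ID_MAX + 1;
  m->slots = calloc(m->slot_cnt, sizeof(void *));
  m->run_cnt = 0;
  return m->slots == NULL ? -1 : 0;
}

/* Run order is simply the order of the enable calls, not the ID order. */
static int demo_manager_enable(demo_manager_t *m, demo_plugin_id_t id)
{
  if (m->run_cnt >= DEMO_MAX_ENABLED) {
    return -1;
  }
  m->run_order[m->run_cnt++] = id;
  return 0;
}

static void demo_manager_free(demo_manager_t *m)
{
  free(m->slots);
  m->slots = NULL;
}
```

In this sketch, two plugins with IDs 1 and 7 still cost eight slots, which is the reason for keeping ID values small.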
For example, the ID definition for the Raw pcap plugin is:
At this point, the (stub) plugin is fully integrated into Corsaro and should compile cleanly. Because we have altered files needed by autoconf, the autoreconf -vfi command should be used to regenerate configure and each Makefile. Also remember that unless the ED_WITH_PLUGIN macro in configure.ac has yes as its final argument, the plugin will need to be explicitly enabled by passing --with-<short_name> to configure (e.g. --with-pcap).
To fully rebuild everything, use the following:
There is also a build_latest.sh script included in the distribution which will take care of these tasks, and also build a distribution tarball that can be tested on another system.
This section describes each of the functions that must be implemented to fully comply with the Corsaro-Out API. Each function has some code snippets that are almost always used. The actual function implementations will have corsaro_<plugin_name>_ prefixed to the names. For example, the Raw pcap implementation of the alloc function is called corsaro_pcap_alloc.
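The required functions are alloc, init_output, close_output, start_interval, end_interval, and process_packet. Each is described in turn below.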
The alloc function is called when the plugin manager needs to instantiate a new instance of a plugin. If you followed the steps earlier in this tutorial (see Implementation Code), then this function should simply return a pointer to the static corsaro_plugin structure. The plugin manager will then make a copy for this specific instance.
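The relationship between the static descriptor, the alloc function, and the per-instance copy can be sketched as follows. This is a simplified model under stated assumptions: demo_plugin_t has far fewer fields than the real corsaro_plugin structure, DEMO_GENERATE_PTRS only mimics the idea of CORSARO_PLUGIN_GENERATE_PTRS, the alloc function here takes no arguments, and demo_manager_instantiate is an invented stand-in for whatever the plugin manager does internally.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical plugin descriptor. */
typedef struct demo_plugin {
  const char *name;
  int id;
  unsigned int magic;
  int (*process_packet)(void *corsaro, void *packet);
  void *state; /* per-instance; must stay NULL in the static template */
} demo_plugin_t;

static int corsaro_example_process_packet(void *corsaro, void *packet)
{
  (void)corsaro;
  (void)packet;
  return 0;
}

/* Illustrative counterpart of a pointer-generating macro: expands to the
 * prefixed function name(s) in struct-field order. */
#define DEMO_GENERATE_PTRS(plugin) plugin##_process_packet

/* Static template describing the plugin; shared by all instances. */
static demo_plugin_t corsaro_example_plugin = {
  "example",                           /* plugin name */
  1,                                   /* hypothetical plugin ID */
  0x45584D50,                          /* hypothetical magic ("EXMP" in ASCII) */
  DEMO_GENERATE_PTRS(corsaro_example), /* API function pointer(s) */
  NULL                                 /* no state in the template */
};

/* alloc simply hands back the template. */
demo_plugin_t *corsaro_example_alloc(void)
{
  return &corsaro_example_plugin;
}

/* Manager side: copy the template so each instance owns its own fields. */
demo_plugin_t *demo_manager_instantiate(demo_plugin_t *(*alloc)(void))
{
  demo_plugin_t *tmpl = alloc();
  demo_plugin_t *copy = malloc(sizeof(*copy));
  if (copy == NULL) {
    return NULL;
  }
  memcpy(copy, tmpl, sizeof(*copy));
  return copy;
}
```

Since every instance is a copy, writing to one instance's state pointer leaves the template and all other instances untouched.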
For example, the Raw pcap plugin uses the following static corsaro_plugin structure definition and alloc function:
The init_output function is called by Corsaro to ready a plugin for use in the Corsaro-Out mode. This event should be used to establish any state necessary for analyzing packets. Plugins should expect the next event to be start_interval.
We will use the RS DoS plugin as an example for a simple init_output function which establishes some state, and opens an output file to write data to. The full corsaro_dos_init_output function is as follows:
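Breaking it down, there are three important steps: allocating the per-instance state, registering that state with the plugin manager, and opening the output file.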
Because Corsaro is designed such that multiple instances of a plugin can be used at once, per-instance state must not be stored in static structures. Instead we create a special state structure and register it with the plugin manager so that it can be retrieved at any time it is needed.
Corsaro provides a utility function, malloc_zero, which will allocate and zero a block of memory. If the malloc fails, we issue a log message using corsaro_log and jump to the err block which simply calls the corsaro_dos_close_output function described in the next section.
To register the newly created state structure with the plugin manager, we simply call the corsaro_plugin_register_state function:
This function takes three arguments: a pointer to the plugin manager (contained in the corsaro state structure), a pointer to the plugin which is registering the state, and a void pointer to a state structure. We use the PLUGIN macro which was described in Implementation Code to retrieve a pointer to the plugin structure.
Because the RS DoS plugin uses a generic output file, we simply use the corsaro_io_prepare_file function as described in IO Framework. We store the returned corsaro_file pointer in the state structure for later use. If the file could not be opened, we issue a log message and jump to the err block to clean up and return an error code.
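The overall control flow of such an init_output function can be sketched without any Corsaro dependencies. In this rough model, calloc stands in for malloc_zero, a plain pointer assignment stands in for corsaro_plugin_register_state, fopen stands in for corsaro_io_prepare_file, and fprintf to stderr stands in for corsaro_log. The types and function names are hypothetical, and the sketch assumes the caller initializes plugin_state to NULL.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-instance state. */
typedef struct corsaro_example_state {
  FILE *outfile;
  unsigned long packet_cnt;
} corsaro_example_state_t;

/* Stand-in for the corsaro structure: holds the registered state pointer. */
typedef struct demo_corsaro { void *plugin_state; } demo_corsaro_t;

/* Minimal cleanup used by the err block; tolerates missing state. */
static int example_close_output(demo_corsaro_t *corsaro)
{
  corsaro_example_state_t *state = corsaro->plugin_state;
  if (state != NULL) {
    if (state->outfile != NULL) {
      fclose(state->outfile);
    }
    free(state);
    corsaro->plugin_state = NULL;
  }
  return 0;
}

static int example_init_output(demo_corsaro_t *corsaro, const char *path)
{
  corsaro_example_state_t *state;

  /* Step 1: allocate zeroed per-instance state. */
  if ((state = calloc(1, sizeof(*state))) == NULL) {
    fprintf(stderr, "could not allocate example state\n");
    goto err;
  }

  /* Step 2: register the state so later events can retrieve it. */
  corsaro->plugin_state = state;

  /* Step 3: open the output file and keep the handle in the state. */
  if ((state->outfile = fopen(path, "w")) == NULL) {
    fprintf(stderr, "could not open %s for writing\n", path);
    goto err;
  }

  return 0;

err:
  /* A single exit path frees whatever was set up before the failure. */
  example_close_output(corsaro);
  return -1;
}
```

Registering the state before opening the file is what lets the err block rely on the close function alone: anything allocated so far is reachable through the registered pointer.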
These are the common steps that nearly every plugin will follow when initializing for output. At this time you should also establish any other state required. For example, the RS DoS plugin initializes a hash table and stores a pointer in the state structure:
The close_output function should free any state that was established in the init_output function (or during processing). The close_output function should also be able to free partial state, i.e. when an error occurs during initialization and some (or all) of the state is therefore unallocated. The close_output function corresponding to the init_output function shown above for the RS DoS plugin is as follows:
It starts by using the STATE macro to retrieve a pointer to the state structure stored for this plugin. If this pointer is not NULL, it proceeds to free the hash table and output file (if they are still allocated). The output file is closed using the corsaro_file_close function described in the IO Framework section. Once these have been freed, the state structure itself is freed by calling corsaro_plugin_free_state and passing a pointer to the plugin manager and a pointer to the structure for the current plugin.
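A close function that tolerates partially built state can be sketched like this. The sketch assumes a hypothetical state with two resources (an int array standing in for the hash table, and a stdio FILE standing in for the Corsaro output file) and uses free and fclose in place of the Corsaro calls named above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-instance state with two independently allocated parts. */
typedef struct corsaro_example_state {
  int *table;    /* stand-in for the hash table */
  FILE *outfile; /* stand-in for the Corsaro output file */
} corsaro_example_state_t;

typedef struct demo_corsaro { corsaro_example_state_t *state; } demo_corsaro_t;

static int example_close_output(demo_corsaro_t *corsaro)
{
  corsaro_example_state_t *state = corsaro->state;

  /* init_output may have failed before any state was registered. */
  if (state == NULL) {
    return 0;
  }

  /* Free each resource only if it was actually allocated. */
  if (state->table != NULL) {
    free(state->table);
    state->table = NULL;
  }
  if (state->outfile != NULL) {
    fclose(state->outfile);
    state->outfile = NULL;
  }

  /* Finally release the state itself and clear the registration, so a
   * second call is harmless. */
  free(state);
  corsaro->state = NULL;
  return 0;
}
```

Because the state was zeroed when it was allocated, every member that init_output never reached is still NULL, which is what makes the per-member checks reliable.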
The start_interval function is used by Corsaro to inform a plugin that a new interval (see the Intervals section) has begun. The plugin may need to initialize some per-interval state at this point. It is perfectly acceptable to ignore a start_interval event (i.e. simply return 0) if the plugin has no use for it.
The end_interval function is used by Corsaro to inform a plugin that the current interval is ending. The plugin may need to write out aggregated data (using the Corsaro IO Framework) for the interval, and possibly free some per-interval state at this point. As with the start_interval event, a plugin may safely ignore an end_interval event.
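The interplay of the two interval events can be sketched with a plugin that counts packets per interval. This is only an illustration: demo_interval_t is a hypothetical stand-in for the real interval structure, the functions take the state directly rather than a corsaro pointer, and a stdio FILE with fprintf replaces the Corsaro IO Framework.

```c
#include <stdio.h>

/* Hypothetical interval descriptor (interval number and start/end time). */
typedef struct demo_interval {
  unsigned int number;
  unsigned long time;
} demo_interval_t;

typedef struct corsaro_example_state {
  unsigned long packet_cnt; /* per-interval aggregate */
  FILE *outfile;
} corsaro_example_state_t;

/* New interval: reset the per-interval aggregate. */
static int example_start_interval(corsaro_example_state_t *state,
                                  const demo_interval_t *int_start)
{
  (void)int_start;
  state->packet_cnt = 0;
  return 0;
}

/* Called once per packet between the two interval events. */
static int example_process_packet(corsaro_example_state_t *state)
{
  state->packet_cnt++;
  return 0;
}

/* Interval over: write one line "<interval number> <packet count>". */
static int example_end_interval(corsaro_example_state_t *state,
                                const demo_interval_t *int_end)
{
  if (fprintf(state->outfile, "%u %lu\n", int_end->number,
              state->packet_cnt) < 0) {
    return -1;
  }
  return 0;
}
```

A plugin that keeps no per-interval aggregates could reduce both interval functions to a bare return 0.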
The process_packet event is likely where the majority of a plugin's analysis code will go. The function is called once for every packet that Corsaro processes. Corsaro provides two arguments to the function - a pointer to the current corsaro state structure, from which the plugin and plugin state structures can be retrieved, and also a pointer to a corsaro_packet structure.
A corsaro_packet is a lightweight wrapper around a libtrace packet and a corsaro_packet_state structure. The libtrace packet encapsulates the actual packet that was captured, and all the regular libtrace API functions may be used on it. See the libtrace API documentation for more details. Corsaro provides a simple macro, LT_PKT, which dereferences the corsaro_packet pointer to allow access to the libtrace packet. For example, the RS DoS plugin uses this simple check to ensure that there are no non-IPv4 packets in the trace:
The corsaro_packet_state structure which is contained in the corsaro_packet structure is used to pass meta-data about a packet down the plugin chain. Currently, the corsaro_packet_state structure (in corsaro_int.h) must be altered to store new meta-data. This will likely change in a future version. To make use of the packet state, a plugin would add an appropriate field to the structure, and then after some analysis of a packet, set the field to an appropriate value. The state is then passed down the plugin chain, and successive plugins can leverage the meta-data derived by the earlier plugin.
For example, if we had some code to perform geolocation of IP addresses, we could augment each packet with the latitude and longitude of the source. First we would add fields to the corsaro_packet_state structure:
Then in the process_packet function, our geolocation plugin would take the packet provided, find the geolocation data for the source address, and store the latitude and longitude in the packet state structure:
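A minimal, self-contained sketch of this pattern follows. It uses stand-in types (demo_packet_t and demo_packet_state_t) rather than the real corsaro_packet and corsaro_packet_state structures, the field names are assumptions, and the hard-coded lookup with made-up coordinates is a placeholder for a real geolocation database.

```c
#include <stdint.h>

/* Stand-in for the packet state, extended with hypothetical geo fields. */
typedef struct demo_packet_state {
  uint8_t geo_valid; /* set to 1 once latitude/longitude are filled in */
  double latitude;
  double longitude;
} demo_packet_state_t;

/* Stand-in for the packet wrapper: a source address plus the state. */
typedef struct demo_packet {
  uint32_t src_ip; /* host byte order, for simplicity */
  demo_packet_state_t state;
} demo_packet_t;

/* Earlier plugin in the chain: annotate the packet. The single hard-coded
 * entry (192.0.2.1, a documentation address) and its coordinates are
 * invented for this example. */
static int geo_process_packet(demo_packet_t *packet)
{
  if (packet->src_ip == 0xC0000201u) {
    packet->state.latitude = 32.88;
    packet->state.longitude = -117.23;
    packet->state.geo_valid = 1;
  } else {
    packet->state.geo_valid = 0;
  }
  return 0;
}

/* Later plugin in the chain: consume the meta-data without repeating the
 * lookup. Here it counts packets located in the northern hemisphere. */
static int report_process_packet(const demo_packet_t *packet,
                                 unsigned int *northern_cnt)
{
  if (packet->state.geo_valid && packet->state.latitude > 0.0) {
    (*northern_cnt)++;
  }
  return 0;
}

/* The chain hands the same packet (and its state) to each plugin in order. */
static int run_chain(demo_packet_t *packet, unsigned int *northern_cnt)
{
  if (geo_process_packet(packet) != 0) {
    return -1;
  }
  return report_process_packet(packet, northern_cnt);
}
```

The validity flag matters because a later plugin cannot otherwise tell an unset coordinate from a genuine value of zero.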
Under Development
This section describes each of the functions that must be implemented to fully comply with the Corsaro-In API. Each function has some code snippets that are almost always used. The actual function implementations will have corsaro_<plugin_name>_ prefixed to the names.
This section assumes that the Corsaro-Out API has already been fully implemented (in order to have generated the data), but it is theoretically possible to implement only the alloc function from that API and still have the Corsaro-In API function correctly.
The required functions are:
TODO: there is a helper function in corsaro_plugin.c that will probably do this for you if you followed the conventions for naming the output file.
TODO: provide a code snippet that demonstrates the peek function.