AutoFocus has two directories that are important to its operation: the working directory and the web directory. You can run multiple instances of AutoFocus (for example, to analyze the traffic of different links) on the same computer, all sharing the same AutoFocus directory, but each instance needs its own working directory and web directory.
The web directory contains the final results of the analysis: the root HTML pages and the JavaScript files that give names to protocols, ports, IP addresses and IP prefixes. Its subdirectories form a multi-level hierarchy representing weeks, days, three-hour intervals and half-hour intervals, and within each of these there is a separate directory for each category. Each directory in this hierarchy eventually holds plots of the traffic and the JavaScript files containing the traffic reports.
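To make the time hierarchy concrete, the Python sketch below computes, for a given Unix timestamp, the start of the week, day, three-hour interval and half-hour interval that contain it. It is an illustration only: the function name is hypothetical, and it assumes intervals are aligned in UTC with weeks starting on Monday, which may not match how AutoFocus actually aligns or names its directories.

```python
# Hypothetical helper, not part of AutoFocus.
# Assumptions: intervals are aligned in UTC and weeks start on Monday.
WEEK = 7 * 86400
DAY = 86400
THREE_HOURS = 3 * 3600
HALF_HOUR = 1800

# Unix time 0 (1970-01-01) was a Thursday, so Monday-aligned weeks are
# shifted by three days relative to plain multiples of WEEK.
MONDAY_OFFSET = 3 * DAY


def enclosing_intervals(ts):
    """Return the start (Unix seconds) of the week, day, three-hour and
    half-hour intervals that contain timestamp ts."""
    t = int(ts)  # drop the sub-second part
    return {
        "week": t - (t + MONDAY_OFFSET) % WEEK,
        "day": t - t % DAY,
        "three_hours": t - t % THREE_HOURS,
        "half_hour": t - t % HALF_HOUR,
    }


if __name__ == "__main__":
    from datetime import datetime, timezone

    for level, start in enclosing_intervals(1043481602.876421).items():
        print(level, datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
```

Each finer interval nests inside the coarser ones, which mirrors the week, day, three-hour, half-hour nesting of the web directory.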
The files hostnames.js, networks.js, ports.js and protocols.js in the web directory are "user serviceable." You can extend them with names for the networks and hosts that often come up in your traffic, and the GUI will display these names as well. The first matching entry in networks.js is used, so be careful not to put a more general prefix before a more specific one. You do not need to re-run the data processing after changing these files; just reopen main.html.
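The first-match rule for networks.js can be illustrated with a small Python sketch. The real networks.js uses its own JavaScript syntax, which is not reproduced here; the sketch represents entries as a hypothetical list of (prefix, name) pairs, and the helpers first_match_name and shadowed are illustrative, not part of AutoFocus. shadowed flags entries that can never be reached because an earlier, more general (or identical) prefix already covers them.

```python
import ipaddress


def first_match_name(addr, networks):
    """Return the name of the first prefix in `networks` that contains addr,
    or None if no entry matches (first match wins, as in networks.js)."""
    ip = ipaddress.ip_address(addr)
    for prefix, name in networks:
        if ip in ipaddress.ip_network(prefix):
            return name
    return None


def shadowed(networks):
    """Return (shadowed_name, covering_name) pairs for entries that an
    earlier prefix of the same IP version fully covers."""
    nets = [(ipaddress.ip_network(p), name) for p, name in networks]
    found = []
    for i, (later, later_name) in enumerate(nets):
        for earlier, earlier_name in nets[:i]:
            if earlier.version == later.version and later.subnet_of(earlier):
                found.append((later_name, earlier_name))
                break
    return found


# Hypothetical entries: specific prefix first (good) versus general first (bad).
good_order = [("10.56.5.0/24", "Lab subnet"), ("10.0.0.0/8", "Internal")]
bad_order = [("10.0.0.0/8", "Internal"), ("10.56.5.0/24", "Lab subnet")]

print(first_match_name("10.56.5.149", good_order))  # Lab subnet
print(first_match_name("10.56.5.149", bad_order))   # Internal
print(shadowed(bad_order))  # [('Lab subnet', 'Internal')]
```

Running a check like shadowed over your entries before saving networks.js is one way to catch an ordering mistake that would otherwise silently hide a more specific name.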
The categories.txt file allows you to divide the traffic into categories. Each category is defined by one or more lines in the file, with each line specifying a cluster. Each flow belongs to the category associated with the first line in categories.txt that it matches, just as with Cisco ACLs. You cannot use ranges for the protocol, and for ports the only ranges allowed are 0-1023 and 1024-65535. See example_categories.txt for an example of a more complex categories.txt. If you have a good idea of your traffic mix and its dominant categories, you can edit this file before running AutoFocus for the first time. Alternatively, you can use the textual reports generated by the first run as inspiration.
The special category "discard" can be used to drop flow records that you don't want to count toward the total. For example, if the traffic you collect mixes traffic between internal subnets with traffic between your site and the outside, but you are only interested in the latter, you can put all traffic from internal subnets to other internal subnets in the "discard" category.
Whenever you change categories.txt, you must run initialize_working_directory.pl before feeding any more data into process_data.pl. After reinitialization you can also feed it old data; it will overwrite the old web pages with the new results. It is probably a good idea to revisit categories.txt periodically, say once every three months, to make sure it keeps track of shifts in your traffic mix.
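The first-match semantics of categories.txt, including the "discard" category, can be sketched in Python as follows. The file's actual syntax is not described here, so the Rule class, its field names, and the fallback category "other" for flows that match no line are all assumptions made for illustration. The sketch does enforce the two constraints stated above: the protocol is a single value, and a port is either a single value or one of the ranges 0-1023 and 1024-65535.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple, Union

LOW = (0, 1023)       # the only two port ranges categories.txt allows
HIGH = (1024, 65535)
PortSpec = Union[None, int, Tuple[int, int]]  # None means "any port"


@dataclass
class Rule:
    """Hypothetical in-memory form of one categories.txt line (one cluster)."""
    category: str
    src: str = "0.0.0.0/0"
    dst: str = "0.0.0.0/0"
    proto: Optional[int] = None  # a single protocol number; no ranges
    sport: PortSpec = None
    dport: PortSpec = None

    def __post_init__(self):
        for spec in (self.sport, self.dport):
            if isinstance(spec, tuple) and spec not in (LOW, HIGH):
                raise ValueError(f"unsupported port range {spec}")


def port_ok(spec, port):
    if spec is None:
        return True
    if isinstance(spec, tuple):
        return spec[0] <= port <= spec[1]
    return port == spec


def classify(flow, rules, default="other"):
    """Return the category of the first rule the flow matches (ACL-style)."""
    src, dst, proto, sport, dport = flow
    for r in rules:
        if (ipaddress.ip_address(src) in ipaddress.ip_network(r.src)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(r.dst)
                and (r.proto is None or r.proto == proto)
                and port_ok(r.sport, sport) and port_ok(r.dport, dport)):
            return r.category
    return default  # assumption: unmatched flows fall into a catch-all


rules = [
    Rule("discard", src="10.0.0.0/8", dst="10.0.0.0/8"),  # internal to internal
    Rule("web", proto=6, sport=80),
    Rule("web", proto=6, dport=80),
    Rule("high-ports", proto=6, sport=HIGH, dport=HIGH),
]

flows = [
    ("10.56.5.149", "10.1.2.3", 6, 80, 63582),
    ("10.56.5.149", "172.18.128.183", 6, 80, 63582),
    ("172.18.128.183", "10.56.5.149", 6, 5000, 6000),
    ("172.18.128.183", "10.56.5.149", 17, 53, 1500),
]

# Discarded records are dropped before anything is counted.
kept = [f for f in flows if classify(f, rules) != "discard"]
for f in kept:
    print(f, classify(f, rules))
```

The first flow lands in "discard" because that rule comes first, even though a later "web" rule would also match it; this is the same ordering pitfall as with ACLs.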
If you use NetFlow data, the active_timeout option in AutoFocusConfig specifies, in seconds, the maximum duration of a flow record. If the input contains flows longer than this value, AutoFocus might abort. On the other hand, if you set this value too high, it might "sit on the data" for too long before generating output.
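Before choosing a value for active_timeout, it may help to check the longest flow in your input. The Python sketch below assumes you have already extracted each NetFlow record's start and end times in seconds; the function check_active_timeout is hypothetical and not part of AutoFocus.

```python
def check_active_timeout(records, active_timeout):
    """records: iterable of (start, end) times in seconds.

    Returns (longest duration seen, list of records longer than
    active_timeout). Any record in the list is one that could make
    AutoFocus abort with this active_timeout setting."""
    records = list(records)
    longest = max((end - start for start, end in records), default=0)
    too_long = [(s, e) for s, e in records if e - s > active_timeout]
    return longest, too_long


if __name__ == "__main__":
    sample = [(0, 10), (5, 1805), (100, 200), (0, 2000)]
    longest, offenders = check_active_timeout(sample, 1800)
    print("longest flow:", longest, "s; too long:", offenders)
```

Picking a value just above the longest duration you observe keeps AutoFocus from aborting without making it hold data longer than necessary.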
If you use traces, you don't have to use ipsumdump to parse them. You can use any program that generates ASCII output with one packet per line that looks like the line below.
1043481602.876421 10.56.5.149 172.18.128.183 6 80 63582 1500
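If you write your own converter, a parser and formatter for this line format might look like the Python sketch below. The field meanings are inferred from the sample line and are an assumption: timestamp in seconds with microseconds, source address, destination address, protocol number (6 is TCP), source port, destination port, and packet length in bytes. The Packet type and the function names are hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    # Field order assumed from the sample line; verify against your tool.
    ts: float     # Unix timestamp, microsecond resolution
    src: str      # source IP address
    dst: str      # destination IP address
    proto: int    # IP protocol number (6 = TCP)
    sport: int    # source port
    dport: int    # destination port
    length: int   # packet length in bytes (assumed)


def parse_packet(line):
    """Parse one whitespace-separated packet line into a Packet."""
    f = line.split()
    if len(f) != 7:
        raise ValueError(f"expected 7 fields, got {len(f)}: {line!r}")
    return Packet(float(f[0]), f[1], f[2], int(f[3]), int(f[4]), int(f[5]), int(f[6]))


def format_packet(p):
    """Write a Packet back out as one line in the same format."""
    return f"{p.ts:.6f} {p.src} {p.dst} {p.proto} {p.sport} {p.dport} {p.length}"


if __name__ == "__main__":
    sample = "1043481602.876421 10.56.5.149 172.18.128.183 6 80 63582 1500"
    p = parse_packet(sample)
    print(p)
    print(format_packet(p) == sample)
```

A converter from another capture tool would only need to build Packet values and print format_packet for each one.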
The cpu variable is meant to convey the computational power of your CPU to AutoFocus. If you want to analyze large amounts of data, have many categories, or your CPU is indeed slow, leave it on "slow". Increasing it to "medium" or "large" instructs AutoFocus to compute reports for all categories down to three-hour and half-hour time intervals, respectively.
If you feel brave enough to go in and hack the Perl scripts, here is some information that might help. There are two distinct uses for the traffic data: the plots and the reports. All data goes first to the tmp directory. The split_raw_data directory is used for the plots and to build the data in the summaries directory, which is used for the reports. The tmp directory contains directories of ASCII flow record files, one per category, to which new records may still be added. Each record that spans several one-minute bins is broken up into one record for each bin it is active in. These directories are moved to split_raw_data once no more data can land in them. The summaries hierarchy contains compact gzipped ASCII summaries of the traffic, called synopses, that describe the traffic of the current week, day, three-hour interval and half-hour interval. These are used to generate the traffic reports shown in the web-based interface.
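The per-minute splitting of records in tmp can be sketched in Python. This text does not say how a record's packet and byte counts are divided among its bins, so the sketch assumes they are split in proportion to the time the flow overlaps each bin, and that bins are aligned to multiples of 60 seconds of Unix time; split_into_bins is a hypothetical name, not a function from the AutoFocus scripts.

```python
from typing import List, Tuple

BIN = 60  # one-minute bins, assumed aligned to multiples of 60 s


def split_into_bins(start: float, end: float, packets: int,
                    nbytes: int) -> List[Tuple[int, float, float]]:
    """Return (bin_start, packets, bytes) for each one-minute bin the flow
    is active in. Assumption: counts are divided in proportion to the
    time overlap with each bin, so results may be fractional."""
    first_bin = int(start // BIN) * BIN
    if end <= start:
        # Zero-length record: everything goes into the bin holding start.
        return [(first_bin, float(packets), float(nbytes))]
    duration = end - start
    pieces = []
    b = first_bin
    while b < end:
        overlap = min(end, b + BIN) - max(start, b)
        frac = overlap / duration
        pieces.append((b, packets * frac, nbytes * frac))
        b += BIN
    return pieces


if __name__ == "__main__":
    # A 120 s flow starting 30 s into a minute touches three bins.
    for piece in split_into_bins(30, 150, 12, 1200):
        print(piece)
```

Proportional splitting keeps the per-bin totals equal to the original record's counts; a real implementation might instead round to whole packets.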