Center for Applied Internet Data Analysis
UCSD Network Telescope -- Code-Red Worms Dataset

The Dataset on the Code-Red Worms

The first incarnation of the Code-Red worm (CRv1) began to infect hosts running unpatched versions of Microsoft's IIS webserver on July 12th, 2001. This first version of the worm used a static seed for its random number generator. Then, around 10:00 UTC on July 19th, 2001, a random-seed variant of the Code-Red worm (CRv2) appeared and spread. This second version shared almost all of its code with the first version, but spread much more rapidly. Next, on August 4th, a new worm began to infect machines by exploiting the same vulnerability in Microsoft's IIS webserver as the original Code-Red worm. Although the new worm had no relationship to the first one beyond exploiting the same vulnerability, its source code contained the string "CodeRedII", and it was thus named CodeRedII. Finally, on September 18th, 2001, the Nimda worm began to spread via backdoors left by CodeRedII, as well as via email, open network shares, and compromised web sites.

This dataset contains information useful for studying the spread of the Code-Red version 2 and CodeRedII worms. The dataset consists of a publicly available set of files containing summarized information that does not individually identify infected computers.

Data included in the Code-Red Dataset are:

  • Code-Red July: the first Code-Red version 2 outbreak (July 19-20, 2001)
    • distribution of start and end times of hosts performing port 80 TCP SYN scanning
    • distribution of durations of time code-redv2-infected computers were observed to be scanning
    • country distribution of code-redv2-infected computers
    • a file containing a table with the following eight tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number, and AS name
  • Code-Red August: the second Code-Red version 2 outbreak and beginning of the spread of the CodeRedII worm (August 1-20, 2001)
    • distribution of start and end times of hosts performing port 80 TCP SYN scanning
    • distribution of durations of time code-redv2-infected computers were observed to be scanning
    • country distribution of code-redv2-infected computers
    • a file containing a table with the following seven tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number
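
The per-address tables described above could be read along the following lines. This is only a minimal sketch: the field names are hypothetical, the field order is taken from the descriptions above, and the start and end times are assumed to be numeric values in seconds, which the description does not state. The function `read_records` and the type `HostRecord` are illustrative names and are not part of the dataset.

```python
import csv
from typing import Iterable, Iterator, NamedTuple, Optional


class HostRecord(NamedTuple):
    """One row of the per-address table (field names are hypothetical)."""
    start_time: float        # assumed to be numeric seconds
    end_time: float          # assumed to be numeric seconds
    tld: str
    country: str
    latitude: str            # kept as text in case a value is missing
    longitude: str
    as_number: str
    as_name: Optional[str]   # absent in the seven-field August table


def read_records(lines: Iterable[str]) -> Iterator[HostRecord]:
    """Yield one HostRecord per tab-separated row with 7 or 8 fields."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) not in (7, 8):
            continue  # skip blank or malformed rows
        try:
            start, end = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip rows whose times are not numeric (e.g. a header)
        as_name = row[7] if len(row) == 8 else None
        yield HostRecord(start, end, row[2], row[3], row[4], row[5], row[6], as_name)


def duration(record: HostRecord) -> float:
    """Length of time the address was observed, in the same unit as the times."""
    return record.end_time - record.start_time
```

Passing an open file object to `read_records` works because file objects iterate over lines. If the actual files encode times differently, only the two `float(...)` conversions need to change.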

Data source:

  • Code-Red July:
    The data source for this dataset includes packet headers collected from a /8 network at UCSD (the UCSD Network Telescope), timestamp/IP address pairs for TCP SYN packets received by two /16 networks at Lawrence Berkeley Laboratory (LBL), and sampled netflow from a router upstream of the /8 network at UCSD. These three data sources are used to maximize coverage of the expansion of the worm. Between midnight and 16:30 UTC, a passive network monitor recorded headers of all packets destined for the /8 research network. After 16:30 UTC, a filter installed on a campus router to reduce congestion caused by the worm blocked all external traffic to this network. Because this filter was put into place upstream of the monitor, we were unable to capture IP packet headers after 16:30 UTC. However, a second UCSD data set consisting of sampled netflow output from the filtering router was available at the UCSD site throughout the 24-hour period. Vern Paxson provided probe information collected by Bro on the LBL networks between 10:00 UTC on July 19, 2001 and 7:00 on July 20, 2001. We merged these three sources to produce the Code-Red July dataset.
  • Code-Red August:
    The data source for this dataset includes only packet headers collected by a passive monitor on a /8 network at UCSD (the UCSD Network Telescope). Beginning August 4th, this data contains a mix of hosts infected by Code-Red version 2 and CodeRedII. It is not possible to determine which worm caused a host to send TCP SYN packets to port 80.

Caveats that apply to this dataset:

  • The .ida vulnerability used by the Code-Red worms was exploited via TCP connections to port 80. Because the UCSD Network Telescope did not respond to connection attempts, no TCP handshake completed and no worm payload was ever received, so worm probes cannot be distinguished from other connection attempts to port 80. This dataset therefore does not consist solely of worm traffic: all TCP SYN packets received on port 80 are included in these summaries, including non-worm traffic.
  • The DHCP Effect significantly impacts this dataset, particularly after the first 24 hours of each cycle of worm spread. Changing IP addresses on dynamically addressed machines cause an order-of-magnitude difference between the number of IP addresses active in any two-hour period and the number of IP addresses active in a week. This dataset does not include IP addresses, so keep in mind that each start/end time or duration does not necessarily uniquely identify an infected computer. It identifies only a newly active IP address, with no information about whether that IP address represents a computer previously known to be infected.
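
The comparison in the DHCP caveat above, between addresses active within a short window and addresses active over a longer span, can be sketched as follows. This is an illustration only: it assumes each observed address is represented by a (start, end) pair in seconds, and `active_in_window` is a hypothetical helper, not something shipped with the dataset.

```python
from typing import Iterable, Tuple

# (start, end) of one observed address; seconds are an assumed unit.
Interval = Tuple[float, float]


def active_in_window(intervals: Iterable[Interval],
                     win_start: float, win_end: float) -> int:
    """Count addresses whose activity overlaps the window [win_start, win_end)."""
    return sum(
        1
        for start, end in intervals
        if start < win_end and end >= win_start
    )
```

Calling the function once with a two-hour window and once with a week-long window over the same rows gives the two counts being compared. Neither count is a count of infected computers, since one machine may appear under several addresses.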

Referencing this Dataset

When referencing this data, please use:

The UCSD CAIDA Dataset on the Code-Red Worms - July and August 2001,
https://www.caida.org/data/passive/codered_worms_dataset.xml.
Please also report your publication to CAIDA.

References

For more information on the Code-Red-related worms (Code-Redv1, Code-Redv2, CodeRedII), see:

  Last Modified: Tue Oct-13-2020 22:22:00 UTC
  Page URL: https://www.caida.org/data/passive/codered_worms_dataset.xml