Learning Regexes to Extract Router Names from Hostnames
M. Luckie, B. Huffaker, and k. claffy, "Learning Regexes to Extract Router Names from Hostnames", in ACM Internet Measurement Conference (IMC), Oct 2019.

The data supplement is designed to be used with sc_hoiho, which is included as part of scamper.

Matthew Luckie (2), Bradley Huffaker (1), kc claffy (1)

(1) CAIDA, San Diego Supercomputer Center, University of California San Diego
(2) University of Waikato

We present the design, implementation, evaluation, and validation of a system that automatically learns to extract router names (router identifiers) from hostnames stored by network operators in different DNS zones. We represent the naming convention for each suffix as a regular expression (regex). Our supervised-learning approach evaluates automatically generated candidate regexes against sets of hostnames for IP addresses that other alias resolution techniques previously inferred to be interfaces on the same router. Conceptually, we conclude that a regex accurately represents the naming convention for a suffix if three conditions hold: (1) the regex extracts the same value from a set of hostnames associated with IP addresses on the same router; (2) the value is unique to that router; and (3) the regex extracts names for multiple routers in the suffix. We train our system using router aliases inferred from active probing to learn regexes for 2550 different suffixes.
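
The three conditions above can be illustrated with a minimal Python sketch. It is not the sc_hoiho implementation: the function name `regex_fits_suffix`, the `min_routers` parameter, the strict pass/fail check, and the `example.net` hostnames are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def regex_fits_suffix(pattern, routers, min_routers=2):
    """Strict pass/fail check of the three conceptual conditions.

    routers maps a training router ID to the hostnames of its interfaces.
    The first capture group of pattern is taken as the router name.
    (Hypothetical helper; all-or-nothing matching is an assumption.)
    """
    regex = re.compile(pattern)
    owners = defaultdict(set)  # extracted name -> routers that produced it
    for router_id, hostnames in routers.items():
        names = set()
        for hostname in hostnames:
            match = regex.search(hostname)
            if match is None:
                return False  # strict: every training hostname must match
            names.add(match.group(1))
        if len(names) != 1:
            return False  # condition (1): same value for one router
        owners[names.pop()].add(router_id)
    if any(len(ids) > 1 for ids in owners.values()):
        return False  # condition (2): value unique to that router
    return len(owners) >= min_routers  # condition (3): multiple routers


# Made-up training data: three routers in one suffix.
training = {
    "r1": ["xe-0-0-0.cr1.lax.example.net", "ae-1.cr1.lax.example.net"],
    "r2": ["xe-0-1-0.cr2.sjc.example.net", "ae-3.cr2.sjc.example.net"],
    "r3": ["xe-0-0-0.cr2.lax.example.net", "ae-2.cr2.lax.example.net"],
}

router_and_city = r"^[^.]+\.([a-z]+\d+\.[a-z]+)\.example\.net$"
city_only = r"^[^.]+\.[a-z]+\d+\.([a-z]+)\.example\.net$"
interface_only = r"^([^.]+)\."

fits = regex_fits_suffix(router_and_city, training)
shared_name = regex_fits_suffix(city_only, training)
inconsistent = regex_fits_suffix(interface_only, training)
```

In this toy data, `router_and_city` meets all three conditions. `city_only` fails condition (2) because two routers share "lax". `interface_only` fails condition (1) because one router yields two different values.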

We then demonstrate the utility of this system by using the regexes to find 105% more aliases for these suffixes. Regexes inferred in IPv4 perfectly predict aliases for ≈85% of suffixes with IPv6 aliases, i.e., IPv4 and IPv6 addresses representing the same underlying router. They also find 9.0 times more routers in IPv6 than prior techniques found.
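
As a rough illustration of how a learned regex might be applied to find aliases, the sketch below groups addresses whose hostnames yield the same extracted router name. The function name `group_aliases`, the regex, the hostnames, and the documentation-range addresses are all hypothetical, and this is not the procedure or data from the paper.

```python
import re
from collections import defaultdict


def group_aliases(pattern, hostname_by_addr):
    """Group addresses by the router name the regex extracts.

    Returns only names shared by two or more addresses, since a single
    address is not an alias set. (Hypothetical helper for illustration.)
    """
    regex = re.compile(pattern)
    groups = defaultdict(set)
    for addr, hostname in hostname_by_addr.items():
        match = regex.search(hostname)
        if match:
            groups[match.group(1)].add(addr)
    return {name: addrs for name, addrs in groups.items() if len(addrs) > 1}


# Made-up regex and PTR records, mixing IPv4 and IPv6 addresses.
learned = r"^[^.]+\.([a-z]+\d+\.[a-z]+)\.example\.net$"
ptr_records = {
    "192.0.2.1": "xe-0-0-0.cr1.lax.example.net",
    "192.0.2.9": "ae-1.cr1.lax.example.net",
    "2001:db8::1": "ae-1-v6.cr1.lax.example.net",
    "192.0.2.33": "xe-0-1-0.cr2.sjc.example.net",
    "192.0.2.77": "www.example.net",
}

aliases = group_aliases(learned, ptr_records)
```

Here the two IPv4 addresses and the IPv6 address whose hostnames contain `cr1.lax` fall into one group. The lone `cr2.sjc` address and the hostname that does not match are left out.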

Keywords: Hoiho, measurement methodology, regular expression learning, software/tools, topology
  Last Modified: Wed Dec-15-2021 16:33:58 UTC
  Page URL: https://www.caida.org/publications/papers/2019/learning_regexes_extract_router/index.xml