Winging Network Automation like a champ.
When interacting with Network Devices, one of the first challenges aspiring Network Automation engineers must face is how to read data coming from the CLI in a programmatic way. This process is often referred to as text parsing.
Historically, Network Engineers resorted to shell scripting and Unix text processors like Awk, Grep and Sed to extract data from plain text. But even reading a single field from a block of text can easily become a long sequence of pipes, awks and greps. Worse still, you may need not one but several of the fields contained within a block of text. Let’s take a look at the output you’d get from typing show ip interface brief on a Cisco IOS device:
# show_ip_interface_brief.txt
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet1       10.10.20.48     YES NVRAM  up                    up
GigabitEthernet2       unassigned      YES NVRAM  administratively down down
GigabitEthernet3       unassigned      YES NVRAM  administratively down down
Loopback21             unassigned      YES unset  up                    up
Loopback2050           192.168.60.50   YES manual up                    up
How would you parse this for the status of each interface in the above example using Unix tools? A first attempt would be to use awk and specify a space as the separator:
# Print fifth column of the table, using empty space as a separator
awk -F ' ' '{print $5}' show_ip_interface_brief.txt
Of course, running the above command has its problems. For starters, the status field itself may contain spaces; in fact, that’s on display in the second and third rows of the output, where the status field is administratively down! This means we’d actually be getting the following results from awk:
Status
up
administratively
administratively
up
up
Not exactly what we wanted.
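The same pitfall shows up if we split on whitespace in Python. As a rough sketch (the function names here are my own, not from any library), we can reproduce the awk result and then work around it by counting columns from both ends of the line, on the assumption that only the Status field ever contains a space:

```python
# Sample output copied from show_ip_interface_brief.txt above.
SHOW_IP_INT_BRIEF = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet1       10.10.20.48     YES NVRAM  up                    up
GigabitEthernet2       unassigned      YES NVRAM  administratively down down
GigabitEthernet3       unassigned      YES NVRAM  administratively down down
Loopback21             unassigned      YES unset  up                    up
Loopback2050           192.168.60.50   YES manual up                    up
"""


def naive_status(line: str) -> str:
    """Equivalent of awk '{print $5}': the fifth whitespace-separated token."""
    return line.split()[4]


def status_from_both_ends(line: str) -> str:
    """Join every token between the fourth column and the last one.

    Assumes Status is the only column that can contain spaces.
    """
    tokens = line.split()
    return " ".join(tokens[4:-1])


rows = SHOW_IP_INT_BRIEF.splitlines()[1:]  # skip the header line
naive = [naive_status(row) for row in rows]
fixed = [status_from_both_ends(row) for row in rows]
print(naive)
print(fixed)
```

This works for this one command, but the assumption is baked into the code and breaks as soon as another column can contain spaces.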
An alternative would be to use some sort of regular expression matching. We can determine a pattern from the text and write a Regular expression that will match specific patterns like IP addresses, hostnames, interface names and status keywords.
The following expression, for example, can be used to identify a Cisco Interface name.
# ^[A-Z] means the line must begin with an uppercase letter
# [A-Za-z]+ means one or more upper- or lowercase letters follow
# [/0-9]+ means the name ends with one or more digits and slashes
^[A-Z][A-Za-z]+[/0-9]+
By extension, one regular expression that would match all fields in the file show_ip_interface_brief.txt would look something like:
^([A-Z][A-Za-z]+[/0-9]+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|unassigned)\s+([A-Z]+)\s+([A-Za-z]+)\s+(up|administratively down)\s+(up|down)
Notice we repeated the expression we built for matching interface names at the very beginning of our expression above. That’s because each line in the output of show ip interface brief begins with the name of an interface! We follow that with a regular expression that matches either IP addresses or the ‘unassigned’ value. I won’t explain the whole regular expression but you can probably get an idea that building this was a lot of work. This regular expression sets capture groups for all six fields in the output, so it is an improvement over a naive usage of Awk. Capture group 1 will contain the matched interface name, capture group 2 will contain the matched IP address, and capture group 5 will hold the status value we wanted for each line.
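To see those capture groups at work, here is a small sketch that applies the expression above with Python’s built-in re module. The sample text is the table from earlier; the variable names are my own.

```python
import re

SHOW_IP_INT_BRIEF = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet1       10.10.20.48     YES NVRAM  up                    up
GigabitEthernet2       unassigned      YES NVRAM  administratively down down
GigabitEthernet3       unassigned      YES NVRAM  administratively down down
Loopback21             unassigned      YES unset  up                    up
Loopback2050           192.168.60.50   YES manual up                    up
"""

# The expression from the article, split across lines for readability.
# re.MULTILINE lets ^ match at the start of every line, not just the first.
ROW = re.compile(
    r"^([A-Z][A-Za-z]+[/0-9]+)\s+"
    r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|unassigned)\s+"
    r"([A-Z]+)\s+([A-Za-z]+)\s+"
    r"(up|administratively down)\s+(up|down)",
    re.MULTILINE,
)

# Group 1 is the interface name, group 5 is the status.
statuses = {m.group(1): m.group(5) for m in ROW.finditer(SHOW_IP_INT_BRIEF)}

for name, status in statuses.items():
    print(f"{name}: {status}")
```

The header line is skipped because "Interface" is not followed by a digit or slash. Note that, as written, the expression would also silently skip any row whose status is a plain "down", since group 5 only allows "up" or "administratively down".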
Still, there must be a better way, right?
Parsing libraries are meant to facilitate text parsing by abstracting away the scary nuances of regular expressions, providing a simplified way to gather data from text. While several libraries exist across a multitude of programming languages, I’ll be focusing on Python’s Template Text Parser, or TTP. If you’re looking for alternatives, you can also take a look at other parsing libraries:
As I mentioned in a previous post, one can see parsing as essentially the process of taking plain text and turning it into structured data. In the context of Python, this would usually be a combination of lists and dictionaries. TTP in particular can also serialize parsed data into formats such as csv, json and others.
TTP is at its core a very user-friendly Parsing library. Don’t let that fool you into thinking that you can only use it for the simplest of tasks; TTP also has advanced features such as parser templating, macros and support for multiple output formats. We’ll be taking a look at those soon.
For now, let’s try to solve our original issue with TTP; the idea was to collect each interface’s status from show ip interface brief. In TTP, rather than writing long regular expressions to match specific fields (we still can do that, but in most basic cases we aren’t required to), we just specify a template with placeholders that TTP can use to understand what the text output looks like. What I mean by that is:
To get a feel for how that works, let’s take a look at how to extract interface descriptions and IP addresses from show run interface:
csr1000v-1#show run interface GigabitEthernet 1
Building configuration...
Current configuration : 171 bytes
!
interface GigabitEthernet1
 description MANAGEMENT INTERFACE - DON'T TOUCH ME
 ip address 10.10.20.48 255.255.255.0
 negotiation auto
 no mop enabled
 no mop sysid
end
There is a textual pattern to this block of configuration:
We can account for that pattern by copying and pasting the above output into a text editor and replacing the information we want to collect with placeholders:
The above syntax may remind some of you of Jinja syntax. TTP templates share some ideas with Jinja but are meant to do the exact opposite operation: while Jinja produces text from structured data, TTP reverse-engineers text into structured data. We can name the data to be extracted using the {{ <variable-name> }} construction. Notice we used the ORPHRASE and IP filters to be intentional about our collection.
The ORPHRASE filter matches a word or phrase; interface descriptions may be multiple words in length.
The IP filter matches an IPv4 address in dotted-decimal notation.
Behind the scenes, filters like IP and ORPHRASE are just regular expressions bundled with TTP so you don’t need to write common patterns yourself. You can always create your own regular expressions and use them as filters, but we’ll talk about that in detail in another post. Parsing the original output with our template will yield the following result in JSON format:
[
    {
        "description": "MANAGEMENT INTERFACE - DON'T TOUCH ME",
        "interface_name": "GigabitEthernet1",
        "ip_address": "10.10.20.48",
        "subnet_mask": "255.255.255.0"
    }
]
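To make the "filters are just regular expressions" idea concrete, here is a toy sketch that uses only Python’s re module. It is not how TTP is implemented: the pattern table, the helper names and the rule that the first template line starts a new record are all simplifying assumptions of mine, and the template text is reconstructed from the keys in the JSON result above.

```python
import re

# Hypothetical stand-ins for TTP's built-in patterns; the library's own regexes may differ.
FILTERS = {
    "WORD": r"\S+",
    "ORPHRASE": r"\S+(?: \S+)*",
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
}

# Matches "{{ name }}" or "{{ name | FILTER }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*(?:\|\s*(\w+)\s*)?\}\}")

TEMPLATE = """
interface {{ interface_name }}
description {{ description | ORPHRASE }}
ip address {{ ip_address | IP }} {{ subnet_mask | IP }}
"""

CLI_OUTPUT = """
interface GigabitEthernet1
 description MANAGEMENT INTERFACE - DON'T TOUCH ME
 ip address 10.10.20.48 255.255.255.0
 negotiation auto
end
"""


def compile_line(template_line: str) -> re.Pattern:
    """Turn one template line into a regex with a named group per placeholder."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(template_line):
        parts.append(re.escape(template_line[pos:m.start()]))  # literal text
        name, filt = m.group(1), m.group(2) or "WORD"
        parts.append(f"(?P<{name}>{FILTERS[filt]})")
        pos = m.end()
    parts.append(re.escape(template_line[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def parse(template: str, text: str) -> list:
    """Match every input line against every template line, grouping into records."""
    patterns = [compile_line(line.strip()) for line in template.strip().splitlines()]
    records = []
    for raw in text.splitlines():
        line = raw.strip()  # ignore indentation to keep the toy simple
        for index, pattern in enumerate(patterns):
            m = pattern.match(line)
            if not m:
                continue
            if index == 0:  # the first template line opens a new record
                records.append({})
            if records:
                records[-1].update(m.groupdict())
            break
    return records


records = parse(TEMPLATE, CLI_OUTPUT)
print(records)
```

Lines that match no template line, such as negotiation auto, are simply ignored, which is why the template only has to describe the fields we care about.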
The beauty of TTP lies in the fact that it’s pretty good at grouping data from same-level hierarchies. In our above case, we have a flat hierarchy (no nested data structures, just plain key-value pairs for each interface). The fact that the four values are being grouped together becomes even more apparent if we give this same parser the output from show run | section interface:
[
    [
        {
            "description": "MANAGEMENT INTERFACE - DON'T TOUCH ME",
            "interface_name": "GigabitEthernet1",
            "ip_address": "10.10.20.48",
            "subnet_mask": "255.255.255.0"
        },
        {
            "description": "Network Interface",
            "interface_name": "GigabitEthernet2"
        },
        {
            "interface_name": "GigabitEthernet3"
        },
        {
            "description": "Loopback",
            "interface_name": "Loopback21"
        },
        {
            "description": "Test Loopback",
            "interface_name": "Loopback2050",
            "ip_address": "192.168.50.10",
            "subnet_mask": "255.255.255.0"
        }
    ]
]
Notice that the same parser template could handle the single-interface case as well as the multiple-interface case.
The parser was able to detect a semi-structure in the text thanks to the way we defined our template. Since it saw the ‘structure’ we established, beginning with interface ..., five times, it generated five different dictionaries, one for each of the interfaces.
Notice that the naming convention we chose to represent our data matters. In a way, we can say that the parser we just made models all of our interfaces using four values: interface_name, description, ip_address and subnet_mask.
As such, we can emulate model-driven automation by passing all of our CLI output through a parser that can then return information compliant with a coherent data model across all of our interfaces. Going back to our initial example with show ip interface brief, we can devise the following parser template:
Just like last time, we essentially copy-pasted the whole table and replaced the set of data we want to extract with Jinja-style placeholders. Notice we added a <group>...</group> tag to our new parser. This is something we’ll elaborate on in the following section. For now, just know that groups add higher-level hierarchies to data models.
Notice we use a special placeholder called {{ _start_ }} to tell the parsing engine we want it to start matching data AFTER it sees the header for our text table. Otherwise, the header containing the words “Interface ... IP-Address ... OK? ...” would be parsed just like any other line in the table, producing potentially unexpected results. Applying the above parser to our show ip interface brief output yields:
[
    {
        "GigabitEthernet1": {
            "admin_status": "up",
            "ip_address": "10.10.20.48",
            "method": "NVRAM",
            "ok": "YES",
            "oper_status": "up"
        },
        "GigabitEthernet2": {
            "admin_status": "administratively down",
            "ip_address": "unassigned",
            "method": "NVRAM",
            "ok": "YES",
            "oper_status": "down"
        },
        "GigabitEthernet3": {
            "admin_status": "administratively down",
            "ip_address": "unassigned",
            "method": "NVRAM",
            "ok": "YES",
            "oper_status": "down"
        },
        "Loopback2050": {
            "admin_status": "up",
            "ip_address": "192.168.60.50",
            "method": "manual",
            "ok": "YES",
            "oper_status": "up"
        },
        "Loopback21": {
            "admin_status": "up",
            "ip_address": "unassigned",
            "method": "unset",
            "ok": "YES",
            "oper_status": "up"
        }
    }
]
Groups allow us to add hierarchies to our data models. Thanks to the <group>...</group> tag in our previous example, TTP created a nested data structure for each interface. Notice that each interface name was assigned a key within our output JSON; data pertaining to that interface was logically put inside a dictionary specific to that interface. If we load the above JSON into a list of dictionaries in Python, we can access data via regular list and dictionary indexing. Since lists and dictionaries are iterable Python objects, we can use them in our for loops to act on all elements of the resulting data. In the snippet below, we print every interface’s administrative status:
# data holds the parsed result shown above: a list with a single dictionary
data_dict = data[0]
# intf receives the key (interface name), status receives the dictionary with state info
for intf, status in data_dict.items():
    print(f"Status for interface {intf} is {status['admin_status']}")
This results in:
Status for interface GigabitEthernet1 is up
Status for interface GigabitEthernet2 is administratively down
Status for interface GigabitEthernet3 is administratively down
Status for interface Loopback21 is up
Status for interface Loopback2050 is up
By grouping everything under the name of the interface, we built a structure that follows a similar logic to a YANG-model list node; YANG lists are keyed structures, so in our case the name of the interface is a key to get the underlying data (such as oper status, IP address, and so on). To be more clear and intentional, we can write a YANG model that’s roughly equivalent to the structure of the data produced by our parser:
... <definitions, imports above>
list interface {
    leaf name {
        type string;
    }
    leaf admin_status {
        type enumeration {
            enum "administratively down";
            enum "up";
        }
    }
    leaf oper_status {
        type enumeration {
            enum "up";
            enum "down";
        }
    }
    leaf description {
        type string;
    }
    leaf method {
        type string;
    }
    leaf ok {
        type string;
    }
    leaf ip_address {
        type inet:ipv4-address;
    }
    key name;
}
This is where we can begin to see a link between Data Modelling and Parsing. Even if our devices are purely CLI based, we can write a parser that will abstract those low-level details into an actual data model; this becomes even more powerful if we can write parsers that do this across different vendors. Since every networking vendor follows different CLI syntax, we can simply build a base YANG model we want all of our devices to follow, regardless of vendor. Based on that model we build our own TTP parser for each vendor. As we saw, the process of building a parser is as simple as copy-pasting CLI output and putting placeholders where you need to gather data from, while also defining groups to logically add hierarchies to your model.
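One way to picture that link in Python is to load the parser output into a small typed structure that enforces the same constraints as the model. The sketch below is only an illustration: the Interface class and the to_model helper are names I made up, and mapping "unassigned" to None is my own choice, made because that string would not be a valid inet:ipv4-address.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values, mirroring the enumerations in the YANG sketch.
ADMIN_STATES = {"up", "administratively down"}
OPER_STATES = {"up", "down"}


@dataclass(frozen=True)
class Interface:
    name: str
    admin_status: str
    oper_status: str
    ip_address: Optional[str] = None  # None when the CLI reports "unassigned"

    def __post_init__(self) -> None:
        # Reject values the model does not allow, whatever vendor they came from.
        if self.admin_status not in ADMIN_STATES:
            raise ValueError(f"{self.name}: bad admin_status {self.admin_status!r}")
        if self.oper_status not in OPER_STATES:
            raise ValueError(f"{self.name}: bad oper_status {self.oper_status!r}")


def to_model(parsed: dict) -> dict:
    """Build Interface objects keyed by name, like a YANG list keyed on 'name'."""
    interfaces = {}
    for name, fields in parsed.items():
        ip = fields.get("ip_address")
        interfaces[name] = Interface(
            name=name,
            admin_status=fields["admin_status"],
            oper_status=fields["oper_status"],
            ip_address=None if ip in (None, "unassigned") else ip,
        )
    return interfaces


# A slice of the parser output shown earlier.
parsed = {
    "GigabitEthernet1": {"admin_status": "up", "ip_address": "10.10.20.48", "oper_status": "up"},
    "Loopback21": {"admin_status": "up", "ip_address": "unassigned", "oper_status": "up"},
}
model = to_model(parsed)
print(model["GigabitEthernet1"])
```

A parser for another vendor would only need to produce the same keys; everything downstream of to_model could stay the same.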
Going back to how groups operate, you can specify groups by introducing the <group>...</group> pair of XML tags in your parser. The important thing to consider is the name for your group, which can be set using the name attribute on the tag.
Groups can have a static name, where the name attribute receives a static string value, or a variable name. Groups with static names are similar to container type nodes in YANG; they are merely namespaces you can access to retrieve more specific data.
The above parser produces a top-level key in our JSON results called “interfaces”. All of the interface data from our previous example is grouped inside.
[
    {
        "interfaces": {
            "GigabitEthernet1": {
                "admin_status": "up",
                "ip_address": "10.10.20.48",
                "method": "NVRAM",
                "ok": "YES",
                "oper_status": "up"
            },
            "GigabitEthernet2": {
                "admin_status": "administratively down",
                "ip_address": "unassigned",
                "method": "NVRAM",
                "ok": "YES",
                "oper_status": "down"
            },
            "GigabitEthernet3": {
                "admin_status": "administratively down",
                "ip_address": "unassigned",
                "method": "NVRAM",
                "ok": "YES",
                "oper_status": "down"
            },
            "Loopback2050": {
                "admin_status": "up",
                "ip_address": "192.168.60.50",
                "method": "manual",
                "ok": "YES",
                "oper_status": "up"
            },
            "Loopback21": {
                "admin_status": "up",
                "ip_address": "unassigned",
                "method": "unset",
                "ok": "YES",
                "oper_status": "up"
            }
        }
    }
]
TTP also gives us much more versatility and control over how parsing is done through macros. In a nutshell, macros are Python functions that you can call from inside your parsing statements. They allow us to conform our matched values to specific formats (so as to comply with our data models) or do some extra processing on certain values. Let’s go back to the output of show run interface <interface> and build a macro that turns dot-notation subnet masks into CIDR prefix lengths.
As a reminder, our output for show run interface GigabitEthernet1 was:
interface GigabitEthernet1
 description MANAGEMENT INTERFACE - DON'T TOUCH ME
 ip address 10.10.20.48 255.255.255.0
 negotiation auto
 no mop enabled
 no mop sysid
We can define macros by adding a <macro>...</macro> tag to our parser template and defining regular Python functions inside it. Functions defined inside the macro tag can then be invoked with the macro('<function_name>') call as shown below:
The JSON result is now:
[
    {
        "GigabitEthernet1": {
            "description": "MANAGEMENT INTERFACE - DON'T TOUCH ME",
            "ip_address": "10.10.20.48",
            "subnet_mask": 24
        }
    }
]
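The conversion itself is plain Python. Here is a sketch of a function that could sit inside the macro tag, assuming the macro receives the matched string as its single argument and that whatever it returns replaces the matched value. The function name to_prefix_length is hypothetical; the ipaddress module is part of the standard library.

```python
import ipaddress


def to_prefix_length(data: str) -> int:
    """Convert a dotted-decimal mask such as 255.255.255.0 into a prefix length."""
    # ipaddress accepts "address/netmask" notation and works out the prefix for us.
    # It raises ValueError for a mask that is not contiguous, e.g. 255.0.255.0.
    return ipaddress.IPv4Network(f"0.0.0.0/{data}").prefixlen


print(to_prefix_length("255.255.255.0"))
```

Leaning on ipaddress rather than counting bits by hand means malformed masks are rejected instead of quietly producing a wrong number.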
Overall, TTP has become my favorite text parsing library for Python. It is really easy to use and easy to explain while also giving total control to more veteran users, down to the regular expression level.
If you want to play around with it, you can get the library directly with pip:
pip install ttp
It’s also very simple to set it up and play around with it from Python. Here’s the sample script I used to generate some of the output for this post:
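A minimal sketch of such a script follows, assuming the ttp package is installed. The parse_cli wrapper is a helper name of my own, the sample data is trimmed from the outputs above, and the template is reconstructed from the JSON keys shown earlier, so treat the details as illustrative.

```python
# CLI text with the one-space indentation IOS uses under "interface".
DATA = """
interface GigabitEthernet1
 description MANAGEMENT INTERFACE - DON'T TOUCH ME
 ip address 10.10.20.48 255.255.255.0
 negotiation auto
!
interface GigabitEthernet2
 description Network Interface
!
"""

# Reconstructed template: variable names follow the JSON keys shown earlier.
TEMPLATE = """
interface {{ interface_name }}
 description {{ description | ORPHRASE }}
 ip address {{ ip_address | IP }} {{ subnet_mask | IP }}
"""


def parse_cli(data: str, template: str) -> str:
    """Run TTP over raw CLI text and return the results as a JSON string."""
    # Third-party dependency (pip install ttp), imported lazily so this
    # module can still be loaded on a machine without it.
    from ttp import ttp

    parser = ttp(data=data, template=template)
    parser.parse()
    # result() returns one entry per template; only one template was loaded.
    return parser.result(format="json")[0]
```

Calling print(parse_cli(DATA, TEMPLATE)) should then print one dictionary per interface block, in the same shape as the multi-interface result shown earlier.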
This was meant as a basic introduction to TTP; as such, we barely scratched the surface of what this tool has to offer. I’ll probably get back to it some day in the future, as there are interesting features to be looked at in more detail. If you are interested, there are also some TTP resources you can look into: