We have been working with flow technologies for almost a decade, so you could say that we are a bit obsessed. What is most impressive about flow technology is how long it has remained relevant. More than two decades after NetFlow was created, vendors continue to innovate with it, developers still build products to collect it, and network engineers and security professionals still trust it. Of particular interest to the ARIN community: with the newer, template-based versions of the protocol, you can now collect statistics on your IPv6 traffic, which is useful for planning your migration, tracking which services are running on IPv6 and IPv4, comparing traffic volumes, and monitoring your network.
With this in mind, I want to pass on some knowledge, advice, and a story about the protocol.
What is flow data?
The origin story dates back to Cisco Systems in the mid-1990s with the creation of NetFlow. Since then, 'NetFlow' has become an umbrella term that covers a variety of iterations of the protocol (NetFlow v5, v9, IPFIX, Flexible NetFlow, NetStream, etc.). To determine which version a vendor has implemented, it is best to read the product documentation or do a Google search.
One way to understand flow data use cases is to compare them to crime shows like Law & Order. When the police open an investigation, they often examine the suspect's phone records as a first step. Who did they call? What time was it? How long did the call last? Answering these questions helps investigators find new leads and build a case.
Flow data is a phone record for network traffic. Collecting this data allows the operator to be the detective investigating who, what, where and when... great, right?
A more technical explanation is that network equipment exports flow records in packets sent over UDP. These packets are sent to a collector, where the records are stored and displayed. Flow collection is all about observation points. In the image below, the entire network is producing flows, giving the network operator complete visibility.
If we take the same network and enable flow data only on the router, we can see a difference. In this example, an employee is communicating with both a LAN-side resource (the green envelope) and the Internet (the blue envelope). Flow data is generated only when the Internet-bound communication crosses the router, so the collector never sees the LAN-side communication. Understanding this concept is critical when deciding which equipment to collect data from.
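To make the "flow packets over UDP" idea concrete, here is a minimal Python sketch of the receiving side, assuming the exporter speaks NetFlow v5 (the fixed-format version). The function names `parse_v5_datagram` and `listen` are illustrative, and port 2055 is only a common convention, not something the protocol requires. A real collector would also handle v9/IPFIX, sequence gaps, and storage.

```python
import socket
import struct
from ipaddress import IPv4Address

# NetFlow v5 layout: a 24-byte header followed by `count` 48-byte records.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_v5_datagram(data):
    """Decode one NetFlow v5 export datagram into a list of flow dicts."""
    version, count = struct.unpack_from("!HH", data, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version={version})")
    flows = []
    for i in range(count):
        offset = V5_HEADER.size + i * V5_RECORD.size
        f = V5_RECORD.unpack_from(data, offset)
        flows.append({
            "src": str(IPv4Address(f[0])),   # who
            "dst": str(IPv4Address(f[1])),   # talked to whom
            "packets": f[5],
            "bytes": f[6],                   # how much
            "src_port": f[9],
            "dst_port": f[10],
            "protocol": f[13],               # e.g. 6 = TCP, 17 = UDP
        })
    return flows


def listen(host="0.0.0.0", port=2055):
    """Hypothetical collector loop: print every flow each exporter sends."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, (exporter, _) = sock.recvfrom(65535)
        for flow in parse_v5_datagram(data):
            print(exporter, flow)
```

Note that the collector only learns about conversations that cross a device configured to export to it, which is the observation-point limitation described above.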
There are many options to consider when selecting a flow collector. Commercial solutions, free solutions, and open source solutions are available. A free cloud-based collector is provided for ISPs and hosting providers.
Data enrichment
One of the best ways to maximize the value of your flow data is to enrich it with metadata. Country or autonomous system lookups are a great way to add context about where conversations are taking place. Generating alerts by comparing flow data with reputation feeds is a popular method for threat detection.
Other methods include grouping by subnet, defining application ports, or using algorithms to generate alerts. Once you start experimenting with the technology, it becomes apparent that there is a lot of room for creativity.
The following example shows how to group IP information by autonomous system. Taking this step makes for a clean dashboard display and unlocks the ability to compare peering traffic with transit traffic.
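As a rough illustration of the reputation-feed idea, the Python sketch below checks each flow's endpoints against a list of flagged networks. The feed contents, the flow dictionary shape, and the name `find_reputation_hits` are all assumptions made for this example; real feeds vary in format and are usually much larger, so a production collector would use a faster lookup structure than a linear scan.

```python
from ipaddress import ip_address, ip_network


def find_reputation_hits(flows, bad_networks):
    """Return (flow, field, network) for every endpoint found in the feed."""
    nets = [ip_network(n) for n in bad_networks]
    hits = []
    for flow in flows:
        for field in ("src", "dst"):
            addr = ip_address(flow[field])
            for net in nets:
                # Only compare addresses and networks of the same IP version.
                if addr.version == net.version and addr in net:
                    hits.append((flow, field, str(net)))
                    break
    return hits


# Hypothetical feed and flows, using documentation address ranges.
feed = ["203.0.113.0/24", "2001:db8:bad::/48"]
sample_flows = [
    {"src": "10.0.0.5", "dst": "198.51.100.10", "bytes": 1200},
    {"src": "10.0.0.9", "dst": "203.0.113.66", "bytes": 90000},
]

for flow, field, net in find_reputation_hits(sample_flows, feed):
    print(f"ALERT: {flow[field]} ({field}) matches {net}")
```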
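In code, grouping by autonomous system might look something like the Python sketch below. The prefix-to-ASN table is entirely made up (documentation prefixes and documentation ASNs); in practice that mapping would come from BGP data or a lookup database. The names `lookup_asn` and `bytes_by_dst_asn` are likewise just illustrative.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical prefix-to-ASN table; real data would come from BGP or a database.
PREFIX_TO_ASN = {
    "198.51.100.0/24": 64496,
    "203.0.113.0/24": 64497,
    "203.0.113.128/25": 64499,   # more specific prefix inside the /24 above
    "2001:db8::/32": 64498,
}


def lookup_asn(addr, table=PREFIX_TO_ASN):
    """Longest-prefix match of an address against the table; None if unknown."""
    ip = ip_address(addr)
    best = None
    for prefix, asn in table.items():
        net = ip_network(prefix)
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, asn)
    return best[1] if best else None


def bytes_by_dst_asn(flows):
    """Sum flow bytes per destination ASN (None collects unmapped traffic)."""
    totals = Counter()
    for flow in flows:
        totals[lookup_asn(flow["dst"])] += flow["bytes"]
    return dict(totals)
```

The per-ASN totals are the kind of numbers a dashboard would chart, and the same grouping could be keyed on the source address for inbound traffic.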
Flow Evolution: IPv6 Monitoring
The most significant improvement in flow data came with the introduction of templates in NetFlow v9. Where v5 was a fixed format, v9 provided the flexibility to choose which fields to export. This change also paved the way for the IETF standard, IP Flow Information Export (IPFIX).
With the ability to customize the exported data came statistics on IPv6 traffic. Operators can now easily compare IPv6 and IPv4 traffic volumes, use reports to plan IPv6 migrations, and track services running on each protocol. None of that was possible in a v5 world, which is why I find the change so significant.
Template-based flow data has unlocked many other doors. Some of my favorite examples include Layer 7 application attribution, performance metrics like latency or jitter, and the recent move by SD-WAN vendors to include elements that visualize how traffic traverses the mesh. All good things.
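The difference between a fixed format and a template is easier to see in code. The Python sketch below is a simplification: it assumes the collector has already received a template (a list of field type and length pairs) and shows only how that template drives the decoding of one data record. The field type numbers are the ones defined for NetFlow v9; the function name `decode_record` and the output keys are illustrative, and real packets also carry headers, FlowSet IDs, and padding that are omitted here.

```python
from ipaddress import ip_address

# A few NetFlow v9 field types, mapped to illustrative output names.
FIELD_NAMES = {
    1: "bytes",       # IN_BYTES
    2: "packets",     # IN_PKTS
    4: "protocol",    # PROTOCOL
    7: "src_port",    # L4_SRC_PORT
    8: "src",         # IPV4_SRC_ADDR
    11: "dst_port",   # L4_DST_PORT
    12: "dst",        # IPV4_DST_ADDR
    27: "src",        # IPV6_SRC_ADDR
    28: "dst",        # IPV6_DST_ADDR
}
ADDRESS_FIELDS = {8, 12, 27, 28}


def decode_record(template, payload):
    """Decode one data record by walking the (field_type, length) template."""
    record, offset = {}, 0
    for field_type, length in template:
        raw = payload[offset:offset + length]
        offset += length
        name = FIELD_NAMES.get(field_type)
        if name is None:
            continue  # unknown field: its bytes are skipped, not an error
        if field_type in ADDRESS_FIELDS:
            record[name] = str(ip_address(raw))   # 4 bytes = IPv4, 16 = IPv6
        else:
            record[name] = int.from_bytes(raw, "big")
    return record


# An exporter can describe an IPv6 flow simply by sending a different template.
ipv6_template = [(27, 16), (28, 16), (1, 4), (2, 4)]
```

Because the record layout is data rather than code, the same decoder handles IPv4, IPv6, and fields it has never seen, which is what a fixed v5 layout could not do.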
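Once flows carry both address families, comparing volumes is a small aggregation. Here is a minimal Python sketch, assuming flows have already been decoded into dictionaries with `src` and `bytes` keys (a shape chosen for this example, not something the protocol dictates):

```python
from collections import Counter
from ipaddress import ip_address


def traffic_share_by_ip_version(flows):
    """Return {"IPv4": (bytes, percent), "IPv6": (bytes, percent)} for the flows seen."""
    totals = Counter()
    for flow in flows:
        version = ip_address(flow["src"]).version   # 4 or 6
        totals[f"IPv{version}"] += flow["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: (count, round(100 * count / grand_total, 1))
            for name, count in totals.items()}


# Hypothetical decoded flows, using documentation address ranges.
sample = [
    {"src": "198.51.100.4", "bytes": 3000},
    {"src": "2001:db8::10", "bytes": 1000},
]
print(traffic_share_by_ip_version(sample))
```

Tracking that percentage over time is one simple way to see how an IPv6 migration is progressing.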
Summary
Be curious about what is possible. Flow data is often considered a simple tool that helps with bandwidth monitoring, but it is so much more. The protocol is extremely rich, and vendors continue to push the limits of what is possible. It's worth taking the time to keep an eye on it, especially as you prepare to transition to IPv6! Feel free to contact us if you have any questions.