Over the last decade, applications dealing with high data throughput were limited to near-real-time operation. Take Network Intrusion Detection Systems (NIDS), for example: a crucial tool in network security, whatever your definition of security is.
Until a few years ago, such systems required expensive proprietary hardware tied to the vendor’s (often poor) software tools. With the advent of cheap, powerful hardware and open source networking solutions, NIDS is now within the grasp of organizations of all sizes: route your traffic through a Linux-powered router, use a tool such as MantisNet’s Programmable Packet Engine (PPE) to capture network traffic, and send the data to a high-performance streaming platform such as Kafka, where you can use Lenses to analyze it with highly scalable SQL and react to threats in real time. Continuous SQL queries on streaming data with Lenses® make it easy to build our own Intrusion Detection System (IDS). But before we start writing the SQL to detect intrusions, we need to understand what exactly an IDS is.
An intrusion detection system (IDS) is a system that monitors network traffic for suspicious activity and issues alerts when such activity is discovered. While anomaly detection and reporting is the primary function, some intrusion detection systems are capable of taking actions when malicious activity or anomalous traffic is detected, including blocking traffic sent from suspicious IP addresses.
Intrusion detection systems can use different kinds of methods to detect suspicious activity, including the following:
Network intrusion detection (NIDS)
Host intrusion detection (HIDS)
Signature-based intrusion detection
Anomaly-based intrusion detection
Intrusion detection systems are categorized as follows:
An active intrusion detection and prevention system will generate alerts and log entries but it can also be configured to take actions.
A passive intrusion detection and prevention system just detects malicious activity and can generate an alert or log entries but it will not take any actions.
In this post we will look at a passive IDS, focusing mainly on NIDS and anomaly-based detection methods.
In order to present the power of continuous SQL queries via Lenses and in parallel build a simple IDS, we are going to focus on DNS traffic and a few well known vulnerabilities.
MantisNet provides a great Docker image that captures all DNS traffic and, in parallel, sends it to a Kafka topic. Once the data is in a topic, you can use SQL queries in Lenses to create a passive IDS that applies NIDS and anomaly-based detection methods. The SQL code required is quite simple.
Let’s see how we can create a few rules for IDS detection using Lenses.
MantisNet collects the data, but the data needs to be pushed to Kafka so that it can be processed as soon as it is available. To do that, you need to set the DNS collector to send the data to a Kafka topic.
Firstly, you need to run the DNS collector with the following command:
A typical DNS request over IPv4 uses a UDP payload of at most 512 bytes, the traditional size limit for DNS messages carried over UDP. We can create a continuous SQL query that flags every request exceeding this size as a violation of the DNS protocol.
The continuous query topology/planner shows how we filter the traffic of the topic DNS_DHCP_TRAFFIC to apply our first DNS validation rule. The filtered data is sent to a new topic, which is auto-created.
You are done: you have just created your very first IDS rule, which validates the UDP payload length of DNS requests.
Note that this is just an example of DNS request validation. To make the rule more accurate, we should take other data into account, such as whether both server and client support EDNS or whether the request is a zone transfer, since both can use larger payloads. Supporting larger payloads over UDP is not advisable, however: it may expose you to amplification attacks that leverage nameservers.
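The payload-size rule described above can also be sketched outside Lenses. The following Python snippet is only an illustration of the filtering logic, not the Lenses SQL used in this post; the record field names (`protocol`, `payload_length`) are assumptions made for the example and may not match the fields MantisNet actually emits.

```python
from typing import Iterable, Iterator

# Traditional size limit for DNS messages carried over UDP (without EDNS).
MAX_UDP_DNS_PAYLOAD = 512


def oversized_dns_requests(
    records: Iterable[dict], limit: int = MAX_UDP_DNS_PAYLOAD
) -> Iterator[dict]:
    """Yield the UDP DNS records whose payload exceeds `limit` bytes.

    `protocol` and `payload_length` are hypothetical field names.
    """
    for record in records:
        is_udp = record.get("protocol") == "UDP"
        if is_udp and record.get("payload_length", 0) > limit:
            yield record


sample = [
    {"protocol": "UDP", "payload_length": 300},
    {"protocol": "UDP", "payload_length": 512},
    {"protocol": "UDP", "payload_length": 900},
    {"protocol": "TCP", "payload_length": 4000},
]
violations = list(oversized_dns_requests(sample))
```

With the sample above, only the 900-byte UDP record is reported. A payload of exactly 512 bytes is still within the limit, and TCP traffic is ignored because the limit applies to UDP.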
DNS was never intended to be used for data transfer. However, it has been used for this purpose by individuals with malicious intent for years.
A DNS tunnel can be established by hiding data (in base64-encoded URLs) inside DNS requests, which can then be turned back into real data on the destination DNS server. This becomes a real threat when malicious software uses DNS to get data out of the company network, or even to receive commands and updates from a command and control server. DNS uses a hierarchical system to determine the correct IP address for a domain, as the following image shows:
So in the above example, instead of resolving blog.landoop.com we could send a different request:
As you can see, it is really easy to use DNS for data transmission.
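To make the idea concrete, here is a minimal Python sketch of how data could be packed into a DNS name and recovered on the other side. It is an illustration only: the function names and the `example.com` base domain are hypothetical, and it uses the URL-safe base64 alphabet with padding stripped (an assumption, chosen because the characters `+`, `/` and `=` are not valid in hostnames). Real tunneling tools differ in their encoding details.

```python
import base64

# DNS limits each label (the part between two dots) to 63 octets.
MAX_LABEL_LENGTH = 63


def encode_as_dns_name(data: bytes, base_domain: str) -> str:
    """Hide `data` in the subdomain labels of `base_domain`."""
    encoded = base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")
    labels = [
        encoded[i:i + MAX_LABEL_LENGTH]
        for i in range(0, len(encoded), MAX_LABEL_LENGTH)
    ]
    return ".".join(labels + [base_domain])


def decode_dns_name(name: str, base_domain: str) -> bytes:
    """Recover the bytes hidden by `encode_as_dns_name`."""
    # Drop the trailing ".<base_domain>" and join the remaining labels.
    encoded = name[: -(len(base_domain) + 1)].replace(".", "")
    padding = "=" * (-len(encoded) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(encoded + padding)


query_name = encode_as_dns_name(b"secret", "example.com")
recovered = decode_dns_name(query_name, "example.com")
```

Even a short payload noticeably lengthens the requested name, and larger payloads produce several long, random-looking labels. That is the property the detection rules in this post rely on.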
A typical DNS request is not that long: think of websites like google.com, mail.google.com, landoop.com, blog.landoop.com, etc. With DNS tunneling, the requested URL is longer, as the previous example illustrates. With more information packed into it, the request can easily exceed 60 characters, which is quite an uncommon length for a domain name. We can create a Lenses Continuous Query that flags all requests exceeding this length as a potential DNS tunneling threat.
We start by grouping the live records by source address and URL, in order to keep the unique DNS messages sent by the same host for the same URL. The query keeps only those records whose URL is longer than 60 characters. You can see the rule in Lenses as a Stream Topology:
The minimum URL length can be fine-tuned to fit your environment and your use cases. You could start low (~50) and increase it if you get too many false positives.
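The grouping and length filter described above can be sketched in Python as follows. This is not the Lenses query itself, only an illustration of its logic; the field names `source_address` and `url` are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def long_query_counts(records: Iterable[dict], max_length: int = 60) -> Counter:
    """Count suspiciously long DNS names per (source address, URL) pair.

    `source_address` and `url` are hypothetical field names.
    """
    counts: Counter = Counter()
    for record in records:
        url = record["url"]
        if len(url) > max_length:
            counts[(record["source_address"], url)] += 1
    return counts


suspicious_url = "a" * 61 + ".example.com"
sample = [
    {"source_address": "10.0.0.5", "url": "blog.landoop.com"},
    {"source_address": "10.0.0.5", "url": suspicious_url},
    {"source_address": "10.0.0.5", "url": suspicious_url},
]
counts = long_query_counts(sample)
```

Keying the counter by source address and URL mirrors the grouping step, and `max_length` plays the role of the tunable threshold.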
Another DNS tunneling rule checks how many digits the URL contains. A typical URL does not contain many digits. But when data is encoded using base64 (a binary-to-text encoding that represents binary data as an ASCII string by translating it into a radix-64 representation), the URL can contain a lot of digits, and we can use this fact to detect potential DNS tunneling.
This rule checks all DNS queries and determines whether the URL contains more than 4 digits, while safely excluding DNS queries that consist of IP addresses, which typically have more than 4 digits. Of course, this threshold can be fine-tuned to fit your environment. Regular expression matching is an expensive operation, but Lenses SQL can scale linearly using Connect or Kubernetes. You can see the rule in Lenses as a Stream Topology:
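Separately from the Lenses topology, the digit-count check can be sketched in Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, and a query counts as an IP address only when the whole string parses as one, so reverse-lookup names and other special cases would need extra handling.

```python
import ipaddress
import re

DIGIT = re.compile(r"[0-9]")


def is_ip_address(name: str) -> bool:
    """Return True when the whole query string is an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def has_many_digits(url: str, max_digits: int = 4) -> bool:
    """Flag URLs with more than `max_digits` digits, ignoring plain IPs."""
    if is_ip_address(url):
        return False
    return len(DIGIT.findall(url)) > max_digits


flags = {
    url: has_many_digits(url)
    for url in ["blog.landoop.com", "192.168.10.25", "a1b2c3d4e5.example.com"]
}
```

Only the third name is flagged here: the first has no digits, and the second is excluded because it is a plain IP address.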
There is always a next level to which you can take your solution. References to AI and ML are all around us, and you can now leverage streaming data to apply data mining and machine learning models in real time. You can also train your IDS to recognize non-signature-based attacks, which are not predictable, and apply anomaly-based detection or even classify the severity of incidents. Additionally, you can use our Lenses Python library to bring the data into Jupyter so that your data science team can start working with it quickly.
As you can see, it is a challenging task to apply IDS for large and high dimensional data streams. Data streams have characteristics that are quite distinct from those of statistical databases, which greatly impact the performance of the anomaly-based ID algorithms used in the detection process. These characteristics include, but are not limited to, the processing of large data as they arrive (real-time), the dynamic nature of data streams, the curse of dimensionality, limited memory capacity and high complexity. Thankfully, Lenses SQL Engine can scale the processing step linearly using Connect or Kubernetes and your network. Data and Security engineers can, therefore, focus on the real problem which in this case is to ensure their systems are secure.
Visit the documentation of Lenses to find out more about the Lenses SQL Engine.