A great many modern network threats involve data theft through abuse of network services, which is termed data exfiltration. To track such threats, analysts monitor data transfers out of the organization’s network, particularly transfers that occur through network services not primarily intended for bulk transfer. One such service is the Domain Name System (DNS), which is essential for many other Internet services. Unfortunately, attackers can manipulate DNS to exfiltrate data covertly.
This SEI blog post focuses on how the DNS protocol can be abused to exfiltrate data by adding bytes of data onto DNS queries or by making repeated queries that contain data encoded into the fields of the query. The post also examines the general traffic analytic we can use to identify this abuse and applies several available tools to implement the analytic. The aggregate size of DNS packets can provide a ready indicator of DNS abuse. However, because the DNS protocol has grown from a simple address resolution mechanism into distributed database support for network connectivity, interpreting the aggregate size requires an understanding of the context of queries and responses. By understanding the volume of DNS traffic, both in isolation and in aggregate, analysts may better match outgoing queries and incoming responses.
The data used in this blog post is the CIC-BELL-DNS-EXF 2021 data set, as published in conjunction with the paper Lightweight Hybrid Detection of Data Exfiltration using DNS based on Machine Learning by Samaneh Mahdavifar et al.
The Role of DNS
DNS supports multiple types of queries. These queries are described in a variety of Internet Engineering Task Force (IETF) Request for Comment (RFC) documents. The query types include the following:
- A and AAAA queries for the IP address corresponding to a domain name (e.g., “which address corresponds to www.example.com?” with a response like “192.0.2.27”)
- pointer record (PTR) queries for the name corresponding to an IP address (e.g., “which name corresponds to 192.0.2.27?” with a response of “www.example.com”)
- name server (NS), mail exchange (MX), and service locator (SRV) queries for the identity of key servers in a given domain
- start of authority (SOA) queries for information about addresses on which the queried server may speak authoritatively
- certificate (CERT) queries for encryption certificates pertaining to the server’s covered domains
- text record (TXT) queries for additional information (as configured by the network administrator) in a text format
A given DNS query packet requests information on a given domain from a specific server, but the response from that server may include multiple resource records. The size of the response depends on how many resource records are returned and on the type of each record.
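To make the dependence on record count and record type concrete, the following is a minimal sketch (not part of the original analytic) that estimates the size of a DNS response from its answer records. It assumes the classic RFC 1035 message layout, assumes that every answer name is stored as a 2-byte compression pointer, and ignores EDNS0 and the authority and additional sections. The names ResourceRecord, encodedNameLength, and estimateResponseSize are hypothetical.

```scala
// Rough DNS response size estimate (illustrative only).
final case class ResourceRecord(rrType: String, rdataLength: Int)

val HeaderBytes = 12         // fixed DNS header
val RrFixedBytes = 10        // TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2)
val CompressedNameBytes = 2  // assumption: answer names are compression pointers

// Encoded length of a domain name: one length byte per label plus the root byte.
def encodedNameLength(name: String): Int = {
  val labels = name.stripSuffix(".").split('.').filter(_.nonEmpty)
  labels.map(_.length + 1).sum + 1
}

def estimateResponseSize(queryName: String, answers: Seq[ResourceRecord]): Int = {
  val question = encodedNameLength(queryName) + 4 // QTYPE(2) + QCLASS(2)
  val answerBytes =
    answers.map(rr => CompressedNameBytes + RrFixedBytes + rr.rdataLength).sum
  HeaderBytes + question + answerBytes
}

// An A record carries 4 bytes of address data; an AAAA record carries 16.
val oneA = estimateResponseSize("www.example.com", Seq(ResourceRecord("A", 4)))
val fourAAAA =
  estimateResponseSize("www.example.com", Seq.fill(4)(ResourceRecord("AAAA", 16)))
println(s"one A record: $oneA bytes; four AAAA records: $fourAAAA bytes")
```

Under these assumptions the four-record AAAA response is about three times the size of the single-record A response, which is why a large response alone is weak evidence of abuse.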
Once analysts understand the reasons for monitoring DNS traffic and the context needed for interpreting the monitoring results, they can then determine what information is desired from the monitoring. This blog post assumes the analyst wants to track external hosts that may be receiving exfiltrated information.
Overview of the Analytic for Identifying Data Exfiltration
The analytic covered in this blog post assumes that the networks of interest are covered by traffic sensors that produce network flow records, or at least packet captures that can be aggregated into network flow records. Several tools are available to generate these flow records. Once produced, the flow records are archived in a flow repository or in appropriate database tables, depending on the analysis tool suite.
The approach taken in this analytic is, first, to aggregate DNS traffic associated with external destinations acting like servers and, second, to profile the traffic for these destinations. The first step (association) involves identifying DNS traffic (either by service port or by actual examination of the application protocol), then identifying the external destinations involved. The second step (profiling) examines how many sources are communicating with each of the destinations, the aggregate byte count, the packet count, and other revealing information as described in the following sections.
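The two steps can be sketched on a small in-memory collection before turning to the real tools. This is only an illustration under stated assumptions: the Flow and DestinationProfile types are hypothetical (they are not a SiLK or Mothra schema), DNS is identified by destination port 53, the internal network is taken to be 192.168.0.0/16, and the sample addresses are invented.

```scala
// Hypothetical flow record for illustration.
final case class Flow(srcIp: String, dstIp: String, dstPort: Int, bytes: Long, packets: Long)

final case class DestinationProfile(
  dstIp: String, sources: Int, flows: Int, bytes: Long, packets: Long)

// Assumption for this sketch: anything outside 192.168.0.0/16 is external.
def isInternal(ip: String): Boolean = ip.startsWith("192.168.")

def profileDnsDestinations(flows: Seq[Flow]): Seq[DestinationProfile] =
  flows
    // Step 1 (association): DNS by service port, sent to an external destination.
    .filter(f => f.dstPort == 53 && !isInternal(f.dstIp))
    .groupBy(_.dstIp)
    // Step 2 (profiling): distinct sources, flow count, bytes, packets per destination.
    .map { case (dst, fs) =>
      DestinationProfile(
        dst, fs.map(_.srcIp).distinct.size, fs.size,
        fs.map(_.bytes).sum, fs.map(_.packets).sum)
    }
    .toSeq
    .sortBy(p => -p.bytes)

// Invented sample flows; the last one is not DNS and is ignored.
val sample = Seq(
  Flow("192.168.20.15", "198.51.100.7", 53, 420L, 5L),
  Flow("192.168.20.16", "198.51.100.7", 53, 380L, 4L),
  Flow("192.168.20.15", "203.0.113.9", 53, 90L, 1L),
  Flow("192.168.20.15", "203.0.113.9", 443, 5000L, 12L)
)
profileDnsDestinations(sample).foreach(println)
```

Sorting by aggregate bytes puts the destinations that receive the most DNS data at the top, which is where an analyst would usually start.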
Several different tools can be used for this analysis. This blog post discusses two sets of SEI-developed tools:
- The System for Internet-Level Knowledge (SiLK) is a collection of traffic analysis tools developed to facilitate security analysis of large networks. The SiLK tool suite supports the efficient collection, storage, and analysis of network flow data, enabling network security analysts to rapidly query large historical traffic data sets. SiLK is ideally suited for analyzing traffic on the backbone or border of a large, distributed enterprise or mid-sized ISP.
- Mothra is a collection of Apache Spark libraries that support analysis of network flow records in Internet Protocol Flow Information Export (IPFIX) format with deep packet inspection fields.
Each of the following sections presents an analytic for detecting exfiltration via DNS queries in the corresponding tool set.
Implementing the Analytic via SiLK
Figure 1 below presents a series of SiLK commands that implement an analytic to detect exfiltration. The first command applies a filter to normal, benign DNS traffic, isolating DNS traffic (recognized by protocol recognition as indicated by the application label of 53) that comes from the internal network (classless inter-domain routing [CIDR] block 192.168.0.0/16) and consists of relatively long (70 bytes or more) packets. The output of the filter is then summarized by destination address and transport protocol, counting bytes, flow records, and packets for each combination of address and protocol. The resulting counts are shown only if the accumulated bytes are 500 or more. After the analytic is applied to benign DNS data, it is applied in the second sequence to DNS data that carries compressed files for exfiltration.
Figure 1: SiLK Analytic and Results
The results in Figure 1 show that the network talks to a primary DNS server, a secondary DNS server, and a public server. In the benign case, the data is mostly directed to the primary DNS server and the public server. In the exfiltration case, the data is mostly directed to the primary DNS server and the secondary DNS server. This shift of destination, in isolation, is not enough to make the exfiltration traffic suspicious or to provide a basis for moving beyond suspicion into investigation. In the benign case, a notable fraction of the traffic is directed to the public DNS server at 8.8.8.8. In the traffic labeled as abusive, this fraction is smaller, and the fraction to a private DNS server (the exfiltration target) at 224.0.0.252 is larger. Unfortunately, given the limited nature of SiLK flow records, security analysts have a hard time characterizing this traffic any further. To go further, more DNS-specific fields are required. These fields are provided by deep packet inspection (DPI) data in expanded flow records in IPFIX format. While SiLK cannot process IPFIX flow records, other tools such as Mothra and databases can.
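For readers without SiLK at hand, the filter-and-summarize logic described for Figure 1 can be approximated in plain Scala. This is a sketch of the logic only, not the SiLK commands themselves: FlowRec and Summary are hypothetical types, the 70-byte test is assumed to apply to each flow's average bytes per packet, and the sample flows are invented rather than taken from the data set.

```scala
// Hypothetical flow record with the fields the described analytic relies on.
final case class FlowRec(
  sip: String, dip: String, protocol: Int, appLabel: Int, bytes: Long, packets: Long)
final case class Summary(dip: String, protocol: Int, bytes: Long, records: Int, packets: Long)

// Dotted-quad IPv4 address as an unsigned 32-bit value held in a Long.
def ipv4ToLong(ip: String): Long =
  ip.split('.').foldLeft(0L)((acc, octet) => (acc << 8) | octet.toLong)

// True when ip falls inside an IPv4 CIDR block such as "192.168.0.0/16".
def inCidr(ip: String, cidr: String): Boolean = {
  val Array(base, prefix) = cidr.split('/')
  val shift = 32 - prefix.toInt
  (ipv4ToLong(ip) >> shift) == (ipv4ToLong(base) >> shift)
}

def summarize(flows: Seq[FlowRec]): Seq[Summary] =
  flows
    // DNS by application label, originating inside the internal network.
    .filter(f => f.appLabel == 53 && inCidr(f.sip, "192.168.0.0/16"))
    // Relatively long packets: at least 70 bytes per packet on average.
    .filter(f => f.packets > 0 && f.bytes.toDouble / f.packets >= 70.0)
    // Summarize by destination address and transport protocol.
    .groupBy(f => (f.dip, f.protocol))
    .map { case ((dip, proto), fs) =>
      Summary(dip, proto, fs.map(_.bytes).sum, fs.size, fs.map(_.packets).sum)
    }
    // Show only destinations with 500 or more accumulated bytes.
    .filter(_.bytes >= 500)
    .toSeq
    .sortBy(s => -s.bytes)

// Invented sample flows (UDP is protocol 17, TCP is protocol 6).
val sampleFlows = Seq(
  FlowRec("192.168.20.15", "198.51.100.7", 17, 53, 400L, 5L),
  FlowRec("192.168.20.16", "198.51.100.7", 17, 53, 320L, 4L),
  FlowRec("192.168.20.15", "203.0.113.9", 17, 53, 180L, 2L),   // under 500 bytes in total
  FlowRec("192.168.20.15", "198.51.100.7", 17, 53, 120L, 2L),  // short packets
  FlowRec("10.0.0.5", "198.51.100.7", 17, 53, 900L, 10L),      // source outside the block
  FlowRec("192.168.20.15", "198.51.100.7", 6, 443, 5000L, 12L) // not DNS
)
summarize(sampleFlows).foreach(println)
```

Only one destination survives both thresholds in this sample, which mirrors how the analytic narrows a large volume of DNS traffic to a short list of destinations worth a closer look.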
Implementing the Analytic via Mothra
The code sample below shows the analytic implemented in Spark using the Mothra libraries. These libraries allow definition and loading of data frames with network flow record data in either SiLK or IPFIX format. A data frame is a collection of data organized into named columns. Data frames can be manipulated by Spark functions to isolate flows of interest and to summarize those flows. Defining the data frames involves identifying the columns and the data to populate the columns. In the code sample, the data frames are defined by the spark.read.fields function and populated by data from either the captured benign traffic or the captured exfiltration traffic via Mothra’s ipfix function. Together, these functions establish the data data frame.
The result data frame is constructed from the data data frame via a series of filtering and summarization functions. The initial filter restricts it to traffic labeled as DNS traffic and, in the same call, ensures that the records contain DNS resource record queries or responses. The select function that follows isolates specific record features for summarization: time, traffic source and destination, byte and packet volumes, DNS names, DNS flags, and DNS resource record types. The groupBy function generates the summarization for each unique combination of DNS name and resource record type. The agg function specifies that the summarization contain the count of flow records, the counts of distinct source and destination IP addresses, and the totals for bytes and packets. The filter functions after the summarization restrict output to just those entries showing a bytes-per-packet ratio of more than 70 with fewer than three entries in the DNS name list. This last filter excludes summarizations of traffic that is large only because of the length of the response list rather than the length of individual queries.
This filtering and summarization process creates a profile of large DNS requests and responses (separated by DNS flag values). The use of DNS names as a grouping value allows the analytic to distinguish repeated queries to similar domains. The counts of source and destination IP addresses allow the analyst to distinguish repeated traffic to a few locations from rare traffic to multiple locations or from multiple sources.
// Assumes a spark-shell style session in which `spark` and the $"..."
// column syntax are already in scope.
val data_dir = ".../path/to/data"

import org.cert.netsa.mothra.datasources._
import org.cert.netsa.mothra.datasources.ipfix.IPFIXFields
import org.apache.spark.sql.functions._

// In dnsIDBenign.sc:
val data_file = s"$data_dir/light_benign.ipfix"
// In dnsIDAbuse.sc:
// val data_file =
//   s"$data_dir/light_compressed.ipfix"

// Load the IPFIX records with the default fields plus the DNS DPI fields.
val data = {
  spark.read.fields(
    IPFIXFields.default, IPFIXFields.dpi.dns
  ).ipfix(data_file)
}

val result = {
  data
    // Keep flows labeled as DNS that carry at least one DNS resource record.
    .filter(($"silkAppLabel" === 53) &&
            (size($"dnsRecordList") > 0))
    .select(
      $"startTime",
      $"sourceIPAddress",
      $"destinationIPAddress",
      $"octetCount",
      $"packetCount",
      $"dnsRecordList.dnsRRType" as "dnsRRType",
      $"dnsRecordList.dnsQueryResponse" as "dnsQR",
      $"dnsRecordList.dnsResponseCode" as "dnsResponse",
      $"dnsRecordList.dnsName" as "dnsName")
    // Summarize per DNS name list and resource record type list.
    .groupBy($"dnsName", $"dnsRRType")
    .agg(count($"*") as "flows",
         countDistinct($"sourceIPAddress") as "#sIP",
         countDistinct($"destinationIPAddress") as "#dIP",
         sum($"octetCount") as "bytes",
         sum($"packetCount") as "packets")
    // .filter($"packets" > 20)
    // Keep long packets (more than 70 bytes per packet) with short name lists.
    .filter($"bytes" / $"packets" > 70)
    .filter(size($"dnsName") < 3)
    .orderBy($"bytes".desc)
}

result.show(20, false)
The code sample below shows the output of dnsIDExfil.sc on benign and on compressed data, the data sets used in the preceding SiLK discussion. The presence of multicast (224/8 and 239/8 CIDR blocks) and RFC1918 private addresses (192.168/16 CIDR blocks) is due to this data coming from an artificial collection environment instead of a live Internet traffic capture.
Contrasting the benign output against the abuse output, we see a smaller number of lookup addresses being queried in the abuse results and a much quicker drop-off in the number of queries per host. In the benign results, there are six DNSNames that are queried repeatedly; in the abuse results, there are two. All of the queries shown are PTR (reverse, RRType=12) queries, and all are going to the same server. In the high-volume DNSName queries, the maximum average packet length is slightly larger for the abuse data than for the benign data (81 vs. 78). Taken together, these differences show a slow-and-steady release of additional data as part of the DNS data transfer, which reflects the file transfer taking place.
dnsIDBenign.sc output:
+-------------------------------------+---------+-----+----+----+------+-------+
|dnsName                              |dnsRRType|flows|#sIP|#dIP|bytes |packets|
+-------------------------------------+---------+-----+----+----+------+-------+
|[252.0.0.224.in-addr.arpa.]          |[12]     |2835 |1   |1   |416539|5901   |
|[150.20.168.192.in-addr.arpa.]       |[12]     |982  |1   |1   |242585|3125   |
|[200.20.168.192.in-addr.arpa.]       |[12]     |895  |1   |1   |134756|1836   |
|[15.20.168.192.in-addr.arpa.]        |[12]     |901  |1   |1   |133490|1844   |
|[100.20.168.192.in-addr.arpa.]       |[12]     |757  |1   |1   |112173|1533   |
|[2.20.168.192.in-addr.arpa.]         |[12]     |635  |1   |1   |91734 |1288   |
|[3.20.168.192.in-addr.arpa.]         |[12]     |315  |1   |1   |45438 |640    |
|[_ipps._tcp.local., _ipp._tcp.local.]|[12, 12] |122  |32  |1   |13161 |136    |
|[250.255.255.239.in-addr.arpa.]      |[12]     |74   |1   |1   |11328 |152    |
|[101.20.168.192.in-addr.arpa.]       |[12]     |31   |1   |1   |4666  |64     |
+-------------------------------------+---------+-----+----+----+------+-------+
only showing top 10 rows
dnsIDAbuse.sc output:
+-------------------------------------+---------+-----+----+----+------+-------+
|dnsName                              |dnsRRType|flows|#sIP|#dIP|bytes |packets|
+-------------------------------------+---------+-----+----+----+------+-------+
|[252.0.0.224.in-addr.arpa.]          |[12]     |1260 |1   |1   |191398|2696   |
|[2.20.168.192.in-addr.arpa.]         |[12]     |255  |1   |1   |130725|1615   |
|[150.20.168.192.in-addr.arpa.]       |[12]     |416  |1   |1   |63606 |866    |
|[200.20.168.192.in-addr.arpa.]       |[12]     |388  |1   |1   |57686 |788    |
|[15.20.168.192.in-addr.arpa.]        |[12]     |379  |1   |1   |56492 |781    |
|[100.20.168.192.in-addr.arpa.]       |[12]     |340  |1   |1   |50738 |694    |
|[3.20.168.192.in-addr.arpa.]         |[12]     |125  |1   |1   |17750 |250    |
|[250.255.255.239.in-addr.arpa.]      |[12]     |32   |1   |1   |4736  |64     |
|[_ipps._tcp.local., _ipp._tcp.local.]|[12, 12] |46   |30  |1   |4467  |51     |
|[_ipp._tcp.local., _ipps._tcp.local.]|[12, 12] |13   |9   |1   |1782  |19     |
+-------------------------------------+---------+-----+----+----+------+-------+
only showing top 10 rows
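The average packet lengths quoted in the discussion can be checked directly from the two tables above. The sketch below copies the bytes and packets columns in row order and computes the largest bytes-per-packet ratio among the high-volume rows. The 200-packet cutoff for "high-volume" is an assumption made here; the post does not state one.

```scala
// (bytes, packets) pairs copied from the two output tables, in row order.
val benign = Seq(
  (416539L, 5901L), (242585L, 3125L), (134756L, 1836L), (133490L, 1844L),
  (112173L, 1533L), (91734L, 1288L), (45438L, 640L), (13161L, 136L),
  (11328L, 152L), (4666L, 64L))
val abuse = Seq(
  (191398L, 2696L), (130725L, 1615L), (63606L, 866L), (57686L, 788L),
  (56492L, 781L), (50738L, 694L), (17750L, 250L), (4736L, 64L),
  (4467L, 51L), (1782L, 19L))

// Assumption: "high-volume" means at least 200 packets.
val MinPackets = 200L

// Largest average packet length (bytes per packet) among the high-volume rows.
def maxAvgPacketLength(rows: Seq[(Long, Long)]): Double =
  rows.collect {
    case (bytes, packets) if packets >= MinPackets => bytes.toDouble / packets
  }.max

println(f"benign: ${maxAvgPacketLength(benign)}%.1f bytes/packet") // 77.6
println(f"abuse:  ${maxAvgPacketLength(abuse)}%.1f bytes/packet")  // 80.9
```

Rounded to whole bytes, these values are 78 and 81. The low-volume multicast DNS rows (the _ipp and _ipps names) have higher ratios, which is why the volume cutoff matters when comparing the two captures.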
Understanding Data Exfiltration
Whichever form of tooling is used, analysts generally need an understanding of the data transfers from their network. Repetitive queries for DNS resolution should be rather rare; caching should eliminate many of these repetitions. As repetitive queries for resolution are identified, several groups of hosts may be found:
- Hosts that generate repetitive queries not indicative of exfiltration of data are likely to exist, characterized by very consistent query size, periodic timing, and use of expected name servers.
- Hosts that generate repetitive queries with unusual name servers or timing may require further investigation.
- Hosts that generate repetitive queries with unusual name servers or query sizes should be examined carefully to identify potential exfiltration.
The impact of these hosts on network security will vary depending on the range and criticality of assets these hosts access, but some of the traffic may demand immediate response.
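One way to turn the three groups above into a repeatable triage rule is sketched below. It is an interpretation rather than a prescribed procedure: the types and field names are hypothetical, how each "unusual" flag is computed is left open, and the sketch assumes that a host matching more than one group is placed in the more serious one.

```scala
// Observations about one host's repetitive DNS queries (hypothetical fields).
final case class RepetitiveQueryProfile(
  host: String,
  unusualNameServers: Boolean,
  unusualTiming: Boolean,
  unusualQuerySizes: Boolean)

sealed trait Triage
case object LikelyBenign extends Triage
case object InvestigateFurther extends Triage
case object ExamineForExfiltration extends Triage

// Assumption: when groups overlap, the more serious group wins.
def triage(p: RepetitiveQueryProfile): Triage =
  if (p.unusualNameServers || p.unusualQuerySizes) ExamineForExfiltration
  else if (p.unusualTiming) InvestigateFurther
  else LikelyBenign

// Invented example hosts.
val hosts = Seq(
  RepetitiveQueryProfile("192.168.20.15", false, false, false),
  RepetitiveQueryProfile("192.168.20.16", false, true, false),
  RepetitiveQueryProfile("192.168.20.17", true, false, true)
)
hosts.foreach(h => println(s"${h.host}: ${triage(h)}"))
```

A rule like this only orders the work; the criticality of the assets each host can reach still determines how quickly a flagged host needs attention.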
What Might a Security Analyst Want to Know
This post is part of a series addressing a simple question: What might a security analyst want to know at the start of each shift regarding the network? In each post we discuss one answer to this question and the application of a variety of tools that may implement that answer. Our goal is to provide some key observations that help analysts monitor and defend their networks, focusing on useful ongoing measures rather than those specific to one event, incident, or issue.
We will not focus on signature-based detection, since there are a number of resources for it, including intrusion detection systems (IDS)/intrusion prevention systems (IPS) and antivirus products. The tools used in these articles will primarily be part of the CERT/NetSA Analysis Suite, but we will include other tools if helpful. Earlier posts examined tools for tracking software updates and proxy bypass.
Our approach is to highlight a given analytic, discuss the motivation behind the analytic, and provide the application as a worked example. The worked example, by intention, is illustrative rather than exhaustive. The decision of what analytics to deploy, and how, is left to the reader.
If there are specific behaviors that you would like to suggest, please send them by email to netsa-help@cert.org with “SOC Analytics Idea” in the subject line.