DNS Abuse Detection: Domain Generation Algorithms

Definition

Domain Generation Algorithms (DGAs) allow threat actors to construct and register unique domains very quickly for malicious purposes such as phishing campaigns or malware command and control (C2) that evades filtering. Because DGA domains are generated semi-randomly, they have proven difficult to block using blocklists of known “bad” domains. For the most part, DGAs are limited only by the operator’s ability to generate domains and register them through registrars.

Note: The term “DGA” technically means the algorithm itself, but is also commonly used to refer to a domain that is generated by a DGA.

DGA domains can serve as infrastructure for various types of other abuse and misuse. For example, in a SMS/text messaging campaign, they can be used as the initial link in the message, as part of a redirect farm, or as the endpoint holding the phishing content.

A DGA domain may or may not consist of random letters or numbers. For example, a DGA could append letters or numbers to a fixed string, or it could draw from a word list. DGAs can also potentially evade detection by using dictionary words in English or other languages instead of a random mix of letters and numbers (for example, carhorsebatterystaplehousewindow.example could be a generated domain).
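To make the word-list idea concrete, the following is a minimal, purely illustrative sketch of how a dictionary-based generator could produce names like the one above. It does not represent any real malware family. The word list, the function name toy_wordlist_domains, the seed format, and the use of the reserved .example TLD are all assumptions made for this illustration.

```python
import random

# Hypothetical word list for illustration; a real DGA could embed far more words.
WORDS = ["car", "horse", "battery", "staple", "house", "window"]


def toy_wordlist_domains(seed, count, words_per_name=3, tld="example"):
    """Return `count` names built by concatenating words chosen from WORDS.

    The same seed always yields the same list, which is the property that
    lets two parties derive matching names independently.
    """
    rng = random.Random(seed)  # seeded generator: deterministic output
    domains = []
    for _ in range(count):
        label = "".join(rng.choice(WORDS) for _ in range(words_per_name))
        domains.append(f"{label}.{tld}")
    return domains


if __name__ == "__main__":
    for name in toy_wordlist_domains("2024-01-01", 5):
        print(name)
```

Because every label is built from ordinary words, names produced this way tend to look unremarkable to checks that only measure how random a string appears.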

DGAs are listed as a technique for C2 traffic in MITRE ATT&CK technique T1568.002:

“Adversaries may make use of Domain Generation Algorithms (DGAs) to dynamically identify a destination domain for command and control traffic rather than relying on a list of static IP addresses or domains. This has the advantage of making it much harder for defenders to block, track, or take over the command and control channel, as there potentially could be thousands of domains that malware can check for instructions…

“Adversaries may use DGAs for the purpose of Fallback Channels. When contact is lost with the primary command and control server malware may employ a DGA as a means to reestablishing command and control”

https://attack.mitre.org/techniques/T1568/002/

Advice

Detecting the first use of a DGA domain is generally considered beyond what a “typical” SOC can achieve without the help of Cyber Threat Intelligence (CTI) DNS blocklist feeds, a Response Policy Zone (RPZ) subscription, or a Protective DNS solution.

MITRE ATT&CK describes detections for DGAs:

“Detecting dynamically generated domains can be challenging due to the number of different DGA algorithms, constantly evolving malware families, and the increasing complexity of the algorithms. There [are a] myriad … approaches for detecting a pseudo-randomly generated domain name, including using frequency analysis, Markov chains, entropy, proportion of dictionary words, ratio of vowels to other characters, and more. CDN domains may trigger these detections due to the format of their domain names. In addition to detecting a DGA domain based on the name, another more general approach for detecting a suspicious domain is to check for recently registered names or for rarely visited domains.

“Machine learning approaches to detecting DGA domains have been developed and have seen success in applications. One approach is to use N-Gram methods to determine a randomness score for strings used in the domain name. If the randomness score is high, and the domains are not whitelisted (CDN, etc), then it may be determined if a domain is related to a legitimate host or DGA. Another approach is to use deep learning to classify domains as DGA-generated.”

https://attack.mitre.org/techniques/T1568/002/
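Two of the name-based signals mentioned in the quoted text, entropy and the ratio of vowels to other characters, can be sketched as follows. This is a rough illustration rather than a tuned detector: the function names are hypothetical, and the code assumes the leftmost label is the part worth scoring, which will not hold for names with subdomains (extracting the registered domain properly requires the Public Suffix List).

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Shannon entropy of the characters in `label`, in bits per character."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def vowel_ratio(label):
    """Fraction of alphabetic characters in `label` that are vowels."""
    letters = [c for c in label.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in "aeiou" for c in letters) / len(letters)


def name_features(domain):
    """Collect simple features for the leftmost label of `domain` (an assumption)."""
    label = domain.lower().rstrip(".").split(".")[0]
    return {
        "label": label,
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": vowel_ratio(label),
    }


if __name__ == "__main__":
    for d in ["carousel.monster", "xkqzvbtw.example"]:
        print(name_features(d))
```

Features like these are best treated as inputs to a broader score, not as verdicts. Names built from dictionary words will tend to score like ordinary words, and CDN hostnames may score like random strings, so any thresholds would need to be chosen and validated against local traffic.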

Examples

DGAs and the domains that they produce vary widely depending on their planned usage (typically phishing or malware C2) and on how stealthy the operator is trying to be. As a result, there is no one definitive method to determine if an observed domain has been created by a DGA.

However, there are several methods that can be combined to identify DGA domains, ranging from domain age to Whois lookups. If an organization receives a report of a fraudulent domain and that domain appears to be a DGA, the organization can potentially pivot using publicly available domain Whois information, nameserver delegations, and IP addresses and netblocks in query answers to detect additional domains that were generated using the same DGA.
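The pivot described above can be sketched as a simple overlap check on infrastructure. The record layout (a dict of domain to nameserver and IP sets), the function name, and the sample data below are all hypothetical; in practice the records would come from WHOIS, passive DNS, or the organization's own resolver logs.

```python
from collections import defaultdict

# Hypothetical observations, using reserved example names and documentation IP ranges.
RECORDS = {
    "reported.example": {"nameservers": {"ns1.dns-host.example"}, "ips": {"192.0.2.10"}},
    "sibling-one.example": {"nameservers": {"ns1.dns-host.example"}, "ips": {"192.0.2.10"}},
    "sibling-two.example": {"nameservers": {"ns2.other.example"}, "ips": {"192.0.2.10"}},
    "unrelated.example": {"nameservers": {"ns9.elsewhere.example"}, "ips": {"198.51.100.7"}},
}


def pivot_on_shared_infrastructure(seed_domain, records):
    """Return domains that share a nameserver or IP with `seed_domain`.

    The result maps each related domain to the set of reasons it matched.
    """
    seed = records[seed_domain]
    related = defaultdict(set)
    for domain, info in records.items():
        if domain == seed_domain:
            continue
        if seed["nameservers"] & info["nameservers"]:
            related[domain].add("nameserver")
        if seed["ips"] & info["ips"]:
            related[domain].add("ip")
    return dict(related)


if __name__ == "__main__":
    print(pivot_on_shared_infrastructure("reported.example", RECORDS))
```

Shared hosting and large DNS providers can put many unrelated domains on the same nameservers or addresses, so matches from a pivot like this are leads to review, not confirmation that a domain came from the same DGA.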

Reviewing real-world examples of DGAs can help illustrate how different attributes of the domain name can help direct detection efforts. There are also simple, tried-and-true options that require no tooling, such as measuring domain length: very short and very long domains might indicate that a DGA was used to create the domain, although length alone is not conclusive. Be aware that some URL shorteners use very short domains.
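A length check of this kind needs very little code. The sketch below flags unusually short or long names for a second look; the function name and both thresholds are arbitrary assumptions for illustration and would need to be set from the lengths actually seen in local traffic.

```python
def length_flag(domain, short_max=3, long_min=20):
    """Classify the leftmost label of `domain` as 'short', 'long', or 'typical'.

    `short_max` and `long_min` are illustrative thresholds, not recommendations.
    """
    label = domain.lower().rstrip(".").split(".")[0]
    if len(label) <= short_max:
        return "short"
    if len(label) >= long_min:
        return "long"
    return "typical"


if __name__ == "__main__":
    for d in ["ab.example", "carousel.monster", "carhorsebatterystaplehousewindow.example"]:
        print(d, length_flag(d))
```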

As an individual example of a DGA, single words can be paired with different TLDs to create fraudulent domains. Threat actors, aware of string matching and/or fuzzy matching based on a “bag of words”, have used word lists to defeat searches looking for specific keywords. These words could be random dictionary words or represent words common to a particular field of study (e.g., anthropology, chemistry, etc.).

This technique was used in a spear phishing campaign that was reported by the IRS at https://www.irs.gov/newsroom/latest-spear-phishing-scams-target-tax-professionals. The campaign targeted tax preparers with fraudulent domains of varying lengths, built from different word lists and registered in several low-volume TLDs:

.rest TLD: arborer[.]rest ataghans[.]rest bimotor[.]rest

.shop TLD: isosulphocyanic[.]shop mediostapedial[.]shop necessarianism[.]shop

.world TLD: paleoethnography[.]world stereotaxically[.]world unforgettability[.]world

.monster TLD: backstabbing[.]monster bellflower[.]monster carousel[.]monster
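When working with indicator lists like the one above, a common first step is to convert the defanged notation ("[.]") back to plain names and group them by TLD, which makes clusters in low-volume TLDs easier to spot. The sketch below assumes each entry is a single label followed by a TLD, as in these examples; the function names are hypothetical.

```python
from collections import defaultdict


def refang(domain):
    """Convert a defanged name such as 'arborer[.]rest' to 'arborer.rest'."""
    return domain.replace("[.]", ".").strip().lower()


def group_by_tld(defanged_domains):
    """Group the label part of each name under its TLD (text after the last dot)."""
    groups = defaultdict(list)
    for raw in defanged_domains:
        label, _, tld = refang(raw).rpartition(".")
        groups[tld].append(label)
    return dict(groups)


if __name__ == "__main__":
    reported = [
        "arborer[.]rest", "ataghans[.]rest", "bimotor[.]rest",
        "backstabbing[.]monster", "bellflower[.]monster", "carousel[.]monster",
    ]
    for tld, labels in group_by_tld(reported).items():
        print(tld, labels)
```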