YARA Rules
Content
- What is a YARA rule?
- Meta
- Strings
- Conditions
What is a YARA rule?
YARA is a tool that analysts use to classify and identify malware. It has its own rule syntax, and the rules written for it are called YARA rules. This blog post covers the general concepts needed to read and write them.
YARA rules have a syntax similar to the C language. Knowing a handful of keywords is enough to read a rule; writing a good one takes knowledge and experience, so it is a skill that develops over time. So what are YARA rules used for?
- To determine which malware family a sample belongs to, so that analysis can be sped up by reading existing reports about that family
- To detect malicious software on a system during incident response, by running YARA rule sets through IOC scanning tools (THOR, Loki, etc.)
- To detect a new variant of a sample that belongs to a known malware family
- To support proactive security
These are the four general objectives. Specific deployments may serve other purposes.
So what should a rule file contain? The key point is to keep the probability of false positives as low as possible. From a security product's point of view, a false positive is a malware detection alarm raised for software that is actually legitimate. A rule that produces many false positives makes the investigation phase much harder for analysts, so minimizing them should be the main focus when writing rules.
How do we do that? By writing rules based on the features that make a piece of software unique. For example, a rule that raises an alarm for any file containing the strings “GetProcAddress”, “socket”, and “recv” is likely to fire on legitimate software as well, and these three values alone cannot tell us which malware family a sample belongs to. Therefore, if a rule is to be used for classification, it should combine indicators specific to that malware sample with indicators specific to its family.
Rules are built from text or binary patterns. For example, a hard-coded C2 IP address found in a sample can be written into a rule, so that an alarm is generated whenever a scanned file contains that value.
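To make the false-positive point concrete, here is a minimal Python sketch (standard library only, not YARA itself). The two byte blobs, the helper `matches_all`, and the address 203.0.113.10 (taken from a range reserved for documentation) are all invented for illustration.

```python
# Hypothetical sample contents; real files would be read from disk.
benign_sample = b"...GetProcAddress...socket...recv...update-check..."
malicious_sample = b"...GetProcAddress...socket...recv...203.0.113.10..."

GENERIC_STRINGS = [b"GetProcAddress", b"socket", b"recv"]
SPECIFIC_STRINGS = [b"203.0.113.10"]  # assumed hard-coded C2 address


def matches_all(data: bytes, patterns: list) -> bool:
    """Return True when every pattern occurs somewhere in data."""
    return all(p in data for p in patterns)


for name, data in [("benign", benign_sample), ("malicious", malicious_sample)]:
    print(name,
          "generic:", matches_all(data, GENERIC_STRINGS),
          "specific:", matches_all(data, SPECIFIC_STRINGS))
```

The generic API names hit both samples, while the specific indicator separates them, which is the property a low-false-positive rule needs.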
Meta
The meta section is one of the parts of a rule. It has no effect on how the rule matches. It typically includes:
- Author: The name or nickname of the person who wrote the rule
- Date: The date the rule was written
- Reference: A link to a blog post or report describing the malware or technique that motivated the rule
- Description: Why the rule was written, for example “detects mimikatz”
- Hash: The hashes of the samples used when writing the rule
The meta section is very important in bulk IOC scans. When a match occurs, the analyst needs to know why it occurred. For example, when a rule described as “detects mimikatz” raises an alarm for a file named “test.abc”, the meta section tells the analyst why the alarm was generated and who wrote the rule.
// Every rule starts with the keyword "rule", followed by the rule name and a body in curly braces.
rule test_with_yara {
    meta:
        Author = "OnlyF8"
        Date = "01-05-2023"
        Description = "Testing with YARA"
    condition:
        false   // a condition section is mandatory; "false" keeps this meta-only example from matching
}
A meta section can be defined as shown above. Identifiers such as “Author” and “Date” are not predefined keywords, so you can name them however you like.
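As a rough illustration of why these fields matter during a scan, the Python sketch below formats an alert line from a rule's metadata. The dictionary layout and the `format_alert` helper are hypothetical; they do not reproduce the output format of YARA, THOR, or Loki.

```python
# Hypothetical rule metadata, mirroring the meta section above.
rule_meta = {
    "name": "test_with_yara",
    "Author": "OnlyF8",
    "Date": "01-05-2023",
    "Description": "Testing with YARA",
}


def format_alert(file_name: str, meta: dict) -> str:
    """Build a one-line alert that tells the analyst what matched and why."""
    return (f"{file_name}: matched rule {meta['name']} "
            f"({meta['Description']}; author {meta['Author']}, {meta['Date']})")


print(format_alert("test.abc", rule_meta))
```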
Strings
The strings section is where the IOCs that specifically identify a malware sample or malware family are defined. Each string can be hexadecimal, plain text, or a regular expression.
Hexadecimal Strings
Hexadecimal strings are used for raw byte sequences that are not human-readable, for example XOR keys or non-UTF-8 strings. They support three special constructs: wildcards, jumps, and alternatives.
Wildcards behave much like the “.” of a regular expression. Let’s look at this rule as an example:
rule TestforWildcard {
    strings:
        $hex = { A1 29 ?? BA }
    condition:
        $hex
}
If a specific byte of the $hex string is unknown or varies between samples, it can be replaced with “??”, which matches any byte value at that position.
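The wildcard behaviour can be approximated with a byte regular expression in Python. This is only an emulation of the matching logic under the assumption that `??` means "exactly one arbitrary byte"; it is not how YARA is implemented.

```python
import re

# Regex equivalent of the hex string { A1 29 ?? BA }:
# "." with re.DOTALL matches any single byte, including 0x0A.
WILDCARD = re.compile(rb"\xA1\x29.\xBA", re.DOTALL)

print(bool(WILDCARD.search(b"\x00\xA1\x29\x7F\xBA")))  # True: the third byte is arbitrary
print(bool(WILDCARD.search(b"\xA1\x29\xBA")))          # False: no byte where ?? is expected
```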
Jumps are used when the part that varies does not have a fixed length. For example, suppose a piece of malware contains the string “C:\ProgramData\Hello\abcd.exe”. The “Hello” directory and the “abcd.exe” file name may change from sample to sample, and their lengths may change as well. Jumps cover such cases.
rule TestforJumps {
    strings:
        $jumps = { A1 29 [2-4] BA }
    condition:
        $jumps
}
A rule written this way accepts any byte values in a gap whose length falls within the specified range. For example:
A1 29 00 00 BA
A1 29 00 01 BA
...
A1 29 00 00 00 AB BA
...
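The length range of a jump maps naturally onto a bounded repetition in a byte regular expression. The Python sketch below emulates `{ A1 29 [2-4] BA }` that way; the candidate byte sequences are made up for the demonstration.

```python
import re

# Regex equivalent of { A1 29 [2-4] BA }: two to four arbitrary bytes in the gap.
JUMP = re.compile(rb"\xA1\x29.{2,4}\xBA", re.DOTALL)

candidates = [
    bytes.fromhex("A1 29 00 00 BA"),           # 2-byte gap
    bytes.fromhex("A1 29 00 00 00 AB BA"),     # 4-byte gap
    bytes.fromhex("A1 29 00 BA"),              # 1-byte gap, too short
    bytes.fromhex("A1 29 00 00 00 00 00 BA"),  # 5-byte gap, too long
]
for data in candidates:
    print(data.hex(" "), bool(JUMP.search(data)))
```

Only the first two candidates match, because their gaps are within the 2 to 4 byte range.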
Alternatives list several byte sequences that may appear at the same position. For example:
rule TestforAlternatives {
    strings:
        $alt = { A1 20 ( DC | 59 ) BA }
    condition:
        $alt
}
This rule matches either “A1 20 DC BA” or “A1 20 59 BA”. More than two alternatives can be listed, and jumps or wildcards can also be used inside the alternatives.
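To show how the three constructs combine, here is a small Python translator from a YARA-style hex string to a byte regular expression. It is a sketch under simplifying assumptions: tokens must be separated by spaces, nibble wildcards such as `A?` are not handled, and the function name `hex_string_to_regex` is invented for this example.

```python
import re


def hex_string_to_regex(hex_string: str):
    """Translate a space-separated YARA-style hex string into a bytes regex.

    Supports plain bytes (A1), full wildcards (??), jumps ([2-4] or [3])
    and alternatives (( DC | 59 )).
    """
    parts = []
    for token in hex_string.strip("{} ").split():
        if token == "??":
            parts.append(b".")                      # any single byte
        elif token in ("(", ")", "|"):
            parts.append(b"(?:" if token == "(" else token.encode())
        elif token.startswith("["):
            low, dash, high = token.strip("[]").partition("-")
            upper = high if dash else low           # "[3]" means exactly 3 bytes
            parts.append(b".{%s,%s}" % (low.encode(), upper.encode()))
        else:
            parts.append(re.escape(bytes.fromhex(token)))  # literal byte
    return re.compile(b"".join(parts), re.DOTALL)


alt = hex_string_to_regex("{ A1 20 ( DC | 59 ) BA }")
print(bool(alt.search(bytes.fromhex("A1 20 DC BA"))))  # True
print(bool(alt.search(bytes.fromhex("A1 20 59 BA"))))  # True
print(bool(alt.search(bytes.fromhex("A1 20 00 BA"))))  # False
```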
Text Strings
Text strings are case-sensitive by default. So if a rule defines “onlyf8” but the file contains “onlyF8”, there is no match.
To make a string case-insensitive, add the keyword “nocase” after its definition.
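The difference between the default behaviour and “nocase” can be mimicked in Python with a byte regular expression and the `re.IGNORECASE` flag. The sample data is invented; this only illustrates the matching logic.

```python
import re

needle = b"onlyf8"
data = b"user=onlyF8;"

case_sensitive = re.search(re.escape(needle), data)                    # default behaviour
case_insensitive = re.search(re.escape(needle), data, re.IGNORECASE)   # like "nocase"

print(bool(case_sensitive), bool(case_insensitive))  # False True
```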
Strings are not always stored in a file exactly as they are written. Sometimes each character occupies 2 bytes. The keyword “wide” is used for such cases. For example, the string “Only” might appear in the file as “O\x00n\x00l\x00y\x00”.
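The two-byte layout can be reproduced in Python by encoding the text as UTF-16 little-endian, which for plain ASCII characters gives the same bytes as interleaving 0x00. The surrounding bytes in `data` are arbitrary filler chosen for the example.

```python
text = "Only"

ascii_form = text.encode("ascii")      # one byte per character
wide_form = text.encode("utf-16-le")   # each ASCII character followed by 0x00

print(ascii_form)  # b'Only'
print(wide_form)   # b'O\x00n\x00l\x00y\x00'

# A file storing the string in two-byte form contains wide_form but not ascii_form.
data = b"\x01\x02" + wide_form + b"\x03"
print(ascii_form in data, wide_form in data)  # False True
```

This is why a string declared without “wide” misses such files, and why “ascii wide” is often used together to cover both layouts.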
The keyword “xor” is used for data that is stored XOR'd with a single byte. For example:
rule TestforXOR {
    strings:
        $testxor = "Onlyf8" xor(0x01-0xab)
    condition:
        $testxor
}
The “xor” keyword on its own scans for the string “Onlyf8” XOR'd with every single-byte key (0x00, 0x01, 0x02, etc.). When a range is given in parentheses after the keyword, as in the rule above, only the keys in that range (here 0x01 to 0xab) are tried. The key range syntax is available from YARA 3.11 onward.
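The key search performed by `xor(0x01-0xab)` can be sketched in Python as a loop over the key range. The helper names and the sample content (the string stored with key 0x5A) are assumptions made for this example; YARA's internal implementation is different and more efficient.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)


def find_xor_key(haystack: bytes, plaintext: bytes, low: int, high: int):
    """Return the first key in [low, high] whose XOR'd plaintext occurs in haystack."""
    for key in range(low, high + 1):
        if xor_bytes(plaintext, key) in haystack:
            return key
    return None


# Hypothetical file content: "Onlyf8" stored XOR'd with key 0x5A.
sample = b"\x00\x11" + xor_bytes(b"Onlyf8", 0x5A) + b"\x22"
print(find_xor_key(sample, b"Onlyf8", 0x01, 0xAB))  # prints 90 (0x5A)
```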
Base64 Values
YARA rules can also search for the Base64-encoded form of a string embedded in the software. For example, the following rule generates an alarm when it detects a Base64-encoded version of the string “Onlyf8” in the file.
rule TestforBase64 {
    strings:
        $textbase64 = "Onlyf8" base64
    condition:
        $textbase64
}
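A Base64-encoded string looks different depending on whether it starts at byte offset 0, 1, or 2 of a three-byte group, which is why three variants have to be searched. The Python sketch below derives them by hand; `base64_permutations` is a name invented here, and the approach is a reconstruction of the idea rather than YARA's own code.

```python
import base64


def base64_permutations(plain: bytes) -> list:
    """Return the three Base64 substrings that can represent plain at any alignment."""
    results = []
    for offset, leading_drop in ((0, 0), (1, 2), (2, 3)):
        padded = b"\x00" * offset + plain
        encoded = base64.b64encode(padded).decode("ascii").rstrip("=")
        if len(padded) % 3:           # the last character also depends on what follows
            encoded = encoded[:-1]
        results.append(encoded[leading_drop:])  # drop characters shaped by the prefix
    return results


for variant in base64_permutations(b"This program cannot"):
    print(variant)

# Whatever precedes or follows the string, one of the variants appears in the encoding.
blob = base64.b64encode(b"xx" + b"Onlyf8" + b"yy")
print(any(v.encode() in blob for v in base64_permutations(b"Onlyf8")))  # True
```

For the input “This program cannot”, the three printed variants are the same ones the YARA documentation lists for its Base64 example.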
Fullword Scan
The fullword keyword makes a string match only when it is delimited by non-alphanumeric characters. For example, suppose you receive a domain IOC containing “hello”. If “hello” is defined with “fullword”, it will not match “www.hellobois.com”, but it will match “www.hello-bois.com” and “www.hello.com”.
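The same delimiting rule can be emulated in Python with look-behind and look-ahead assertions. This is an approximation that treats only ASCII letters and digits as alphanumeric.

```python
import re

# Emulates "hello" fullword: no letter or digit directly before or after.
FULLWORD = re.compile(rb"(?<![A-Za-z0-9])hello(?![A-Za-z0-9])")

for domain in (b"www.hellobois.com", b"www.hello-bois.com", b"www.hello.com"):
    print(domain.decode(), bool(FULLWORD.search(domain)))
```

The first domain is rejected because “hello” is immediately followed by a letter; the other two match.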
Keyword table:
Keyword | String Type | Description |
---|---|---|
nocase | Text, Regex | Makes the match case-insensitive |
wide | Text, Regex | Emulates UTF-16 by interleaving a 0x00 byte after each character |
ascii | Text, Regex | Matches ASCII characters; this is the default and is only needed together with wide |
xor | Text | Scans for the string XOR'd with single-byte keys |
base64 | Text | Scans for the 3 possible Base64 encodings of the string |
base64wide | Text | Scans for the 3 possible Base64 encodings with 0x00 interleaved, as in wide |
fullword | Text, Regex | Matches only when there is no alphanumeric character immediately before or after the string |
private | Hex, Text, Regex | The match is not shown in the output |
Conditions
Conditions can be thought of as Boolean expressions. Logical operators such as “and”, “or”, and “not” can be used. In general, a condition works like an “if” statement in programming languages. For example:
rule APT34_Malware_HTA {
    meta:
        description = "Detects APT 34 malware"
        license = "Detection Rule License 1.1 https://github.com/Neo23x0/signature-base/blob/master/LICENSE"
        author = "Florian Roth (Nextron Systems)"
        reference = "https://www.fireeye.com/blog/threat-research/2017/12/targeted-attack-in-middle-east-by-apt34.html"
        date = "2017-12-07"
        hash1 = "f6fa94cc8efea0dbd7d4d4ca4cf85ac6da97ee5cf0c59d16a6aafccd2b9d8b9a"
    strings:
        $x1 = "WshShell.run \"cmd.exe /C C:\\ProgramData\\" ascii
        $x2 = ".bat&ping 127.0.0.1 -n 6 > nul&wscript /b" ascii
        $x3 = "cmd.exe /C certutil -f -decode C:\\ProgramData\\" ascii
        $x4 = "a.WriteLine(\"set Shell0 = CreateObject(" ascii
        $x5 = "& vbCrLf & \"Shell0.run" ascii
        $s1 = "<title>Blog.tkacprow.pl: HTA Hello World!</title>" fullword ascii
        $s2 = "<body onload=\"test()\">" fullword ascii
    condition:
        filesize < 60KB and ( 1 of ($x*) or all of ($s*) )
}
[1]
Looking at the “condition” section of the example above: the rule raises an alarm when the file is smaller than 60 KB and contains either at least one of the strings whose names start with “$x” or all of the strings whose names start with “$s”. This is one way to combine the different techniques used by a malware family or an attack group into a single rule.
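The logic of that condition can be written out in plain Python to make the grouping explicit. The function below is an emulation only; the byte strings are shortened, hypothetical stand-ins for the rule's `$x*` and `$s*` strings, and plain substring search replaces YARA's string matching.

```python
def evaluate_condition(data: bytes, x_strings: list, s_strings: list) -> bool:
    """Emulate: filesize < 60KB and ( 1 of ($x*) or all of ($s*) )."""
    small_enough = len(data) < 60 * 1024          # 60KB in YARA is 60 * 1024 bytes
    one_of_x = any(s in data for s in x_strings)  # "1 of" means at least one
    all_of_s = all(s in data for s in s_strings)
    return small_enough and (one_of_x or all_of_s)


# Hypothetical stand-ins for the $x* and $s* strings of a rule.
x_strings = [b"cmd.exe /C certutil -f -decode", b"Shell0.run"]
s_strings = [b"<title>HTA Hello World!</title>", b'<body onload="test()">']

print(evaluate_condition(b"... Shell0.run ...", x_strings, s_strings))             # True
print(evaluate_condition(b'<body onload="test()">', x_strings, s_strings))         # False
print(evaluate_condition(b"Shell0.run" + b"A" * 70_000, x_strings, s_strings))     # False
```

The second call fails because only one of the two `$s` stand-ins is present, and the third fails the file size check even though an `$x` stand-in is present.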
IOC values of different types (text, hex, etc.) can be written to detect a malware family or an attack group. Some of these IOCs are present in every sample, while others vary from sample to sample. Conditions are written to account for these differences.
By using YARA modules in the “condition” section, values such as hexadecimal file signatures can also be checked.
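A file-signature check of this kind is commonly written in YARA as `uint16(0) == 0x5A4D`, which reads a 16-bit little-endian integer at offset 0 and compares it with the “MZ” header of Windows executables. The Python sketch below mirrors that comparison; the helper names are invented for this example.

```python
def uint16_le(data: bytes, offset: int) -> int:
    """Read an unsigned 16-bit little-endian integer, like YARA's uint16(offset)."""
    return int.from_bytes(data[offset:offset + 2], "little")


def looks_like_pe(data: bytes) -> bool:
    """Check for the 'MZ' signature, i.e. the condition uint16(0) == 0x5A4D."""
    return len(data) >= 2 and uint16_le(data, 0) == 0x5A4D


print(looks_like_pe(b"MZ\x90\x00"))  # True
print(looks_like_pe(b"%PDF-1.7"))    # False
```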
For criticism, corrections, or suggestions, please reach me at my contact addresses. Your comments are valuable to me :)
Reference
[1] https://github.com/Neo23x0/signature-base/blob/master/yara/apt_apt34.yar
[2] https://yara.readthedocs.io/en/stable/