Documents exploiting CVE-2017-11882 are still in active use by threat actors, but it has been a few years since I took a deep dive into their mechanics. A quick spelunk through our dataset turns up quite a few, but I wanted an RTF example with minimal RTF obfuscation and came across this email:
So It Begins
Let’s start by analyzing the RTF document and comparing it with past documents. We know from experience that this vulnerability can be exploited from multiple document types (RTF, DOCX, XLSX) and that there are two options for injecting the malicious stream (Equation stream and OleNativeStream). But this one immediately looks different. Most public tools (rtfdump, rtfobj, and RTFScan) were unable to parse the embedded stream correctly. Although rtfdump doesn’t parse the equation stream, it does provide a good layout of all embedded objects and lets us dump the stream suspected of being the equation stream.
We see the traditional ClassName (slightly obfuscated), EqUatioN.3, the required FormatID of 0x00000002, and random data for the OLEVersion. But instead of an Embedded Equation object header starting with 0x001c, or any bytes reflecting an MTEF header (such as an MTEF version of 0x03 and a product version of 0x03), we only see the FONT record at the correct offset, 0x0108 at 0x29.
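To make those header fields concrete, here is a minimal sketch of reading the OLE 1.0 embedded object header from a dumped object. It assumes the dump has already been hex-decoded into raw bytes and follows the usual layout (OLEVersion, FormatID, then length-prefixed ClassName, TopicName, and ItemName, then the native data size). The function names are my own and are not taken from rtfdump or any other tool.

```python
import struct


def read_lp_string(buf, off):
    """Read a 4-byte length-prefixed ANSI string; return (text, next_offset)."""
    (length,) = struct.unpack_from("<I", buf, off)
    off += 4
    raw = buf[off:off + length]
    return raw.rstrip(b"\x00").decode("latin-1"), off + length


def parse_ole1_header(buf):
    """Parse the header of an OLE 1.0 embedded object (hypothetical helper)."""
    ole_version, format_id = struct.unpack_from("<II", buf, 0)
    off = 8
    class_name, off = read_lp_string(buf, off)
    topic_name, off = read_lp_string(buf, off)
    item_name, off = read_lp_string(buf, off)
    (native_size,) = struct.unpack_from("<I", buf, off)
    off += 4
    return {
        "ole_version": ole_version,   # random data in this sample
        "format_id": format_id,       # 0x00000002 for an embedded object
        "class_name": class_name,     # e.g. "EqUatioN.3"
        "topic_name": topic_name,
        "item_name": item_name,
        "native_data": buf[off:off + native_size],
    }
```

Comparing `class_name.lower()` against `"equation.3"` is enough to see through the mixed-case obfuscation used here.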
Out of Doubt, Out of Dark
Let’s load this sample into a debugger and see what other tricks have been developed. Because the equation object relies on COM, we can set a breakpoint where these objects are created and iterate until EQNEDT32.EXE is launched. We then attach a separate debugger to the Equation Editor process and set a breakpoint on the vulnerable function, 0x0041160F. Just as in my last analysis, the return address is overwritten with the address of a RET instruction. Because the font record follows the return address on the stack, execution flow continues into the first stage shellcode.
The first stage shellcode is slightly different for this sample, but it is not unique and has already been discussed here. Basically, the shellcode locates the OLE stream on the heap, uses kernel32.GlobalLock to lock the stream at this memory location, and then jumps to a statically defined offset within the OLE stream.
Similar to my previous analysis, the second stage shellcode starts with a decoder stub. The decoder contains quite a few JMPs to complicate analysis, but it can be boiled down to the following:
- A CALL instruction to load the start of the encoded shellcode on the stack
- POP ESI to create a pointer to the encoded shellcode
- Initialization of the key for the XOR decoder
- The key mutates every iteration with IMUL EDI, EDI, 67D6B6F7
- Each dword is decoded with XOR DWORD PTR DS:[ESI], EDI
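The decoder loop above can be sketched in a few lines of Python. This is only an illustration under assumptions: the initial key is taken as a parameter because its value is not given here, and I assume the key is multiplied before each XOR, so the order may need to be swapped for a given sample. The function name is hypothetical.

```python
import struct

MULTIPLIER = 0x67D6B6F7  # immediate operand of IMUL EDI, EDI, 67D6B6F7
MASK32 = 0xFFFFFFFF      # EDI is a 32-bit register, so the product wraps


def xor_imul_decode(data, key, multiplier=MULTIPLIER):
    """Decode data dword by dword with a key that mutates each iteration.

    Assumption: the key is mutated first, then XORed into the dword.
    Trailing bytes that do not fill a whole dword are copied unchanged.
    """
    usable = len(data) - len(data) % 4
    out = bytearray()
    for (dword,) in struct.iter_unpack("<I", data[:usable]):
        key = (key * multiplier) & MASK32        # IMUL EDI, EDI, imm32
        out += struct.pack("<I", dword ^ key)    # XOR DWORD PTR DS:[ESI], EDI
    return bytes(out) + data[usable:]
```

Because the key stream depends only on the starting key, running the function twice with the same key returns the original bytes, which is a handy sanity check.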
If This Is to Be Our End
Now that we know the shellcode for these malicious RTF documents hasn’t changed much, can we use the Unicorn engine to dump the final payload without relying on the heavyweight, manual process of running it within a debugger?
The first step is to extract the shellcode from the RTF, starting at the last instruction of the first stage shellcode, JMP EAX, and then to replace that instruction with a relative jump. The two instructions preceding it result in 0xD5, and the JMP instruction is at offset 0x33 from the start of the OLE stream. Changing JMP EAX to a relative near jump adds 3 bytes to the instruction. This results in JMP 0x9F. Stripping the shellcode from the original RTF and modifying the JMP instruction produces the following hex string:
One interesting feature of the Unicorn engine is that we can add hooks to instructions, code blocks, and even the results of an instruction. We can use these hooks to run a callback function every time an instruction writes to memory or reads from an unmapped segment of memory. To use the Unicorn engine to decode our shellcode, we need to do the following:
- Define and map our address space
- Define ESP to handle any POP instructions
- Define a callback function on memory writes to determine what segment of our shellcode is being modified
- Define a callback function on a memory read from an unmapped segment; this should indicate our final shellcode attempting to load a function from a module
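The four steps above might look roughly like the following sketch, which uses the Unicorn Python bindings. It is not the script used for this analysis: the base address, region size, stack placement, and the `WriteTracker` and `emulate` names are all assumptions made for illustration.

```python
BASE = 0x00400000      # hypothetical load address for the shellcode
MAP_SIZE = 0x00010000  # hypothetical 64 KiB region: code first, stack near the top


class WriteTracker:
    """Tracks the lowest and highest addresses written inside [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.lo = self.hi = None

    def record(self, address, size):
        if not (self.start <= address < self.end):
            return  # ignore writes outside the shellcode, such as stack pushes
        self.lo = address if self.lo is None else min(self.lo, address)
        self.hi = address + size if self.hi is None else max(self.hi, address + size)

    def span(self):
        return None if self.lo is None else (self.lo, self.hi)


def emulate(shellcode):
    """Run the shellcode and return (modified_bytes, unmapped_read_addresses)."""
    # Imported here so WriteTracker stays usable without Unicorn installed.
    from unicorn import (Uc, UcError, UC_ARCH_X86, UC_MODE_32,
                         UC_HOOK_MEM_WRITE, UC_HOOK_MEM_READ_UNMAPPED)
    from unicorn.x86_const import UC_X86_REG_ESP

    mu = Uc(UC_ARCH_X86, UC_MODE_32)
    mu.mem_map(BASE, MAP_SIZE)                              # address space
    mu.mem_write(BASE, shellcode)
    mu.reg_write(UC_X86_REG_ESP, BASE + MAP_SIZE - 0x100)   # stack for CALL/POP

    tracker = WriteTracker(BASE, BASE + len(shellcode))
    unmapped = []

    def on_write(uc, access, address, size, value, user_data):
        tracker.record(address, size)

    def on_unmapped_read(uc, access, address, size, value, user_data):
        unmapped.append(address)
        return False  # map nothing, so emulation stops at this read

    mu.hook_add(UC_HOOK_MEM_WRITE, on_write)
    mu.hook_add(UC_HOOK_MEM_READ_UNMAPPED, on_unmapped_read)
    try:
        mu.emu_start(BASE, BASE + len(shellcode))
    except UcError:
        pass  # expected once the decoded shellcode touches unmapped memory

    span = tracker.span()
    decoded = bytes(mu.mem_read(span[0], span[1] - span[0])) if span else b""
    return decoded, unmapped
```

Filtering writes to the shellcode range keeps the stack push from the CALL instruction out of the result, so the returned span should cover only the bytes the decoder rewrote.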
Excellent! Our script was able to decode the final shellcode, and we can even see the API calls that are loaded via LoadLibraryW. Because the strings in the shellcode are UTF-16BE, we can print the important IoCs by setting the encoding for the strings command. Our pipeline had already pulled this sample and labeled it as MassLogger.
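If the strings command is not at hand, a rough Python equivalent for pulling printable UTF-16BE runs out of the decoded buffer could look like this. The function name and the default minimum length of four characters are my own choices, and the sketch only matches printable ASCII characters stored as two-byte big-endian code units.

```python
import re


def utf16be_strings(data, min_len=4):
    """Return printable ASCII strings stored as UTF-16BE in a bytes buffer."""
    # Each character is a 0x00 high byte followed by a printable low byte.
    pattern = re.compile(rb"(?:\x00[\x20-\x7e]){%d,}" % min_len)
    return [m.group().decode("utf-16-be") for m in pattern.finditer(data)]
```

Calling `utf16be_strings(decoded)` on the dumped shellcode should list the same candidate IoCs that the strings command prints.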
|IoC Type||IoC Value|