Analysis of “new” RTF malware obfuscation method

Earlier this morning someone forwarded me a suspicious email they had received and asked me to analyze it. This is a screenshot of the email:

original_email

 

The two attached documents were exactly the same file under different names. My initial triage already showed the file to be malicious: it had been submitted to VirusTotal (VT) and had a detection count of 16/56 (July 6th at 7:10 AM UTC). Someone else had first uploaded it for analysis just 11 minutes earlier, and most of the detections came from generic signatures for obfuscated malicious RTF exploits:

 

VT_detections

That gave me an idea of what type of malware it could be: something related to the recent malicious RTF exploits leveraging CVE-2017-0199, as explained by NViSO, FireEye and Fortinet in April 2017.

Using Didier Stevens' rtfdump tool (thanks Didier!) I could see all the components of this RTF file. As expected, there were A LOT of components (5206), as is the case in most malicious RTF documents because of the obfuscation.

initial_rtfdump_viewjpg

 

The file starts with a typical RTF header but without the final f, that is, {\rt instead of {\rtf. Microsoft Word does not need the f to open the file as an RTF document, and malware creators typically drop it to avoid having the file scanned by some AV products or submitted to sandboxes. The file then has one object with 2 nested components: an object class and the object data. All the other 5204 components of the file sit inside the object data component, nested inside \mmmailsubject components.
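To make the header trick concrete, here is a minimal Python sketch that tells a standard {\rtf header from the shortened {\rt variant described above. The function name and the three labels are my own illustrative choices, not part of rtfdump or any other tool.

```python
def classify_rtf_header(data: bytes) -> str:
    """Classify the first bytes of a file (hypothetical helper, for illustration).

    Returns "standard" for {\\rtf, "truncated" for {\\rt without the f,
    and "not-rtf" for anything else.
    """
    head = data.lstrip()[:5]
    if head.startswith(b"{\\rtf"):
        return "standard"
    if head.startswith(b"{\\rt"):
        return "truncated"
    return "not-rtf"


if __name__ == "__main__":
    # Made-up byte strings, not taken from the sample.
    print(classify_rtf_header(b"{\\rtf1\\ansi"))
    print(classify_rtf_header(b"{\\rt1{\\object"))
```

A check like this only flags the missing f; it says nothing about whether the rest of the document is malicious.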

My first idea was that there would be some kind of OLE file inside objects 3 or 4. So I selected object 3 and applied the -H option to decode the hexadecimal characters. It turned out to be just a string giving the file a title.

rtfdump_object3

But when I selected object 4, rtfdump was not able to present any output. So, since all the components from object 3 onwards are nested inside object 2, I tried selecting object 2 and got the content of all the subcomponents:

rtfdump_object2

With this, I knew I could safely open the file in 010 Editor to make it easier to view and decipher.

010_editor_initial_view

In the text editor it is easy to see the \mmmailsubject tags. The official RTF specification from Microsoft gives a short explanation of this control word:

RTF_specs_MSFT-mmmailsubject

From this explanation I concluded that all content inside the \mmmailsubject nests could be ignored, leaving behind a trail of hexadecimal values that, put together, would probably form an interesting string.

hexadecimal_values_between_ignored_nests

So I built a regex to match all the nested components to be ignored and tested it on a short sample of 2 nests on regex101. Bingo!

nested_regex_test
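The same idea can be sketched in Python. This is not the exact regex from the screenshot: it assumes each junk group has the form {\mmmailsubject ...} and contains no braces other than groups of the same kind, and the sample string is made up for illustration.

```python
import re

# Assumption: a junk group is "{\mmmailsubject ...}" with no inner braces.
NEST = re.compile(r"\{\\mmmailsubject[^{}]*\}")


def strip_nests(text: str) -> str:
    """Remove \\mmmailsubject groups, repeating until nothing changes.

    Repeating the substitution also removes junk groups that are wrapped
    inside other junk groups, innermost first.
    """
    previous = None
    while previous != text:
        previous = text
        text = NEST.sub("", text)
    return text


if __name__ == "__main__":
    # Hypothetical two-nest sample, similar in spirit to the regex101 test.
    sample = "d0cf{\\mmmailsubject junk}11e0{\\mmmailsubject more junk}a1b1"
    print(strip_nests(sample))
```

What remains after the substitution is the hexadecimal payload with the ignored groups gone.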

I used 010 Editor's built-in Find & Replace with regular expressions to match all the nests and remove them. This left me with the following:

nests_ignored_rtf

Notice that the resulting hex string includes the magic bytes of an OLE file, d0cf11e0.
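A quick way to confirm this programmatically is to decode the hex text and search for the full 8-byte OLE signature (D0 CF 11 E0 A1 B1 1A E1). The sketch below is only an illustration: the helper names are my own, and the sample hex string is made up rather than copied from the file.

```python
import re

# Full signature of an OLE compound file.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def hex_to_bytes(hex_text: str) -> bytes:
    """Decode hex text, ignoring whitespace and any other non-hex characters."""
    digits = re.sub(r"[^0-9a-fA-F]", "", hex_text)
    if len(digits) % 2:
        digits = digits[:-1]  # drop a dangling nibble instead of failing
    return bytes.fromhex(digits)


def find_ole_offset(blob: bytes) -> int:
    """Return the offset of the OLE signature in blob, or -1 if it is absent."""
    return blob.find(OLE_MAGIC)


if __name__ == "__main__":
    # Hypothetical data: 8 bytes of header followed by the OLE signature.
    sample = "01050000 02000000\nd0cf11e0a1b11ae1 0000"
    print(find_ole_offset(hex_to_bytes(sample)))
```

Searching with find rather than checking offset 0 matters because the object data usually carries some header bytes before the embedded OLE file starts.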

I saved this file as “deobfuscated-rtf.vir” into my analysis folder and proceeded to analyze it again with rtfdump. Now the output was much smaller:

deobfuscated-rtfdump

Since I knew the file embedded in object 4 was an OLE file, I could try to extract it and pipe it into the oledump tool (also from Didier Stevens).

deobfuscated-rtfdump-oledump

And what was in those OLE objects? Yes, a URL pointing to an HTA file to download.

deobfuscated-rtfdump-oledump-url
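If you have the extracted OLE bytes on disk, a rough way to pull out URL candidates without reading the dump by eye is to scan for both plain ASCII and UTF-16LE encoded http(s) strings. This is a generic string-scanning sketch, not how oledump works internally; the patterns and the example URLs (on the reserved .invalid domain) are assumptions for illustration.

```python
import re

# Printable ASCII URL, e.g. b"http://host/path"
ASCII_URL = re.compile(rb"https?://[\x21-\x7e]+")
# Same URL stored as UTF-16LE: every character is followed by a NUL byte.
WIDE_URL = re.compile(
    rb"h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00(?:[\x21-\x7e]\x00)+"
)


def extract_urls(blob: bytes) -> list:
    """Return URL-like strings found in blob, ASCII matches first."""
    urls = [m.group().decode("ascii") for m in ASCII_URL.finditer(blob)]
    urls += [m.group().decode("utf-16-le") for m in WIDE_URL.finditer(blob)]
    return urls


if __name__ == "__main__":
    # Hypothetical blob; real stream contents will differ.
    wide = "http://example.invalid/a.hta".encode("utf-16-le")
    blob = b"\x00\x01" + wide + b"\x00\x00junk http://example.invalid/b end"
    for url in extract_urls(blob):
        print(url)
```

Treat the output as leads to verify by hand, since a byte scan like this can pick up truncated or unrelated strings.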

I will not analyze the HTA file that would have been downloaded because it has already been analyzed by others. What was somewhat new in this case was the obfuscation used in this RTF malware sample.

 
