Earlier this morning someone forwarded me a suspicious email they had received and asked me to analyze it. This is a screenshot of the email:
The two attached documents were the same file under different names. My initial triage already showed the file to be malicious: it had been submitted to VirusTotal (VT) and had a detection count of 16/56 (July 6th at 7:10 AM UTC). It had first been uploaded just 11 minutes earlier by someone else, and most detections came from generic signatures for an obfuscated malicious RTF exploit:
So using Didier Stevens' rtfdump tool (thanks Didier!) I could see all the components of this RTF file. As expected, there were A LOT of components (5206), as happens in most malicious RTF documents because of the need for obfuscation.
The file starts with a typical RTF header but without the “f”, that is, {\rt rather than {\rtf (Microsoft Word does not need the “f” to open an RTF document, and malware authors typically drop it to keep the file from being scanned by some AV products or submitted to sandboxes). Then it has one object with 2 nested objects: an object class and the object data. All of the other 5204 components of the file sit inside the object data component, nested inside \mmmailsubject components.
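To illustrate why the truncated header matters, here is a minimal Python sketch of two header checks. The function names are hypothetical and not part of any tool mentioned here; the point is only that a strict check for the full header misses a file that a lenient check still accepts.

```python
def has_full_rtf_header(data: bytes) -> bool:
    """Strict check: the file starts with the full '{\\rtf' header."""
    return data.startswith(b"{\\rtf")


def has_lenient_rtf_header(data: bytes) -> bool:
    """Lenient check: accept the truncated '{\\rt' form as well."""
    return data.startswith(b"{\\rt")


# Hypothetical sample bytes, not taken from the real file.
truncated = b"{\\rt1{\\object"
print(has_full_rtf_header(truncated))     # strict check rejects it
print(has_lenient_rtf_header(truncated))  # lenient check accepts it
```

A scanner that only applies the strict check would not treat such a file as RTF at all, which is the gap this trick relies on.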
My first idea was that there would be some kind of OLE file inside object 3 or 4. So I selected object 3 and applied the -H option to decode the hexadecimal characters. It turned out to be just a string giving the file a title.
Selecting object 4, however, produced no output from rtfdump. So, since objects 3 and onward are all nested inside object 2, I tried selecting object 2 instead and got the content of all the subcomponents:
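Conceptually, hex decoding of this kind turns pairs of hexadecimal digits back into bytes. The following is a rough Python sketch of that idea, not rtfdump's actual implementation; it assumes the input contains only hex digits and whitespace, and the sample string is made up.

```python
import binascii
import re


def hexdecode(text: str) -> bytes:
    """Decode a run of hex digits into bytes, ignoring whitespace.

    Assumption: the input holds only hex digits and whitespace.
    A trailing unpaired digit is dropped.
    """
    digits = re.sub(r"\s+", "", text)
    if len(digits) % 2:
        digits = digits[:-1]
    return binascii.unhexlify(digits)


# Made-up example: the hex digits for the ASCII text "Title".
print(hexdecode("54 69 74\n6c 65"))
```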
With this, I knew I could open the file with 010 Editor without danger in order to make it easier to view and decipher.
In the text editor it is easy to see the \mmmailsubject tags. If we look at the official RTF specification from Microsoft, we can find a short explanation of this control word:
So, from this explanation I concluded all content inside the \mmmailsubject nests could be ignored, leaving behind a trail of hexadecimal values that put together would probably become an interesting string.
So I built a regex to match all the nested components to be ignored and tested it on a short sample of 2 nests on regex101. Bingo!
I used the Find & Replace option with regular expressions built into 010 Editor to match all nests and remove them. This left me with the following:
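For readers who prefer to script this step, here is a minimal Python sketch of the same idea. The pattern is an illustrative assumption, not the exact regex from the screenshot: it treats every brace group that starts with \mmmailsubject as junk and removes the innermost groups repeatedly, so nests inside nests are handled too. The sample string is made up.

```python
import re

# Assumption: junk lives in brace groups that start with \mmmailsubject.
# [^{}]* only matches a group with no braces inside, i.e. an innermost one.
NEST = re.compile(r"\{\\mmmailsubject[^{}]*\}")


def strip_nests(text: str) -> str:
    """Remove \\mmmailsubject groups, innermost first, until none remain."""
    while True:
        text, removed = NEST.subn("", text)
        if removed == 0:
            return text


# Made-up sample with one flat nest and one nest inside another.
sample = r"d0{\mmmailsubject junk}cf{\mmmailsubject a{\mmmailsubject b}c}11e0"
print(strip_nests(sample))
```

Looping until nothing is removed avoids having to write a single pattern that balances braces, which regular expressions in Python's re module cannot do directly.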
Notice that the resulting hex string includes the magic bytes of an OLE file, which begin with d0cf11e0.
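Spotting the magic bytes by eye works, but the check is easy to automate. The sketch below decodes a hex string and searches for the full 8-byte OLE signature (D0 CF 11 E0 A1 B1 1A E1). The function name and the sample input are hypothetical.

```python
# Full 8-byte signature of an OLE compound file.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def find_ole_offset(hex_string: str) -> int:
    """Return the byte offset of the OLE signature in the decoded data, or -1."""
    data = bytes.fromhex("".join(hex_string.split()))
    return data.find(OLE_MAGIC)


# Made-up input: two leading bytes, then the signature.
print(find_ole_offset("0105 d0cf11e0a1b11ae1 00"))
```

Searching rather than comparing only the first bytes matters because embedded object data usually carries some header bytes before the OLE file itself.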
I saved this file as “deobfuscated-rtf.vir” into my analysis folder and proceeded to analyze it again with rtfdump. Now the output was much smaller:
Since I knew the file embedded in object 4 was an OLE file, I could try to extract it and pipe it into the oledump tool (also from Didier Stevens).
And what was in those OLE objects? Yes, a URL pointing to an HTA file to download.
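If you have the raw bytes of an extracted stream, pulling out URLs can also be scripted. This is a rough sketch under stated assumptions: the helper name is hypothetical, the URL may be stored either as plain ASCII or as UTF-16LE (both are common in OLE streams), and the example domain is a placeholder, not the real indicator from this sample.

```python
import re

# ASCII form: http(s):// followed by printable, non-space characters.
URL_ASCII = re.compile(rb"https?://[\x21-\x7e]+")
# UTF-16LE form: the same characters, each followed by a NUL byte.
URL_WIDE = re.compile(
    rb"h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00(?:[\x21-\x7e]\x00)+"
)


def extract_urls(data: bytes) -> list:
    """Return URLs found in data, ASCII matches first, then UTF-16LE matches."""
    urls = [m.group().decode("ascii") for m in URL_ASCII.finditer(data)]
    urls += [m.group().decode("utf-16-le") for m in URL_WIDE.finditer(data)]
    return urls


# Placeholder data, not taken from the real sample.
blob = b"\x01\x02http://example.test/a.hta\x00" + "https://example.test/b.hta".encode("utf-16-le")
print(extract_urls(blob))
```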
I will not analyze the HTA file that would have been downloaded because it has already been analyzed by others before. What was kind of new in this case was the obfuscation used in this RTF malware sample.