Malicious RTF Files

Published: 2016-07-29. Last Updated: 2016-07-29 13:35:02 UTC
by Didier Stevens (Version: 1)
5 comment(s)

About a year ago I received RTF samples that I could not analyze with RTFScan or rtfobj (FYI: Philippe Lagadec has improved rtfobj.py significantly since then). So I started to write my own RTF analysis tool (rtfdump), but I was not satisfied enough with the way I presented the analysis result to warrant a release of my tool. Last week, I started analyzing new samples and updating my tool. I released it, and show how I analyze sample 07884483f95ae891845caf0d50ce507f in this diary entry.


This sample is an heavily obfuscated RTF file. RTF files are essentially sets of nested strings that start with { and end with }. Like this (strongly simplified):

{\rtf {data {more data}}}.

Malicious RTF files contain a payload. Objects in RTF files are embedded in hexadecimal, like this (strongly simplified):
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E616D6500000000000000...
}}}

Malicious RTF files obfuscate the hexadecimal data in many ways, one of them is to put extra control strings inside the hexadecimal data, like this:
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E61{\obj}6D6500000000000000...
}}}

The sample I analyzed takes this to the extreme. After each hexadecimal digit, extra control strings and whitespace are inserted:


(I removed a lot of whitespace to be able to put several hexadecimal digits on the screen).
The hexadecimal digits (highlighted in red) are 01050…

My tool outputs a line of analysis data for each nested string. In this sample, because of the obfuscation, there are a lot of them (22956, which is gigantic for an RTF file).

But you can reduce the output by filtering for entries that (potentially) contain an embedded object using option -f O:

Entry 165 is the one we will take a closer look at first. The information presented for entry 165 is the following: the nesting level is 4, it has 1 child (c=), starts at position 2ae5 in the file (p=), is 1194952 bytes long (l=), has 11429 hexadecimal digits (h=), has no \bin entries (b=), contains an embedded object (O), has 1 unknown character (u=) and is named \*\objdata133765.

We can select entry 165 for closer analysis:

I highlighted the hexademical digits in red.

To decode the hexadecimal data, we use option -H:

You can see the hex data clearly now: 01 05 00 ...

Since this is an embedded object, we use option -i to get more info on the object:

From the magic header, we see that the embedded object is an OLE file (FYI: if we analyze it with oledump, we get parsing errors).

Looking further into the data (-H), we see stream entries in the output:

And a bit further, we even find a URL:

Taking a closer look, I don't only see a URL, but hex data that looks like shellcode.

We can select this shellcode by cutting if out of the stream (option -c):

And of course also dump it to a file (option -d), so that we scan analyze it with the shellcode analyzer from libemu:

So this RTF file is a downloader.

The presence of shellcode in an RTF file is often an indication of an exploit. rtfdump supports YARA (like many of my *dump tools):

The first YARA search doesn't find anything. But the second search with option -H (to decode the hexadecimal content to binary) has hits for my RTF_ListView2_CLSID YARA rule. This indicates that entry 165 contains a byte sequence for the ListView2 classid, so this is very likely an exploit for vulnerability CVE-2012-0158 in this ListView.

The set of samples I looked at last week are characterized by the following properties:

they start with {\rtfMETAX

they end with this:

If you have interesting tools or techniques to analyze RTF files, please post a comment.

Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com

Keywords: maldoc rtf
5 comment(s)
ISC Stormcast For Friday, July 29th 2016 http://isc.sans.edu/podcastdetail.html?id=5103

Comments


Diary Archives