In the last article, we covered how to analyze malicious documents. In this final part of the article series, we will continue to analyze PDF documents using additional tools. In this article, we review the Origami framework, which can be used to inspect and extract various objects from PDF documents.
For a refresher, let’s review the basic keywords related to parsing PDF documents.
- /AA: Defines additional actions that run automatically when a trigger event occurs, such as the user opening the document. Other events, such as cursor movement, can also be declared here to trigger a specific action.
- /AcroForm: Indicates whether Adobe forms are used in PDF documents or not.
- /ObjStm: Used to define an object stream that can hide specific objects. We will see that later in the series.
- /GoTo*: Redirects the view to a specified destination in the PDF file.
- /URI: Resource accessed by URL
- /SubmitForm and /GoToR: Indicates data submitted to a URL.
- /Launch: Launches the program.
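As a rough illustration of triaging on these keywords, the sketch below counts how often each name appears in the raw bytes of a file. It is not part of Origami; the function name `count_keywords` and the sample bytes are made up for this example, and it assumes the names are not obfuscated with `#xx` hex escapes, which real samples often use.

```python
import re

# Keywords from the list above. /GoTo and /GoToR are counted separately.
KEYWORDS = ["/AA", "/AcroForm", "/ObjStm", "/GoTo", "/URI",
            "/SubmitForm", "/GoToR", "/Launch"]

def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword in raw PDF bytes (hypothetical helper)."""
    counts = {}
    for kw in KEYWORDS:
        # The negative lookahead stops /GoTo from also matching /GoToR,
        # and /AA from matching a longer name that merely starts with /AA.
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, data))
    return counts

if __name__ == "__main__":
    # Made-up fragment, only to show the output shape.
    sample = b"<< /Type /Catalog /AA 5 0 R /AcroForm 6 0 R >> << /S /Launch >>"
    for name, hits in count_keywords(sample).items():
        print(name, hits)
```

A non-zero count is only a hint about where to look first; it does not by itself mean the document is malicious.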
Let’s start using different tools within Origami.
First, let’s look at PDF Walker, a GUI program that is part of the Origami framework. Below is the result when the sample PDF is loaded into PDFwalker.
This search gives us a reference to object 32.
To view this object, click Document > Jump to Object and enter the object number, as shown below.
This shows us the stream in object 32.
Note that PDF Walker identifies the encoding (filter) applied to a stream in the PDF document and performs the necessary decoding. For this document, PDFwalker identifies FlateDecode and applies the corresponding filter.
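FlateDecode is zlib/deflate compression, so the same decoding can be approximated outside the GUI. The sketch below is a simplified stand-in, not how PDFwalker itself works: it assumes streams are delimited by plain `stream` / `endstream` keywords and use FlateDecode alone, with no other filters or predictors. The name `inflate_streams` is hypothetical.

```python
import re
import zlib

# Everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the zlib-decoded content of every stream that inflates cleanly."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed, or encoded with another filter: skip it.
            continue
    return decoded

if __name__ == "__main__":
    # Made-up object, built here only so the example is self-contained.
    payload = b"app.alert('demo');"
    pdf = (b"5 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
           + zlib.compress(payload) + b"\nendstream\nendobj\n")
    print(inflate_streams(pdf))
```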
Above we can see the decoded stream. We can dump this stream by right-clicking it and choosing the dump option.
Below is the decoded dump output.
Now let’s explore another sample with both of these tools.
Load the sample in PDFwalker as shown below.
Let’s jump to Object 10,
which gives the following output.
As we can see, it references Object 12, so let’s jump to Object 12.
This gives the following output, which points to Object 13.
Continuing the same process, let’s jump to Object 13.
Below is the embedded Object 13.
Now the stream can be decoded and then analyzed further.
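The object-to-object hops done by hand above can be sketched in a few lines. This is a simplified illustration, assuming an uncompressed file where objects appear as plain `N G obj ... endobj` blocks and are not hidden inside object streams. The helper names and the sample bytes are hypothetical and are not taken from the analyzed sample.

```python
import re

# "10 0 obj ... endobj" blocks and "12 0 R" indirect references.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

def index_objects(pdf_bytes: bytes) -> dict:
    """Map each object number to the raw bytes of its body."""
    return {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(pdf_bytes)}

def references_of(objects: dict, number: int) -> list:
    """List the object numbers referenced from an object's dictionary."""
    body = objects.get(number, b"")
    # Ignore stream data: binary content could look like a reference by chance.
    dictionary = body.split(b"stream", 1)[0]
    return [int(m.group(1)) for m in REF_RE.finditer(dictionary)]

if __name__ == "__main__":
    # Made-up file fragment that mimics a 10 -> 12 -> 13 chain.
    pdf = (b"10 0 obj\n<< /S /JavaScript /JS 12 0 R >>\nendobj\n"
           b"12 0 obj\n<< /Next 13 0 R >>\nendobj\n"
           b"13 0 obj\n<< /Length 4 >>\nstream\nabcd\nendstream\nendobj\n")
    objects = index_objects(pdf)
    current = 10
    while current is not None:
        refs = references_of(objects, current)
        print(current, "->", refs)
        current = refs[0] if refs else None
```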
Below we can see that the pdf-extract tool has extracted 2 PDF streams and 2 scripts from the sample PDF file and dumped them to the locations shown.
After this, we can use SpiderMonkey to deobfuscate the script located in the sample.dump/scripts folder. Running SpiderMonkey shows the JS being extracted into eval 1 and eval 2; eval.002.log contains the deobfuscated JS, as can be seen below.
As discussed earlier, we can again see that the exploit targets the Collab.collectEmailInfo vulnerability. Note the use of NOP sleds in the different variables above. To analyze further, we need to copy the shellcode from the variable brIW1yTY and convert it into an executable, which we will do using shellcode2exe as shown below.
Since the shellcode contains %u escape sequences, we first need to convert the Unicode escapes to hex, as shown below.
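As an illustration of what this conversion involves, here is a minimal sketch, not the exact command used in this walkthrough. It assumes the usual JavaScript `unescape` convention, where each `%uXXXX` is a 16-bit value stored little-endian in memory (so `%u4142` becomes the bytes `42 41`) and each `%XX` is a single byte. The function name `unescape_to_hex` is hypothetical.

```python
import re

# Matches %uXXXX (16-bit value) or %XX (single byte).
ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

def unescape_to_hex(escaped: str) -> str:
    """Convert a %u-escaped shellcode string to a plain hex string."""
    out = bytearray()
    for m in ESCAPE_RE.finditer(escaped):
        if m.group(1):
            # %uXXXX: emit the low byte first, then the high byte.
            out += int(m.group(1), 16).to_bytes(2, "little")
        else:
            out.append(int(m.group(2), 16))
    return out.hex()

if __name__ == "__main__":
    print(unescape_to_hex("%u9090%u4142"))  # prints 90904241
```

Characters that are not part of an escape sequence are ignored by this sketch, which is usually what is wanted when the string was pasted together from several JavaScript fragments.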
Below are the contents of Shellcode-hex
Now let’s convert this into an exe using shellcode2exe.py, as shown below.
The shellcode is successfully converted to an exe binary.
This exe can be analyzed further. For example, a quick search for ‘HTTP’ in the binary reveals the following.
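A search like this can be sketched as a small strings-style scan. This is only an approximation of what a strings utility does: it looks for runs of printable ASCII and keeps those containing a keyword. The function name, the default keyword, and the sample bytes are assumptions made for the example; the URL uses the reserved `.invalid` domain and is not from the analyzed binary.

```python
import re

# Runs of at least four printable ASCII characters.
PRINTABLE_RE = re.compile(rb"[\x20-\x7e]{4,}")

def find_strings(data: bytes, needle: bytes = b"http") -> list:
    """Return printable strings in data that contain needle (case-insensitive)."""
    needle = needle.lower()
    return [s for s in PRINTABLE_RE.findall(data) if needle in s.lower()]

if __name__ == "__main__":
    # Made-up bytes standing in for the converted shellcode binary.
    blob = b"\x90\x90\x00http://example.invalid/a.exe\x00\xcc\xccMZ"
    for hit in find_strings(blob):
        print(hit.decode("ascii"))
```

Shellcode often stores URLs encoded or XOR-ed, so an empty result from a plain scan like this does not rule out network activity.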
That concludes PDF analysis using these tools. Other tools, such as PDF Stream Dumper, Peepdf, and AnalyzePDF, can also be used to analyze malicious PDFs.