Analysis of Malicious Documents 2023
In the previous article, we covered how to analyze malicious documents. In this final part of the series, we continue analyzing PDF documents with additional tools. Here we review the Origami framework, which can be used to inspect PDF documents and extract various objects from them.
For a refresher, let’s review the basic keywords related to parsing PDF documents.
- /AA: Defines additional actions that run automatically when a trigger event occurs, such as the user opening the document or a page. Events like cursor movement can also be declared here to trigger a specific action.
- /AcroForm: Indicates that the PDF document contains an interactive form (AcroForm).
- /ObjStm: Defines an object stream, which can be used to hide specific objects. We will see that later in the series.
- /JS: Indicates JavaScript embedded in the document.
- /GoTo*: Changes the view to a specified destination within the PDF file or in another PDF file.
- /URI: Accesses a resource by its URL.
- /SubmitForm and /GoToR: Can send data to a URL.
- /Launch: Launches a program or opens a document.
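As a quick triage aid, the keywords above can be counted directly in the raw bytes of a file. The following is a minimal sketch, not part of Origami: the function name `count_keywords` is an assumption made for illustration, and the approach will miss names that are hex-escaped (for example `/J#53`) or hidden inside compressed object streams.

```python
import re

# Names from the keyword list above, matched as whole PDF names.
EXACT_NAMES = [b"/AA", b"/AcroForm", b"/ObjStm", b"/JS",
               b"/URI", b"/SubmitForm", b"/GoToR", b"/Launch"]


def count_keywords(pdf_bytes: bytes) -> dict:
    """Count occurrences of suspicious PDF names in raw, undecoded bytes."""
    counts = {}
    for name in EXACT_NAMES:
        # The lookahead stops /JS from matching a longer name such as /JSFoo.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, pdf_bytes))
    # /GoTo* is a family of names (/GoTo, /GoToR, /GoToE), so match the prefix.
    counts["/GoTo*"] = len(re.findall(rb"/GoTo[A-Za-z]*", pdf_bytes))
    return counts
```

A non-zero count is only a hint about where to look; the object itself still has to be inspected with a tool such as PDFwalker.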
Let’s start using different tools within Origami.
First, let’s look at PDFwalker, a GUI program that is part of the Origami framework. Below is the result when the PDF is loaded into PDFwalker.
As we can see, PDFwalker has extracted all the embedded objects from the PDF. Now we need to look for a JavaScript object, so let’s look at the JavaScript references first.

This search gives us a reference to object 32.

To view this object, click Document > Jump to Object and enter the object number, as shown below.

This shows us the stream of object 32.

Note that PDFwalker identifies the encoding filter used in the PDF document and applies the necessary decoding. For this document, PDFwalker identifies FlateDecode and applies the corresponding filter.
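FlateDecode is zlib/deflate compression, so the same decoding step can be reproduced outside PDFwalker. The sketch below is an illustration under simplifying assumptions, not how PDFwalker itself is implemented: the helper name `inflate_streams` is hypothetical, it finds streams with a plain regular expression instead of a real PDF parser, and it ignores every filter other than FlateDecode.

```python
import re
import zlib

# Capture everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the zlib-inflated content of every stream that decodes cleanly."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that follow the
            # compressed data and precede "endstream".
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate data, or encoded with a different filter: skip it.
            continue
    return decoded
```

Streams that chain several filters (for example ASCIIHexDecode followed by FlateDecode) would need each filter undone in order, which this sketch does not attempt.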

PDFwalker displays the decoded stream, which we can dump by right-clicking the stream and choosing to dump it.

The decoded dump output is shown below.

Origami also includes a command-line tool, PDFextract, which automatically locates, decodes, and extracts JavaScript code. Note that PDFextract can also extract embedded images and file attachments. To instruct the tool to extract only JavaScript, we must supply the -j parameter.

It creates a directory named <filename>.dump/scripts and dumps the extracted script inside it. Below is an example of JavaScript extracted from sample.pdf.

Now let’s explore another sample with both these tools.
Launch the sample in PDFwalker, as shown below.

Now search for JavaScript as we did earlier. This gives a reference to object 10.

Let’s jump to object 10.

It gives the following output.

As we can see, it references object 12, so let’s jump to object 12.

It gives the following output, which points to object 13.

Continuing the same process, let’s jump to object 13.

Below is the embedded object 13.

Now the stream can be decoded and then analyzed further.
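The manual hops from one object to the next can also be sketched in code. This is a rough illustration only: the helper names `index_objects` and `follow_chain` are hypothetical, the parsing uses regular expressions on uncompressed object bodies, and only the first indirect reference in each object is followed, which is enough for a simple chain like the one above but not for objects that reference several others.

```python
import re

# "N G obj ... endobj" blocks and "N G R" indirect references.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def index_objects(pdf_bytes: bytes) -> dict:
    """Map each object number to the raw bytes of its body."""
    return {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(pdf_bytes)}


def follow_chain(objects: dict, start: int, max_hops: int = 10) -> list:
    """Follow the first indirect reference in each object, starting at `start`."""
    chain = [start]
    current = start
    for _ in range(max_hops):
        body = objects.get(current)
        if body is None:
            break
        # Look only at the dictionary part; stream data is binary noise.
        head = body.split(b"stream", 1)[0]
        ref = REF_RE.search(head)
        if ref is None:
            break
        current = int(ref.group(1))
        if current in chain:  # guard against reference loops
            break
        chain.append(current)
    return chain
```

The last object number in the returned list is the one whose stream is worth decoding.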
Let’s analyze the same PDF using PDFextract. This time we extract everything in the sample PDF, not just the JavaScript, as shown below.

Below we can see that PDFextract has extracted 2 PDF streams and 2 scripts from the sample PDF file and dumped them to the listed locations.

After this, we can use SpiderMonkey to deobfuscate the script located in the sample.dump/scripts folder. Running SpiderMonkey shows the JS being extracted into eval 1 and eval 2. Looking at the contents of eval.002.log, we find the deobfuscated JS, as can be seen below.

As discussed earlier, we can again see that the exploit targets the Collab.collectEmailInfo vulnerability. Note the use of a NOP sled in the different variables above. To analyze further, we need to copy the shellcode in the variable brIW1yTY and convert it into an executable. We will do this using shellcode2exe, as shown below.

Since the shellcode contains %u escapes, we first need to convert the Unicode to hex, as shown below.
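For readers who want to see what this conversion does, here is a minimal Python sketch. It is not the tool used in this article, and the function name `unescape_percent_u` is an assumption made for illustration. Each `%uXXXX` escape is a 16-bit value that JavaScript's `unescape()` places in memory in little-endian order, so the two bytes come out swapped relative to how they are written.

```python
import re
import struct


def unescape_percent_u(text: str) -> bytes:
    """Convert %uXXXX escapes into raw bytes, low byte first."""
    out = bytearray()
    for hex_word in re.findall(r"%u([0-9a-fA-F]{4})", text):
        # "<H" packs an unsigned 16-bit integer in little-endian order.
        out += struct.pack("<H", int(hex_word, 16))
    return bytes(out)


if __name__ == "__main__":
    # %u9090 is two NOP bytes; %u4142 becomes 0x42 then 0x41.
    print(unescape_percent_u("%u9090%u4142").hex())  # prints 90904241
```

The sketch ignores single-byte `%XX` escapes and any plain characters mixed in with the escapes, so it only suits strings made up entirely of `%u` sequences.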

Below are the contents of Shellcode-hex.

Now let’s convert this into an exe using shellcode2exe.py, as shown below.

The shellcode is successfully converted into an exe binary.

This exe can be analyzed further. For example, a quick search for ‘HTTP’ in the binary reveals the following.
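A string search of this kind can be sketched in a few lines of Python, similar in spirit to running a strings utility and filtering the result. The function name `find_strings` and its parameters are assumptions made for illustration, and the sketch only finds plain ASCII strings, so URLs that are XOR-encoded or stored as wide characters will not show up.

```python
import re


def find_strings(data: bytes, needle: bytes = b"http", min_len: int = 4) -> list:
    """Return printable ASCII runs that contain `needle`, ignoring case."""
    # A run is min_len or more consecutive printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [
        run.decode("ascii")
        for run in pattern.findall(data)
        if needle.lower() in run.lower()
    ]
```

To use it, read the generated binary with `open(path, "rb").read()` and pass the bytes to `find_strings`.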


That is all for PDF analysis using these tools. Other tools, such as PDF Stream Dumper, peepdf, and AnalyzePDF, can also be used to analyze malicious PDFs.