PDF (Portable Document Format) is one of the least understood file formats among IT people. Originally developed as a universally compatible file format based on PostScript, it has become a highly regarded international format for sharing documents and information in a structured way. PDF documents are easy to create and, with the right reader software, easy to use, and their content and layout are displayed exactly the same way no matter which operating system, device or software is used to view them. Another advantage is that they can be compressed to a file size that is easy to exchange while retaining image quality. They are also multi-dimensional, allowing for the integration of many different types of content (text, images, vector graphics, videos, audio, animations, forms, hyperlinks, buttons). All this flexibility means increased complexity in the background. This is why PDFs are hard to edit, transform or extract content from. It is also why the format and its structure are so widely unknown.
I will use a file I recently received for analysis as a sample to show how to analyse AcroForm objects.
The file was received yesterday by a user, attached to an email with the subject “Global AGILE Development & Innovation Summit – London, UK”. Initially, nothing suggested that this could be a malicious email. But the fact that the email was sent from a different domain than the link in the body of the email was enough for the user to become suspicious and flag it for investigation before opening it.
During an initial triage of the file, I computed its MD5 hash, confirmed that the magic number was that of a PDF file (version 1.7), and found that it was already present in VirusTotal, with zero detections so far.
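The local part of that triage (hash and magic number) can be sketched in a few lines of Python. This is only an illustration, not the tooling I used: the function name `triage` is made up for the example, and searching the first 1024 bytes for the header is an assumption based on how leniently some readers treat leading junk.

```python
import hashlib
import re

# "%PDF-" followed by a major.minor version, e.g. %PDF-1.7
PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def triage(data: bytes) -> dict:
    """Return the MD5 hash and the PDF version found in the header, if any.

    `triage` is a hypothetical helper name used only for this sketch.
    """
    # Assumption: the header should sit at offset 0, but we tolerate
    # leading bytes and look inside the first 1024 bytes only.
    match = PDF_HEADER.search(data[:1024])
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "is_pdf": match is not None,
        "version": match.group(1).decode("ascii") if match else None,
    }

# Example use (path is a placeholder):
#   from pathlib import Path
#   print(triage(Path("sample.VIR").read_bytes()))
```

The resulting MD5 value is what you would then look up in VirusTotal; that lookup is deliberately left out of the sketch.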
I recommend this kind of triage for every file that requires deeper investigation, even if someone else has already performed some analysis on it. This way, potential mistakes are avoided.
It is also a good idea to change the file extension to .VIR, which reduces the risk of accidentally double-clicking the file under investigation and executing any potentially malicious code.
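If you handle many samples, the rename can be scripted. A minimal sketch, assuming Python 3.8 or later; the helper name `defang` is hypothetical:

```python
from pathlib import Path


def defang(path: Path) -> Path:
    """Rename a sample so its extension becomes .VIR and return the new path.

    `defang` is a made-up name for this sketch, not part of any tool.
    """
    target = path.with_suffix(".VIR")
    # Path.rename returns the new Path object on Python 3.8+
    return path.rename(target)
```

Note that this only changes the name; the content of the file, and therefore its hash, stays the same.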
First Step with every PDF file investigation
Right after confirming that it is a PDF file, the next step is always to run PDFiD, a tool by Didier Stevens, with the plugin_triage plugin, to quickly determine whether the document contains suspicious tags that require further investigation.
As you can see in the image, this file does not contain many objects: 9 pages, 16 object streams, and only one really suspicious element, an AcroForm.
This means we cannot easily dismiss the file as benign, and we need to dig deeper.
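To give an idea of what this kind of keyword triage does, here is a rough sketch that counts a few PDF names in the raw bytes of a file. It is not PDFiD's implementation: the keyword list is my own selection for illustration, and the sketch does not handle hex-escaped names (for example `/Acro#46orm`) or names hidden inside compressed object streams.

```python
import re

# Assumption: a small, illustrative selection of names worth a closer look.
KEYWORDS = [
    b"/Page", b"/ObjStm", b"/AcroForm", b"/JS", b"/JavaScript",
    b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile",
]


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword in the raw bytes of a PDF."""
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops /Page from also matching /Pages.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

A non-zero count for `/AcroForm`, `/JS`, `/JavaScript`, `/OpenAction`, `/AA` or `/Launch` does not prove anything by itself; it only tells you where to look next.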
Finding what is inside the AcroForm
How? By using yet another tool by Didier Stevens, called pdf-parser. The tool has a search option that looks for strings in indirect objects (not inside streams). The search is not case-sensitive and is susceptible to obfuscation techniques, but it does the job when trying to find any object related to, for instance, AcroForms.
The image above shows how I used the search option (-s) to find where the AcroForm was in the sample in question. In this case, object 459 contained an AcroForm and referenced 4 other objects. Objects 563, 445, 456 and 453 were actually empty or non-existent.
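Conceptually, such a search walks through the indirect objects, ignores stream payloads, and reports the objects whose dictionary contains the string, together with the objects they reference. The sketch below mimics that idea with regular expressions. It is a simplification and not how pdf-parser is written: the function name `search_objects` is hypothetical, and like any plain string search it is defeated by obfuscated names and by objects stored inside object streams.

```python
import re

# "N G obj ... endobj" for indirect objects, "N G R" for references
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def search_objects(data: bytes, needle: bytes) -> dict:
    """Map object number -> referenced object numbers, for every indirect
    object whose non-stream part contains `needle` (case-insensitive)."""
    hits = {}
    for m in OBJ_RE.finditer(data):
        # Keep only the part before any stream payload.
        body = m.group(3).split(b"stream", 1)[0]
        if needle.lower() in body.lower():
            hits[int(m.group(1))] = [
                int(ref.group(1)) for ref in REF_RE.finditer(body)
            ]
    return hits
```

Each referenced object number can then be inspected in turn, which is the manual process described in this section.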
But object 202 contained some kind of data. To find out what that data could be, we can use the filter (-f) and raw (-w) options to check the content of potentially encoded and compressed (filtered) data in the objects (see the first example in the image below).
Or we can check the content of objects without streams, or with streams that have no filters, using the content option (-c), as shown in the second example in the image above.
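What "passing a stream through its filters" means can be illustrated for the most common case, `/FlateDecode`, which is plain zlib compression. The sketch below is an assumption-laden simplification: `stream_content` is a made-up helper, it only knows about `/FlateDecode`, and it ignores filter chains and decode parameters that a real parser has to deal with.

```python
import re
import zlib

# Payload between the "stream" and "endstream" keywords
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def stream_content(obj_body: bytes, apply_filter: bool = True) -> bytes:
    """Return the stream payload of one object, inflated if it is Flate-encoded.

    Returns b"" when the object has no stream. Only /FlateDecode is handled.
    """
    m = STREAM_RE.search(obj_body)
    if m is None:
        return b""
    raw = m.group(1)
    if apply_filter and b"/FlateDecode" in obj_body:
        return zlib.decompress(raw)
    # No filter requested, or a filter this sketch does not understand
    return raw
```

With `apply_filter=False` you see the bytes as stored in the file; with the default you see what the reader software would actually process, which is the part that matters when looking for scripts or embedded forms.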
The sample analysed was clearly a non-malicious PDF file.
Examples of malicious code inside AcroForms
Other interesting links to read about PDF
In-depth explanations of the PDF file format, to better understand the files when analysing them:
Other resources for analyzing PDF files: