This article describes a new server-side vulnerability that I discovered in the PDF export process.
Many servers are still vulnerable, ranging from social networks to financial and government websites.

Have you ever surfed the internet and seen a “Download as PDF” button?
Over the past few years, many sites have added the option to export your personal data to an accessible format, such as PDF or Word.
As a penetration tester, I have tested many large web applications that include this conversion feature, and I wondered: what happens behind the scenes, and does this process broaden the attack surface?

After some quick research, I discovered that the process is very dangerous from a security perspective and, without appropriate filtering, can expose your application to many vulnerabilities.
In this article, I explain the conversion process and the potential attacks.

1. The Conversion Process

When a website converts data to PDF, in most cases the process is as follows:

  1. The web application gets the client’s data from a database or directly from the client.
  2. It puts the data inside an HTML template.*
  3. It sends the custom HTML to an external library.
  4. The external library takes the HTML, converts it, and returns a PDF file.
  5. The client downloads the PDF file.

*In some cases, the web application downloads the whole HTML, including the personal data, directly from the website itself over HTTP (e.g., from the user’s profile page).
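The steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the code of any specific site or library: `PROFILE_TEMPLATE`, `build_html`, `export_pdf` and `fake_convert` are hypothetical names, and `fake_convert` is a stand-in for the external library so that the example can run without one.

```python
from string import Template
from typing import Callable

# Hypothetical template; a real application would have its own.
PROFILE_TEMPLATE = Template(
    "<html><body><h1>Profile</h1><p>Name: $name</p></body></html>"
)


def build_html(name: str) -> str:
    """Step 2: put the client's data inside the HTML template (no encoding)."""
    return PROFILE_TEMPLATE.substitute(name=name)


def export_pdf(name: str, convert: Callable[[str], bytes]) -> bytes:
    """Steps 3-4: send the custom HTML to the external library, return its PDF."""
    return convert(build_html(name))


def fake_convert(html: str) -> bytes:
    # Stand-in for the external library (assumption): it only wraps the HTML
    # in bytes so the flow can be executed end to end.
    return b"%PDF-FAKE\n" + html.encode("utf-8")
```

Note that `build_html` places the value into the markup exactly as it was received, which is the property the rest of this article relies on.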

The most interesting part is the conversion of the custom HTML to a PDF file by the external library.
I discovered that there are many players in the HTML-to-PDF market.

2. The Attack Vector

The conversion process takes an HTML page, parses all the elements inside it, and converts each one to a new PDF element.
The common external libraries are full of features and support many HTML tags. Some of them even support CSS and JavaScript.
With this understanding, think about the following scenario: what would happen if an attacker managed to inject a malicious HTML tag into the conversion process?
If the web application does not encode or filter the user’s input, the server is exposed to a wide range of vulnerabilities.
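To make the scenario concrete, here is a minimal sketch that builds the HTML by plain string concatenation and then lists the tags a parser sees in it. The function names, the template and the `attacker.example` URL are assumptions made for illustration; Python's standard `html.parser` plays the role of the library's HTML parser.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag found in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in(html_text):
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags


def render_unsafe(display_name):
    # Hypothetical template: user input is concatenated without any encoding.
    return "<html><body><p>Name: " + display_name + "</p></body></html>"


benign = tags_in(render_unsafe("Alice"))
injected = tags_in(render_unsafe('<iframe src="https://attacker.example/"></iframe>'))
print(benign)    # ['html', 'body', 'p']
print(injected)  # ['html', 'body', 'p', 'iframe']
```

The extra `iframe` in the second result is an element chosen by the user, and the conversion library will process it like any other element of the page.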

2.1. Arbitrary file download

One of the most common vulnerabilities on the web is the ability to download an arbitrary file from a server. This is a critical security breach because it lets an attacker download sensitive data from the server, e.g., log files that contain users’ data, configuration files that contain connection strings and encryption keys, and users’ private files.
If we can inject an HTML tag into the conversion process, then with some libraries we can download almost any file from the web server. For this attack vector, we use these tags:

  • iframe / frame
  • object
  • fonts (CSS)
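From the defender's side, the same list of tags tells us what to look for. The following is a rough sketch, not a complete filter: it collects the URLs referenced by iframe, frame and object elements and by CSS `url(...)` values, and reports the ones that use the `file` scheme. All names in it are hypothetical, and the regular expression is a simplification of real CSS parsing.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tag -> attribute that makes the converter load a resource.
RESOURCE_ATTRS = {"iframe": "src", "frame": "src", "object": "data"}
# Simplified matcher for CSS url(...) values, e.g. inside @font-face rules.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")\s]+)", re.IGNORECASE)


class ResourceFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.urls = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)
            elif name == "style" and value:
                self.urls.extend(CSS_URL.findall(value))

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.urls.extend(CSS_URL.findall(data))


def local_file_references(html_text):
    """Return every collected URL whose scheme is file:."""
    finder = ResourceFinder()
    finder.feed(html_text)
    finder.close()
    return [u for u in finder.urls if urlparse(u).scheme.lower() == "file"]
```

A non-empty result for HTML that contains user input is a strong sign that the input reached the template without encoding.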

Example from the real world:

1. The HTTP Request

2. The PDF Response

2.2. Internal network exposure (SSRF)

Sometimes during a penetration test, after exposing a few vulnerabilities, I come to a dead end. In many cases, what separates me from significant progress is the inability to disclose information about the server and the internal network.
The “Export Injection”, in all the libraries, gives us the option to obtain a lot of information about the server. Some techniques that have occurred to me:

  • Internal port scanning: the delay of the web server’s response reveals whether a port is open or closed. For example, if we send a malicious IMG tag:
      • <img src=””/> gave a delay of 2.3 seconds (the port is open)
      • <img src=””/> gave a delay of 4.8 seconds (the port is closed)
  • Internal resource access: we can use the object, iframe and frame tags to access internal HTTP interfaces and view the responses. For example, injection of:
      • <object data=””/>
  • Discovering the real IP address of the website: we can make the site perform an HTTP request to any server on the internet, even our own. I used the “iplogger” site to log the IP address of the attacked website:
      • <img src=””/>

With this technique, we can expose the real IP of the web server and perform an effective port scan.
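A check that is not part of my original tests, but that follows naturally from the techniques above, is to refuse URLs that point at the server itself or at the internal network before the HTML reaches the converter. The sketch below is an assumption about how such a check could look. It only recognises `localhost` and literal IP addresses; a DNS name that resolves to an internal address would need to be resolved and checked separately.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_literal(url):
    """True when the URL's host is 'localhost' or a literal non-public IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address: a DNS name, which this sketch does not resolve.
        return False
    return address.is_private or address.is_loopback or address.is_link_local


for candidate in ("http://127.0.0.1:8080/", "http://10.0.0.5/admin", "https://example.com/"):
    print(candidate, is_internal_literal(candidate))
```

The addresses in the loop are examples only; they are not the targets used in the tests described above.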

2.3. Effective Denial of Service (DoS)

The vulnerability exposes the site to a potential DoS attack. The external libraries support parsing complex data (images, fonts and more). An attacker could abuse this mechanism and make the server work hard by sending one of the following tags:

  • <img src=””/>
    Causes the web application to download a heavy file.
  • <iframe src=””/>
    Causes the web application to enter a long HTTP redirection loop.

The way to perform a DoS attack varies from library to library.
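One general way to limit the damage, assuming the converter runs as a separate command-line process, is to put a hard time limit on each conversion. The helper below is a hypothetical sketch built on Python's `subprocess.run`. The command passed to it is whatever converter the application uses; none is assumed here.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a converter command; return its stdout, or None if it takes too long.

    subprocess.run kills the child process when the timeout expires.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            timeout=timeout_seconds,
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout


# Demonstration with a Python child process standing in for a slow converter.
slow = [sys.executable, "-c", "import time; time.sleep(10)"]
print(run_with_timeout(slow, 0.5))  # None
```

A time limit does not stop the heavy download or the redirection loop from starting; it only bounds how long a single request can keep the server busy.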

3. How to protect yourself?

It’s quite easy to prevent the vulnerability.
As a rule, you should never pass users’ input to an external library without thinking it through. Always ask yourself: “What would an attacker do?”
In this specific case, you should encode the input before passing it to the external conversion libraries.
HTML encoding should prevent the potential vulnerabilities in most cases.
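In Python, for example, HTML encoding is a single call to `html.escape` from the standard library. The template and the `render_safe` name below are hypothetical; the point is only that the encoding happens before the value is placed into the markup.

```python
import html


def render_safe(display_name):
    # Encode <, >, & and quotes so the value is treated as text, not markup.
    encoded = html.escape(display_name, quote=True)
    return "<html><body><p>Name: " + encoded + "</p></body></html>"


page = render_safe('<img src="x"/>')
print(page)
# <html><body><p>Name: &lt;img src=&quot;x&quot;/&gt;</p></body></html>
```

After encoding, the injected tag reaches the conversion library as plain text and appears in the PDF as the characters the user typed.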

Vulnerable Libraries:

To find out which external library was used, just open the PDF file with a hex editor and search for strings like ‘Creator’ or ‘Author’.
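The same search can be scripted. This sketch looks for the `/Creator` and `/Author` keys, plus `/Producer`, which I added here because it is another standard key of the PDF information dictionary. It only finds values stored as plain literal strings; metadata kept in compressed streams or as hex strings will not match. The sample bytes are made up for the example.

```python
import re

# Matches e.g. "/Creator (Some Tool 1.0)", allowing escaped characters inside.
INFO_KEY = re.compile(rb"/(Creator|Producer|Author)\s*\(((?:\\.|[^\\)])*)\)")


def pdf_info_strings(data):
    """Return a dict of the literal-string metadata values found in the bytes."""
    return {
        match.group(1).decode("ascii"): match.group(2).decode("latin-1")
        for match in INFO_KEY.finditer(data)
    }


sample = b"1 0 obj\n<< /Creator (Example Creator) /Producer (Example Producer 1.0) >>\nendobj"
print(pdf_info_strings(sample))
# {'Creator': 'Example Creator', 'Producer': 'Example Producer 1.0'}
```

For a real file, read it in binary mode and pass the bytes to `pdf_info_strings`.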

4. Conclusions

I hope that my quick research will increase awareness of this vulnerability. The attack surface is broad, and I mentioned only the basic vectors. I hope that the article will open a door for future research into the conversion process.

I love to learn, build and break things. Head of Security Research @; Security Consultant @ Tangent Logic
