Address space layout randomization (ASLR) is a first line of defense against attackers targeting Internet users. ASLR randomizes the location of an application’s code and data in the virtual address space to make it difficult for attackers to leak or manipulate the data or to reuse the code to compromise the application. Combined with the no-execute bit for data, which all modern processors enforce, ASLR makes it harder to compromise systems.
In the past, researchers have shown that ASLR can be broken in some instances. For example, a local attacker with native code execution can break kernel-level ASLR. In more serious environments such as the browser, however, ASLR is still considered a good defense.
Last year, our award-winning attack showed that a JavaScript-enabled attacker can break ASLR in Microsoft Edge using a side channel introduced by memory deduplication. Microsoft quickly moved to disable memory deduplication to preserve the security of its users. In this project, we show that the limitations of ASLR are fundamental to how modern processors manage memory, and we build an attack that can fully derandomize ASLR from JavaScript without relying on any software feature.
The AnC attack
The memory management unit (MMU) of modern processors uses the processor’s cache hierarchy to improve the performance of page table walks. This is fundamental to efficient code execution in modern processors. Unfortunately, this cache hierarchy is also shared with untrusted applications, such as JavaScript code running in the browser.
We have built a side-channel attack, specifically an EVICT+TIME cache attack, that can detect which locations in the page table pages are accessed during a page table walk performed by the MMU. For example, on the x86_64 architecture, our attack can find the offsets that are accessed by the MMU for each of the four page table pages. The offset within each page breaks nine bits of entropy so even a perfect ASLR implementation with 36 bits of entropy is not safe.
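The arithmetic behind the "four page table pages, nine bits each" statement can be sketched as follows. This is a minimal illustration, assuming standard x86_64 4-level paging with 4 KiB pages and 48-bit virtual addresses; the function names and the sample address are hypothetical and are not taken from the AnC code.

```python
PAGE_OFFSET_BITS = 12   # 4 KiB pages (assumed)
INDEX_BITS = 9          # 512 entries per page table page
LEVELS = 4              # x86_64 4-level paging (assumed)
INDEX_MASK = (1 << INDEX_BITS) - 1


def page_table_indices(vaddr):
    """Return the entry index used at each level, top level first."""
    indices = []
    for level in range(LEVELS, 0, -1):
        shift = PAGE_OFFSET_BITS + INDEX_BITS * (level - 1)
        indices.append((vaddr >> shift) & INDEX_MASK)
    return indices


def rebuild_address(indices, page_offset=0):
    """Inverse of page_table_indices: knowing all four indices fixes the page."""
    vaddr = page_offset
    for position, index in enumerate(indices):
        level = LEVELS - position
        vaddr |= index << (PAGE_OFFSET_BITS + INDEX_BITS * (level - 1))
    return vaddr


sample = 0x7F1234567000  # hypothetical page-aligned heap address
print(page_table_indices(sample))  # [254, 72, 418, 359]
```

Each index is nine bits wide, so recovering all four recovers 36 bits of the address, which is why the round trip through `rebuild_address` returns the original page address.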
Our attack, which we called ASLR⊕Cache (or AnC for short), first flushes part of the last level cache and then times the MMU’s page table walk performed due to a memory access. This already finds cache lines of interest in the page table page. To further distinguish which cache lines belong to which page table level and find the page table entry offset (e.g., 8 bytes on x86_64) within the cache line (e.g., 64 bytes on x86_64), AnC accesses various offsets within the target buffer or code.
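To make the cache-line versus entry-offset distinction concrete, the sketch below maps a page table entry index to the cache line that holds it and to its slot within that line. It uses the 8-byte entry and 64-byte cache line sizes quoted above for x86_64; the function name is illustrative only and is not part of AnC.

```python
PTE_SIZE = 8          # bytes per page table entry on x86_64
CACHE_LINE_SIZE = 64  # bytes per cache line on x86_64
SLOTS_PER_LINE = CACHE_LINE_SIZE // PTE_SIZE  # 8 entries share one line


def pte_location(index):
    """Map a 9-bit page table index to (cache line in page, slot in line)."""
    byte_offset = index * PTE_SIZE
    cache_line = byte_offset // CACHE_LINE_SIZE
    slot = (byte_offset % CACHE_LINE_SIZE) // PTE_SIZE
    return cache_line, slot


print(pte_location(359))  # (44, 7)
```

Under these sizes, observing which of the 64 cache lines in a page table page was touched gives 6 of the 9 index bits; the remaining 3 bits are the slot within the line, which is what accessing further offsets in the target buffer is meant to resolve.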
We have implemented AnC natively and in JavaScript! We use our native version to establish that the MMU’s signal can be observed on 22 different microarchitectures, and we use our JavaScript version to find code and heap pointers in the Firefox and Chrome browsers. For more information about AnC and our experiments, please refer to our papers. Below you can see how fast AnC reduces the entropy of data pointers in the browser. Note that most of the remaining entropy bits are known.
This is what it might look like
AnC really does compute real addresses from JavaScript using a microarchitectural phenomenon. See also the video demo below.
Precise Timing from JavaScript
The AnC attack requires a precise timer in JavaScript to tell the difference between a cached and uncached memory access. Recently, browser vendors have broken the precise JavaScript timer, performance.now(), in order to thwart cache attacks.
We built two new timers that bypass this mitigation in order to make the AnC attack work. Our new timers not only make the AnC attack possible, they also revive previously known cache attacks from the browser. For more information, we invite you to read Section 4 of our NDSS’17 paper.
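As a rough intuition for how a coarse clock can still yield fine-grained measurements, here is a small simulation of the general time-to-tick idea, one of the timing methods named in the browser recommendations on this page. It is a sketch under stated assumptions, not the implementation from the paper: the clock is simulated rather than real, and the 5 µs tick, the 10 ns polling cost and the operation durations are all made-up values.

```python
class CoarseClock:
    """Simulated clock: true time advances in ns, reads are rounded down to a tick."""

    def __init__(self, tick_ns=5000):  # 5 us granularity is an assumption
        self.tick_ns = tick_ns
        self.true_ns = 0

    def advance(self, ns):
        self.true_ns += ns

    def now(self):
        return (self.true_ns // self.tick_ns) * self.tick_ns


def time_to_tick(clock, operation_ns, poll_cost_ns=10):
    """Return how many polls fit between the end of an operation and the next tick."""
    # Step 1: spin until the coarse clock changes, so we start on a tick edge.
    start = clock.now()
    while clock.now() == start:
        clock.advance(poll_cost_ns)
    edge = clock.now()

    # Step 2: perform the operation being timed (simulated by its duration).
    clock.advance(operation_ns)

    # Step 3: count polls until the next tick; a slower operation leaves fewer.
    polls = 0
    while clock.now() == edge:
        clock.advance(poll_cost_ns)
        polls += 1
    return polls


fast = time_to_tick(CoarseClock(), operation_ns=100)  # e.g. a cached access
slow = time_to_tick(CoarseClock(), operation_ns=300)  # e.g. an uncached access
print(fast, slow)
```

In this simulation the poll count is larger for the shorter operation, so the two cases can be told apart even though both fit well inside a single clock tick. The sketch only works when the operation is shorter than one tick.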
Reverse Engineering Page Table Caches
During the development of AnC, we noticed that various processors implement page table caches with different behaviors. To make the AnC attack robust, we needed to flush these caches. Unfortunately, these caches are not properly documented and their properties differ on each microarchitecture.
We have built a new technique that retrofits the AnC attack to reverse engineer the properties of these page table caches on 22 different microarchitectures. Our findings not only make the AnC attack robust, they also benefit all recent attacks that rely on flushing these caches to work properly, such as Rowhammer attacks that manipulate page tables. For more information, we invite you to read our paper on reverse engineering page table caches, which is currently under submission.
Papers
Video Demo and Source Code
You can see the AnC attack in action derandomizing the full 64 bits of entropy in Firefox below.
We are releasing the native version of AnC as a library to reverse engineer page table caches. We are not going to release the JavaScript version of AnC in order to protect Internet users from the AnC attack. However, we predict that any sufficiently advanced adversary can replicate our results in a few weeks with the knowledge from our NDSS’17 paper.
You can find the source code of the native AnC, packaged to reverse engineer and flush page table caches (also known as RevAnC), on GitHub.
ASLR gone in 25 seconds
The attack time is variable. An example of a faster version than the above:
Affected Architectures
We observed the MMU signal on the following 22 microarchitectures from Intel, ARM and AMD processors. We observed the signal on every architecture that we tried.
| CPU Model | Microarchitecture | Year |
| --- | --- | --- |
| Intel Xeon E3-1240 v5 | Skylake | 2015 |
| Intel Core i7-6700K | Skylake | 2015 |
| Intel Celeron N2840 | Silvermont | 2014 |
| Intel Xeon E5-2658 v2 | Ivy Bridge EP | 2013 |
| Intel Atom C2750 | Silvermont | 2013 |
| Intel Core i7-4500U | Haswell | 2013 |
| Intel Core i7-3632QM | Ivy Bridge | 2012 |
| Intel Core i7-2620QM | Sandy Bridge | 2011 |
| Intel Core i5 M480 | Westmere | 2010 |
| Intel Core i7 920 | Nehalem | 2008 |
| AMD FX-8350 8-Core | Piledriver | 2012 |
| AMD FX-8320 8-Core | Piledriver | 2012 |
| AMD FX-8120 8-Core | Bulldozer | 2011 |
| AMD Athlon II 640 X4 | K10 | 2010 |
| AMD E-350 | Bobcat | 2010 |
| AMD Phenom 9550 4-Core | K10 | 2008 |
| Allwinner A64 | ARM Cortex A53 | 2016 |
| Samsung Exynos 5800 | ARM Cortex A15 | 2014 |
| Samsung Exynos 5800 | ARM Cortex A7 | 2014 |
| Nvidia Tegra K1 CD580M-A1 | ARM Cortex A15 | 2014 |
| Nvidia Tegra K1 CD570M-A1 | ARM Cortex A15; LPAE | 2014 |
Disclosure Process
We started the disclosure process in coordination with the Dutch National Cyber Security Center (NCSC) in October of 2016 with a disclosure date set to February 15, 2017. This was a challenging process involving many different parties, including the processor, browser and OS vendors.
Some processor vendors agreed with our findings that ASLR is no longer a viable security defense, at least for browsers. Others did not dispute our findings. Among the browser vendors, most found AnC relevant.
To track the developments related to AnC, multiple CVEs were assigned by MITRE. CVE-2017-5925 is assigned to track the developments for Intel processors, CVE-2017-5926 for AMD processors and CVE-2017-5927 for ARM processors. Finally, CVE-2017-5928 is assigned to track the timer issues that we found in multiple browsers.
We contributed a plan of action to the various vendors affected by AnC. While various vendors may have followed our suggestions at their discretion, we worked directly with the Apple Product Security Team to harden WebKit against the AnC attack.
We can think of two possible mitigation techniques at the CPU level:
1) Addition of separate page table page caches without going through the cache hierarchy shared by the cores. This has already been suggested and implemented in Intel Atom, but has since been abandoned. What are the implications on performance and chip real-estate if separate caches are provided and the hardware page walker does not interact with the cache hierarchy?
2) Cache partitioning between the cores and their MMUs. Intel recently implemented Cache Allocation Technology (CAT). CAT can be used to isolate MMU fills into the cache hierarchy from the rest of the (untrusted) memory accesses, e.g., in a browser setting where AnC has the highest impact. Is the performance overhead noticeable with the right partitioning of the caches between the MMU and the rest of the system? Alternatively, CAT can be used to isolate web browsers and other applications running untrusted code in a sandbox from the rest of the system.
While we do not expect CPU providers to fully address AnC within the disclosure window, they can assess whether AnC can be mitigated with negligible cost or performance overhead.
Browser Vendors
We believe browser vendors are best positioned to reduce the impact of AnC. We suggest the following:
1) The time-to-tick and shared-memory-counter timing methods should be prevented. Browsers should implement jitter around the coarse-grained timer and abandon sharing across JavaScript threads. This will address the current AnC exploit, although it cannot guarantee an attacker won’t be able to find other ways to craft a sufficiently precise timer.
2) Browsers should become more aware of allocations performed from JavaScript. For example, Firefox should not allow large virtual memory allocations and Chrome should make sure that new allocations do not cross page tables at the top two levels (to preserve at least 6 bits of entropy on x86_64). Counter-intuitively, in Chrome, the internal allocator should not use the entropy bits from the top-level page table (upper 9 bits) and instead rely on the operating system for the top 9 bits. See below for the reason.
OS Vendors
During the development of AnC, we noticed that breaking the entropy bits in the top level of the page table tree is much harder than in the lower levels, for two reasons: 1) There is often activity from other elements in the JavaScript engine, blinding the signal from the top-level page table pages. 2) Even if the signal from the top level is observed, AnC requires multi-TB virtual allocations to uniquely identify the page table slot and fully derandomize ASLR.
Currently, Linux uses 28 bits of entropy by default. We suggest increasing this, possibly to 35 bits of entropy, to preserve more ASLR bits from the top-level page table pages. Further, in Windows, only 24 bits of entropy are allocated to heap objects. Microsoft should make better use of the available entropy bits in the virtual addresses to preserve ASLR bits from top-level page table pages.
With better ASLR entropy gained from the top-level page table page, browsers should allocate all their objects from the same top level page table cacheline (= 8 slots * 512GB of virtual address space), but internally randomize objects for the lowest three page table pages for other protections. Overall, we think that in a browser setting, the only ASLR entropy bits that we can hope to preserve are from the top-level page table page.
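The "8 slots * 512GB" figure, and the multi-TB allocations mentioned earlier, follow from how much virtual address space one page table entry covers at each level. The sketch below works through that arithmetic, assuming x86_64 4-level paging with 4 KiB pages; the function names are illustrative only.

```python
PAGE_SIZE = 4096           # bytes mapped by one lowest-level entry (assumed 4 KiB pages)
ENTRIES_PER_TABLE = 512    # 9 index bits per level
SLOTS_PER_CACHE_LINE = 8   # 64-byte cache line / 8-byte entry
GIB = 1 << 30


def entry_span_bytes(level):
    """Virtual address space covered by one entry at the given level (1 = lowest)."""
    return PAGE_SIZE * ENTRIES_PER_TABLE ** (level - 1)


def cache_line_span_bytes(level):
    """Virtual address space covered by one cache line of entries at that level."""
    return SLOTS_PER_CACHE_LINE * entry_span_bytes(level)


for level in range(1, 5):
    print(level, entry_span_bytes(level), cache_line_span_bytes(level))

# At the top level, one entry spans 512 GiB and one cache line spans 4 TiB.
print(entry_span_bytes(4) // GIB, cache_line_span_bytes(4) // GIB)
```

With these assumptions, moving from one top-level slot to the next requires crossing 512 GiB of virtual address space, which is why distinguishing slots inside a single top-level cache line needs allocations on the order of terabytes.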
2) Long-term Mitigation
CPU Vendors
If possible, a mitigation should be implemented for the next generation of processors, as outlined earlier.
Browser Vendors
With some effort, it should be possible to prevent AnC or AnC-like attacks from building a sliding primitive by ensuring that JavaScript objects do not map sequentially to the underlying virtual memory when generating JIT code. There are performance implications that should be investigated further.
OS Vendors
OS vendors should adopt the improved ASLR (gaining entropy bits from the top-level page table) in their next release.
Reception
AnC and its impact have received considerable attention from both the security community and media outlets. We are tracking them here. NCSC released an informative advisory on AnC (in Dutch).
Frequently Asked Questions
How can I protect myself as a user against the AnC attack?
Unfortunately, you cannot, as AnC exploits fundamental properties of your processor. You can, however, stop untrusted JavaScript code from being executed in your browser using a plugin such as NoScript.
I am a {processor, browser, OS} vendor. How can I protect my users against AnC?
You should have already been informed of our suggested plan of action. If not, you can find it under “Disclosure Process” on this page.
How is this attack different from Dedup Est Machina?
Quite different! Dedup Est Machina relied on a software side channel to leak ASLR, and it stopped working as soon as Microsoft disabled memory deduplication. Furthermore, it took 30 to 45 minutes to leak code and heap pointers. The AnC attack, on the other hand, relies only on fundamental hardware properties and cannot be mitigated simply by disabling features. Furthermore, our prototype can fully break heap and code pointers in only 90 seconds. We expect that with some effort this can be reduced further to mere seconds.
How is AnC different from the recent attack on Intel’s branch prediction table?
The recent Jump over ASLR paper abuses a flaw in the Intel processors that share sensitive branch predictions across hardware threads. To exploit the flaw, the attacker needs to run on the same core as the victim and more importantly, the attacker needs native code execution. AnC, on the other hand, exploits a fundamental mechanism that is in place for efficient code execution that is present in all modern processors. Hence, it is not straightforward to “fix” AnC. Furthermore, AnC runs from JavaScript and does not need to make assumptions on core placement, significantly increasing its impact over Jump over ASLR.
Acknowledgements
This work was supported by the European Commission through project H2020 ICT-32-2014 “SHARCS” under Grant Agreement No. 644571 and by the Netherlands Organisation for Scientific Research through grant NWO 639.023.309 VICI “Dowsing”. The public artifacts reflect only the authors’ view. The funding agencies are not responsible for any use that may be made of the information they contain.
Systems and Network Security Group at VU Amsterdam