GoFetch

Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers

Overview of GoFetch Attack

GoFetch is a microarchitectural side-channel attack that can extract secret keys from constant-time cryptographic implementations via data memory-dependent prefetchers (DMPs).

We show that DMPs are present in many Apple CPUs and pose a real threat to multiple cryptographic implementations, allowing us to extract keys from OpenSSL Diffie-Hellman, Go RSA, as well as CRYSTALS Kyber and Dilithium.

Update (April 2024)

A HID configuration bit (SYS_APL_HID11_EL1[30]) was found by Hector Martin (marcan) to disable DMPs on M1 and M2 CPUs. Setting this chicken bit requires kernel support that is not available in macOS at this time. See @marcan's post for further details (and thank you Hector!).

Demo Videos

Go's RSA-2048 Key Extraction on Apple M1

People Behind GoFetch

Frequently Asked Questions

The GoFetch attack is based on a CPU feature called the data memory-dependent prefetcher (DMP), which is present in the latest Apple processors. We reverse-engineered DMPs on Apple M-series CPUs and found that the DMP activates on (and attempts to dereference) data loaded from memory that "looks like" a pointer. This explicitly violates a requirement of the constant-time programming paradigm, which forbids mixing data and memory access patterns.

To exploit the DMP, we craft chosen inputs to cryptographic operations, in a way where pointer-like values only appear if we have correctly guessed some bits of the secret key. We verify these guesses by monitoring whether the DMP performs a dereference through cache-timing analysis. Once we make a correct guess, we proceed to guess the next batch of key bits. Using this approach, we show end-to-end key extraction attacks on popular constant-time implementations of classical (OpenSSL Diffie-Hellman Key Exchange, Go RSA decryption) and post-quantum cryptography (CRYSTALS-Kyber and CRYSTALS-Dilithium).
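The guess-and-verify loop described above can be sketched abstractly. In this minimal sketch, the whole pipeline of "craft an input, run the victim, check via cache timing whether the DMP dereferenced something" is collapsed into a single hypothetical boolean `oracle`; the names `make_oracle` and `recover_key` and the batch size are assumptions made for illustration, not part of the GoFetch code. It only shows why confirming a few bits at a time keeps the search small.

```python
from itertools import product


def make_oracle(secret_bits):
    """Hypothetical stand-in for the real measurement pipeline.

    Returns True only when the guessed prefix matches the secret, modeling
    "a pointer-like value appears only if the guess is correct".
    """
    def oracle(guess_prefix):
        return secret_bits[:len(guess_prefix)] == guess_prefix
    return oracle


def recover_key(oracle, key_len, batch=4):
    """Recover key_len bits, confirming `batch` bits per round."""
    recovered = []
    queries = 0
    while len(recovered) < key_len:
        step = min(batch, key_len - len(recovered))
        # Try every candidate for the next batch of bits.
        for candidate in product([0, 1], repeat=step):
            queries += 1
            guess = recovered + list(candidate)
            if oracle(guess):
                recovered = guess  # guess confirmed, move to the next batch
                break
        else:
            raise RuntimeError("no candidate was confirmed by the oracle")
    return recovered, queries
```

With a batch of 4 bits, each round needs at most 16 oracle queries, so the total work grows linearly with the key length rather than exponentially.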

We have mounted end-to-end GoFetch attacks on Apple hardware equipped with M1 processors. We also tested DMP activation patterns on other Apple processors and found that M2 and M3 CPUs exhibit similar exploitable DMP behavior. While we have not tested other M-series variants (e.g., M2 Pro), we hypothesize that, since these parts have the same microarchitecture as their simpler counterparts, they are likewise equipped with exploitable DMPs. Finally, we found that Intel's 13th Gen Raptor Lake microarchitecture also features a DMP. However, its activation criteria are more restrictive, making it robust to our attacks.

The Apple M-series DMP was first discovered by Augury, which suggested that DMPs might mix data and addresses under some conditions. However, we found that the DMP activation criteria outlined by Augury are overly restrictive. As a result, Augury's findings alone are not sufficient to mount attacks on real-world constant-time cryptography.

GoFetch shows that the DMP is significantly more aggressive than previously thought, and thus poses a much greater security risk. Specifically, we find that any value loaded from memory is a candidate for being dereferenced (literally!). This allows us to sidestep many of Augury's limitations and demonstrate end-to-end attacks on real constant-time code.

Modern processors use caches to reduce a program's memory access latency. If data has been accessed before, it gets cached, which makes subsequent accesses to it faster. Since the cache is shared by processes running on the same machine, attackers co-located on the same machine can monitor the cache's state to deduce a victim's access pattern.
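As a rough illustration of how cache state reveals an access pattern, here is a toy simulation. It is not a measurement of real hardware: `ToyCache`, the latency constants, and the 64-byte line size are assumptions chosen for the example, and the probe follows a flush-then-reload pattern that presumes the attacker can access the same memory as the victim.

```python
LINE_SIZE = 64      # assumed cache-line size in bytes
HIT_LATENCY = 1     # illustrative latency of a cached access
MISS_LATENCY = 100  # illustrative latency of an uncached access


class ToyCache:
    """A cache modeled as a set of line numbers, with simulated latencies."""

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        """Return the simulated latency, then mark the line as cached."""
        line = addr // LINE_SIZE
        latency = HIT_LATENCY if line in self.lines else MISS_LATENCY
        self.lines.add(line)
        return latency

    def flush(self, addr):
        """Evict the line holding addr, if present."""
        self.lines.discard(addr // LINE_SIZE)


def victim(cache, table_base, secret_index):
    """The victim touches one table entry chosen by its secret."""
    cache.access(table_base + LINE_SIZE * secret_index)


def probe(cache, table_base, n_entries, run_victim):
    """Flush every entry, let the victim run, then time a reload of each."""
    for i in range(n_entries):
        cache.flush(table_base + LINE_SIZE * i)
    run_victim()
    return [
        i for i in range(n_entries)
        if cache.access(table_base + LINE_SIZE * i) < MISS_LATENCY
    ]
```

The only entry that reloads quickly is the one the victim touched, so the list returned by `probe` exposes the secret index.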

Constant-time programming is a paradigm that aims to harden code against side-channel attacks by ensuring that all operations take the same amount of time, regardless of their operands. In particular, constant-time code cannot contain secret-dependent branches, loops, or other control structures. Moreover, because cache state makes memory access latency observable to an attacker, constant-time code cannot mix data and addresses in any way and prohibits secret-dependent memory accesses or array indices.
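The two rules above (no secret-dependent branches, no secret-dependent indices) can be illustrated with a small sketch. Python itself makes no timing guarantees, so this only shows the coding pattern that real constant-time libraries apply in lower-level languages; the function names are hypothetical and values are assumed to fit in 64 bits.

```python
MASK64 = (1 << 64) - 1


def leaky_select(secret_bit, a, b):
    """Violates the paradigm: the executed path depends on the secret."""
    if secret_bit:
        return a
    return b


def ct_select(secret_bit, a, b):
    """Branchless select: a if secret_bit == 1, b if secret_bit == 0."""
    mask = (-secret_bit) & MASK64  # all ones for 1, all zeros for 0
    return (a & mask) | (b & ~mask & MASK64)


def leaky_lookup(table, secret_index):
    """Violates the paradigm: the accessed address depends on the secret."""
    return table[secret_index]


def ct_lookup(table, secret_index):
    """Touch every entry and keep the wanted one using a mask."""
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ secret_index
        # nonzero is 1 when diff != 0 and 0 when diff == 0, without branching
        nonzero = ((diff | -diff) & MASK64) >> 63
        result |= ct_select(1 ^ nonzero, entry, 0)
    return result
```

Both `ct_` variants execute the same instructions and touch the same addresses for every secret value, which is the property the paradigm asks for.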

We show that even if a victim correctly separates data from addresses by following the constant-time paradigm, the DMP will generate secret-dependent memory access on the victim's behalf, resulting in variable-time code susceptible to our key-extraction attacks.

Prefetchers are a hardware optimization that predicts memory addresses to be accessed in the near future and fetches the corresponding data from main memory into the cache. To make a prediction, classical prefetchers use the address trace of previous demand accesses. This strategy performs poorly on irregular access patterns such as linked-list traversals. To handle such irregular patterns, data memory-dependent prefetchers (DMPs) also consider the content of memory when deciding what to fetch, which lets them capture these indirect access patterns. Unfortunately, this behavior inherently mixes data and memory addresses at the hardware level, making the entire compute stack non-constant-time and enabling our attack.
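The difference between the two kinds of prefetcher can be sketched with a toy model. The stride predictor below is a generic textbook form, and `looks_like_pointer` is a deliberately simplified, assumed heuristic (the value shares its upper bits with the address it was loaded from); it does not reproduce the real activation criteria of any CPU.

```python
def stride_prefetch(address_trace):
    """Classical prefetcher: extrapolate from the last two demand accesses."""
    if len(address_trace) < 2:
        return None
    stride = address_trace[-1] - address_trace[-2]
    return address_trace[-1] + stride


def looks_like_pointer(value, load_addr, region_bits=32):
    """Assumed heuristic: value lies in the same region as the load address."""
    return value != 0 and (value >> region_bits) == (load_addr >> region_bits)


def dmp_prefetch(memory, load_addr):
    """Toy DMP: inspect the loaded content and prefetch it if pointer-like."""
    value = memory.get(load_addr, 0)
    return value if looks_like_pointer(value, load_addr) else None


# A three-node linked list at irregular addresses: each node stores the
# address of the next node, and 0 marks the end of the list.
NODE_A, NODE_B, NODE_C = 0x1_0000_1000, 0x1_0000_8040, 0x1_0000_2300
memory = {NODE_A: NODE_B, NODE_B: NODE_C, NODE_C: 0}

trace = [NODE_A, NODE_B]                # demand accesses so far
stride_guess = stride_prefetch(trace)   # extrapolates past NODE_B, misses NODE_C
dmp_guess = dmp_prefetch(memory, NODE_B)  # reads NODE_B's content, finds NODE_C
```

The same content inspection that finds the next list node will also fire on any other loaded value that happens to pass the pointer check, which is the mixing of data and addresses described above.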

We don't know. Our attack relies on the fact that it is possible to craft inputs to control specific intermediate states, making them contain memory addresses in a key-dependent way. The DMP then serves as an oracle, allowing us to learn if the intermediate state indeed looks like a pointer and thus leaks secret key bits. Unfortunately, to assess if an implementation is vulnerable, cryptanalysis and code inspection are required to understand when and how intermediate values can be made to look like pointers in a way that leaks secrets. This process is manual and slow and does not rule out other attack approaches.

Yes, but only on some processors. We observe that setting the DIT bit on M3 CPUs effectively disables the DMP. This is not the case for the M1 and M2. Intel's counterpart, the DOIT bit, can likewise be used to disable the DMP on Raptor Lake processors.

Update (April 2024): A HID configuration bit (SYS_APL_HID11_EL1[30]) was found by Hector Martin (marcan) to disable DMPs on M1 and M2 CPUs. Setting this chicken bit requires kernel support that is not available in macOS at this time. See @marcan's post for further details (and thank you Hector!).

For users, we recommend using the latest versions of software and installing updates regularly. Developers of cryptographic libraries can set the DIT bit (on Apple CPUs) or the DOIT bit (on Intel CPUs), which disables the DMP on some processors. Additionally, input blinding can help some cryptographic schemes avoid attacker-controlled intermediate values, and thus key-dependent DMP activation. Finally, preventing attackers from measuring DMP activation in the first place, for example by avoiding hardware sharing, can further enhance the security of cryptographic protocols.
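To make the idea of input blinding concrete, here is a minimal sketch of the standard multiplicative blinding trick for RSA decryption. The key is a tiny textbook example (n = 61 × 53) chosen only so the arithmetic is easy to follow, and the function names are hypothetical. The sketch shows the technique in general; it makes no claim about how any particular library implements it or whether it suffices for a given scheme.

```python
import math
import secrets

# Toy textbook RSA key, far too small for real use (assumption for the demo).
N, E, D = 3233, 17, 2753


def decrypt_plain(c):
    """Unblinded decryption: the exponentiation input is attacker-chosen."""
    return pow(c, D, N)


def decrypt_blinded(c):
    """Decrypt c after multiplying it by a fresh random factor r^E."""
    while True:
        r = secrets.randbelow(N - 2) + 2  # random r in [2, N-1]
        if math.gcd(r, N) == 1:           # r must be invertible mod N
            break
    blinded = (c * pow(r, E, N)) % N      # the attacker cannot predict this
    m_blinded = pow(blinded, D, N)        # equals (m * r) mod N
    return (m_blinded * pow(r, -1, N)) % N  # strip the blinding factor
```

Because `r` is fresh and secret on every call, the values flowing through the private-key exponentiation are no longer under the attacker's control, while the result is unchanged. The modular inverse via `pow(r, -1, N)` requires Python 3.8 or later.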

Yes, check our GitHub repository.

We disclosed our findings to Apple on December 5, 2023 (107 days before public release).

GoFetch in the News

Acknowledgments

This work was partially supported by the Air Force Office of Scientific Research (AFOSR) under award number FA9550-20-1-0425; the Defense Advanced Research Projects Agency (DARPA) under contract numbers W912CG-23-C-0022 and HR00112390029; the National Science Foundation (NSF) under grant numbers 1954712, 1954521, 2154183, 2153388, and 1942888; the Alfred P. Sloan Research Fellowship; and gifts from Intel, Qualcomm, and Cisco.