To prevent arbitrary payloads from running inside a pVM, the Android Virtualization Framework (AVF) uses a layered security approach in which each layer adds its own enforcement. The AVF security layers are:
- Android – Android ensures that only apps with pVM permissions can create or inspect pVMs.
- Bootloader – The bootloader ensures that only pVM images signed by Google or device vendors are allowed to boot, and it respects the Android Verified Boot procedure. This architecture implies that apps running pVMs can't bundle their own kernels.
- pVM – The pVM provides defense-in-depth, such as SELinux, for payloads run in the pVM. Defense-in-depth disallows mapping data as executable (neverallow execmem) and ensures that W^X holds for all file types.
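As a rough illustration of the W^X invariant (not of the SELinux mechanism that enforces it), the following Python sketch rejects any mapping request that is both writable and executable. Prot, WxViolation, check_mapping, and make_executable are hypothetical names invented for this example.

```python
from enum import Flag


class Prot(Flag):
    """Hypothetical page-permission bits used only in this sketch."""
    READ = 1
    WRITE = 2
    EXEC = 4


class WxViolation(Exception):
    """Raised when a mapping would be writable and executable at the same time."""


def check_mapping(prot: Prot) -> Prot:
    """Allow a mapping only if it isn't both writable and executable (W^X)."""
    if Prot.WRITE in prot and Prot.EXEC in prot:
        raise WxViolation(f"W^X violation: {prot}")
    return prot


def make_executable(prot: Prot) -> Prot:
    """Turn a writable page into an executable one by dropping WRITE first."""
    return check_mapping((prot & ~Prot.WRITE) | Prot.EXEC)
```

Under this model, code that wants to run freshly written bytes must call make_executable, which gives up write access. A direct request for WRITE | EXEC always raises.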
Confidentiality, integrity, and availability, known together as the CIA triad, form a model designed to guide information security policies:
- Confidentiality is a set of rules that limits access to information.
- Integrity is the assurance that the information is trustworthy and accurate.
- Availability is a guarantee of reliable access to the information by authorized entities.
Note that pKVM was designed to maintain confidentiality and integrity, but not availability, of guests. These principles influence design decisions spanning all aspects of the architecture, from the hypervisor to user space components.
Confidentiality and integrity
Confidentiality stems from the memory isolation properties enforced by the pKVM hypervisor. pKVM tracks the ownership of individual physical memory pages and any requests from owners for pages to be shared. pKVM ensures that only entitled pVMs (host and guests) have a given page mapped in their stage 2 page tables, which the hypervisor controls. This architecture ensures that the contents of memory owned by a pVM remain private unless the owner explicitly shares them with another pVM.
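A toy model can make these ownership rules concrete. The following Python sketch tracks one owner and a set of sharing peers per page frame, and answers whether a given VM may have that page in its stage 2 page tables. It is a hypothetical simplification, not pKVM's actual data structures; Page, PageTracker, share, unshare, and can_map are illustrative names.

```python
from dataclasses import dataclass, field

HOST = "host"


@dataclass
class Page:
    owner: str
    shared_with: set = field(default_factory=set)


class PageTracker:
    """Toy per-page ownership tracker (hypothetical; not pKVM's real structures)."""

    def __init__(self) -> None:
        self.pages = {}  # page frame number -> Page

    def assign(self, pfn: int, owner: str) -> None:
        """Record `owner` as the sole owner of page frame `pfn`."""
        self.pages[pfn] = Page(owner)

    def _owned_by(self, pfn: int, requester: str) -> Page:
        page = self.pages[pfn]
        if page.owner != requester:
            raise PermissionError(f"{requester} does not own page {pfn:#x}")
        return page

    def share(self, pfn: int, requester: str, peer: str) -> None:
        """Only the current owner may grant another VM access to its page."""
        self._owned_by(pfn, requester).shared_with.add(peer)

    def unshare(self, pfn: int, requester: str, peer: str) -> None:
        """Only the current owner may revoke a previous share."""
        self._owned_by(pfn, requester).shared_with.discard(peer)

    def can_map(self, pfn: int, vm: str) -> bool:
        """Would the hypervisor allow `vm` a stage 2 mapping of this page?"""
        page = self.pages[pfn]
        return vm == page.owner or vm in page.shared_with
```

For example, after assign(0x8000, "vm1"), the host can't map the page until "vm1" itself calls share(0x8000, "vm1", HOST); a share request from anyone else raises PermissionError.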
Restrictions for maintaining confidentiality also extend to any entities in the system that access memory on behalf of pVMs, namely DMA-capable devices and services running in more privileged layers. SoC vendors must satisfy a new set of requirements before they can support pKVM; otherwise, confidentiality can't be provided.
Integrity applies to both data in memory and computation:
- pVMs can't modify each other’s memory without consent.
- pVMs can't influence each other’s CPU state.
These requirements are enforced by the hypervisor. However, data integrity problems also arise with virtual data storage, where other solutions, such as dm-verity or AuthFS, must be applied.
These principles are no different from the process isolation offered by Linux, where access to memory pages is controlled with stage 1 page tables and the kernel context-switches between processes. However, the EL2 portion of pKVM, which enforces these properties, has an attack surface roughly three orders of magnitude smaller than the entire Linux kernel (roughly 10 thousand versus 20 million lines of code) and therefore offers stronger assurance for use cases that are too sensitive to rely on process isolation.
Given its small size, pKVM lends itself to formal verification. We're actively supporting academic research that aims to formally prove these properties on the actual pKVM binary.
The remainder of this document covers the confidentiality and integrity guarantees that each component around pKVM provides.
pKVM is a KVM-based hypervisor that isolates pVMs and Android into mutually distrusted execution environments. These properties hold in the event of a compromise within any pVM, including the host. Alternative hypervisors that comply with AVF need to provide similar properties.
- A pVM can't access a page belonging to another entity, such as a pVM or hypervisor, unless explicitly shared by the page owner. This rule includes the host pVM and applies to both CPU and DMA accesses.
- A page used by a pVM is wiped before it's returned to the host, such as when the pVM is destroyed.
- The memory of all pVMs and the pVM firmware from one device boot is wiped before the OS bootloader runs in the subsequent device boot.
- When a hardware debugger, such as SJTAG, is attached, a pVM can't access its previously minted keys.
- The pVM firmware doesn't boot if it can't verify the initial image.
- The pVM firmware doesn't boot if the integrity of the
- Boot Certificate Chain (BCC) and Compound Device Identifiers (CDIs) provided to a pVM instance can be derived only by that particular instance.
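The wipe-before-return rule from the list above can be sketched with a similarly small model. In this hypothetical Python example, Memory.donate hands a host page to a guest, and Memory.teardown returns every page of a destroyed guest to the host only after zeroing it, so the host never observes leftover guest data. The class, its methods, and the 4 KiB page size are assumptions of the sketch, not pKVM code.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
HOST = "host"


class Memory:
    """Toy physical memory with per-page owners (hypothetical model)."""

    def __init__(self, num_pages: int) -> None:
        self.data = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.owner = [HOST] * num_pages

    def donate(self, pfn: int, vm: str) -> None:
        """Transfer a host-owned page to a guest."""
        if self.owner[pfn] != HOST:
            raise PermissionError(f"page {pfn} is not owned by the host")
        self.owner[pfn] = vm

    def write(self, pfn: int, vm: str, payload: bytes) -> None:
        """Only the current owner may write, and only within one page."""
        if self.owner[pfn] != vm:
            raise PermissionError(f"{vm} does not own page {pfn}")
        if len(payload) > PAGE_SIZE:
            raise ValueError("payload larger than a page")
        self.data[pfn][: len(payload)] = payload

    def teardown(self, vm: str) -> None:
        """Destroy `vm`: wipe each of its pages, then return it to the host."""
        for pfn, owner in enumerate(self.owner):
            if owner == vm:
                self.data[pfn][:] = bytes(PAGE_SIZE)  # zero before handing back
                self.owner[pfn] = HOST
```

Zeroing before the ownership change is the important ordering: if ownership flipped first, there would be a window in which the host owns a page that still holds guest data.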
Microdroid is an example of an OS running within a pVM. Microdroid consists of a U-Boot-based bootloader, GKI, a subset of Android userspace, and a payload launcher. These properties hold in the event of a compromise within any pVM, including the host. Alternative OSs running in a pVM should provide similar properties.
- Microdroid won't boot if vbmeta_system.img can't be verified.
- Microdroid won't boot if the APK verification fails.
- The same Microdroid instance won't boot even if the APK was updated.
- Microdroid won't boot if any of the APEXes fails verification.
- Microdroid won't boot (or boots with a clean initial state) if instance.img is modified outside of the guest pVM.
- Microdroid provides attestation to the boot chain.
- Any (unsigned) modification to the disk images shared to the guest pVM causes an I/O error on the pVM side.
- BCC and CDIs provided to a pVM instance can be derived only by that particular instance.
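The I/O-error behavior for tampered disk images resembles how dm-verity works: every read is checked against a hash tree whose root hash is trusted. The following Python sketch is a heavily simplified, hypothetical model of that idea (real dm-verity uses salted hashes and multi-level hash blocks, and verifies each block lazily along its path). build_tree, VerifiedDisk, and read_block are illustrative names; the sketch raises OSError with errno.EIO when data doesn't match.

```python
import errno
import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB data blocks


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def build_tree(image: bytes):
    """Signing side: hash each block and compute the root that gets signed."""
    leaves = [_sha256(image[i:i + BLOCK_SIZE]) for i in range(0, len(image), BLOCK_SIZE)]
    return merkle_root(leaves), leaves


class VerifiedDisk:
    """Guest side: every read is checked against a trusted root and block count."""

    def __init__(self, image: bytes, leaves: list, trusted_root: bytes, num_blocks: int) -> None:
        # Leaf hashes come from untrusted storage, so check them before use.
        # Pinning the block count fixes the tree shape.
        if len(leaves) != num_blocks or merkle_root(leaves) != trusted_root:
            raise OSError(errno.EIO, "hash tree does not match trusted root")
        self.image = image
        self.leaves = leaves

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < len(self.leaves):
            raise IndexError(index)
        block = self.image[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
        if _sha256(block) != self.leaves[index]:
            raise OSError(errno.EIO, f"block {index} failed verification")
        return block
```

Only the root hash and block count need to be signed or otherwise trusted. Untouched blocks of a tampered image still read correctly, while the modified block surfaces as an I/O error instead of silently returning attacker-controlled data.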
These are properties maintained by Android as the host but don't hold true in the event of a host compromise:
- A guest pVM can’t directly interact with (for example, make a vsock connection to) other guest pVMs.
- Only the VirtualizationService in the host pVM can create a communication channel to a pVM. (Note: It can pass the established channel to others.)
- Only the apps that are signed with the platform key can request permission to create, own, or interact with pVMs.
- The context identifier (CID) used to set up vsock connections between the host and a pVM isn't reused while the host pVM is running. For example, it isn't possible to replace a running pVM with another one.
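The CID non-reuse property can be illustrated with a monotonic allocator. The following Python sketch is hypothetical and doesn't describe how VirtualizationService actually assigns CIDs: CidAllocator, allocate, and release are invented names, and the allocator object stands in for one run of the host. The reserved values follow the Linux vsock convention (0 for the hypervisor, 1 for local loopback, 2 for the host, 0xFFFFFFFF for VMADDR_CID_ANY).

```python
import threading

FIRST_GUEST_CID = 3          # 0, 1, and 2 are reserved vsock addresses
MAX_GUEST_CID = 0xFFFFFFFE   # 0xFFFFFFFF is VMADDR_CID_ANY


class CidAllocator:
    """Hypothetical monotonic CID allocator: a CID is never handed out twice."""

    def __init__(self, first: int = FIRST_GUEST_CID, last: int = MAX_GUEST_CID) -> None:
        self._next = first
        self._last = last
        self._lock = threading.Lock()

    def allocate(self) -> int:
        with self._lock:
            if self._next > self._last:
                raise RuntimeError("CID space exhausted")
            cid = self._next
            self._next += 1
            return cid

    def release(self, cid: int) -> None:
        # Deliberately doesn't recycle: a new pVM must never inherit the CID of
        # an old one, or a peer holding that CID could reach the wrong guest.
        pass
```

Trading a finite CID space for non-reuse is the design choice here. With roughly four billion values, exhaustion within one host boot is not a practical concern for this sketch.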
In the context of pVMs, availability refers to the host allocating sufficient resources to guests so guests can perform the tasks they were designed to do.
The host's responsibilities include scheduling the pVM’s virtual CPUs. KVM, unlike traditional Type-1 hypervisors, such as Xen, makes the explicit design decision to delegate workload scheduling to the host kernel. Given the size and complexity of today’s schedulers, this design decision significantly reduces the size of the trusted computing base (TCB) and enables the host to make more informed scheduling decisions to optimize performance. However, a malicious host can choose to never schedule a guest.
Similarly, pKVM delegates physical interrupt handling to the host kernel to reduce the complexity of the hypervisor and leave the host in charge of scheduling. Care is taken to ensure that a malicious host forwarding guest interrupts can cause only a denial of service (too few, too many, or misrouted interrupts).
Finally, the host's virtual machine monitor (VMM) process is responsible for allocating memory and providing virtual devices, such as a network card. A malicious VMM can withhold resources from the guest.
Although pKVM doesn't provide availability to guests, the design protects the host's availability from malicious guests because the host can always preempt or terminate a guest and reclaim its resources.
Data is tied to instances of a pVM, and secure boot ensures that access to an instance’s data can be controlled. The first boot of an instance provisions it by randomly generating a secret salt for the pVM and extracting details, such as verification public keys and hashes, from the loaded images. This information is used to verify subsequent boots of the pVM instance and ensure the instance’s secrets are released only to images that pass verification. This process occurs for every loading stage within the pVM: pVM firmware, pVM ABL, Microdroid, and so on.
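The provision-then-verify flow can be sketched in Python. This is a hypothetical simplification: PvmInstance, InstanceRecord, and boot are invented names, the per-device secret is an assumption, and the record holds only image hashes, whereas the text notes that real instances also record verification public keys. The instance secret is released only when the measured images match what the first boot recorded.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceRecord:
    """What the first boot records (hypothetical layout)."""
    salt: bytes
    image_hashes: dict  # image name -> SHA-256 digest


class PvmInstance:
    """Toy model of per-instance provisioning and verification."""

    def __init__(self, device_secret: bytes) -> None:
        self._device_secret = device_secret  # assumption: a per-device root secret
        self.record: Optional[InstanceRecord] = None  # stands in for persisted state

    def boot(self, images: dict) -> bytes:
        """Measure the images; release the instance secret only if they verify."""
        measured = {name: hashlib.sha256(blob).digest() for name, blob in images.items()}
        if self.record is None:
            # First boot: provision a random salt and remember what was loaded.
            self.record = InstanceRecord(secrets.token_bytes(32), measured)
        elif not self._matches(measured):
            raise PermissionError("image verification failed; secret withheld")
        # The secret depends on the instance's random salt, so each instance differs.
        return hmac.new(self._device_secret, self.record.salt, hashlib.sha256).digest()

    def _matches(self, measured: dict) -> bool:
        expected = self.record.image_hashes
        return expected.keys() == measured.keys() and all(
            hmac.compare_digest(expected[name], measured[name]) for name in expected
        )
```

Because the salt is random per instance, two instances booting identical images still receive different secrets, which ties data to the instance rather than to the images alone.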
DICE provides each loading stage with an attestation key pair, the public part of which is certified in the BCC entry for that stage. This key pair can change between boots, so a sealing secret is also derived that is stable for the VM instance across reboots and is therefore suitable for protecting persistent state. The sealing secret is highly valuable to the VM, so it shouldn't be used directly. Instead, sealing keys should be derived from the sealing secret, and the sealing secret should be destroyed as early as possible.
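The derive-then-destroy advice can be sketched in Python. The document doesn't name a key derivation function; this example assumes HKDF-SHA256 (RFC 5869), implemented with the standard hmac module. SealingContext, derive_key, and the purpose labels are hypothetical names, and because Python may keep other copies of the secret's bytes, destroy is only a best-effort wipe.

```python
import hashlib
import hmac
from typing import Optional


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output too long")
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class SealingContext:
    """Holds the sealing secret only until per-purpose keys are derived."""

    def __init__(self, sealing_secret: bytes) -> None:
        self._secret: Optional[bytearray] = bytearray(sealing_secret)

    def derive_key(self, purpose: bytes) -> bytes:
        if self._secret is None:
            raise RuntimeError("sealing secret already destroyed")
        return hkdf_sha256(bytes(self._secret), salt=b"", info=purpose)

    def destroy(self) -> None:
        # Best effort: overwrite our buffer, then drop the reference.
        for i in range(len(self._secret)):
            self._secret[i] = 0
        self._secret = None
```

A typical flow derives every key the payload needs (for example, derive_key(b"app-data")) early in boot and then calls destroy. Using a separate key per purpose means that a leaked key exposes neither the sealing secret nor its sibling keys.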
Each stage hands a deterministically encoded CBOR object to the next stage. This object contains secrets and the BCC, which contains accumulated status information, such as whether the last stage loaded securely.
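Deterministic encoding means the same logical object always serializes to the same bytes, which matters when those bytes are hashed or signed. The following Python sketch implements a small subset of the CBOR core deterministic encoding rules from RFC 8949: shortest-form lengths, and map keys sorted by the bytewise order of their encoded form. It covers integers, byte strings, text strings, arrays, maps, and booleans. It isn't a complete CBOR library, and the integer keys in the final line are hypothetical placeholders, not the actual handover layout.

```python
def _head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def encode(obj) -> bytes:
    if isinstance(obj, bool):  # check before int: bool is a subclass of int
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(encode(item) for item in obj)
    if isinstance(obj, dict):
        # Deterministic maps: sort entries by the bytewise order of encoded keys.
        entries = sorted((encode(k), encode(v)) for k, v in obj.items())
        return _head(5, len(entries)) + b"".join(k + v for k, v in entries)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Hypothetical handover-like map; keys and contents are placeholders.
handover = encode({3: [b"bcc-entry"], 1: b"\x00" * 32, 2: b"\x11" * 32})
```

Because map entries are sorted after encoding, building the same map in a different insertion order yields identical bytes, so each stage can hash or sign the object reproducibly.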
When a device is unlocked with fastboot oem unlock, user data is wiped. This process protects user data from unauthorized access. Data that is private to a pVM is also invalidated when the device is unlocked.
Once the device is unlocked, its owner is free to reflash partitions that are usually protected by verified boot, including partitions containing the pKVM implementation. Therefore, pKVM on an unlocked device can't be trusted to uphold the security model.
Remote parties can observe this potentially insecure state by inspecting the device’s verified boot state in a key attestation certificate.
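A relying party's check can be sketched once the attestation extension (OID 1.3.6.1.4.1.11129.2.1.17) has been parsed. Parsing the certificate is out of scope here. The following Python sketch assumes the RootOfTrust fields are already decoded, and the enumeration values follow the VerifiedBootState definition in the Android key attestation schema. The policy function trust_pvm_guarantees is hypothetical; it deliberately also rejects SELF_SIGNED (a locked device running a custom boot key), which a particular relying party might decide to accept.

```python
from dataclasses import dataclass
from enum import IntEnum


class VerifiedBootState(IntEnum):
    VERIFIED = 0
    SELF_SIGNED = 1
    UNVERIFIED = 2
    FAILED = 3


@dataclass(frozen=True)
class RootOfTrust:
    """Already-decoded RootOfTrust fields from a key attestation certificate."""
    verified_boot_key: bytes
    device_locked: bool
    verified_boot_state: VerifiedBootState
    verified_boot_hash: bytes


def trust_pvm_guarantees(rot: RootOfTrust) -> bool:
    """Hypothetical policy: rely on pKVM only on a locked, fully verified device."""
    return rot.device_locked and rot.verified_boot_state == VerifiedBootState.VERIFIED
```

An unlocked device typically reports UNVERIFIED with device_locked set to false, so this policy treats any guarantees claimed by its pVMs as untrusted.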