Synchronization framework

The synchronization framework explicitly describes dependencies between different asynchronous operations in the Android graphics system. The framework provides an API that enables components to indicate when buffers are released. The framework also allows synchronization primitives to be passed between drivers from the kernel to userspace and between userspace processes themselves.

For example, an application may queue up work to be carried out on the GPU. The GPU then starts drawing the image. Although the image hasn’t been drawn into memory yet, the buffer pointer is passed to the window compositor along with a fence that indicates when the GPU work will finish. The window compositor starts processing ahead of time and passes the work to the display controller. In a similar manner, the CPU work is done ahead of time. Once the GPU finishes, the display controller immediately displays the image.

The synchronization framework also lets implementers leverage synchronization resources in their own hardware components. Finally, the framework provides visibility into the graphics pipeline to help with debugging.

Explicit synchronization

Explicit synchronization enables producers and consumers of graphics buffers to signal when they're finished using a buffer. Explicit synchronization is implemented in kernel-space.

The benefits of explicit synchronization include:

Less behavior variation between devices
Better debugging support
Improved testing metrics

The sync framework has three object types:

sync_timeline
sync_pt
sync_fence

sync_timeline

sync_timeline is a monotonically increasing timeline that vendors should implement for each driver instance, such as a GL context, display controller, or 2D blitter. sync_timeline counts jobs submitted to the kernel for a particular piece of hardware. sync_timeline provides guarantees about the order of operations and enables hardware-specific implementations.

Follow these guidelines when implementing sync_timeline:

Provide useful names for all drivers, timelines, and fences to simplify debugging.
Implement the timeline_value_str and pt_value_str operators in timelines to make debugging output more readable.
Implement the fill_driver_data operator to give userspace libraries, such as the GL library, access to private timeline data, if desired. driver_data lets vendors pass information about the immutable sync_fence and sync_pt objects to build command lines based on them.
Don't allow userspace to explicitly create or signal a fence. Letting userspace create or signal fences opens the door to denial-of-service attacks that halt pipeline functionality.
Don't access sync_timeline, sync_pt, or sync_fence elements explicitly. The API provides all required functions.
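As an illustration of the value-string guideline above, here is a minimal user-space sketch. The struct and function names (named_timeline, named_pt, named_timeline_value_str, named_pt_value_str) are hypothetical stand-ins, not the kernel's types, and the sketch assumes each callback receives an output buffer and its size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for a driver's private timeline and point data. */
struct named_timeline {
    const char *name;    /* descriptive name, for example "gpu0-ctx3" */
    unsigned int value;  /* number of the last job the hardware completed */
};

struct named_pt {
    const struct named_timeline *parent;
    unsigned int value;  /* job number this point waits for */
};

/* Writes the timeline's current value into str as readable text. */
void named_timeline_value_str(const struct named_timeline *tl, char *str, int size)
{
    assert(tl != NULL && str != NULL && size > 0);
    snprintf(str, (size_t)size, "%u", tl->value);
}

/* Writes the point's target value into str as readable text. */
void named_pt_value_str(const struct named_pt *pt, char *str, int size)
{
    assert(pt != NULL && str != NULL && size > 0);
    snprintf(str, (size_t)size, "%u", pt->value);
}
```

Printing the timeline value next to the point value in debug output makes it easy to see how far the hardware is from signaling a given point.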

sync_pt

sync_pt is a single value or point on a sync_timeline. A point has three states: active, signaled, and error. Points start in the active state and transition to the signaled or error states. For example, when an image consumer no longer needs a buffer, a sync_pt is signaled so an image producer knows that it's okay to write into the buffer again.
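The relationship between a timeline and its points can be sketched with a small user-space model. This is not the kernel implementation: the names (model_timeline, model_pt, model_pt_get_state) are hypothetical, and the sketch assumes a point counts as signaled once its parent timeline's counter reaches the point's value. Counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_pt_state { MODEL_PT_ACTIVE, MODEL_PT_SIGNALED, MODEL_PT_ERROR };

/* Hypothetical timeline: a counter of completed jobs that only increases. */
struct model_timeline {
    unsigned int value;
};

/* Hypothetical point: one value on a parent timeline. */
struct model_pt {
    const struct model_timeline *parent;
    unsigned int value;  /* job number that must complete */
    bool failed;         /* set if the job ended in an error */
};

/* Called when the hardware finishes jobs; the counter never moves backward. */
void model_timeline_advance(struct model_timeline *tl, unsigned int jobs)
{
    assert(tl != NULL);
    tl->value += jobs;
}

/* Derives the point's state from its parent timeline. */
enum model_pt_state model_pt_get_state(const struct model_pt *pt)
{
    assert(pt != NULL && pt->parent != NULL);
    if (pt->failed)
        return MODEL_PT_ERROR;
    return pt->parent->value >= pt->value ? MODEL_PT_SIGNALED : MODEL_PT_ACTIVE;
}
```

Because the counter only increases, a point never returns from the signaled state to the active state in this model.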

sync_fence

sync_fence is a collection of sync_pt values that often have different sync_timeline parents (such as for the display controller and GPU). sync_fence, sync_pt, and sync_timeline are the main primitives that drivers and userspace use to communicate their dependencies. When a fence becomes signaled, all commands issued before the fence are guaranteed to be complete because the kernel driver or hardware block executes commands in order.

The sync framework allows multiple consumers or producers to signal when they're finished using a buffer, communicating the dependency information with one function parameter. Fences are backed by a file descriptor and are passed from kernel space to userspace. For example, a fence can contain two sync_pt values that signify when two separate image consumers are done reading a buffer. When the fence is signaled, the image producers know that both consumers are done consuming.

Fences, like sync_pt values, start active and change state based on the state of their points. If all sync_pt values become signaled, the sync_fence becomes signaled. If one sync_pt falls into an error state, the entire sync_fence has an error state.

Membership in a sync_fence is immutable after the fence is created. To get more than one point in a fence, a merge is conducted where points from two distinct fences are added to a third fence. If one of those points was signaled in the originating fence and the other wasn't, the third fence also won't be in a signaled state.
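The state and merge rules described above can be sketched as a simplified user-space model. All names here (model_state, model_fence, model_fence_status, model_fence_merge) are hypothetical, the fixed capacity is an arbitrary choice for the sketch, and the model doesn't deduplicate points that share a timeline.

```c
#include <assert.h>
#include <stddef.h>

enum model_state { MODEL_ACTIVE, MODEL_SIGNALED, MODEL_ERROR };

#define MODEL_FENCE_MAX_PTS 8  /* arbitrary capacity for this sketch */

/* Hypothetical fence: a fixed set of borrowed pointers to point states. */
struct model_fence {
    const enum model_state *pts[MODEL_FENCE_MAX_PTS];
    size_t count;
};

/* Error if any point is in error, signaled if all are signaled, else active. */
enum model_state model_fence_status(const struct model_fence *f)
{
    enum model_state result = MODEL_SIGNALED;

    assert(f != NULL);
    for (size_t i = 0; i < f->count; i++) {
        if (*f->pts[i] == MODEL_ERROR)
            return MODEL_ERROR;
        if (*f->pts[i] == MODEL_ACTIVE)
            result = MODEL_ACTIVE;
    }
    return result;
}

/* Builds a third fence holding the points of a and b; a and b stay unchanged. */
struct model_fence model_fence_merge(const struct model_fence *a,
                                     const struct model_fence *b)
{
    struct model_fence out = { .count = 0 };

    assert(a != NULL && b != NULL);
    assert(a->count + b->count <= MODEL_FENCE_MAX_PTS);
    for (size_t i = 0; i < a->count; i++)
        out.pts[out.count++] = a->pts[i];
    for (size_t i = 0; i < b->count; i++)
        out.pts[out.count++] = b->pts[i];
    return out;
}
```

Merging a signaled fence with an active one yields an active fence, which matches the rule that every point must signal before the merged fence does.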

To implement explicit synchronization, provide the following:

A kernel-space subsystem that implements the sync framework for a particular hardware driver. Drivers that need to be fence-aware are generally anything that accesses or communicates with the Hardware Composer. Key files include:
- Core implementation:
  - kernel/common/include/linux/sync.h
  - kernel/common/drivers/base/sync.c
- Documentation at kernel/common/Documentation/sync.txt
- Library to communicate with the kernel space in platform/system/core/libsync
The vendor must provide the appropriate synchronization fences as parameters to the validateDisplay() and presentDisplay() functions in the HAL.
Two fence-related EGL extensions (EGL_ANDROID_native_fence_sync and EGL_ANDROID_wait_sync) and fence support in the graphics driver.

Case study: Implement a display driver

To see how the synchronization API is used, consider a display driver that has a display buffer function. Before the synchronization framework existed, this function would receive dma-buf objects, put those buffers on the display, and block while the buffer was visible. For example:

/*
 * assumes buffer is ready to be displayed.  returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buffer);

With the synchronization framework, the display_buffer function is more complex. When a buffer is put on display, it's associated with a fence that indicates when the buffer will be ready. You can queue up the work and initiate it after the fence clears.

Queuing and initiating work after the fence clears doesn't block anything. You immediately return your own fence, which signals when the buffer is off the display. As you queue up buffers, the kernel lists dependencies with the synchronization framework:

/*
 * displays buffer when fence is signaled.  returns immediately with a fence
 * that signals when buffer is no longer displayed.
 */
struct sync_fence *display_buffer(struct dma_buf *buffer,
                                  struct sync_fence *fence);

Sync integration

This section explains how to integrate the kernel-space sync framework with userspace parts of the Android framework and the drivers that must communicate with one another. Kernel-space objects are represented as file descriptors in userspace.
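Because fences reach userspace as file descriptors, waiting on one can be sketched with poll(). This is a minimal sketch, assuming a fence descriptor becomes readable once the fence signals. On a device, libsync offers sync_wait() for this purpose. The helper name demo_fence_wait is hypothetical.

```c
#include <errno.h>
#include <poll.h>

/*
 * Waits until fence_fd signals or timeout_ms elapses (-1 waits forever).
 * Returns 0 when signaled, or -1 with errno set on timeout or error.
 */
int demo_fence_wait(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    if (fence_fd < 0) {
        errno = EINVAL;
        return -1;
    }

    /* Retry if a signal handler interrupts the wait. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        errno = ETIME;  /* timed out before the fence signaled */
        return -1;
    }
    if (ret < 0)
        return -1;      /* poll() failed; errno is already set */
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL; /* descriptor is in an error state or invalid */
        return -1;
    }
    return 0;
}
```

Off-device, the read end of a pipe can stand in for a fence to exercise the helper: it becomes readable as soon as a byte is written to the other end.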

Integration conventions

Follow the Android HAL interface conventions:

If the API provides a file descriptor that refers to a sync_pt, the vendor's driver or the HAL using the API must close the file descriptor.
If the vendor driver or the HAL passes a file descriptor that contains a sync_pt to an API function, the vendor driver or the HAL must not close the file descriptor.
To continue using the fence file descriptor, the vendor driver or the HAL must duplicate the descriptor.
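The ownership conventions above can be sketched in a few lines. Both function names are hypothetical: demo_api_consume_fence stands in for any API call that takes ownership of a fence file descriptor, and demo_pass_and_keep shows a caller duplicating the descriptor first so that it can keep using the fence.

```c
#include <unistd.h>

/* Hypothetical API call: takes ownership of fence_fd and closes it when done. */
void demo_api_consume_fence(int fence_fd)
{
    if (fence_fd >= 0)
        close(fence_fd);
}

/*
 * Passes fence_fd to the API while keeping a private duplicate.
 * Returns the duplicate, which the caller owns and must close.
 * Returns -1 if duplication fails; the caller then still owns fence_fd.
 */
int demo_pass_and_keep(int fence_fd)
{
    int kept = dup(fence_fd);

    if (kept < 0)
        return -1;

    /* After this call, the caller must not touch or close fence_fd. */
    demo_api_consume_fence(fence_fd);
    return kept;
}
```

Duplicating before the handoff matters: once the API owns the original descriptor, it may close it at any time.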

A fence object is renamed every time it passes through BufferQueue. Kernel fence support allows fences to have strings for names, so the sync framework uses the window name and buffer index that's being queued to name the fence, such as SurfaceView:0. This is helpful in debugging to identify the source of a deadlock as the names appear in the output of /d/sync and bug reports.

ANativeWindow integration

ANativeWindow is fence aware. dequeueBuffer, queueBuffer, and cancelBuffer have fence parameters.

OpenGL ES integration

OpenGL ES sync integration relies on two EGL extensions:

EGL_ANDROID_native_fence_sync provides a way to wrap or create native Android fence file descriptors in EGLSyncKHR objects.
EGL_ANDROID_wait_sync allows GPU-side stalls rather than CPU-side, making the GPU wait for EGLSyncKHR. The EGL_ANDROID_wait_sync extension is the same as the EGL_KHR_wait_sync extension.

To use these extensions independently, implement the EGL_ANDROID_native_fence_sync extension along with the associated kernel support. Next, enable the EGL_ANDROID_wait_sync extension in your driver. The EGL_ANDROID_native_fence_sync extension consists of a distinct native fence EGLSyncKHR object type. As a result, extensions that apply to existing EGLSyncKHR object types don’t necessarily apply to EGL_ANDROID_native_fence_sync objects, avoiding unwanted interactions.

The EGL_ANDROID_native_fence_sync extension employs a corresponding native fence file descriptor attribute that can be set only at creation time and can't be directly queried onward from an existing sync object. This attribute can be set to one of two modes:

A valid fence file descriptor wraps an existing native Android fence file descriptor in an EGLSyncKHR object.
-1 creates a native Android fence file descriptor from an EGLSyncKHR object.

Use the eglDupNativeFenceFDANDROID() function call to extract the native Android fence file descriptor from the EGLSyncKHR object. This has the same result as querying the set attribute, but adheres to the convention that the recipient closes the fence (hence the duplicate operation). Finally, destroying the EGLSyncKHR object closes the internal fence attribute.

Hardware Composer integration

The Hardware Composer handles three types of sync fences:

Acquire fences are passed along with input buffers to the setLayerBuffer and setClientTarget calls. These represent a pending write into the buffer and must signal before SurfaceFlinger or the HWC attempts to read from the associated buffer to perform composition.
Release fences are retrieved after the call to presentDisplay using the getReleaseFences call. These represent a pending read from the previous buffer on the same layer. A release fence signals when the HWC is no longer using the previous buffer because the current buffer has replaced the previous buffer on the display. Release fences are passed back to the app along with the previous buffers that will be replaced during the current composition. The app must wait until a release fence signals before writing new contents into the buffer that was returned to it.
Present fences are returned, one per frame, as part of the call to presentDisplay. Present fences represent when the composition of this frame has completed, or alternately, when the composition result of the prior frame is no longer needed. For physical displays, presentDisplay returns present fences when the current frame appears on the screen. After present fences are returned, it's safe to write to the SurfaceFlinger target buffer again, if applicable. For virtual displays, present fences are returned when it's safe to read from the output buffer.