Starting from Android 11, the NNAPI offers better quality of service (QoS) by allowing an app to indicate the relative priorities of its models, the maximum amount of time expected for a given model to be prepared, and the maximum amount of time expected for a given execution to be completed. Further, Android 11 introduces additional NNAPI error values enabling a service to more accurately indicate what went wrong when a failure occurs so that the client app can better react and recover.
For Android 11 or higher, models are prepared with a priority in the NN HAL 1.3. This priority is relative to other prepared models owned by the same app. Higher-priority executions can use more compute resources than lower-priority executions, and can preempt or starve lower-priority executions.
The NN HAL 1.3 call that includes
Priority as an explicit argument is
Priority in the cache arguments.
There are many possible strategies for supporting priorities depending on the capabilities of the driver and accelerator. Here are several strategies:
- For drivers that have built-in priority support, directly propagate the Priority field to the accelerator.
- Use a per-app priority queue to support different priorities even before an execution reaches the accelerator.
- Pause or cancel low-priority models that are currently being executed to free the accelerator to execute high-priority models. Do this either by inserting checkpoints in low-priority models that, when reached, query a flag to determine whether the current execution should be halted prematurely, or by partitioning the model into submodels and querying the flag between submodel executions. Note that the use of checkpoints or submodels in models prepared with a priority can introduce additional overhead that isn't present for models without a priority in versions lower than NN HAL 1.3.
  - To support preemption, preserve the execution context, including the next operation or submodel to be executed and any relevant intermediate operand data. Use this execution context to resume the execution at a later time.
  - Full preemption support isn't necessary, so the execution context doesn't need to be preserved. Because NNAPI model executions are deterministic, executions can be restarted from scratch at a later time.
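Two of the strategies above, a per-app priority queue and checkpoints between submodels, can be sketched in plain C++. This is a minimal illustration under stated assumptions, not driver code: the Priority enum, Task, Scheduler, and runWithCheckpoints are hypothetical names that stand in for whatever the driver uses, and the submodel execution itself is a placeholder.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <queue>
#include <vector>

// Hypothetical stand-in for the HAL priority levels (not the real HAL type).
enum class Priority { LOW = 0, MEDIUM = 1, HIGH = 2 };

// One pending execution. nextSubmodel is the saved execution context:
// the index of the next submodel to run when the task is resumed.
struct Task {
    int id;
    Priority priority;
    std::size_t submodelCount;
    std::size_t nextSubmodel;
    Task(int id_, Priority p, std::size_t count)
        : id(id_), priority(p), submodelCount(count), nextSubmodel(0) {}
};

// Comparator for std::priority_queue: the highest priority is popped first;
// among equal priorities, the lower id (assumed to be submitted earlier) wins.
struct ByPriority {
    bool operator()(const Task& a, const Task& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.id > b.id;
    }
};

// Keeps one priority queue per calling app, keyed by the app's UID.
class Scheduler {
public:
    void submit(std::uint32_t uid, const Task& task) { queues_[uid].push(task); }

    // Copies the highest-priority pending task of one app into *out.
    // Returns false if that app has nothing queued.
    bool next(std::uint32_t uid, Task* out) {
        auto it = queues_.find(uid);
        if (it == queues_.end() || it->second.empty()) return false;
        *out = it->second.top();
        it->second.pop();
        return true;
    }

private:
    std::map<std::uint32_t,
             std::priority_queue<Task, std::vector<Task>, ByPriority>> queues_;
};

// Runs the remaining submodels of a task, querying the halt flag at a
// checkpoint before each one. Returns true if the task ran to completion,
// false if it was paused; task.nextSubmodel then records where to resume.
bool runWithCheckpoints(Task& task, const std::atomic<bool>& haltRequested) {
    assert(task.nextSubmodel <= task.submodelCount);
    while (task.nextSubmodel < task.submodelCount) {
        if (haltRequested.load()) return false;  // checkpoint between submodels
        // Placeholder: a real driver would execute submodel nextSubmodel here.
        ++task.nextSubmodel;
    }
    return true;
}
```

A driver that chooses restart over preemption would reset nextSubmodel to 0 before requeueing a paused task instead of keeping the saved index.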
Android enables services to differentiate between different calling apps through
the use of an AID (Android UID). HIDL has built-in mechanisms to retrieve the
calling app's UID through the method
::android::hardware::IPCThreadState::getCallingUid. A list of AIDs can be
Starting from Android 11, model preparation and
executions can be launched with an
OptionalTimePoint deadline argument. For
drivers that can estimate how long a task takes, this deadline allows the driver
to abort the task before it starts if the driver estimates that the task can't
be completed before the deadline. Similarly, the deadline allows the driver to
abort an ongoing task that it estimates won't be completed before the deadline.
The deadline argument doesn't force a driver to abort a task if the task isn't
complete by the deadline or if the deadline has passed. The deadline argument
can be used to free up compute resources within the driver and return control
to the app faster than is possible without the deadline.
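The deadline behavior described above can be sketched as follows. This is an illustration under assumptions rather than the HAL interface: OptionalDeadline is a hypothetical stand-in for OptionalTimePoint, the Outcome enum and runWithDeadline are invented for the example, and time is simulated by advancing a local clock value instead of doing real work. The sketch aborts when its estimate says the deadline will be missed, which the deadline argument permits but doesn't require.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an optional deadline: either no deadline,
// or a point in time by which the task is expected to finish.
struct OptionalDeadline {
    bool hasValue;
    Clock::time_point value;
};

enum class Outcome { COMPLETED, ABORTED_BEFORE_START, ABORTED_WHILE_RUNNING };

// Simulates a task made of `steps` equal steps.
//   estimatedStep: how long the driver predicts one step takes.
//   actualStep:    how long one step really takes in this simulation.
//   start:         the simulated time at which the task is launched.
Outcome runWithDeadline(const OptionalDeadline& deadline, int steps,
                        Clock::duration estimatedStep,
                        Clock::duration actualStep,
                        Clock::time_point start) {
    assert(steps >= 0);
    Clock::time_point now = start;

    // Abort before starting if the estimate already exceeds the deadline.
    if (deadline.hasValue && now + estimatedStep * steps > deadline.value) {
        return Outcome::ABORTED_BEFORE_START;
    }

    for (int remaining = steps; remaining > 0; --remaining) {
        // Re-estimate from the time actually elapsed so far and abort the
        // ongoing task if the remaining work can no longer fit.
        if (deadline.hasValue &&
            now + estimatedStep * remaining > deadline.value) {
            return Outcome::ABORTED_WHILE_RUNNING;
        }
        now += actualStep;  // simulated work for one step
    }
    return Outcome::COMPLETED;
}
```

With no deadline set, the function always runs every step, which matches a task launched without the argument.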
The NN HAL 1.3 calls that include
OptionalTimePoint deadlines as an argument
To see a reference implementation of the deadline feature for each of the above
methods, see the NNAPI sample driver at
Android 11 includes four error code values in
NN HAL 1.3 to improve error reporting, allowing drivers to better communicate
their state and apps to recover more gracefully. These are the error code
In Android 10 or lower, a driver could only indicate a failure through the
GENERAL_FAILURE error code. From Android 11, the
MISSED_DEADLINE error codes can be used to indicate that the workload was
aborted because the deadline was reached or because the driver predicted the
workload wouldn't complete by the deadline. The two
codes can be used to indicate that the task failed because of a resource
limitation within the driver, such as the driver not having enough memory for
TRANSIENT version of both errors indicates that the problem is temporary,
and that future calls to the same task might succeed after a short delay. For
example, this error code should be returned when the driver is busy with prior
long-running or resource-intensive work but the new task would complete
successfully if the driver weren't busy with that work. The
version of both errors indicates that future calls to the same task are always
expected to fail. For example, this error code should be returned when the
driver estimates the task wouldn't complete by the deadline even under perfect
conditions, or when the model is inherently too large and exceeds the driver's
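On the client side, the distinction between temporary and permanent failures suggests a simple reaction: retry the first kind, give up on the second. The sketch below illustrates that idea only. The Result enum and runWithRetry are hypothetical names for this example and don't mirror the HAL error type, and the retry delay is left as a comment.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result categories for this example (not the HAL error type).
enum class Result { SUCCESS, TEMPORARY_FAILURE, PERMANENT_FAILURE };

// Calls `attempt` up to maxAttempts times. Retries only after a temporary
// failure; success or a permanent failure ends the loop immediately.
// *attemptsMade receives the number of calls that were made.
Result runWithRetry(const std::function<Result()>& attempt, int maxAttempts,
                    int* attemptsMade) {
    assert(maxAttempts > 0 && attemptsMade != nullptr);
    Result last = Result::TEMPORARY_FAILURE;
    *attemptsMade = 0;
    while (*attemptsMade < maxAttempts) {
        ++*attemptsMade;
        last = attempt();
        if (last != Result::TEMPORARY_FAILURE) break;
        // A real client would wait briefly here before trying again.
    }
    return last;
}
```

Capping the number of attempts keeps a client from spinning when a driver stays busy; a permanent failure is better handled by falling back to another device or a smaller model.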
The quality of service functionality is tested in the NNAPI VTS tests
(VtsHalNeuralnetworksV1_3Target). This includes a set of validation tests
(TestGenerated/ValidationTest#Test/) to ensure that the driver rejects invalid
priorities and a set of deadline tests
(TestGenerated/DeadlineTest#Test/) to ensure that the driver handles deadlines