This page describes best practices for implementing Neural Networks API (NNAPI) drivers to allow for broad adoption of the NNAPI by app developers.
Keeping startup times short
If your driver transforms the weights of a model on first use, make sure the driver supports compilation caching, which reduces the time spent on compilation when an app starts. This is important because apps might avoid using hardware acceleration if startup times are too long. For example, some apps have more than 100 MB of weights, and transforming these every time the app launches is wasteful.
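The effect of compilation caching can be sketched in plain Python: an expensive weight transformation runs once, and later startups reuse the result keyed by a content hash of the weights. This is an illustrative sketch only; the function names (`transform_weights`, `prepare_model`) and the in-memory dictionary are stand-ins, not part of the NN HAL, which instead hands the driver cache files and a caching token.

```python
import hashlib

# Illustrative stand-in for the driver's persistent compilation cache.
_cache = {}

def transform_weights(weights: bytes) -> bytes:
    # Stand-in for an expensive, driver-specific weight transformation.
    return bytes(reversed(weights))

def prepare_model(weights: bytes) -> bytes:
    # Key the cache by a content hash so a changed model is re-transformed.
    token = hashlib.sha256(weights).hexdigest()
    if token not in _cache:
        _cache[token] = transform_weights(weights)  # slow path: first launch
    return _cache[token]  # fast path on every later launch
```

On a second call with the same weights the transformation is skipped entirely, which is what keeps app startup short once the cache is warm.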
Reducing minimal latency
To ensure that models use hardware acceleration, it's important to minimize latency in drivers. Many apps use small models that are executed multiple times. If the minimal latency to execute a workload is too high, such as a few milliseconds, an app might run the workload on the CPU, which takes only one or two milliseconds, instead of using hardware acceleration. Be careful of costly thread synchronization.
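This trade-off can be made concrete with a small sketch: if the driver's fixed per-execution overhead exceeds the CPU's entire execution time, the accelerator loses for small models no matter how fast its compute is. The function name and all numbers below are hypothetical, not from the NNAPI.

```python
def choose_backend(cpu_time_ms: float,
                   accel_compute_ms: float,
                   driver_overhead_ms: float) -> str:
    # Hypothetical cost model: accelerator cost = fixed driver overhead
    # (IPC, thread synchronization) plus the actual compute time.
    accel_total_ms = driver_overhead_ms + accel_compute_ms
    return "accelerator" if accel_total_ms < cpu_time_ms else "cpu"

# A small model the CPU finishes in 2 ms: with 3 ms of driver overhead,
# the accelerator can never win, even if its compute time is near zero.
```

This is why reducing fixed overhead, not just raw compute throughput, determines whether small, frequently executed models ever reach the accelerator.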
Using the NN HAL SchedTune group
In Android 11 or higher, AOSP includes a dedicated NN HAL SchedTune group that allows interprocess NN HAL processes to use big cores, similar to same-process implementations within the predefined top-app cgroup. Using this SchedTune group reduces driver overhead, especially for small models.
To use the SchedTune group, add the following line to the init.rc file of the NN HAL process:
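Based on the AOSP documentation for this SchedTune group (worth verifying against your Android release), the line writes the process ID into the `nnapi-hal` stune tasks file:

```
writepid /dev/stune/nnapi-hal/tasks
```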