From Android 10, the Neural Networks API (NNAPI)
provides functions to support
caching of compilation artifacts, which reduces the time used for compilation
when an app starts. Using this caching functionality, the driver doesn't
need to manage or clean up the cached files. This is an optional feature that
can be implemented with NN HAL 1.2. For more information about this function,
see
ANeuralNetworksCompilation_setCaching.
The driver can also implement compilation caching independent of the NNAPI. This can be implemented whether the NNAPI NDK and HAL caching features are used or not. AOSP provides a low-level utility library (a caching engine). For more information, see Implementing a caching engine.
Workflow overview
This section describes general workflows with the compilation caching feature implemented.
Cache information provided and cache hit
- The app passes a caching directory and a checksum unique to the model.
- The NNAPI runtime looks for the cache files based on the checksum, the execution preference, and the partitioning outcome and finds the files.
- The NNAPI opens the cache files and passes the handles to the driver
with
prepareModelFromCache.
- The driver prepares the model directly from the cache files and returns the prepared model.
Cache information provided and cache miss
- The app passes a checksum unique to the model and a caching directory.
- The NNAPI runtime looks for the caching files based on the checksum, the execution preference, and the partitioning outcome and doesn't find the cache files.
- The NNAPI creates empty cache files based on the checksum, the execution
preference, and the partitioning, opens the cache files, and passes the
handles and the model to the driver with
prepareModel_1_2.
- The driver compiles the model, writes caching information to the cache files, and returns the prepared model.
Cache information not provided
- The app invokes compilation without providing any caching information.
- The app passes nothing related to caching.
- The NNAPI runtime passes the model to the driver with
prepareModel_1_2.
- The driver compiles the model and returns the prepared model.
Cache information
The caching information that is provided to a driver consists of a token and cache file handles.
Token
The
token
is a caching token of length
Constant::BYTE_SIZE_OF_CACHE_TOKEN
that identifies the prepared model. The same token is provided when saving the
cache files with prepareModel_1_2 and retrieving the prepared model with
prepareModelFromCache. The client of the driver should choose a token with a
low rate of collision. The driver can't detect a token collision. A collision
results in a failed execution or in a successful execution that produces
incorrect output values.
Cache file handles (two types of cache files)
The two types of cache files are data cache and model cache.
- Data cache: Use for caching constant data including preprocessed and transformed tensor buffers. A modification to the data cache shouldn't result in any effect worse than generating bad output values at execution time.
- Model cache: Use for caching security-sensitive data such as compiled executable machine code in the device's native binary format. A modification to the model cache might affect the driver's execution behavior, and a malicious client could make use of this to execute beyond the granted permission. Thus, the driver must check whether the model cache is corrupted before preparing the model from cache. For more information, see Security.
The driver must decide how cache information is distributed between the two
types of cache files, and report how many cache files it needs for each type
with
getNumberOfCacheFilesNeeded.
The NNAPI runtime always opens cache file handles with both read and write permission.
Security
In compilation caching, the model cache may contain security-sensitive data such as compiled executable machine code in the device's native binary format. If not properly protected, a modification to the model cache may affect the driver's execution behavior. Because the cache contents are stored in the app directory, the cache files are modifiable by the client. A buggy client may accidentally corrupt the cache, and a malicious client could intentionally make use of this to execute unverified code on the device. Depending on the characteristics of the device, this may be a security issue. Thus, the driver must be able to detect potential model cache corruption before preparing the model from cache.
One way to do this is for the driver to maintain a map from the token to a
cryptographic hash of the model cache. The driver can store the token and the
hash of its model cache when saving the compilation to cache. The driver checks
the new hash of the model cache with the recorded token and hash pair when
retrieving the compilation from cache. This mapping should be persistent across
system reboots. The driver can use the
Android keystore service, the utility library in
framework/ml/nn/driver/cache,
or any other suitable mechanism to implement a mapping manager. Upon driver
update, this mapping manager should be reinitialized to prevent preparing cache
files from an earlier version.
To prevent time-of-check to time-of-use (TOCTOU) attacks, the driver must compute the recorded hash before saving to file and compute the new hash after copying the file content to an internal buffer.
This sample code demonstrates how to implement this logic.
bool saveToCache(const sp<V1_2::IPreparedModel> preparedModel,
                 const hidl_vec<hidl_handle>& modelFds, const hidl_vec<hidl_handle>& dataFds,
                 const HidlToken& token) {
    // Serialize the prepared model to internal buffers.
    auto buffers = serialize(preparedModel);
    // This implementation detail is important: the cache hash must be computed from internal
    // buffers instead of cache files to prevent time-of-check to time-of-use (TOCTOU) attacks.
    auto hash = computeHash(buffers);
    // Store the {token, hash} pair to a mapping manager that is persistent across reboots.
    CacheManager::get()->store(token, hash);
    // Write the cache contents from internal buffers to cache files.
    return writeToFds(buffers, modelFds, dataFds);
}
sp<V1_2::IPreparedModel> prepareFromCache(const hidl_vec<hidl_handle>& modelFds,
                                          const hidl_vec<hidl_handle>& dataFds,
                                          const HidlToken& token) {
    // Copy the cache contents from cache files to internal buffers.
    auto buffers = readFromFds(modelFds, dataFds);
    // This implementation detail is important: the cache hash must be computed from internal
    // buffers instead of cache files to prevent time-of-check to time-of-use (TOCTOU) attacks.
    auto hash = computeHash(buffers);
    // Validate the {token, hash} pair by a mapping manager that is persistent across reboots.
    if (CacheManager::get()->validate(token, hash)) {
        // Retrieve the prepared model from internal buffers.
        return deserialize<V1_2::IPreparedModel>(buffers);
    } else {
        return nullptr;
    }
}
Advanced use cases
In certain advanced use cases, a driver requires access to the cache content (read or write) after the compilation call. Example use cases include:
- Just-in-time compilation: The compilation is delayed until the first execution.
- Multi-stage compilation: A fast compilation is performed initially and an optional optimized compilation is performed at a later time depending on the frequency of use.
To access the cache content (read or write) after the compilation call, ensure that the driver:
- Duplicates the file handles during the invocation of
prepareModel_1_2orprepareModelFromCacheand reads/updates the cache content at a later time.
- Implements file locking logic outside of the ordinary compilation call to prevent a write occurring concurrently with a read or another write.
Implement a caching engine
In addition to the NN HAL 1.2 compilation caching interface, you can also find a
caching utility library in the
frameworks/ml/nn/driver/cache
directory. The
nnCache
subdirectory contains persistent storage code for the driver to implement
compilation caching without using the NNAPI caching features. This form of
compilation caching can be implemented with any version of the NN HAL. If the
driver chooses to implement caching disconnected from the HAL interface,
the driver is
responsible for freeing cached artifacts when they are no longer needed.
