The Sound Trigger feature provides apps with the ability to listen for certain acoustic events, like hotwords, in a low-power and privacy-sensitive manner. Example use cases of Sound Trigger are Assistant and Now Playing.
This page gives an overview of the Sound Trigger architecture and its HAL (Hardware Abstraction Layer) interface.
Sound Trigger stack
The Sound Trigger subsystem is built in layers as shown in Figure 1:
Figure 1: Sound Trigger stack
The following list describes each layer shown in Figure 1 in more detail:
The HAL layer (in green) contains the vendor-specific code that implements the Sound Trigger HAL (STHAL) interface.
SoundTriggerMiddleware (in yellow) resides above the HAL interface. It communicates with the HAL and is responsible for functions such as sharing the HAL between different clients, logging, enforcing permissions, and handling compatibility with older HAL versions.
SoundTriggerService (in blue) resides above the middleware. It facilitates integration with other system features, such as telephony and battery events. It also maintains a database of sound models, indexed by unique IDs.
Above the SoundTriggerService layer, the stack (in brown) handles features specific to Assistant and generic apps separately.
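As a rough illustration of the "database of sound models, indexed by unique IDs" mentioned for SoundTriggerService, the sketch below keeps opaque model data in a dictionary keyed by UUID. The class name `SoundModelRegistry` and its methods are hypothetical and are not part of the Android framework; the real service's storage and schema are not described on this page.

```python
import uuid


class SoundModelRegistry:
    """Hypothetical in-memory stand-in for a sound model database."""

    def __init__(self):
        self._models = {}  # uuid.UUID -> opaque model bytes

    def put(self, model_id, data):
        # Storing under an existing ID replaces the previous model.
        self._models[model_id] = data

    def get(self, model_id):
        # Returns None when the ID is unknown.
        return self._models.get(model_id)

    def delete(self, model_id):
        # Returns True only if a model was actually removed.
        return self._models.pop(model_id, None) is not None


registry = SoundModelRegistry()
hotword_id = uuid.uuid4()  # assumed: IDs are UUIDs chosen by the caller
registry.put(hotword_id, b"opaque-model-bytes")
```

Keying by a unique ID lets a model be updated, looked up, or deleted without the caller holding on to the model data itself.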
The function of the Sound Trigger stack is to deliver discrete events that
represent acoustic trigger events. In most cases, the Sound Trigger stack does
not deal with audio. Upon receipt of a trigger event, apps get access to the
actual audio stream surrounding the time of the event by opening an
AudioRecord object via the Audio framework. The Sound Trigger HAL APIs provide
a handle with the triggered event that is used with the Audio framework.
Because the Sound Trigger HAL and Audio HAL are connected under the hood, they
typically share a process.
Sound Trigger HAL interface
The Sound Trigger HAL (STHAL) interface is the vendor-specific part of the Sound Trigger stack. It handles hardware recognition of hotwords and other sounds. STHAL provides one or more engines, each running a different algorithm designed to detect a specific class of sounds. When STHAL detects a trigger, it sends an event to the framework and then stops the detection.
The STHAL interface is specified under
The ISoundTriggerHw interface supports the ability to have one or more
detection sessions running at a given time and to listen to acoustic events.
A call to ISoundTriggerHw.getProperties() returns a description of the implementation and its capabilities.
Figure 2 shows the basic flow of setting up a session:
Figure 2: STHAL state diagram
The following steps describe each state in more detail:
The HAL client loads a model using loadPhraseSoundModel(). The provided model object indicates which implementation-specific detection algorithm (engine) to use, as well as the parameters applicable for this algorithm. Upon success, these methods return a handle which is used to reference this model in subsequent calls.
Once the model has been successfully loaded, the HAL client calls startRecognition() to begin detection. Recognition continues to run in the background until one of the following events occurs:
- stopRecognition() has been called on this model.
- A detection has occurred.
- Detection is aborted due to resource constraints, for example, when a higher priority use case has been initiated.
In the latter two cases, a recognition event is sent via the callback interface that is registered by the HAL client upon loading. In all cases, after any of these events occurs, the detection becomes inactive and no more recognition callbacks are allowed.
The same model can be started again at a later time, and this process can be repeated as many times as needed.
Finally, an inactive model that is no longer needed is unloaded by the HAL client via
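The session lifecycle described in the steps above can be modeled as a small state machine. The following is a minimal sketch in Python, not the real HAL interface: the class `FakeSoundTriggerHal`, the `simulate_detection()` and `simulate_abort()` helpers, and the `unload_model()` name are all hypothetical, and refusing to unload an active model is an assumption made for this sketch.

```python
from enum import Enum, auto


class ModelState(Enum):
    LOADED = auto()  # loaded but inactive (not detecting)
    ACTIVE = auto()  # recognition is running


class FakeSoundTriggerHal:
    """Toy in-memory model of the session lifecycle; not the real HAL."""

    def __init__(self, callback):
        self._callback = callback  # called with (handle, reason)
        self._models = {}          # handle -> ModelState
        self._next_handle = 1

    def load_model(self, model):
        # Loading returns a handle used in all subsequent calls.
        handle = self._next_handle
        self._next_handle += 1
        self._models[handle] = ModelState.LOADED
        return handle

    def start_recognition(self, handle):
        if self._models[handle] is not ModelState.LOADED:
            raise RuntimeError("model is already active")
        self._models[handle] = ModelState.ACTIVE

    def stop_recognition(self, handle):
        # Explicit stop: the model becomes inactive and no event is sent.
        self._models[handle] = ModelState.LOADED

    def simulate_detection(self, handle):
        self._finish(handle, "detected")

    def simulate_abort(self, handle):
        self._finish(handle, "aborted")

    def _finish(self, handle, reason):
        if self._models[handle] is not ModelState.ACTIVE:
            return  # no callbacks are allowed for an inactive model
        self._models[handle] = ModelState.LOADED
        self._callback(handle, reason)

    def unload_model(self, handle):
        # Assumption for this sketch: only inactive models can be unloaded.
        if self._models[handle] is ModelState.ACTIVE:
            raise RuntimeError("stop recognition before unloading")
        del self._models[handle]

    def state(self, handle):
        return self._models.get(handle)


events = []
hal = FakeSoundTriggerHal(lambda h, reason: events.append((h, reason)))
h = hal.load_model({"engine": "hotword"})  # hypothetical model description
hal.start_recognition(h)
hal.simulate_detection(h)  # sends one event; the model becomes inactive
hal.start_recognition(h)   # the same model can be started again
hal.stop_recognition(h)    # explicit stop sends no event
hal.unload_model(h)
```

In this run, `events` ends up holding a single `(1, "detected")` entry: the detection produces a callback, while the explicit stop does not.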
Handling of HAL errors
To ensure reliable and consistent behavior between driver implementations, in Android 11, any non-success error codes returned from the HAL are treated as programming errors, recovery from which requires restarting the HAL process. This is a last-resort recovery strategy and the expectation is that such cases won't occur in a properly working system.
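The policy above can be sketched as a thin wrapper around each HAL call. This is an illustrative Python model only: `STATUS_OK`, `HalDiedError`, `call_hal()`, and the `restart_hal` hook are hypothetical names, and the actual framework mechanism for restarting the HAL process is not described here.

```python
STATUS_OK = 0  # hypothetical success code


class HalDiedError(RuntimeError):
    """Raised after the HAL reported a non-success status."""


def call_hal(operation, restart_hal):
    """Run one HAL operation; treat any non-success status as fatal."""
    status = operation()
    if status != STATUS_OK:
        restart_hal()  # last-resort recovery: restart the HAL process
        raise HalDiedError(f"HAL returned status {status}")
    return status


restarts = []
call_hal(lambda: STATUS_OK, lambda: restarts.append("restart"))
try:
    call_hal(lambda: -22, lambda: restarts.append("restart"))
except HalDiedError as err:
    failure = str(err)
```

The point of the sketch is that the caller makes no attempt to interpret individual error codes: every non-success status takes the same restart path.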