Use the car watchdog to help debug the VHAL. The car watchdog monitors the health of registered processes and kills those that become unhealthy. For a process to be monitored, it must be registered with the car watchdog. When the car watchdog kills an unhealthy process, it writes the status of the process to /data/anr, as with other Application Not Responding (ANR) dumps. Doing so facilitates debugging.
This article describes how vendor HALs and services can register a process with the car watchdog.
Vendor HAL
Typically, the vendor HAL uses a thread pool for hwbinder. However, the car watchdog client communicates with the car watchdog daemon through binder, which differs from hwbinder. Therefore, a separate thread pool is used for binder.
Specify the car watchdog AIDL in the makefile
- Include carwatchdog_aidl_interface-ndk_platform in shared_libs:
Android.bp
cc_defaults {
    name: "vhal_v2_0_defaults",
    shared_libs: [
        "libbinder_ndk",
        "libhidlbase",
        "liblog",
        "libutils",
        "android.hardware.automotive.vehicle@2.0",
        "carwatchdog_aidl_interface-ndk_platform",
    ],
    cflags: [
        "-Wall",
        "-Wextra",
        "-Werror",
    ],
}
Add an SELinux policy
- Allow system_server to kill your HAL. If you don't have system_server.te, create one. Adding an SELinux policy to each device is strongly recommended.
- Allow the vendor HAL to use binder (binder_use macro) and add the vendor HAL to the carwatchdog client domain (carwatchdog_client_domain macro). See the code below for system_server.te and hal_vehicle_default.te:
system_server.te
# Allow system_server to kill vehicle HAL
allow system_server hal_vehicle_server:process sigkill;
hal_vehicle_default.te
# Configuration for registering the VHAL with the car watchdog
carwatchdog_client_domain(hal_vehicle_default)
binder_use(hal_vehicle_default)
Implement a client class by inheriting BnCarWatchdogClient
- In checkIfAlive, perform a health check. For example, post to the thread loop handler. If healthy, call ICarWatchdog::tellClientAlive. See the code below for WatchdogClient.h and WatchdogClient.cpp:
WatchdogClient.h
class WatchdogClient : public aidl::android::automotive::watchdog::BnCarWatchdogClient {
  public:
    explicit WatchdogClient(const ::android::sp<::android::Looper>& handlerLooper,
                            VehicleHalManager* vhalManager);

    ndk::ScopedAStatus checkIfAlive(
            int32_t sessionId,
            aidl::android::automotive::watchdog::TimeoutLength timeout) override;
    ndk::ScopedAStatus prepareProcessTermination() override;
};
WatchdogClient.cpp
ndk::ScopedAStatus WatchdogClient::checkIfAlive(int32_t sessionId, TimeoutLength /*timeout*/) {
    // Implement or call your health check logic here
    return ndk::ScopedAStatus::ok();
}
Start the binder thread and register the client
- Create a thread pool for binder communication. If the vendor HAL uses hwbinder for its own purposes, you must create another thread pool for car watchdog binder communication.
- Search for the daemon by name and call ICarWatchdog::registerClient. The car watchdog daemon interface name is android.automotive.watchdog.ICarWatchdog/default.
- Based on service responsiveness, select one of the following three timeout types supported by the car watchdog, and pass it in the call to ICarWatchdog::registerClient:
  - critical (3s)
  - moderate (5s)
  - normal (10s)
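One way to think about choosing among the three timeout types is sketched below. This is an illustration only: TimeoutKind, windowFor, and pickTimeout are hypothetical names that mirror the list above, not part of the car watchdog API (a real client passes the AIDL-generated TimeoutLength value to registerClient). The rule of leaving a 2x margin between the worst-case health-check time and the window is likewise an assumption, not a documented requirement.

```cpp
#include <cassert>
#include <chrono>
#include <initializer_list>

// Hypothetical mirror of the three timeout types listed above. A real client
// uses the AIDL-generated TimeoutLength enum instead.
enum class TimeoutKind { Critical, Moderate, Normal };

// Response window for each timeout type, taken from the list above.
constexpr std::chrono::milliseconds windowFor(TimeoutKind kind) {
    switch (kind) {
        case TimeoutKind::Critical: return std::chrono::milliseconds(3000);
        case TimeoutKind::Moderate: return std::chrono::milliseconds(5000);
        case TimeoutKind::Normal:   return std::chrono::milliseconds(10000);
    }
    return std::chrono::milliseconds(10000);
}

// Picks the tightest window that still leaves a 2x margin over the service's
// worst-case health-check time. The 2x margin is an assumed rule of thumb.
inline TimeoutKind pickTimeout(std::chrono::milliseconds worstCaseResponse) {
    assert(worstCaseResponse.count() >= 0);
    for (TimeoutKind kind : {TimeoutKind::Critical, TimeoutKind::Moderate}) {
        if (worstCaseResponse * 2 <= windowFor(kind)) return kind;
    }
    return TimeoutKind::Normal;
}
```

Under these assumptions, a service whose health check completes within one second would register as critical, while one that can take four seconds would register as normal.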
See the code below for VehicleService.cpp and WatchdogClient.cpp:
VehicleService.cpp
int main(int /* argc */, char* /* argv */ []) {
    // Set up thread pool for hwbinder
    configureRpcThreadpool(4, false /* callerWillJoin */);

    ALOGI("Registering as service...");
    status_t status = service->registerAsService();

    if (status != OK) {
        ALOGE("Unable to register vehicle service (%d)", status);
        return 1;
    }

    // Setup a binder thread pool to be a car watchdog client.
    ABinderProcess_setThreadPoolMaxThreadCount(1);
    ABinderProcess_startThreadPool();
    sp<Looper> looper(Looper::prepare(0 /* opts */));
    std::shared_ptr<WatchdogClient> watchdogClient =
            ndk::SharedRefBase::make<WatchdogClient>(looper, service.get());
    // The current health check is done in the main thread, so it falls short of capturing the real
    // situation. Checking through HAL binder thread should be considered.
    if (!watchdogClient->initialize()) {
        ALOGE("Failed to initialize car watchdog client");
        return 1;
    }
    ALOGI("Ready");
    while (true) {
        looper->pollAll(-1 /* timeoutMillis */);
    }

    return 1;
}
WatchdogClient.cpp
bool WatchdogClient::initialize() {
    ndk::SpAIBinder binder(AServiceManager_getService("android.automotive.watchdog.ICarWatchdog/default"));
    if (binder.get() == nullptr) {
        ALOGE("Failed to get carwatchdog daemon");
        return false;
    }
    std::shared_ptr<ICarWatchdog> server = ICarWatchdog::fromBinder(binder);
    if (server == nullptr) {
        ALOGE("Failed to connect to carwatchdog daemon");
        return false;
    }
    mWatchdogServer = server;

    binder = this->asBinder();
    if (binder.get() == nullptr) {
        ALOGE("Failed to get car watchdog client binder object");
        return false;
    }
    std::shared_ptr<ICarWatchdogClient> client = ICarWatchdogClient::fromBinder(binder);
    if (client == nullptr) {
        ALOGE("Failed to get ICarWatchdogClient from binder");
        return false;
    }
    mTestClient = client;
    mWatchdogServer->registerClient(client, TimeoutLength::TIMEOUT_NORMAL);
    ALOGI("Successfully registered the client to car watchdog server");
    return true;
}
Vendor Services (Native)
Specify the car watchdog AIDL in the makefile
- Include carwatchdog_aidl_interface-ndk_platform in shared_libs.
Android.bp
cc_binary {
    name: "sample_native_client",
    srcs: [
        "src/*.cpp"
    ],
    shared_libs: [
        "carwatchdog_aidl_interface-ndk_platform",
        "libbinder_ndk",
    ],
    vendor: true,
}
Add an SELinux policy
- To add an SELinux policy, allow the vendor service domain to use binder (binder_use macro) and add the vendor service domain to the carwatchdog client domain (carwatchdog_client_domain macro). See the code below for sample_client.te and file_contexts:
sample_client.te
type sample_client, domain;
type sample_client_exec, exec_type, file_type, vendor_file_type;

carwatchdog_client_domain(sample_client)

init_daemon_domain(sample_client)
binder_use(sample_client)
file_contexts
/vendor/bin/sample_native_client  u:object_r:sample_client_exec:s0
Implement a client class by inheriting BnCarWatchdogClient
- In checkIfAlive, perform a health check. One option is to post to the thread loop handler. If healthy, call ICarWatchdog::tellClientAlive. See the code below for SampleNativeClient.h and SampleNativeClient.cpp:
SampleNativeClient.h
class SampleNativeClient : public BnCarWatchdogClient {
  public:
    ndk::ScopedAStatus checkIfAlive(int32_t sessionId, TimeoutLength timeout) override;
    ndk::ScopedAStatus prepareProcessTermination() override;
    void initialize();

  private:
    void respondToDaemon();

    ::android::sp<::android::Looper> mHandlerLooper;
    std::shared_ptr<ICarWatchdog> mWatchdogServer;
    std::shared_ptr<ICarWatchdogClient> mClient;
    int32_t mSessionId;
};
SampleNativeClient.cpp
ndk::ScopedAStatus SampleNativeClient::checkIfAlive(int32_t sessionId, TimeoutLength timeout) {
    // Drop any pending check so that only the latest session is answered.
    mHandlerLooper->removeMessages(mMessageHandler, WHAT_CHECK_ALIVE);
    mSessionId = sessionId;
    mHandlerLooper->sendMessage(mMessageHandler, Message(WHAT_CHECK_ALIVE));
    return ndk::ScopedAStatus::ok();
}

// WHAT_CHECK_ALIVE triggers respondToDaemon from thread handler
void SampleNativeClient::respondToDaemon() {
    // your health checking method here
    ndk::ScopedAStatus status = mWatchdogServer->tellClientAlive(mClient, mSessionId);
}
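The reason for posting to the handler loop, rather than replying directly from checkIfAlive, is that the reply then proves the loop itself is still making progress. The following self-contained sketch models that idea with standard C++ only. HandlerLoop, FakeWatchdog, and SketchClient are hypothetical stand-ins for the Looper, the car watchdog daemon, and the client class; none of them is part of the car watchdog API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the service's handler loop: one worker thread
// that runs posted tasks in order.
class HandlerLoop {
  public:
    HandlerLoop() : mWorker([this] { run(); }) {}
    ~HandlerLoop() {
        post(nullptr);  // An empty task tells the worker to exit.
        mWorker.join();
    }
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mTasks.push(std::move(task));
        }
        mWake.notify_one();
    }

  private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mMutex);
            mWake.wait(lock, [this] { return !mTasks.empty(); });
            std::function<void()> task = std::move(mTasks.front());
            mTasks.pop();
            lock.unlock();
            if (!task) return;
            task();
        }
    }
    std::mutex mMutex;
    std::condition_variable mWake;
    std::queue<std::function<void()>> mTasks;
    std::thread mWorker;  // Declared last so the members above exist first.
};

// Hypothetical stand-in for the daemon: records the last session answered.
class FakeWatchdog {
  public:
    void tellClientAlive(int32_t sessionId) {
        std::lock_guard<std::mutex> lock(mMutex);
        mLastSession = sessionId;
        mChanged.notify_all();
    }
    // Returns true if the given session is answered within the timeout.
    bool waitForReply(int32_t sessionId, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mMutex);
        return mChanged.wait_for(lock, timeout,
                                 [&] { return mLastSession == sessionId; });
    }

  private:
    std::mutex mMutex;
    std::condition_variable mChanged;
    int32_t mLastSession = -1;
};

// Hypothetical client: the ping only posts work; the reply is sent from the
// handler loop, so a stuck loop never answers.
class SketchClient {
  public:
    SketchClient(HandlerLoop& loop, FakeWatchdog& watchdog)
        : mLoop(loop), mWatchdog(watchdog) {}
    void checkIfAlive(int32_t sessionId) {
        assert(sessionId >= 0);
        mLoop.post([this, sessionId] { mWatchdog.tellClientAlive(sessionId); });
    }

  private:
    HandlerLoop& mLoop;
    FakeWatchdog& mWatchdog;
};
```

In this model, if a task blocks the loop, checkIfAlive still returns immediately but no reply reaches the watchdog until the loop resumes, which is the condition the car watchdog's timeout is meant to detect.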
Start a binder thread and register the client
The car watchdog daemon interface name is android.automotive.watchdog.ICarWatchdog/default.
- Search for the daemon by name and call ICarWatchdog::registerClient. See the code below for main.cpp and SampleNativeClient.cpp:
main.cpp
int main(int argc, char** argv) {
    sp<Looper> looper(Looper::prepare(/*opts=*/0));

    // Set up a binder thread pool for car watchdog communication.
    ABinderProcess_setThreadPoolMaxThreadCount(1);
    ABinderProcess_startThreadPool();
    std::shared_ptr<SampleNativeClient> client =
            ndk::SharedRefBase::make<SampleNativeClient>(looper);

    // The client is registered in initialize()
    client->initialize();
    ...
}
SampleNativeClient.cpp
void SampleNativeClient::initialize() {
    // Look up the car watchdog daemon by its interface name.
    ndk::SpAIBinder binder(AServiceManager_getService(
            "android.automotive.watchdog.ICarWatchdog/default"));
    std::shared_ptr<ICarWatchdog> server = ICarWatchdog::fromBinder(binder);
    mWatchdogServer = server;

    // Obtain this client's own binder object and register it with the daemon.
    ndk::SpAIBinder clientBinder = this->asBinder();
    std::shared_ptr<ICarWatchdogClient> client = ICarWatchdogClient::fromBinder(clientBinder);
    mClient = client;
    server->registerClient(client, TimeoutLength::TIMEOUT_NORMAL);
}
Vendor Services (Android)
Implement a client by inheriting CarWatchdogClientCallback
- Implement the callback in your service as follows:
private final CarWatchdogClientCallback mClientCallback = new CarWatchdogClientCallback() {
    @Override
    public boolean onCheckHealthStatus(int sessionId, int timeout) {
        // Your health check logic here
        // Returning true implies the client is healthy
        // If false is returned, the client should call
        // CarWatchdogManager.tellClientAlive after health check is
        // completed
        return true;  // Placeholder: replace with the result of your health check
    }

    @Override
    public void onPrepareProcessTermination() {}
};
Register the client
- Call CarWatchdogManager.registerClient():
private void startClient() {
    CarWatchdogManager manager =
            (CarWatchdogManager) car.getCarManager(Car.CAR_WATCHDOG_SERVICE);
    // Choose a proper executor according to your health check method
    ExecutorService executor = Executors.newFixedThreadPool(1);
    manager.registerClient(executor, mClientCallback,
            CarWatchdogManager.TIMEOUT_NORMAL);
}
Unregister the client
- Call CarWatchdogManager.unregisterClient() when the service is finished:
private void finishClient() {
    CarWatchdogManager manager =
            (CarWatchdogManager) car.getCarManager(Car.CAR_WATCHDOG_SERVICE);
    manager.unregisterClient(mClientCallback);
}
Detect processes terminated by car watchdog
The car watchdog dumps and kills processes (vendor HAL, vendor native services, vendor Android services) that are registered with it when they become stuck and unresponsive. To detect such a dump, check logcat. The car watchdog outputs a log line of the form carwatchdog killed process_name (pid:process_id) when a problematic process is dumped or killed. To capture the relevant logs, run:
$ adb logcat -s CarServiceHelper | fgrep "carwatchdog killed"
For example, if the KitchenSink app (a car watchdog client) becomes stuck, a line such as the following is written to the log:
05-01 09:50:19.683 578 5777 W CarServiceHelper: carwatchdog killed com.google.android.car.kitchensink (pid: 5574)
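When triaging many devices or long logs, it can help to pull the process name and PID out of such lines programmatically. The following is a minimal sketch using only the C++ standard library. KilledProcess and parseKillLine are hypothetical names, and the pattern assumes the message format shown above, with an optional space after "pid:".

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the fields in a "carwatchdog killed" log line.
struct KilledProcess {
    std::string name;
    int pid = -1;
};

// Returns true and fills `out` if `line` contains a car watchdog kill
// message; otherwise returns false and leaves `out` untouched.
bool parseKillLine(const std::string& line, KilledProcess& out) {
    static const std::regex kPattern(
            R"(carwatchdog killed (\S+) \(pid: ?(\d{1,9})\))");
    std::smatch match;
    if (!std::regex_search(line, match, kPattern)) return false;
    assert(match.size() == 3);  // Whole match plus two capture groups.
    out.name = match[1].str();
    out.pid = std::stoi(match[2].str());
    return true;
}
```

For the sample line above, this yields the name com.google.android.car.kitchensink and the PID 5574, which is the value to search for in the ANR dumps.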
To determine why or where the KitchenSink app became stuck, use the process dump stored at /data/anr, just as you would for Activity ANR cases.
$ adb root
$ adb shell grep -Hn "pid process_pid" /data/anr/*
The following sample output is specific to the KitchenSink app:
$ adb shell su root grep -Hn "pid 5574" /data/anr/*
/data/anr/anr_2020-05-01-09-50-18-290:3:----- pid 5574 at 2020-05-01 09:50:18 -----
/data/anr/anr_2020-05-01-09-50-18-290:285:----- Waiting Channels: pid 5574 at 2020-05-01 09:50:18 -----
Find the dump file (/data/anr/anr_2020-05-01-09-50-18-290 in the example above) and start your analysis.