Use profile-guided optimization

The Android build system for Android 13 and lower supports using Clang's profile-guided optimization (PGO) on native Android modules that have blueprint build rules. This page describes Clang PGO, how to continually generate and update the profiles used for PGO, and how to integrate PGO with the build system (with a case study).

NB: This document describes the use of PGO in the Android platform. To learn about using PGO from an Android app, visit this page.

About Clang PGO

Clang can perform profile-guided optimization using two types of profiles:

  • Instrumentation-based profiles are generated from an instrumented target program. These profiles are detailed and impose a high runtime overhead.
  • Sampling-based profiles are typically produced by sampling hardware counters. They impose a low runtime overhead, and can be collected without any instrumentation or modification to the binary. They are less detailed than instrumentation-based profiles.

All profiles should be generated from a representative workload that exercises the typical behavior of the app. While Clang supports both AST-based (-fprofile-instr-generate) and LLVM IR-based (-fprofile-generate) instrumentation, Android supports only LLVM IR-based instrumentation for instrumentation-based PGO.

The following flags are needed to build for profile collection:

  • -fprofile-generate for IR-based instrumentation. With this option, the backend uses a weighted minimal spanning tree approach to reduce the number of instrumentation points and to place them on low-weight edges. Use this option for the link step as well, so that the Clang driver automatically passes the profiling runtime (libclang_rt.profile-arch-android.a) to the linker. This library contains routines to write the profiles to disk upon program exit.
  • -gline-tables-only for sampling-based profile collection to generate minimal debug information.

To apply a profile, pass -fprofile-use=pathname for an instrumentation-based profile or -fprofile-sample-use=pathname for a sampling-based profile.

Note: As changes are made to the code, if Clang can no longer use the profile data, it generates a -Wprofile-instr-out-of-date warning.

Use PGO

Using PGO involves the following steps:

  1. Build the library/executable with instrumentation by passing -fprofile-generate to the compiler and linker.
  2. Collect profiles by running a representative workload on the instrumented binary.
  3. Post-process the profiles using the llvm-profdata utility (for details, see Handle LLVM profile files).
  4. Use the profiles to apply PGO by passing -fprofile-use=<>.profdata to the compiler and linker.
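The four steps above can be sketched for a standalone program built outside the Android tree. This is a minimal illustration under stated assumptions, not the platform build flow: the run and pgo_workflow helpers, the source file, and the binary name are hypothetical, and the script only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pgo_workflow SRC BIN
# Hypothetical helper (not part of any Android or LLVM tool) that walks
# through the four PGO steps for a single source file.
pgo_workflow() {
    src=$1
    bin=$2
    # 1. Build with IR-based instrumentation (compile and link in one step).
    run clang++ -O2 -fprofile-generate "$src" -o "${bin}.instr" || return 1
    # 2. Run a representative workload; LLVM_PROFILE_FILE names the raw profile.
    run env LLVM_PROFILE_FILE="${bin}.profraw" "./${bin}.instr" || return 1
    # 3. Post-process the raw profile into an indexed .profdata file.
    run llvm-profdata merge -output="${bin}.profdata" "${bin}.profraw" || return 1
    # 4. Rebuild using the profile.
    run clang++ -O2 -fprofile-use="${bin}.profdata" "$src" -o "$bin"
}
```

Calling pgo_workflow app.cpp app prints the four commands in order. Running with DRY_RUN=0 requires clang++ and llvm-profdata on the PATH.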

For PGO in Android, profiles should be collected offline and checked in alongside the code to ensure reproducible builds. The profiles can be used as code evolves, but must be regenerated periodically (or whenever Clang warns that the profiles are stale).
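One way to notice stale profiles is to scan a captured build log for Clang's warning. The sketch below assumes the build output was saved to a log file and that Clang prints the warning's flag name in its diagnostics (its default behavior); the profiles_are_stale helper is hypothetical.

```shell
#!/bin/sh
# profiles_are_stale LOG
# Hypothetical helper: succeeds (exit status 0) if the build log mentions
# Clang's out-of-date profile warning, and fails otherwise.
profiles_are_stale() {
    grep -q -F -- '-Wprofile-instr-out-of-date' "$1"
}
```

For example, `if profiles_are_stale build.log; then echo "regenerate PGO profiles"; fi` could run as a periodic check.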

Collect profiles

Clang can use profiles collected by running benchmarks using an instrumented build of the library or by sampling hardware counters when the benchmark is run. At this time, Android doesn't support sampling-based profile collection, so you must collect profiles using an instrumented build:

  1. Identify a benchmark and the set of libraries collectively exercised by that benchmark.
  2. Add pgo properties to the benchmark and libraries (details below).
  3. Produce an Android build with an instrumented copy of these libraries using:
    make ANDROID_PGO_INSTRUMENT=benchmark

benchmark is a placeholder that identifies the collection of libraries instrumented during build. The actual representative inputs (and possibly another executable that links against a library being benchmarked) aren't specific to PGO and are beyond the scope of this document.

  4. Flash or sync the instrumented build on a device.
  5. Run the benchmark to collect profiles.
  6. Use the llvm-profdata tool (discussed below) to post-process the profiles and make them ready to be checked into the source tree.
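The build, sync, run, and retrieve steps could be scripted roughly as follows. This is a sketch under assumptions: the collect_profiles and run helpers are hypothetical, the on-device benchmark command is a placeholder, the device is assumed to be set up so that adb sync works, and the profiles are assumed to land in /data/local/tmp. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# collect_profiles BENCHMARK BENCH_CMD OUTDIR
# Hypothetical helper. BENCHMARK is the value for ANDROID_PGO_INSTRUMENT,
# BENCH_CMD is a placeholder for the command that runs the workload on the
# device, and OUTDIR is the host directory that receives the pulled files.
collect_profiles() {
    benchmark=$1
    bench_cmd=$2
    outdir=$3
    # Produce a build with instrumented copies of the libraries.
    run make ANDROID_PGO_INSTRUMENT="$benchmark" || return 1
    # Push the instrumented build to the device.
    run adb sync || return 1
    # Run the workload; instrumented code writes .profraw files on exit.
    run adb shell "$bench_cmd" || return 1
    # Retrieve the directory that holds the .profraw files.
    run adb pull /data/local/tmp "$outdir"
}
```

The pulled .profraw files still need post-processing with llvm-profdata before they can be checked in.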

Use profiles during build

Check the profiles into toolchain/pgo-profiles in an Android tree. The name should match what is specified in the profile_file sub-property of the pgo property for the library. The build system automatically passes the profile file to Clang when building the library. The ANDROID_PGO_DISABLE_PROFILE_USE environment variable can be set to true to temporarily disable PGO and measure its performance benefit.

To specify additional product-specific profile directories, append them to the PGO_ADDITIONAL_PROFILE_DIRECTORIES make variable in a BoardConfig.mk. If additional paths are specified, profiles in these paths override those in toolchain/pgo-profiles.

When you generate a release image using the dist target of make, the build system writes the names of missing profile files to $DIST_DIR/pgo_profile_file_missing.txt. Check this file to see which profile files were accidentally dropped, because a missing profile silently disables PGO.
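A release script could fail early when profiles are missing. The following sketch assumes only the file name given above; the check_missing_profiles helper is hypothetical, and it treats an absent or empty report as "nothing missing".

```shell
#!/bin/sh
# check_missing_profiles DIST_DIR
# Hypothetical helper: returns 1 and prints the report to stderr if the
# missing-profile report exists and is non-empty; returns 0 otherwise.
check_missing_profiles() {
    report="$1/pgo_profile_file_missing.txt"
    if [ -s "$report" ]; then
        echo "PGO profiles missing (PGO is disabled for these modules):" >&2
        cat "$report" >&2
        return 1
    fi
    return 0
}
```

For example, `check_missing_profiles "$DIST_DIR" || exit 1` stops a release script before an image without PGO is published.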

Enable PGO in Android.bp files

To enable PGO in Android.bp files for native modules, simply specify the pgo property. This property has the following sub-properties:

| Property | Description |
| --- | --- |
| instrumentation | Set to true for PGO using instrumentation. Default is false. |
| sampling | Set to true for PGO using sampling. Default is false. |
| benchmarks | List of strings. This module is built for profiling if any benchmark in the list is specified in the ANDROID_PGO_INSTRUMENT build option. |
| profile_file | Profile file (relative to toolchain/pgo-profiles) to use with PGO. If this file doesn't exist, the build reports it by adding it to $DIST_DIR/pgo_profile_file_missing.txt, unless the enable_profile_use property is set to false or the ANDROID_PGO_NO_PROFILE_USE build variable is set to true. |
| enable_profile_use | Set to false if profiles shouldn't be used during build. Can be used during bootstrap to enable profile collection or to temporarily disable PGO. Default is true. |
| cflags | List of additional flags to use during an instrumented build. |

Example of a module with PGO:

cc_library {
    name: "libexample",
    srcs: [
        "src1.cpp",
        "src2.cpp",
    ],
    static_libs: [
        "libstatic1",
        "libstatic2",
    ],
    shared_libs: [
        "libshared1",
    ],
    pgo: {
        instrumentation: true,
        benchmarks: [
            "benchmark1",
            "benchmark2",
        ],
        profile_file: "example.profdata",
    },
}

If the benchmarks benchmark1 and benchmark2 exercise representative behavior for libraries libstatic1, libstatic2, or libshared1, the pgo property of these libraries can also include the benchmarks. The defaults module in Android.bp can include a common pgo specification for a set of libraries to avoid repeating the same build rules for several modules.

To select different profile files or selectively disable PGO for an architecture, specify the profile_file, enable_profile_use, and cflags properties per architecture under the target property. Example:

cc_library {
    name: "libexample",
    srcs: [
        "src1.cpp",
        "src2.cpp",
    ],
    static_libs: [
        "libstatic1",
        "libstatic2",
    ],
    shared_libs: [
        "libshared1",
    ],
    pgo: {
        instrumentation: true,
        benchmarks: [
            "benchmark1",
            "benchmark2",
        ],
    },

    target: {
        android_arm: {
            pgo: {
                profile_file: "example_arm.profdata",
            },
        },
        android_arm64: {
            pgo: {
                profile_file: "example_arm64.profdata",
            },
        },
    },
}

To resolve references to the profiling runtime library during instrumentation-based profiling, pass the build flag -fprofile-generate to the linker. If a static library is instrumented for PGO, all shared libraries and binaries that directly depend on that static library must also be instrumented for PGO. However, such shared libraries or executables don't need to use PGO profiles, and their enable_profile_use property can be set to false. Outside of this restriction, you can apply PGO to any static library, shared library, or executable.

Handle LLVM profile files

Executing an instrumented library or executable produces a profile file named default_unique_id_0.profraw in /data/local/tmp (where unique_id is a numeric hash that is unique to this library). If this file already exists, the profiling runtime merges the new profile with the old one while writing the profiles. Note that /data/local/tmp isn't accessible to app developers; they should use somewhere like /storage/emulated/0/Android/data/packagename/files instead. To change the location of the profile file, set the LLVM_PROFILE_FILE environment variable at runtime.

The llvm-profdata utility is then used to convert the .profraw file (and possibly merge multiple .profraw files) to a .profdata file:

  llvm-profdata merge -output=profile.profdata <.profraw and/or .profdata files>

profile.profdata can then be checked into the source tree for use during build.
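Merging every raw profile pulled from a device might be wrapped as shown below. This is a sketch: the merge_profiles and run helpers are hypothetical, the directory layout is an assumption, and the command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# merge_profiles DIR OUTPUT
# Hypothetical helper: merges all .profraw files found in DIR into OUTPUT.
merge_profiles() {
    dir=$1
    output=$2
    # Replace the positional parameters with the matching files.
    set -- "$dir"/*.profraw
    # An unmatched glob stays literal, so check that the first entry exists.
    if [ ! -e "$1" ]; then
        echo "no .profraw files in $dir" >&2
        return 1
    fi
    run llvm-profdata merge -output="$output" "$@"
}
```

For example, `merge_profiles ./pulled profile.profdata` prints (or, with DRY_RUN=0, runs) a single llvm-profdata merge command covering every raw profile in ./pulled.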

If multiple instrumented binaries/libraries are loaded during a benchmark, each library generates a separate .profraw file with a separate unique ID. Typically, all of these files can be merged into a single .profdata file and used for the PGO build. In cases where a library is exercised by another benchmark, that library must be optimized using profiles from both benchmarks, so you need to know which .profraw file belongs to which library. In this situation, the show option of llvm-profdata is useful:

  llvm-profdata merge -output=default_unique_id.profdata default_unique_id_0.profraw
  llvm-profdata show -all-functions default_unique_id.profdata

To map unique_ids to individual libraries, search the show output for each unique_id for a function name that is unique to the library.
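The search described above can be automated along these lines. This is a sketch: the find_profile_for_symbol helper and the PROFDATA override variable are hypothetical, and the symbol you search for must match the name as stored in the profile (typically the mangled name for C++ functions).

```shell
#!/bin/sh
# Tool to invoke; override PROFDATA to point at a specific llvm-profdata.
PROFDATA="${PROFDATA:-llvm-profdata}"

# find_profile_for_symbol DIR SYMBOL
# Hypothetical helper: converts each .profraw file in DIR to .profdata and
# prints the .profraw files whose function list contains SYMBOL.
find_profile_for_symbol() {
    dir=$1
    symbol=$2
    for raw in "$dir"/*.profraw; do
        # Skip the literal pattern when no file matches.
        [ -e "$raw" ] || continue
        indexed="${raw%.profraw}.profdata"
        "$PROFDATA" merge -output="$indexed" "$raw" || return 1
        # -F treats SYMBOL as a fixed string rather than a regular expression.
        if "$PROFDATA" show -all-functions "$indexed" | grep -q -F -- "$symbol"; then
            echo "$raw"
        fi
    done
}
```

Pick a function that exists in only one library; the file printed for that symbol is that library's profile.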

Case study: PGO for ART

The case study presents ART as a relatable example; however, it isn't an accurate description of the actual set of libraries profiled for ART or their interdependencies.

The dex2oat ahead-of-time compiler in ART depends on libart-compiler.so, which in turn depends on libart.so. The ART runtime is implemented mainly in libart.so. Benchmarks for the compiler and the runtime will be different:

| Benchmark | Profiled libraries |
| --- | --- |
| dex2oat | dex2oat (executable), libart-compiler.so, libart.so |
| art_runtime | libart.so |

  1. Add the following pgo property to dex2oat and libart-compiler.so:
        pgo: {
            instrumentation: true,
            benchmarks: ["dex2oat",],
            profile_file: "dex2oat.profdata",
        }
  2. Add the following pgo property to libart.so:
        pgo: {
            instrumentation: true,
            benchmarks: ["art_runtime", "dex2oat",],
            profile_file: "libart.profdata",
        }
  3. Create instrumented builds for the dex2oat and art_runtime benchmarks using:
        make ANDROID_PGO_INSTRUMENT=dex2oat
        make ANDROID_PGO_INSTRUMENT=art_runtime
  4. Alternatively, create a single instrumented build with all libraries instrumented using:

        make ANDROID_PGO_INSTRUMENT=dex2oat,art_runtime
        (or)
        make ANDROID_PGO_INSTRUMENT=ALL

    The second command builds all PGO-enabled modules for profiling.

  5. Run the benchmarks exercising dex2oat and art_runtime to obtain:
    • Three profiles from dex2oat (dex2oat_exe.profdata, dex2oat_libart-compiler.profdata, and dex2oat_libart.profdata), each converted from a .profraw file and identified using the method described in Handle LLVM profile files.
    • A single art_runtime_libart.profdata.
  6. Produce a common profdata file for dex2oat executable and libart-compiler.so using:
    llvm-profdata merge -output=dex2oat.profdata \
        dex2oat_exe.profdata dex2oat_libart-compiler.profdata
  7. Obtain the profile for libart.so by merging the profiles from the two benchmarks:
    llvm-profdata merge -output=libart.profdata \
        dex2oat_libart.profdata art_runtime_libart.profdata

    The raw counts for libart.so from the two profiles might be disparate because the benchmarks differ in the number of test cases and the duration for which they run. In this case, you can use a weighted merge:

    llvm-profdata merge -output=libart.profdata \
        -weighted-input=2,dex2oat_libart.profdata \
        -weighted-input=1,art_runtime_libart.profdata

    The above command assigns twice the weight to the profile from dex2oat. The actual weight should be determined based on domain knowledge or experimentation.

  8. Check the profile files dex2oat.profdata and libart.profdata into toolchain/pgo-profiles for use during build.
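Steps 6 and 7 of the case study can be combined into one parameterized helper so that different weights are easy to try. This is a sketch: the merge_art_profiles and run helpers are hypothetical, the input file names are the ones used in the case study, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# merge_art_profiles [DEX2OAT_WEIGHT] [RUNTIME_WEIGHT]
# Hypothetical helper: both weights default to 1 (an unweighted merge).
merge_art_profiles() {
    w_dex2oat=${1:-1}
    w_runtime=${2:-1}
    # Common profile for the dex2oat executable and libart-compiler.so.
    run llvm-profdata merge -output=dex2oat.profdata \
        dex2oat_exe.profdata dex2oat_libart-compiler.profdata || return 1
    # Profile for libart.so, weighting the two benchmarks.
    run llvm-profdata merge -output=libart.profdata \
        -weighted-input="${w_dex2oat},dex2oat_libart.profdata" \
        -weighted-input="${w_runtime},art_runtime_libart.profdata"
}
```

Calling merge_art_profiles 2 1 reproduces the weighted merge shown in step 7.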