Posted by Aditya Kumar – Software program Engineer
Context
Binary structure utilizing a logo order file (often known as binary order file or linker order file) is a well known link-time optimization. The linker makes use of the order of symbols so as file to put out symbols within the binary. Order file based mostly binary structure improves utility launch time in addition to different essential person journeys. Order file technology is often a multi-step course of the place builders use completely different instruments at each stage. We’re offering a unified set of instruments and documentation that can enable each native app developer to leverage this optimization. Each Android app builders and the AOSP group can profit from the instruments.
Background
Supply code is often structured to facilitate software program growth and comprehension. The structure of capabilities and variables in a binary can also be impacted by their relative ordering within the supply code. The binary structure impacts utility efficiency because the working system has no manner of understanding which symbols might be required in future and usually makes use of spatial locality as one of many price fashions for prefetching subsequent pages.
However the order of symbols in a binary could not replicate this system execution order. When an utility executes, fetching symbols that aren’t current in reminiscence would end in web page faults. For instance, think about the next program:
// Check.cpp
int foo() { /* */ } int bar() { /* */ } // Different capabilities... int predominant() { bar(); foo();}
Which will get compiled into:
# Check.app page_x: _foo page_y: _bar # Different symbols page_z:_main
When Check.app begins, its entrypoint _main is fetched first then _bar adopted by _foo. Executing Check.app can result in web page faults for fetching every operate. Evaluate this to the next binary structure the place all of the capabilities are situated in the identical web page (assuming the capabilities are sufficiently small).
# Check.app page_1: _main page_1: _bar page_1: _foo # Different symbols
On this case when _main will get fetched, _bar and _foo can get fetched within the reminiscence on the identical time. In case these symbols are giant and they’re situated in consecutive pages, there’s a excessive probability the working system could prefetch these pages leading to much less web page faults.
As a result of execution order of capabilities throughout an utility lifecycle could rely on varied components it’s not possible to have a novel order of symbols that’s best. Fortuitously, utility startup sequence is pretty deterministic and steady typically. And it’s also potential to construct a binary having a desired image order with the assistance of linkers like lld which is the default linker for Android NDK toolchain.
Order file is a textual content file containing a listing of symbols. The linker makes use of the order of symbols in order file to put out symbols within the binary. An order file having capabilities that get referred to as in the course of the app startup sequence can cut back web page faults leading to improved launch time. Order recordsdata can enhance the launch time of cellular functions by greater than 2%. The advantages of order recordsdata are extra significant on bigger apps and decrease finish gadgets. A extra mature order file technology system can enhance different essential person journeys.
Design
The order file technology includes the next steps
- Acquire app startup sequence utilizing compiler instrumentation method
- Use compiler instrumentation to report each operate invocation
- Run the instrumented binary to gather launch sequence in a (binary) profraw file
- Generate order file from the profraw recordsdata
- Validate order file
- Merge a number of order recordsdata into one
- Recompile the app with the merged order file
Overview
The order file technology relies on LLVM’s compiler instrumentation course of. LLVM has a stage to generate the order file then recompile the supply code utilizing the order file.
Acquire app startup sequence
The supply code is instrumented by passing -forder-file-instrumentation to the compiler. Moreover, the -orderfile-write-mapping flag can also be required for the compiler to generate a mapping file. The mapping file is generated throughout compilation and it’s used whereas processing the profraw file. The mapping file exhibits the mapping from MD5 hash to operate image (as proven under).
# Mapping file MD5 db956436e78dd5fa predominant MD5 83bff1e88ac48f32 _GLOBAL__sub_I_main.cpp MD5 c943255f95351375 _Z5mergePiiii MD5 d2d2238cf08db816 _Z9mergeSortPiii MD5 11ed18006e729e73 _Z4partPiii MD5 3e897b5ee8bebbd1 _Z9quickSortPiii
The profile (profraw file) is generated each time the instrumented utility is executed. The profile information within the profraw file incorporates the MD5 hash of the capabilities executed in chronological order. The profraw file doesn’t have duplicate entries as a result of every operate solely outputs its MD5 hash on first invocation. A typical run of binary containing the capabilities listed within the mapping file above can have the next profraw entries.
# Profraw file 00000000 32 8f c4 8a e8 f1 bf 83 fa d5 8d e7 36 64 95 db |2...........6d..| 00000010 16 b8 8d f0 8c 23 d2 d2 75 13 35 95 5f 25 43 c9 |.....#..u.5._%C.| 00000020 d1 bb be e8 5e 7b 89 3e 00 00 00 00 00 00 00 00 |....^{.>........| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
As a way to discover the operate names equivalent to the MD5 hashes in a profraw file, a corresponding mapping file is used.
Notice: The compiler instrumentation for order recordsdata (-forder-file-instrumentation) solely works when an optimization flag (01, 02, 03, 0s, 0z) is handed. So, if -O0 (compiler flag usually used for debug builds) is handed, the compiler won’t instrument the binary. In precept, one ought to use the identical optimization flag for instrumentation that’s utilized in transport launch binaries.
The Android NDK repository has scripts that automate the order file technology given a mapping file and an order file.
Recompiling with Order File
After you have an order file, you present the trail of the order file to the linker utilizing the –symbol-ordering-file flag.
Detailed design
Creating Order File Construct Property
The Android Open Supply Venture (AOSP) makes use of a construct system referred to as soong so we will leverage this construct system to move the flags as needed. The order file construct property has 4 predominant fields:
- instrumentation
- load_order_file
- order_file_path
- cflags
The cflags are supposed to add different needed flags (like mapping flags) throughout compilation. The load_order_file and order_file_path tells the construct system to recompile with the order file somewhat than set it to the profiling stage. The order recordsdata should be in saved in one in every of two paths:
- toolchain/pgo-profiles/orderfiles
- vendor/google_data/pgo_profile/orderfiles
# Profiling orderfile: { instrumentation: true, load_order_file: false, order_file_path: "", cflags: [ "-mllvm", "-orderfile-write-mapping=<filename>-mapping.txt", ], } #Recompiling with Order File orderfile: { instrumentation: true, load_order_file: true, order_file_path: "<filename>.orderfile", }
Creating order recordsdata
We offer a python script to create an order file from a mapping file and a profraw file. The script additionally permits eradicating a specific image or creating an order file till a specific image.
Script Flags:
- Profile file (–profile-file):
- Description: The profile file generated by operating a binary compiled with -forder-file-instrumentation
- Mapping file (–mapping-file):
- Description: The mapping file generated throughout compilation that maps MD5 hashes to image names
- Output file (–output):
- Description: The output file title for the order file. Default Identify: default.orderfile
- Deny Record (–denylist):
- Description: Symbols that you just need to exclude from the order file
- Final image (–last-symbol):
- Description: The order file will finish on the handed final image and ignore the symbols after it. In order for you an order file just for startup, it is best to move the final startup image. Final-symbol has precedence over leftover so we are going to output till the final image and ignore the leftover flag.
- Leftover symbols (–leftover):
- Description: Some symbols (capabilities) won’t have been executed so they won’t seem within the profile file. In order for you these symbols in your order file, you need to use this flag and it’ll add them on the finish.
Validating order recordsdata
As soon as we get an order file for a library or binary, we have to verify whether it is legitimate based mostly on a set of standards. Some order recordsdata is probably not of fine high quality so they’re higher discarded. This may occur resulting from a number of causes like utility terminated unexpectedly, the runtime couldn’t write the entire profraw file earlier than exiting, an undesired code-sequence was collected within the profile, and many others. To automate this course of, we offer a python script that may assist builders verify for:
- Partial order that must be within the order file
- Symbols that should be current so as file
- Symbols that shouldn’t be current so as file
- Minimal variety of symbols to make an order file
Script Flags:
- Order file (–order-file):
- Description: The order file you might be validating on the under standards.
- Partial Order (–partial):
- Description: A partial order of symbols that should be held within the order file.
- Allowed Lists (–allowlist):
- Description: Symbols that should be current within the order file.
- Denied Lists (–denylist):
- Description: Symbols that shouldn’t be within the order file. Denylist flag has precedence over allowlist.
- Minimal Variety of Entries (–min):
- Description: Minimal variety of symbols wanted for an order file
Merging order recordsdata
At the next stage, the order file symbols in a group of order recordsdata approximate a partial order (poset) of operate names with order outlined by time of execution. Throughout completely different runs of an utility, the order recordsdata might need variations. These variations might be resulting from OS, machine class, construct model, person configurations and many others. Nonetheless, the linker can solely take one order file to construct an utility. As a way to have one order file that gives the specified advantages, we have to merge these order recordsdata right into a single order file. The merging algorithm additionally must be environment friendly in order to not decelerate the construct time. There are non-linear clustering algorithms that will not scale effectively for merging giant numbers of order recordsdata, every having many symbols. We offer an environment friendly merging algorithm that converges rapidly. The algorithm permits for customizable parameters, such that builders can tune the result.
Merging N partial order units might be carried out both pessimistically (merging a choice of order recordsdata all the way in which till there may be one order file left) or optimistically (merging all of them without delay). The pessimistic method might be inefficient in addition to sub-optimal. Consequently, it’s higher to work with all N partial order units without delay. As a way to have an environment friendly implementation it helps to symbolize all N posets with a weighted directed Graph (V,E) the place:
- V: Components of partial order units (symbols) and the variety of occasions it seems in numerous partial order units. Notice that the frequency of vertices could also be larger than the sum of all incoming edges due to invocations from uninstrumented elements of binary, dependency injection and many others.
- E (V1 -> V2): An edge happens if the component of V2 instantly succeeds V1 in any partial order set with its weight being the variety of occasions this occurs.
For a binary executable, there may be one root (e.g., predominant) vertex, however shared libraries might need many roots based mostly on which capabilities are referred to as within the binary utilizing them. The graph will get difficult if the applying has threads as they incessantly end in cycles. To have a topological order, cycles are eliminated by preferring the very best chance path over others. A Depth-First traversal that selects the very best weighted edge serves the aim.
Eradicating Cycles:
- Mark again edges throughout a Depth-First traversal - For every Cycle (c): - Add the weights of all in-edges of every vertex (v) within the cycle excluding the sides within the cycle - Take away the cycle edge pointing **to** the vertex with highest sum
After cycles are eliminated, the identical depth first traversal offers a topological order (the order file) when all of the ahead edges are eliminated. Basically, the algorithm computes a minimum-spanning-tree of maximal weights and traverses the tree in topological order.
Producing an order:
printOrderUtil(G, n, order): - If n was visited: - return - Add n to the tip of order - Type all out edges based mostly on weight - For each out_edge (n, v): - printOrderUtil(G, v, order) printOrder(G): - Get all roots - order = [] - For every root r: - printOrderUtil(G, r, order) - return order
Instance:
Given the next order recordsdata:
- predominant -> b -> c -> d
- predominant -> a -> c
- predominant -> e -> f
- predominant -> b
- predominant -> b
- predominant -> c -> b
The graph to the correct is obtained by eradicating cycles.
- DFS: predominant -> b-> c -> b
- Again edge: c -> b
- Cycle: b -> c-> b
- Cycle edges: [b -> c, c -> b]
- b’s sum of in-edges is 3
- c’s sum of in-edges is 2
- This suggests b might be traversed from the next frequency edge, so c -> b is eliminated
- Ignore ahead edges a->c, main->c
- The DFS of the acyclic graph on the correct will produce an order file predominant -> b -> c -> d -> a -> e -> f after ignoring the ahead edges.
Amassing order recordsdata for Android Apps (Java, Kotlin)
The order file instrumentation and profile information assortment is simply enabled for C/C++ functions. Consequently, it can not profit Java or Kotlin functions. Nonetheless, Android apps that ship compiled C/C++ libraries can profit from order file.
To generate order file for libraries which might be utilized by Java/Kotlin functions, we have to invoke the runtime strategies (referred to as as a part of order file instrumentation) on the proper locations. There are three capabilities that customers should name:
- __llvm_profile_set_filename(char *f): Set the title of the file the place profraw information might be dumped.
- __llvm_profile_initialize_file: Initialize the file set by __llvm_profile_set_filename
- __llvm_orderfile_dump: Dumps the profile(order file information) collected whereas operating instrumented binary
Equally, the compiler and linker flags must be added to construct configurations. We offer template construct system recordsdata e.g, CMakeLists.txt to compile with the proper flags and add a operate to dump the order recordsdata when the Java/Kotlin utility calls it.
# CMakeLists.txt set(GENERATE_PROFILES ON) #set(USE_PROFILE "${CMAKE_SOURCE_DIR}/demo.orderfile") add_library(orderfiledemo SHARED orderfile.cpp) target_link_libraries(orderfiledemo log) if(GENERATE_PROFILES) # Producing profiles require any optimization flag apart from -O0. # The mapping file will not generate and the profile instrumentation does not work with out an optimization flag. target_compile_options( orderfiledemo PRIVATE -forder-file-instrumentation -O2 -mllvm -orderfile-write-mapping=mapping.txt ) target_link_options( orderfiledemo PRIVATE -forder-file-instrumentation ) target_compile_definitions(orderfiledemo PRIVATE GENERATE_PROFILES) elseif(USE_PROFILE) target_compile_options( orderfiledemo PRIVATE -Wl,--image-ordering-file=${USE_PROFILE} -Wl,--no-warn-image-ordering ) target_link_options( orderfiledemo PRIVATE -Wl,--image-ordering-file=${USE_PROFILE} -Wl,--no-warn-image-ordering ) endif()
We additionally present a pattern app to dump order recordsdata from a Kotlin utility. The pattern app creates a shared library referred to as “orderfiledemo” and invokes the DumpProfileDataIfNeeded operate to dump the order file. This library might be taken out of this pattern app and might be repurposed for different functions.
// Order File Library #if outlined(GENERATE_PROFILES) extern "C" int __llvm_profile_set_filename(const char *); extern "C" int __llvm_profile_initialize_file(void); extern "C" int __llvm_orderfile_dump(void); #endif void DumpProfileDataIfNeeded(const char *temp_dir) { #if outlined(GENERATE_PROFILES) char profile_location[PATH_MAX] = {}; snprintf(profile_location, sizeof(profile_location), "%s/demo.output", temp_dir); __llvm_profile_set_filename(profile_location); __llvm_profile_initialize_file(); __llvm_orderfile_dump(); __android_log_print(ANDROID_LOG_DEBUG, kLogTag, "Wrote profile information to %s", profile_location); #else __android_log_print(ANDROID_LOG_DEBUG, kLogTag, "Didn't write profile information as a result of the app was not " "constructed for profile technology"); #endif } extern "C" JNIEXPORT void JNICALL Java_com_example_orderfiledemo_MainActivity_runWorkload(JNIEnv *env, jobject /* this */, jstring temp_dir) { DumpProfileDataIfNeeded(env->GetStringUTFChars(temp_dir, 0)); }
# Kotlin Utility class MainActivity : AppCompatActivity() { personal lateinit var binding: ActivityMainBinding override enjoyable onCreate(savedInstanceState: Bundle?) { tremendous.onCreate(savedInstanceState) binding = ActivityMainBinding.inflate(layoutInflater) setContentView(binding.root) runWorkload(applicationContext.cacheDir.toString()) binding.sampleText.textual content = "Good day, world!" } /** * A local technique that's carried out by the 'orderfiledemo' native library, * which is packaged with this utility. */ exterior enjoyable runWorkload(tempDir: String) companion object { // Used to load the 'orderfiledemo' library on utility startup. init { System.loadLibrary("orderfiledemo") } } }
Limitation
order file technology solely works for native binaries. The validation and merging scripts will work for any set of order recordsdata.