1 Introduction

Collecting performance data to examine the run-time behavior of a program is essential for identifying regions in the code that benefit most from optimization or parallelization [14]. Traditionally, this data is collected using either sampling or instrumentation techniques. For use cases that require a more in-depth analysis, such as the creation of performance models [4, 5] for specific functions, accurate measurements are essential. Hence, instrumentation is better suited, as it guarantees that every function invocation is recorded accurately.

However, instrumenting all functions in a program typically generates a large overhead, which can increase the execution time by orders of magnitude [18]. This is in large part caused by frequently-called, short-running functions. Additionally, the insertion of measurement hooks can prohibit optimization in some cases [25].

For this reason, a filtering approach is typically necessary to instrument only those functions which are most relevant w.r.t. a user-defined metric, e.g. execution time. Excluding all other functions reduces the total number of calls to the measurement tool and, thus, the execution overhead. We refer to the set of instrumented functions as the instrumentation configuration (IC).

The simplest way to create a suitable IC is to define filter lists manually. The typical workflow involves first profiling a fully-instrumented version of the code. Subsequently, the user examines the resulting profile and selects the functions that should be excluded from the measurement. The drawback of this approach is that the user has to select the functions to instrument by hand, which may require multiple iterations of compiling the code, executing it to generate a profile, and refining the IC. Hence, different tools to automate the selection process have been proposed; they mainly differ in whether they use runtime data or rely on source-code features to determine a suitable IC. Unfortunately, the application of current compiler-assisted static selection tools is tedious and error-prone, despite their general advantages in expressiveness and overhead reduction.

In this paper, we focus on the composable instrumentation selection mechanism introduced by the InstRO [10] project. In the context of the exaFOAM project, we investigated its applicability for the instrumentation of the computational fluid dynamics (CFD) framework OpenFOAM [26]. However, due to the scale and structure of OpenFOAM, we found that the current implementation of InstRO is not suited to this task.

We present the Compiler-assisted Performance Instrumentation (CaPI) tool, which adopts ideas from InstRO and makes them applicable for the selective instrumentation of large-scale codes. We make the following contributions: (1) Present a new instrumentation tool based on key principles of InstRO. (2) Demonstrate its application on large-scale scientific software and identify specific usability and validation impediments. (3) Identify key challenges for improving CaPI specifically, as well as compiler-assisted selection tools in general.

The paper is structured as follows: Sect. 2 gives an overview of related work. Section 3 explains particularities of OpenFOAM and how they stress limitations of InstRO. Section 4 presents the CaPI toolchain to address these limitations. Thereafter, CaPI is evaluated on OpenFOAM in Sect. 5. Usability and validation impediments are highlighted in Sect. 6. The results are subsequently discussed in Sect. 7. Finally, Sect. 8 summarizes the paper and gives a brief outline on how remaining challenges may be addressed.

2 Related Work

Several tools have been developed to help automate the process of constructing ICs for performance measurements, or reduce the overhead by filtering runtime events. Their function selection methods can be divided into three categories, for which we list some representative tools.

  • Profile-guided selection uses previously recorded profile data to determine which functions to exclude or include in a subsequent measurement. An example is the scorep-score utility of the Score-P measurement infrastructure [12]. It enables the user to define a set of threshold values for, e.g., execution time per invocation, which need to be exceeded by a function to be considered for instrumentation. PerfTaint [6] applies a taint analysis to determine which parts of the application depend on a given set of input parameters, and only instruments dependent functions, as all others are considered to have constant runtime w.r.t. the set of input parameters.

  • Compiler-assisted selection tools aim to semi-automatically determine a suitable IC with the help of static code analysis. Tau [25] enables the selective instrumentation of functions via the use of its intermediate representation called PDT [19]. Cobi [21] requires the user to specify which points in a program to instrument in an XML-based format. It relies on binary instrumentation using the DynInst API [3], and, since it operates at the binary level, ignores C++ virtual functions or function pointers for any path analysis. The InstRO project [10] gives the user the ability to define selection passes that filter out functions based on statically collected information. Notably, a static call graph (CG) is generated that gives information about the call context of the respective function. This information can be used to decide if the function is relevant for overall performance.

  • Hybrid selection tools combine profile- and static data for the creation of IC files. PIRA [16] employs a static statement aggregation scheme [11] to estimate the amount of work per function for an initial IC. Subsequently, the IC is iteratively refined using profile information or empirically constructed performance models [15]. X-Ray [1] instrumentation uses instruction-level heuristics to estimate if a function should be instrumented, and, if so, inserts no-op sleds into the binary. At runtime, the sleds can be patched to enable or disable the recording of events, which may also be filtered based on their occurrence or available memory.

3 Tailored Instrumentation for OpenFOAM

While the utility of compiler-assisted selection tools has been successfully demonstrated on smaller applications, large scientific codes pose particular challenges.

OpenFOAM, a modular CFD framework, is a prime example of such a code. It comprises a multitude of individual solvers and is applicable to a wide variety of problems. OpenFOAM v2106 [22] consists of over 5000 C++ source files and ≈1.2 million lines of code (counted with cloc [7]).

Its philosophy centers around an extendable toolbox for physics simulation. Hence, OpenFOAM provides many libraries that implement different solver algorithms, preconditioners, and other utilities required to develop simulation software. These libraries are employed in various solvers for specific use cases and physical phenomena, e.g., multi-phase flows or fluid-structure interaction, requiring a high degree of flexibility and configurability in the code base. A further, very particular property of OpenFOAM is its use of the project-specific build system wmake. Build systems, particularly custom and niche ones, commonly pose challenges [8], e.g., when maintaining multiple configurations, and can significantly complicate the application of static analysis and instrumentation tools.

The following section outlines how these features of OpenFOAM make the application of the existing InstRO tool impractical.

3.1 Design and Limitations of InstRO

InstRO provides a configurable set of passes, which can be combined by the user to perform customized source-to-source code transformations on selected code regions. Passes can be divided into three categories: Selectors select code regions for instrumentation based on code features. Transformers perform necessary source code transformations, e.g., to canonicalize certain constructs for instrumentation. Finally, Adapters implement the actual instrumentation of the code. Figure 1 provides an example of how passes may be combined for selective instrumentation of functions related to MPI [20] usage.

Fig. 1. Example InstRO pass pipeline, adapted from [10]. Two selector passes select the MPI-related functions and the set of all functions, respectively. A call-path pass identifies the paths between the functions selected. A filtering pass removes functions that match either the previously selected exclude set or a name filter (functions matching a certain regular expression). Finally, an adapter pass inserts the instrumentation hooks.

This abstract pass design makes InstRO highly configurable, and, together with its whole-program analysis, a powerful instrumentation tool. Moreover, the layered design of InstRO makes many parts of the tool—theoretically at least—independent of the compiler technology used underneath. However, most of InstRO’s features have been implemented on top of the ROSE source-to-source translator. A Clang-based implementation exists, but provides, in comparison, only limited functionality.

For the application to OpenFOAM, both versions proved unsuitable. The main issue is the need for a global CG analysis in order to enable the selection of specific call-paths. In the ROSE implementation, this requires the parsing and merging of all 5000 source files at once, which is impractical due to time and memory constraints. The Clang implementation lacks global CG analysis capabilities altogether.

To overcome this obstacle, we developed the new CaPI tool, which is based on the InstRO paradigms but suitable for application to large-scale codes. We demonstrate its capabilities on OpenFOAM and construct a low-overhead IC that focuses on analyzing functions that use MPI communication.

4 The CaPI Instrumentation Toolchain

In this section, the CaPI workflow and its implementation are introduced and explained in further detail.

We reworked the InstRO toolchain in order to make it applicable to the OpenFOAM use case. Most notably, we switched from a source-to-source transformation to a more flexible compiler instrumentation approach. This necessitated moving from the abstract pass formulation to a more concrete workflow comprising analysis, selection, and instrumentation steps. CaPI employs MetaCG [17] for global CG analysis, which was developed for a similar purpose in the automatic instrumentation refinement tool PIRA [16]. We use a custom domain-specific language (DSL) to implement the user-defined selection mechanism, designed with a focus on ease-of-use and conciseness.

4.1 Instrumentation Workflow

The toolchain consists of two main phases: In the analysis and selection phase the code is analyzed statically and relevant code regions are selected for instrumentation. We employ a stand-alone selection tool to process the collected data and generate the IC. The final instrumentation step is implemented using a custom LLVM [13] optimizer plugin. During compilation, hooks are inserted into the selected functions to interface with the measurement library. These steps are illustrated in Fig. 2.

Fig. 2. Our instrumentation toolchain consists of these steps: (1) Preparation of the target code's build system, if required. (2) Generation of a compilation database for Clang-based tools. (3) Translation-unit-local CG construction using the MetaCG workflow. (4) Whole-program CG construction by manually combining the CGs of the relevant source files. (5) Definition of the selection specification. (6) Execution of the CaPI analysis to create the IC. (7) Compilation of the target code with IC instrumentation.
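
Step (2) refers to the standard compilation database (compile_commands.json) consumed by Clang-based tooling, which records one compiler invocation per translation unit. A minimal example with placeholder paths:

    [
      {
        "directory": "/home/user/OpenFOAM/build",
        "command": "clang++ -O2 -Iinclude -c icoFoam.C -o icoFoam.o",
        "file": "icoFoam.C"
      }
    ]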

4.2 Implementation

The implementation distinguishes between the selection phase, which is implemented in a stand-alone tool, and the compilation phase, in which an LLVM plugin is used to insert the instrumentation hooks. We provide a more detailed explanation of how the selection is implemented and how different selection passes are combined. Thereafter, we briefly explain the compilation phase.

Analysis and Selection. The selection is applied to the whole-program CG representation provided by MetaCG. Hence, selectors can match function names, or structural properties of functions within the CG. The whole-program view enables the selectors to maintain full context information for the functions selected, when desired.

One of the fundamental paradigms of InstRO is the composability of its selector modules. We realize this composability via a lightweight DSL. This DSL enables the user to easily instantiate a nested sequence of parameterized selectors. We found that, compared to an alternative XML or JSON based format, this approach results in a much more concise and comprehensible specification. A simplified grammar definition is shown in Fig. 3.

Fig. 3. BNF grammar of the CaPI DSL. Some nonterminals related to the parsing of literals have been omitted for brevity. The full, up-to-date grammar is available in the project repository (https://github.com/tudasc/CaPI).

A selection specification consists of a sequence of selector definitions, which may be named or anonymous. The last of these definitions serves as the entry point to the selection pipeline. Each definition starts with the name of the selector module, followed by a list of arguments enclosed in parentheses. Aside from basic data types, i.e., strings, booleans, integers, and floating-point numbers, selector modules may accept other selector definitions as input. These can be defined in-place or passed as a reference to a previously defined (named) selector instance. Such references consist of a dedicated reference marker followed by the identifier of the named selector. In addition, a pre-defined reference corresponds to the set of all functions.

Listing 1 shows an example of a call-path selection pipeline that instruments functions on paths to MPI calls.
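
As the listing is not reproduced here, the following minimal sketch illustrates the shape of such a specification; the selector names byName and onCallPathTo as well as the %-style reference syntax are assumptions based on the grammar in Fig. 3 and the project repository, not a verbatim copy of Listing 1:

    mpi = byName("MPI_.*", %%)
    onCallPathTo(%mpi)

The named selector mpi matches all MPI functions by name, starting from the pre-defined reference to the set of all functions. The final, anonymous definition selects every function on a call path to this set and, as the last definition, serves as the entry point of the pipeline.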

The user can choose from a set of predefined selectors that can be customized for the specific use case. The following selectors are currently available:

  • Include/exclude lists: Select functions by name based on regular expressions.

  • Specifier selection: Select functions w.r.t. specifiers, e.g., the inline keyword.

  • Call-path selection: Select all functions that are in the call chain below or above a previously selected function.

  • Unresolved call selection: Select functions that contain calls via function pointers, which may not be statically resolvable.

  • Set operations: Merge selection sets using basic operations such as union, intersection and complement.

The selection pipeline is applied to all functions in the CG, resulting in the final IC file. This file consists of the list of functions to be instrumented and is compatible with the Score-P filter file format. Hence, Score-P can be used as an alternative to our compiler plugin for the instrumentation step.
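
For illustration, an IC expressed in the Score-P filter file format might look as follows; the region names are placeholders, and the initial exclude-all rule turns the file into a pure include list of the kind CaPI generates:

    SCOREP_REGION_NAMES_BEGIN
      EXCLUDE *
      INCLUDE MPI_*
      INCLUDE Foam::fvMatrix*
    SCOREP_REGION_NAMES_END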

Compilation. We use the Clang/LLVM compiler toolchain to build the target code and perform the instrumentation. A custom LLVM plugin reads the IC file and identifies all functions in the current translation unit that are contained in the IC. These functions are then marked with LLVM function instrumentation attributes. Subsequently, the instrumentation attributes are consumed by the existing post-inline LLVM pass and the measurement hooks are inserted accordingly. We apply the instrumentation after inlining in order to pre-emptively reduce instrumentation overhead. The enter and exit hooks conform to the GNU profiling interface, which GCC-compatible compilers use for function instrumentation via the -finstrument-functions flag [9].
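
The following sketch outlines the attribute-marking step; the surrounding plugin boilerplate and the set ic holding the IC entries are assumptions, while the attribute names are those consumed by LLVM's post-inline EntryExitInstrumenter pass:

    #include <set>
    #include <string>
    #include "llvm/IR/Module.h"

    // Mark every IC function in the module for post-inline instrumentation.
    // `ic` is assumed to hold the function names read from the IC file.
    void markICFunctions(llvm::Module &M, const std::set<std::string> &ic) {
      for (llvm::Function &F : M) {
        if (F.isDeclaration() || !ic.count(F.getName().str()))
          continue;
        // Consumed after inlining; triggers insertion of the profiling hooks.
        F.addFnAttr("instrument-function-entry-inlined", "__cyg_profile_func_enter");
        F.addFnAttr("instrument-function-exit-inlined", "__cyg_profile_func_exit");
      }
    }

On the measurement side, any library that implements the two GNU profiling hooks can consume the generated events. A self-contained toy consumer, useful for verifying that the instrumentation fires:

    #include <cstdio>

    // Toy measurement library for the GNU profiling interface. The attribute
    // prevents the hooks from being instrumented themselves.
    extern "C" {
    __attribute__((no_instrument_function))
    void __cyg_profile_func_enter(void *fn, void *callSite) {
      std::fprintf(stderr, "enter %p (from %p)\n", fn, callSite);
    }

    __attribute__((no_instrument_function))
    void __cyg_profile_func_exit(void *fn, void *callSite) {
      std::fprintf(stderr, "exit  %p\n", fn);
    }
    }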

4.3 Score-P Integration

In principle, CaPI is compatible with any measurement tool that supports the GNU profiling interface. Our main target, however, is the Score-P measurement infrastructure, which is commonly available in HPC environments. While Score-P supports the GNU profiling interface in addition to defining its own measurement API, the GNU version is limited to recording only statically linked functions. This is because only symbols with statically known addresses are collected from the main executable. As a result, the function names corresponding to calls into shared libraries cannot be identified and are thus ignored in the measurement.

We have developed the Score-P symbol injector library to identify and register these missing symbols [24]. Linked into the instrumented executable, it queries the /proc/self/maps pseudo-file at start-up to obtain information about the memory mapping of the loaded shared libraries. Each of these libraries is then analyzed with nm. Using the previously-collected information, each symbol is mapped to its address in the running program. Functions that are included in the IC are then registered in Score-P’s internal address-resolution hash map.
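
A simplified sketch of the start-up step is shown below; it only demonstrates how the base address of each executable mapping can be obtained, whereas the actual library additionally parses the nm output and registers the matching IC symbols with Score-P:

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    // Print the base address and path of every executable file-backed
    // mapping of the running process, as read from /proc/self/maps.
    int main() {
      std::ifstream maps("/proc/self/maps");
      std::string line;
      while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        fields >> range >> perms >> offset >> dev >> inode >> path;
        if (perms.find('x') == std::string::npos || path.empty())
          continue;  // keep only executable mappings backed by a file
        // For shared libraries, nm reports symbol values relative to this base.
        std::uint64_t base =
            std::stoull(range.substr(0, range.find('-')), nullptr, 16);
        std::cout << std::hex << base << ' ' << path << '\n';
      }
    }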

Listing 1. Selection specification for instrumenting call paths to MPI communication.

5 Evaluation on OpenFOAM

In this section, we demonstrate the presented CaPI toolchain on OpenFOAM and examine the obtained measurement results.

We evaluated the ICs with two OpenFOAM test cases: 3-D lid-driven cavity (cavity), a well-known benchmark problem for incompressible flow [2], and HPC_Motorbike (motorbike), a simulation of the flow around a motorbike model [23]. The executables applied in the main solve phase are icoFoam and simpleFoam, respectively. We measured the execution time for the Score-P profiling mode on a single node of the Lichtenberg 2 cluster, running with 4 MPI processes.

The compatibility of CaPI with the Score-P filter file format enables the comparison of various combinations of the available selection and instrumentation methods. This is illustrated in Fig. 4.

Fig. 4. Interoperability of Score-P and CaPI selection and instrumentation methods. The IC generated by CaPI or scorep-score can be combined with CaPI's Clang-based instrumenter or the GCC-based Score-P instrumenter. Note that using the GNU interface requires linking the symbol injector library to record calls to shared libraries.

The full specification of the evaluated variants is shown in Table 1. All instrumented variants rely on Score-P's compile-time filtering method, using an IC generated by either scorep-score or CaPI. The scorep-full variant corresponds to Score-P's default full instrumentation, which does not perform any explicit filtering but excludes all functions declared as inline. The hybrid variant combines both selection methods by performing additional runtime filtering. All variants were compiled with -O2 optimization.

Table 1. Build configurations used in the evaluation.

For the scorep-score IC, we filtered out all functions that are called at least a million times and take less than 10 µs to execute. This yielded filter files that exclude 17 functions for cavity and 38 functions for motorbike; these functions are responsible for the majority of the overhead.

For the CaPI variants, we used the selection specification shown in Listing 1, which selects all call paths performing MPI communication. Additionally, we filtered out functions defined in files from a directory that contains mostly code related to I/O operations, as well as functions specified as inline.

We manually validated these ICs by comparing the resulting profiles with the results from scorep-full. Both profiles represented the behavior of the program accurately and preserved the call paths comprising hot spots.

Figure 5 shows the execution time measured for each variant. For both benchmarks, vanilla-gcc performed slightly better than vanilla-clang. For cavity, however, this difference is minuscule.

Compared to vanilla-gcc, the unfiltered instrumentation scorep-full produced only 8% overhead for cavity, but 135% for motorbike. Using the profile-guided filter variant scorep-filt reduced the overhead significantly, to 3% for cavity and 44% for motorbike. The capi-gnu variant, however, was slower than scorep-filt in both cases. This is in part due to the initial look-up and registration of the shared library symbols. This step is quite time-consuming because the CaPI-generated IC consists of an include list of about 110k entries, which have to be cross-checked against the discovered symbols. In the capi-scorep variant, the performance penalty due to the initialization overhead is eliminated, thus showing better results in both cases. The remaining discrepancy in execution time between capi-gnu and capi-scorep is likely due to the differences in compilers and the Score-P measurement API.

The hybrid variant showed the most promising results. For cavity, it reduces the instrumentation overhead to below 1%. Similarly, hybrid yielded the overall best results for motorbike with an overhead of 30% compared to vanilla-gcc.

Fig. 5. Mean execution time of the instrumentation variants for the cavity and motorbike benchmarks over 5 runs. The total time is split into contributions from initialization and the subsequent execution. The error bars indicate the standard deviation. Note that the lower limits of the y-axes have been adjusted for better visibility.

6 Usability and Validation Impediments

In this section, we highlight some of the usability impediments that we had to overcome in the instrumentation of OpenFOAM.

As mentioned earlier, dealing with the particularities of uncommon build systems can be cumbersome and tedious. As such, OpenFOAM's wmake made certain aspects of the tool application more difficult. We do not consider it a separate issue in this list. Nonetheless, it should be noted that the chosen build system heavily influences the ease-of-use of any instrumentation workflow.

Whole-Program CG. The generation of the whole-program CG is the most time-consuming part of our toolchain, and took several hours for OpenFOAM. The main difficulty, however, lies in setting up the analysis correctly. It has to be executed as a preprocessing step and is therefore not easily applied via the build system. This makes it difficult to identify which source files should be included.

For the initial local CG analysis, it is sufficient to search the code base for C++ files. The subsequent merging into a whole-program CG, however, requires additional care. OpenFOAM builds a large number of individual solver executables. Merging them all together is not sensible, as their behavior varies significantly. Hence, to generate the CG for each solver, we first merge all local CGs of the OpenFOAM libraries into a large library CG. We then identify the source files specific to the solver and merge the corresponding CGs with the library CG.

In general, this requires the user to have detailed knowledge about the build process of the target application. In its current form, the setup of the CG analysis therefore constitutes a significant barrier.

Limitations of Static Analysis. Due to the inherent limitations of static analysis, some call paths cannot be correctly identified by MetaCG. The resulting CG is therefore not guaranteed to be complete. A common reason for missed call edges is the use of function pointers [17]. For OpenFOAM, this played a minor role. In general, however, we cannot guarantee that there are no other issues that lead to missed calls, e.g., due to bugs in the analysis or misconfigured selection specifications. Unfortunately, there is no direct way to reliably check that a recorded profile is complete. Hence, it is the responsibility of the user to manually verify that no major parts of the code are missing.

To mitigate the issue, MetaCG provides a tool that compares the statically constructed CG with one constructed from a full-instrumentation profile and adds missing edges. This approach, however, introduces additional steps into the instrumentation workflow and requires a fully-instrumented build of the target. Furthermore, the resulting CG is only valid for the specific program inputs used to generate the profile. In order to guarantee completeness, this validation step must be repeated every time the program calling behavior changes based on inputs. For large code bases, this is impractical.

Managing Multiple Configurations. In the use case of OpenFOAM, it is sensible to create separate ICs for different solvers, as they may use completely different parts of the main library. As the instrumentation of the selected functions happens at compile-time, every new IC requires a rebuild of the program. Moreover, for multiple, different ICs, a separate build folder per IC is required.

This is especially tedious in OpenFOAM because the build system is designed to have only one build for each compiler configuration. Maintaining multiple instrumented builds is doable, but requires tedious configuration work. In addition, the user needs to keep track of the purpose of each build and document the configuration steps. If this is done poorly, the wrong build may be used, potentially leading to incomplete profiling data.

Furthermore, having multiple builds of a large program can waste significant amounts of disk space, despite the binaries being virtually identical.

In order to avoid these issues altogether, Score-P provides an option for run-time filtering. Using this method, all functions are initially instrumented. At run time, the entry/exit hooks are still called, but measurements are only recorded for functions that pass the filter. As a result, the overhead is generally larger than with compile-time filtering, which may lead to skewed measurements. This is especially apparent with our toolchain, which generates a filter list containing ≈29k entries for the cavity case. We observed a significant increase in overhead using run-time filtering with this CaPI-generated filter, compared to the compile-time filtering method.
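
For reference, enabling Score-P's run-time filtering only requires pointing the measurement system to the filter file via an environment variable; the file name below is a placeholder:

    export SCOREP_FILTERING_FILE=capi_filter.scorep
    mpirun -np 4 ./icoFoam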

7 Discussion

We have demonstrated that our tool is capable of generating instrumentation configurations for large-scale codes. The results show that a hybrid approach, which combines the tailored CaPI selection with run-time filtering to remove remaining high-overhead functions, proved to be especially effective in mitigating the overhead, while preserving relevant call paths. This demonstrates that the compiler-assisted instrumentation workflow is in principle feasible to apply and beneficial w.r.t. overhead reduction.

In practice, however, the application to OpenFOAM proved to be quite time-consuming and required a good understanding of the code base and build system. We therefore conclude that, for most cases, the use of existing profile-guided filtering techniques with manual adjustments is preferable, as they require far less configuration overhead. The issues we identified apply in large part to other compiler-assisted instrumentation tools that rely on prior static analysis. This relates to PIRA in particular, which uses the same CG analysis workflow. In order for compiler-assisted instrumentation tools to be a viable alternative, the following key challenges must be addressed:

Simplification of the Analysis Workflow: The global static CG analysis is a requirement for the presented selection techniques. Currently, this step is very time-consuming. In order to simplify the workflow, the manual setup must be reduced by providing better integration into the compilation process.

Management of Build Configurations: Different instrumented versions of a code currently require maintaining multiple program builds. Instrumentation tools should aid in organizing and identifying them. Ideally, the need for separate builds should be eliminated altogether by providing an alternative run-time adaptation method that introduces little overhead.

Detection of Missed Calls: Currently, the user is unable to tell if function calls are missing due to limitations in the static analysis. A manual comparison with a complete instrumentation of the same program is possible, but requires extra steps that have to be repeated for every input configuration. Ideally, the static analysis phase should detect situations where such problems might occur and insert run-time checks to detect missed calls.

8 Conclusion and Future Work

Fig. 6. Envisioned workflow with embedded CG: The CG analysis is performed as link-time optimization (LTO) on all object files of a shared library or executable and the CG is embedded into it. At run time, the CaPI runtime library queries the objects for their respective CGs and merges them to construct the whole-program CG.

We presented the Compiler-assisted Performance Instrumentation tool for user-defined selective program instrumentation. CaPI was demonstrated by creating tailored instrumentation for the CFD framework OpenFOAM. Our evaluation showed that a hybrid selection approach, comprised of static selection and run-time filtering, is effective in eliminating overhead. However, the amount of required manual work for CaPI is undesirable. Hence, we identified key areas for improvement to make such techniques more accessible.

Currently, the biggest usability issue for CaPI and similar tools is the requirement for a separate analysis phase. This issue could be mitigated by shifting the whole-program CG construction to link-time and embedding the CG into the generated binary, as illustrated in Fig. 6. In this proposed toolchain, a suitable dynamic instrumentation method enables the selection and instrumentation steps at program start. This opens up opportunities for dynamic instrumentation refinement based on collected run-time information, as employed by PIRA, without the need to rebuild the program. In addition, the availability of the CG at run-time would enable the assessment of the IC’s completeness. Further work is required to assess the feasibility of this approach.

CaPI is available at https://github.com/tudasc/CaPI under the BSD 3-Clause license.