Track 1: Data collection through the whole hardware/software stack
Abstract:
Track 1 will study the tracing information available at the different levels of the complete hardware and software stack, in order to ensure that all the needed information can be extracted.
Challenges:
The foundation for monitoring tools is low-disturbance data collection through the whole software and hardware stack. LTTng has an infrastructure in place to efficiently collect tracing data from tracepoints inserted statically or dynamically in the operating system, in applications, and even in applications running on bare metal. In addition, hardware tracing support is now available in most general-purpose central processing unit architectures such as ARM, Freescale QorIQ, and Intel x86; it often provides the lowest overhead and good scalability.
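As an illustration of static user-space instrumentation, the following is a minimal sketch of an LTTng-UST tracepoint provider. The provider name (track1_app), event name, and fields are hypothetical placeholders; only the TRACEPOINT_EVENT machinery is LTTng's.

```c
/* track1_tp.h -- minimal LTTng-UST tracepoint provider (sketch).
 * Provider and event names are illustrative placeholders. */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER track1_app

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./track1_tp.h"

#if !defined(_TRACK1_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TRACK1_TP_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
    track1_app,                      /* provider name */
    request_done,                    /* event name */
    TP_ARGS(int, req_id, long, latency_ns),
    TP_FIELDS(
        ctf_integer(int, req_id, req_id)
        ctf_integer(long, latency_ns, latency_ns)
    )
)

#endif /* _TRACK1_TP_H */

#include <lttng/tracepoint-event.h>
```

The instrumented application then emits events with tracepoint(track1_app, request_done, id, latency); when no tracing session is recording, the tracepoint reduces to a cheap branch, which is what keeps the disturbance low.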
The problem, however, is that newer specialised co-processors have not reached the same architectural maturity and offer less hardware support for tracing and profiling. GPGPUs do contain performance counters and limited hardware tracing support, but these are typically undocumented and accessible only in a limited way through closed-source libraries and tools. This is changing with AMD opening up a large portion of its software stack through the GPUOpen initiative. Similarly, new sources of tracing data will be needed for other co-processors such as the Adapteva Epiphany V or the Google Tensor Processing Unit. The same holds true for telecom and networking equipment such as switches, where packet co-processors are coupled with general-purpose central processing units. Tracing these packet co-processors is required for tracking down difficult performance or logic bugs.
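To make these limited vendor interfaces concrete, the sketch below pulls GPU kernel timing records through NVIDIA's CUPTI activity API, one of the closed-source-adjacent libraries alluded to above. The kernel record structure version (CUpti_ActivityKernel4 here) varies across CUDA releases, and error handling is omitted for brevity; this is a sketch, not a definitive implementation.

```c
/* Sketch: collecting GPU kernel activity records via CUPTI. */
#include <stdio.h>
#include <stdlib.h>
#include <cupti.h>

#define BUF_SIZE (64 * 1024)

static void CUPTIAPI buffer_requested(uint8_t **buffer, size_t *size,
                                      size_t *max_num_records)
{
    *buffer = malloc(BUF_SIZE);   /* CUPTI fills this with activity records */
    *size = BUF_SIZE;
    *max_num_records = 0;         /* 0: as many records as fit */
}

static void CUPTIAPI buffer_completed(CUcontext ctx, uint32_t stream_id,
                                      uint8_t *buffer, size_t size,
                                      size_t valid_size)
{
    CUpti_Activity *record = NULL;

    /* Walk every record CUPTI wrote into the buffer. */
    while (cuptiActivityGetNextRecord(buffer, valid_size, &record) ==
           CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_KERNEL) {
            CUpti_ActivityKernel4 *k = (CUpti_ActivityKernel4 *)record;
            printf("kernel %s: start=%llu ns end=%llu ns\n",
                   k->name, (unsigned long long)k->start,
                   (unsigned long long)k->end);
        }
    }
    free(buffer);
}

int main(void)
{
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_KERNEL);
    cuptiActivityRegisterCallbacks(buffer_requested, buffer_completed);
    /* ... launch CUDA work here; records arrive asynchronously ... */
    cuptiActivityFlushAll(0);
    return 0;
}
```

Note that the start and end timestamps above come from the GPU's own clock, which is precisely why the correlation and synchronisation problems discussed next arise.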
The main difficulty, when many diverse sources of tracing data are available, is to properly correlate the events from the different sources that correspond to interactions. As a first step, time synchronisation is required. Efficient algorithms have been developed for this purpose, but they need to be adapted to the different types of interactions, since these serve as reference points for the synchronisation. It is even more difficult to follow all the links between the events in the different sources and layers. To compute the critical path of a given task, sufficient information must be available. We have obtained very promising results for many applications interacting through standard system calls; however, for interactions between the central processing unit and co-processors such as GPGPUs, this information is much more difficult to obtain.
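As the simplest possible baseline for such synchronisation (the efficient algorithms mentioned above compute tighter bounds from the interaction events themselves), the sketch below fits a linear clock correction t_host = a * t_dev + b by ordinary least squares over matched event pairs; the structure and function names are illustrative.

```c
/* Sketch: estimating a linear clock correction t_host = a*t_dev + b
 * from pairs of matched events observed on both a co-processor clock
 * (t_dev) and the host clock (t_host). */
#include <stddef.h>

struct ts_pair { double t_dev; double t_host; };

/* Fit t_host = a*t_dev + b by ordinary least squares; requires n >= 2. */
static void fit_clock(const struct ts_pair *p, size_t n,
                      double *a, double *b)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0;

    for (size_t i = 0; i < n; i++) {
        sx  += p[i].t_dev;
        sy  += p[i].t_host;
        sxx += p[i].t_dev * p[i].t_dev;
        sxy += p[i].t_dev * p[i].t_host;
    }
    double denom = n * sxx - sx * sx;
    *a = (n * sxy - sx * sy) / denom;   /* relative clock drift */
    *b = (sy - *a * sx) / n;            /* clock offset */
}
```

Here a captures the relative drift between the two clocks and b the offset; applying the correction to every co-processor timestamp places its events on the host timeline, after which event links can be followed across layers.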
Many systems are now capable of producing extremely detailed tracing data. The challenge then becomes selecting which data sources to activate, and for which time interval. Efficient algorithms were developed to provide a framework that triggers data collection and storage upon encountering specific conditions (e.g., a large system call latency) in the operating system or in applications. Similar new algorithms and mechanisms are required for data sources coming from the different co-processors.
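A minimal user-space illustration of such conditional collection is sketched below, assuming a snapshot-mode LTTng session is already configured; "lttng snapshot record" is the actual CLI command, while the threshold value and wrapper function are hypothetical.

```c
/* Sketch: record an LTTng snapshot when an operation exceeds a latency
 * threshold. Assumes a snapshot-mode tracing session already exists. */
#include <stdlib.h>
#include <time.h>

#define LATENCY_THRESHOLD_NS (10 * 1000 * 1000)   /* 10 ms, illustrative */

static long elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
}

/* Wrap any operation; dump the in-memory ring buffers if it was slow. */
void run_with_latency_trigger(void (*op)(void))
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    op();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (elapsed_ns(t0, t1) > LATENCY_THRESHOLD_NS)
        system("lttng snapshot record");   /* real LTTng CLI command */
}
```

Because the session runs in snapshot mode, the ring buffers continuously overwrite themselves at negligible cost, and only the detailed window around the anomaly is written to disk.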
Plan:
Track 1 aims to develop algorithms and techniques to better integrate new data sources, associated with co-processors, into the tracing and monitoring framework. These sources include hardware traces, hardware performance counters, and software instrumentation in the runtime support. The goal is to obtain information about all important events of the execution and to link them to events in the central processing unit. In addition, Track 1 will propose algorithms to dynamically adjust the level of tracing detail.