Abstracts (Peter Pirkelbauer)

Exploring Dynamic Load Imbalance Solutions with the CoMD Proxy Application

Olga Pearce, Hadia Ahmed, Rasmus W. Larsen, Peter Pirkelbauer and David F. Richards.

We have enabled work migration in the CoMD proxy application to study dynamic load imbalance. Proxy applications are developed to simplify studying parallel performance of scientific simulations and to test potential solutions for performance problems. However, proxy applications are typically too simple to allow work migration or to represent the load imbalance of their parent applications. To study the ability of load balancing solutions to balance work effectively, we enable work migration in one of the ExMatEx proxy applications, CoMD. We design a methodology to parameterize three key aspects necessary for studying load imbalance correction: 1) the granularity with which work can be migrated; 2) the initial load imbalance; 3) the dynamic load imbalance (how quickly the load changes over time). We present a study of the impact of flexibility in work migration in CoMD on load balance and the associated rebalancing costs for a wide range of initial and dynamic load imbalance scenarios.


Transforming Blocking MPI Collectives to Nonblocking and Persistent Operations

Hadia Ahmed, Anthony Skjellum, Purushotham Bangalore, Peter Pirkelbauer

This paper describes Petal, a prototype tool that uses compiler-analysis techniques to automate code transformations to hide communication costs behind computation by replacing blocking MPI functions with corresponding nonblocking and persistent collective operations while maintaining legacy applications' correctness. In earlier work, we have already demonstrated Petal's ability to transform point-to-point MPI operations in complement to the results shown here. The contributions of this paper include the approach to collective operation transformations, a description of achieved functionality, examples of transformations, and demonstration of performance improvements obtained thus far on representative sample MPI programs. Depending on system scale and problem size, the transformations yield a speedup of up to a factor of two. This tool can be used to transform useful classes of new and legacy MPI programs to use the newest variants of MPI functions to improve performance without manual intervention for forthcoming HPC systems and updated versions of the MPI standard.


Ariadne: Hybridizing Directed Model Checking and Static Analysis

Reed Milewicz, Peter Pirkelbauer

While directed model checking has proven to be a powerful tool in the fight against concurrency bugs, scalability remains a concern due to the combinatorial explosion in size of the state space. Overcoming that combinatorial explosion requires the selection and/or parameterization of meta*-heuristics, but we are left with a persistent problem of having to provide or compute specialized knowledge of the program under consideration, and this limits the practical value of the technique. To circumvent that, this paper investigates directed model checking as a platform for the synthesis of results from other analyses. We introduce an open-source tool, Ariadne, which translates reports of suspected race conditions of a static analyzer (Petablox) to instrumentation using a source-to-source compiler (ROSE) that can be exploited by a model checker (Java Pathfinder). We detail the algorithm used, present experimental results, and outline directions for future research.


Comparing Artificial Neural Network and Cohort-Component Models for Population Forecasts

Viktoria Riiman, Amalee Wilson, Reed Milewicz, Peter Pirkelbauer

Artificial neural network (ANN) models are rarely used to forecast population in spite of their growing prominence in other fields. We compare the forecasts generated by ANN long short-term memory models (LSTM) with population projections from traditional cohort-component method (CCM) for counties in Alabama. The evaluation includes forecasts for all 67 counties that offer diversity in terms of population and socioeconomic characteristics. When comparing projected values with total population counts from the 2010 decennial census, the CCM used by the Center for Business and Economic Research at the University of Alabama in 2001 produced more accurate results than a basic multi-county ANN LSTM model. Only when we use single-county models or proxy for a forecaster's experience and personal judgment with potential economic forecasts, results from ANN models improve. The results indicate the significance of forecaster's experience and judgment for CCM and difficulty, but not impossibility of substituting these insights with available data.


Memory Management for Concurrent Data Structures on Hardware Transactional Memory

Peter Pirkelbauer, Reed Milewicz, Amalee Wilson, Hadia Ahmed

Similar to fine-grained locking, and lock-free programming, memory management poses challenges to programming systems with hardware transactional memory (HTM). Thus, scalable data structures need to integrate a memory management scheme, such as hazard pointers, repeated offender, reference counting, and Stacktrack.

In this paper, we revisit epochs, another popular memory management technique, and offer an interpretation for HTM systems. HTM helps avoid a common problem of quiescent techniques where delayed or failed threads prevent memory reclamation. In transactional mode, HTM allows threads to interrupt other delayed threads that prevent progress otherwise. This paper describes the design and implementation of this technique and discusses trade-offs compared to other techniques. Experiments were conducted on Intel Haswell and Power8 architectures. The described technique is competitive with other HTM based techniques. Our results also demonstrate that under many scenarios, HTM based algorithms can be as fast as other optimized algorithms.


A Portable Lock-free Bounded Queue

Peter Pirkelbauer, Reed Milewicz, Juan Felipe Gonzalez

Attaining efficient and portable lock-free containers is challenging as almost any CPU family implements slightly different memory models and atomic read-modify-write operations. C++11 offers a memory model and operation abstractions that enable portable implementations of non-blocking algorithms. In this paper, we present a first scalable and portable lock-free bounded queue supporting multiple readers and multiple writers. Our design uses unique empty values to decouple writing an element from incrementing the tail during enqueue. Dequeue employs a helping scheme that delays helping in the regular case, thereby reducing contention on shared memory. We evaluate our implementation on a range of architectures featuring weak and strong memory consistency models. Our comparisons with known blocking designs and another novel alternative lock-free design demonstrate that the presented implementation performs well on architectures that implement a weak memory consistency model.


An Automatic Transformation and Analysis Tool for Improving Legacy MPI Applications

Hadia Ahmed, Peter Pirkelbauer, Anthony Skjellum

Legacy MPI applications are an important and economically valuable category of parallel software that rely on the MPI-1, MPI-2 (and, more recently, MPI-3) standards to achieve performance and portability. Many of these applications have developed or ported to MPI over the past two decades, with the implicit (dual) goal of achieving acceptably high performance and scalability, and a high level of portability between diverse parallel architectures.

However they were often created implicitly using MPI in ways that exploited how a particular underlying MPI behaved at the time (such as those with polling progress). Thus, they did not necessarily take advantage of the full potential for describing latent concurrency or for loosening the coupling of the application thread from the message scheduling and transfer.

Our hypothesis is that the ability to refactor MPI applications for modern architectures (leading all the way to Exascale systems) will allow the programs to retain their nominal portability while in fact taking advantage of better hardware and superior MPI architectures.


Lightweight runtime checking of C programs with RTC

Reed Milewicz, Rajesh Vanka, James Tuck, Daniel Quinlan, Peter Pirkelbauer

The C Programming Language is known for being an efficient language that can be compiled on almost any architecture and operating system. However the absence of dynamic safety checks and a relatively weak type system allows programmer oversights that are hard to spot. In this paper, we present RTC, a runtime monitoring tool that instruments unsafe code and monitors the program execution. RTC is built on top of the ROSE compiler infrastructure. RTC finds memory bugs and arithmetic overflows and underflows, and run-time type violations. Most of the instrumentations are directly added to the source file and only require a minimal runtime system. As a result, the instru- mented code remains portable. In tests against known error detection benchmarks, RTC found 98% of all memory related bugs and had zero false positives. In performance tests conducted with well known algorithms, such as binary search and MD5, we determined that our tool has an average run-time overhead rate of 9.7x and memory overhead rate of 3.5x.


Runtime Checking C Programs

Reed Milewicz, Rajesh Vanka, James Tuck, Daniel Quinlan, Peter Pirkelbauer

Choosing a suitable data structure is hard in sequential applications and harder in parallel applications. In this paper, we describe a novel methodology that selects an optimal data structure implementation from a repository. Users describe the data structure (e.g., queue, linked list), their requirements (ABA free, relaxed linearizability), and used operations (e.g., enqueue, dequeue). The repository contains multiple instances for each data structure. The most generic instance, known as parent, implements a full set of operations. In addition, the repository contains children data structures optimized for a subset of operations. Users write code in terms of the generic data structure, specify requirements, and required operations. Through code analysis, we determine whether any child data structure can be used in place of the parent. This selection allows for fast, correct, scalable implementations that do not lock the user into a single approach. By allowing the user's code base to evolve without fear of implementation boundaries, we can provide a major benefit in complex software design. Our approach is built on the idea that change is inevitable in software design. Selecting the best data structure implementation is a hard problem, but should not distract programmers from their work.


Building Fast Concurrent Data Structures through Data Structure Families

Brendan Lynch, Peter Pirkelbauer, Damian Dechev

Choosing a suitable data structure is hard in sequential applications and harder in parallel applications. In this paper, we describe a novel methodology that selects an optimal data structure implementation from a repository. Users describe the data structure (e.g., queue, linked list), their requirements (ABA free, relaxed linearizability), and used operations (e.g., enqueue, dequeue). The repository contains multiple instances for each data structure. The most generic instance, known as parent, implements a full set of operations. In addition, the repository contains children data structures optimized for a subset of operations. Users write code in terms of the generic data structure, specify requirements, and required operations. Through code analysis, we determine whether any child data structure can be used in place of the parent. This selection allows for fast, correct, scalable implementations that do not lock the user into a single approach. By allowing the user's code base to evolve without fear of implementation boundaries, we can provide a major benefit in complex software design. Our approach is built on the idea that change is inevitable in software design. Selecting the best data structure implementation is a hard problem, but should not distract programmers from their work.


SimpleConcepts: A Lightweight Extension to C++ to Support Constraints on Generic Types

Reed Milewicz, Marjan Mernik, Peter Pirkelbauer

Generic programming plays an essential role in C++ software through the use of templates. However, both the creation and use of template libraries is hindered by the fact that the language does not allow programmers to specify constraints on generic types. To date, no proposal to update the language to provide concepts has survived the committee process. Until that time comes, as a form of early support, this paper introduces SimpleConcepts, an extension to C++11 that provides support for concepts, sets of constraints on generic types. SimpleConcepts features are parsed according to an island grammar and source-to-source translation is used to lower concepts to pure C++11 code.


SimpleConcepts: Support for Constraints on Generic Types in C++

Reed Milewicz, Marjan Mernik, Peter Pirkelbauer

Generic programming plays an essential role in C++ software through the use of templates. However, both the creation and use of template libraries is hindered by the fact that the language does not allow programmers to specify constraints on generic types. To date, no proposal to update the language to provide concepts has survived the committee process. Until that time comes, as a form of early support, this paper introduces SimpleConcepts, an extension to C++11 that provides support for concepts, sets of constraints on generic types. SimpleConcepts features are parsed according to an island grammar and source-to-source translation is used to lower concepts to pure C++11 code.


Non-blocking Programming Techniques

Peter Pirkelbauer

Multi-core architectures are common on personal computers and mobile devices. The number of available cores is bound to increase rapidly in the foreseeable future. Utilizing the potential parallel processing power provided by the hardware is a major challenge in modern software engineering. Programming parallel software is difficult because concurrency entails many hazards including race conditions, deadlocks, livelocks, order violations, and atomicity violations.

In this talk we will introduce the non-blocking programming paradigm as an alternative for programming multi-core systems. Non-blocking programming avoids many of the mentioned concurrency hazards, while retaining a large degree of parallelism. We will present non-blocking implementations of common data structures, such as a bounded queue and a resizable array. Our evaluation against known blocking and hybrid alternatives demonstrates that the performance of our implementations is competitive. The presented data-structures are portable because we rely only on the C1x/C++11 concurrency model, which offers programmers fine-grained control over data synchronization. A brief overview of the C++11 concurrency model will also be given.


Portable Non-blocking Data Structures

Peter Pirkelbauer

Multi-core architectures are common on personal computers and mobile devices. The number of available cores is bound to increase rapidly in the foreseeable future. Utilizing the potential parallel processing power provided by the hardware is a major challenge in modern software engineering. Programming parallel software is difficult because concurrency entails many hazards including race conditions, deadlocks, livelocks, order violations, and atomicity violations.

In this talk we will introduce the non-blocking programming paradigm as an alternative to programming multi-core systems. Non-blocking programming avoids many of the mentioned concurrency hazards, while retaining a large degree of parallelism. We will present non-blocking implementations of common data structures, such as a resizable array and a bounded queue. Our evaluation against known blocking and hybrid alternatives demonstrates that the performance of our implementations is competitive. The presented data-structures are portable because we rely only on the C11/C++11 concurrency model, which offers programmers fine-grained control over data synchronization. A brief overview of the C++11 concurrency model will also be given.


Dynamic Bug Detection for C, C++, and UPC

Peter Pirkelbauer

Abstract: At compile-time, main-stream programming languages can detect software flaws in source code to a varying degree. While finding all software bugs statically is a noble goal, in practice absolute safety is hard to attain for compilers of general purpose languages. I will present a dynamic error detection tool CIRM (code instrumentation and runtime monitor) for C, C++, and UPC source codes. Built on top of the ROSE source-to-source transformation infrastructure, CIRM instruments source files with code that monitors operations and tracks changes of the system state. CIRM can detect uninitialized variables, arithmetic overflow/underflow, invocation of C standard library functions with precondition violating arguments, memory errors (involving stack, heap, and UPC's global address space), and similar software defects that would be otherwise hard to spot. During program execution, the runtime monitor keeps track of changes to the overall system state and flags operations that could compromise a system's safety. I will present ROSE-CIRM's implementation for both sequential and parallel codes and results obtained by checking against an error detection benchmark test suite. We also measured the runtime performance overhead by running error-free test codes. To overcome the slowdown incurred by monitoring the program execution, I will discuss how relatively simple static analysis techniques can eliminate unnecessary safety checks.


Runtime Detection of C-Style Errors in UPC Code

Peter Pirkelbauer, Chunhua Liao, Thomas Panas, Dan Quinlan

Abstract: Unified Parallel C (UPC) extends the C programming language (ISO C 99) with explicit parallel programming support for the parti- tioned global address space (PGAS), which provides a global mem- ory space with localized partitions to each thread. Like its ancestor C, UPC is a low-level language that emphasizes code efficiency. The absence of dynamic (and static) safety checks allows program- mer oversights and software flaws that can be hard to spot. In this paper, we present an extension of a dynamic analysis tool, ROSE-Code Instrumentation and Runtime Monitor (ROSE- CIRM), for UPC to help programmers find C-style errors involving the global address space. Built on top of the ROSE source-to- source compiler infrastructure, the tool instruments source files with code that monitors operations and keeps track of changes to the system state. The resulting code is linked to a runtime monitor that observes the program execution and finds software defects. We describe the extensions to ROSE-CIRM that were necessary to support UPC. We discuss complications that arise from parallel code and our solutions. We test ROSE-CIRM against a runtime error detection test suite, and present performance results obtained from running error-free codes. ROSE-CIRM is released as part of the ROSE compiler under a BSD-style open source license.


Support for the evolution of C++ generic functions

Peter Pirkelbauer, Damian Dechev, Bjarne Stroustrup

Abstract: The choice of requirements for an argument of a generic type or algorithm is a central design issue in generic programming. In the context of C++, a specification of requirements for a template argument or a set of template arguments is called a concept. In this paper, we present a novel tool, TACE (template analysis and concept extraction), designed to help programmers understand the re- quirements that their code de facto imposes on arguments and help sim- plify and generalize those through comparisons with libraries of well- defined and precisely-specified concepts. TACE automatically extracts requirements from the body of function templates. These requirements are expressed using the notation and semantics developed by the ISO C++ standards committee. TACE converts implied requirements into concept definitions and compares them against concepts from a repos- itory. Components of a well-defined library exhibit commonalities that allow us to detect problems by comparing requirements from many com- ponents: Design and implementation problems manifest themselves as minor variations in requirements. TACE points to source code that can- not be constrained by concepts and to code where small modifications would allow the use of less constraining concepts. For people who use a version of C++ with concept support, TACE can serve as a core engine for automated source code rejuvenation.


Understanding and Effectively Preventing the ABA Problem in Descriptor-based Lock-free Designs

Damian Dechev, Peter Pirkelbauer, Bjarne Stroustrup

Abstract: An increasing number of modern real-time systems and the nowadays ubiquitous multicore architectures demand the application of programming techniques for reliable and efficient concurrent synchronization. Some recently developed Compare-And-Swap (CAS) based nonblocking techniques hold the promise of delivering practical and safer concurrency. The ABA problem is a fundamental problem to all CAS-based designs. Its significance has increased with the suggested use of CAS as a core atomic primitive for the implementation of portable lock-free algorithms. The ABA problem's occurrence is due to the intricate and complex interactions of the application's concurrent operations and, if not remedied, ABA can significantly corrupt the semantics of a nonblocking algorithm. The current state of the art leaves the elimination of the ABA hazards to the ingenuity of the software designer. In this work we provide the first systematic and detailed analysis of the ABA problem in lock-free Descriptor-based designs. We study the semantics of Descriptor-based lock-free data structures and propose a classification of their operations that helps us better understand the ABA problem and subsequently derive an effective ABA prevention scheme. Our ABA prevention approach outperforms by a large factor the use of the alternative CAS-based ABA prevention schemes. We demonstrate our ABA prevention scheme by integrating it into an advanced nonblocking data structure, a lock-free dynamically resizable array.


Source Code Rejuvenation is not Refactoring

Peter Pirkelbauer, Damian Dechev, Bjarne Stroustrup

Abstract: Programmers rely on programming idioms, design patterns, and workaround techniques to make up for missing programming language support. Evolving languages often address frequently encountered problems by adding language and library support to subsequent releases. By using new features, programmers can express their indent more directly. As new concerns, such as parallelism or security, arise, the previously invented idioms can become a serious liability. Modern code benefits from compiler optimizing new language features. Legacy code that does not conform to modern design and implementation choices sets aside improved static checking and optimization opportunities. Manual source code migration is expensive, time-consuming, and prone to errors.

In this paper, we discuss the notion of source code rejuvenation, the automated migration of legacy code. While refactoring improves structurally inadequate source code, source code rejuvenation leverages enhanced program language capabilities (i.e., language features and libraries) by finding and replacing coding patterns that can be expressed through higher-level software abstractions. Exposing high level abstractions benefits software maintainability, security, and performance.


Design and Evaluation of C++ Open Multi-Methods

Peter Pirkelbauer, Yuriy Solodkyy, Bjarne Stroustrup

Abstract: Multiple dispatch - the selection of a function to be invoked based on the dynamic type of two or more arguments - is a solution to several classical problems in object-oriented programming. Open multi-methods generalize multiple dispatch towards open-class extensions, which improve separation of concerns and provisions for retroactive design. We present the rationale, design, implementation, performance, programming guidelines, and experiences of working with a language feature, called open multi-methods, for C++. Our open multi-methods support both repeated and virtual inheritance. Our call resolution rules generalize both virtual function dispatch and overload resolution semantics. After using all information from argument types, these rules can resolve further ambiguities by using covariant return types. Care was taken to integrate open multi-methods with existing C++ language features and rules. We describe a model implementation and compare its performance and space requirements to existing open multi-method extensions and workaround techniques for C++. Compared to these techniques, our approach is simpler to use, catches more user mistakes, and resolves more ambiguities through link-time analysis, is comparable in memory usage, and runs significantly faster. In particular, the runtime cost of calling an open multi-method is constant and less than the cost of a double dispatch (two virtual function calls). Finally, we provide a sketch of a design for open multi-methods in the presence of dynamic loading and linking of libraries.


Dynamic Algorithm Selection for Runtime Concepts

Peter Pirkelbauer, Sean Parent, Mat Marcus, Bjarne Stroustrup

Abstract: A key benefit of generic programming is its support for producing modules with clean separation. In particular, generic algorithms are written to work with a wide variety of types without requiring modifications to them. The Runtime concept idiom extends this support by allowing unmodified concrete types to behave in a runtime polymorphic manner. In this paper, we describe one implementation of the runtime concept idiom, in the domain of the C++ standard template library (STL). We describe and measure the performance of runtime-polymorphic analogs of several STL algorithms. We complement the runtime concept idiom with an algorithm library that considers both type and concept information to maximize performance when selecting algorithm implementations. We present two implementations, one in ISO C++ and one using an experimental language extension. We use our implementations to describe and measure the performance of runtime-polymorphic analogs of several STL algorithms. The tests demonstrate the effects of different compile-time vs. run-time algorithm selection choices.


Verification and Semantic Parallelization of Goal-Driven Autonomous Software

Damian Dechev, Peter Pirkelbauer, Nicolas Rouquette, Bjarne Stroustrup

Abstract: Future space missions, such as Mars Science Laboratory, are built upon computing platforms providing a high degree of autonomy and diverse functionality. The increased sophistication of robotic spacecraft has skyrocketed the complexity and cost of its software development and validation. The engineering of autonomous spacecraft software relies on the availability and application of advanced methods and tools that deliver safe concurrent synchronization as well as enable the validation of domain-specific semantic invariants. The software design and certification methodologies applied at NASA do not reach the level of detail of providing guidelines for the development of reliable concurrent software. To achieve effective and safe concurrent interactions as well as guarantee critical domain-specific properties in code, we introduce the notion of a Semantically Enhanced Container (SEC). A SEC is a data structure engineered to deliver the flexibility and usability of the popular ISO C++ Standard Template Library containers, while at the same time it is hand-crafted to guarantee domain-specific policies. We demonstrate the SEC proof-of-concept by presenting a shared nonblocking SEC vector. To eliminate the hazards of the ABA problem (a fundamental problem in lock-free programming), we introduce an innovative library for querying C++ semantic information. Our SEC design aims at providing an effective model for shared data access within the JPL's Mission Data System. Our test results show that the SEC vector delivers significant performance gains (a factor of 3 or more) in contrast to the application of nonblocking synchronization amended with the traditional ABA avoidance scheme.


Programming and Validation Techniques for Reliable Goal-driven Autonomic Software

Damian Dechev, Nicolas Rouquette, Peter Pirkelbauer, Bjarne Stroustrup

Abstract: Future space missions such as the Mars Science Laboratory demand the engineering of some of the most complex man-rated autonomous software systems. According to some recent estimates, the certification cost for mission-critical software exceeds its development cost. The current process- oriented methodologies do not reach the level of detail of providing guidelines for the development and validation of concurrent software. Time and concurrency are the most critical notions in an autonomous space system. In this work we present the design and implementation of a first concurrency and time centered framework for verification and semantic parallelization of real-time C++ within the JPL Mission Data System Framework (MDS). The end goal of the industrial project that motivated our work is to provide certification artifacts and accelerated testing of the complex software interactions in autonomous flight systems. As a case study we demonstrate the verification and semantic parallelization of the MDS Goal Networks.


Verification and Semantic Parallelization of Goal-Driven Autonomous Software

Damian Dechev, Nicolas Rouquette, Peter Pirkelbauer, Bjarne Stroustrup

Abstract: Future space missions such as the Mars Science Laboratory demand the engineering of some of the most complex man-rated autonomous software systems. According to some recent estimates, the certification cost for mission-critical software exceeds its development cost. The current process- oriented methodologies do not reach the level of detail of providing guidelines for the development and validation of concurrent software. Time and concurrency are the most critical notions in an autonomous space system. In this work we present the design and implementation of a first concurrency and time centered framework for verification and semantic parallelization of real-time C++ within the JPL Mission Data System Framework (MDS). The end goal of the industrial project that motivated our work is to provide certification artifacts and accelerated testing of the complex software interactions in autonomous flight systems. As a case study we demonstrate the verification and semantic parallelization of the MDS Goal Networks.


Runtime Concepts for the C++ Standard Template Library

Peter Pirkelbauer, Sean Parent, Mat Marcus, Bjarne Stroustrup

Abstract: A key benefit of generic programming is its support for producing modules with clean separation. In particular, generic algorithms are written to work with a wide variety of unmodified types. The Runtime concept idiom extends this support by allowing unmodified concrete types to behave in a runtime polymorphic manner. In this paper, we describe one implementation of the runtime concept idiom, in the domain of the C++ standard template library (STL). We describe and measure the performance of runtime-polymorphic analogs of several STL algorithms. We augment the runtime concept idiom by employing a dispatch mechanism that considers both type and concept information to maximize performance when selecting algorithm implementations. We use our implementation to demonstrate the effects of different compile-time vs. run-time algorithm selection choices, and we indicate where improved language and compiler support would be useful.


Open Multi-Methods for C++

Peter Pirkelbauer, Yuriy Solodkyy, Bjarne Stroustrup

Abstract: Multiple dispatch - the selection of a function to be invoked based on the dynamic type of two or more arguments - is a solution to several classical problems in object-oriented programming. Open multi-methods generalize multiple dispatch towards open-class extensions, which improve separation of concerns and provisions for retroactive design. We present the rationale, design, implementation, and performance of a language feature, called open multi-methods, for C++. Our open multi-methods support both repeated and virtual inheritance. Our call resolution rules generalize both virtual function dispatch and overload resolution semantics. After using all information from argument types, these rules can resolve further ambiguities by using covariant return types. Great care was taken to integrate open multi-methods with existing C++ language features and rules. We describe a model implementation and compare its performance and space requirements to existing open multi-method extensions and workaround techniques for C++. Compared to these techniques, our approach is simpler to use, catches more user mistakes, and resolves more ambiguities through link-time analysis, runs significantly faster, and requires less memory. In particular, the runtime cost of calling an open multimethod is constant and less than the cost of a double dispatch (two virtual function calls). Finally, we provide a sketch of a design for open multi-methods in the presence of dynamic loading and linking of libraries.


Lock-free Dynamically Resizable Arrays

Damian Dechev, Peter Pirkelbauer, Bjarne Stroustrup

Abstract: We present a first lock-free design and implementation of a dynamically resizable array (vector). The most extensively used container in the C++ Standard Template Library (STL) is vector, offering a combination of dynamic memory management and constant-time random access. Our approach is based on a single 32-bit word atomic compare-and-swap (CAS) instruction. It provides a linearizable and highly parallelizable STL-like interface, lock-free memory allocation and management, and fast execution. Our current implementation is designed to be most efficient on multi-core architectures. Experiments on a dual-core Intel processor with shared L2 cache indicate that our lock-free vector outperforms its lock-based STL counterpart and the latest concurrent vector implementation provided by Intel by a large factor. The performance evaluation on a quad dual-core AMD system with non-shared L2 cache demonstrated timing results comparable to the best available lock-based techniques. The presented design implements the most common STL vector's interfaces, namely random access read and write, tail insertion and deletion, pre-allocation of memory, and query of the container's size. Using the current implementation, a user has to avoid one particular ABA problem.


Open Multi-Methods for C++ - Proposal to the ISO WG21 committee

Peter Pirkelbauer, Yuriy Solodkyy, Bjarne Stroustrup

Abstract: Multiple dispatch - the selection of a function to be invoked based on the dynamic type of two or more arguments - is a solution to several classical problems in object-oriented programming. We present the rationale, design, and implementation of a language feature, called open multi-methods, for C++. Open multi-methods support both repeated and virtual inheritance and our call resolution rules generalize both virtual function dispatch and overload resolution semantics. After using all information from argument types, these rules can resolve further ambiguities by using covariant return types. We describe a model implementation and compare its performance and space requirements to existing open multi-method extensions and workaround techniques for C++. Compared to these techniques, our approach is simpler to use, catches more user mistakes (such as ambiguities), performs significantly better, and requires less memory. For example, our implementation of a multi-method call is constant-time and more than twice as fast as double dispatch - only 4% slower than a C++ virtual function call. Finally, we provide a sketch of a design for open multi-methods in the presence of dynamic loading and linking of libraries.


Zero-Overhead Exception Handling Using Metainformation

Peter Pirkelbauer, Markus Hof, Hanspeter Mössenböck

Abstract: We present a novel approach to exception handling which is based on metaprogramming. Our mechanism does not require language support, imposes no run time overhead to error-free programs, and is easy to implement. Exception handlers are implemented as ordinary procedures. When an exception occurs, the corresponding handler is searched dynamically using the type of the exception as a search criterion. Our implementation was done in the Oberon System but it could be ported to most other systems that support metaprogramming.



peter 2018-01-07