Blog Articles

Martin Sebor

Red Hat

Recent Posts

In part 1, I shed light on trade-offs involved in the GCC implementation choices for various types of front-end warnings, such as preprocessor warnings, lexical warnings, type-safety warnings, and other warnings.

As useful as front-end warnings are, those based on the flow of control or data through the program have rather inconvenient limitations. To overcome them, flow-based warnings have increasingly been implemented in what GCC calls the “middle end.” Middle-end warnings are the focus of this article.

Most of us appreciate when our compiler lets us know we made a mistake. Finding coding errors early lets us correct them before they embarrass us in a code review or, worse, turn into bugs that impact our customers. Besides the compulsory errors, many projects enable additional diagnostics by using the -Wall and -Wextra command-line options. For this reason, some projects even turn them into errors via -Werror as their first line of defense. But not every instance of a warning necessarily means the code is buggy. Conversely, the absence of warnings for a piece of code is no guarantee that there are no bugs lurking in it.

In this article, I would like to shed more light on trade-offs involved in the GCC implementation choices. Besides illuminating underlying issues for GCC contributors interested in implementing new warnings or improving existing ones, I hope it will help calibrate expectations for GCC users about what kinds of problems can be expected to be detected and with what efficacy. Having a better understanding of the challenges should also reduce the frustration the limitations of the available solutions can sometimes cause. (See part 2 to learn more about middle-end warnings.)

The article isn’t specific to any GCC version, but some command-line options it refers to are more recent than others. Most are in GCC 4 that ships with Red Hat Enterprise Linux (RHEL), but some are as recent as GCC 7. The output of the compiler shown in the examples may vary between GCC versions. See How to install GCC 8 on RHEL if you’d like to use the latest GCC.

Continuing in the effort to detect common programming errors, the just-released GCC 8 contains a number of new warnings as well as enhancements to existing checkers to help find non-obvious bugs in C and C++ code. This article focuses on those that deal with inadvertent string truncation and discusses some of the approaches for avoiding the underlying problems. If you haven’t read it, you might also want to read David Malcolm’s article Usability improvements in GCC 8.

Why Is String Truncation a Problem?

It is well-known why buffer overflow is dangerous: writing past the end of an object can overwrite data in adjacent storage, resulting in data corruption. In the most benign cases, the corruption can simply lead to incorrect behavior of the program. If the adjacent data is an address in the executable text segment, the corruption may be exploitable to gain control of the affected process, which can lead to a security vulnerability. (See CWE-119 for more on buffer overflow.)

Overview

The week of April 3, I attended a meeting of WG14, the C standardization committee, in Markham, ON. Markham is a suburb of Toronto about 40 minutes drive north. Unlike Toronto itself, it’s not a particularly interesting destination. We had four days of rain followed by snow, freezing temperatures, and the wind, which was perfect for spending time indoors and made it easy to resist any temptation to go sightseeing.

A few months ago, I had to write some internal GCC passes to perform static analysis on the GNU C Library (glibc). I figured I might as well write them as plugins since they were unlikely to see the light of day outside of my little sandbox. Being a long time GCC contributor, but having no experience writing plugins I thought it’d be a good way to eat our own dog food, and perhaps write about my experience.

Introduction

GCC has a rich set of features designed to help detect many kinds of programming errors. Of particular interest are those that corrupt the memory of a running program and, in some cases, makes it vulnerable to security threats. Since 2006, GCC has provided a solution to detect and prevent a subset of buffer overflows in C and C++ programs. Although it is based on compiler technology, it’s best known under the name Fortify Source derived from the synonymous GNU C Library macro that controls the feature: _FORTIFY_SOURCE. GCC has changed and improved considerably since its 4.1 release in 2006, and with its ability to detect these sorts of errors. GCC 7, in particular, contains a number of enhancements that help detect several new kinds of programming errors in this area. This article provides a brief overview of these new features. For a comprehensive list of all major improvements in GCC 7, please see GCC 7 Changes document.

Trip Report: October 2016 WG14 Meeting

In October 2016, I attended the WG14 (C language committee) meeting in Pittsburgh, Pennsylvania. The meeting was hosted by the Computer Emergency Response Team (CERT) at the Software Engineering Institute (SEI) at Carnegie Mellon University (CMU). We had 25 representatives from 18 organizations in attendance, including CERT, Cisco, IBM, INRIA, Intel, LDRA, Oracle, Perennial, Plum Hall, Siemens, and the University of Cambridge. It was a productive four days spent on two major areas:

Work on C11 defect reports aimed at the upcoming C11 Technical Corrigendum (TC) expected to be finalized in 2017. This will be the last revision of C11 to be published. The next revision of C will be a “major” version that is for the time being referred to as C2X.

Review of proposals for the next revision of C, C2X. To meet the TC 2017 schedule some C11 defects will have to be deferred to C2X. The C2X charter is in N2086.

Below is a list of some of the interesting C2X proposals the group discussed.

Static Initialization

…the default (zero) initialization for objects with static or thread-local storage duration is guaranteed to produce a valid state.

This means that, for example, defining an atomic object at file scope without an initializer as shown below is guaranteed to initialize it to a default state and default (zero) value.

atomic_int counter;

Other than zero-initialization, the standards require that atomic objects with static and thread storage duration be initialized using one of the ATOMIC_VAR_INIT() and atomic_init() macros. This requirement is a vestige of the original proposal which specified atomic types as structs. The expectation was that emulated implementations that stored a mutex in the struct along with the value would define the ATOMIC_VAR_INIT() macro along the following lines

Introduction

Following the lead of C++, along with a memory model describing the requirements and semantics of multithreaded programs, the C11 standard adopted a proposal for a set of atomic types and operations into the language. This change has made it possible to write portable multi-threaded software that efficiently manipulates objects indivisibly and without data races. The atomic types are fully interoperable between the two languages so that programs can be developed that share objects of atomic types across the language boundary. This paper examines some of the trade-offs of the design, points out some of its shortcomings, and outlines solutions that simplify the use of atomic objects in both languages.

History

The need for software to operate on objects of basic types in an atomic way goes back to the first multiprocessor systems. Many solutions were developed over the decades, each with its own unique characteristics that made writing portable code that took advantage of those features difficult. The C11 and C++11 standards codify an approach that allows software to make use of the hardware support for atomics on the broadest spectrum of processors. The authors of the proposals for C and C++ atomics envisioned efficient, lock-free implementations of these interfaces on architectures that provide robust support for such operations. The authors, however, didn’t want to preclude lock-based emulations on older or less capable hardware. Although emulated implementations were expected to store the lock associated with each atomic object separately from the object itself, this aspect too wasn’t mandated by the proposed specification. The proposers didn’t want to preclude stateful implementations of atomics.