After tokenization, the stream of tokens may simply be passed straight to the compiler's parser. However, if it contains any operations in the preprocessing language, it will be transformed first. This stage corresponds roughly to the standard's "translation phase 4" and is what most people think of as the preprocessor's job.
The preprocessing language consists of directives to be executed and macros to be expanded. Its primary capabilities are inclusion of header files, macro expansion, conditional compilation, line control, and diagnostics.
There are a few more, less useful, features.
Except for expansion of predefined macros, all these operations are triggered with preprocessing directives. Preprocessing directives are lines in your program that start with #. Whitespace is allowed before and after the #. The # is followed by an identifier, the directive name. It specifies the operation to perform. Directives are commonly referred to as #name where name is the directive name. For example, #define is the directive that defines a macro.
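As a minimal sketch of the directive syntax described above, the following fragment writes the same directive with different amounts of whitespace around the #. The macro names WIDTH and HEIGHT and the function screen_cells are hypothetical, chosen only for illustration.

```c
/* The usual form: # in the first column, directive name right after it. */
#define WIDTH 80

/* Whitespace before and after the # is allowed; this is still #define. */
   #   define HEIGHT 24

/* Both macros are defined, so this expands to 80 * 24. */
int screen_cells(void)
{
    return WIDTH * HEIGHT;
}
```

Indenting the directive name after the # is sometimes used to show nesting inside conditionals, though that is a matter of style.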
The # which begins a directive cannot come from a macro expansion. Also, the directive name is not macro expanded. Thus, if foo is defined as a macro expanding to define, that does not make #foo a valid preprocessing directive.
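The following sketch illustrates the first rule indirectly. HASH_DEFINE, BAR, STR, XSTR and the three functions are hypothetical names. The macro's expansion spells out something that looks like a directive, but expanding it only produces ordinary tokens, so BAR is never defined.

```c
#include <string.h>

/* An object-like macro whose expansion merely looks like a directive. */
#define HASH_DEFINE #define BAR 1

#define STR(x) #x
#define XSTR(x) STR(x)   /* expand the argument first, then stringify it */

/* Similarly, given "#define foo define", a line reading "#foo BAR 1"
   would be rejected as an invalid directive, because the directive
   name is not macro expanded. */

/* Expanding HASH_DEFINE yields plain tokens, turned into a string here. */
const char *expanded_text(void)
{
    return XSTR(HASH_DEFINE);
}

/* Reports whether the expansion above was executed as a directive. */
int bar_is_defined(void)
{
#ifdef BAR
    return 1;
#else
    return 0;
#endif
}

/* Checks that the expansion really did contain the word "define". */
int expansion_mentions_define(void)
{
    return strstr(expanded_text(), "define") != NULL;
}
```

Here bar_is_defined is expected to return 0 even though the expanded text contains the word define, since the tokens produced by a macro are never re-read as a directive.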
The set of valid directive names is fixed. Programs cannot define new preprocessing directives.
Some directives require arguments; these make up the rest of the directive line and must be separated from the directive name by whitespace. For example, #define must be followed by a macro name and the intended expansion of the macro.
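A short sketch of directives with and without arguments follows. BUFFER_SIZE, HAVE_BUFFER and the two functions are hypothetical names used only for illustration.

```c
/* #define takes arguments: a macro name, then its intended expansion.
   Without the space, "#defineBUFFER_SIZE" would be read as a single
   (unknown) directive name. */
#define BUFFER_SIZE 1024

/* #ifdef takes one argument; #else and #endif take none. */
#ifdef BUFFER_SIZE
#define HAVE_BUFFER 1
#else
#define HAVE_BUFFER 0
#endif

int buffer_size(void)
{
    return BUFFER_SIZE;
}

int have_buffer(void)
{
    return HAVE_BUFFER;
}
```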
A preprocessing directive cannot cover more than one line. The line may, however, be continued with backslash-newline, or by a block comment which extends past the end of the line. In either case, when the directive is processed, the continuations have already been merged with the first line to make one long line.
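Both continuation forms can be sketched as below. SUM3, GREETING and the functions are hypothetical names. By the time each #define is processed, its physical lines have been merged into one logical line.

```c
#include <string.h>

/* Continued with backslash-newline: three physical lines, one directive. */
#define SUM3(a, b, c) \
    ((a) +            \
     (b) + (c))

/* Continued by a block comment that extends past the end of the line. */
#define GREETING /* this block comment
                    ends on the next line */ "hello"

int sum3(int a, int b, int c)
{
    return SUM3(a, b, c);
}

const char *greeting(void)
{
    return GREETING;
}

int greeting_is_hello(void)
{
    return strcmp(greeting(), "hello") == 0;
}
```

The backslash form is by far the more common of the two; relying on a comment to continue a directive works but tends to surprise readers.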