When the C preprocessor is used with the C, C++, or Objective-C compilers, it is integrated into the compiler and communicates a stream of binary tokens directly to the compiler's parser. However, it can also be used in the more conventional standalone mode, where it produces textual output.
The output from the C preprocessor looks much like the input, except that all preprocessing directive lines have been replaced with blank lines and all comments with spaces. Long runs of blank lines are discarded.
The ISO standard specifies that it is implementation-defined whether a preprocessor preserves whitespace between tokens or replaces it with, for example, a single space. In GNU CPP, whitespace between tokens is collapsed to a single space, with one exception: the first token on a non-directive line is preceded by enough spaces that it appears in the same column in the preprocessed output as in the original source file. This makes the output easy to read. See Differences from previous versions. CPP does not insert any whitespace where there was none in the original source, except where necessary to prevent an accidental token paste.
Source file name and line number information is conveyed by lines of the form
# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or character constant). They mean that the following line originated in file filename at line linenum. filename will never contain any non-printing characters; they are replaced with octal escape sequences.
After the file name comes zero or more flags, which are 1, 2, 3, or 4. If there are multiple flags, spaces separate them. Here is what the flags mean:
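1
    This indicates the start of a new file.
2
    This indicates the return to a file (after having included another file).
3
    This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4
    This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.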
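The linemarker layout described above can be read back with a few lines of C. The following is a minimal sketch, not code from GCC: the names struct linemarker and parse_linemarker are hypothetical, and it assumes that the file name appears in double quotes (as it does in GCC's textual output), that it contains no escaped quote characters, and that flags appear in ascending order.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one parsed linemarker; not part of GCC. */
struct linemarker {
    unsigned long line;   /* the linenum field */
    char file[256];       /* file name with the surrounding quotes removed */
    int flags[4];         /* flags[i] is nonzero if flag i + 1 was present */
};

/* Returns 1 if text has the form  # linenum "filename" [flags...]  and
   0 otherwise.  Octal escapes inside the file name are copied through
   undecoded, and an escaped quote in the name is not handled (assumption). */
static int parse_linemarker(const char *text, struct linemarker *out)
{
    int consumed = 0;

    memset(out, 0, sizeof *out);

    /* %n is only stored if the closing quote was matched, so a zero
       value of consumed means the file name was not terminated. */
    if (sscanf(text, "# %lu \"%255[^\"]\"%n",
               &out->line, out->file, &consumed) != 2 || consumed == 0)
        return 0;

    const char *p = text + consumed;
    int previous = 0;

    for (;;) {
        int flag = 0, used = 0;

        if (sscanf(p, " %d%n", &flag, &used) != 1)
            break;                      /* no more numeric flags */
        if (flag < 1 || flag > 4 || flag <= previous)
            return 0;                   /* unknown flag or not ascending */
        out->flags[flag - 1] = 1;
        previous = flag;
        p += used;
    }
    return 1;
}
```

Given the line `# 1 "/usr/include/stdio.h" 1 3 4`, this sketch would report line 1, the file name /usr/include/stdio.h, and flags 1, 3, and 4 set: the start of a new file that is a system header and should be treated as wrapped in extern "C". A directive such as `#define X 1` is rejected because no line number follows the #.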
As an extension, the preprocessor accepts linemarkers in non-assembler input files. They are treated like the corresponding #line directive (see Line Control), except that trailing flags are permitted and are interpreted with the meanings described above. If multiple flags are given, they must be in ascending order.
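As a small illustration of linemarkers on input, the fragment below places one in an ordinary C source file. It is a sketch that relies on the GNU extension just described, so other compilers may reject it; the file name virtual.c and the function names are made up for the example.

```c
#include <string.h>

/* The next line is a linemarker, not a comment or a null directive.
   It says that the line following it is line 100 of "virtual.c". */
# 100 "virtual.c"
static int line_after_marker(void) { return __LINE__; }

/* __FILE__ now reflects the name given in the linemarker. */
static int file_is_virtual(void) { return strcmp(__FILE__, "virtual.c") == 0; }
```

Here line_after_marker() returns 100 and file_is_virtual() returns nonzero, the same result that `#line 100 "virtual.c"` would give. Diagnostics for the code after the marker are likewise reported against virtual.c.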
Some directives may be duplicated in the output of the preprocessor. These are #ident (always), #pragma (only if the preprocessor does not handle the pragma itself), and #define and #undef (with certain debugging options). If this happens, the # of the directive will always be in the first column, and there will be no space between the # and the directive name. If macro expansion happens to generate tokens which might be mistaken for a duplicated directive, a space will be inserted between the # and the directive name.