This is how CPP behaves in all the cases which the C standard describes as implementation-defined. This term means that the implementation is free to do what it likes, but must document its choice and stick to it.
Currently, GNU cpp only supports character sets that are strict supersets of ASCII, and performs no translation of characters.
In textual output, each whitespace sequence is collapsed to a single space. For aesthetic reasons, the first token on each non-directive line of output is preceded by enough spaces that it appears in the same column as it did in the original source file.
The preprocessor and compiler interpret character constants in the same way; i.e. escape sequences such as \a are given the values they would have on the target machine.
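One way to observe this agreement is to evaluate the same character constant once in a #if expression and once in ordinary code. The sketch below assumes an ASCII-based target, where \a has the value 7; the names PP_BELL_IS_7 and bell_values_agree are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* PP_BELL_IS_7 is a hypothetical macro name used only in this sketch.
   It records how the preprocessor evaluated '\a' inside #if. */
#if '\a' == 7
#define PP_BELL_IS_7 1
#else
#define PP_BELL_IS_7 0
#endif

/* Returns 1 when the preprocessor's view of '\a' (captured above)
   matches the compiler's view (evaluated in an ordinary expression). */
int bell_values_agree(void)
{
    int compiler_bell_is_7 = ('\a' == 7);
    return PP_BELL_IS_7 == compiler_bell_is_7;
}
```

Calling assert(bell_values_agree()) should succeed under the behavior described above, since both evaluations use the same target value.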
The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int, the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as (int) ((unsigned char) 'a' * 256 + (unsigned char) 'b'), and '\234a' as (int) ((unsigned char) '\234' * 256 + (unsigned char) 'a').
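The shift-and-or rule can be modelled in a few lines of C. This is a sketch of the rule as stated above, not GCC's own code: it assumes a target with an 8-bit char and a 32-bit int, and multichar_value and TARGET_CHAR_BITS are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: the target char is 8 bits wide
   and the target int is 32 bits wide. */
enum { TARGET_CHAR_BITS = 8 };

/* Hypothetical helper: computes the value a multi-character constant
   made of chars[0..n-1] would have under the rule described above. */
int32_t multichar_value(const unsigned char *chars, size_t n)
{
    uint32_t acc = 0;
    int32_t result;

    assert(chars != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Shift the previous value left by one target character.
           Bits shifted past the top of the 32-bit accumulator are lost,
           which is how excess leading characters end up ignored. */
        acc <<= TARGET_CHAR_BITS;
        /* OR in the new character, truncated to the target char width. */
        acc |= (uint32_t)chars[i] & 0xFFu;
    }
    /* Reinterpret the final bit-pattern as a signed 32-bit int. */
    memcpy(&result, &acc, sizeof result);
    return result;
}
```

With these assumptions, passing the two characters of 'ab' gives 'a' * 256 + 'b', and a five-character input gives the same value as its last four characters alone.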
For a discussion on how the preprocessor locates header files, see Include Operation.
Interpretation of the file name in a macro-expanded #include directive: see Computed Includes.
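As a reminder of what a computed include looks like, the sketch below names the header through a macro rather than writing it directly. HEADER_TO_USE and format_answer are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HEADER_TO_USE is a hypothetical macro name chosen for this sketch.
   The #include line below is macro-expanded before the file is located. */
#define HEADER_TO_USE <stdio.h>
#include HEADER_TO_USE

/* Uses snprintf, which is declared here only because the computed
   include above resolved to <stdio.h>. Returns the formatted length. */
int format_answer(char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%d", 42);
}
```

If the macro expanded to a different header, the call to snprintf would no longer have a declaration in scope, which is a quick way to confirm that the expansion took effect.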
Treatment of a #pragma directive that after macro-expansion results in a standard pragma: no macro expansion occurs on any #pragma directive line, so the question does not arise.
Note that GCC does not yet implement any of the standard pragmas.