The New C Standard- P16

6.10.1 Conditional inclusion
1883
• The specification has changed between C90 and C99.
The problem with any guideline recommendation is that the total cost is likely to be greater than the total
benefit (a cost is likely to be incurred in many cases and a benefit obtained in very few cases). For this reason
no recommendation is made here. The discussion on suffixed integer constants is also applicable in the
context of a conditional inclusion directive [835: integer constant, type first in list].
Example
In the following the developer may assume that unwanted higher bits in the value of C will be truncated when
shifted left.

    #define C 0x1100u
    #define INT_BITS 32

    #define TOP_BYTE (C << (INT_BITS-8))

    #if TOP_BYTE == 0
    /* ... */
    #endif

    void f(void)
    {
    if (TOP_BYTE == 0)
       /* ... */ ;
    }
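The difference between the two contexts can be made visible with a small sketch. It assumes a C99 translator with a 32-bit unsigned int, so that #if arithmetic is performed in the wider uintmax_t while the if statement operates on unsigned int. The names PP_TOP_BYTE_IS_ZERO and top_byte_is_zero_in_expression are illustrative and do not come from the original example.

```c
#include <limits.h>

#if UINT_MAX != 0xFFFFFFFFu
#error "this sketch assumes a 32-bit unsigned int"
#endif

#define C 0x1100u
#define INT_BITS 32
#define TOP_BYTE (C << (INT_BITS-8))

/* Evaluated by the preprocessor: C99 promotes the operands to uintmax_t,
   so the bits shifted past bit 31 are retained and the value is nonzero. */
#if TOP_BYTE == 0
#define PP_TOP_BYTE_IS_ZERO 1
#else
#define PP_TOP_BYTE_IS_ZERO 0
#endif

/* Evaluated in translation phase 7: the operand has type unsigned int,
   so (under the 32-bit assumption) the high bits are discarded. */
int top_byte_is_zero_in_expression(void)
{
    return TOP_BYTE == 0;
}
```

Under these assumptions the #if group is skipped while the body of the if statement is executed, which is the mismatch the example warns about.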
1881
This includes interpreting character constants, which may involve converting escape sequences into execution
character set members. [label: #if escape sequences]
Commentary
This conversion also occurs in translation phase 5 [133: translation phase 5].
1882
Whether the numeric value for these character constants matches the value obtained when an identical
character constant occurs in an expression (other than within a #if or #elif directive) is
implementation-defined.143)
Commentary
The C committee recognized that developers may choose to perform different phases of translation on
different hosts. For instance, source files may be preprocessed and then distributed for further translation on
other, different, hosts.

Common Implementations
Differences between the numeric values in these two cases are rare (although cases involving ASCII and
EBCDIC character sets do occur) [3: EBCDIC].
Coding Guidelines
Making use of the numeric value of character constants is making use of representation information, which is
covered by a guideline recommendation [569.1: representation information, using]. However, there are cases
where deviations may occur.
Example
See footnote 141 [1874: footnote 141].
1883
Also, whether a single-character character constant may have a negative value is implementation-defined.
[label: basic character set, may be negative]
June 24, 2009 v 1.2
Commentary
The guarantee on the value being nonnegative [478: basic character set, positive if stored in char object]
does not apply during preprocessing. For instance, preprocessing might use the EBCDIC character set and act
as if the type char were signed. In other contexts the value of a character constant containing a single
character that is not a member of the basic execution character set is implementation-defined
[885: character constant, more than one character].
Coding Guidelines
The discussion on the possibility of character constants having other implementation-defined values is
applicable here [885: character constant, more than one character].
1884
Preprocessing directives of the forms

    # ifdef identifier new-line group_opt
    # ifndef identifier new-line group_opt

check whether the identifier is or is not currently defined as a macro name. [label: #ifdef, #ifndef]
Commentary

There is no #elifdef form (although over half of the uses of the #elif directive are followed by a single
instance of the defined operator; see Table 1872.1).
1885
Their conditions are equivalent to #if defined identifier and #if !defined identifier respectively.
Commentary
The #ifdef and #ifndef forms are rather like the unary ++ and -- operators in that they provide a
shorthand notation for commonly used functionality.
Coding Guidelines
The #ifdef forms are the most common form of conditional inclusion directive. Measurements (see
Table 1872.1) also show that nearly a third of the uses of the defined operator could be replaced by one of
these forms. There are advantages (e.g., most common form suggests most practiced form for readers, and
ease of visual scanning down the left edge of the source) and disadvantages (e.g., requires more effort to
add additional conditions to the single test being made) to using the #ifdef forms, instead of the defined
operator. However, there does not appear to be a worthwhile cost/benefit to recommending one of the
possibilities.
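The equivalence, and the extra effort needed to extend a #ifdef test, can be sketched as follows. The macro names FEATURE_X, FEATURE_Y, VIA_IFDEF, VIA_DEFINED and X_WITHOUT_Y are hypothetical and chosen only for illustration.

```c
#define FEATURE_X 1   /* hypothetical configuration macro */

/* Short form. */
#ifdef FEATURE_X
#define VIA_IFDEF 1
#else
#define VIA_IFDEF 0
#endif

/* Equivalent long form using the defined operator. */
#if defined FEATURE_X
#define VIA_DEFINED 1
#else
#define VIA_DEFINED 0
#endif

/* Adding a second condition is only possible with the defined operator;
   a #ifdef directive would have to be rewritten or nested. */
#if defined(FEATURE_X) && !defined(FEATURE_Y)
#define X_WITHOUT_Y 1
#else
#define X_WITHOUT_Y 0
#endif

int forms_agree(void)
{
    return VIA_IFDEF == VIA_DEFINED;
}
```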
1886
142) Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000
is signed and positive within a #if expression even though it is unsigned in translation phase 7.
Commentary
The wording was changed by the response to DR #265.
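The same effect can be observed on implementations with wider types. The following sketch assumes a 32-bit int and unsigned int (so 0x80000000 plays the role of 0x8000 in the footnote) and C99 #if arithmetic in intmax_t; the names PP_MINUS_ONE_IS_LESS and minus_one_is_less_in_expression are illustrative only.

```c
#include <limits.h>

#if INT_MAX != 0x7FFFFFFF || UINT_MAX != 0xFFFFFFFF
#error "this sketch assumes 32-bit int and unsigned int"
#endif

/* Within #if the constant has type intmax_t: signed and positive,
   so -1 compares less than it. */
#if -1 < 0x80000000
#define PP_MINUS_ONE_IS_LESS 1
#else
#define PP_MINUS_ONE_IS_LESS 0
#endif

/* In translation phase 7 the constant has type unsigned int, so -1 is
   converted to UINT_MAX before the comparison is made. */
int minus_one_is_less_in_expression(void)
{
    return -1 < 0x80000000;
}
```

A translator may warn about the mixed signed/unsigned comparison in the function; that warning is a symptom of exactly the conversion being illustrated.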
1887
143) Thus, the constant expression in the following #if directive and if statement is not guaranteed to
evaluate to the same value in these two contexts. [label: footnote 143]

    #if 'z' - 'a' == 25
    if ('z' - 'a' == 25)

Commentary
This situation could occur, for instance, if the ASCII representation were used during the preprocessing phases
and EBCDIC were used during translation phase 5 [133: translation phase 5].
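A translation unit can record which value each context produced, as in the following sketch. The names PP_Z_MINUS_A_IS_25, z_minus_a_is_25_in_expression and contexts_agree are hypothetical. On the common case of an implementation that uses an ASCII-based character set in every phase the two results agree; in EBCDIC the difference 'z' - 'a' is 40 rather than 25.

```c
/* Value seen by the preprocessor. */
#if 'z' - 'a' == 25
#define PP_Z_MINUS_A_IS_25 1
#else
#define PP_Z_MINUS_A_IS_25 0
#endif

/* Value seen when the same constant expression occurs in an expression
   (execution character set, after translation phase 5). */
int z_minus_a_is_25_in_expression(void)
{
    return 'z' - 'a' == 25;
}

/* Nonzero when both contexts gave the same answer; the C Standard does
   not guarantee this. */
int contexts_agree(void)
{
    return PP_Z_MINUS_A_IS_25 == z_minus_a_is_25_in_expression();
}
```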
1888
Each directive’s condition is checked in order.
Commentary
The order is from the lowest line number to the highest line number.
Coding Guidelines
It may be possible to obtain some translation time performance advantage (at least for the original developer)
by appropriately ordering the directives. Unlike developer behavior with if statements
[1739: selection statement syntax], developers do not usually aim to optimize speed of translation when
deciding how to order conditional inclusion directives (experience suggests that developers often simply
append new directives to the end of any existing directives).
Recognizing a known pattern in a sequence of directives has several benefits for readers. They can make
use of any previous deductions they have made on how to interpret the directives and what they represent,
and the usage highlights common dependencies in the source. In the following code fragment more reader
effort is required to spot similarities in the sequence in which directives are checked than if both sequences
of directives had occurred in the same order.

    #ifdef MACHINE_A
    /* ... */
    #else
    #ifdef MACHINE_B
    /* ... */
    #endif
    #endif

    #ifdef MACHINE_B
    /* ... */
    #else
    #ifdef MACHINE_A
    /* ... */
    #endif
    #endif

Given the lack of attention from developers on the relative ordering of directives and the benefits of using
the same ordering, where possible, a guideline recommendation appears worthwhile. However, a guideline
recommendation needs to be automatically enforceable [0: guideline recommendation, enforceable] and
determining when two sequences of directives have the same effect, during translation, may be infeasible
because information that is not contained within the source may be required (e.g., dependencies between
macro names that are likely to be defined via translator command line options).
Rev 1888.1
Where possible the visual order of evaluation of expressions within different sequences of nested
conditional inclusion directives shall be the same.
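A sketch of two sequences that follow this recommendation is given below. Both test MACHINE_A before MACHINE_B, so a reader who has understood the first sequence can reuse that understanding for the second. The macros WORD_BITS and ALIGNMENT, their values, and the choice to define MACHINE_B are assumptions made purely for illustration.

```c
#define MACHINE_B   /* hypothetical: normally set on the command line */

/* First sequence: MACHINE_A is tested before MACHINE_B. */
#ifdef MACHINE_A
#define WORD_BITS 16
#else
#ifdef MACHINE_B
#define WORD_BITS 32
#endif
#endif

/* Second sequence: the tests appear in the same visual order. */
#ifdef MACHINE_A
#define ALIGNMENT 2
#else
#ifdef MACHINE_B
#define ALIGNMENT 4
#endif
#endif

int word_bits(void) { return WORD_BITS; }
int alignment(void) { return ALIGNMENT; }
```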
1889
If it evaluates to false (zero), the group that it controls is skipped: directives are processed only through the
name that determines the directive in order to keep track of the level of nested conditionals;
Commentary
A parallel can be drawn with the behavior of if statements, in that if their controlling expression evaluates to
zero [1744: if statement, operand compare against 0], during program execution, any statements in the
associated block are skipped.
1890
directives are processed only through the name that determines the directive in order to keep track of the level
of nested conditionals; [label: directive processing while skipping]
Commentary
The preprocessor operates on a representation of the source written by the developer, not translated machine
code. As such it needs to perform some processing on its input to be able to deduce when to stop skipping.
[Figure: two plots, each with physical lines skipped on the horizontal axis (ticks at 50, 100 and 150) and
vertical axis ticks at 1, 10, 100 and 1,000. The left plot counts top-level files and the right plot counts
translation units. Crosses denote the #if part and bullets the #else part.]
Figure 1889.1: Number of top-level source files (left; i.e., the contents of any included files are not counted)
and complete translation units (right; including the contents of any files #included more than once) having a
given number of lines skipped during translation of this book’s benchmark programs.
Directives need to be processed to keep track of the level of nesting of conditionals and translation phases
1–3 [116: translation phase 1] still need to be performed (line splicing could affect what is or is not the start
of a line) and characters within a comment must not be treated as directives.
The intent of only requiring a minimum of directive processing, while skipping, is to enable partially
written source code to be skipped and to allow preprocessors to optimize their performance in this special
case, speeding up the rate at which the input is processed.
Example

    #if 1
    extern int ei;

    #elif " an unmatched quote character, undefined behavior

    extern int foo_bar;
    #endif

    #if 0
    printf("\
    #endif \n");

    #endif

    #if 0
    /*
    #endif

    */
    #endif

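The minimal processing needed while skipping can be sketched as a scan that looks only at directive names and tracks the nesting depth. This is an illustrative model rather than the algorithm of any particular preprocessor: it assumes the input has already been through translation phases 1 to 3 (no line splices or comments remain) and is presented as an array of lines. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the directive name on a line, or NULL if the first
   non-white-space character of the line is not #. */
static const char *directive_name(const char *line)
{
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line != '#')
        return NULL;
    line++;
    while (*line == ' ' || *line == '\t')
        line++;
    return line;
}

/* Nonzero if the identifier starting at p is exactly name. */
static int name_is(const char *p, const char *name)
{
    size_t n = strlen(name);
    return strncmp(p, name, n) == 0 &&
           !isalnum((unsigned char)p[n]) && p[n] != '_';
}

/* lines[start] holds a conditional directive whose group is to be skipped.
   Return the index of the #elif, #else or #endif at the same nesting
   level, or n_lines if there is none. Only directive names are examined;
   the rest of each line is ignored. */
size_t skip_group(const char *const lines[], size_t n_lines, size_t start)
{
    size_t depth = 0;

    for (size_t i = start + 1; i < n_lines; i++) {
        const char *p = directive_name(lines[i]);
        if (p == NULL)
            continue;
        if (name_is(p, "if") || name_is(p, "ifdef") || name_is(p, "ifndef")) {
            depth++;
        } else if (name_is(p, "endif")) {
            if (depth == 0)
                return i;
            depth--;
        } else if (depth == 0 && (name_is(p, "elif") || name_is(p, "else"))) {
            return i;
        }
    }
    return n_lines;
}
```

Because nothing after the directive name is inspected, a malformed directive inside the skipped group does not disturb the scan.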
1891
the rest of the directives’ preprocessing tokens are ignored, as are the other preprocessing tokens in the
group.
Commentary
There is no requirement that any directive be properly formed, according to the preprocessor syntax
[1854: preprocessor directives syntax]. However, preprocessing tokens still need to be created, before they
are ignored (as part of translation phase 3) [124: translation phase 3].
Example
In the following the #define directive is not well formed. But because this group is being skipped the
translator is required to ignore this fact.

    #if 0
    #define M(e
    #endif

1892
Only the first group whose control condition evaluates to true (nonzero) is processed.
Commentary
This group is processed exactly as-if it appeared in the source outside of any group.
1893
If none of the conditions evaluates to true, and there is a #else directive, the group controlled by the #else is
processed;
Commentary
A semantic rule to associate #else with the lexically nearest preceding #if (or similar form) directive, like
the one given for if statements [1747: else binds to nearest if], is not needed because conditional inclusion is
terminated by a #endif directive.
Like the matching #if (or similar form) directive case, all preprocessing tokens in the group are treated as
if they appeared outside of any conditional inclusion directive. Processing continues until the first #endif is
encountered (which must match the opening directive).
Coding Guidelines
The arguments made for if statements always containing an else arm [1745: else] might be thought to also
apply to conditional inclusion. However, the presence of a matching #endif directive reduces the likelihood
that readers will confuse which preprocessing directive any #else associates with (although other issues, such
as lack of indentation or a large number of source lines between directives can make it difficult to visually
associate matching directives).
1894
lacking a #else directive, all the groups until the #endif are skipped.144)
Commentary
The effect of this specification mimics the behavior of if statements [1747: else binds to nearest if].
1895
Forward references: macro replacement (6.10.3), source file inclusion (6.10.2), largest integer types
(7.18.1.5).
6.10.2 Source file inclusion
Constraints
1896
A #include directive shall identify a header or source file that can be processed by the implementation.
[label: source file inclusion]
Commentary
There is no requirement that a header be represented using a source file. It could be represented using prebuilt
information within the translator [2018: footnote 153] that is enabled only when the appropriate #include
directive is encountered during preprocessing (but not in a group that is skipped). Also there is no requirement
that the spelling of the header in the C source file be represented by a source file of the same spelling. The C
Standard has no explicit knowledge of file systems and is silent on the issue of directory structures. Minimum
required limits on the implementation processing of a header name are specified elsewhere
[1909: #include, mapping to host file].
Failure to locate a header or source file that can be processed by the implementation (e.g., a file of the
specified name does not exist, at least along the places searched) is a constraint violation.
Other Languages
Most languages do not specify a #include mechanism, although many of their implementations provide
one. The approach commonly used by C implementations is popular, but not universal. Some languages
explicitly state that a #include directive denotes a file of the given name in the translator’s host environment.
Common Implementations
For most implementations the header name maps to a file name of the same spelling. It is quite common
for the translation environment to ignore the case of alphabetic letters (e.g., MS-DOS and early versions of
Microsoft Windows), or to limit the number of significant characters in the file name denoted by a header
name (the remaining characters being ignored). Use of the / character in specifying a full path to a file is
sufficiently common usage that even host environments where this character is not normally associated with
a directory separator support such usage in header names (many Microsoft Windows translators support this
character, as well as the \ character, as a directory separator).
In the majority of implementations #include directives specify files containing source in text form
[121: source file representation]. However, some implementations support what are known as precompiled
headers [121: header, precompiled].
It is not uncommon (over 10% of #includes in Figure 1896.1) for the same header to be #included
more than once when translating a source file (it is a requirement that implementations support this usage for
standard headers). The following are some of the techniques implementations use to reduce the overhead of
subsequent #includes.

• A common convention is to bracket the contents of a header, starting with the preprocessing token
  sequence #ifndef __H_file_name__ / #define __H_file_name__ and ending with #endif. The
  processing of subsequent #includes of the same header is then reduced to the minimal processing
  needed to skip to the matching #endif. Some implementations (e.g., gcc) go one step further and
  detect headers that contain such bracketing the first time they are processed, and completely skip
  opening and processing the header if it is subsequently encountered again in a #include directive.
• Support the preprocessing directive #import.[359] This directive is equivalent to the #include directive
  except that if the specified header has already been included it is not included again.
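The bracketing convention in the first item can be sketched in a single file by writing out what the translator sees when a hypothetical header, here called point.h, is #included twice. The guard macro H_POINT and the struct are illustrative; a name without leading underscores is used because identifiers beginning with two underscores are reserved for the implementation.

```c
/* ---- contents of point.h as seen on the first #include ---- */
#ifndef H_POINT
#define H_POINT
struct point { int x, y; };
#endif

/* ---- the same contents as seen on a second #include ---- */
#ifndef H_POINT            /* H_POINT is now defined, so this group ...  */
#define H_POINT
struct point { int x, y; } /* ... is skipped; no redefinition occurs.    */;
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guard the second definition of struct point would be a constraint violation; with it, the translator only has to skip to the matching #endif.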

Coding Guidelines
Some coding guideline documents recommend that implementation supplied headers appear before developer
written headers, in a source file. Such recommendations overlook the possibility that a developer written
header might itself #include an implementation header.
[Figure: number of #includes (vertical axis, ticks from 1 to 100,000) plotted against times #included
(horizontal axis, ticks at 1, 5 and 10), with separate series for all #includes (crosses), user #includes
(triangles) and nested user #includes (bullets).]
Figure 1896.1: Number of times the same header was #included during the translation of a single translation
unit. The crosses denote all headers (i.e., all systems headers are counted), triangles denote all headers
delimited by quotes (i.e., likely to be user defined headers) and bullets denote all quote delimited headers
#included nested at least three levels deep. Based on the translated form of this book’s benchmark programs.

[Figure: translation units (vertical axis, ticks from 1 to 1,000) plotted against unnecessary headers
#included (horizontal axis, 0 to 20).]
Figure 1896.2: Number of preprocessing translation units (excluding system headers) containing a given
number of #includes whose contents are not referenced during translation (excludes the case where the same
header is #included more than once, see Figure 1896.1). Based on the translated form of this book’s
benchmark programs.
[Figure: source files (vertical axis, ticks from 1 to 1,000) plotted against #includes (horizontal axis, 0 to 60),
with separate lines for <header> and "header".]
Figure 1896.3: Number of .c source files containing a given number of #include directives (dashed lines
represent number of unique headers). Based on the visible form of the .c files.
Experience suggests that once a #include directive appears in a source file it is rarely removed (see
Figure 1896.2) and that new #include directives are simply added after the last one. The issue of redundant
code is discussed elsewhere [190: redundant code].
There does not appear to be a worthwhile benefit in ordering #include directives in any way (apart from
any relative ordering dictated by dependencies between headers).
Table 1896.1: Occurrence of two forms of header-names (as a percentage of all #include directives), the
percentage of each kind that specifies a path to the header file, and number of absolute paths specified. Based
on the visible form of the .c files.

Header Form          % Occurrence   % Uses Path   Number Absolute Paths
<h-char-sequence>        75.0           86.4                0
"q-char-sequence"        25.0           17.2                0

Semantics
1897
A preprocessing directive of the form

    # include <h-char-sequence> new-line

searches a sequence of implementation-defined places for a header identified uniquely by the specified
sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents
of the header. [label: #include h-char-sequence]

[Figure: occurrences of header name (vertical axis, ticks from 1 to 1,000) plotted against rank (horizontal
axis, ticks from 1 to 1,000), with crosses for <header> and bullets for "header".]
Figure 1896.4: header-name rank (based on character sequences appearing in #include directives) plotted
against the number of occurrences of each character sequence. Also see Figure 792.26. Fitting a power law
using MLE for <header-name> and "header-name" gives, respectively, an exponent of -2.26 (x_min = 8) and
-1.8 (x_min = 9). Based on the visible form of the .c files.
Commentary
File systems invariably provide a unique method of identifying every file they contain (e.g., a full path
name). The base document recognized the disadvantages of requiring that the full path name be specified in
each #include directive and permitted a substring of it to be given. The implementation-defined places are
usually additional character sequences (e.g., directory names) added to the h-char-sequence
[918: header name syntax] in an attempt to create a full path name that refers to an existing file.
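The process described above can be sketched as a loop that prefixes each place to the h-char-sequence until the result names an existing file. This is an illustrative model only: the list of places, the use of / as a separator, and the array of path names standing in for the host file system are all assumptions, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the host file system: a list of full path
   names that are taken to exist. */
static int file_exists(const char *const files[], size_t n_files,
                       const char *path)
{
    for (size_t i = 0; i < n_files; i++)
        if (strcmp(files[i], path) == 0)
            return 1;
    return 0;
}

/* Prefix each place, in order, to the h-char-sequence. On success the
   full path name is left in out and the index of the place is returned;
   -1 is returned if no place yields an existing file. */
int find_header(const char *const places[], size_t n_places,
                const char *h_char_sequence,
                const char *const files[], size_t n_files,
                char *out, size_t out_size)
{
    for (size_t i = 0; i < n_places; i++) {
        int len = snprintf(out, out_size, "%s/%s",
                           places[i], h_char_sequence);
        if (len < 0 || (size_t)len >= out_size)
            continue;   /* path does not fit in the buffer: try next place */
        if (file_exists(files, n_files, out))
            return (int)i;
    }
    return -1;
}
```

The order of the places array determines which file is found when the same name exists in more than one place.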
Rationale
The file search rules used for the filename in the #include directive were left as implementation-defined. The
Standard intends that the rules which are eventually provided by the implementor correspond as closely as
possible to the original K&R rules. The primary reason that explicit rules were not included in the Standard
is the infeasibility of describing a portable file system structure. It was considered unacceptable to include
UNIX-like directory rules due to significant differences between this structure and other popular commercial
file system structures.
Nested include files raise an issue of interpreting the file search rules. In UNIX C a #include directive found
within an included file entails a search for the named file relative to the file system directory that holds the
outer #include. Other implementations, including the earlier UNIX C described in K&R, always search relative
to the same current directory. The C89 Committee decided in principle in favor of K&R approach, but was
unable to provide explicit search rules as explained above.
Other Languages
Other languages (or an extension provided by their implementations) commonly use the double-quote
delimited form.

Common Implementations
The character sequence between the < and > delimiters is invariably treated as the name of a file, possibly
including a path. The ordering of the search sequence used for directives having the form <h-char-sequence>
is often different from that used for the form "q-char-sequence" [1909: #include, mapping to host file]. For
instance, in the <h-char-sequence> case the contents of /usr/include might be searched first, followed by the
contents of the directory containing the .c file, while in the "q-char-sequence" case the contents of the
directory containing the .c file might be searched first, followed by other places.

The environment in which a translator executes may also affect the sequence of places that are searched.
For instance, the effect of relative path names (e.g., ../proj/abc.h) depends on the identity of the current
directory.
gcc searches two directories, /usr/include and another directory that holds very machine specific files,
such as stdarg.h (e.g., /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/include on your author’s
computer). gcc supports the #include_next directive. This directive causes the search algorithm to
skip some of the initial implementation-defined places that would normally be searched. The initial places
that are skipped are those that were searched in locating the file containing the #include_next directive
(including the place where the search succeeded).
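The skipping behavior described for #include_next can be modeled as a search that starts one place after the place in which the current file was found. The sketch below is a simplified model, not gcc's implementation: struct place, the directory names and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one search place and the header names it holds. */
struct place {
    const char *dir;
    const char *const *headers;
    size_t n_headers;
};

/* Search places[first] onward, in order. Return the index of the first
   place holding a header called name, or -1 if there is none. */
int search_from(const struct place places[], size_t n_places,
                size_t first, const char *name)
{
    for (size_t i = first; i < n_places; i++)
        for (size_t j = 0; j < places[i].n_headers; j++)
            if (strcmp(places[i].headers[j], name) == 0)
                return (int)i;
    return -1;
}

/* Model of #include: every place is searched. */
int include_search(const struct place places[], size_t n_places,
                   const char *name)
{
    return search_from(places, n_places, 0, name);
}

/* Model of #include_next appearing in a file that was located in
   places[current]: that place and all earlier ones are skipped. */
int include_next_search(const struct place places[], size_t n_places,
                        size_t current, const char *name)
{
    return search_from(places, n_places, current + 1, name);
}
```

A typical use is a wrapper header that shares its name with a system header and uses #include_next to reach the one it wraps.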
Tzerpos and Holt [1416] describe a well-formedness theory of header inclusion that enables unnecessary
#include directives to be deduced.
Coding Guidelines
The standard does not specify the order in which the implementation-defined places are searched. This is a
potential coding guideline issue because it is possible that an h-char-sequence will match in more than one
of the places (i.e., there is a file having the same name along several of the different possible search paths).
The behavior is thus dependent (i.e., it is assumed that the contents of the different headers will be different)
on the order in which the places are searched.
Experience suggests that the effect of a translator locating a #included file different from the one
expected to be located by the developer has one of two consequences: (1) when the contents of the file
accessed are similar to the one intended (e.g., a different version of the intended file) the source file may be
successfully translated, and (2) when the contents of the file accessed have no connection with the intended
file the source is rarely successfully translated. The problem might therefore be considered to be one of
version management, rather than the choice of characters used in an h-char-sequence. There are a number
of reasons why a solution to this issue is to not use h-char-sequences at all, including the following:

• For the < > delimited form, implementations usually look in a predefined location first (as described in
  the Common Implementations section above and in the following C sentence)
  [1898: #include, places to search for].
  Ensuring that the names chosen by developers for the headers they create are different from those of
  system headers is an almost impossible task. While it might be possible to enumerate the set of names
  of existing file names of system headers contained in commercially important environments, members
  are likely to be added to this set on a regular basis.
  Rather than trying to avoid using file names likely to match those of system headers, developers could
  ensure that places containing system headers are searched last.
• The < > delimited form is often considered to denote externally supplied headers (e.g., provided by
  the implementation or translator environment vendor). What constitutes a system supplied header is
  open to interpretation. One distinction that can be made between system and developer headers is that
  developers do not control the contents of system headers. Consequently, it can be argued that their
  contents are not subject to coding guidelines.
  Headers whose contents have been written by developers are subject to coding guidelines. The
  convention generally adopted to indicate this status is to use the double-quote delimited form
  of #include.
Rev 1897.1
Developer written headers in a #include directive shall not be delimited by the < and > characters.
Developers sometimes specify full path names in headers (see Table 1896.1). This is a configuration
management issue and is not considered to be within the scope of these coding guidelines.
Table 1897.1: Number of various kinds of identifiers declared in the headers contained in the /usr/include
directory of some translation environments. Information was automatically extracted and represents an
approximate lower bound. Versions of the translation environments from approximately the same year (mid
1990s) were used. The counts for ISO C assume that the minimum set of required identifiers are declared
and exclude the type generic macros.

Information                         Linux 2.0  AIX on RS/6000  HP/UX 9  SunOS 4  Solaris 2  ISO C
Number of headers                       2,006           1,514    1,264      987      1,495     24
macro definitions                      10,252          18,637   13,314   11,987     10,903    446
identifiers with external linkage       1,672           1,542    1,935      616      1,281    487
identifiers with internal linkage          80              34     2012        0          5      0
tag declaration                           716           1,088      899    1,208        945      3
typedef name declared                   1,024             828       15      493      1,027     55

1898
How the places are specified or the header identified is implementation-defined.
#include places to search for
Commentary
The differences between the environments in which translation occurs have narrowed over the years. However,
even though there may be much common practice, such issues are considered to be outside the scope of
the C Standard.
10 program transformation mechanism
Common Implementations
Implementations invariably search one or more predefined locations first (e.g., /usr/include), followed
by a list of alternative places. A number of techniques are used to allow developers to specify a list of
alternative places to be searched for files corresponding to the headers specified in a #include directive. For
instance, the alternative places may be specified via a translator command line option (e.g., -I), in a translator
configuration file (e.g., gcc version 2.91.66 hosted on RedHat Linux reads many default locations from the
file /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs, although the path /usr/include
is still hard coded in the translator sources), or an environment variable (e.g., several Microsoft Windows
based translators use INCLUDE).
The directory separators used in Unix and MS-DOS slant in different directions. Many implementations,
in both environments, recognize both characters as directory delimiters. One consequence of this is that
escape sequences are not recognized as such (something that is unlikely to be a problem in header names).
The RISCOS environment does not support filenames ending in .h. The implementation-defined behavior
for this host is to look in a directory called h, for a file of the given name with the .h removed.
Coding Guidelines
The implementation-defined behavior associated with how the places are specified occurs outside of the
source code and is the remit of configuration management guidelines. For this reason nothing further is said
here.
1899
A preprocessing directive of the form
# include "q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source file identified by the specified
sequence between the " delimiters.
#include q-char-sequence
Commentary
The commonly accepted intent of this form of the #include directive is that it is used to reference source files
created by developers (i.e., headers that are not provided as part of the implementation or host environment).
The only syntactic difference between q-char-sequence and h-char-sequence is that neither sequence
may contain its respective delimiter.
918 header name syntax
Most q-char-sequences end with one of two character sequences (i.e., .c or .h). The character
sequence before these suffixes is often called the header name.
Other Languages
The use of double-quote as the delimiter is the almost universal form used in other languages (although some
use the ’ character because that is what is used to delimit string literals).
Coding Guidelines
The term commonly used to refer to these source files is header, with the context of the conversation often being
used to distinguish any other intended usage. The intent is that the contents of these source files are controlled
by developers and as such they are subject to coding guidelines.
1900
The named source file is searched for in an implementation-defined manner.
Commentary
While this “implementation-defined manner” might be the same as that for the < > delimited form, the intent
is for it to be sufficiently different that developers do not need to be concerned about the name of a header
created by them matching one provided as part of the implementation (and therefore potentially found by the
translator when searching for a matching header). For instance, your author does not know the names of
most of the 304 files (e.g., compface.h) contained in /usr/include on his software development computer.
The discussion on the < > delimited form is applicable here.
1897 #include h-char-sequence
Common Implementations
The search algorithm used invariably differs from that used for the < > delimited form (otherwise there would
be little point in distinguishing the two cases). The search algorithm used by some implementations is to
first look in the directory containing the source file currently being translated (which may itself have been
included). If that search fails, and the current source file has itself been included, the directory containing the
source file that #included it is then searched. This process continues back through any nested #include
directives. For instance, in:
file_1.c
#include "abc.h"
file_2.c
#include "/foo/file_1.c"
file_3.c
#include "/another/path/file_2.c"
(assuming the translation environment supports the path names used), translating the source file file_3.c
causes file_2.c to be included, which in turn includes file_1.c. The source file abc.h will be searched
for in the directories /foo, /another/path and then the directory containing file_3.c.
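The order in which directories are tried by an implementation that walks back through the chain of nested #include directives can be sketched in C. This is only an illustration of the ordering, not the algorithm of any particular translator; the names dir_of, quote_search_order and SEARCH_DIR_MAX are hypothetical, and Unix-style / separators are assumed.

```c
#include <assert.h>
#include <string.h>

#define SEARCH_DIR_MAX 64   /* hypothetical limit on a directory name */

/* Copy the directory part of path into dir; a path without '/' is
   treated as living in the current directory ".". */
static void dir_of(const char *path, char dir[SEARCH_DIR_MAX])
{
    const char *slash = strrchr(path, '/');
    size_t len;

    if (slash == NULL) {
        strcpy(dir, ".");
        return;
    }
    len = (size_t)(slash - path);
    if (len == 0)
        len = 1;                 /* "/x.c" lives in the root directory "/" */
    assert(len < SEARCH_DIR_MAX);
    memcpy(dir, path, len);
    dir[len] = '\0';
}

/* chain[0] is the file containing the #include being processed and
   chain[n-1] is the file originally handed to the translator.
   out[i] receives the i-th directory that would be searched. */
void quote_search_order(const char *const chain[], size_t n,
                        char out[][SEARCH_DIR_MAX])
{
    for (size_t i = 0; i < n; i++)
        dir_of(chain[i], out[i]);
}
```

Passing the chain "/foo/file_1.c", "/another/path/file_2.c", "file_3.c" yields the directories /foo, /another/path and . (the directory containing file_3.c), matching the order described above. Duplicate directories are not removed in this sketch.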
Some implementations use the double-quote delimited form within their system headers, to change the
default first location that is searched. For instance, a third-party API may contain the header abc.h, which
in turn needs to include ayx.h. Using the form "ayx.h" means that the implementation will search in the
directory containing abc.h first, not /usr/include. This usage can help localize the files that belong to
specific APIs. Other implementations use a search algorithm that starts with the directory containing the
original source file being translated.
If the source file is not found after these places have been searched, some implementations then search
other places specified via any translator options. Other implementations simply follow the behavior described
by the following C sentence (which has the consequence of eventually checking these other places).
1898 #include places to search for
1901

If this search is not supported, or if the search fails, the directive is reprocessed as if it read
# include <h-char-sequence> new-line
with the identical contained sequence (including > characters, if any) from the original directive.
Commentary
The previous search can fail in the sense that it does not find a matching source file.
Some existing code uses the double-quote delimited form of #include directive to include headers
provided by the implementation (rather than the < > delimited form). This requirement ensures that such
code continues to be conforming.
1902
144) As indicated by the syntax, a preprocessing token shall not follow a #else or #endif directive before the
terminating new-line character.
footnote 144
Commentary
This says in words what is specified in the syntax.
Common Implementations
Many early implementations (and some present-day ones, for compatibility with existing source) treated any
sequence of characters following one of these directives as a comment, e.g., #endif X == 1.
1903

However, comments may appear anywhere in a source file, including within a preprocessing directive.
Commentary
A comment is replaced by a single space character prior to preprocessing.
126 comment replaced by space
1858 preprocessing directive ended by
1904
A preprocessing directive of the form
# include pp-tokens new-line
(that does not match one of the two previous forms) is permitted.
Commentary
This form permits the < > or double-quote delimited forms to be generated via macro expansion. However, it
is rarely used (11 instances in over 60,000 #include directives in the visible source of the .c files). Whether
this is because developers are unaware of its existence, or because it has little utility is not known.
1914 #include example 2
1905
The preprocessing tokens after include in the directive are processed just as in normal text. (Each identifier
currently defined as a macro name is replaced by its replacement list of preprocessing tokens.)
#include macros expanded
Commentary
To be exact, the preprocessing tokens after include in the directive up to the first new-line character are
processed just as in normal text.
1906
(Each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens.)
Commentary
This C sentence provides explicit clarification that macro replacement occurs in this case (the same
clarification is also given elsewhere).
1991 #line macros expanded
1907
The directive resulting after all replacements shall match one of the two previous forms. 145)
Commentary
It is not a violation of syntax if the directive does not match one of the two previous forms, because the
syntax of this form has been matched. It is a violation of semantics and therefore the behavior is undefined.
1908
The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a
pair of " characters is combined into a single header name preprocessing token is implementation-defined.

Commentary
This implementation-defined behavior may take a number of forms, including:
• The ## operator can be used to glue preprocessing tokens together. However, the behavior is undefined
if the resulting character sequence is not a valid preprocessing token. For instance, the five preprocessing
tokens {<} {string} {.} {h} {>} cannot be glued together to form a valid preprocessing token
without going through intermediate stages whose behavior is undefined.
1958 ## operator
1963 ## if result not valid
• Creating a preprocessing token, via macro expansion, having the double-quote delimited form (i.e., a
string preprocessing token) need not depend on any implementation-defined behavior. The stringize
operator can be used to create a string preprocessing token.
1950 # operator
• Other implementation-defined behaviors might include the handling of space characters. For instance,
in the following:
#define bra <
#define ket >
#include bra stdio.h ket
does the implementation strip off the space characters at the ends of the delimited character sequence?
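The stringize route mentioned in the list above can be sketched as follows. The sketch assumes that no identifier named stdio or h is defined as a macro and that no file called stdio.h exists in the places searched for the double-quote form, so that the directive is reprocessed as the < > form; formatted_length is a hypothetical function used only to show that the header was found.

```c
#include <assert.h>

#define add_quotes(a) # a

/* After macro replacement this directive reads: #include "stdio.h" */
#include add_quotes(stdio.h)

/* snprintf is only declared if the directive above located <stdio.h>. */
int formatted_length(int value)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%d", value);

    assert(n > 0);
    return n;
}
```

No token gluing is involved here, so the undefined behavior associated with intermediate ## results does not arise.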
Coding Guidelines
Given the rarity of use of this form of #include no guideline recommendations are given here.
Example
#define mk_sys_hdr(name) < ## name ## >

#if BUG_FIX
#define VERSION 2a /* works because pp-numbers include alphabetics */
#else
#define VERSION 2
#endif

#define add_quotes(a) # a
#define mk_str(str, ver) add_quotes(str ## ver)

#include mk_str(Version, VERSION)
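One detail of the example is worth checking against the macro replacement rules: a parameter that is an operand of ## is replaced by its argument without that argument first being macro expanded. The following sketch, which stringizes instead of including so that it can be compiled anywhere, suggests that mk_str as written would name VersionVERSION, and that extra levels of macro are needed if the value of VERSION is wanted. The macros paste, paste_expanded, quote_expanded and mk_str_expanded are hypothetical additions, and VERSION is fixed at 2a for the illustration.

```c
#include <assert.h>
#include <string.h>

#define VERSION 2a   /* a single pp-number */

#define add_quotes(a) # a

/* As in the example: ver is an operand of ##, so VERSION is pasted unexpanded. */
#define mk_str(str, ver) add_quotes(str ## ver)

/* Hypothetical variant: the extra macro levels let VERSION expand first. */
#define paste(a, b) a ## b
#define paste_expanded(a, b) paste(a, b)
#define quote_expanded(a) add_quotes(a)
#define mk_str_expanded(str, ver) quote_expanded(paste_expanded(str, ver))

const char *direct_name(void)   { return mk_str(Version, VERSION); }
const char *expanded_name(void) { return mk_str_expanded(Version, VERSION); }

void check_names(void)
{
    assert(strcmp(direct_name(), "VersionVERSION") == 0);
    assert(strcmp(expanded_name(), "Version2a") == 0);
}
```

Pasting Version with the pp-number 2a gives the identifier Version2a, which is why the comment about pp-numbers including alphabetics matters once VERSION is actually expanded.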
1909

The implementation shall provide unique mappings for sequences consisting of one or more nondigits or
digits (6.4.2.1) followed by a period (.) and a single nondigit.
#include mapping to host file
Commentary
This C sentence and the following ones in this C paragraph are a specification of the minimum set of
requirements that an implementation must meet. For sequences outside of this set the implementation mapping
may be non-unique (like, for instance, the Microsoft Windows technique of mapping files ending in .html
to .htm). The handling of character sequences that resemble UCNs may also differ, e.g., "\ubada\file.txt"
(Ubada is a city in Tanzania and BADA is the Hangul symbol 붚 in ISO 10646). The standard does not
require unique mappings for sequences containing more than one period character because many operating
systems do not permit them (at least one, RISCOS, does not permit any).
The wording was changed by the response to DR #302 to extend the specification to be more consistent
with C++.
C++
16.2p5
The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10) followed
by a period (.) and a single nondigit.
Other Languages
Other languages are either specified to operate within the same operating system and file system limitations as
C, and as such have to deal with the same issues, or require an integrated development environment to be
created before they can be used.
Common Implementations
Implementations invariably pass the sequence of characters that appear between the delimiters (when
searching other places a directory path may be added) as an argument in a call to fopen or equivalent system
function. The called library function will eventually call some host operating system function that interfaces
to the host file system. The C translator’s behavior is thus controlled by the characteristics of the host file
system and how it maps character sequences to file names. The handling of the period character varies
between file systems; known behaviors include:
• Unix based file systems permit more than one period in a file name.
• MS-DOS based file systems only permit a single period in a file name.
• RISCOS, an operating system for the Acorn ARM processor, does not support filenames that contain
a period. For this host, file names containing a period that were specified in a #include directive were
mapped using a directory structure. All file names ending in the characters .h were searched for in a
directory called h.
Coding Guidelines
Because an implementation is not required to provide a unique mapping for all sequences it is possible that
an unintended header or source file will be accessed, or the translator will fail to identify a known header or
source file. The possible consequences of an unintended access are discussed elsewhere, while failure to
identify a known header or source file will cause a diagnostic to be issued. The cost/benefit issues associated
with using character sequences having a unique mapping in the different environments that the source may
be translated in are outside the scope of these coding guidelines.
1897 #include h-char-sequence
1896 source file inclusion
1910
The first character shall not be a digit.
Commentary
This requirement only applies to the first character of the sequence that implementations are required to
provide a unique mapping for.
The wording was changed by the response to DR #302.
C90
The requirement that the first character not be a digit is new in C99. Given that it is more restrictive than that
required for existing C90 implementations (and thus existing code) it is unlikely that existing code will be
affected by this requirement.
C++
This requirement is new in C99 and is not specified in the C++ Standard (the argument given in the C90
subsection (above) also applies to C++).
Common Implementations
Most implementations support a first character that is not a letter.
1911
The implementation may ignore the distinctions of alphabetical case and restrict the mapping to eight significant
characters before the period.
header name significant characters
Commentary
These permissions reflect known characteristics of file systems in which translators are executed.
C90
The limit specified by the C90 Standard was six significant characters. However, implementations invariably
used the number of significant characters available in the host file system (i.e., they do not artificially limit the
number of significant characters). It is unlikely that a header or source file will fail to be identified because
of a difference in what used to be a non-significant character.
C++
The C++ Standard does not give implementations any permissions to restrict the number of significant
characters before the period (16.2p5). However, the limits of the file system used during translation are likely
to be the same for both C and C++ implementations and consequently no difference is listed here.
Common Implementations
All file systems place some limits on the number of characters in a source file name, for instance:
• Most versions of the Microsoft DOS environment ignore the distinction of alphabetic case and restrict
the mapping to eight significant characters before any period (and a maximum of three after it).
• POSIX requires that at least 14 characters be significant in a file name (it also requires implementations
to support at least 255 characters in a pathname). Many Linux file systems support up to 255 characters
in a filename and 4095 characters in a pathname.
Coding Guidelines
The potential problems associated with limits on the sequences of characters that are likely to be treated as
unique are a configuration management issue that is outside the scope of these coding guidelines.
1912
A #include preprocessing directive may appear in a source file that has been read because of a #include
directive in another file, up to an implementation-defined nesting limit (see 5.2.4.1).
Commentary
Thus #include directives can be nested within source files whose contents have themselves been #included.
This issue is discussed elsewhere. While this permission only applies to source files, an implementation
using some form of precompiled headers (which are not source files within the standard’s definition of the
term) that did not support this functionality would not be popular with developers.
295 limit #include nesting
121 header precompiled
108 source files
1913
EXAMPLE 1 The most common uses of #include preprocessing directives are as in the following:
#include <stdio.h>
#include "myprog.h"

Other Languages
Some languages only have a single form of #include directive for all headers.
1914
EXAMPLE 2 This illustrates macro-replaced #include directives:
#if VERSION == 1
#define INCFILE "vers1.h"
#elif VERSION == 2
#define INCFILE "vers2.h" // and so on
#else
#define INCFILE "versN.h"
#endif
#include INCFILE
Commentary
This example does not illustrate any benefit compared to that obtained from placing separate #include
directives in each arm of the conditional inclusion directive.
1915
Forward references: macro replacement (6.10.3).
1916
145) Note that adjacent string literals are not concatenated into a single string literal (see the translation
phases in 5.1.1.2);
footnote 145
Commentary
String concatenation occurs in translation phase 6 and so it is not possible to join together two existing strings
to form another string within a #include directive.
135 translation phase 6
1917
thus, an expansion that results in two string literals is an invalid directive.
Commentary
It is an invalid directive in that it violates a semantic requirement and thus the behavior is undefined. It is not
a syntax violation.
6.10.3 Macro replacement
Constraints
macro replacement
1918
Two replacement lists are identical if and only if the preprocessing tokens in both have the same number,
ordering, spelling, and white-space separation, where all white-space separations are considered identical.
replacement list identical if
Commentary
This is actually a definition in a Constraints clause (it is used by two constraints in this C subsection).
The check against same spelling only needs to take into account the significant characters of an identifier.
282 internal identifier significant characters
Considering all white-space separations to be identical removes the need for developers to be concerned about
use of different source layout (e.g., indentation) and method of spacing (e.g., space character vs. horizontal
tab).
Rationale
The specification of macro definition and replacement in the Standard was based on these principles:
• Interfere with existing code as little as possible.
• Keep the preprocessing model simple and uniform.
• Allow macros to be used wherever functions can be.
• Define macro expansion such that it produces the same token sequence whether the macro calls
appear in open text, in macro arguments, or in macro definitions.
Preprocessing is specified in such a way that it can be implemented either as a separate text-to-text prepass
or as a token-oriented portion of the compiler itself. Thus, the preprocessing grammar is specified in terms of
tokens.
1919
An identifier currently defined as an object-like macro shall not be redefined by another #define preprocessing
directive unless the second definition is an object-like macro definition and the two replacement lists are
identical.
object-like macro redefinition
Commentary
There was an existing body of code, containing redefinitions of the same macro, when the C Standard
was first written. The C committee did not want to specify that existing code containing such usage was
non-conforming, but they did consider the case where the bodies of any subsequent definitions differed to be
an erroneous usage.
1983 EXAMPLE
macro redefinition
C90
The wording in the C90 Standard was modified by the response to DR #089.

Common Implementations
Some translators permit multiple definitions of a macro, independently of the contents of the
bodies. The behavior is for a new definition to cause the previous body to be pushed, in a stack-like fashion.
Any subsequent #undef of the macro name pops this stacked definition and makes it the current one.
#define/#undef stack
Coding Guidelines
C permits more than one definition of the same macro name, with the same body, and more than one external
definition of the same object, with the same type, and the coding guideline issues are the same for both (in
both cases translators are not always required to issue a diagnostic if the definitions are considered to be
different).
420 linkage
422.1 identifier declared in one file
In both cases a technique for avoiding duplicate definitions, during translation but not in the visible source,
is to bracket definitions with #ifndef MACRO_NAME/#endif (in the case of the file scope object a macro
name needs to be created and associated with its declaration). Using this technique has the disadvantage that
it prevents the translator checking that any subsequent redeclarations of an identifier are the same (unless the
bracketing occurs around the only textual declaration that occurs in any source file used to build a program).
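The disadvantage of the bracketing technique can be seen in a small sketch. The two guarded definitions stand in for what might be two different headers; DEMO_BUF_SIZE and demo_buf_size are hypothetical names chosen for the illustration.

```c
#include <assert.h>

/* First (hypothetical) header: guarded definition. */
#ifndef DEMO_BUF_SIZE
#define DEMO_BUF_SIZE 512
#endif

/* Second (hypothetical) header: the body differs, but the guard hides
   the definition, so the translator never gets to compare the two. */
#ifndef DEMO_BUF_SIZE
#define DEMO_BUF_SIZE 1024
#endif

int demo_buf_size(void)
{
    assert(DEMO_BUF_SIZE > 0);
    return DEMO_BUF_SIZE;
}
```

Without the guards the second #define would be a constraint violation and a diagnostic would be required; with them, the first definition silently wins.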
1920
Likewise, an identifier currently defined as a function-like macro shall not be redefined by another #define
preprocessing directive unless the second definition is a function-like macro definition that has the same
number and spelling of parameters, and the two replacement lists are identical.
function-like macro redefinition
Commentary

The issues are the same as for object-like macros, with the addition of checks on the parameters. Requiring
that the parameters be spelled the same, rather than, for instance, that they have an identical effect, simplifies
the similarity checking of two macro bodies. For instance, in:
#define FM(foo) ((foo) + x)
#define FM(bar) ((bar) + x)
a translator is not required to deduce that the two definitions of FM are structurally identical.
1919 object-like macro redefinition
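The following sketch contrasts redefinitions that satisfy the constraints with ones that do not. It relies on the rule that all white-space separations are considered identical (a comment counts as white-space), while the presence or absence of a separation is significant. DEMO_LIMIT, FM as used here, and fm_of are hypothetical names.

```c
#include <assert.h>

#define DEMO_LIMIT (10 + 2)
#define DEMO_LIMIT (10   +   2)   /* same tokens, white-space in the same places */

#define FM(foo) ((foo) + DEMO_LIMIT)
#define FM(foo) ((foo) /* a comment is white-space */ + DEMO_LIMIT)

/* Redefinitions that would be constraint violations, left commented out:
 *   #define DEMO_LIMIT (10+2)             white-space separation differs
 *   #define FM(bar) ((bar) + DEMO_LIMIT)  parameter spelling differs
 */

int fm_of(int value)
{
    int result = FM(value);

    assert(result == value + 12);
    return result;
}
```

The commented-out forms are kept out of the translation unit because a conforming translator is required to diagnose them.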
1921
There shall be white-space between the identifier and the replacement list in the definition of an object-like
macro.
Commentary
In the following (assuming $ is a member of the extended character set and permitted in an identifier
preprocessing token):
#define A$ x
an object-like macro with the name A$ and the body x is defined, not a macro with the name A and the body $ x.
216 extended character set
There is no requirement that there be white-space following the ) in a function-like macro definition.
C90

The response to DR #027 added the following requirements to the C90 Standard.
DR #027
Correction
Add to subclause 6.8, page 86 (Constraints):
In the definition of an object-like macro, if the first character of a replacement list is not a character required by
subclause 5.2.1, then there shall be white-space separation between the identifier and the replacement list.*
[Footnote *: This allows an implementation to choose to interpret the directive:
#define THIS$AND$THAT(a, b) ((a) + (b))
as defining a function-like macro THIS$AND$THAT, rather than an object-like macro THIS. Whichever choice it
makes, it must also issue a diagnostic.]
However, the complex interaction between this specification and UCNs was debated during the C9X review
process and it was decided to simplify the requirements to the current C99 form.
#define TEN.1 /* Define the macro TEN to have the body .1 in C90. */
              /* A constraint violation in C99. */
C++
The C++ Standard specifies the same behavior as the C90 Standard.
Common Implementations
HP (was DEC) treats $ as part of the spelling of the macro name.
1922
If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments (including
those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal
the number of parameters in the macro definition.
Commentary
This requirement is the macro invocation equivalent of the one for function calls.
998 function call arguments agree with parameters
C90
If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undefined.
The behavior of the following was discussed in DR #003q3, DR #153, and raised against C99 in DR #259
(no committee response was felt necessary).
#define foo() A
#define bar(B) B

foo() // no arguments
bar() // one empty argument?
What was undefined behavior in C90 (an empty argument) is now explicitly supported in C99. The two most
likely C90 translator undefined behaviors are either to support them (existing source developed using such a
translator may contain empty arguments in a macro invocation), or to issue a diagnostic (existing source
developed using such a translator will not contain any empty arguments in a macro invocation).
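The C99 treatment of an empty argument can be sketched as follows, assuming a translator operating in C99 (or later) mode. The macros str_of and offset_by, and the functions built on them, are hypothetical helpers for the illustration.

```c
#include <assert.h>
#include <string.h>

#define str_of(arg) #arg        /* hypothetical helper */
#define offset_by(B) (100 B)    /* hypothetical helper */

/* One empty argument: stringizing it gives the empty string literal "". */
size_t empty_argument_length(void) { return strlen(str_of()); }

/* One empty argument: the parameter simply vanishes, leaving (100 ). */
int offset_by_nothing(void) { return offset_by(); }

/* The same macro with a non-empty argument: (100 + 5). */
int offset_by_five(void) { return offset_by(+ 5); }

void check_empty_arguments(void)
{
    assert(empty_argument_length() == 0);
    assert(offset_by_nothing() == 100);
    assert(offset_by_five() == 105);
}
```

In both invocations with empty parentheses the macro has one parameter and receives one argument that consists of no preprocessing tokens, so the argument count requirement is met.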
C++
The C++ Standard contains the same wording as the C90 Standard.
C++ translators are not required to correctly process source containing macro invocations having any empty
arguments.
Common Implementations
Some C90 implementations (e.g., gcc) treated empty arguments as an argument containing no preprocessing
tokens, while others (e.g., Microsoft C) treated an empty argument as being a missing argument (i.e., a
constraint violation).
1923
Otherwise, there shall be more arguments in the invocation than there are parameters in the macro definition
(excluding the ...).
... arguments macro
Commentary
Rationale
There must be at least one argument to match the ellipsis. This requirement avoids the problems that occur
when the trailing arguments are included in a list of arguments to another macro or function. For example, if
dprintf had been defined as
#define dprintf(format,...) \
dfprintf(stderr, format, __VA_ARGS__)
and it were allowed for there to be only one argument, then there would be a trailing comma in the expanded

form. While some implementations have used various notations or conventions to work around this problem,
the Committee felt it better to avoid the problem altogether.
C90
Support for the form ... is new in C99.
C++
Support for the form ... is new in C99 and is not specified in the C++ Standard.
Common Implementations
gcc allowed zero arguments to match a macro parameter defined using the ... form.
Coding Guidelines
While some developers may be confused because the requirements on the number of arguments are different
from functions defined using the ellipsis notation, passing too few arguments is a constraint violation (i.e.,
translators are required to issue a diagnostic that a developer then needs to correct).
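One way of living with the requirement that at least one argument match the ellipsis, without relying on any implementation-specific notation, is to let the format string itself be matched by the ellipsis. The sketch below is an assumption about how this might be written, not something taken from the Rationale; format_into, format_plain and format_value are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical macro: because the format string is matched by the ...,
   an invocation always supplies at least one variable argument and the
   expansion can never end in a dangling comma. */
#define format_into(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

int format_plain(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return format_into(buf, size, "no extra arguments");
}

int format_value(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);
    return format_into(buf, size, "value=%d", value);
}
```

The cost of this design is that the format string is no longer a named parameter, so it cannot be referred to separately in the replacement list.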
1924
There shall exist a ) preprocessing token that terminates the invocation.
macro invocation ) terminates it
Commentary
While this requirement is specified in the syntax, it is interpreted as requiring the ) preprocessing token to
occur before any macro replacement of the identifiers following the matching ( preprocessing token. For
instance, in:
#define R_PAREN )

#define FUNC(a) a

static int glob = (1 + FUNC(1 R_PAREN );
the invocation is terminated by the ) preprocessing token that occurs immediately before ;, not the expanded
form of R_PAREN.
1925
The identifier __VA_ARGS__ shall occur only in the replacement-list of a function-like macro that uses the
ellipsis notation in the parameters.
Commentary
This requirement simplifies a translator’s processing of occurrences of the identifier __VA_ARGS__.
The typographical correction of “arguments” to “parameters” was made by the response to DR #234.
C90
Support for __VA_ARGS__ is new in C99.
Source code declaring an identifier with the spelling __VA_ARGS__ will cause a C99 translator to issue a
diagnostic (the behavior was undefined in C90).
C++
Support for __VA_ARGS__ is new in C99 and is not specified in the C++ Standard.
Common Implementations
gcc required developers to give a name to the parameter that accepted a variable number of arguments. This
parameter name appeared in the replacement list wherever the variable number of arguments were to be
substituted.
Example
/*
 * The following are constraint violations.
 */
#define __VA_ARGS__
#define jparks __VA_ARGS__
#define jparks(__VA_ARGS__)
#define jparks(__VA_ARGS__, ...) __VA_ARGS__

#define jparks(x) x
jparks(__VA_ARGS__)

#define jparks(x, ...) x
jparks(__VA_ARGS__,1)
/*
 * The following break the spirit, if not the wording
 * of this constraint.
 */
#define jparks(x, y) x##y
jparks(__VA, _ARGS__)

#define jparks(x, y, ...) x##y
jparks(__VA, _ARGS__, 1)
1926
A parameter identifier in a function-like macro shall be uniquely declared within its scope.
macro parameter unique in scope
Commentary
This constraint is the macro equivalent of the one given for objects with no linkage. Its scope is the list
of parameters in the macro definition and the body of that definition. This scope ends at the new-line that
terminates the directive. Macro parameters are also discussed elsewhere.
1350 declaration only one if no linkage
1934 macro parameter scope extends
396 identifier macro parameter
Semantics
1927
The identifier immediately following the define is called the macro name.
macro name identifier
Commentary
This defines the term macro name. This term is generically used in software engineering to refer to this kind
of entity.
1928
There is one name space for macro names.
macro one name space
Commentary
Object-like and function-like macro names exist in the same name space. However, an identifier defined as
a function-like macro is only treated as such when its name is followed by an opening parenthesis. Name
spaces are also discussed elsewhere.
1935 function-like macro followed by (
438 name space
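The rule that a function-like macro name is only treated as a macro invocation when followed by an opening parenthesis is what allows a function and a macro to share a name. A minimal sketch, in which twice, via_macro, via_function and the call counter are all hypothetical names introduced for the illustration:

```c
#include <assert.h>

static int function_calls;   /* counts calls to the function, not the macro */

#define twice(x) ((x) * 2)

/* Parenthesizing the name means it is followed by ), not (, so the
   function-like macro defined above is not invoked here. */
int (twice)(int x)
{
    function_calls++;
    return x * 2;
}

int via_macro(int v)    { return twice(v); }     /* macro invocation */
int via_function(int v) { return (twice)(v); }   /* function call */
int calls_so_far(void)  { return function_calls; }

void check_twice(void)
{
    int before = function_calls;

    assert(via_macro(3) == 6 && function_calls == before);
    assert(via_function(3) == 6 && function_calls == before + 1);
}
```

The same parenthesized form is the usual way of reaching a library function whose name an implementation has also defined as a function-like macro.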
1929
Any white-space characters preceding or following the replacement list of preprocessing tokens are not
considered part of the replacement list for either form of macro.
white-space before/after replacement list
Commentary
Specifying that such white-space should be considered to be part of the replacement list has potential
maintenance and comprehension costs (it restricts how the start of the replacement list may be indented and
white-space following the replacement list is not immediately visible to readers) for no obvious benefit.
Example
In the following the string literal "_ _" is assigned to p.

#define str_ize(a) #a
#define M _ _

char *p = str_ize(M);
1930
If a
#
preprocessing token, followed by an identifier, occurs lexically at the point at which a preprocessing
directive could begin, the identifier is not subject to macro replacement.
Commentary
This is a special case of a more general specification given elsewhere.
1867 tokens in directive not expanded unless
Common Implementations
Some preprocessors used to perform this kind of replacement (some past entries in the Obfuscated C
contest[642] relied on such translator behavior).
Example
In the following, even though the identifier define is defined as a macro, the line starting #define is still
processed as a macro definition directive, and not as a #undef directive.

#define define undef

#define X Y
1931
A preprocessing directive of the form
# define identifier replacement-list new-line
defines an object-like macro that causes each subsequent instance of the macro name (footnote 146) to be
replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive.
Commentary
This defines the term object-like macro. This term is not commonly used by developers, who tend to use the
generic term macro for all macro definitions and, when a distinction needs to be made, use the term function
macro (rather than the technically correct term function-like macro) to refer to the case of a macro defined to
have parameters. A macro’s replacement list is commonly known as a macro body, or simply its body.
The preprocessing tokens in a text-line are unconditionally scanned for instances of macro names to
expand, as are preprocessing tokens in some preprocessing directives (see sentence 1854).
The standard lists a few restrictions on identifiers that can be defined as macro names (see sentence 2026).
The issue of implementation limits on the number of macros that may be defined in one preprocessing
translation unit is discussed elsewhere (see sentence 287).
Other Languages
Some languages use def rather than define.
Common Implementations
Some preprocessors have a maximum limit on the number of characters that can occur in a replacement
list (e.g., an early version of Microsoft C [947] had a 512 byte limit; a limit of 4096 is still seen in some
preprocessors).
Implementations invariably provide a mechanism that is external to the source code for defining macros,
e.g., the -D command line option.
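One common way of cooperating with an externally supplied definition is sketched below. The macro name BUF_SIZE and its default value are hypothetical. The source supplies a default that applies only when no definition has been given outside the source (with compilers such as gcc and clang, an option of the form -DBUF_SIZE=1024 would provide one).

```c
#include <assert.h>

/* BUF_SIZE is a hypothetical name. An external definition, if present,
   takes priority; otherwise the default below is used. */
#ifndef BUF_SIZE
#define BUF_SIZE 256
#endif

static char buffer[BUF_SIZE];

/* Reports the size actually selected at translation time. */
static unsigned long buffer_size(void)
{
    return (unsigned long)sizeof buffer;
}

static void check_buffer(void)
{
    assert(buffer_size() == BUF_SIZE);
}
```

The #ifndef test is what allows the same source file to be translated with or without the command line definition.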
Coding Guidelines
Macros can be defined to serve a variety of purposes (see Table 1931.1 for measurements of actual usage)
including:
• Giving a symbolic name to a constant or expression. The issue of symbolic names is discussed
elsewhere (see sentence 822), as are the advantages of using enumeration types for related identifiers
(see sentence 517).
• Representing an expression without the overhead of a function call. Having made the decision to
represent an expression with a symbolic name, the decision on whether to use a function call or macro
then needs to be made; the human decision making factors involved are discussed elsewhere (see
sentence 0).
• Parameterized code duplication. This kind of usage occurs because a function definition does not
provide the necessary flexibility (for instance, the parameterization may involve constructs other than
expressions).
• Parameterizing the definition of a type. This issue is discussed in more detail under typedef names
(see sentence 1629).
• Controlling conditional inclusion. In this case their status as a macro definition is used as a flag (see
sentence 476).
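The purposes listed above can be illustrated with one small macro each. This is only a sketch; every name in it is hypothetical and none is taken from the measured source code.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical, for illustration only. */

/* Symbolic name for a constant. */
#define MAX_ITEMS 16

/* Expression without the overhead of a function call. */
#define SQUARE(x) ((x) * (x))

/* Parameterized code duplication: one parameter is a type, which is
   something a function parameter cannot be. */
#define DEFINE_MAX_FN(type, name) \
    static type name(type a, type b) { return (a > b) ? a : b; }

DEFINE_MAX_FN(int, max_int)
DEFINE_MAX_FN(double, max_double)

/* Parameterizing the definition of a type. */
#define ELEM_TYPE long
static ELEM_TYPE items[MAX_ITEMS];

/* Controlling conditional inclusion: only the status of the name as a
   defined macro matters, not its (empty) replacement list. */
#define ENABLE_CHECKS
static int checks_enabled(void)
{
#ifdef ENABLE_CHECKS
    return 1;
#else
    return 0;
#endif
}

static void check_purposes(void)
{
    assert(SQUARE(3) == 9);
    assert(max_int(2, 5) == 5);
    assert(max_double(1.5, 0.5) == 1.5);
    assert(sizeof items == MAX_ITEMS * sizeof(long));
    assert(checks_enabled() == 1);
}
```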
Some coding guideline documents recommend against what are sometimes known as syntax changing macro
names. This terminology comes from the fact that uses of such macro names change the syntax, at least
visually, of C. For instance, a developer familiar with Pascal might define the macro names begin and end
to represent the C punctuators { and } respectively (this existing usage was one reason these macro
names were not used as alternative spellings, in <iso646.h>, for these punctuators; it could have rendered
existing conforming code nonconforming), or a developer wanting to modify existing code to use greater
floating-point precision might define the macro name float to be double.
The growth in the usage of languages with C-like syntax over the last 10 years means that these days it is
rare for developers to attempt to change the visual appearance of C source to be closer to a language they are
more familiar with. While a macro name that maps to a C token may be surprising to readers of the source, it
is unlikely to conflict with their existing C knowledge, and therefore might be considered at worst a minor
inconvenience (i.e., cost).
Defining a macro whose name is the same as a keyword means that the behavior of translated source can
be very different from that expected from its visual appearance (such usage also results in undefined behavior
if the definition occurs prior to the inclusion of any library header). The presence of such a definition requires
that readers substitute their existing, default response, knowledge of behavior for a new behavior (assuming
that they had noticed the definition of the macro). Experience suggests that the short-term benefit of defining
and using such macro names is less than the longer term (which may be only a few days) costs associated
with comprehension and miscomprehension of the affected source.
Cg 1931.1
A source file shall not define a macro name to have the spelling of a keyword.
Replacement lists may look innocuous enough when viewed in isolation. However, in the context in which
they occur the expanded form may interact in unexpected ways with adjacent tokens. For instance, looking at
the components of the following source in isolation:

#define SUM a + b

extern int glob;

void f(void)
{
    int loc = glob * SUM;
}

the appearance of the replacement list of SUM suggests that a will be added to b, and looking at the use of
SUM in the initialization of loc suggests that it will be multiplied by the value of glob. However, the token
sequence after macro replacement is glob * a + b, which is parsed as (glob * a) + b and so has a very
different interpretation.
The visual appearance of a replacement list containing statements can also be misleading. For instance, in:

#define INIT c=0; d=0;

extern int glob;

void f(void)
{
    if (glob == 0)
        INIT;
}

the assignment to d is not dependent on the value of glob, which is counter to what the visual appearance of
the source suggests.
A general solution to both of these problems is to bracket the replacement list, ensuring that the visually
expected behavior is the same as the behavior that occurs after macro replacement.
Cg 1931.2
A replacement list having the form of an expression containing one or more binary operators shall be
bracketed with parentheses, unless the binary operators are only those included in the production of a
postfix-expr.
Cg 1931.3
A replacement list consisting of more than one statement shall be completely enclosed in a pair of
braces (which may take the form of a do statement).
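The effect of bracketing can be sketched by placing unbracketed and bracketed forms side by side. The _BARE, _PAREN and _BLOCK suffixes, the function names and the initial values are all hypothetical, chosen only so that the different outcomes are visible.

```c
#include <assert.h>

/* Initial values chosen arbitrarily for the illustration. */
static int a = 2, b = 3, c = 9, d = 9, glob = 4;

#define SUM_BARE   a + b
#define SUM_PAREN  (a + b)

/* do { ... } while (0) turns the list into a single statement that still
   accepts the semicolon written at the point of use. */
#define INIT_BARE   c = 0; d = 0;
#define INIT_BLOCK  do { c = 0; d = 0; } while (0)

static int product_bare(void)  { return glob * SUM_BARE; }   /* (glob * a) + b */
static int product_paren(void) { return glob * SUM_PAREN; }  /* glob * (a + b) */

static void init_bare(void)
{
    if (glob == 0)
        INIT_BARE;   /* only c = 0 is controlled by the if; d = 0 always executes */
}

static void init_block(void)
{
    if (glob == 0)
        INIT_BLOCK;  /* both assignments are controlled by the if */
}

static void check_bracketing(void)
{
    assert(product_bare() == 11);
    assert(product_paren() == 20);

    init_bare();                 /* glob is nonzero, yet d is still assigned */
    assert(c == 9 && d == 0);

    d = 9;
    init_block();                /* glob is nonzero, so nothing is assigned */
    assert(c == 9 && d == 9);
}
```

The do statement form is generally preferred over a bare pair of braces because a use such as INIT_BLOCK; followed by else remains syntactically valid.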
The visual appearance of declarations can also be deceptive when macro replacements are involved. For
instance, in:

#define INFO_PTR int *

INFO_PTR glob_1,
         glob_2;

glob_1 is declared to have a pointer type, while glob_2 is declared to have an integer type.
The bracketing technique cannot be used with a replacement list that represents a type (it would violate
C syntax). However, using a typedef name is not a general solution; it is possible to use macro names in
situations where a typedef name cannot be used. For instance, in:

#define X_TYPE int

unsigned X_TYPE glob;

it is possible to modify the type denoted by X_TYPE because the macro expanded form represents a valid
integer type when preceded by unsigned. However, the type denoted by a typedef name cannot be so
modified (see sentence 1382).
Cg 1931.4
A replacement list shall not consist of a sequence of preprocessing tokens that has, after expansion,
the syntax of a pointer type.
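The difference between a macro and a typedef name denoting a pointer type can be checked at translation time. The sketch below assumes a C11 translator, since it relies on _Generic and static_assert; all of the names are hypothetical.

```c
#include <assert.h>

#define INFO_PTR_MACRO int *     /* hypothetical macro name */
typedef int *info_ptr_type;      /* hypothetical typedef name */

INFO_PTR_MACRO m_1, m_2;   /* expands to: int * m_1, m_2; so only m_1 is a pointer */
info_ptr_type  t_1, t_2;   /* both objects have pointer type */

/* Evaluates to 1 when the operand has type pointer-to-int, otherwise 0. */
#define IS_INT_PTR(x) _Generic((x), int *: 1, default: 0)

static_assert(IS_INT_PTR(m_1) == 1, "m_1 has pointer type");
static_assert(IS_INT_PTR(m_2) == 0, "m_2 has type int");
static_assert(IS_INT_PTR(t_1) == 1, "t_1 has pointer type");
static_assert(IS_INT_PTR(t_2) == 1, "t_2 has pointer type");
```

With the typedef name the pointer is part of the type denoted by the name, so it applies to every declarator in the declaration.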
The replacement list of a macro definition has to appear on a single logical source line (see sentence 118).
Experience suggests that constructs that appear on separate lines in other contexts often appear on the same
line within a replacement list. The developer cost (typing the characters) of using splicing (see sentence 118)
to give the replacement list a visible form that closely resembles that seen when it appears in other contexts
is small. The benefit for subsequent readers is the ability to use the same strategies to read source constructs
as they use in other contexts.
There are a number of ways in which token sequences appearing in various contexts might visually
resemble each other. For instance, in the following definitions both ZERO_ARRAY_1 and ZERO_ARRAY_2
visually associate preprocessing tokens in the macro body, while in ZERO_ARRAY_3 preprocessing tokens in
the macro body visually interact with the preprocessing tokens in the preprocessing directive.

#define ZERO_ARRAY_1(a, n) for (int i = 0; i < (n); i++) \
                               (a)[i]=0;
#define ZERO_ARRAY_2(a, n) \
            for (int i = 0; i < (n); i++) \
                (a)[i]=0;
#define ZERO_ARRAY_3(a, n) for (int i = 0; i < (n); i++) \
    (a)[i]=0;
The following guideline recommendation leaves the decision on what constitutes the same visual layout to
developers.
Rev 1931.5
Token sequences shall have the same visual layout in the replacement list of a macro definition as they
do in other contexts.
Usage
Usage information on the number of macro names defined in source files is given elsewhere (see sentence 287).
[Plot: x axis "Macro names expanded" (ticks 100, 200, 400); y axis "Translation units" (ticks 1, 10, 100,
1,000); data series "all macro expansions" (×) and "function-macro expansions" (•).]
Figure 1931.1: Number of translation units containing a given number of macro names which were macro
expanded, excluding expansions that occurred while processing the contents of system headers. Based on the
translated form of this book’s benchmark programs.
Table 1931.1: Detailed breakdown of the kinds of replacement lists occurring in macro definitions. Adapted
from Ernst, Badros, and Notkin. [404]

Replacement List      %      Example
constant              42     #define ARG_MAX 1000
expression            33     #define SHFT_UP(x) ((x) << 8)
empty                 6.9    #define DUMMY
unknown identifier    6.9    #define INTERN_BUF buffer
statement             5.1    #define TERMINATE goto func_end
type                  2.1    #define NODE_PTR void *
other                 1.9    #define OPTION -X=23
symbol                1.4    #define ALLOC_STORAGE malloc
syntactic             0.5    #define begin {
Table 1931.2: Common macro definitions listed with an abstracted form of their replacement list (as a
percentage of all macro definitions). Note that function-call may also be a macro invocation. Based on the
visible form of the .c and .h files.

Kind of Macro Defined and Abstract Form of its Replacement List      %
object-like macro integer-constant                                   50.7
object-like macro identifier                                          5.9
object-like macro expression                                          5.8
function-like macro function-call                                     4.7
object-like macro function-call                                       3.7
object-like macro string-literal                                      3.4
function-like macro expression                                        3.4
object-like macro                                                     3.4
object-like macro constant-expression                                 2.0
function-like macro                                                   1.7
others                                                               15.4
1932
The replacement list is then rescanned for more macro names as specified below.
Commentary
This sentence was added by the response to DR #306 and removes the possibility of a reader interpreting the
rescanning clause (see sentence 1968) as only applying to function-like macros (see sentence 1933).
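Rescanning of an object-like macro's replacement list can be seen in a chain of definitions such as the following sketch (the macro and function names are hypothetical). Each replacement list is rescanned, so names appearing in it are themselves replaced; the order in which the macros are defined does not matter, because replacement happens at the point of use.

```c
#include <assert.h>

/* Hypothetical macro names. */
#define LIMIT          DEFAULT_LIMIT   /* replacement list names another macro */
#define DEFAULT_LIMIT  (BASE + 2)      /* which in turn names a third */
#define BASE           40

/* LIMIT -> DEFAULT_LIMIT -> (BASE + 2) -> (40 + 2) */
static int limit(void) { return LIMIT; }

static void check_rescan(void)
{
    assert(limit() == 42);
}
```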
1933
A preprocessing directive of the form