The New C Standard- P16


6.10.1 Conditional inclusion
1883
• The speciﬁcation has changed between C90 and C99.
The problem with any guideline recommendation is that the total cost is likely to be greater than the total benefit (a cost is likely to be incurred in many cases and a benefit obtained in very few cases). For this reason no recommendation is made here. The discussion on suffixed integer constants (see 835, integer constant type first in list) is also applicable in the context of a conditional inclusion directive.
Example
In the following the developer may assume that unwanted higher bits in the value of C will be truncated when shifted left.
#define C 0x1100u
#define INT_BITS 32

#define TOP_BYTE (C << (INT_BITS-8))

#if TOP_BYTE == 0
/* ... */
#endif

void f(void)
{
if (TOP_BYTE == 0)
   /* ... */ ;
}
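The difference between the two contexts can be made observable with a small sketch. It assumes a C99 (or later) translator, where #if arithmetic on an unsigned operand is performed in the widest unsigned integer type, and a host where unsigned int is 32 bits wide. The names PP_TOP_BYTE_IS_ZERO, pp_top_byte_is_zero and rt_top_byte_is_zero are hypothetical and are not part of the original example.

```c
#define C 0x1100u
#define INT_BITS 32
#define TOP_BYTE (C << (INT_BITS-8))

/* Evaluated during preprocessing: in C99 the operand behaves as if it had
   the widest unsigned type, so the bits shifted past bit 31 are kept. */
#if TOP_BYTE == 0
#define PP_TOP_BYTE_IS_ZERO 1
#else
#define PP_TOP_BYTE_IS_ZERO 0
#endif

/* Result of the comparison as seen by the #if directive. */
int pp_top_byte_is_zero(void)
{
    return PP_TOP_BYTE_IS_ZERO;
}

/* Result of the same comparison evaluated as an ordinary expression:
   assuming a 32-bit unsigned int, the high bits are discarded. */
int rt_top_byte_is_zero(void)
{
    return TOP_BYTE == 0;
}
```

Under the stated assumptions the two functions disagree, which is the situation the developer in the example did not anticipate.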
1881
This includes interpreting character constants, which may involve converting escape sequences into execution character set members.
Commentary
This conversion also occurs in translation phase 5 (see 133, translation phase 5).
1882
Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined.143)
Commentary
The C committee recognized that developers may choose to perform different phases of translation on
different hosts. For instance, source ﬁles may be preprocessed and then distributed for further translation on
other, different, hosts.

Common Implementations
Differences between the numeric values in these two cases are rare (although cases involving ASCII and EBCDIC character sets do occur; see 3, EBCDIC).
Coding Guidelines
Making use of the numeric value of character constants is making use of representation information, which is covered by a guideline recommendation (see 569.1, representation information using). However, there are cases where deviations may occur.
Example
See footnote 141 (1874).
1883
Also, whether a single-character character constant may have a negative value is implementation-defined.
June 24, 2009 v 1.2
Commentary
The guarantee on the value being nonnegative (see 478, basic character set positive if stored in char object) does not apply during preprocessing. For instance, a preprocessor might use the EBCDIC character set and act as if the type char were signed. In other contexts the value of a character constant containing a single character that is not a member of the basic execution character set is implementation-defined (see 885, character constant more than one character).
Coding Guidelines
The discussion on the possibility of character constants having other implementation-defined values (see 885, character constant more than one character) is applicable here.
1884
Preprocessing directives of the forms
# ifdef identifier new-line group_opt
# ifndef identifier new-line group_opt
check whether the identifier is or is not currently defined as a macro name.
Commentary

There is no #elifdef form (although over half of the uses of the #elif directive are followed by a single instance of the defined operator; see Table 1872.1).
1885
Their conditions are equivalent to #if defined identifier and #if !defined identifier respectively.
Commentary
The #ifdef and #ifndef forms are rather like the unary ++ and -- operators in that they provide a shorthand notation for commonly used functionality.
Coding Guidelines
The #ifdef forms are the most common form of conditional inclusion directive. Measurements (see Table 1872.1) also show that nearly a third of the uses of the defined operator could be replaced by one of these forms. There are advantages (e.g., most common form suggests most practiced form for readers, and ease of visual scanning down the left edge of the source) and disadvantages (e.g., requires more effort to add additional conditions to the single test being made) to using the #ifdef forms, instead of the defined operator. However, there does not appear to be a worthwhile cost/benefit to recommending one of the possibilities.
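The equivalence of the two spellings, and the extra effort needed to extend the #ifdef form, can be illustrated with a short sketch. The macro names FEATURE_A and FEATURE_B and the helper functions are hypothetical, chosen only for this illustration.

```c
#define FEATURE_A 1
/* FEATURE_B is deliberately left undefined. */

/* Short form. */
#ifdef FEATURE_A
#define A_VIA_IFDEF 1
#else
#define A_VIA_IFDEF 0
#endif

/* Equivalent long form using the defined operator. */
#if defined FEATURE_A
#define A_VIA_DEFINED 1
#else
#define A_VIA_DEFINED 0
#endif

/* #ifndef is equivalent to #if !defined. */
#ifndef FEATURE_B
#define B_ABSENT_VIA_IFNDEF 1
#else
#define B_ABSENT_VIA_IFNDEF 0
#endif

#if !defined FEATURE_B
#define B_ABSENT_VIA_DEFINED 1
#else
#define B_ABSENT_VIA_DEFINED 0
#endif

/* Only the defined operator allows a second condition to be added to the
   same test; the #ifdef form would need a nested directive. */
#if defined(FEATURE_A) && !defined(FEATURE_B)
#define A_WITHOUT_B 1
#else
#define A_WITHOUT_B 0
#endif

int a_forms_agree(void) { return A_VIA_IFDEF == A_VIA_DEFINED; }
int b_forms_agree(void) { return B_ABSENT_VIA_IFNDEF == B_ABSENT_VIA_DEFINED; }
int a_without_b(void)   { return A_WITHOUT_B; }
```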
1886
142) Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000
is signed and positive within a #if expression even though it is unsigned in translation phase 7.
Commentary
The wording was changed by the response to DR #265.
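The footnote's example needs a 16-bit int to be observable. A minimal sketch of the same effect on more common hosts, assuming a 32-bit int and a C99 (or later) preprocessor, uses 0x80000000 instead of 0x8000. The function and macro names are hypothetical.

```c
/* Within #if the constant 0x80000000 is signed and positive (it acts as if
   it had the widest signed integer type), so -1 compares less than it. */
#if -1 < 0x80000000
#define PP_MINUS_ONE_IS_LESS 1
#else
#define PP_MINUS_ONE_IS_LESS 0
#endif

int pp_minus_one_is_less(void)
{
    return PP_MINUS_ONE_IS_LESS;
}

/* In translation phase 7, assuming a 32-bit int, the same constant has
   type unsigned int; -1 is converted to a large unsigned value first. */
int rt_minus_one_is_less(void)
{
    return -1 < 0x80000000;
}
```

A translator may warn about the mixed signed/unsigned comparison in the second function; that warning is a symptom of the difference being illustrated.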
1887
143) Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.
#if 'z' - 'a' == 25
if ('z' - 'a' == 25)
Commentary
This situation could occur, for instance, if the ASCII representation were used during the preprocessing phases and EBCDIC were used during translation phase 5 (see 133, translation phase 5).
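A program that depends on the two contexts agreeing can check this itself. The following is a minimal sketch; the function names are hypothetical, and on the common case of a single host using one character set for all phases the check is expected to succeed.

```c
/* Value of the comparison as computed during preprocessing. */
#if 'z' - 'a' == 25
#define PP_SPAN_IS_25 1
#else
#define PP_SPAN_IS_25 0
#endif

int pp_span_is_25(void)
{
    return PP_SPAN_IS_25;
}

/* Value of the same comparison using execution character set values. */
int rt_span_is_25(void)
{
    return 'z' - 'a' == 25;
}

/* Nonzero when both contexts produced the same answer. */
int contexts_agree(void)
{
    return pp_span_is_25() == rt_span_is_25();
}
```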
1888
Each directive’s condition is checked in order.
Commentary
The order is from the lowest line number to the highest line number.
Coding Guidelines
It may be possible to obtain some translation time performance advantage (at least for the original developer) by appropriately ordering the directives. Unlike developer behavior with if statements (see 1739, selection statement syntax), developers do not usually aim to optimize speed of translation when deciding how to order conditional inclusion directives (experience suggests that developers often simply append new directives to the end of any existing directives).
Recognizing a known pattern in a sequence of directives has several benefits for readers. They can make use of any previous deductions they have made on how to interpret the directives and what they represent, and the usage highlights common dependencies in the source. In the following code fragment more reader effort is required to spot similarities in the sequence that directives are checked than if both sequences of directives had occurred in the same order.
#ifdef MACHINE_A
/* ... */
#else
#ifdef MACHINE_B
/* ... */
#endif
#endif

#ifdef MACHINE_B
/* ... */
#else
#ifdef MACHINE_A
/* ... */
#endif
#endif
Given the lack of attention from developers on the relative ordering of directives and the benefits of using the same ordering, where possible, a guideline recommendation appears worthwhile. However, a guideline recommendation needs to be automatically enforceable (see 0, guideline recommendation enforceable) and determining when two sequences of directives have the same effect, during translation, may be infeasible because information that is not contained within the source may be required (e.g., dependencies between macro names that are likely to be defined via translator command line options).
Rev

1888.1
Where possible the visual order of evaluation of expressions within different sequences of nested
conditional inclusion directives shall be the same.
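A sketch of what following this recommendation might look like is given below. Both sequences test MACHINE_A before MACHINE_B. The macro values, WORD_BITS, MACHINE_NAME and the helper functions are hypothetical, and MACHINE_B is defined in the source here only so the fragment is self-contained (it would more usually come from a translator command line option).

```c
#define MACHINE_B 1   /* assumption: normally supplied by the build */

/* First sequence: MACHINE_A is tested before MACHINE_B. */
#ifdef MACHINE_A
#define WORD_BITS 16
#else
#ifdef MACHINE_B
#define WORD_BITS 32
#endif
#endif

/* Second sequence: the tests appear in the same visual order. */
#ifdef MACHINE_A
#define MACHINE_NAME "A"
#else
#ifdef MACHINE_B
#define MACHINE_NAME "B"
#endif
#endif

int word_bits(void)
{
    return WORD_BITS;
}

const char *machine_name(void)
{
    return MACHINE_NAME;
}
```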
1889
If it evaluates to false (zero), the group that it controls is skipped: directives are processed only through the
name that determines the directive in order to keep track of the level of nested conditionals;
Commentary
A parallel can be drawn with the behavior of if statements (see 1744, if statement operand compare against 0), in that if their controlling expression evaluates to zero, during program execution, any statements in the associated block are skipped.
1890
directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals;
Commentary
The preprocessor operates on a representation of the source written by the developer, not translated machine
code. As such it needs to perform some processing on its input to be able to deduce when to stop skipping.
Figure 1889.1: Number of (left) top-level source files (i.e., the contents of any included files are not counted) and (right) complete translation units (including the contents of any files #included more than once) having a given number of lines skipped during translation of this book's benchmark programs.
[Plot not reproduced. x-axis: physical lines skipped; y-axis (log scale): top-level files (left panel), translation units (right panel); crosses denote the #if part, bullets the #else part.]
Directives need to be processed to keep track of the level of nesting of conditionals and translation phases 1–3 still need to be performed (line splicing could affect what is or is not the start of a line; see 116, translation phase 1) and characters within a comment must not be treated as directives.
The intent of only requiring a minimum of directive processing, while skipping, is to enable partially
written source code to be skipped and to allow preprocessors to optimize their performance in this special
case, speeding up the rate at which the input is processed.
Example
#if 1
extern int ei;

#elif " an unmatched quote character, undefined behavior

extern int foo_bar;
#endif

#if 0
printf("\
#endif \n");

#endif

#if 0
/*
#endif

*/
#endif
1891
the rest of the directives’ preprocessing tokens are ignored, as are the other preprocessing tokens in the
group.
Commentary
There is no requirement that any directive be properly formed, according to the preprocessor syntax (see 1854, preprocessor directives syntax). However, preprocessing tokens still need to be created, before they are ignored (as part of translation phase 3; see 124, translation phase 3).
Example
In the following the #define directive is not well formed. But because this group is being skipped the translator is required to ignore this fact.
#if 0
#define M(e
#endif
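The two points made about skipping (nesting is still tracked, and everything else in the group is ignored) can be combined in one sketch. The macro SEEN_WHILE_SKIPPING and the helper function are hypothetical names used only for this illustration.

```c
#if 0
  /* The nested directives below are recognized only far enough to match
     this #if with the correct #endif. */
  #if 1
    #error never reported, because the enclosing group is skipped
  #else
    this text is not valid C, yet it still forms preprocessing tokens
  #endif
  #define M(e
  #define SEEN_WHILE_SKIPPING 1
#endif

/* The #define inside the skipped group must have had no effect. */
#ifdef SEEN_WHILE_SKIPPING
#define SKIPPED_GROUP_WAS_PROCESSED 1
#else
#define SKIPPED_GROUP_WAS_PROCESSED 0
#endif

int skipped_group_was_processed(void)
{
    return SKIPPED_GROUP_WAS_PROCESSED;
}
```

Note that the skipped text avoids unmatched quote characters, since those would give undefined behavior even in a skipped group.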
1892
Only the ﬁrst group whose control condition evaluates to true (nonzero) is processed.
Commentary
This group is processed exactly as-if it appeared in the source outside of any group.
1893
If none of the conditions evaluates to true, and there is a #else directive, the group controlled by the #else is processed;
Commentary
A semantic rule to associate #else with the lexically nearest preceding #if (or similar form) directive, like the one given for if statements (see 1747, else binds to nearest if), is not needed because conditional inclusion is terminated by a #endif directive.
Like the matching #if (or similar form) directive case, all preprocessing tokens in the group are treated as if they appeared outside of any conditional inclusion directive. Processing continues until the first #endif is encountered (which must match the opening directive).
Coding Guidelines
The arguments made for if statements always containing an else arm (see 1745, else) might be thought to also apply to conditional inclusion. However, the presence of a matching #endif directive reduces the likelihood that readers will confuse which preprocessing directive any #else associates with (although other issues, such as lack of indentation or a large number of source lines between directives can make it difficult to visually associate matching directives).
1894
lacking a #else directive, all the groups until the #endif are skipped.144)
Commentary
The effect of this specification mimics the behavior of if statements (see 1747, else binds to nearest if).
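The selection rules of the preceding sentences (only the first true group is processed; with no #else and no true condition, every group is skipped) can be sketched as follows. LEVEL, CHOSEN, HIGH and the helper functions are hypothetical names.

```c
#define LEVEL 2

/* Both the #if and the #elif conditions are true for LEVEL == 2, but only
   the first group whose condition is true is processed. */
#if LEVEL >= 1
#define CHOSEN 1
#elif LEVEL >= 2
#define CHOSEN 2
#else
#define CHOSEN 0
#endif

/* No condition is true and there is no #else: all groups are skipped,
   so HIGH is never defined. */
#if LEVEL > 5
#define HIGH 1
#elif LEVEL > 3
#define HIGH 2
#endif

#ifdef HIGH
#define HIGH_WAS_DEFINED 1
#else
#define HIGH_WAS_DEFINED 0
#endif

int chosen_group(void)     { return CHOSEN; }
int high_was_defined(void) { return HIGH_WAS_DEFINED; }
```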
1895
Forward references: macro replacement (6.10.3), source file inclusion (6.10.2), largest integer types (7.18.1.5).
6.10.2 Source ﬁle inclusion
Constraints
1896
A #include directive shall identify a header or source file that can be processed by the implementation.
Commentary
There is no requirement that a header be represented using a source file. It could be represented using prebuilt information within the translator (see 2018, footnote 153) that is enabled only when the appropriate #include directive is encountered during preprocessing (but not in a group that is skipped). Also there is no requirement that the spelling of the header in the C source file be represented by a source file of the same spelling. The C Standard has no explicit knowledge of file systems and is silent on the issue of directory structures. Minimum required limits on the implementation processing of a header name are specified elsewhere (see 1909, #include mapping to host file).
Failure to locate a header or source file that can be processed by the implementation (e.g., a file of the specified name does not exist, at least along the places searched) is a constraint violation.
Other Languages
Most languages do not specify a #include mechanism, although many of their implementations provide one. The approach commonly used by C implementations is popular, but not universal. Some languages explicitly state that a #include directive denotes a file of the given name in the translator's host environment.
Common Implementations
For most implementations the header name maps to a file name of the same spelling. It is quite common for the translation environment to ignore the case of alphabetic letters (e.g., MS-DOS and early versions of Microsoft Windows), or to limit the number of significant characters in the file name denoted by a header name (the remaining characters being ignored). Use of the / character in specifying a full path to a file is sufficiently common usage that even host environments where this character is not normally associated with a directory separator support such usage in header names (many Microsoft Windows translators support this character, as well as the \ character, as a directory separator).
In the majority of implementations #include directives specify files containing source in text form (see 121, source file representation). However, some implementations support what are known as precompiled headers (see 121, header precompiled).
It is not uncommon (over 10% of #includes in Figure 1896.1) for the same header to be #included more than once when translating a source file (it is a requirement that implementations support this usage for standard headers). The following are some of the techniques implementations use to reduce the overhead of subsequent #includes.
• A common convention is to bracket the contents of a header, starting with the preprocessing token sequence #ifndef __H_file_name__ / #define __H_file_name__ and ending with #endif. The processing of subsequent #includes of the same header is then reduced to the minimal processing needed to skip to the matching #endif. Some implementations (e.g., gcc) go one step further and detect headers that contain such bracketing the first time they are processed, and completely skip opening and processing the header if it is subsequently encountered again in a #include directive.
• Support the preprocessing directive #import.[359] This directive is equivalent to the #include directive except that if the specified header has already been included it is not included again.
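The bracketing convention in the first technique can be sketched in a single file by writing out the contents of a hypothetical header, point.h, twice, which is what two #include directives naming it would produce. The guard name H_POINT_H and the struct are assumptions made for this illustration; the guard deliberately avoids a leading double underscore, since such identifiers are reserved for the implementation.

```c
/* ---- contents of the hypothetical point.h, first inclusion ---- */
#ifndef H_POINT_H
#define H_POINT_H
struct point { int x, y; };
#endif

/* ---- the same contents, as produced by a second #include ---- */
#ifndef H_POINT_H
#define H_POINT_H
struct point { int x, y; };   /* skipped: H_POINT_H is already defined */
#endif

/* Uses the single definition of struct point that survived. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the bracketing, the second copy would redefine struct point in the same scope, which a translator is required to diagnose.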

Coding Guidelines
Some coding guideline documents recommend that implementation supplied headers appear before developer
written headers, in a source ﬁle. Such recommendations overlook the possibility that a developer written
header might itself #include an implementation header.
Figure 1896.1: Number of times the same header was #included during the translation of a single translation unit. The crosses denote all headers (i.e., all systems headers are counted), triangles denote all headers delimited by quotes (i.e., likely to be user defined headers) and bullets denote all quote delimited headers #included nested at least three levels deep. Based on the translated form of this book's benchmark programs.
[Plot not reproduced. x-axis: times #included; y-axis (log scale): number of #includes.]

Figure 1896.2: Number of preprocessing translation units (excluding system headers) containing a given number of #includes whose contents are not referenced during translation (excludes the case where the same header is #included more than once, see Figure 1896.1). Based on the translated form of this book's benchmark programs.
[Plot not reproduced. x-axis: unnecessary headers #included; y-axis (log scale): translation units.]
Figure 1896.3: Number of .c source files containing a given number of #include directives (dashed lines represent number of unique headers). Based on the visible form of the .c files.
[Plot not reproduced. x-axis: #includes; y-axis (log scale): source files; separate curves for <header> and "header".]
Experience suggests that once a #include directive appears in a source file it is rarely removed (see Figure 1896.2) and that new #include directives are simply added after the last one. The issue of redundant code is discussed elsewhere (see 190, redundant code).
There does not appear to be a worthwhile benefit in ordering #include directives in any way (apart from any relative ordering dictated by dependencies between headers).
Table 1896.1: Occurrence of two forms of header-names (as a percentage of all #include directives), the percentage of each kind that specifies a path to the header file, and number of absolute paths specified. Based on the visible form of the .c files.

Header Form          % Occurrence   % Uses Path   Number Absolute Paths
<h-char-sequence>        75.0           86.4                0
"q-char-sequence"        25.0           17.2                0
Semantics
1897
A preprocessing directive of the form
# include <h-char-sequence> new-line
Figure 1896.4: header-name rank (based on character sequences appearing in #include directives) plotted against the number of occurrences of each character sequence. Also see Figure 792.26. Fitting a power law using MLE for <header-name> and "header-name" gives, respectively, exponents of -2.26 (x_min = 8) and -1.8 (x_min = 9). Based on the visible form of the .c files.
[Plot not reproduced. x-axis (log scale): rank; y-axis (log scale): occurrences of header name; crosses denote <header>, bullets "header".]
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header.
Commentary
File systems invariably provide a unique method of identifying every file they contain (e.g., a full path name). The base document recognized the disadvantages of requiring that the full path name be specified in each #include directive and permitted a substring of it to be given. The implementation-defined places are usually additional character sequences (e.g., directory names) added to the h-char-sequence (see 918, header name syntax) in an attempt to create a full path name that refers to an existing file.
Rationale
The file search rules used for the filename in the #include directive were left as implementation-defined. The
Standard intends that the rules which are eventually provided by the implementor correspond as closely as
possible to the original K&R rules. The primary reason that explicit rules were not included in the Standard
is the infeasibility of describing a portable ﬁle system structure. It was considered unacceptable to include
UNIX-like directory rules due to signiﬁcant differences between this structure and other popular commercial
ﬁle system structures.
Nested include files raise an issue of interpreting the file search rules. In UNIX C a #include directive found within an included file entails a search for the named file relative to the file system directory that holds the outer #include. Other implementations, including the earlier UNIX C described in K&R, always search relative to the same current directory. The C89 Committee decided in principle in favor of K&R approach, but was unable to provide explicit search rules as explained above.
Other Languages
Other languages (or an extension provided by their implementations) commonly use the double-quote
delimited form.

Common Implementations
The character sequence between the < and > delimiters is invariably treated as the name of a file, possibly including a path. The ordering of the search sequence used for directives having the form <h-char-sequence> is often different from that used for the form "q-char-sequence" (see 1909, #include mapping to host file). For instance, in the <h-char-sequence> case the contents of /usr/include might be searched first, followed by the contents of the directory containing the .c file, while in the "q-char-sequence" case the contents of the directory containing the .c file might be searched first, followed by other places.

The environment in which a translator executes may also affect the sequence of places that are searched. For instance, the file denoted by a relative path name (e.g., ../proj/abc.h) depends on the identity of the current directory.
gcc searches two directories, /usr/include and another directory that holds very machine specific files, such as stdarg.h (e.g., /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/include on your author's computer). gcc supports the #include_next directive. This directive causes the search algorithm to skip some of the initial implementation-defined places that would normally be searched. The initial places that are skipped are those that were searched in locating the file containing the #include_next directive (including the place where the search succeeded).
Tzerpos and Holt[1416] describe a well-formedness theory of header inclusion that enables unnecessary #include directives to be deduced.
Coding Guidelines
The standard does not specify the order in which the implementation-defined places are searched. This is a potential coding guideline issue because it is possible that an h-char-sequence will match in more than one of the places (i.e., there is a file having the same name along several of the different possible search paths). The behavior is thus dependent (i.e., it is assumed that the contents of the different headers will be different) on the order in which the places are searched.
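The dependence on search order can be modeled with a small sketch. It is not how any particular translator is implemented; it assumes that each place is a directory name prefixed to the header name with a / separator, and it uses a stand-in predicate instead of a real file system. All names (find_header, exists_fn, fake_exists, the example paths) are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate: nonzero if a file with this full path exists. */
typedef int (*exists_fn)(const char *path);

/*
 * Try each place in order, prefixing it to the header name.
 * Returns the index of the first place that yields an existing file and
 * leaves the full path in out; returns -1 (and an empty out) if no place
 * matches. Candidates that do not fit in out are passed over.
 */
int find_header(const char *const places[], size_t n_places,
                const char *name, exists_fn exists,
                char *out, size_t out_size)
{
    for (size_t i = 0; i < n_places; i++) {
        int len = snprintf(out, out_size, "%s/%s", places[i], name);
        if (len < 0 || (size_t)len >= out_size)
            continue;               /* candidate path too long */
        if (exists(out))
            return (int)i;
    }
    if (out_size > 0)
        out[0] = '\0';
    return -1;
}

/* Stand-in file system in which two places hold a file of the same name. */
int fake_exists(const char *path)
{
    return strcmp(path, "/usr/include/stdio.h") == 0
        || strcmp(path, "./proj/stdio.h") == 0;
}
```

Calling find_header with the places listed as { "/usr/include", "./proj" } selects the system file, while listing them in the opposite order selects the project file of the same name, which is the order dependency described above.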
Experience suggests that the effect of a translator locating a #included file different from the one expected to be located by the developer has one of two consequences: (1) when the contents of the file accessed are similar to the one intended (e.g., a different version of the intended file) the source file may be successfully translated, and (2) when the contents of the file accessed have no connection with the intended file the source is rarely successfully translated. The problem might therefore be considered to be one of version management, rather than the choice of characters used in an h-char-sequence. There are a number of reasons why a solution to this issue is to not use h-char-sequences at all, including the following:
• For the < > delimited form, implementations usually look in a predefined location first (as described in the Common Implementations section above and in the following C sentence; see 1898, #include places to search for).
Ensuring that the names chosen by developers for the headers they create are different from those of system headers is an almost impossible task. While it might be possible to enumerate the set of names of existing file names of system headers contained in commercially important environments, members are likely to be added to this set on a regular basis.
Rather than trying to avoid using file names likely to match those of system headers, developers could ensure that places containing system headers are searched last.
• The < > delimited form is often considered to denote externally supplied headers (e.g., provided by the implementation or translator environment vendor). What constitutes a system supplied header is open to interpretation. One distinction that can be made between system and developer headers is that developers do not control the contents of system headers. Consequently, it can be argued that their contents are not subject to coding guidelines.
Headers whose contents have been written by developers are subject to coding guidelines. The convention generally adopted to indicate this status is to use the double-quote character delimited form of #include.
Rev
1897.1
Developer written headers in a #include directive shall not be delimited by the < and > characters.
Developers sometimes specify full path names in headers (see Table 1896.1). This is a configuration management issue and is not considered to be within the scope of these coding guidelines.
Table 1897.1: Number of various kinds of identifiers declared in the headers contained in the /usr/include directory of some translation environments. Information was automatically extracted and represents an approximate lower bound. Versions of the translation environments from approximately the same year (mid 1990s) were used. The counts for ISO C assume that the minimum set of required identifiers are declared and exclude the type generic macros.

Information                         Linux 2.0   AIX on RS/6000   HP/UX 9   SunOS 4   Solaris 2   ISO C
Number of headers                       2,006            1,514     1,264       987       1,495      24
macro definitions                      10,252           18,637    13,314    11,987      10,903     446
identifiers with external linkage       1,672            1,542     1,935       616       1,281     487
identifiers with internal linkage          80               34      2012         0           5       0
tag declaration                           716            1,088       899     1,208         945       3
typedef name declared                   1,024              828        15       493       1,027      55
1898
How the places are specified or the header identified is implementation-defined.
Commentary
The differences between the environments in which translation occurs have narrowed over the years.
However, even though there may be much common practice, such issues are considered to be outside the
scope of the C Standard (see sentence 10, program transformation mechanism).
Common Implementations
Implementations invariably search one or more predefined locations first (e.g., /usr/include), followed
by a list of alternative places. A number of techniques are used to allow developers to specify a list of
alternative places to be searched for files corresponding to the headers specified in a #include
directive. For instance, the alternative places may be specified via a translator command line option
(e.g., -I), in a translator configuration file (e.g., gcc version 2.91.66 hosted on RedHat Linux reads
many default locations from the file /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs, although
the path /usr/include is still hard coded in the translator sources), or an environment variable (e.g.,
several Microsoft Windows-based translators use INCLUDE).
The directory separator used in Unix and MS-DOS slants in different directions. Many implementations,
in both environments, recognize both characters as directory delimiters. One consequence of this is that
escape sequences are not recognized as such (something that is unlikely to be a problem in header names).
The RISCOS environment does not support filenames ending in .h. The implementation-defined behavior
for this host is to look in a directory called h, for a file of the given name with the .h removed.
Coding Guidelines
The implementation-deﬁned behavior associated with how the places are speciﬁed occurs outside of the
source code and is the remit of conﬁguration management guidelines. For this reason nothing further is said
here.
1899
A preprocessing directive of the form
# include "q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source file identified by the
specified sequence between the " delimiters.
Commentary
The commonly accepted intent of this form of the #include directive is that it is used to reference
source files created by developers (i.e., headers that are not provided as part of the implementation or
host environment).
The only syntactic difference between q-char-sequence and h-char-sequence is that neither sequence
may contain its own delimiter (see sentence 918, header name syntax).
Most q-char-sequences end with one of two character sequences (i.e., .c or .h). The character
sequence before this suffix is often called the header name.
Other Languages
The use of double-quote as the delimiter is the almost universal form used in other languages (although some
use the ’ character because that is what is used to delimit string literals).
Coding Guidelines
The term commonly used to refer to these source files is header, with the context of the conversation
often being used to distinguish any other intended usage. The intent is that the contents of these source
files are controlled by developers and as such they are subject to coding guidelines.
1900
The named source ﬁle is searched for in an implementation-deﬁned manner.
Commentary
While this “implementation-defined manner” might be the same as that for the < > delimited form, the
intent is for it to be sufficiently different that developers do not need to be concerned about the name
of a header created by them matching one provided as part of the implementation (and therefore
potentially found by the translator when searching for a matching header). For instance, your author
does not know the names of most of the 304 files (e.g., compface.h) contained in /usr/include on his
software development computer.
The discussion on the < > delimited form is applicable here (see sentence 1897, #include
h-char-sequence).
Common Implementations
The search algorithm used invariably differs from that used for the < > delimited form (otherwise there
would be little point in distinguishing the two cases). The search algorithm used by some
implementations is to first look in the directory containing the source file currently being translated
(which may itself have been included). If that search fails, and the current source file has itself been
included, the directory containing the source file that #included it is then searched, this process
continuing back through any nested #include directives. For instance, in:
file_1.c
#include "abc.h"
file_2.c
#include "/foo/file_1.c"
file_3.c
#include "/another/path/file_2.c"
(assuming the translation environment supports the path names used), translating the source file
file_3.c causes file_2.c to be included, which in turn includes file_1.c. The source file abc.h will be
searched for in the directories /foo, /another/path and then the directory containing file_3.c.
Some implementations use the double-quote delimited form within their system headers, to change the
default first location that is searched. For instance, a third-party API may contain the header abc.h,
which in turn needs to include ayx.h. Using the form "ayx.h" means that the implementation will search
in the directory containing abc.h first, not /usr/include. This usage can help localize the files that
belong to specific APIs. Other implementations use a search algorithm that starts with the directory
containing the original source file being translated.
If the source file is not found after these places have been searched, some implementations then search
other places specified via any translator options (see sentence 1898, #include places to search for).
Other implementations simply follow the behavior described by the following C sentence (which has the
consequence of eventually checking these other places).
1901

If this search is not supported, or if the search fails, the directive is reprocessed as if it read
# include <h-char-sequence> new-line
with the identical contained sequence (including > characters, if any) from the original directive.
Commentary
The previous search can fail in the sense that it does not ﬁnd a matching source ﬁle.
Some existing code uses the double-quote delimited form of #include directive to include headers
provided by the implementation (rather than the < > delimited form). This requirement ensures that such
code continues to be conforming.
1902
144) As indicated by the syntax, a preprocessing token shall not follow a #else or #endif directive
before the terminating new-line character.
Commentary
Saying in words what is speciﬁed in the syntax.
Common Implementations
Many early implementations (and some present-day ones, for compatibility with existing source) treated
any sequence of characters following one of these directives as a comment, e.g., #endif X == 1.
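A conforming way of keeping such an annotation is to place it inside a comment, since comments are
permitted within a preprocessing directive. The following is a minimal sketch; the macro X and the
function feature_enabled are hypothetical names chosen for illustration.

```c
#define X 1

#if X == 1
int feature_enabled(void) { return 1; }
#else
int feature_enabled(void) { return 0; }
#endif /* X == 1 */   /* the annotation is a comment, not a preprocessing token */
```

Writing #endif X == 1 instead relies on the non-standard behavior described above.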
1903

However, comments may appear anywhere in a source ﬁle, including within a preprocessing directive.
Commentary
A comment is replaced by a single space character prior to preprocessing (see sentences 126, comment
replaced by space, and 1858, preprocessing directive ended by).
1904
A preprocessing directive of the form
# include pp-tokens new-line
(that does not match one of the two previous forms) is permitted.
Commentary
This form permits the < > or double-quote delimited forms to be generated via macro expansion (see
sentence 1914, #include example 2). However, it is rarely used (11 instances in over 60,000 #include
directives in the visible source of the .c files). Whether this is because developers are unaware of its
existence, or because it has little utility, is not known.
1905
The preprocessing tokens after include in the directive are processed just as in normal text. (Each
identifier currently defined as a macro name is replaced by its replacement list of preprocessing
tokens.)
Commentary
To be exact, the preprocessing tokens after include in the directive up to the first new-line character
are processed just as in normal text.
1906
(Each identiﬁer currently deﬁned as a macro name is replaced by its replacement list of preprocessing tokens.)
Commentary
This C sentence provides explicit clarification that macro replacement occurs in this case (the same
clarification is also given elsewhere; see sentence 1991, #line macros expanded).
1907
The directive resulting after all replacements shall match one of the two previous forms.
145)
Commentary
It is not a violation of syntax if the directive does not match one of the two previous forms, because the
syntax of this form has been matched. It is a violation of semantics and therefore the behavior is undeﬁned.
1908
The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or
a pair of " characters is combined into a single header name preprocessing token is
implementation-defined.

Commentary
This implementation-defined behavior may take a number of forms, including:
• The ## operator can be used to glue preprocessing tokens together (see sentence 1958, ## operator).
However, the behavior is undefined if the resulting character sequence is not a valid preprocessing
token (see sentence 1963, ## if result not valid). For instance, the five preprocessing tokens
{<} {string} {.} {h} {>} cannot be glued together to form a valid preprocessing token without going
through intermediate stages whose behavior is undefined.
• Creating a preprocessing token, via macro expansion, having the double-quote delimited form (i.e., a
string preprocessing token) need not depend on any implementation-defined behavior. The stringize
operator can be used to create a string preprocessing token (see sentence 1950, # operator).
• Other implementation-defined behaviors might include the handling of space characters. For instance,
in the following:
#define bra <
#define ket >
#include bra stdio.h ket
does the implementation strip off the space character at the ends of the delimited character sequence?
Coding Guidelines
Given the rarity of use of this form of #include no guideline recommendations are given here.
Example
#define mk_sys_hdr(name) < ## name ## >

#if BUG_FIX
#define VERSION 2a /* works because pp-numbers include alphabetics */
#else
#define VERSION 2
#endif

#define add_quotes(a) # a
#define mk_str(str, ver) add_quotes(str ## ver)

#include mk_str(Version, VERSION)
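The stringize route mentioned in the commentary can also be sketched with a standard header. This is
an illustration only, not taken from the original text: the macro names HDR_STR and HDR_XSTR and the
function hello_length are hypothetical. It assumes that no file called stdio.h is found by the
double-quote search, so that the directive is reprocessed as the < > form.

```c
/* Hypothetical helper macros: two levels, so that any macro used in
   the argument would be expanded before being stringized. */
#define HDR_STR(name)  #name
#define HDR_XSTR(name) HDR_STR(name)

/* The tokens stdio . h are stringized to "stdio.h", giving the
   double-quote delimited form of the directive. */
#include HDR_XSTR(stdio.h)

/* Uses a declaration from the included header to show it was found. */
int hello_length(void)
{
    char buf[16];
    return snprintf(buf, sizeof buf, "%s", "hello");
}
```

No token gluing is involved, so this usage avoids the undefined behavior associated with the ##
operator in mk_sys_hdr.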
1909
The implementation shall provide unique mappings for sequences consisting of one or more nondigits or
digits (6.4.2.1) followed by a period (.) and a single nondigit.
Commentary
This C sentence and the following ones in this C paragraph are a specification of the minimum set of
requirements that an implementation must meet. For sequences outside of this set the implementation
mapping may be non-unique (like, for instance, the Microsoft Windows technique of mapping files ending
in .html to .htm). The handling of character sequences that resemble UCNs may also differ, e.g.,
"\ubada\file.txt" (Ubada is a city in Tanzania and BADA is the Hangul symbol 붚 in ISO 10646). The
standard does not permit any number of period characters because many operating systems do not permit
them (at least one, RISCOS, does not permit any).
The wording was changed by the response to DR #302 (replacing “letters or digits (as defined in 5.2.1)”
and “a single letter”) to extend the specification to be more consistent with C++.
C++
16.2p5
The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10)
followed by a period (.) and a single nondigit.
Other Languages
Other languages are either specified to operate within the same operating system and file system
limitations as C, and as such have to deal with the same issues, or require an integrated development
environment to be created before they can be used.
Common Implementations
Implementations invariably pass the sequence of characters that appear between the delimiters (when
searching other places a directory path may be added) as an argument in a call to fopen or equivalent
system function. The called library function will eventually call some host operating system function
that interfaces to the host file system. The C translator’s behavior is thus controlled by the
characteristics of the host file system and how it maps character sequences to file names. The handling
of the period character varies between file systems; known behaviors include:
• Unix based file systems permit more than one period in a file name.
• MS-DOS based file systems only permit a single period in a file name.
• RISCOS, an operating system for the Acorn ARM processor, does not support filenames that contain
a period. For this host, file names containing a period that were specified in a #include directive
were mapped using a directory structure. All file names ending in the characters .h were searched for
in a directory called h.
Coding Guidelines
Because an implementation is not required to provide a unique mapping for all sequences it is possible
that an unintended header or source file will be accessed, or the translator will fail to identify a
known header or source file. The possible consequences of an unintended access are discussed elsewhere
(see sentence 1897, #include h-char-sequence), while failure to identify a known header or source file
will cause a diagnostic to be issued. The cost/benefit issues associated with using character sequences
having a unique mapping in the different environments that the source may be translated in are outside
the scope of these coding guidelines (see sentence 1896, source file inclusion).
1910
The first character shall not be a digit.
Commentary
This requirement only applies to the ﬁrst character of the sequence that implementations are required to
provide a unique mapping for.
The wording was changed by the response to DR #302.
C90
The requirement that the ﬁrst character not be a digit is new in C99. Given that it is more restrictive than that
required for existing C90 implementations (and thus existing code) it is unlikely that existing code will be
affected by this requirement.
C++
This requirement is new in C99 and is not specified in the C++ Standard (the argument given in the C90
subsection (above) also applies to C++).
Common Implementations
Most implementations support a ﬁrst character that is not a letter.
1911
The implementation may ignore the distinctions of alphabetical case and restrict the mapping to eight
significant characters before the period.
Commentary
These permissions reﬂect known characteristics of ﬁle systems in which translators are executed.
C90
The limit specified by the C90 Standard was six significant characters. However, implementations
invariably used the number of significant characters available in the host file system (i.e., they do
not artificially limit the number of significant characters). It is unlikely that a header or source
file will fail to be identified because of a difference in what used to be a non-significant character.
C++
The C++ Standard does not give implementations any permissions to restrict the number of significant
characters before the period (16.1p5). However, the limits of the file system used during translation
are likely to be the same for both C and C++ implementations and consequently no difference is listed
here.
Common Implementations
All file systems place some limits on the number of characters in a source file name, for instance:
• Most versions of the Microsoft DOS environment ignore the distinction of alphabetic case and restrict
the mapping to eight significant characters before any period (and a maximum of three after it).
• POSIX requires that at least 14 characters be significant in a file name (it also requires
implementations to support at least 255 characters in a pathname). Many Linux file systems support up
to 255 characters in a filename and 4095 characters in a pathname.
Coding Guidelines
The potential problems associated with limits on the sequences of characters that are likely to be
treated as unique are a configuration management issue that is outside the scope of these coding
guidelines.
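Where a project does want to check header names against the minimum guarantee, the rules in the
preceding C sentences can be sketched as a small predicate. This is an illustrative sketch only: the
function name is_portable_header_name is hypothetical, the check is deliberately conservative (it
accepts only letters and digits, although nondigits such as underscore are also covered by the revised
wording), and it does not detect two names that differ only in alphabetic case.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if name has the form: a letter, followed by letters or
   digits (at most eight characters before the period), a period, and
   a single letter. Returns 0 otherwise. */
int is_portable_header_name(const char *name)
{
    size_t stem = 0;

    if (!isalpha((unsigned char)name[0]))
        return 0;                   /* first character must not be a digit */
    while (isalnum((unsigned char)name[stem]))
        stem++;
    if (stem > 8)
        return 0;                   /* only eight characters need be significant */
    if (name[stem] != '.')
        return 0;
    return isalpha((unsigned char)name[stem + 1]) && name[stem + 2] == '\0';
}
```

For instance, stdio.h and myprog.h pass this check, while page.html, a.b.h and verylongname.h do not.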
1912
A #include preprocessing directive may appear in a source file that has been read because of a #include
directive in another file, up to an implementation-defined nesting limit (see 5.2.4.1).
Commentary
Thus #include directives can be nested within source files whose contents have themselves been
#included. This issue is discussed elsewhere (see sentence 295, limit #include nesting). While this
permission only applies to source files (see sentence 108, source files), an implementation using some
form of precompiled headers (which are not source files within the standard’s definition of the term;
see sentence 121, header precompiled) that did not support this functionality would not be popular with
developers.
1913
EXAMPLE 1 The most common uses of #include preprocessing directives are as in the following:
#include <stdio.h>
#include "myprog.h"

Other Languages
Some languages only have a single form of #include directive for all headers.
1914
EXAMPLE 2 This illustrates macro-replaced #include directives:
#if VERSION == 1
#define INCFILE "vers1.h"
#elif VERSION == 2
#define INCFILE "vers2.h" // and so on
#else
#define INCFILE "versN.h"
#endif
#include INCFILE
Commentary
This example does not illustrate any benefit compared to that obtained from placing separate #include
directives in each arm of the conditional inclusion directive.
1915
Forward references: macro replacement (6.10.3).
1916
145) Note that adjacent string literals are not concatenated into a single string literal (see the
translation phases in 5.1.1.2);
Commentary
String concatenation occurs in translation phase 6 (see sentence 135, translation phase 6) and so it is
not possible to join together two existing strings to form another string within a #include directive.
1917
thus, an expansion that results in two string literals is an invalid directive.
Commentary
It is an invalid directive in that it violates a semantic requirement and thus the behavior is undeﬁned. It is not
a syntax violation.
6.10.3 Macro replacement
Constraints
1918
Two replacement lists are identical if and only if the preprocessing tokens in both have the same
number, ordering, spelling, and white-space separation, where all white-space separations are
considered identical.
Commentary
This is actually a definition in a Constraints clause (it is used by two constraints in this C
subsection). The check against same spelling only needs to take into account the significant characters
of an identifier (see sentence 282, internal identifier significant characters).
Considering all white-space separations to be identical removes the need for developers to be concerned
about use of different source layout (e.g., indentation) and method of spacing (e.g., space character
vs. horizontal tab).
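The following minimal sketch shows a redefinition that satisfies this definition. The macro name
BUF_LEN and the function buf_len are hypothetical, introduced only for illustration.

```c
#define BUF_LEN (16 + 4)
#define BUF_LEN (16    +    4)   /* permitted: only the amount of white space differs */

/* A redefinition written as (16+4) would not be identical, because the
   white-space separation around + would be absent. */

int buf_len(void) { return BUF_LEN; }
```

White space following the replacement list (here, before the comment) is not part of the list and so
plays no part in the comparison.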
Rationale
The speciﬁcation of macro deﬁnition and replacement in the Standard was based on these principles:

• Interfere with existing code as little as possible.
• Keep the preprocessing model simple and uniform.
• Allow macros to be used wherever functions can be.
•
Deﬁne macro expansion such that it produces the same token sequence whether the macro calls
appear in open text, in macro arguments, or in macro deﬁnitions.
Preprocessing is speciﬁed in such a way that it can be implemented either as a separate text-to-text prepass
or as a token-oriented portion of the compiler itself. Thus, the preprocessing grammar is speciﬁed in terms of
tokens.
1919
An identifier currently defined as an object-like macro shall not be redefined by another #define
preprocessing directive unless the second definition is an object-like macro definition and the two
replacement lists are identical.
Commentary
There was an existing body of code, containing redefinitions of the same macro, when the C Standard
was first written. The C committee did not want to specify that existing code containing such usage was
non-conforming, but they did consider the case where the bodies of any subsequent definitions differed
to be an erroneous usage (see sentence 1983, EXAMPLE macro redefinition).
C90
The wording in the C90 Standard was modiﬁed by the response to DR #089.

Common Implementations
Some translators permit multiple definitions of a macro, independently of the contents of the bodies
(a #define/#undef stack). The behavior is for a new definition to cause the previous body to be pushed,
in a stack-like fashion. Any subsequent #undef of the macro name pops this stacked definition and makes
it the current one.
Coding Guidelines
C permits more than one definition of the same macro name, with the same body, and more than one
external definition of the same object, with the same type, and the coding guideline issues are the
same for both (see sentences 420, linkage, and 422.1, identifier declared in one file); in both cases
translators are not always required to issue a diagnostic if the definitions are considered to be
different.
In both cases a technique for avoiding duplicate definitions, during translation but not in the visible
source, is to bracket definitions with #ifndef MACRO_NAME/#endif (in the case of the file scope object
a macro name needs to be created and associated with its declaration). Using this technique has the
disadvantage that it prevents the translator checking that any subsequent redeclarations of an
identifier are the same (unless the bracketing occurs around the only textual declaration that occurs
in any source file used to build a program).
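The bracketing technique, and its disadvantage, can be sketched as follows. The macro name MAX_USERS
and the function max_users are hypothetical; the two definitions stand in for text that would normally
come from two different headers.

```c
/* A first definition, as might come from one header. */
#define MAX_USERS 64

/* A second, bracketed definition, as might come from another header.
   It is skipped because MAX_USERS is already defined, so the translator
   never compares the two bodies (128 vs. 64) and issues no diagnostic. */
#ifndef MAX_USERS
#define MAX_USERS 128
#endif

int max_users(void) { return MAX_USERS; }
```

Without the #ifndef/#endif pair the second definition would be a constraint violation, which is
exactly the check that the bracketing suppresses.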
1920
Likewise, an identifier currently defined as a function-like macro shall not be redefined by another
#define preprocessing directive unless the second definition is a function-like macro definition that
has the same number and spelling of parameters, and the two replacement lists are identical.
Commentary

The issues are the same as for object-like macros (see sentence 1919, object-like macro redefinition),
with the addition of checks on the parameters. Requiring that the parameters be spelled the same,
rather than, for instance, that they have an identical effect, simplifies the similarity checking of
two macro bodies. For instance, in:
#define FM(foo) ((foo) + x)
#define FM(bar) ((bar) + x)
a translator is not required to deduce that the two definitions of FM are structurally identical.
1921
There shall be white-space between the identiﬁer and the replacement list in the deﬁnition of an object-like
macro.
Commentary
In the following (assuming $ is a member of the extended character set and permitted in an identifier
preprocessing token; see sentence 216, extended character set):
#define A$ x
an object-like macro with the name A$ and the body x is defined, not a macro with the name A and the
body $ x.
There is no requirement that there be white-space following the ) in a function-like macro deﬁnition.
C90

The response to DR #027 added the following requirements to the C90 Standard.
DR #027
Correction
Add to subclause 6.8, page 86 (Constraints):
In the deﬁnition of an object-like macro, if the ﬁrst character of a replacement list is not a character required by
subclause 5.2.1, then there shall be white-space separation between the identiﬁer and the replacement list.*
[Footnote *: This allows an implementation to choose to interpret the directive:
#define THIS$AND$THAT(a, b) ((a) + (b))
as defining a function-like macro THIS$AND$THAT, rather than an object-like macro THIS. Whichever
choice it makes, it must also issue a diagnostic.]
However, the complex interaction between this speciﬁcation and UCNs was debated during the C9X review
process and it was decided to simplify the requirements to the current C99 form.
#define TEN.1 /* Define the macro TEN to have the body .1 in C90. */
              /* A constraint violation in C99. */
C++
The C++ Standard specifies the same behavior as the C90 Standard.
Common Implementations
HP (formerly DEC) treats $ as part of the spelling of the macro name.
1922
If the identiﬁer-list in the macro deﬁnition does not end with an ellipsis, the number of arguments (including
those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal
the number of parameters in the macro deﬁnition.
Commentary
This requirement is the macro invocation equivalent of the one for function calls (see sentence 998,
function call arguments agree with parameters).
C90
If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undeﬁned.
The behavior of the following was discussed in DR #003q3, DR #153, and raised against C99 in DR #259
(no committee response was felt necessary).
#define foo() A
#define bar(B) B

foo() // no arguments
bar() // one empty argument?
What was undefined behavior in C90 (an empty argument) is now explicitly supported in C99. The two most
likely C90 translator undefined behaviors are either to support them (existing source developed using
such a translator may contain empty arguments in a macro invocation), or to issue a diagnostic
(existing source developed using such a translator will not contain any empty arguments in a macro
invocation).
C++
The C++ Standard contains the same wording as the C90 Standard. C++ translators are not required to
correctly process source containing macro invocations having any empty arguments.
Common Implementations
Some C90 implementations (e.g., gcc) treated empty arguments as an argument containing no preprocessing
tokens, while others (e.g., Microsoft C) treated an empty argument as being a missing argument (i.e., a
constraint violation).
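The C99 treatment of empty arguments can be observed with the # operator, as in the following minimal
sketch. The macro names quote and pair, and the two functions, are hypothetical and are used only for
illustration.

```c
#include <string.h>

#define quote(arg)  #arg          /* stringize a single argument */
#define pair(a, b)  #a "|" #b     /* stringize two arguments */

/* C99: one empty argument; stringizing it gives the empty string "". */
size_t empty_len(void) { return strlen(quote()); }

/* C99: the first of the two arguments is empty; the result is "|x". */
const char *second_only(void) { return pair(, x); }
```

A C90 translator that treated an empty argument as a missing argument would reject both invocations.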
1923
Otherwise, there shall be more arguments in the invocation than there are parameters in the macro
definition (excluding the ...).
Commentary
Rationale
There must be at least one argument to match the ellipsis. This requirement avoids the problems that occur
when the trailing arguments are included in a list of arguments to another macro or function. For example, if
dprintf had been deﬁned as
#define dprintf(format,...) \
dfprintf(stderr, format, __VA_ARGS__)
and it were allowed for there to be only one argument, then there would be a trailing comma in the expanded

form. While some implementations have used various notations or conventions to work around this problem,
the Committee felt it better to avoid the problem altogether.
C90
Support for the form ... is new in C99.
C++
Support for the form ... is new in C99 and is not specified in the C++ Standard.
Common Implementations
gcc allowed zero arguments to match a macro parameter deﬁned using the ... form.
Coding Guidelines
While some developers may be confused because the requirements on the number of arguments are different
from functions deﬁned using the ellipsis notation, passing too few arguments is a constraint violation (i.e.,
translators are required to issue a diagnostic that a developer then needs to correct).
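A minimal sketch of a conforming C99 invocation follows. The macro name fmt_into and the function
format_value are hypothetical (they are not the dprintf of the Rationale), and snprintf is used so
that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical wrapper: formats into a caller-supplied buffer. */
#define fmt_into(buf, size, format, ...) \
    snprintf((buf), (size), (format), __VA_ARGS__)

int format_value(char *buf, size_t size, int value)
{
    /* One argument (value) matches the ellipsis, as C99 requires. */
    return fmt_into(buf, size, "value=%d", value);

    /* fmt_into(buf, size, "no arguments") would leave a trailing comma
       in the expansion and is a constraint violation in C99. */
}
```

The commented-out invocation is the case the Committee chose to exclude rather than work around.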
1924
There shall exist a ) preprocessing token that terminates the invocation.
Commentary
While this requirement is specified in the syntax, it is interpreted as requiring the ) preprocessing
token to occur before any macro replacement of the identifiers following the matching ( preprocessing
token. For instance, in:
#define R_PAREN )

#define FUNC(a) a

static int glob = (1 + FUNC(1 R_PAREN );
the invocation is terminated by the ) preprocessing token that occurs immediately before ;, not the
expanded form of R_PAREN.
1925
The identifier __VA_ARGS__ shall occur only in the replacement-list of a function-like macro that uses
the ellipsis notation in the parameters.
Commentary
This requirement simplifies a translator’s processing of occurrences of the identifier __VA_ARGS__.
The typographical correction of “arguments” to “parameters” was made by the response to DR #234.
C90
Support for __VA_ARGS__ is new in C99.
Source code declaring an identifier with the spelling __VA_ARGS__ will cause a C99 translator to issue
a diagnostic (the behavior was undefined in C90).
C++
Support for __VA_ARGS__ is new in C99 and is not specified in the C++ Standard.
Common Implementations

gcc required developers to give a name to the parameter that accepted a variable number of arguments.
This parameter name appeared in the replacement list wherever the variable number of arguments were to
be substituted.
Example
/*
 * The following are constraint violations.
 */
#define __VA_ARGS__
#define jparks __VA_ARGS__
#define jparks(__VA_ARGS__)
#define jparks(__VA_ARGS__, ...) __VA_ARGS__

#define jparks(x) x
jparks(__VA_ARGS__)

#define jparks(x, ...) x
jparks(__VA_ARGS__,1)
/*
 * The following break the spirit, if not the wording
 * of this constraint.
 */
#define jparks(x, y) x##y
jparks(__VA, _ARGS__)

#define jparks(x, y, ...) x##y
jparks(__VA, _ARGS__, 1)
1926
A parameter identifier in a function-like macro shall be uniquely declared within its scope.
Commentary
This constraint is the macro equivalent of the one given for objects with no linkage (see sentence
1350, declaration only one if no linkage). Its scope is the list of parameters in the macro definition
and the body of that definition. This scope ends at the new-line that terminates the directive. Macro
parameters are also discussed elsewhere (see sentences 1934, macro parameter scope extends, and 396,
identifier macro parameter).
Semantics
1927
The identifier immediately following the define is called the macro name.
Commentary
This deﬁnes the term macro name. This term is generically used in software engineering to refer to this kind
of entity.
1928
There is one name space for macro names.
Commentary
Object-like and function-like macro names exist in the same name space. However, an identifier defined
as a function-like macro is only treated as such when its name is followed by an opening parenthesis
(see sentence 1935). Name spaces are also discussed elsewhere (see sentence 438, name space).
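The role of the opening parenthesis can be illustrated with a function that shares its name with a
function-like macro. This is a sketch only; the names twice, via_macro and via_function are
hypothetical.

```c
/* Function-like macro. */
#define twice(x) ((x) * 2)

/* A function with the same name. Because the name is enclosed in
   parentheses it is not followed by (, so the macro is not invoked. */
int (twice)(int x) { return x + x; }

int via_macro(void)    { return twice(21); }    /* macro invocation */
int via_function(void) { return (twice)(21); }  /* function call */
```

The same mechanism allows library functions to be additionally implemented as macros while remaining
callable as functions.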
1929
Any white-space characters preceding or following the replacement list of preprocessing tokens are not
considered part of the replacement list for either form of macro.
Commentary
Specifying that such white-space should be considered to be part of the replacement list has potential
maintenance and comprehension costs (it restricts how the start of the replacement list may be indented
and white-space following the replacement list is not immediately visible to readers) for no obvious
benefit.
Example
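In the following the string literal "_ _" is assigned to p (the extra level of macro, str_val, is
needed so that M is expanded before the # operator is applied; str_ize(M) on its own yields "M").

#define str_ize(a) #a
#define str_val(a) str_ize(a)
#define M _ _

char *p = str_val(M);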
1930
If a # preprocessing token, followed by an identifier, occurs lexically at the point at which a
preprocessing directive could begin, the identifier is not subject to macro replacement.
Commentary
This is a special case of a more general specification given elsewhere (see sentence 1867, tokens in
directive not expanded unless).
Common Implementations
Some preprocessors used to perform this kind of replacement (some past entries in the Obfuscated C
contest [642] relied on such translator behavior).
Example
In the following, even though the identifier define is defined as a macro, the line starting #define is
still processed as a macro definition directive, and not as a #undef directive.
#define define undef

#define X Y
1931
A preprocessing directive of the form

    # define identifier replacement-list new-line

defines an object-like macro that causes each subsequent instance of the macro name (footnote 146) to be
replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive.
macro object-like
Commentary
This deﬁnes the term object-like macro. This term is not commonly used by developers, who tend to use the
generic term macro for all macro deﬁnitions and when a distinction needs to be made use the term function
macro (rather than the technically correct term function-like macro) to refer to the case of a macro deﬁned to
have parameters. A macro’s replacement list is commonly known as a macro body, or simply its body.
The preprocessing tokens in a text-line are unconditionally scanned for instances of macro names to
expand, as are preprocessing tokens in some preprocessing directives.
1854 preprocessor directives syntax
The standard lists a few restrictions on identifiers that can be defined as macro names. The issue of
implementation limits on the number of macros that may be defined in one preprocessing translation unit is
discussed elsewhere.
2026 predefined macros not #defined
287 limit macro definitions
Other Languages
Some languages use def rather than define.
Common Implementations
Some preprocessors have a maximum limit on the number of characters that can occur in a replacement
list (e.g., an early version of Microsoft C [947] had a 512 byte limit; a limit of 4096 is still seen in some
preprocessors).
Implementations invariably provide a mechanism that is external to the source code for deﬁning macros,
e.g., the -D command line option.
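A common pattern lets an externally supplied definition override a default given in the source. The following is a minimal sketch; the macro name BUF_SIZE, its default value of 512 and the function name are hypothetical.

```c
/* BUF_SIZE is a hypothetical configuration macro. Translating with,
   for instance, -DBUF_SIZE=1024 replaces the default given here. */
#ifndef BUF_SIZE
#define BUF_SIZE 512
#endif

static char buffer[BUF_SIZE];

/* Reports the size that was selected at translation time. */
unsigned long buffer_size(void)
{
    return (unsigned long)sizeof buffer;
}
```

When no definition is supplied externally, the #ifndef test succeeds and the default in the source is used.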
Coding Guidelines
Macros can be defined to serve a variety of purposes (see Table 1931.1 for measurements of actual usage),
including:
• Giving a symbolic name to a constant or expression. The issue of symbolic names is discussed
  elsewhere, as are the advantages of using enumeration types for related identifiers.
  822 symbolic name
  517 enumeration set of named constants
• Representing an expression without the overhead of a function call. Having made the decision to
  represent an expression with a symbolic name, the decision on whether to use a function call or macro
  then needs to be made; the human decision making factors involved are discussed elsewhere.
  0 agenda effects decision making
• Parameterized code duplication. This kind of usage occurs because a function definition does not
  provide the necessary flexibility (for instance, the parameterization may involve constructs other than
  expressions).
• Parameterizing the definition of a type. This issue is discussed in more detail under typedef names.
  1629 typedef name syntax
• Controlling conditional inclusion. In this case their status as a macro definition is used as a flag.
  476 boolean role
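The purposes listed above can be sketched in a few lines. Every name used below (MAX_ITEMS, SQUARE, DEFINE_SUM, ELEM_TYPE, USE_FAST_PATH and the functions) is hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define MAX_ITEMS 8                   /* symbolic name for a constant */
#define SQUARE(x) ((x) * (x))         /* expression without a function call */

/* Parameterized code duplication: the element type is a parameter,
   something a function definition cannot express. */
#define DEFINE_SUM(name, type)                 \
    type name(const type *v, size_t n)         \
    {                                          \
        type total = 0;                        \
        for (size_t i = 0; i < n; i++)         \
            total += v[i];                     \
        return total;                          \
    }

DEFINE_SUM(sum_int, int)
DEFINE_SUM(sum_double, double)

#define ELEM_TYPE long                /* parameterizing the definition of a type */
ELEM_TYPE items[MAX_ITEMS];

#define USE_FAST_PATH                 /* defined status used as a flag */

int square_of_three(void)
{
#ifdef USE_FAST_PATH
    return SQUARE(3);
#else
    return 3 * 3;
#endif
}
```

DEFINE_SUM is invoked once per element type, giving two functions that differ only in the types involved.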
Some coding guideline documents recommend against what are sometimes known as syntax changing macro
names. This terminology comes from the fact that uses of such macro names change the syntax, at least
visually, of C. For instance, a developer familiar with Pascal might define the macro names begin and end
to represent the C punctuators { and } respectively (this existing usage was one reason these macro
names were not used as alternative spellings, in <iso646.h>, for these punctuators; it could have rendered
existing conforming code nonconforming), or a developer wanting to modify existing code to use greater
floating-point precision might define the macro name float to be double.
The growth in the use of languages with C-like syntax over the last 10 years means that these days it is
rare for developers to attempt to change the visual appearance of C source to be closer to a language they are
more familiar with. While a macro name that maps to a C token may be surprising to readers of the source, it
is unlikely to conflict with their existing C knowledge, and therefore might be considered at worst a minor
inconvenience (i.e., cost).
Defining a macro whose name is the same as a keyword means that the behavior of translated source can
be very different from that expected from its visual appearance (such usage also results in undefined behavior
if the definition occurs prior to the inclusion of any library header). The presence of such a definition requires
that readers replace their existing, default-response knowledge of behavior with a new behavior (assuming
that they have noticed the definition of the macro). Experience suggests that the short-term benefit of defining
and using such macro names is less than the longer-term (which may be only a few days) costs associated
with comprehension and miscomprehension of the affected source.
Cg 1931.1
A source ﬁle shall not deﬁne a macro name to have the spelling of a keyword.
Replacement lists may look innocuous enough when viewed in isolation. However, in the context in which
they occur the expanded form may interact in unexpected ways with adjacent tokens. For instance, looking at
the components of the following source in isolation:
    #define SUM a + b

    extern int glob, a, b;

    void f(void)
    {
        int loc = glob * SUM;
    }

the appearance of the replacement list of SUM suggests that a will be added to b, and looking at the use of
SUM in the initialization of loc suggests that it will be multiplied by the value of glob. However, the token
sequence after macro replacement is glob * a + b, which has a very different interpretation.
The visual appearance of a replacement list containing statements can also be misleading. For instance, in:
    #define INIT c=0; d=0;

    extern int glob, c, d;

    void f(void)
    {
        if (glob == 0)
            INIT;
    }

the assignment to d is not dependent on the value of glob, which is counter to what the visual appearance of
the source suggests.
A general solution to both of these problems is to bracket the replacement list, ensuring that the visually
expected behavior is the same as the behavior that occurs after macro replacement.
Cg 1931.2
A replacement list having the form of an expression containing one or more binary operators shall be
bracketed with parentheses, unless the binary operators are only those included in the production of a
postfix-expr.
Cg 1931.3
A replacement list consisting of more than one statement shall be completely enclosed in a pair of
braces (which may take the form of a do statement).
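The effect of following the two guideline recommendations above can be sketched as follows. The objects a, b, c and d, the _BARE macro variants and the function names are assumptions made for this illustration; the do { ... } while (0) form is one common way of enclosing several statements so that the macro invocation behaves as a single statement.

```c
int a = 2, b = 3, c = 1, d = 1;   /* hypothetical objects used by the macros */

#define SUM_BARE a + b
#define SUM (a + b)               /* bracketed, as in Cg 1931.2 */

#define INIT_BARE c = 0; d = 0;
#define INIT do { c = 0; d = 0; } while (0)   /* one statement, as in Cg 1931.3 */

int scaled_bare(int glob) { return glob * SUM_BARE; }   /* glob * a + b   */
int scaled(int glob)      { return glob * SUM; }        /* glob * (a + b) */

void reset_bare(int glob)
{
    if (glob == 0)
        INIT_BARE;    /* only c = 0 is controlled by the if */
}

void reset(int glob)
{
    if (glob == 0)
        INIT;         /* both assignments are controlled by the if */
}
```

With the initial values shown, scaled_bare(4) evaluates 4 * 2 + 3 while scaled(4) evaluates 4 * (2 + 3), and calling reset_bare with a nonzero argument still assigns zero to d.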
The visual appearance of declarations can also be deceptive when macro replacements are involved. For
instance, in:
    #define INFO_PTR int *

    INFO_PTR glob_1,
             glob_2;
glob_1 is declared to have a pointer type, while glob_2 is declared to have an integer type.
The bracketing technique cannot be used with a replacement list that represents a type (it would violate
C syntax). However, using a typedef name is not a general solution; it is possible to use macro names in
situations where a typedef name cannot be used. For instance, in:
    #define X_TYPE int

    unsigned X_TYPE glob;

it is possible to modify the type denoted by X_TYPE because the macro expanded form represents a valid
integer type when preceded by unsigned. However, the type denoted by a typedef name cannot be so
modified.
1382 type specifiers possible sets of
Cg 1931.4
A replacement list shall not consist of a sequence of preprocessing tokens that has, after expansion,
the syntax of a pointer type.
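The difference between a pointer type written as a replacement list and one written as a typedef name can be observed directly. The sketch below assumes a C11 translator, because it uses _Generic to report the declared types; the typedef name info_ptr, the macro IS_INT_PTR and the object and function names are hypothetical.

```c
#define INFO_PTR int *            /* the form Cg 1931.4 recommends against */
typedef int *info_ptr;            /* hypothetical typedef alternative */

INFO_PTR macro_1, macro_2;        /* expands to: int * macro_1, macro_2; */
info_ptr typedef_1, typedef_2;    /* both objects have pointer type */

/* C11 _Generic is used here only to report the declared type. */
#define IS_INT_PTR(x) _Generic((x), int *: 1, default: 0)

int macro_second_is_pointer(void)   { return IS_INT_PTR(macro_2); }
int typedef_second_is_pointer(void) { return IS_INT_PTR(typedef_2); }
```

Only the first declarator in the macro-based declaration receives the pointer, so macro_second_is_pointer returns 0 while typedef_second_is_pointer returns 1.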
The replacement list of a macro definition has to appear on a single logical source line. Experience suggests
that constructs that appear on separate lines in other contexts often appear on the same line within a
replacement list. The developer cost (typing the characters) of using splicing, to give the replacement list
a visible form that closely resembles that seen when it appears in other contexts, is small. The benefit for
subsequent readers is the ability to use the same strategies to read source constructs as they use in other
contexts.
118 logical source line
118 physical source line
There are a number of ways in which token sequences appearing in various contexts might visually
resemble each other. For instance, in the following definitions both ZERO_ARRAY_1 and ZERO_ARRAY_2
visually associate preprocessing tokens in the macro body, while in ZERO_ARRAY_3 preprocessing tokens in
the macro body visually interact with the preprocessing tokens in the preprocessing directive.
    #define ZERO_ARRAY_1(a, n) for (int i = 0; i < (n); i++) \
                                   (a)[i]=0;
    #define ZERO_ARRAY_2(a, n)                \
               for (int i = 0; i < (n); i++)  \
                   (a)[i]=0;
    #define ZERO_ARRAY_3(a, n) for (int i = 0; i < (n); i++) \
    (a)[i]=0;
The following guideline recommendation leaves the decision on what constitutes the same visual layout to
developers.
Rev 1931.5
Token sequences shall have the same visual layout in the replacement list of a macro deﬁnition as they
do in other contexts.
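One possible reading of "the same visual layout" is sketched below: line splicing gives the macro body the indentation the loop would have inside a function definition. The macro name ZERO_ARRAY, the enclosing do statement and the function clear_counts are assumptions made for this illustration, not part of the guideline.

```c
#include <stddef.h>

/* Hypothetical macro: the body is laid out as the same loop would be
   in a function, with each physical line ended by a splice. */
#define ZERO_ARRAY(a, n)                     \
    do {                                     \
        for (size_t i = 0; i < (n); i++)     \
            (a)[i] = 0;                      \
    } while (0)

void clear_counts(int *counts, size_t n)
{
    ZERO_ARRAY(counts, n);
}
```

Aligning the backslashes in one column is a layout choice; the guideline leaves such decisions to developers.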
Usage
Usage information on the number of macro names deﬁned in source ﬁles is given elsewhere.
287 limit macro definitions
[Figure not reproduced. Axes: Macro names expanded, Translation units. Series: all macro expansions (×),
function-macro expansions (•).]
Figure 1931.1: Number of translation units containing a given number of macro names which were macro
expanded, excluding expansions that occurred while processing the contents of system headers. Based on the
translated form of this book's benchmark programs.
Table 1931.1: Detailed breakdown of the kinds of replacement lists occurring in macro definitions. Adapted
from Ernst, Badros, and Notkin. [404]

Replacement List      %      Example
constant              42     #define ARG_MAX 1000
expression            33     #define SHFT_UP(x) ((x) << 8)
empty                 6.9    #define DUMMY
unknown identifier    6.9    #define INTERN_BUF buffer
statement             5.1    #define TERMINATE goto func_end
type                  2.1    #define NODE_PTR void *
other                 1.9    #define OPTION -X=23
symbol                1.4    #define ALLOC_STORAGE malloc
syntactic             0.5    #define begin {
Table 1931.2: Common macro definitions listed with an abstracted form of their replacement list (as a
percentage of all macro definitions). Note that function-call may also be a macro invocation. Based on the
visible form of the .c and .h files.

Kind of Macro Defined and Abstract Form of its Replacement List    %
object-like macro integer-constant                                  50.7
object-like macro identifier                                         5.9
object-like macro expression                                         5.8
function-like macro function-call                                    4.7
object-like macro function-call                                      3.7
object-like macro string-literal                                     3.4
function-like macro expression                                       3.4
object-like macro                                                    3.4
object-like macro constant-expression                                2.0
function-like macro                                                  1.7
others                                                              15.4
1932
The replacement list is then rescanned for more macro names as speciﬁed below.
Commentary
This sentence was added by the response to DR #306 and removes the possibility of a reader interpreting the
rescanning clause as only applying to function-like macros.
1968 rescanning
1933 macro function-like
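Rescanning of an object-like macro's replacement list can be seen in a two-line chain. The macro names LIMIT and DEFAULT_LIMIT and the function name are hypothetical.

```c
#define LIMIT DEFAULT_LIMIT     /* replacement list names another macro */
#define DEFAULT_LIMIT 64

/* LIMIT -> DEFAULT_LIMIT -> 64: the rescan finds DEFAULT_LIMIT even though
   LIMIT is an object-like macro and was defined first. */
int limit_value(void)
{
    return LIMIT;
}
```

The order of the two definitions does not matter, because replacement happens where LIMIT is used, not where it is defined.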
1933
A preprocessing directive of the form
macro function-like