2. Lexical structure
A C# program consists of one or more source files, known formally as compilation units (§9.1). A source file is an ordered sequence of Unicode characters. Source files typically have a one-to-one correspondence with files in a file system, but this correspondence is not required. For maximal portability, it is recommended that files in a file system be encoded with the UTF-8 encoding.
Conceptually speaking, a program is compiled using three steps:
Transformation, which converts a file from a particular character repertoire and encoding scheme into a sequence of Unicode characters.
Lexical analysis, which translates a stream of Unicode input characters into a stream of tokens.
Syntactic analysis, which translates the stream of tokens into executable code.
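The first two steps can be sketched in a few lines of C#. This is an illustration only, not a description of how any particular compiler works: the names CompilationSketch, Transform, and Tokenize are hypothetical, the input is assumed to be UTF-8, the code uses generic collections from later versions of the language, and the token rules are far simpler than the lexical grammar. Syntactic analysis is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CompilationSketch
{
    // Step 1 (transformation): decode the bytes of a file into a
    // sequence of Unicode characters. UTF-8 is an assumption here.
    public static string Transform(byte[] fileBytes)
    {
        return Encoding.UTF8.GetString(fileBytes);
    }

    // Step 2 (lexical analysis): group characters into tokens.
    // Hypothetical simplification: only identifiers/keywords, decimal
    // integer literals, and one-character punctuators are recognised;
    // white space is discarded, comments are not handled.
    public static List<string> Tokenize(string source)
    {
        List<string> tokens = new List<string>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c))
            {
                i++;            // white space separates tokens but is not one
                continue;
            }

            int start = i;
            if (char.IsLetter(c) || c == '_')
            {
                // identifier or keyword
                while (i < source.Length &&
                       (char.IsLetterOrDigit(source[i]) || source[i] == '_'))
                {
                    i++;
                }
            }
            else if (char.IsDigit(c))
            {
                // decimal integer literal
                while (i < source.Length && char.IsDigit(source[i]))
                {
                    i++;
                }
            }
            else
            {
                i++;            // any other character is a one-character token
            }
            tokens.Add(source.Substring(start, i - start));
        }
        return tokens;
    }
}
```

For example, passing the UTF-8 bytes of the text "while (x) y = 1;" through Transform and then Tokenize yields the eight tokens while, (, x, ), y, =, 1, and ;.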
This specification presents the syntax of the C# programming language using two grammars. The lexical grammar (§2.2.2) defines how Unicode characters are combined to form line terminators, white space, comments, tokens, and pre-processing directives. The syntactic grammar (§2.2.3) defines how the tokens resulting from the lexical grammar are combined to form C# programs.
2.2.1 Grammar notation
The lexical and syntactic grammars are presented using grammar productions. Each grammar production defines a non-terminal symbol and the possible expansions of that non-terminal symbol into sequences of non-terminal or terminal symbols. In grammar productions, non-terminal symbols are shown in italic type, and terminal symbols are shown in a fixed-width font.
The first line of a grammar production is the name of the non-terminal symbol being defined, followed by a colon. Each successive indented line contains a possible expansion of the non-terminal given as a sequence of non-terminal or terminal symbols. For example, the production:
while-statement:
    while ( boolean-expression ) embedded-statement
defines a while-statement to consist of the token while, followed by the token “(”, followed by a boolean-expression, followed by the token “)”, followed by an embedded-statement.
When there is more than one possible expansion of a non-terminal symbol, the alternatives are listed on separate lines. For example, the production:
statement-list:
    statement
    statement-list statement
defines a statement-list to either consist of a statement or consist of a statement-list followed by a statement. In other words, the definition is recursive and specifies that a statement list consists of one or more statements.
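A recursive production of this kind is commonly recognised with a loop rather than with literal recursion. The following C# sketch shows one way to do so under stated assumptions: StatementListSketch, ParseStatement, and CountStatements are hypothetical names, and a "statement" is simplified to one or more tokens followed by ";", which is not the statement grammar of the language.

```csharp
using System;
using System.Collections.Generic;

public static class StatementListSketch
{
    // Hypothetical stand-in for the "statement" non-terminal: one or more
    // tokens other than ";" followed by ";". Returns the index just past
    // the statement, or -1 if no statement starts at pos.
    private static int ParseStatement(IReadOnlyList<string> tokens, int pos)
    {
        int i = pos;
        while (i < tokens.Count && tokens[i] != ";")
        {
            i++;
        }
        // Require at least one token before the ";" and the ";" itself.
        return (i > pos && i < tokens.Count) ? i + 1 : -1;
    }

    // statement-list:
    //     statement
    //     statement-list statement
    //
    // The first alternative is handled by parsing one statement; the
    // recursive alternative is handled by repeating while tokens remain.
    public static int CountStatements(IReadOnlyList<string> tokens)
    {
        int pos = ParseStatement(tokens, 0);
        if (pos < 0)
        {
            throw new FormatException(
                "a statement-list requires at least one statement");
        }

        int count = 1;
        while (pos < tokens.Count)
        {
            int next = ParseStatement(tokens, pos);
            if (next < 0)
            {
                throw new FormatException(
                    "unexpected tokens after statement " + count);
            }
            pos = next;
            count++;
        }
        return count;
    }
}
```

Given the tokens x, =, 1, ;, y, =, 2, ; the sketch counts two statements, while an empty token sequence is rejected because the production requires at least one statement.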
A subscripted suffix “opt” is used to indicate an optional symbol. The production:
Copyright Microsoft Corporation 1999-2003. All Rights Reserved.