

2. Lexical structure

2.1 Programs

A C# program consists of one or more source files, known formally as compilation units (§9.1). A source file is an ordered sequence of Unicode characters. Source files typically have a one-to-one correspondence with files in a file system, but this correspondence is not required. For maximal portability, it is recommended that files in a file system be encoded with the UTF-8 encoding.

Conceptually speaking, a program is compiled using three steps:

1. Transformation, which converts a file from a particular character repertoire and encoding scheme into a sequence of Unicode characters.

2. Lexical analysis, which translates a stream of Unicode input characters into a stream of tokens.

3. Syntactic analysis, which translates the stream of tokens into executable code.
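
The first two steps can be sketched in a few lines of Python. This is a toy illustration only, not a description of how any C# compiler is implemented: the names `transform`, `lex`, and `TOKEN_PATTERN` are hypothetical, and the token pattern is an assumption that covers only ASCII identifiers, decimal integer literals, and single-character punctuation, which is far less than the lexical grammar of §2.2.2.

```python
import re

# Assumed toy pattern: either a run of white space (discarded) or one token.
# The real C# lexical grammar is much richer (comments, literals, directives).
TOKEN_PATTERN = re.compile(r"\s+|(?P<token>[A-Za-z_]\w*|\d+|[^\w\s])")


def transform(raw, encoding="utf-8"):
    """Step 1: decode the bytes of a file into a sequence of Unicode characters."""
    return raw.decode(encoding)


def lex(text):
    """Step 2: translate the character stream into a list of token strings."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group("token")
        if token is not None:  # None means the match was white space
            tokens.append(token)
    return tokens


source_bytes = "while (i < 10) i = i + 1;".encode("utf-8")
tokens = lex(transform(source_bytes))
print(tokens)
```

Step 3, syntactic analysis, would then consume the resulting token list according to the syntactic grammar.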

2.2 Grammars

This specification presents the syntax of the C# programming language using two grammars. The lexical grammar (§2.2.2) defines how Unicode characters are combined to form line terminators, white space, comments, tokens, and pre-processing directives. The syntactic grammar (§2.2.3) defines how the tokens resulting from the lexical grammar are combined to form C# programs.

2.2.1 Grammar notation

The lexical and syntactic grammars are presented using grammar productions. Each grammar production defines a non-terminal symbol and the possible expansions of that non-terminal symbol into sequences of non-terminal or terminal symbols. In grammar productions, non-terminal symbols are shown in italic type, and terminal symbols are shown in a fixed-width font.

The first line of a grammar production is the name of the non-terminal symbol being defined, followed by a colon. Each successive indented line contains a possible expansion of the non-terminal given as a sequence of non-terminal or terminal symbols. For example, the production:

while-statement:
    while   (   boolean-expression   )   embedded-statement

defines a while-statement to consist of the token while, followed by the token “(”, followed by a boolean-expression, followed by the token “)”, followed by an embedded-statement.
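
One way to read such a production is as a recipe for a recognizer: each terminal symbol must be matched literally, and each non-terminal symbol is delegated to its own routine. The Python sketch below follows the while-statement production in that manner. It is a hedged illustration, not the specification's method: all function names are hypothetical, and the routines for boolean-expression and embedded-statement are crude placeholders, since those non-terminals are defined elsewhere in the grammar.

```python
def expect(tokens, pos, terminal):
    """Match one terminal symbol at tokens[pos]; return the next position."""
    if pos >= len(tokens) or tokens[pos] != terminal:
        raise SyntaxError(f"expected {terminal!r} at token index {pos}")
    return pos + 1


def parse_boolean_expression(tokens, pos):
    # Placeholder (assumption): accept everything up to the ")" that closes
    # the condition, tracking nested parentheses.
    start, depth = pos, 0
    while pos < len(tokens):
        if tokens[pos] == "(":
            depth += 1
        elif tokens[pos] == ")":
            if depth == 0:
                break
            depth -= 1
        pos += 1
    if pos == start:
        raise SyntaxError("empty boolean-expression")
    return pos


def parse_embedded_statement(tokens, pos):
    # Placeholder (assumption): a statement is any run of tokens ending in ";".
    while pos < len(tokens) and tokens[pos] != ";":
        pos += 1
    return expect(tokens, pos, ";")


def parse_while_statement(tokens, pos=0):
    # while-statement:
    #     while ( boolean-expression ) embedded-statement
    pos = expect(tokens, pos, "while")
    pos = expect(tokens, pos, "(")
    pos = parse_boolean_expression(tokens, pos)
    pos = expect(tokens, pos, ")")
    return parse_embedded_statement(tokens, pos)


example = ["while", "(", "i", "<", "10", ")", "i", "=", "i", "+", "1", ";"]
end = parse_while_statement(example)
print(end == len(example))
```

The body of `parse_while_statement` mirrors the right-hand side of the production symbol by symbol, which is the main point of the sketch.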

When there is more than one possible expansion of a non-terminal symbol, the alternatives are listed on separate lines. For example, the production:

statement-list:
    statement
    statement-list   statement

defines a statement-list to either consist of a statement or consist of a statement-list followed by a statement. In other words, the definition is recursive and specifies that a statement list consists of one or more statements.
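
The "one or more" reading of this recursive production can be made concrete with a short Python sketch. The left recursion is unrolled into a loop: one statement is mandatory, and further statements are accepted while input remains. The function names are hypothetical, and treating a statement as any run of tokens ending in ";" is a simplifying assumption made only for this illustration.

```python
def parse_statement(tokens, pos):
    # Placeholder (assumption): a statement is any run of tokens ending in ";".
    start = pos
    while pos < len(tokens) and tokens[pos] != ";":
        pos += 1
    if pos >= len(tokens):
        raise SyntaxError("statement is missing its terminating ';'")
    return tokens[start:pos + 1], pos + 1


def parse_statement_list(tokens, pos=0):
    # statement-list:
    #     statement
    #     statement-list statement
    # The first alternative makes one statement mandatory; the second,
    # recursive alternative allows any number of further statements.
    statement, pos = parse_statement(tokens, pos)
    statements = [statement]
    while pos < len(tokens):
        statement, pos = parse_statement(tokens, pos)
        statements.append(statement)
    return statements


two = parse_statement_list(["x", "=", "1", ";", "y", "=", "2", ";"])
print(len(two))
```

An empty token list is rejected, because the grammar offers no alternative that produces zero statements.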

A subscripted suffix “opt” is used to indicate an optional symbol. The production:

Copyright Microsoft Corporation 1999-2003. All Rights Reserved.
