1. INTRODUCTION - page 3 / 10






characters in it. The following program adds one layer of obfuscation, by using a function to print out the "hello, world!\n" string one character at a time:

int i;

main() { for(i=0 ; i<14 ; i++) { write_one_letter("hello, world!\n" + i); } }


write_one_letter(letter) {

write(0,letter,1); }

This makes it harder to see how the program works, but it makes visible some of the trickery that is possible, some would even say encouraged, in C. Notice that part of this program involves adding a string constant and a number, an operation which cannot be done in many strongly typed programming languages. In Java, where addition of String objects is defined as concatenation, evaluating the expression ("string" + 17) involves constructing a String out of the number, then adding the two: the result is "string17". A string constant in C is “really” a number, however, which means that adding a string and a number has an entirely different meaning. The string, seen as a number, is the address in memory where the first character resides. Add one to this number, and the result is the location of the second character. So this for loop, starting at position 0 and finishing at 13, has the effect of sending each character in the string to the write_one_letter function for printing.
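The address arithmetic described above can be checked in isolation. The following is a minimal sketch in present-day C rather than the older dialect of the program itself; the helper names char_at_offset and remaining_length are hypothetical and appear nowhere in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the original program): returns the
   character stored `offset` positions past the start of `s`.
   `s + offset` is an address; `*` reads the character at that address.
   Assumes offset does not run past the end of the string. */
char char_at_offset(const char *s, size_t offset)
{
    assert(s != NULL);
    return *(s + offset);
}

/* Hypothetical helper: length of the string that begins at `s + offset`.
   Adding to the address simply skips the leading characters.
   Assumes offset <= strlen(s). */
size_t remaining_length(const char *s, size_t offset)
{
    assert(s != NULL);
    return strlen(s + offset);
}
```

Under these assumptions, char_at_offset("hello, world!\n", 1) is 'e', and remaining_length("hello, world!\n", 7) is 7, the length of "world!\n".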

To obfuscate the for loop a bit more, the i<14 condition is written in a more elaborate way. Oddly enough, this condition could be written "xxxxxxxxxxxxxx"[i], which has the effect of returning character number i from a string that has 14 characters in it. This yields a nonzero number (meaning TRUE) until i reaches 14, which corresponds to the end of the string; when the end of the string is reached it yields zero (meaning FALSE). This happens because strings in C are terminated with a null character ('\0'), whose numerical value is zero, and zero, in C, means the same thing as FALSE. Now, to make things more puzzling, any array reference in C can be written either a[b] or b[a]. The values of a and b are added together and their sum is used to look up the array entry, so it doesn't matter which one is inside the brackets and which one comes before them. Thus, the condition can be written even more confusingly as i["xxxxxxxxxxxxxx"]. Also, any string that is 14 characters long can be used in this condition. To create additional confusion about the program’s syntax, the fully-obfuscated program uses a different string to create the condition i["]<i;++i){--i;}"]. This makes it difficult to see where the data of the string ends and the code of the program begins.
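Both tricks in this paragraph, the reversed subscript and the string used as a loop condition, can be tried separately from the rest of the program. This is only a sketch in present-day C; count_while_nonzero and same_element are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: runs a loop whose only condition is i[mask],
   as in the obfuscated program, and reports how many times the
   condition held. The loop stops at the terminating null character,
   whose value is zero. */
size_t count_while_nonzero(const char *mask)
{
    size_t i;
    assert(mask != NULL);
    for (i = 0; i[mask]; i++) {
        /* empty body: only the condition matters here */
    }
    return i;
}

/* Hypothetical helper: returns 1 when s[i], i[s] and *(s + i) all
   name the same character. Assumes i is within the string. */
int same_element(const char *s, size_t i)
{
    assert(s != NULL);
    return s[i] == i[s] && s[i] == *(s + i);
}
```

Both count_while_nonzero("xxxxxxxxxxxxxx") and count_while_nonzero("]<i;++i){--i;}") give 14, which is why either string can stand in for the i<14 test.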

The function write_one_letter is also given two additional, superfluous parameters and its name is changed to read. Redefining read to be a function that writes one letter is a particularly gruesome move, but this is allowed in C; read is a system call, not a keyword.

int i;

main() { for(i=0 ; i["]<i;++i){--i;}"] ; i++) { read(0,"hello, world!\n" + i,1); } }


read(j,letter,p) {

write(0,letter,1); }

The meaningful name letter can be changed to i to make it seem as if this is the same i that was used previously, but it is not: inside the function, the parameter i hides the global i. And, within the read function, i is written as i--, which suggests that the i up above might be getting decremented when this happens. It is not; this decrementing has no effect because this variable i “expires” immediately, at the end of the function. The call to read can be crammed into the increment part of the for statement, with the ++ operator placed after i so that its value is incremented only after the old value has been used; then another + can be added to perform addition and make the puzzling-looking +++. The initialization of i to 0 can be left out. Global integer variables in C, such as this i, are set to zero when they are defined without an initializer, so the i=0 in the program actually has no effect, except to make the program easier to understand. With these changes, the code looks like this:

int i;

main() {

for( ; i["]<i;++i){--i;}"] ; read(0,i+++"hello, world!\n",1)); }


read(j,i,p) {

write(0,i--,1); }
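Three of the claims behind this version can be checked on their own: a global integer with no initializer starts at zero, +++ is read as a post-increment followed by an addition, and decrementing a parameter leaves the caller's variable alone. The sketch below uses present-day C, and the names counter, next_letter and decrement_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope variable with no initializer: C sets it to zero. */
int counter;

/* Hypothetical helper: (*i)+++"..." is tokenized as ((*i)++) + "...".
   The old value of *i is added to the string's address, and *i is
   incremented afterwards. Assumes 0 <= *i <= 14 so that the result
   stays inside the string. */
const char *next_letter(int *i)
{
    assert(i != NULL && *i >= 0 && *i <= 14);
    return (*i)+++"hello, world!\n";
}

/* Hypothetical helper: the parameter shares the global's name, but it
   is a separate copy, so the decrement is lost when the function returns. */
void decrement_copy(int counter)
{
    counter--;
    (void)counter; /* silence unused-value warnings */
}
```

Starting from the zero-initialized counter, the first call to next_letter(&counter) points at 'h' and leaves counter at 1; calling decrement_copy(counter) afterwards does not change it.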

There are only two differences between this code and the final obfuscated program: the formatting of the text and the use of some confusing ways to write zero and one. To turn to the second of these, one fancy way to write zero is '-'-'-', that is, the numerical value of the '-' character subtracted from itself. Similarly, '/'/'/' divides the numerical value of the '/' character by itself, giving one. (Doing arithmetic with characters, like adding numbers and strings, is also not the most standard programming practice, although programmers are of course aware that characters have numerical representations.) The fancy zero and fancy one values that are obtained by doing this are passed to the read function as the variables j and p; that function then uses other elaborate ways to write zero and one. j/p+p may look like 0/2, but division binds more tightly than addition in C, so it is evaluated as (j/p)+p, that is, 0/1+1, which is always one in this code. i/i is always one. i---j is a way of writing (i--)-j, and, since j has the value zero, this does a meaningless subtraction and is the same as just writing i--. Adding in these elaborate ways of expressing zero and one, the code looks like this:
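As an aside before that listing, the character arithmetic and the grouping of these expressions can be verified separately. This is a sketch in present-day C with hypothetical helper names; it assumes, as the text states, that j is zero and p is one. Under C's precedence rules j/p+p groups as (j/p)+p, so with those values it evaluates to one.

```c
#include <assert.h>

/* '-' minus itself: any character code subtracted from itself is 0. */
int fancy_zero(void) { return '-'-'-'; }

/* '/' divided by itself: any nonzero character code over itself is 1. */
int fancy_one(void) { return '/'/'/'; }

/* Hypothetical helper: evaluates j/p+p exactly as written.
   Division binds tighter than addition, so this is (j/p)+p. */
int descriptor_expr(int j, int p)
{
    assert(p != 0);
    return j/p+p;
}

/* Hypothetical helper: returns 1 only if i---j behaves as (i--)-j,
   that is, the result uses the old i and i ends up one smaller. */
int parses_as_post_decrement(int i, int j)
{
    int before = i;
    int result = i---j;
    return result == before - j && i == before - 1;
}
```

With the fancy values passed in, descriptor_expr(fancy_zero(), fancy_one()) is 1.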
