June 27th, 2004


String Lift

My current bathroom book, Programming Perl, has an interesting note in it: the Perl compiler doesn't hoist initializations out of loops, suggesting that the programmer should exercise some common sense. This is a bad inconsistency, I think. For you non-programmers, let me demonstrate what hoisting does.

    int num_processed = 0;
    char *thisline;
    while ((thisline = readline("Give me a name:")) != NULL) {
        int length = 0;
        length = strlen(thisline);
        free(thisline);  // readline() returns malloc'd memory
        if (length == 0) {
            break;
        }
        printf("The name was %d characters long\n", length);
    }

A good C compiler will effectively turn it into this (let's ignore non-hoisting optimizations):

    int num_processed = 0;
    char *thisline;
    int length;
    while ((thisline = readline("Give me a name:")) != NULL) {
        length = strlen(thisline);
        free(thisline);  // readline() returns malloc'd memory
        if (length == 0) {
            break;
        }
        printf("The name was %d characters long\n", length);
    }

There are two things to note here. First, length always gets a new value before it is read, so initializing it to zero is unneeded. Second, and this is the main point, there's no need to keep creating and destroying the length variable every time the body of the while loop runs, so the compiler can create it once outside the loop instead.

At least as of the writing of that Perl book, Perl will not do this optimization for you. There are no doubt all sorts of other optimizations that Perl does for you, but not that one, because the designers want people to make obvious optimizations themselves. This matters. It is a good thing to encourage programmers to make high-level optimizations, but generally those should be the kind that a compiler cannot make for you (or would find difficult to make). It would be fascinating to design a programming language where very rich metadata were provided routinely, so that really intelligent compilers could more easily make optimizations that no compiler for current languages could make safely. Perl 6 will be taking some steps towards this with a rich attribute system.

Anyhow, the reason this particular optimization should be made automatically for the user, contrary to Perl's design at that time (hey, maybe in Perl 5.8 or later it changed), is another programming philosophy that I take to a sort of extreme: avoid globals. Why avoid globals? When a variable could be changed anywhere in the program, it's hard to tell when it changes. Frequently, the globals are not actually used throughout the entire program anyhow; some lazy and bad programmers just make everything global. I extend the avoiding of globals to... well, let's call it superscoping. Under this philosophy, even within functions, we try moderately hard to keep the scope of variables, measured in lines, small.
If, for example, we're in the middle of a large, complicated function and enter an area of the code where we're trying to do something complex and need a lot of variables that don't belong in the rest of the code, we'll create a bare, unconditioned block to scope those variables. Example:

Instead of:

    void doOpenPrefsFile(...)
    {
        // First acquire the lock, then open the file, then read it
        // Stage 1: Acquire
        int lockid;
        int lockmgrsocket;
        char *fname;
        FILE *myfile;
        char errstring[ERRSTRING_SIZE] = "";  // hypothetical constant; the original size was lost
        #define THISLINE_SIZE 80
        char thisline[THISLINE_SIZE];
        int parseline = 0;
        int in_block_parse;
        int in_multiline;

        lockid = socket(...);
        ...

        // Stage 2: Open
        myfile = fopen(fname, ...);
        if (myfile == NULL) {
            strncpy(errstring, "Failed to open file: ", sizeof errstring - 1);
            strncat(errstring, my_geterror(), sizeof errstring - strlen(errstring) - 1);
            ...
        }

        // Stage 3: Read
        while (fgets(thisline, THISLINE_SIZE, myfile) != NULL) {
            if (regex_match(thisline, "^\\S*#")) {
                ...
            }
            ...
            parseline++;
        }
    }

A superscoping way to do that would be:

    void doOpenPrefsFile(...)
    {
        // First acquire the lock, then open the file, then read it
        char *fname;
        FILE *myfile;

        // Stage 1: Acquire
        {
            int lockid;
            int lockmgrsocket;
            lockid = socket(...);
            ...
        }

        // Stage 2: Open
        {
            char errstring[ERRSTRING_SIZE] = "";  // hypothetical constant; the original size was lost
            myfile = fopen(fname, ...);
            if (myfile == NULL) {
                strncpy(errstring, "Failed to open file: ", sizeof errstring - 1);
                strncat(errstring, my_geterror(), sizeof errstring - strlen(errstring) - 1);
                ...
            }
        }

        // Stage 3: Read
        {
            #define THISLINE_SIZE 80  // note: macros ignore block scope
            char thisline[THISLINE_SIZE];
            int parseline = 0;
            int in_block_parse;
            int in_multiline;
            while (fgets(thisline, THISLINE_SIZE, myfile) != NULL) {
                if (regex_match(thisline, "^\\S*#")) {
                    ...
                }
                ...
                parseline++;
            }
        }
    }
From that, it's obvious that there's no hidden meaning to any of those variables outside where they're actually used, because their scope and their use are much more tightly bound. The compiler doesn't need to work as hard (and can be smarter, especially in dynamic languages like Perl, where it's hard to be smart and consistent at the same time).

It's tempting, of course, to simply declare that those blocks are sufficiently separate that they should be their own functions. This is sometimes the case, but depending on how much connectedness there is between the parts of the function, it can be a pain to pass around everything needed to break things into functions. This approach offers a middle ground between splitting things off and keeping them in a single function, for when that makes sense, and in fact makes it easier to break things off later should that prove desirable.

Note, however, that when it makes sense for the compiler to hoist (most of superscoping applies to conditional blocks as well), it should do so. Separating what actually happens in the code from the conceptually clean version is the point of optimization, and with superscoping it should be a win-win situation instead of, as in Perl, a lose-lose one: either your code is slower, or it's harder to read (and possibly slower anyhow, because it's hard to optimize), or, optionally, it's much uglier if you use double blocks.