Hey, POSIX! Just Adopt gawk as Your Standard!

I'm a denizen of the #awk IRC channel. For the most part we recommend that our users write POSIX-compliant awk programs to attain maximum portability. POSIX is the yardstick for all the major awk versions out there (BSD, nawk, mawk, gawk, etc.). We see spikes in awk use from people who get annoyed at shell syntax and realize that awk is far more expressive and can be a pretty good replacement in use cases with lots of data manipulation and transformation, and a relatively low number of command line tool calls. What peeves us is that gawk has some excellent additions to the awk command set that aren't part of the POSIX standard and that hinder portability if used. This results in portability tricks like the one in Listing 1, which implements equivalent functionality for gawk and for POSIX-compliant awk.


#!/usr/bin/awk -f

# Portable fallback: let the shell's arithmetic expansion parse the hex literal.
# This spawns one subprocess per call, which is slow on large inputs.
function hex2DecimalPortable(hexString,    command, n) {
  command = sprintf("echo $((%s))", hexString);

  command | getline n;
  close(command);

  return n;
} # hex2DecimalPortable


# Use gawk's built-in strtonum() when available, else the portable fallback.
function hex2Decimal(hexString) {
  if (length(PROCINFO) > 0)     # gawk detection
    return strtonum(hexString);

  return hex2DecimalPortable(hexString);
} # hex2Decimal


BEGIN {
  testValue = "0xff09";
  printf("%s = %d\n", testValue, hex2Decimal(testValue));
} # BEGIN

Listing 1 - convert hexadecimal strings to decimal values

We end up having portable code at the expense of efficiency, because the portable path spawns a shell for every conversion. This results in a significant performance difference if the application processes several megabytes (or gigabytes!) of data, as in the use case for which this function was developed. There are many other gawk-only functions that end up being implemented through extra awk coding, or through calls to tools such as /usr/bin/sort from within the script. Wouldn't it be nice to just make gawk the POSIX standard, since the code is freely available anyway? The standard could evolve from its current static definition to a dynamic one such as "POSIX awk is whatever conforms to the stable syntax and command list of gawk as of June 30 of every odd year beginning in 2011." Vendors could then look at the implementation at any given level and decide whether to take the gawk source and adopt it in their stack (I leave the boring licensing and philosophical discussions for others to tackle), or take that syntax and command specification and re-implement it as needed.
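As an illustration of the "extra awk coding" route, here is a minimal sketch of the same hex conversion written in plain awk, with no subprocess per call. It is not part of the original script: the function name hex2DecimalPure and the choice to return -1 on invalid input are assumptions made for this example.

```awk
#!/usr/bin/awk -f

# Hypothetical pure-awk alternative to hex2DecimalPortable (name is illustrative).
# Accepts an optional 0x/0X prefix; returns -1 if a non-hex character is found.
function hex2DecimalPure(hexString,    digits, i, n, v) {
  digits = "0123456789abcdef";
  hexString = tolower(hexString);
  sub(/^0x/, "", hexString);

  n = 0;
  for (i = 1; i <= length(hexString); i++) {
    # index() is 1-based, so the digit value is position - 1; 0 means "not found".
    v = index(digits, substr(hexString, i, 1));
    if (v == 0)
      return -1;
    n = n * 16 + (v - 1);
  }

  return n;
} # hex2DecimalPure

BEGIN {
  testValue = "0xff09";
  printf("%s = %d\n", testValue, hex2DecimalPure(testValue));   # 0xff09 = 65289
} # BEGIN
```

This uses only tolower, sub, index, substr and length, which POSIX awk specifies. Because awk numbers are floating point, results are exact only up to 2^53. It is still more code to write and maintain than a single strtonum() call.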

What would be a better way of keeping pace with awk's evolution? How do we prevent vapid POSIX conformance from forcing us into explicit portability tricks that buy portable code at the cost of program efficiency?
