Hey, POSIX! Just Adopt gawk as Your Standard!
I'm a denizen of the #awk IRC channel. For the most part we recommend that our users write POSIX-compliant awk programs to attain maximum portability. POSIX is the yardstick for all the major awk versions out there (BSD, nawk, mawk, gawk, etc.). We see spikes in awk use from people who get annoyed at shell syntax and realize that awk is far more expressive and can act as a pretty good replacement in use cases with lots of data manipulation and transformation and a relatively low number of command line tool calls. What peeves us is that gawk has some excellent additions to the awk command set that aren't part of the POSIX standard and that hinder portability if used. This results in portability tricks like the one in Listing 1, which implements equivalent functionality for both gawk and POSIX-compliant awk.
    #!/usr/bin/awk -f


    function hex2DecimalPortable(hexString,    command, n) {
        command = sprintf("echo $((%s))", hexString);

        command | getline n
        close(command);

        return n;
    } # hex2DecimalPortable


    function hex2Decimal(hexString) {
        if (length(PROCINFO) > 0) # gawk detection
            return strtonum(hexString);

        return hex2DecimalPortable(hexString);
    } # hex2Decimal


    BEGIN {
        testValue = "0xff09";
        printf("%s = %d\n", testValue, hex2Decimal(testValue));
    } # BEGIN
Listing 1 - convert hexadecimal strings to decimal values
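For comparison, the same conversion can also be written in plain POSIX awk without calling out to the shell. The following is only a minimal sketch, not the code from the use case described here. The name hex2DecimalPure and the -1 error marker are assumptions made for illustration. It relies only on tolower(), substr(), index(), and length(), and it is exact only while the result fits in awk's floating-point numbers (roughly up to 2^53).

```awk
#!/usr/bin/awk -f

# Sketch: hexadecimal string to decimal in plain POSIX awk.
# hex2DecimalPure is a hypothetical name, not part of Listing 1.
# The parameters after hexString are awk's idiom for local variables.
function hex2DecimalPure(hexString,    digits, i, n, d, value) {
    digits = "0123456789abcdef"
    hexString = tolower(hexString)

    # Drop an optional leading "0x".
    if (substr(hexString, 1, 2) == "0x")
        hexString = substr(hexString, 3)

    value = 0
    n = length(hexString)
    for (i = 1; i <= n; i++) {
        # index() returns the 1-based position of the digit, or 0 if absent.
        d = index(digits, substr(hexString, i, 1))
        if (d == 0)
            return -1    # assumption: -1 marks a non-hex character
        value = value * 16 + (d - 1)
    }

    return value
} # hex2DecimalPure


BEGIN {
    testValue = "0xff09"
    printf("%s = %d\n", testValue, hex2DecimalPure(testValue))   # 0xff09 = 65289
} # BEGIN
```

This avoids starting a process for each value, but it is still extra awk code doing work that gawk's strtonum() handles natively, and unlike strtonum() it accepts hexadecimal input only.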
We end up with portable code at the expense of efficiency: the portable path starts a shell process for every value it converts. The performance difference becomes significant when the application processes several megabytes (or gigabytes!) of data, as in the use case for which this function was developed. There are many other gawk-only functions that end up being implemented through extra awk coding, or through calls to tools such as /usr/bin/sort from within the script.
Wouldn't it be nice to just make gawk the POSIX standard, since the code is freely available anyway? The standard could evolve from its current static definition to a dynamic one such as "POSIX awk is whatever conforms to the stable syntax and command list of gawk as of June 30 of every odd year beginning in 2011." Vendors could then look at the implementation at any given level and decide whether to take the gawk source and adopt it in their stack (I leave the boring licensing and philosophical discussions for others to tackle), or take that syntax and command specification and re-implement it as needed.
What would be a better way of keeping pace with awk's evolution? How do we prevent vapid POSIX conformance from forcing us into explicit portability tricks that sacrifice program efficiency?