Oct 28 2008

AWK is a general-purpose programming language designed for processing text-based data, either in files or data streams. It was created at Bell Labs in the 1970s.

I noticed that Erik Wendelin wrote an article “awk is a beautiful tool.” In this article he said that it was best to introduce Awk with practical examples. I totally agree with Erik.

Eric Pement’s Awk one-liner collection consists of five sections:

  • 1. File spacing,
  • 2. Numbering and calculations,
  • 3. Text conversion and substitution,
  • 4. Selective printing of certain lines,
  • 5. Selective deleting of certain lines.

The first part of the article will explain the first two sections: “File spacing” and “Numbering and calculations.” The second part will explain “Text conversion and substitution”, and the last part “Selective printing/deleting of certain lines.”

These one-liners work with all versions of awk, such as nawk (AT&T’s new awk), gawk (GNU’s awk), mawk (Michael Brennan’s awk) and oawk (old awk).

Let’s start!

1. File Spacing

1. Double-space a file.

awk '1; { print "" }'   filename.ext

So how does it work? A one-liner is an Awk program, and every Awk program consists of a sequence of pattern-action statements of the form “pattern { action statements }”. In this case there are two statements: “1” and “{ print "" }”. In a pattern-action statement either the pattern or the action may be missing. If the pattern is missing, the action is applied to every single line of input. A missing action is equivalent to ‘{ print }’. Thus, this one-liner translates to:

awk '1 { print } { print "" }'   filename.ext

An action is applied only if the pattern matches, i.e., the pattern is true. Since ‘1’ is always true, this one-liner translates further into two print statements:

awk '{ print } { print "" }'   filename.ext

Every print statement in Awk is silently followed by ORS – the Output Record Separator variable, which is a newline by default. The first print statement with no arguments is equivalent to “print $0”, where $0 is a variable holding the entire line. The second print statement prints an empty string, but since each print statement is followed by ORS, it actually prints a newline. So there we have it, each line gets double-spaced.
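To see the effect on concrete input, here is a minimal sketch. The wrapper name double_space and the two sample lines are made up for illustration; the Awk program itself is one-liner #1.

```shell
# double_space is a hypothetical wrapper; the Awk program is one-liner #1.
double_space() { awk '1; { print "" }' "$@"; }

# Two made-up input lines. Each comes out followed by an empty line.
printf 'first\nsecond\n' | double_space
# Output (a lone "#" stands for an empty line):
# first
#
# second
#
```

Because the wrapper passes "$@" through to awk, it should work both on piped input and on file names given as arguments.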

2. Another way to double-space a file.

awk 'BEGIN { ORS="\n\n" }; 1'   filename.ext

BEGIN is a special kind of pattern which is not tested against the input. Its action is executed before any input is read. This one-liner double-spaces the file by setting the ORS variable to two newlines. As I mentioned previously, the statement “1” gets translated to “{ print }”, and every print statement gets terminated with the value of the ORS variable.
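ORS does not have to consist of newlines. As a small sketch (the wrapper name join_records and the " | " separator are my own choices for illustration, not part of the original collection), any string assigned to ORS ends up after every printed record:

```shell
# join_records is a hypothetical name; ORS is set to " | " instead of "\n\n".
join_records() { awk 'BEGIN { ORS=" | " }; 1' "$@"; }

printf 'a\nb\nc\n' | join_records
# Output (a single line with no trailing newline):
# a | b | c | 
```

Note that the separator also follows the last record, which is exactly why one-liner #2 ends the file with an empty line.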

3. Double-space a file so that no more than one blank line appears between lines of text.

awk 'NF { print $0 "\n" }'   filename.ext

The one-liner uses another special variable called NF – Number of Fields. It contains the number of fields the current line was split into. For example, the line “this is a test” splits into four pieces and NF gets set to 4. The empty line “” does not split into any pieces and NF gets set to 0. Using NF as a pattern effectively filters out empty lines. This one-liner says: “If there are any fields, print the whole line followed by a newline.”
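Here is a quick sketch of that filter at work. The wrapper name squeeze_double_space and the sample input are invented for the example. With the default field separator, a line that contains only spaces or tabs also has NF equal to 0, so it is dropped just like a truly empty line.

```shell
# squeeze_double_space is a hypothetical wrapper around one-liner #3.
squeeze_double_space() { awk 'NF { print $0 "\n" }' "$@"; }

# The input has an empty line and a whitespace-only line between the text lines.
printf 'one\n\n   \ntwo\n' | squeeze_double_space
# Output (a lone "#" stands for an empty line):
# one
#
# two
#
```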

4. Triple-space a file.

awk '1; { print "\n" }'   filename.ext

This one-liner is very similar to the previous ones. ‘1’ gets translated into ‘{ print }’ and the resulting Awk program is:

awk '{ print; print "\n" }'   filename.ext

It prints the line, then prints a newline followed by the terminating ORS, which is a newline by default.

2. Numbering and Calculations

5. Number lines in each file separately.

awk '{ print FNR "\t" $0 }'   filename.ext

This Awk program prepends the FNR – File Line Number predefined variable and a tab (\t) to each line. The FNR variable contains the current line number within each file separately. For example, if this one-liner were called on two files, one containing 10 lines and the other 12, it would number the lines in the first file from 1 to 10, and then restart numbering from 1 for the second file and number its lines from 1 to 12. FNR gets reset from file to file.

6. Number lines for all files together.

awk '{ print NR "\t" $0 }'   filename.ext

This one works the same as #5 except that it uses NR – Line Number variable, which does not get reset from file to file. It counts the input lines seen so far. For example, if it was called on the same two files with 10 and 12 lines, it would number the lines from 1 to 22 (10 + 12).
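The difference between NR and FNR is easiest to see side by side. The sketch below prints both counters for two small throwaway files; the wrapper name number_both, the file names and the file contents are all made up for illustration.

```shell
# number_both is a hypothetical helper that prints NR, FNR and the line.
number_both() { awk '{ print NR "\t" FNR "\t" $0 }' "$@"; }

# Create two small temporary files with made-up contents.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\ne\n' > "$dir/two.txt"

number_both "$dir/one.txt" "$dir/two.txt"
# Output (columns are tab-separated: NR, FNR, line):
# 1	1	a
# 2	2	b
# 3	1	c
# 4	2	d
# 5	3	e

rm -rf "$dir"
```

The first column keeps counting across both files, while the second column starts over at 1 when the second file begins.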

7. Number lines in a fancy manner.

awk '{ printf("%5d : %s\n", NR, $0) }'   filename.ext

This one-liner uses printf() function to number lines in a custom format. It takes format parameter just like a regular printf() function. Note that ORS does not get appended at the end of printf(), so we have to print the newline (\n) character explicitly. This one right-aligns line numbers, followed by a space and a colon, and the line.
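As a quick illustration of the format string, here is what the output looks like on two made-up lines. The wrapper name fancy_number is hypothetical.

```shell
# fancy_number is a hypothetical wrapper around one-liner #7.
fancy_number() { awk '{ printf("%5d : %s\n", NR, $0) }' "$@"; }

printf 'alpha\nbeta\n' | fancy_number
# Output (%5d pads the line number with spaces to a width of 5):
#     1 : alpha
#     2 : beta
```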

8. Number only non-blank lines in files.

awk 'NF { $0=++a " :" $0 }; { print }'   filename.ext

Awk variables are dynamic; they come into existence when they are first used. This one-liner pre-increments the variable ‘a’ each time the line is non-empty, then prepends the value of this variable, followed by “ :”, to the line and prints it out. Empty lines are printed unchanged.
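A small sketch with made-up input shows that the empty line is passed through without consuming a number. The wrapper name number_nonblank is hypothetical.

```shell
# number_nonblank is a hypothetical wrapper around one-liner #8.
number_nonblank() { awk 'NF { $0=++a " :" $0 }; { print }' "$@"; }

printf 'x\n\ny\n' | number_nonblank
# Output (a lone "#" stands for an empty line):
# 1 :x
#
# 2 :y
```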

9. Count lines in files (emulates wc -l).

awk 'END { print NR }'   filename.ext

END is another special kind of pattern which is not tested against the input. It is executed when all the input has been exhausted. This one-liner outputs the value of NR special variable after all the input has been consumed. NR contains total number of lines seen (= number of lines in the file).

10. Print the sum of fields in every line.

awk '{ s = 0; for (i = 1; i <= NF; i++) s = s+$i; print s }'   filename.ext

Awk has some features of the C language, like the for (;;) { … } loop. This one-liner loops over all fields in a line (there are NF fields in a line) and accumulates their sum in the variable ‘s’. Then it prints the result and proceeds to the next line.

11. Print the sum of fields in all lines.

awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s }'   filename.ext

This one-liner is basically the same as #10, except that it prints the sum of all fields in all lines. Notice that it does not initialize the variable ‘s’ to 0. That is not necessary, because variables come into existence dynamically and an uninitialized variable is treated as 0 in a numeric context.
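To compare #10 and #11 directly, here is a sketch that runs both on the same made-up input. The wrapper names sum_each_line and sum_all are hypothetical.

```shell
# Hypothetical wrappers around one-liners #10 and #11.
sum_each_line() { awk '{ s = 0; for (i = 1; i <= NF; i++) s = s+$i; print s }' "$@"; }
sum_all()       { awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s }' "$@"; }

printf '1 2 3\n4 5\n' | sum_each_line
# Output:
# 6
# 9

printf '1 2 3\n4 5\n' | sum_all
# Output:
# 15
```

The only real difference is where ‘s’ is reset and where it is printed: per line in #10, once at the end in #11.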

12. Replace every field by its absolute value.

awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }'   filename.ext

This one-liner uses two other features of the C language, namely the if (…) { … } statement and the omission of curly braces. It loops over all fields in a line and checks whether each field is less than 0. If a field is less than 0, it negates the field to make it positive. Fields can be addressed indirectly by a variable. For example, i = 5; $i = "hello" sets field number 5 to the string “hello”.

Here is the same one-liner rewritten with curly braces for clarity. The ‘print’ statement gets executed after all the fields in the line have been replaced by their absolute values.

awk '{
  for (i = 1; i <= NF; i++) {
    if ($i < 0) {
      $i = -$i;
    }
  }
  print;
}'   filename.ext
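One side effect is worth seeing on real input: when a field is assigned to, Awk rebuilds $0 by joining the fields with the output field separator (a single space by default), so the original spacing of that line is lost. Lines without negative fields are never assigned to and keep their spacing. A sketch with made-up input, where abs_fields is a hypothetical wrapper name:

```shell
# abs_fields is a hypothetical wrapper around one-liner #12.
abs_fields() { awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }' "$@"; }

# Both input lines separate their fields with three spaces.
printf '4   5\n-1   2\n' | abs_fields
# Output:
# 4   5
# 1 2
```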

13. Count the total number of fields (words) in a file.

awk '{ total = total + NF }; END { print total }'   filename.ext

This one-liner matches all the lines and keeps adding the number of fields in each line. The number of fields seen so far is kept in a variable named ‘total’. Once the input has been processed, special pattern ‘END { … }’ is executed, which prints the total number of fields.

14. Print the total number of lines containing word “Beth”.

awk '/Beth/ { n++ }; END { print n+0 }'   filename.ext

This one-liner has two pattern-action statements. The first one is ‘/Beth/ { n++ }’. A pattern between two slashes is a regular expression. It matches all lines containing the pattern “Beth” (not necessarily the word “Beth”; it could just as well be “Bethe” or “theBeth333”). When a line matches, the variable ‘n’ gets incremented by one. The second pattern-action statement is ‘END { print n+0 }’. It is executed when the file has been processed. Note the ‘+0’ in the ‘print n+0’ statement. It forces ‘0’ to be printed in case there were no matches (‘n’ was undefined). Had we not put ‘+0’ there, an empty line would have been printed.
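The effect of ‘+0’ can be checked with a short sketch. The wrapper names and the sample records below are made up for illustration; count_beth_raw is the same program with the ‘+0’ left out.

```shell
# Hypothetical wrappers: one-liner #14, and a variant without the "+0".
count_beth()     { awk '/Beth/ { n++ }; END { print n+0 }' "$@"; }
count_beth_raw() { awk '/Beth/ { n++ }; END { print n }' "$@"; }

# "Bethe" also matches, because /Beth/ is a substring match.
printf 'Beth 4.00\nDan 3.75\nBethe 1.00\n' | count_beth
# Output:
# 2

# No matches: with "+0" we get 0, without it we get an empty line.
printf 'Dan 3.75\n' | count_beth
# Output:
# 0
printf 'Dan 3.75\n' | count_beth_raw
# Output: (an empty line)
```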

15. Find the line containing the largest (numeric) first field.

awk '$1 > max { max=$1; maxline=$0 }; END { print max, maxline }'   filename.ext

This one-liner keeps track of the largest number in the first field (in variable ‘max’) and the corresponding line (in variable ‘maxline’). Once it has looped over all lines, it prints them out.
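One caveat: ‘max’ starts out uninitialized, which compares as 0, so if no first field is greater than 0 the pattern never matches and nothing gets recorded. The sketch below shows this on made-up input, together with one possible workaround that seeds ‘max’ from the first line. The variant and both wrapper names are my own illustration, not part of the original collection.

```shell
# max_line is one-liner #15; max_line_any_sign is a hypothetical variant
# that always accepts the first record, so negative values work too.
max_line()          { awk '$1 > max { max=$1; maxline=$0 }; END { print max, maxline }' "$@"; }
max_line_any_sign() { awk 'NR == 1 || $1 + 0 > max { max = $1 + 0; maxline = $0 }; END { print max, maxline }' "$@"; }

printf '3 x\n10 y\n7 z\n' | max_line
# Output:
# 10 10 y

# All first fields are negative: the original prints only the separator.
printf -- '-5 a\n-2 b\n-9 c\n' | max_line
# Output: (a line containing a single space)

printf -- '-5 a\n-2 b\n-9 c\n' | max_line_any_sign
# Output:
# -2 -2 b
```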

16. Print the number of fields in each line, followed by the line.

awk '{ print NF ":" $0 }'   filename.ext

This one-liner just prints out the predefined variable NF – Number of Fields, which contains the number of fields in the line, followed by a colon and the line itself.

17. Print the last field of each line.

awk '{ print $NF }'   filename.ext

Fields in Awk need not be referenced by constants. For example, code like ‘f = 3; print $f’ would print out the 3rd field. This one-liner prints the field whose number is the value of NF. $NF is the last field in the line.
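Since the field number can be any expression, the same idea extends beyond $NF. As a sketch with made-up input (the wrapper names are hypothetical, and the next_to_last_field variant is my own addition), $(NF-1) picks the field just before the last one:

```shell
# last_field is one-liner #17; next_to_last_field is a hypothetical variant.
last_field()         { awk '{ print $NF }' "$@"; }
next_to_last_field() { awk '{ print $(NF-1) }' "$@"; }

printf 'a b c\nd e\n' | last_field
# Output:
# c
# e

printf 'a b c\nd e\n' | next_to_last_field
# Output:
# b
# d
```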

18. Print the last field of the last line.

awk '{ field = $NF }; END { print field }'   filename.ext

This one-liner keeps track of the last field in the variable ‘field’. Once it has looped over all the lines, the variable ‘field’ contains the last field of the last line, and it just prints it out.

19. Print every line with more than 4 fields.

awk 'NF > 4'   filename.ext

This one-liner omits the action statement. As I noted in one-liner #1, a missing action statement is equivalent to ‘{ print }’.

20. Print every line where the value of the last field is greater than 4.

awk '$NF > 4'   filename.ext

This one-liner is similar to #17. It references the last field via the NF variable. If the last field is greater than 4, it prints the whole line.
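‘NF > 4’ and ‘$NF > 4’ look almost identical but test different things: the first compares the number of fields, the second compares the value of the last field. A sketch on the same made-up input (the wrapper names are hypothetical):

```shell
# Hypothetical wrappers around one-liners #19 and #20.
more_than_4_fields() { awk 'NF > 4' "$@"; }
last_field_over_4()  { awk '$NF > 4' "$@"; }

# Line 1: 5 fields, last is 5. Line 2: 2 fields, last is 9.
# Line 3: 6 fields, last is 1.
printf '1 2 3 4 5\n9 9\n7 8 9 10 11 1\n' | more_than_4_fields
# Output:
# 1 2 3 4 5
# 7 8 9 10 11 1

printf '1 2 3 4 5\n9 9\n7 8 9 10 11 1\n' | last_field_over_4
# Output:
# 1 2 3 4 5
# 9 9
```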

Enjoy !!

  14 Responses to “AWK ! A boon for CLI enthusiasts”

  1. Nish and Me search this combination of command that is useful
    if you want to sum up the size of the files in a particular directory

    ls -l | awk '{ s = s+$5 }; END { print s }'

  2. dude This is so cool, I didn't know something like this even existed. I am experimenting with this soon

  3. Where are part 2 and 3 ??

    Thanks anyways … it was a very nice illustration …..

  4. I lately came across your blog and have been learning along. I thought I would leave my first comment. I don

  7. How do I scan a file and print all content between every occurence of <svg and nearest into separate files? Need to recover some svgs from a formatted partition.

  8. *and nearest </svg>

  9. Great work!!! I find this very useful! Thanks….

  10. 11. Print the sum of fields in all lines.

    awk '{ for (i = 1; i <= NF; i++) s = s+$i }; END { print s }' filname.ext

    this command doesnt work out…….

    please do try this

    awk '{for(i = 1; i <= NF; i++) s = s+$i } END { print s }' num.data

  11. please ignore the above comment

  12. This tutorial is very good and effective. A pity the following parts were not made available !

    In example 2, the semicolon is useless. You could as well have this :

    awk 'BEGIN { ORS="\n\n" } 1'

    instead of :

    awk 'BEGIN { ORS="\n\n" }; 1'

  13. Is there a way to use awk to get rid of the comment spam?

  14. your examples contain &lt; (html escape for less than) instead of < (less than symbol)
