How to kill blank lines elegantly
Article posted 16/03/2014
If you do a Web search for 'linux delete blank lines' you'll find lots of command-line advice. Some of the advice, though, only applies to special cases. Here I explain two elegant and general methods for finding and deleting any blank lines in a text file.
As a sample text I'll use the 8-line file blanks, shown below.
The goal here is to reduce the text to the five non-blank lines, as shown.
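The sample file itself isn't reproduced here. If you want to try the commands, the minimal sketch below builds a stand-in called blanks. Its wording is invented. Only its layout follows the article's description of the sample: eight lines, with line 2 empty, one line holding only spaces, one holding only a tab, and non-blank lines that start with spaces and with a tab. POSIX sed's l command then prints each line unambiguously.

```shell
#!/bin/sh
# Build a hypothetical stand-in for the sample file "blanks".
# The wording of each line is invented; only the layout follows the article:
# line 2 is empty, line 4 holds only spaces, line 6 holds only a tab.
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# Show every line unambiguously: a tab appears as \t, each line end as $.
sed -n l blanks
```

In the sed output the spaces-only line shows up as a run of spaces before $, and the tab-only line as \t$.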
And the winner is...
The simplest answer uses AWK. Just pass the text through awk NF, for example with the command awk NF blanks.
When an AWK pattern has no action attached, the default action is to print the whole line whenever the pattern is true. In this case the pattern is NF, the number of fields on the line. AWK recognises fields as strings of text separated by field separators, and by default the separators are runs of spaces and tabs. If a line is empty or contains nothing but spaces and tabs, as in the three blank lines in our sample, then NF for that line is zero, which AWK treats as false, so the line is not printed.
Notice that NF is not enclosed in single quotes here. The 'search-pattern + action' part of an AWK command is normally surrounded by single quotes so that the shell doesn't interpret anything in it. Since NF contains no characters that are special to the shell, the quotes aren't needed. (See below for more on NF.)
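Here is a minimal run of awk NF, assuming an invented stand-in for blanks with the same layout as the article's sample (one empty line, one spaces-only line, one tab-only line). It also shows that quoting the pattern changes nothing.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks": line 2 is empty, line 4 holds only
# spaces, line 6 holds only a tab (the wording is invented).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# Pattern with no action: AWK prints each line for which NF is non-zero.
awk NF blanks

# Quoting the pattern gives identical output, because the shell
# treats the characters N and F as ordinary text either way.
awk 'NF' blanks
```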
The second method is grep -v '^[[:blank:]]*$' blanks. It's also an elegant command, but a little harder to understand because it uses a POSIX character class inside a regular expression. The character class [:blank:] contains the space and tab characters; it has to sit inside a bracket expression, hence the double brackets. The asterisk (*) after the bracket expression means 'zero or more occurrences of that expression'. The initial caret (^) anchors the match to the start of the line and the dollar sign ($) anchors it to the end, so the pattern matches only lines made up entirely of spaces and tabs, or of nothing at all.
If the grep command searched the sample for that pattern, it would find all three blank lines. The -v option tells grep to invert the search and print all the lines that don't match the pattern.
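A minimal sketch of the grep method, run against an invented stand-in for blanks with the article's layout. On this input it prints exactly the same lines as awk NF.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# ^ and $ anchor the pattern to the whole line; [[:blank:]]* matches zero
# or more spaces or tabs. So the pattern matches lines holding only blanks
# (or nothing), and -v prints every line that does NOT match it.
grep -v '^[[:blank:]]*$' blanks
```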
Nice try, but...
I like this use of the tr command: tr -s '\n' < blanks. The -s option 'squeezes' each run of repeated occurrences of a character into a single occurrence. In our sample the newline character \n occurs twice in a row: once at the end of line 1, and once at the end of the next (blank) line. By squeezing \n, tr removes that blank line. But not the other two blank lines: they contain spaces or a tab, so their newline characters never sit next to each other...
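The sketch below shows the tr approach on an invented stand-in for blanks with the article's layout. Note that tr reads only standard input, so the file is supplied by redirection.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# -s squeezes each run of repeated newlines into a single newline.
tr -s '\n' < blanks

# Count the surviving lines: the empty line 2 is gone, but the spaces-only
# and tab-only lines remain, since their newlines are not adjacent.
tr -s '\n' < blanks | wc -l
```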
Here again the grep command, this time as grep -v '^$' blanks, has its search inverted by the -v option. The pattern ^$ matches only lines with nothing at all between the start of the line (^) and the end of the line ($), so grep prints every line that has at least one character. The first blank line is removed, but not the next two blank lines, which contain invisible tab and space characters. Better luck next time.
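To see what slips through, this sketch pipes the grep output into sed -n l, which makes tabs and trailing spaces visible. It assumes an invented stand-in for blanks with the article's layout.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# ^$ matches only completely empty lines; -v keeps everything else.
# sed -n l reveals the leftover "blank" lines: one prints as a run of
# spaces followed by $, the other as \t$.
grep -v '^$' blanks | sed -n l
```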
Many of the answers I've seen on the Web to the 'delete blank lines' question assume that every non-blank line starts with a printing character. Here the grep command looks for lines beginning with any visible character (you get the same result with [:alnum:] and [:word:]). The non-blank lines beginning with spaces and with a tab are missed out, alas.
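The exact pattern isn't shown in the text. A minimal sketch, assuming the POSIX class [:graph:] (printing characters other than space) is what 'any visible character' refers to, run against an invented stand-in for blanks:

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# Keep only lines whose first character is visible. [:graph:] excludes
# space and tab, so the indented non-blank lines are lost with the blanks.
grep '^[[:graph:]]' blanks

# [:alnum:] (letters and digits) gives the same result on this sample,
# because every line here that starts with a visible character starts with a letter.
grep '^[[:alnum:]]' blanks
```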
This time grep searches for lines beginning with anything in the [:print:] character class, which includes the space character but not tab. The 'blank' line made of spaces is kept, and the non-blank line beginning with a tab is lost. Fail.
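A sketch of the [:print:] attempt, assuming the pattern ^[[:print:]] and an invented stand-in for blanks with the article's layout. Piping into sed -n l shows which lines survive.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# [:print:] is [:graph:] plus the space character, but not tab.
# So the spaces-only line 4 is wrongly kept, while the tab-indented
# non-blank line 7 is wrongly dropped.
grep '^[[:print:]]' blanks | sed -n l
```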
More on NF
awk NF is shorthand for awk 'NF != 0', or 'print those lines where the number of fields is not zero'.
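One way to check the equivalence is to write the output of each spelling to a file and compare the files with cmp. The sketch assumes an invented stand-in for blanks, and the output file names out1 to out3 are arbitrary.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# Three spellings of the same AWK program: a bare pattern, an explicit
# comparison, and the comparison with its default action written out.
awk NF blanks > out1
awk 'NF != 0' blanks > out2
awk 'NF != 0 { print $0 }' blanks > out3

# cmp is silent and succeeds when two files are byte-for-byte identical.
cmp out1 out2 && cmp out1 out3 && echo 'identical output'
```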
And you can see how AWK counts fields by getting it to print the number of fields in each line, for example with awk '{print NF}' blanks.
Since we didn't specify a field separator, AWK used its default, in which fields are separated by runs of spaces and tabs. The field count for each non-blank line in blanks is therefore simply the number of whitespace-separated words, and for each blank line it is zero.
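This sketch prints the line number and field count for each line of an invented stand-in for blanks. The counts in the comments apply to that stand-in only, not to the article's own sample.

```shell
#!/bin/sh
# Hypothetical stand-in for "blanks" (invented wording, article's layout).
printf 'Line one\n\nLine three\n   \n  Indented with spaces\n\t\n\tIndented with a tab\nLast line\n' > blanks

# NR is the line number, NF the number of fields on that line.
# Expected output for this stand-in:
#   1: 2
#   2: 0
#   3: 2
#   4: 0
#   5: 3
#   6: 0
#   7: 4
#   8: 2
awk '{ print NR ": " NF }' blanks
```

Leading spaces and tabs don't create empty fields with the default separator, which is why line 5 counts 3 fields and line 7 counts 4.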