Friday, December 16, 2011

Learning Linux Commands: sed


1. Introduction

Welcome to the second part of our series, which focuses on sed, specifically the GNU version. There are several variants of sed, available for quite a few platforms, but we will focus on GNU sed versions 4.x. Many of you have heard of sed and already used it, mainly as a substitution tool. But that is just a fraction of what sed can do, and we will do our best to show you as much of it as possible. The name stands for Stream EDitor, and here "stream" can be a file, a pipe or simply stdin. We expect you to have basic Linux knowledge, and if you have already worked with regular expressions, or at least know what a regexp is, so much the better. We don't have the space for a full tutorial on regular expressions, so we will only give you a basic idea and lots of sed examples. There are lots of documents that deal with the subject, and we'll even have some recommendations, as you will see in a minute.

2. Installation

There's not much to tell here: chances are you already have sed installed, because it's used in various system scripts and is an invaluable tool for any Linux user who wants to be efficient. You can check which version you have by typing
$ sed --version
On my system, this command tells me I have GNU sed 4.2.1 installed, plus links to the home page and other useful stuff. The package is named simply 'sed' regardless of the distribution, and since even Gentoo ships sed as part of its base system, you can rest assured you already have it.

3. Concepts

Before we go further, we feel it's important to point out what exactly sed does, because "stream editor" may not ring too many bells. sed takes the input text, performs the specified operations on every line (unless told otherwise) and prints the modified text. The specified operations can be append, insert, delete or substitute. This is not as simple as it may look: be forewarned that there are a lot of options and combinations that can make a sed command rather difficult to digest. So if you want to use sed, we recommend you learn the basics of regexps, and you can pick up the rest as you go. Before we start the tutorial, we want to thank Eric Pement and others for the inspiration and for what they have done for everyone who wants to learn and use sed.
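As a minimal sketch of those four operations, the commands below run each one against a made-up three-line input. The sample text and the variable names are illustrative assumptions, and the one-line forms of the i and a commands are GNU sed extensions.

```shell
# Three-line sample input (hypothetical data, not from any real file).
input=$(printf 'one\ntwo\nthree\n')

# substitute: replace "two" with "2" on the line that contains it
substituted=$(printf '%s\n' "$input" | sed 's/two/2/')

# delete: drop line 2
deleted=$(printf '%s\n' "$input" | sed '2d')

# insert: add a new line before line 1 (one-line "i text" is a GNU extension)
inserted=$(printf '%s\n' "$input" | sed '1i zero')

# append: add a new line after the last line ($ addresses the last line)
appended=$(printf '%s\n' "$input" | sed '$a four')

printf '%s\n---\n' "$substituted" "$deleted" "$inserted" "$appended"
```

Each command leaves the input untouched and writes its result to stdout, which is why the results are captured in variables here.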

4. Regular expressions

As sed commands/scripts tend to become cryptic, we feel that our readers must understand the basic concepts instead of blindly copying and pasting commands they don't know the meaning of. When one wants to understand what a regexp is, the key word is "matching". Or even better, "pattern matching". For example, in a report for your HR department you wrote the name of Nick when referring to the network architect. But Nick moved on and John came to take his place, so now you have to replace the word Nick with John. If the file is called report.txt, you could do
$ cat report.txt | sed 's/Nick/John/g' > report_new.txt
By default sed writes to stdout, so you may want to use your shell's redirect operator, as in our example above. This is a very simple example, but it illustrates a few points: we match the pattern "Nick" and we substitute all instances with "John". Note that sed is case-sensitive, so be careful and check your output file to see if all the substitutions were made. The above could also have been written like this:
$ sed 's/Nick/John/g' report.txt > report_new.txt
OK, but where are the regular expressions, you ask? Well, we first wanted to get your feet wet with the concept of matching, and here comes the interesting part.
If you aren't sure whether you wrote "nick" by mistake instead of "Nick" and want to match that as well, you could use sed 's/Nick\|nick/John/g'. The vertical bar has the same meaning you might know from C, that is, your expression will match Nick or nick. Note that in sed's default (basic) regular expressions GNU sed needs the bar escaped as '\|'; with the -r option (extended regular expressions) you write a plain '|'. As you will see, the pipe can be used in other ways too, but its meaning will remain. Other operators widely used in regexps are '?', which matches zero or one instance of the preceding element (flavou?r will match flavor and flavour), '*', which matches zero or more, and '+', which matches one or more. Like the bar, '?' and '+' must be written '\?' and '\+' in GNU sed unless you use -r. '^' matches the start of the line, while '$' matches the end. If you're a vi(m) user, some of these things might look familiar. After all, these utilities, together with awk or C, have their roots in the early days of Unix. We won't insist on the subject any further, as things will become simpler by reading examples, but what you should know is that there are various implementations of regexps: POSIX, POSIX Extended, Perl or various implementations of fuzzy regular expressions, guaranteed to give you a headache.
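To make the operators above concrete, here is a small sketch that tries each one on a short made-up string. The sample strings and variable names are assumptions chosen for illustration; -r is the GNU sed option for extended regular expressions.

```shell
# With -r (extended regexps) the operators |, ? and + are written unescaped.
alt=$(echo 'nick met Nick' | sed -r 's/Nick|nick/John/g')    # alternation
opt=$(echo 'flavor flavour' | sed -r 's/flavou?r/taste/g')   # ? = zero or one "u"
plus=$(echo 'a   b' | sed -r 's/ +/ /')                      # + = one or more spaces

# These behave the same in basic and extended mode.
star=$(echo 'xaaay xy' | sed 's/xa*y/Z/g')                   # * = zero or more "a"
bol=$(echo 'text' | sed 's/^/> /')                           # ^ = start of line
eol=$(echo 'text' | sed 's/$/ </')                           # $ = end of line

# Without -r, GNU sed expects the alternation bar to be escaped.
bre=$(echo 'nick met Nick' | sed 's/Nick\|nick/John/g')

printf '%s\n' "$alt" "$opt" "$plus" "$star" "$bol" "$eol" "$bre"
```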

5. sed examples

Learning Linux sed command with examples
Linux command syntax
Linux command description
sed 's/Nick/John/g' report.txt
Replace every occurrence of Nick with John in report.txt
sed 's/Nick\|nick/John/g' report.txt
Replace every occurrence of Nick or nick with John.
sed 's/^/        /' file.txt >file_new.txt
Add 8 spaces to the left of a text for pretty printing.
sed -n '/Of course/,/attention you pay/p' myfile
Display only one paragraph, starting with "Of course"
and ending in "attention you pay"
sed -n 12,18p file.txt
Show only lines 12-18 of file.txt
sed 12,18d file.txt
Show all of file.txt except for lines from 12 to 18
sed G file.txt 
Double-space file.txt
sed -f script.sed file.txt
Run the sed commands stored in script.sed on file.txt
sed '5!s/ham/cheese/' file.txt
Replace ham with cheese in file.txt except in the 5th line
sed '$d' file.txt
Delete the last line
sed -n '/[0-9]\{3\}/p' file.txt
Print only lines with three consecutive digits
sed '/boom/!s/aaa/bb/' file.txt
Unless boom is found replace aaa with bb
sed '17,/disk/d' file.txt
Delete all lines from line 17 to 'disk'
echo ONE TWO | sed "s/one/unos/I"
Replaces one with unos in a case-insensitive manner,
so it will print "unos TWO"
sed 'G;G' file.txt
Triple-space a file
sed 's/.$//' file.txt
A way to replace dos2unix :) (assumes every line ends in CR/LF)
sed 's/^[ \t]*//' file.txt
Delete all spaces and tabs in front of every line of file.txt
sed 's/[ \t]*$//' file.txt
Delete all spaces and tabs at the end of every line of file.txt
sed 's/^[ \t]*//;s/[ \t]*$//' file.txt
Delete all spaces and tabs in front and at the end of every line
of file.txt
sed 's/foo/bar/' file.txt
Replace foo with bar only for the first instance in a line.
sed 's/foo/bar/4' file.txt
Replace foo with bar only for the 4th instance in a line.
sed 's/foo/bar/g' file.txt 
Replace foo with bar for all instances in a line.
sed '/baz/s/foo/bar/g' file.txt
Only if line contains baz, substitute foo with bar
sed '/./,/^$/!d' file.txt
Delete all consecutive blank lines except the first of each run (like cat -s); allows no blank line at the top and one at EOF
sed '/^$/N;/\n$/D' file.txt
Same as above, but allows one blank line at the top
and none at EOF
sed '/./,$!d' file.txt
Delete all leading blank lines
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' \
file.txt
Delete all trailing blank lines
sed -e :a -e '/\\$/N; s/\\\n//; ta' \
file.txt
If a line ends in a backslash, join it with the next (useful
for shell scripts)
sed -n '/regex/,+5p' file.txt
Print every line matching regex plus the 5 lines that follow it
sed '1~3d' file.txt
Delete every third line, starting with the first
sed -n '2~5p' file.txt
Print every 5th line starting with the second
sed 's/[Nn]ick/John/g' report.txt
Another way to write some example above.
Can you guess which one?
sed -n '/RE/{p;q;}' file.txt
Print only the first match of
RE (regular expression)
sed '0,/RE/{//d;}' file.txt
Delete only the first match
sed '0,/RE/s//to_that/' file.txt
Change only the first match
sed 's/^[^,]*,/9999,/' file.csv
Change first field to 9999 in a CSV file
s/^ *\(.*[^ ]\) *$/|\1|/;
s/" *, */"|/g;
: loop
s/| *\([^",|][^,|]*\) *, */|\1|/g;
s/| *, */||/g;
t loop
s/ *|/|/g;
s/| */|/g;
s/^|\(.*\)|$/\1/;
sed script to convert CSV file to bar-separated
(works only on some types of CSV,
with embedded "s and commas)
sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta' file.txt
Change numbers in file.txt from the 1234.56 form to 1,234.56
sed -r "s/\<(reg|exp)[a-z]+/\U&/g"
Convert any word starting with reg or exp to uppercase
sed '1,20 s/Johnson/White/g' file.txt
Do replacement of Johnson with White only on
lines between 1 and 20
sed '1,20 !s/Johnson/White/g' file.txt
The above reversed (match all except lines 1-20)
sed '/from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }' file.txt
Replace red with magenta and blue with cyan, but only between "from" and "until"
sed '/ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }' file.txt
Replace only from the word "ENDNOTES:" until EOF
sed '/./{H;$!d;};x;/regex/!d' file.txt
Print paragraphs only if they contain regex
sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d' file.txt
Print paragraphs only if they contain RE1,
RE2 and RE3
sed ':a; /\\$/N; s/\\\n//; ta' file.txt
Join two lines if the first ends in a backslash
sed 's/14"/fourteen inches/g' file.txt
This is how you can use double quotes
sed 's/\/some\/UNIX\/path/\/a\/new\/path/g' file.txt
Working with Unix paths
sed 's/[a-g]//g' file.txt
Remove all characters from a to g from file.txt
sed 's/\(.*\)foo/\1bar/' file.txt
Replace only the last match of foo with bar
sed '1!G;h;$!d' 
A tac replacement
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
A rev replacement
sed 10q file.txt
A head replacement
sed -e :a -e '$q;N;11,$D;ba' \
file.txt
A tail replacement
sed '$!N; /^\(.*\)\n\1$/!P; D' \
file.txt
A uniq replacement
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D' file.txt
The opposite (or uniq -d equivalent)
sed '$!N;$!D' file.txt
Equivalent to tail -n 2
sed -n '$p' file.txt
... tail -n 1 (or tail -1)
sed '/regexp/!d' file.txt
grep equivalent
sed -n '/regexp/{g;1!p;};h' file.txt
Print the line before the one matching regexp, but
not the one containing the regexp
sed -n '/regexp/{n;p;}' file.txt
Print the line after the one matching the regexp, but
not the one containing the regexp
sed '/pattern/d' file.txt
Delete lines matching pattern
sed '/./!d' file.txt
Delete all blank lines from a file
sed '/^$/N;/\n$/N;//D' file.txt
Delete all consecutive blank lines
except for the first two
sed -n '/^$/{p;h;};/./{x;/./p;}'\
 file.txt
Delete the last line of each paragraph
sed 's/.\x08//g' file
Remove nroff overstrikes
sed '/^$/q'
Get mail header
sed '1,/^$/d'
Get mail body
sed '/^Subject: */!d; s///;q'
Get mail subject
sed 's/^/> /'
Quote mail message by inserting a
"> " in front of every line
sed 's/^> //'
The opposite (unquote mail message)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
Remove HTML tags
sed '/./{H;d;};x;s/\n/={NL}=/g'\
 file.txt | sort \
| sed '1s/={NL}=//;s/={NL}=/\n/g'
Sort paragraphs of file.txt alphabetically
sed 's@/usr/bin@&/local@g' path.txt
Replace /usr/bin with /usr/bin/local in path.txt
sed 's@^.*$@<<<&>>>@g' path.txt
Try it and see :)
sed 's/\(\/[^:]*\).*/\1/g' path.txt
Provided path.txt contains $PATH, this will
echo only the first path on each line
sed 's/\([^:]*\).*/\1/' /etc/passwd
awk replacement - displays only the users
from the passwd file
echo "Welcome To The Suresh Stuff" | sed \
's/\(\b[A-Z]\)/\(\1\)/g'
(W)elcome (T)o (T)he (S)uresh (S)tuff
Self-explanatory
sed -e '/^$/,/^END/s/hills/mountains/g' file.txt
Swap 'hills' for 'mountains', but only on blocks
of text beginning
with a blank line, and ending with a line beginning
with the three characters 'END', inclusive
sed -e '/^#/d' /etc/services | more
View the services file without the commented lines
sed '$s@\([^:]*\):\([^:]*\):\([^:]*\)@\3:\2:\1@g' path.txt
Reverse order of items in the last line of path.txt
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}'\
 -e h file.txt
Print 1 line of context before and after the line matching,
with a line number where the matching occurs
sed '/regex/{x;p;x;}' file.txt
Insert a blank line above every line matching regex
sed '/AAA/!d; /BBB/!d; /CCC/!d' file.txt
Match AAA, BBB and CCC in any order
sed '/AAA.*BBB.*CCC/!d' file.txt
Match AAA, BBB and CCC in that order
sed -n '/^.\{65\}/p' file.txt
Print lines 65 chars long or more
sed -n '/^.\{65\}/!p' file.txt
Print lines shorter than 65 chars
sed '/regex/G' file.txt
Insert a blank line below every line matching regex
sed '/regex/{x;p;x;G;}' file.txt
Insert a blank line above and below every line matching regex
sed = file.txt | sed 'N;s/\n/\t/'
Number lines in file.txt
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' file.txt
Align text flush right
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e \
's/\( *\)\1/\1/' file.txt
Align text center
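Several of the entries above rely on addresses (line numbers, ranges, negation, GNU step addresses) and on the hold space. The sketch below runs a few of them against a five-line temporary file so the effect of each is easy to see; the file contents and variable names are made up for illustration.

```shell
# Five-line sample file in a temporary location (hypothetical data).
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > "$sample"

# Line-range address with -n and p: print only lines 2-4.
range=$(sed -n '2,4p' "$sample")

# Negated address: substitute the first "a" on every line except line 1.
negated=$(sed '1!s/a/A/' "$sample")

# GNU step address first~step: print every 2nd line starting with line 1.
stepped=$(sed -n '1~2p' "$sample")

# Hold space (the tac replacement): on every line but the first, G appends
# the held text after the current line; h saves the result back to the hold
# space; $!d suppresses output until the last line, which prints everything.
reversed=$(sed '1!G;h;$!d' "$sample")

printf '%s\n---\n' "$range" "$negated" "$stepped" "$reversed"
rm -f "$sample"
```

Reading the tac replacement one command at a time like this is a useful habit for the longer one-liners in the table, since each of them is just a short sequence of such steps.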

6. Conclusion

This is only a part of what can be told about sed, but this series is meant as a practical guide, so we hope it helps you discover the power of Unix tools and become more efficient in your work.