Friday, December 16, 2011

Unix and Linux: a bit of history

1. Introduction

If you don't like history, don't worry; I'm not a big fan either. But this kind of history is different: it's (supposedly) fun, educational, and timely. Timely because Dennis Ritchie, one of the founding fathers of Unix, passed away not long ago, so we felt this article was in order. You don't need any prior knowledge to follow it. Many people don't know the origins of their operating system of choice, and to understand it better, you should know where it comes from.

2. Unix

The Unix name is derived from Unics, itself derived from Multics, a mainframe OS developed in the 1960s. Multics had lots of problems, so a handful of good people at Bell Labs started scaling it down. These people were Ken Thompson, Dennis Ritchie, M. D. McIlroy, and J. F. Ossanna, and their work took shape well enough that in 1970 their offspring was christened Unics, which stands for Uniplexed Information and Computing Service, obviously a pun on Multics. When Unics later became multi-user, the name became Unix. Up to here, nothing out of the ordinary. But the founding fathers never imagined the success Unix would have in the years to come.

After Unix got rewritten in C, AT&T started distributing it to universities and government institutions, together with the source code. Versions were numbered after the editions of the user manual, so the terms "edition" and "version" basically refer to the same number. Slowly, the '80s began, and so did trouble. AT&T was involved in a painful antitrust lawsuit with the Department of Justice, which broke up the Bell System and allowed Unix to be sold as a product. By that time System V Release 1 was out the door, with improvements from Berkeley such as the vi text editor and the curses library. It became one of the first commercial Unices, a move that almost destroyed Unix. The year is 1983, and in the same year Richard Stallman started the GNU Project. The commercial move had another consequence: Unix was less university-friendly than before, so a group from Berkeley started hacking on Unix on their own. Many improvements that first appeared in BSD are here to this day in many operating systems, even older versions of Windows: the C shell, the TCP/IP stack and the sockets API.

Later, Bill Joy, father of vi and co-parent of other widely used pieces of software like NFS, went on to co-found Sun Microsystems (now bought by Oracle), which is "responsible" for SunOS/Solaris, Java, the SPARC processor and many, many more. Other companies started selling System V or BSD-based systems of their own, and thus started what is called "the Unix wars". The initial idea was to create a Unix standard that would make everyone's lives easier, since Unix was successful already and there were many vendors selling it, but what came out instead was a bunch of groups, each with its own standards and yelling at each other. If this reminds you of something that is happening right now, you got the idea. In the end, IEEE pushed the POSIX standard, which is in effect today as well. POSIX stands for Portable Operating System Interface for uniX and it defines the APIs, shells and utilities an OS should have in order to be compliant.

As the '90s started, the Intel platform slowly started advancing on the market as a viable alternative to the expensive proprietary solutions of the time. This was a period when giants like Sun, HP or DEC dominated the market, but the machines were so expensive that only big institutions could afford them, so the Intel processor was like a breath of fresh air. And, as some expected, AT&T sued Berkeley over trademark issues and proprietary AT&T code. This slowed down development, and did BSD's popularity no good either.

1991 is the year when a lot of things happened that were essential to what Open Source software is today. First came 386BSD, a free software BSD offshoot that would serve as the base for NetBSD and FreeBSD, and later for OpenBSD, derived from NetBSD in 1995. Second came Linus Torvalds and his Linux kernel, started with the now famous announcement on the Minix newsgroup.

We don't want to bore you with the political/commercial events that followed in the '90s. The next important event comes in 1997, when Apple decided to rebase its OS on NEXTSTEP, and implicitly on BSD, making it the most popular desktop/workstation OS based on Unix.

Important Unix operating systems that existed or still exist are Apple Mac OS X, Oracle Solaris, HP HP-UX, SGI Irix and IBM AIX. All these systems can be named Unix, because they are compliant with the Single Unix Specification and the entities that sell them pay a yearly fee to the Open Group, the actual owner of the Unix trademark. Since Linux and the BSDs aren't totally compliant, they are said to be Unix-like, although the Open Group frowns upon the term.

3. Linux

There are many times when instead of Linux you see GNU/Linux. That's because Linux is just a kernel that needs userland utilities to make a complete OS. Here's where GNU software comes into play. By the time Linus Torvalds announced his kernel, the GNU project had all the userland utilities written, with the purpose of 100% Free Software, licensed under the GPL. But lower-level software was lacking or incomplete, so Linus needed a userland implementation, GNU needed a kernel, and what's today known as a Linux distribution slowly started to appear with the help of hundreds, then thousands, of people across the globe. This is where the USL vs BSD lawsuit did some good: 386BSD wasn't yet released in 1991 because of legal problems, although it was already under development. Linus Torvalds said: "If 386BSD had been available when I started on Linux, Linux would probably never had happened." (1993, Meta Magazine).

The oldest distribution that still lives today is Slackware, but Debian and Red Hat were also announced in the same year, only a few months later. Debian is known for its plethora of derivatives, including popular ones like Ubuntu or Mint, and Red Hat is the father of Fedora plus the best-selling enterprise-grade distribution. Now Linux is everywhere: in routers, TVs, phones, servers, supercomputers and laptops. Linus Torvalds had the luck or insight to find the perfect moment for starting Linux: he wrote it for the PC, which was inexpensive, so it could be accessible to people like you and me. But those people couldn't afford a SunOS license, for example, so they needed an inexpensive or even free OS. That's where Linux helped, and this is how it became what it is today.

4. Licensing

One of the holy wars in the IT world is Linux vs BSD. But it is not necessarily about technical qualities or the lack thereof. It's about licensing. In the Linux world, the most used licensing scheme is the GNU General Public License, which states that you can do what you want with the software at no cost at all, but if you want to modify the work and pass it on, you'll have to do that under the same license. Some argue that it's not freedom in the full sense of the word, and that software should be free in any way possible. We, of course, will remain neutral. The standard when it comes to permissive licensing, on the other hand, is the BSD family of licenses. There are a few variants of the license, but we'll give you the general idea: "you can copy the code, do whatever you want with it, but if you redistribute it don't delete the license text from there and be careful how you use my name". Now, there are lots of people who don't really care about licensing, and for those who do, we won't try to influence you. If you're a software developer on either Linux or BSD, that doesn't mean you're limited to the GPL or BSD licenses. There are other liberal licenses, even the WTFPL, which is a "do what the **** you want" license. Read them and make up your own mind. Of course you are encouraged to use Free/Open licenses, but that doesn't mean that you can't make money out of what you wrote. After all, look at Red Hat's success.

5. Conclusion

Unix history is a pretty convoluted trip that takes way more space than we have here. There are a lot of articles out there with lots of details and information. This was intended as a short trip down memory lane, and we hope it was informative and useful to you.

Learning Linux Commands: sed

1. Introduction

Welcome to the second part of our series, a part that will focus on sed, the GNU version. As you will see, there are several variants of sed, which is available for quite a few platforms, but we will focus on GNU sed versions 4.x. Many of you have already heard about sed and already used it, mainly as a substitution tool. But that is just a fraction of what sed can do, and we will do our best to show you as much of it as possible. The name stands for Stream EDitor, and here "stream" can be a file, a pipe or simply stdin. We expect you to have basic Linux knowledge, and if you have already worked with regular expressions or at least know what a regexp is, so much the better. We don't have the space for a full tutorial on regular expressions, so instead we will only give you a basic idea and lots of sed examples. There are lots of documents that deal with the subject, and we'll even have some recommendations, as you will see in a minute.

2. Installation

There's not much to tell here, because chances are you have sed installed already: it's used in various system scripts and is an invaluable tool in the life of a Linux user who wants to be efficient. You can check which version you have by typing

$ sed --version

On my system, this command tells me I have GNU sed 4.2.1 installed, plus links to the home page and other useful stuff. The package is named simply 'sed' regardless of the distribution, but if Gentoo offers sed implicitly, I believe that means you can rest assured.

3. Concepts

Before we go further, we feel it's important to point out what exactly it is that sed does, because "stream editor" may not ring too many bells. sed takes the input text, performs the specified operations on every line (unless told otherwise) and prints the modified text. The specified operations can be append, insert, delete or substitute. This is not as simple as it may look: be forewarned that there are a lot of options and combinations that can make a sed command rather difficult to digest. So if you want to use sed, we recommend you learn the basics of regexps, and you can pick up the rest as you go. Before we start the tutorial, we want to thank Eric Pement and others for the inspiration and for what they've done for everyone who wants to learn and use sed.
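To make the four operations a bit more tangible, here is a minimal sketch that applies each of them to a three-line input. The sample text and the variable names are made up for illustration, and the one-line forms of the i and a commands assume GNU sed.

```shell
# Made-up three-line sample input.
input=$(printf 'alpha\nbeta\ngamma\n')

# substitute: replace "beta" with "BETA" wherever it occurs
substituted=$(printf '%s\n' "$input" | sed 's/beta/BETA/')

# delete: drop the second line
deleted=$(printf '%s\n' "$input" | sed '2d')

# insert: add a line before line 1 (one-line form is a GNU extension)
inserted=$(printf '%s\n' "$input" | sed '1i header')

# append: add a line after the last line ($ addresses the last line)
appended=$(printf '%s\n' "$input" | sed '$a footer')

# Show the four results, separated by "---" lines.
printf '%s\n---\n' "$substituted" "$deleted" "$inserted" "$appended"
```

Note that none of these commands touches the input itself: sed only prints the modified text, which is why each result is captured separately here.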

4. Regular expressions

As sed commands/scripts tend to become cryptic, we feel that our readers must understand the basic concepts instead of blindly copying and pasting commands they don't know the meaning of. When one wants to understand what a regexp is, the key word is "matching". Or even better, "pattern matching". For example, in a report for your HR department you wrote the name of Nick when referring to the network architect. But Nick moved on and John came to take his place, so now you have to replace the word Nick with John. If the file is called report.txt, you could do

$ cat report.txt | sed 's/Nick/John/g' > report_new.txt

By default sed writes to stdout, so you may want to use your shell's redirect operator, as in the example above. This is a very simple example, but it illustrates a few points: we match the pattern "Nick" and we substitute all instances with "John". Note that sed is case-sensitive, so be careful and check your output file to see if all the substitutions were made. The above could also have been written like this:

$ sed 's/Nick/John/g' report.txt > report_new.txt

OK, but where are the regular expressions, you ask? Well, we first wanted to get your feet wet with the concept of matching, and here comes the interesting part.

If you aren't sure whether you wrote "nick" by mistake instead of "Nick" and want to match that as well, you could use sed 's/Nick\|nick/John/g'. The vertical bar means "or", much like the operator you might know if you have used C; that is, your expression will match Nick or nick. As you will see, the pipe can be used in other ways too, but its meaning will remain. Other operators widely used in regexps are '?', which matches zero or one instance of the preceding element (flavou?r will match flavor and flavour), '*', which matches zero or more, and '+', which matches one or more. '^' matches the start of the line, while '$' matches the end. Note that in GNU sed's default (basic) syntax '|', '?' and '+' must be written with a backslash in front ('\|', '\?', '\+'); with the -r option (extended syntax) the backslashes are dropped. If you're a vi(m) user, some of these things might look familiar. After all, these utilities, together with awk or C, have their roots in the early days of Unix. We won't insist any more on the subject, as things will become simpler by reading examples, but what you should know is that there are various implementations of regexps: POSIX, POSIX Extended, Perl or various implementations of fuzzy regular expressions, guaranteed to give you a headache.
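How these operators are spelled depends on which regexp syntax sed is using, so here is a small sketch comparing the two on the same line of text. The sample sentence and the variable names are our own, and we assume GNU sed, where -E is accepted as a synonym for the older -r.

```shell
# Made-up sample text.
text='Nick and nick like flavor and flavour'

# Basic syntax (GNU sed): alternation and "zero or one" need a backslash.
basic=$(printf '%s\n' "$text" | sed 's/Nick\|nick/John/g; s/flavou\?r/taste/g')

# Extended syntax (-E): the same operators are written without backslashes.
extended=$(printf '%s\n' "$text" | sed -E 's/Nick|nick/John/g; s/flavou?r/taste/g')

# In basic syntax an unescaped | is an ordinary character, so nothing matches.
literal=$(printf '%s\n' "$text" | sed 's/Nick|nick/John/g')

printf '%s\n' "$basic" "$extended" "$literal"
```

The first two commands print the same rewritten sentence, while the third prints the text unchanged, which is an easy mistake to miss if you don't check the output.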

5. sed examples

Learning Linux sed command with examples
Linux command syntax	Linux command description

sed 's/Nick/John/g' report.txt	Replace every occurrence of Nick with John in report.txt
sed 's/Nick\|nick/John/g' report.txt	Replace every occurrence of Nick or nick with John.
sed 's/^/        /' file.txt >file_new.txt	Add 8 spaces to the left of a text for pretty printing.
sed -n '/Of course/,/attention you pay/p' myfile	Display only one paragraph, starting with "Of course" and ending in "attention you pay"
sed -n 12,18p file.txt	Show only lines 12-18 of file.txt
sed 12,18d file.txt	Show all of file.txt except for lines from 12 to 18
sed G file.txt	Double-space file.txt
sed -f script.sed file.txt	Read the commands from script.sed and run them on file.txt
sed '5!s/ham/cheese/' file.txt	Replace ham with cheese in file.txt except in the 5th line
sed '$d' file.txt	Delete the last line
sed -n '/[0-9]\{3\}/p' file.txt	Print only lines with three consecutive digits
sed '/boom/!s/aaa/bb/' file.txt	On lines that do not contain boom, replace aaa with bb
sed '17,/disk/d' file.txt	Delete all lines from line 17 to the next line containing 'disk'
echo ONE TWO | sed "s/one/unos/I"	Replaces one with unos in a case-insensitive manner, so it will print "unos TWO"
sed 'G;G' file.txt	Triple-space a file
sed 's/.$//' file.txt	A way to replace dos2unix :) (strips the last character of every line, so it assumes each line ends in CR/LF)
sed 's/^[ \t]*//' file.txt	Delete all spaces and tabs in front of every line of file.txt
sed 's/[ \t]*$//' file.txt	Delete all spaces and tabs at the end of every line of file.txt
sed 's/^[ \t]*//;s/[ \t]*$//' file.txt	Delete all spaces and tabs in front and at the end of every line of file.txt
sed 's/foo/bar/' file.txt	Replace foo with bar only for the first instance in a line.
sed 's/foo/bar/4' file.txt	Replace foo with bar only for the 4th instance in a line.
sed 's/foo/bar/g' file.txt	Replace foo with bar for all instances in a line.
sed '/baz/s/foo/bar/g' file.txt	Only if line contains baz, substitute foo with bar
sed '/./,/^$/!d' file.txt	Squeeze consecutive blank lines into one; leaves no blank lines at the top and at most one at EOF
sed '/^$/N;/\n$/D' file.txt	Squeeze consecutive blank lines into one; leaves at most one blank line at the top and none at EOF
sed '/./,$!d' file.txt	Delete all leading blank lines
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file.txt	Delete all trailing blank lines
sed -e :a -e '/\\$/N; s/\\\n//; ta' file.txt	If a line ends in a backslash, join it with the next (useful for shell scripts)
sed -n '/regex/,+5p' file.txt	Print every line matching regex plus the next 5 lines (the +5 address is a GNU extension)
sed '1~3d' file.txt	Delete every third line, starting with the first
sed -n '2~5p' file.txt	Print every 5th line starting with the second
sed 's/[Nn]ick/John/g' report.txt	Another way to write some example above. Can you guess which one?
sed -n '/RE/{p;q;}' file.txt	Print only the first match of RE (regular expression)
sed '0,/RE/{//d;}' file.txt	Delete only the first match
sed '0,/RE/s//to_that/' file.txt	Change only the first match
sed 's/^[^,]*,/9999,/' file.csv	Change first field to 9999 in a CSV file
s/^ *\(.*[^ ]\) *$/|\1|/; s/" *, */"|/g; :loop; s/| *\([^",|][^,|]*\) *, */|\1|/g; s/| *, */||/g; t loop; s/  *|/|/g; s/|  */|/g; s/^|\(.*\)|$/\1/	sed script (for use with sed -f) to convert CSV file to bar-separated (works only on some types of CSV, with embedded "s and commas)
sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta' file.txt	Change numbers in file.txt from the 1234.56 form to 1,234.56
sed -r "s/\<(reg|exp)[a-z]+/\U&/g"	Convert any word starting with reg or exp to uppercase
sed '1,20 s/Johnson/White/g' file.txt	Do replacement of Johnson with White only on lines between 1 and 20
sed '1,20 !s/Johnson/White/g' file.txt	The above reversed (match all except lines 1-20)
sed '/from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }' file.txt	Replace only between "from" and "until" (here, the whole words red and blue)
sed '/ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }' file.txt	Replace only from the word "ENDNOTES:" until EOF
sed '/./{H;$!d;};x;/regex/!d' file.txt	Print paragraphs only if they contain regex
sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d' file.txt	Print paragraphs only if they contain RE1, RE2 and RE3
sed ':a; /\\$/N; s/\\\n//; ta' file.txt	Join two lines if the first ends in a backslash
sed 's/14"/fourteen inches/g' file.txt	This is how you can use double quotes
sed 's/\/some\/UNIX\/path/\/a\/new\/path/g' file.txt	Working with Unix paths
sed 's/[a-g]//g' file.txt	Remove all characters from a to g from file.txt
sed 's/\(.*\)foo/\1bar/' file.txt	Replace only the last match of foo on a line with bar
sed '1!G;h;$!d'	A tac replacement
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'	A rev replacement
sed 10q file.txt	A head replacement
sed -e :a -e '$q;N;11,$D;ba' file.txt	A tail replacement
sed '$!N; /^\(.*\)\n\1$/!P; D' file.txt	A uniq replacement
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D' file.txt	The opposite: print only duplicated lines (uniq -d equivalent)
sed '$!N;$!D' file.txt	Equivalent to tail -n 2
sed -n '$p' file.txt	... tail -n 1 (or tail -1)
sed '/regexp/!d' file.txt	grep equivalent
sed -n '/regexp/{g;1!p;};h' file.txt	Print the line before the one matching regexp, but not the one containing the regexp
sed -n '/regexp/{n;p;}' file.txt	Print the line after the one matching the regexp, but not the one containing the regexp
sed '/pattern/d' file.txt	Delete lines matching pattern
sed '/./!d' file.txt	Delete all blank lines from a file
sed '/^$/N;/\n$/N;//D' file.txt	Delete all consecutive blank lines except for the first two
sed -n '/^$/{p;h;};/./{x;/./p;}' file.txt	Delete the last line of each paragraph
sed 's/.\x08//g' file	Remove nroff overstrikes
sed '/^$/q'	Get mail header
sed '1,/^$/d'	Get mail body
sed '/^Subject: */!d; s///;q'	Get mail subject
sed 's/^/> /'	Quote mail message by inserting a "> " in front of every line
sed 's/^> //'	The opposite (unquote mail message)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'	Remove HTML tags
sed '/./{H;d;};x;s/\n/={NL}=/g' file.txt | sort | sed '1s/={NL}=//;s/={NL}=/\n/g'	Sort paragraphs of file.txt alphabetically
sed 's@/usr/bin@&/local@g' path.txt	Replace /usr/bin with /usr/bin/local in path.txt
sed 's@^.*$@<<<&>>>@g' path.txt	Try it and see :)
sed 's/\(\/[^:]*\).*/\1/g' path.txt	Provided path.txt contains $PATH, this will echo only the first path on each line
sed 's/\([^:]*\).*/\1/' /etc/passwd	awk replacement - displays only the users from the passwd file
echo "Welcome To The Suresh Stuff" | sed 's/\(\b[A-Z]\)/(\1)/g'	Put parentheses around the first letter of every capitalized word; prints "(W)elcome (T)o (T)he (S)uresh (S)tuff"
sed -e '/^$/,/^END/s/hills/mountains/g' file.txt	Swap 'hills' for 'mountains', but only on blocks of text beginning with a blank line, and ending with a line beginning with the three characters 'END', inclusive
sed -e '/^#/d' /etc/services | more	View the services file without the commented lines
sed '$s@\([^:]*\):\([^:]*\):\([^:]*\)@\3:\2:\1@g' path.txt	Reverse order of items in the last line of path.txt
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h file.txt	Print 1 line of context before and after the line matching, with a line number where the matching occurs
sed '/regex/{x;p;x;}' file.txt	Insert a new line above every line matching regex
sed '/AAA/!d; /BBB/!d; /CCC/!d' file.txt	Match AAA, BBB and CCC in any order
sed '/AAA.*BBB.*CCC/!d' file.txt	Match AAA, BBB and CCC in that order
sed -n '/^.\{65\}/p' file.txt	Print lines 65 chars long or more
sed -n '/^.\{65\}/!p' file.txt	Print lines shorter than 65 chars
sed '/regex/G' file.txt	Insert blank line below every line matching regex
sed '/regex/{x;p;x;G;}' file.txt	Insert blank line above and below every line matching regex
sed = file.txt | sed 'N;s/\n/\t/'	Number lines in file.txt
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' file.txt	Align text flush right (79-column width)
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' file.txt	Align text center (79-column width)
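If you want to try some of the table entries without touching your own files, a throwaway numbered input is enough. The sketch below is just that, a sketch: the ten-line input and the variable names are ours, and the first~step address assumes GNU sed.

```shell
# Ten numbered lines as throwaway input.
lines=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Line-range address: print only lines 3 to 5.
range=$(printf '%s\n' "$lines" | sed -n '3,5p')

# GNU first~step address: print every 5th line starting with the second.
step=$(printf '%s\n' "$lines" | sed -n '2~5p')

# Negated address: delete everything except lines 9 and 10.
negated=$(printf '%s\n' "$lines" | sed '9,10!d')

# Hold space: keep the first three lines, then reverse them (the tac one-liner).
reversed=$(printf '%s\n' "$lines" | sed 3q | sed '1!G;h;$!d')

# Show the four results, separated by "---" lines.
printf '%s\n---\n' "$range" "$step" "$negated" "$reversed"
```

Numbered lines make it obvious which addresses a command selected, so the same input works well for checking any other entry that uses line numbers or ranges.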

6. Conclusion

This is only a part of what can be told about sed, but this series is meant as a practical guide, so we hope it helps you discover the power of Unix tools and become more efficient in your work.