\documentclass[twocolumn]{article}
\title{Bourne Shell Programming in One Hour}
\author{Ben Pfaff $<$pfaffben@msu.edu$>$}
\date{1 Aug 1999}

\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{0pt}
\setlength{\evensidemargin}{0pt}
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{0pt}

\begin{document}
\maketitle

\newcommand{\var}[1]{\textsl{#1}}

\section{Introduction}

Programming with the Bourne shell is similar to programming in a
conventional language.  If you've ever written code in C or Pascal, or
even BASIC or FORTRAN, you'll recognize many common features.  For
instance, the shell has variables, conditional and looping constructs,
functions, and more.

Shell programming also differs from programming in conventional
languages.  For example, the shell itself doesn't provide much useful
functionality; instead, most work must be done by invoking external
programs.  As a result, the shell has powerful features for using
programs together in sequence to get work done.

This article examines the features of the POSIX shell, more commonly
known as the Bourne shell.  The most common Bourne shell
implementation on GNU/Linux systems is {\tt bash}, the ``Bourne again
shell.''  {\tt bash} incorporates several extensions to the standard
Bourne functionality; none of these will be explored by this article.
For a POSIX-compliant Bourne shell without extensions, I recommend
{\tt ash}.

This article is by no means comprehensive.  It just skims the surface
of many shell features.  I recommend referring to a good reference
book or manpage for more details on shell programming.

\section{Shell command basics}

You should already know how shell commands work at a basic level.  To
start out, the command line you typed is divided up into words.  The
first word is used as the command name, which is either understood by
the shell itself, or used as the name of an external program to run.
In either case, the rest of the words are used as arguments to the
command.

This basic description is fairly accurate, but there is a little more
going on behind the scenes.  The following aims to provide a brief
explanation of what goes on.

\subsection{Word expansion}

Before the shell executes a command, it performs ``word expansion,''
which is a kind of macro processing.  Word expansion has a number of
steps, named in the list below.  The steps are performed in order.

\begin{enumerate}
\item 
All of the following occur at the same time in a single pass across
the line.

\begin{itemize}
\item Variable substitution.
\item Arithmetic expansion.
\item Tilde expansion.
\item Command substitution.
\end{itemize}

\item 
Field splitting.

\item
Filename expansion.

\item
Quote removal.
\end{enumerate}

Each step is explained in more detail below.

\subsubsection{Variable substitution}

The shell has variables that you can set.  To set a shell variable,
use the syntax {\tt \var{name}=\var{value}}.  Note that there may not
be whitespace on either side of the equals sign.  Names of variables
defined this way may contain letters, digits, and underscore and may
not begin with a digit.

To reference the value of a variable, use the syntax {\tt
\$\var{name}} or {\tt \$\{\var{name}\}}.  The variable reference is
expanded like a macro into the command contents.  
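
As a small sketch (the variable names are invented for illustration),
the commands below set two variables and reference them in both forms.
The braces are needed when the name is followed directly by a
character that could be part of a name, such as an underscore.

\begin{verbatim}
greeting=hello
suffix=world
plain=$greeting
# Braces end the name before the "_".
joined=${greeting}_$suffix
echo $plain $joined
\end{verbatim}

This should print {\tt hello hello\_world}.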

There are more powerful ways to reference a variable; see
Fig.~\ref{fig:varref} on page~\pageref{fig:varref} for a few of the
more useful ones.

The shell has a number of built-in variables.  See
Fig.~\ref{fig:builtinvars} on page~\pageref{fig:builtinvars} for some
of the most commonly used.

\begin{figure}
\begin{description}

\item[\tt \$\{\var{name}:-\var{value}\}] 

If \var{name} is an existing variable with a nonempty value, then its
value is used.  Otherwise, \var{value} is used as a default value.

\item[\tt \$\{\var{name}:=\var{value}\}]

If \var{name} is an existing variable with a nonempty value, then its
value is used.  Otherwise, \var{value} is used as a default value and
variable \var{name} is assigned the specified \var{value}.

\item[\tt \$\{\var{name}:?{[}\var{message}{]}\}]

If \var{name} is an existing variable with a nonempty value, then its
value is used.  Otherwise, \var{message} is output on standard error
and the shell program stops execution.  If \var{message} is not given
then a default error message is used.
\end{description}

\caption{Useful variable references.}
\label{fig:varref}
\end{figure}
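
The forms in Fig.~\ref{fig:varref} can be tried out along the
following lines; the variable {\tt color} and its values are made up
for the example.

\begin{verbatim}
unset color
# Unset, so the default is used.
first=${color:-red}
# := also assigns the default.
second=${color:=blue}
echo $first $second $color
\end{verbatim}

Because {\tt color} starts out unset, this should print {\tt red blue
blue}.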

\begin{figure}
\begin{description}

\item[\tt \$0]

The name under which this shell program was invoked.

\item[{\tt \$1} \dots {\tt \$9}]

Command-line arguments passed to the shell program, numbered from left
to right.

\item[\tt \$*]

All the command-line arguments.

\item[\tt \$\#]

The number of command-line arguments.

\item[\tt \$?]

The exit status of the last command executed.  Typically, programs
return an exit status of zero on successful execution, nonzero
otherwise.

\item[\tt \$\$]

The process ID number of the executing shell.
\end{description}

\caption{Commonly used built-in shell variables.}
\label{fig:builtinvars}
\end{figure}
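
The variables in Fig.~\ref{fig:builtinvars} can be inspected with a
sketch like the one below.  It uses the {\tt set} built-in, described
later, to stand in for real command-line arguments; the argument words
are arbitrary.

\begin{verbatim}
# Simulate three arguments.
set -- alpha beta gamma
echo "count: $#"
echo "first: $1"
echo "all: $*"
true
echo "status: $?"
\end{verbatim}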

\subsubsection{Arithmetic expansion}

Constructions of the form {\tt \$((\var{expression}))} are treated as
arithmetic expressions.  First, \var{expression} is subjected to
variable substitution, command substitution, and quote removal.
The result is treated as an arithmetic expression and evaluated.  The
entire construction is replaced by the value of the result.

For example:

\begin{verbatim}
$ a=1
$ a=$(($a + 1))
$ echo $a
2
\end{verbatim}

\subsubsection{Tilde expansion}

`{\tt \textasciitilde{}/}' at the beginning of a word is replaced by
the value of the {\tt HOME} variable, which is usually the currently
logged-in user's home directory.

The syntax {\tt \textasciitilde{}\var{username}/} at the beginning of
a word is replaced by the specified user's home directory.

You can disable this treatment by quoting the tilde ({\tt
\textasciitilde{}}); see section~\ref{sec:quoting} on
page~\pageref{sec:quoting} for more information on quoting.
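
A minimal sketch of both behaviors, assuming {\tt HOME} is set; the
file name {\tt notes} is only a placeholder:

\begin{verbatim}
# Unquoted: expands to $HOME/notes.
expanded=~/notes
# Quoted: stays as typed.
literal='~/notes'
echo "$expanded"
echo "$literal"
\end{verbatim}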

\subsubsection{Command substitution}

Sometimes you want to execute a command and use its output as an
argument for another command.  For instance, you might want to view
detailed information on all the files with a {\tt .c} extension under
the current directory.  If you know about the {\tt xargs} command,
quoting, and pipes, you could do it this way:

\begin{verbatim}
find . -name \*.c -print | xargs ls -l
\end{verbatim}

With command substitution, invoking {\tt xargs} isn't
necessary:\footnote{However, if there are many, many {\tt .c} files
under the current directory, the first form is preferable because
there is a (system-dependent) limit on the maximum number of arguments
that can be passed to a single command, which the first form will
avoid hitting.}

\begin{verbatim}
ls -l `find . -name \*.c -print`
\end{verbatim}

In command substitution, backquotes are paired up and their contents
are treated as shell commands, which are run in a subshell.  The
output of the command is collected and substituted for the backquotes
and their contents.

\subsubsection{Field splitting}

After the substitutions above are performed, the shell scans the
substitutions' results and breaks them into words at whitespace (mostly
spaces and tabs).  Quoting (see below) can be used to prevent this.
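
The effect of field splitting can be seen by counting the words that
result, as in this sketch.  It borrows the {\tt set} built-in and the
{\tt \$\#} variable to do the counting; the variable names are
invented for the example.

\begin{verbatim}
words='one two   three'
# Unquoted: split into three words.
set -- $words
unquoted=$#
# Quoted: kept as a single word.
set -- "$words"
quoted=$#
echo $unquoted $quoted
\end{verbatim}

This should print {\tt 3 1}.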

\subsubsection{Filename expansion}

After field splitting, each word that contains wildcard characters is
expanded in the usual way.  For instance, {\tt *a*} is replaced by the
names of all files in the current directory that contain an ``a''.
Quoting (see below) can be used to prevent filename expansion.
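
The sketch below illustrates both cases in a scratch directory.  The
directory and file names are made up for the example, and it assumes
that {\tt \$TMPDIR} or {\tt /tmp} is writable.

\begin{verbatim}
d=${TMPDIR:-/tmp}/glob-demo.$$
mkdir "$d"
touch "$d/apple" "$d/banana" "$d/cherry"
# Unquoted: replaced by matching names.
matched=`cd "$d" && echo *a*`
# Quoted: the pattern is left alone.
literal=`cd "$d" && echo '*a*'`
echo "$matched"
echo "$literal"
rm -r "$d"
\end{verbatim}

Only {\tt apple} and {\tt banana} contain an ``a'', so the first {\tt
echo} should print those two names and the second should print the
pattern itself.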

\subsection{Quoting}
\label{sec:quoting}

Sometimes you want to disable some of the shell word expansion
mechanisms above, or you want to group what would normally be multiple
space-separated words into a single ``word.''  Quoting takes care of
both of these.

Quoting can be done with single quotes ({\tt '}) or double quotes
({\tt "}):

\begin{itemize}
\item
When single quotes surround text, the contents are treated as a single
literal word.  No changes at all are made.  Single quotes cannot be
included in a word surrounded by single quotes.

\item
When double quotes surround text, the contents are subjected to
variable substitution, arithmetic expansion, and command
substitution.  In addition, the sequences {\tt \textbackslash{}\$},
{\tt \textbackslash{}`}, {\tt \textbackslash{}"}, and {\tt
\textbackslash{}\textbackslash{}} are replaced by their second
character.
\end{itemize}

In addition, single characters can be quoted by preceding them with a
backslash ({\tt \textbackslash}).
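
The differences between the quoting styles can be seen in a short
sketch; the variable names are chosen only for illustration.

\begin{verbatim}
name=world
single='Hello, $name'
double="Hello, $name"
escaped=Hello,\ \$name
echo "$single"
echo "$double"
echo "$escaped"
\end{verbatim}

The first and third {\tt echo} commands should print {\tt Hello,
\$name}; the second should print {\tt Hello, world}.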

\subsection{Pipelines and redirections}

Pipelines are a key shell feature.  They allow the output of one
program to be used as the input for another.  For instance,
\begin{verbatim}
find . -print | cut -b 3- | sort
\end{verbatim}
causes the
output of {\tt find} to be the input for {\tt cut}, whose output in
turn supplies the input for {\tt sort}.

You can also redirect input and output to a file with the redirection
operators.  The most common redirections are {\tt <}, which redirects
input, and {\tt >}, which redirects output.  See Fig.~\ref{fig:redir}
on page~\pageref{fig:redir} for a more complete list of redirections.

\begin{figure}
\begin{description}
\item[\tt >\var{file}]
Redirect output to \var{file}.  If \var{file} exists then its contents
are truncated.

\item[\tt <\var{file}]
Supply input from \var{file}.

\item[\tt >>\var{file}]
Append output to \var{file}.

\item[\tt 2>\&1]
Redirect error output to standard output.  Usually seen in a
construction like `{\tt >/dev/null 2>\&1}' which causes both regular
and error output to be redirected to {\tt /dev/null}.
\end{description}
 
\caption{Common types of redirection.}
\label{fig:redir}
\end{figure}
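
The redirections in Fig.~\ref{fig:redir} might be combined as in the
sketch below.  The scratch file name is an assumption for the example,
as is a writable {\tt \$TMPDIR} or {\tt /tmp}.

\begin{verbatim}
f=${TMPDIR:-/tmp}/redir-demo.$$
# > creates or truncates the file.
echo banana >"$f"
# >> appends to it.
echo apple >>"$f"
# < supplies input; | feeds head.
first=`sort <"$f" | head -n 1`
# Discard all output of a command.
ls "$f" >/dev/null 2>&1
echo $first
rm "$f"
\end{verbatim}

After sorting, the first line is {\tt apple}, so that is what should
be printed.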

\section{Intermediate shell programming}

\subsection{The first line}

A shell program should begin with a line like the one below.

\begin{verbatim}
#! /bin/sh
\end{verbatim}

This line, which must be the first one in the file, means different
things to the shell and to the kernel:

\begin{itemize}
\item
To the shell, the octothorpe ({\tt \#}) character at the beginning of
the line tells it that the line is a comment, which it ignores.

\item 
To the kernel, the special combination {\tt \#!}\footnote{On some
kernels the entire sequence {\tt \#! /} is used.  For this reason,
never omit the space between {\tt !} and {\tt /}.}, called
sharp-bang, means that the file is a special executable to be
interpreted by the program whose name appears on the line.
\end{itemize}

You can pass a single command-line argument to the shell by putting it
after the shell's name.  Many kernels truncate the sharp-bang line
after the first 32 characters\footnote{The Linux limit is
approximately 128.}, so don't get too fancy.

To make full use of this feature, shell programs should have their
executable bit set.  You can do this from the shell prompt with the
command ``{\tt chmod a+x \var{filename}}'' or similar.
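
Putting these pieces together, the following sketch writes a two-line
shell program, makes it executable, and runs it.  The file name is
hypothetical, and the sketch assumes that files under {\tt \$TMPDIR}
or {\tt /tmp} may be executed.

\begin{verbatim}
s=${TMPDIR:-/tmp}/hello-demo.$$
echo '#! /bin/sh' >"$s"
echo 'echo "Hello, $1"' >>"$s"
chmod a+x "$s"
result=`"$s" world`
echo "$result"
rm "$s"
\end{verbatim}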

Shell programs should never be setuid or setgid.  Such programs are a
security risk with most Unix kernels, including Linux.

\subsection{Command return values}

Every command returns a value between 0 and 255.  This is separate
from any output produced.  The shell interprets a return value of zero
as success and a return value of nonzero as failure.

This return value is used by several shell constructs described below.

The character {\tt !} can be used as a command prefix to reverse the
sense of a command's result; i.e., a nonzero return value is
interpreted as zero, and vice versa.
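
A short sketch using the {\tt true} and {\tt false} utilities shows
the return values as seen through {\tt \$?}; the variable names are
invented for illustration.

\begin{verbatim}
true
ok=$?
false
failed=$?
# ! reverses the sense of the result.
! false
negated=$?
echo $ok $failed $negated
\end{verbatim}

This should print {\tt 0 1 0}.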

\subsection{Lists}

Lists of commands can be formed with the {\tt \&\&} and {\tt ||}
operators:

\begin{itemize}
\item
When a pair of commands is separated by {\tt \&\&}, the first command
is executed.  If the command is successful (returns a zero result),
the second command is executed.

\item
When a pair of commands is separated by {\tt ||}, the first command is
executed.  If the command is unsuccessful (returns a nonzero result),
the second command is executed.
\end{itemize}

The value of a list is the value of the last command executed.
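
As a sketch of how lists behave (the variable names are arbitrary):

\begin{verbatim}
# Runs the assignment: true succeeds.
true && and_ran=yes
# Runs the assignment: false fails.
false || or_ran=yes
# echo is skipped; the list fails.
false && echo 'not printed'
status=$?
echo $and_ran $or_ran $status
\end{verbatim}

This should print {\tt yes yes 1}.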

\subsection{Grouping commands}

Commands may be grouped together using the following syntaxes:

\begin{description}
\item[\tt (\var{commands}\dots{})]
Executes the specified \var{commands} in a subshell.  Commands
executed in this way, such as variable assignments, won't affect the
current shell.

\item[\tt \{\var{commands}\dots{}\}]
Executes \var{commands} under the current shell.  No subshell is
invoked.
\end{description}
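
The difference can be observed with a variable assignment, as in this
sketch; the variable names are invented.  Note that {\tt \{} must be
followed by whitespace and that the last command before {\tt \}} must
end with a semicolon or newline.

\begin{verbatim}
x=outer
# Subshell: the assignment is lost.
(x=inner)
after_parens=$x
# Current shell: the assignment stays.
{ x=inner; }
after_braces=$x
echo $after_parens $after_braces
\end{verbatim}

This should print {\tt outer inner}.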

\subsection{Testing conditions}

Besides the list operators above, conditions can be tested with the
{\tt if} command, which has the following syntax:

\begin{verse} \tt
if \var{condition} \\
then \var{commands}\dots \\
{[} elif \var{condition} \\
then \var{commands}\dots {]}\dots \\
{[} else \var{commands}\dots {]} \\
fi
\end{verse}

If the first \var{condition}, which may be any command, is successful,
then the corresponding \var{commands} are executed.  Otherwise, each
\var{condition} on the {\tt elif} clauses is tested in turn, and if
any is successful, then its \var{commands} are executed.  If none of
the conditions is met, then the {\tt else} clause's \var{commands} are
executed, if any.

For example:

\begin{verbatim}
$ echo
$ if test $? = 0
> then echo 'Success!'
> else echo 'Failure!'
> fi
Success!
$ asdf
asdf: not found
$ if test $? = 0
> then echo 'Success!'
> else echo 'Failure!'
> fi
Failure!
\end{verbatim}

\subsection{Repeating an action conditionally}

The {\tt while} command is used to repeat an action as long as a
condition is true.  It has the following syntax:

\begin{verse} \tt
while \var{condition} \\
do \var{commands}\dots \\
done
\end{verse}

When a {\tt while} command is executed, the \var{condition} is first
executed.  If it is successful, then the \var{commands} are executed,
then it starts over with another test of the \var{condition}, and so
on.
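
For instance, this sketch adds up the integers from 1 to 5, using {\tt
test} for the condition and arithmetic expansion for the counting:

\begin{verbatim}
i=1
sum=0
while test $i -le 5
do sum=$(($sum + $i))
   i=$(($i + 1))
done
echo $sum
\end{verbatim}

It should print {\tt 15}.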

\subsection{Iterating over a set of words}

To repeat an action for each word in a set, use the {\tt for} command,
which has the following syntax:

\begin{verse} \tt
for \var{variable} in \var{words}\dots \\
do \var{commands}\dots \\
done
\end{verse}

The \var{commands} specified are performed for each word in
\var{words} in the order given.  The example below shows how this
could be used, along with {\tt sed}, to rename each file in the
current directory whose name ends in {\tt .x} to the same name but
ending in {\tt .y}.

\begin{verbatim}
$ ls
a.x  b.x  c.x  d
$ for d in *.x
> do mv $d `echo $d | sed -e 's/\.x$/.y/;'`
> done
$ ls
a.y  b.y  c.y  d
\end{verbatim}

\subsection{Selecting one of several alternatives}

The {\tt case} statement can be used to select one alternative from
several using wildcard pattern matching.  It has the following syntax:

\begin{verse} \tt
case \var{word} in \\
\var{pattern}) \var{commands}\dots ;; \\
\dots \\
esac
\end{verse}

\var{word} is compared to each \var{pattern} in turn.  The
\var{commands} corresponding to the first matching \var{pattern} are
executed.  Multiple patterns may be specified for a single set of
commands by separating the patterns with a vertical bar ({\tt |}).

Each \var{pattern} may use shell wildcards for matching.  To provide a
final alternative that matches any \var{word}, use the generic wildcard {\tt *},
which matches any string.
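
A sketch that classifies a few made-up file names by their extensions,
combining {\tt case} with {\tt for}:

\begin{verbatim}
for f in main.c notes.txt Makefile
do case $f in
   *.c|*.h) kind='C source' ;;
   *.txt)   kind=text ;;
   *)       kind=unknown ;;
   esac
   echo "$f: $kind"
done
\end{verbatim}

The last name matches neither of the first two alternatives, so it
falls through to the {\tt *} pattern.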

\subsection{Shell functions}

You can define your own shell functions using a function definition
command, which has the following syntax:

\begin{verse} \tt
\var{name} () \{ \\
  \var{commands}\dots{} \\
\}
\end{verse}

After defining a function, it may be executed like any other command.
Arguments are passed to the function in the built-in variables {\tt
\$1} \dots {\tt \$9}.  Commands inside functions have the same syntax
as those outside.
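
A minimal sketch; the function name {\tt greet} and its arguments are
invented for the example:

\begin{verbatim}
greet () {
  echo "Hello, $1 and $2 ($# given)"
}
message=`greet Alice Bob`
echo "$message"
\end{verbatim}

This should print {\tt Hello, Alice and Bob (2 given)}.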

\section{Built-in shell commands}

The commands described below are built into the shell.  This list is
not comprehensive, but it describes the commands that are most
important for shell programming.

\subsection{\tt :}

This command does nothing and returns a value of zero.  It is used as
a placeholder.

\subsection{\tt cd \var{directory}}

Changes the current working directory to \var{directory}.

\subsection{\tt exec \var{program} \var{arguments}\dots{}}

Replaces the shell with \var{program} (which must not be built-in),
passing it the given \var{arguments}.  \var{program} replaces the
shell rather than running as a subprocess; control will never return
to this shell.

\subsection{\tt exit \var{value}}

Exits the shell, returning the specified \var{value} to the program that
invoked it.  {\tt exit 0} is often the last line of a shell script.
If a shell program doesn't end with an explicit {\tt exit} command,
it returns the value returned by the last command that it executed.

\subsection{\tt export \var{names}\dots{}}

By default, shell variables are limited to the current shell.  But
when {\tt export} is applied to a variable, it is passed in the
environment to programs that are executed by the shell, including
subshells.
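
The difference can be seen by starting a separate {\tt sh} process, as
in the sketch below.  The variable names are invented, and the sketch
assumes that {\tt sh} can be found on the {\tt PATH}.

\begin{verbatim}
kept=one
shared=two
export shared
# A new sh sees only the exported one.
seen=`sh -c 'echo "[$kept] [$shared]"'`
echo "$seen"
\end{verbatim}

Provided {\tt kept} is not already in the environment, this should
print {\tt [] [two]}.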

\subsection{\tt getopts \var{optstring} \var{name}}

Can be used to parse command-line arguments to a shell script.  Refer
to a shell reference manual for details.
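
As a rough sketch of typical usage, the loop below parses a made-up
argument list that has a {\tt -v} flag and a {\tt -o} option taking a
value.  The option letters and variable names are assumptions for the
example; {\tt OPTARG} and {\tt OPTIND} are set by {\tt getopts}
itself.

\begin{verbatim}
set -- -v -o out.txt input.txt
verbose=no
OPTIND=1
while getopts vo: opt
do case $opt in
   v) verbose=yes ;;
   o) output=$OPTARG ;;
   *) exit 2 ;;
   esac
done
# Drop the options that were parsed.
shift $(($OPTIND - 1))
echo $verbose $output $1
\end{verbatim}

This should print {\tt yes out.txt input.txt}.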

\subsection{\tt read [ -p \var{prompt} ] \var{variables}\dots{}}

\var{prompt} is printed if given; note that {\tt -p} is an extension
that POSIX does not specify.  Then a line is read from the
shell's input.  The line is split into words, and the words are
assigned to the specified \var{variables} from left to right.  If
there are more words than variables, then all the remaining words,
along with the whitespace that separates them, are assigned to the last
variable in \var{variables}.
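
A sketch of how the words are distributed, reading from a scratch file
whose name is made up for the example:

\begin{verbatim}
f=${TMPDIR:-/tmp}/read-demo.$$
echo 'alpha beta gamma delta' >"$f"
read first second rest <"$f"
echo "$first|$second|$rest"
rm "$f"
\end{verbatim}

This should print {\tt alpha|beta|gamma delta}.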

\subsection{\tt set}

The {\tt set} command can be used to modify the shell's execution
options and set the values of the numeric variables {\tt \$1} \dots
{\tt \$9}.  See a shell reference manual for details.

\subsection{\tt shift}

Shifts the shell's built-in numeric variables to the left; i.e., {\tt
\$2} becomes {\tt \$1}, {\tt \$3} becomes {\tt \$2}, and so on.  The
value of {\tt \$\#} is decremented.  If there are no (remaining)
numeric variables, nothing happens.
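
Together, {\tt set} and {\tt shift} allow a loop over the arguments,
as in this sketch with placeholder argument words:

\begin{verbatim}
set -- one two three
while test $# -gt 0
do echo "$# left, next is $1"
   last=$1
   shift
done
\end{verbatim}

The loop ends once {\tt \$\#} reaches zero.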

\section{Useful external commands}

Most of what goes on in a shell program is actually performed by
external programs.  Some of the most important are listed below, along
with their primary purposes.  To achieve proficiency in shell
programming you should learn to use each of these.  Unfortunately,
describing what each of them does in detail is far beyond the scope of
this article.

Most shells implement at least some of the programs listed below as
internal features.

\subsection{Shell utilities}

These programs are specifically for the use of shell programs.

\begin{description}
\item[basename] Extracts the last component of a filename.
\item[dirname] Extracts the directory part of a filename.
\item[echo] Writes its command-line arguments on standard output,
separated by spaces.
\item[expr] Performs mathematical operations.
\item[false] Always returns unsuccessfully.
\item[printf] Provides formatted output.
\item[pwd] Displays the current working directory.
\item[sleep] Waits for a specified number of seconds.
\item[test] Tests for the existence of files and other file
properties.
\item[true] Always returns successfully.
\item[yes] Repeatedly writes a string to standard output.
\item[{[}] An alias for the {\tt test} command.
\end{description}

\subsection{Text utilities}

These programs are for manipulation of text files.

\begin{description}
\item[awk] Programming language for text manipulation.
\item[cat] Writes files to standard output.
\item[cut] Outputs selected columns of a file.
\item[diff] Compares text files.
\item[grep] Searches files for patterns.
\item[head] Outputs the first part of a file.
\item[patch] Applies patches produced by {\tt diff}.
\item[sed] Stream EDitor for text manipulation.
\item[sort] Sorts lines of text based on specified fields.
\item[tail] Outputs the last part of a file.
\item[tr] Translates characters.
\item[uniq] Removes duplicate lines of text.
\item[wc] Counts lines, words, and bytes.
\end{description}

\subsection{File utilities}

These programs operate on files.

\begin{description}
\item[chgrp] Changes the group associated with a file.
\item[chmod] Changes a file's permissions.
\item[chown] Changes the owner of a file.
\item[du] Calculates disk storage used by a file.
\item[cp] Copies files.
\item[find] Finds files having specified attributes.
\item[ln] Creates links to a file.
\item[ls] Lists files in a directory.
\item[mkdir] Creates a directory.
\item[mv] Moves or renames files.
\item[rm] Deletes files.
\item[rmdir] Deletes directories.
\item[touch] Updates file timestamps.
\end{description}

\end{document}

