ssh: secure remote login to another computer
sftp: secure file transfer to another computer (interactive)
scp: secure file transfer to another computer (non-interactive)
rsync: extremely powerful and smart file transfer (works for both local and remote computers; non-interactive)
ps: Display which processes are running (non-interactive)
top: Display which processes are running (interactive)
kill: Kill (abort) a process using the process ID
killall: Kill (abort) a process using the process name
nice: Set the CPU priority for a process
ionice: Set the disk usage priority for a process
nohup: Keep running after logging out
&: Run a process in the background (append it to the command)
Run a long process in the background without hogging system resources:
[language=bash] nohup ionice -c2 -n7 nice -n 19 prog --progOpts &
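A minimal sketch of how the process tools above fit together, using sleep as a stand-in for a real long-running program (the 300-second duration is arbitrary). ionice is left out only because it is Linux-specific; the rest should behave the same in most shells.

```shell
# Start a harmless long-running job in the background at the lowest CPU priority.
nice -n 19 sleep 300 &
pid=$!                        # $! holds the PID of the most recent background job

# Show the job's PID, nice value, and command name.
ps -o pid,ni,comm -p "$pid"

# Stop it by process ID, then wait so the shell collects its exit status.
kill "$pid"
wait "$pid" 2>/dev/null
status=$?                     # non-zero, because the job was killed by a signal
echo "job $pid ended with status $status"
```

Killing by name (killall sleep) would also work here, but it would stop every sleep process you own, which is why the PID form is usually the safer choice.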
zip: Create a zip file
unzip: Extract contents from a zip file
gzip: Compress a file with GNU zip
gunzip: Decompress a file with GNU zip
bzip2: Compress a file with bzip2 compression (usually makes smaller files than gzip)
bunzip2: Decompress a file with bzip2
tar: Create and extract tar archives
Common uses:
tar -czvf file.tar.gz directory
tar -xzvf file.tar.gz
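The two common uses above can be tried safely in a scratch directory. This is a small sketch; the directory and file names (project, notes.txt, restore) are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a tiny directory tree to archive.
mkdir project
echo "hello" > project/notes.txt

# c = create, z = gzip-compress, f = archive file name.
tar -czf project.tar.gz project

# t = list the archive's contents without extracting.
tar -tzf project.tar.gz

# x = extract; -C chooses the directory to extract into.
mkdir restore
tar -xzf project.tar.gz -C restore
restored=$(cat restore/project/notes.txt)
echo "$restored"
```

Adding v (as in -czvf) simply prints each file name as it is processed.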
bc: Basic interactive calculator. Usually best invoked with the -l option, which loads the math library and gives decimal results
dc: Reverse Polish style interactive calculator
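Add the first line of one file and the last of another:
echo "`head -n1 numbers.txt` + `tail -n1 numbers2.txt`" | bc -l
Add the first 10 lines of a file (which contains one number per line). In dc, each + adds only the top two numbers on the stack, so ten numbers need nine + operators before p prints the result:
echo "`head numbers.txt` + + + + + + + + + p" | dc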
Most UNIX programs pay attention to environment variables, such as the language, timezone, and PATH. To see all currently set variables, type:
[language=bash] export
To change a variable, do:
[language=bash] export PATH="/home/robfelty/bin:
You can also create and use your own variables. If you frequently connect to the server speech.psych.indiana.edu, you can store that in a variable, e.g.
[language=bash] speech=speech.psych.indiana.edu ssh
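The difference between a plain shell variable and an exported one is easiest to see by asking a child process what it can see. The names greeting and COURSE below are hypothetical and chosen only for this sketch.

```shell
greeting="hello"             # plain variable: visible only in this shell
export COURSE="unix-tools"   # exported variable: inherited by child processes

# Reference a variable by putting $ in front of its name.
echo "$greeting from $COURSE"

# A child shell sees the exported variable but not the plain one.
child_sees=$(sh -c 'echo "${greeting:-unset} ${COURSE:-unset}"')
echo "$child_sees"
```

This is why PATH and similar settings are exported: programs started from the shell need to inherit them.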
Many UNIX programs, including the shell (we have been using the BASH shell), have files where one can store customizations between sessions.
Common .rc files:
.bashrc
.vimrc
.inputrc
Every time you open a new terminal, the .bashrc file is read.
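As an illustration, a few lines of the kind people often keep in .bashrc are sketched below. Every name and value here is an assumption made for the example, not a recommendation from the course materials.

```shell
# Hypothetical ~/.bashrc entries; all names and values are illustrative.
export EDITOR=vim        # default editor for programs that ask for one
HISTSIZE=5000            # remember more commands per session
ll() { ls -l "$@"; }     # shortcut for a long listing, defined as a function
```

After editing .bashrc, open a new terminal or run `source ~/.bashrc` for the changes to take effect in the current one.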
A shell script uses the exact same syntax as the command line shell you use (we have been using BASH). In this way, you can group commands together, to reduce work.
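[language=bash]
#!/bin/bash
# this script strips off any file extension from the argument, and runs the
# result through latex, bibtex, latex twice, dvips, ps2pdf, and then opens it
# with evince
# (reconstructed from a damaged listing; the sed expression is an assumption)
SEED=`echo $1 | sed 's/\.[^.]*$//'`
latex -interaction=batchmode $SEED
bibtex $SEED
latex -interaction=batchmode $SEED
latex -interaction=batchmode $SEED
dvips $SEED.dvi -o $SEED.ps
ps2pdf $SEED.ps
evince $SEED.pdf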
How might one improve this script?
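Two possible improvements, sketched under assumptions: strip the extension with the shell's own parameter expansion instead of an external command, and refuse to run when no filename is given. The function names strip_ext and check_args are hypothetical.

```shell
# Hypothetical helper: remove the last ".ext" suffix with parameter expansion.
# Limitation: a dot in a directory name (e.g. ./dir.d/file) would confuse it.
strip_ext() {
  printf '%s\n' "${1%.*}"
}

# Hypothetical guard: succeed only when exactly one argument is supplied.
check_args() {
  if [ "$#" -ne 1 ]; then
    echo "usage: script FILE" >&2
    return 1
  fi
}

seed=$(strip_ext "thesis.tex")
echo "$seed"
```

Other candidates include quoting every variable so that filenames with spaces work, and stopping at the first command that fails rather than carrying on.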
[language=bash,numbers=left,name=backup,linewidth=65ex,lineskip=-2pt]
#!/bin/bash
# this script syncs my school computer onto an external hard disk using rsync

# define a few constants
TARGET='/media/disk'
OPTIONS=' -avz --delete-after '
UMOUNT='FALSE'

echo "Executing incremental backup script"

# if /media/disk does not exist, create it, then mount the disk, and mark for unmounting
if [ ! -d /media/disk ]; then
  echo "creating /media/disk and mounting"
  UMOUNT='TRUE'
  mkdir /media/disk
  mount /dev/sdd1 /media/disk
fi
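Mac, UNIX, and DOS (Windows) use different line ending characters, which can cause lots of problems.
Mac (classic Mac OS): \r (carriage return)
UNIX: \n (line feed)
DOS: \r\n (carriage return followed by line feed)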
Most Linux distros ship with the programs unix2dos etc. Mac does not. Instead use the scripts provided in the resources/utils directory.
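When no conversion utility is at hand, standard tools can do the job. The sketch below builds a small DOS-style file (the file names are made up), shows its raw bytes, and strips the carriage returns with tr.

```shell
work=$(mktemp -d)

# Create a two-line file with DOS (\r\n) line endings.
printf 'one\r\ntwo\r\n' > "$work/dos.txt"

# od -c prints every byte, so the \r characters become visible.
od -c "$work/dos.txt"

# Deleting every carriage return leaves UNIX (\n) line endings.
tr -d '\r' < "$work/dos.txt" > "$work/unix.txt"

# One byte per line disappears; $(( )) trims any padding wc adds.
dos_bytes=$(( $(wc -c < "$work/dos.txt") ))
unix_bytes=$(( $(wc -c < "$work/unix.txt") ))
echo "DOS: $dos_bytes bytes, UNIX: $unix_bytes bytes"
```

Note that this approach removes every \r in the file, which is fine for ordinary text but would damage a classic Mac file, where \r is the only line terminator.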
nano (open source version of pico).
Advantages:
user-friendly. Lists commands at bottom of screen.
small
Disadvantages:
Not very powerful
Not a default install on many UNIXes
vi / vim: two-mode editor. This is my editor of choice.
Advantages:
small (in size and memory usage)
common (found on almost all UNIX systems by default)
powerful (great regular expression support, and nice syntax highlighting)
fast (your fingers never have to leave the home row. No mouse required)
Disadvantages:
steep learning curve
emacs: editor of choice for many programmers. Swiss-army knife of editors.
Advantages:
Great syntax highlighting
Single mode editor
Includes all sorts of tools (news readers, e-mail readers, version control interfaces, friendfeed interface)
Disadvantages:
Uses lots of memory
Not a default install on many UNIXes
Globs (wildcards) can be used by BASH, and by other programs (Microsoft Word & Excel) as shortcuts to match multiple expressions
*: Match zero or more characters.
?: Match any single character.
[ ]: Match any single character from the bracketed set. A range of characters can be specified with [ - ]
[! ]: Match any single character NOT in the bracketed set.
{ }: A list (set) of alternatives, separated by commas
An initial "." in a filename does not match a wildcard unless explicitly given in the pattern. In this sense filenames starting with "." are hidden. A "." elsewhere in the filename is not special.
Pattern operators can be combined
chapter[1-5].* could match chapter1.tex, chapter4.tex, chapter5.tex.old. It would not match chapter10.tex or chapter1
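The chapter example can be checked directly in a scratch directory. This sketch creates empty files with those names, plus one hypothetical hidden file (.hidden.tex) to show that a leading dot is not matched by a wildcard.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Create the files from the example, plus one hidden file.
touch chapter1.tex chapter4.tex chapter5.tex.old chapter10.tex chapter1 .hidden.tex

# echo prints whatever the shell expands the pattern to.
matched=$(echo chapter[1-5].*)
echo "$matched"

# *.tex skips .hidden.tex because of its leading dot.
visible=$(echo *.tex)
echo "$visible"
```

Using echo with a pattern is a cheap way to preview what a glob will match before handing it to a destructive command such as rm.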
Delete all Microsoft Word documents in my home directory:
[language=bash] rm -f ~/*.doc
Convert all Microsoft Word documents in my home directory to plain text:
[language=bash] for file in ~/*.doc; do antiword "$file" > "$(basename "$file" .doc).txt"; done
Create all files a-c with extensions txt,tmp,foo,bar:
[language=bash] touch {a,b,c}.{txt,tmp,foo,bar}
Download l55practiceFiles.tar.gz and untar it
Move all files ending in .txt to a new directory txt
mkdir txt; mv *.txt txt
Copy files 10-19 to a new directory 10-19
mkdir 10-19; cp 1[0-9] 10-19
list permissions for files ending in .txt which do not contain numbers
ls -l [a-zA-Z].txt
OR
ls -l [!0-9].txt
Separate files into different directories according to their extension
mkdir {tmp,foo,bar,txt}
for file in *.{tmp,txt,foo,bar}; do mv $file `echo $file| cut -f 2 -d '.'`/$file; done
Finding the information you need from the databases will require the use of regular expressions
Regular expressions are a feature in many programming languages that allow one to search for a given string in a body of text, including the use of some special characters
Problem: I want to find all CVC words in the English CELEX database
Solution: grep -E ’
Problem: I want to know how many words start and end with the letter k
Solution: grep -iEc '\\k[a-z]*k\\' celex.cd
Special characters: . ? + * [] {} () | ^ $ \
. matches any character
[] matches any of the characters within the brackets, e.g. [a0] matches both a and 0
Several predefined shortcuts are also possible
[a-z] matches all lowercase letters
[A-Z] matches all uppercase letters
[a-zA-Z] matches all uppercase and lowercase letters
[0-9] matches all digits
? matches 1 or 0 of the preceding character, e.g. colou?r matches color and colour
+ matches 1 or more of the preceding character, e.g. bug +off matches "bug off" with one space or with several, but not bugoff
* matches any number (zero or more) of the preceding character, e.g. colou*r matches color, colour, colouur and so on
{} used to specify the number of times a character should be matched. Ranges are also possible.
a{2} matches only aa
[a-z]{2} matches two lowercase letters, e.g. ab
[a-z]{2,4} matches 2–4 lowercase letters, e.g. al or foo
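The quantifiers can be tried on a short word list. In this sketch, grep's -x option (match the whole line) and -c option (count matching lines) are used so that each pattern has to account for the entire word; the word list itself is invented for the example.

```shell
words='color
colour
colouur
colr'

# ? : zero or one u      -> color, colour
optional=$(printf '%s\n' "$words" | grep -Ecx 'colou?r')

# * : zero or more u     -> color, colour, colouur
any=$(printf '%s\n' "$words" | grep -Ecx 'colou*r')

# {2,4} : 2 to 4 letters -> colr is the only word that short
range=$(printf '%s\n' "$words" | grep -Ecx '[a-z]{2,4}')

echo "$optional $any $range"
```

Without -x the last pattern would match every line, since any word of five or more letters also contains a run of two to four letters.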
By default, * and + are greedy, meaning that they match as much as possible. Often this is not the intended effect.
I want to strip out html tags from a document. I use the following regular expression: <.*>
This will match <span class='foo'>. But it will also match <span class='foo'>some text I don't want to get rid of</span>
Solution: use negative character classes: <[^<>]*>
In Perl and Python, you can use the non-greedy forms .*? and .+?
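The difference between the greedy pattern and the negated character class shows up clearly when both are used to delete tags with sed. The sample string is adapted from the example above.

```shell
html="<span class='foo'>some text I want to keep</span>"

# Greedy: <.*> runs from the first < to the last >, swallowing the text too.
greedy=$(printf '%s\n' "$html" | sed 's/<.*>//g')

# Negated class: [^<>]* cannot cross a tag boundary, so only tags are removed.
careful=$(printf '%s\n' "$html" | sed 's/<[^<>]*>//g')

echo "greedy:  '$greedy'"
echo "careful: '$careful'"
```

The negated-class form works in any regular expression flavour, which makes it a useful habit even where non-greedy quantifiers are available.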
() used to group sequences. Especially useful for backreferences (more on that later) and for alternation with |
| used as an or operator, e.g. x|y matches either x or y
(m|M)(in|ax)imum matches minimum, maximum, Minimum and Maximum
\1 is a backreference. You can use multiple backreferences of the form \n where n is the nth pair of parentheses in the expression. Say I want to find common typos involving duplicate words (such as a a or the the). I could write an expression like so (a|the) \1
which says “match either a or the followed by a space followed by whatever was matched in the parentheses”
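The duplicate-word expression can be tested with grep on a few invented sentences. Backreferences inside -E patterns are an extension rather than part of the POSIX definition, so this sketch assumes a grep that supports them (GNU and BSD grep both do, as far as I know).

```shell
sample='I saw the the dog
I saw a dog
This is a a typo
Nothing wrong here'

# \1 must repeat exactly what the first group matched: "a a" or "the the".
dupes=$(printf '%s\n' "$sample" | grep -E '(a|the) \1')
echo "$dupes"

dupe_count=$(printf '%s\n' "$sample" | grep -Ec '(a|the) \1')
echo "$dupe_count lines contain a doubled word"
```

As written, the pattern also fires inside longer words (for example "data a"), so a real typo checker would want word boundaries around it.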
^ matches the beginning of the string
^ within brackets negates the pattern, e.g. [^xy] matches any character except x or y
$ matches the end of the string
\ is the escape character. When you want to use one of the special characters as a normal character, it must be preceded by \
grep will search a file on a line by line basis, and return any lines which contain the regular expression
In the case of CELEX, we will take advantage of the fact that fields are separated by \
Like many UNIX programs, grep has quite a few options available. For a complete list, type man grep
-E extended regular expressions — allows us to use all the special characters
-i ignore case
-c simply print the number of matches
-v invert match, i.e. return everything that does not match the expression
These can be used in conjunction with one another, e.g.
grep -icv 'dog' file
returns the number of lines that do not contain the word dog from the file ‘file’.
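A quick way to get a feel for the options is to run them against a tiny file whose contents you know. The file name animals.txt and its four lines are made up for this sketch.

```shell
work=$(mktemp -d)
printf '%s\n' 'Dog' 'cat' 'hotdog' 'bird' > "$work/animals.txt"

# -i ignores case, -c counts: "Dog" and "hotdog" both contain dog.
with_dog=$(grep -ic 'dog' "$work/animals.txt")

# Adding -v inverts the match: "cat" and "bird" remain.
without_dog=$(grep -icv 'dog' "$work/animals.txt")

echo "with: $with_dog, without: $without_dog"
```

Note that -c counts matching lines, not individual matches, so a line containing the pattern twice still counts once.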
Practice writing some regular expressions that will find the following from CELEX:
all words that begin with ‘st’
\\st
all words that end in ‘ing’
ing\\
all words that begin with ‘st’ and end with ‘ing’
\\st[a-z]*ing\\
all monosyllabic words
\\\[[CV]+\]\\
all disyllabic words
\\\[[CV]+\]\[[CV]+\]\\
Find all words with a frequency of greater than 23
Not only can you use regular expressions to match strings, but you can also replace matched strings with other strings. The easiest way to do this is with the program sed. By default, sed prints out the entire input, replacing any patterns with the specified replacements. Without the g (global) flag, only the first match on each line is replaced.
The basic form is like so:
[language=bash] sed 's/match/replace/flags' < infile > outfile
Input: The blue man sat next to the green man.
[language=bash] echo 'The blue man sat next to the green man.' | sed 's/man/woman/g'
Output: The blue woman sat next to the green woman.
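The effect of the g flag is worth seeing side by side, since leaving it off is a common source of surprise. This sketch runs the same substitution twice on the example sentence.

```shell
sentence='The blue man sat next to the green man.'

# No flag: only the first "man" on the line is replaced.
first_only=$(printf '%s\n' "$sentence" | sed 's/man/woman/')

# g flag: every "man" on the line is replaced.
every=$(printf '%s\n' "$sentence" | sed 's/man/woman/g')

echo "$first_only"
echo "$every"
```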
Backreferences can be used not only in patterns, but also in replacements. This allows one to use dynamic replacements.
File replacement: I want to get rid of spaces in filenames, because they can cause problems with UNIX scripts. I can rename a single file by hand:
mv "foo bar.txt" foo_bar.txt
or use sed to rename every file whose name contains a space:
for file in *" "*; do mv "$file" "$(echo "$file" | sed -r 's/ /_/g')"; done
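The renaming loop is safest to try in a scratch directory first. The sketch below creates a few empty files (names invented for the example) and replaces the spaces in their names with underscores.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch "foo bar.txt" "my long name.txt" plain.txt

# The pattern *" "* selects only names that contain a space.
for file in *" "*; do
  new=$(printf '%s\n' "$file" | sed 's/ /_/g')   # g: replace every space
  mv "$file" "$new"
done

ls
```

Quoting "$file" and "$new" matters here: without the quotes, the shell would split each name at its spaces and mv would receive the wrong arguments.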
In the replacement, GNU sed also understands these case-conversion escapes:
\l Makes the following character lower case
\u Makes the following character upper case
\L Makes all following characters lower case
\U Makes all following characters upper case
[language=bash] echo "Minimum" | sed -r 's/(in|ax)imum//'
Display the second column of courseBackground.txt
cut -f2 courseBackground.txt
Create a new file with only the first and third columns of courseBackground.txt
cut -f1,3 courseBackground.txt > courseBackground13.txt
Combine the courseBackground.txt with the new file you just created
paste courseBackground.txt courseBackground13.txt > combinedFile.txt
Sort the courseBackground.txt file by nickname, ignoring case (HINT: use -t $'\t')
sort -k 2,2f -t $'\t' courseBackground.txt
Count the number of entries in the Devil’s Dictionary
grep -Ec '[A-Z]{2,},' devilsDictionary.txt
Print out all the entries in the Devil’s Dictionary (not the definitions)
grep -E '[A-Z]{2,},' devilsDictionary.txt | cut -f1 -d ','
Count the number of occurrences of the word the in the Devil’s Dictionary
grep -Eic '( |[^a-z]|^)the([^a-z]|$)' devilsDictionary.txt
Count the number of indefinite articles in the Devil’s Dictionary
grep -Eic '( |[^a-z]|^)(a|an)( |[^a-z]|$)' devilsDictionary.txt
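The cut, paste, and sort exercises can be rehearsed on a small tab-separated file when the course files are not at hand. The file people.txt and its three rows are invented for this sketch; the tab is stored in a variable so the commands also work in shells that lack the $'\t' syntax.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
tab=$(printf '\t')

# Three columns: name, nickname, field.
printf 'Robert\trob\tlinguistics\nAlice\tAl\tpsychology\nCarol\tcaz\tspeech\n' > people.txt

# Second column only.
nicknames=$(cut -f2 people.txt)
echo "$nicknames"

# First and third columns into a new file, then glue both files side by side.
cut -f1,3 people.txt > people13.txt
paste people.txt people13.txt > combined.txt

# Sort on column 2 only (-k 2,2), folding case (f), with tab as the separator.
first_sorted=$(sort -t "$tab" -k 2,2f people.txt | head -n1 | cut -f1)
echo "$first_sorted"
```

The combined file ends up with five columns per line: the three original ones followed by the two that were cut out.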