This assignment covers more of the UNIX basics we have discussed so far (including material up to September 8th). Some of the questions ask you to use files from practiceFiles, specifically celex.txt and devilsDictionary.txt.
- Print out the entries (orthography only) from the celex.txt file which were taken from GOOGLE. Hint: You will need to use a pipe. (6 points)
- Print out the 50 most frequent words from the celex.txt file which were taken from GOOGLE. Hint: You will need to combine the answers from the last 2 questions. (9 points)
- Use UNIX commands to count the number of entries (not definitions) in the Devil’s Dictionary that begin with a vowel. Your output should be a single number. (7 points)
- Use UNIX commands to calculate the average number of letters per word for each entry (not the definitions) in the Devil’s Dictionary. The output should simply be a number. Hint: You will need to use subshells and bc. (10 points)
- Count the number of adjectives, nouns, and verbs in the Devil’s Dictionary. (10 points)
- Print out all the entries (not the definitions) that are not adjectives, nouns, or verbs. Hint: Use grep more than once. (10 points)
- Write a UNIX pipeline that prints the number of words in the celex.txt file that contain a q not followed by a u (look only at the orthography of each entry). (8 points)
Extra credit
Write a UNIX pipeline that prints the total number of points in this assignment. Don’t include the points for the extra credit. Hint: Use dc. (3 extra points)
Hi Rob,
Not sure what you mean by “entries (not definitions)”. Is there a readme that goes along with this file?
Les
Les,
Here is an example.
cat n., a furry domesticated animal of the feline family
“cat” is the entry. Everything after the comma is the definition.
Look at the last example in the GREP chapter of the notes for an example. I will also go over this in class tomorrow.
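To make the distinction concrete, here is a minimal sketch on two made-up lines shaped like the example above. The real devilsDictionary.txt may be laid out differently, and the sketch assumes each entry is a single word, so treat the choice of delimiter as an assumption rather than the answer.

```shell
# Two invented lines in the same shape as the "cat" example (not real data).
sample=$(mktemp)
printf '%s\n' \
  'cat n., a furry domesticated animal of the feline family' \
  'owl n., a nocturnal bird of prey' > "$sample"

# Assumption: the entry is a single word, so it is the first
# space-separated field on each line.
entries=$(cut -d' ' -f1 "$sample")
echo "$entries"

rm -f "$sample"
```

Once only the entries are left, any later grep or wc in the pipeline sees the entry words and never the definitions.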
Does the backslash properly function as an escape character for the single quote mark, or must I (if even possible) use some other sorcery to grep for, say, a possessive? When I use a regular expression such as '^[A-Z\']*' in the hopes of capturing something like “HUGH’S”, the terminal just stares back at me with eyes glazed and spittle dribbling down its chin.
I feel like this had to have been mentioned, so I apologize (if so) for asking again. Thanks in advance for enlightenment!
Inside single quotes, a backslash will not escape a single quote. You have two options:
1. Use double quotes, e.g.
grep "don't" file
2. Don’t use any quotes, but then you have to escape every special character, e.g.
grep d\[ao\]n\'t celex.txt
Then, to match special characters literally, you have to double-escape them, e.g.
grep CVVCC\\]\\[CVVC celex.txt
All that having been said, it might be easier to use a negative character class.
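As a quick illustration of the two options, here is a sketch on a made-up word list. The file contents are invented, and LC_ALL=C is my own addition so that [A-Z] covers only uppercase ASCII letters regardless of locale.

```shell
# Invented three-line word list for illustration only.
words=$(mktemp)
printf '%s\n' "HUGH'S" "don't" "dant" > "$words"

# Option 1: double quotes, so the apostrophe needs no escaping.
# Matches lines made up only of uppercase letters and apostrophes.
with_quotes=$(LC_ALL=C grep "^[A-Z']*$" "$words")

# Option 2: no quotes; backslash-escape every character the shell
# would otherwise interpret. grep itself receives the pattern d[ao]n't.
no_quotes=$(grep d\[ao\]n\'t "$words")

echo "$with_quotes"
echo "$no_quotes"

rm -f "$words"
```

In option 2 the backslashes are consumed by the shell, so grep still treats [ao] as a character class. That is why matching a literal bracket needs the extra level of escaping.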
Hi Rob,
For question 7, I’m able to limit my search to words that contain ‘q’ not followed by ‘u’, but when printing just the orthography I’m also getting words that have a q not followed by u somewhere else in the entry, such as the definition or description.
For example, “gendarme” is one of the words in the printed list, and its full entry is:
36631\gendarme\13\18700\6\’Zqn-d#m\[CVVC][CVVC]\[ZA~:n][dA:m]\Zandarm\3.6667.0000\HML\Z’an<d`arm
since "Zqn" fits the criteria of my search. Is this ok or do I need to get my pipeline to only print words from the entries?
Thanks,
Arrick
You should limit your search to only the orthography of the words.
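One way to do that, sketched on a few abridged and partly made-up lines in the backslash-delimited layout quoted above, is to cut out the orthography field before grepping. The sketch assumes field 2 is the orthography, as in the “gendarme” line. The IDs 1 and 2 and the shortened last field are invented.

```shell
# Abridged sample: ID \ orthography \ one stand-in field for the rest.
sample=$(mktemp)
printf '%s\n' \
  '36631\gendarme\Zqn-d#m' \
  '1\qat\kat' \
  '2\queen\kwin' > "$sample"

# Grepping whole lines also counts "gendarme", via "Zqn" in a later field.
whole_line=$(grep -c 'q[^u]' "$sample")

# Keeping only field 2 first restricts the search to the orthography.
orth_only=$(cut -d'\' -f2 "$sample" | grep 'q[^u]')

echo "$whole_line"
echo "$orth_only"

rm -f "$sample"
```

Note that 'q[^u]' needs some character after the q, so a word ending in q would be missed. Whether that matters depends on how you read “not followed by a u”.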