Instructor | Robert Felty, Ph.D. |
Class meeting times | Tuesday and Thursday, 12:30-13:45, MUEN E118 |
Office | 292 Hellems |
Office Hours | Tuesday 10:00-11:00, Thursday 3:00-4:00 p.m., or by appointment |
Overview
This course is designed to give linguists a practical foundation in programming, which will allow them to efficiently take advantage of existing tools, as well as create their own tools for a variety of linguistic tasks, including searching corpora and databases, compiling statistics from databases, analyzing experimental data, and preparing stimuli for experiments.
The course will focus primarily on the Python programming language, but will also cover some commonly used UNIX utilities that are handy for linguists.
Goals
By the end of the course, you should:
- Feel comfortable using a UNIX/Linux/Mac command line
- Understand key programming concepts such as conditionals, iteration, recursion, functions, and objects
- Be able to write your own programs which can help you to answer linguistic questions and solve everyday problems
- Be acquainted with some computational and corpus linguistics topics such as part of speech tagging, concordance, and regular expressions
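To give a flavor of the kind of program these goals point toward, here is a minimal sketch of a concordance (keyword in context) in plain Python 3. It combines a function, iteration, a conditional, and a regular expression. The function name `concordance`, the sample sentence, and the context width are illustrative assumptions, not course materials.

```python
import re

def concordance(text, keyword, width=2):
    """Return each occurrence of keyword with up to `width` words of context per side."""
    # Tokenize with a regular expression: runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):      # iteration
        if token == keyword.lower():        # conditional
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            # Uppercase the keyword so it stands out in its context.
            lines.append(" ".join(left + [token.upper()] + right))
    return lines

# Hypothetical sample text, invented for this sketch.
sample = "The cat sat on the mat. The dog chased the cat."
for line in concordance(sample, "cat"):
    print(line)
```

NLTK provides its own concordance tools, which we will use in class; this sketch only shows that the underlying idea fits in a few lines.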
Textbooks
For most of the course we will use the NLTK book. We will also use the UNIX book from time to time (it is a good reference), as well as the Think Python book occasionally. The NLTK book and the Think Python book are available for free online, though you can also purchase a paper copy if you wish.
- NLTK: Natural Language Processing with Python, also available online for free at http://www.nltk.org/book
- Think Python, available online for free
All readings should be done BEFORE class. I reserve the right to hold pop quizzes on the reading as part of the participation grade.
Grading
The focus of this course is on practical applications. The grading also reflects this, in that the majority of the grade will come from homework assignments.
class participation | 10% |
homework assignments (11 total; one can be dropped) | 60% |
final presentation | 10% |
final paper/project | 20% |
Homework
The homework assignments will mostly be small programming problems of the sort that linguists frequently deal with. Each assignment will be handed out on a Thursday and will be due at 5 p.m. on the Friday of the following week. Assignments should be submitted electronically. Topics for each assignment will be fully covered in class and/or readings by the Tuesday before the assignment is due at the latest. Assignments will be returned before class on Tuesdays. Homework should include all source code, with meaningful comments, and input and output where appropriate.
Since we will want to discuss homework solutions in class while they are still fresh in our minds, late homework will not be accepted.
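As an illustration of the submission format described above (source code, meaningful comments, and sample input and output), a solution to a hypothetical word-frequency problem might look like the sketch below. The problem, the function name `word_frequencies`, and the sample text are invented for illustration and are not an actual assignment. The code assumes Python 3.

```python
from collections import Counter

def word_frequencies(text):
    """Count how often each whitespace-separated word occurs, ignoring case."""
    # Lowercase first so that "The" and "the" are counted together.
    words = text.lower().split()
    return Counter(words)

# Sample input (hypothetical)
text = "the quick brown fox jumps over the lazy dog the end"

# Print the single most frequent word and its count.
freqs = word_frequencies(text)
for word, count in freqs.most_common(1):
    print(word, count)

# Sample output:
# the 3
```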
Homework schedule
Due date | Topic |
---|---|
Sep 4 | UNIX basics |
Sep 9: DROP DEADLINE, NO TUITION ASSESSED | |
Sep 11 | More UNIX and Regular expressions |
Sep 18 | Basic python and NLTK |
Sep 25 | Conditionals and word frequency |
Oct 2 | Functions and wordlists |
Oct 7: DROP DEADLINE, NO PETITION NECESSARY | |
Oct 9 | Raw text processing |
Oct 16 | Strings, unicode, and regular expressions |
Oct 23 | Review of variables, strings and lists |
Oct 30 | Advanced function techniques, and recursion |
Nov 6 | Final Project topics due (not graded) |
Nov 13 | Tagged corpora |
Nov 20 | More on tagging |
Final presentation / project
DUE NO LATER THAN Wednesday, Dec. 16th, 4 p.m.
The culmination of the course will be a final project of your choosing. For this project, you should choose some task or problem relevant to your research interests, and write a program which solves this problem. You will be asked to give a short presentation outlining the problem and your solution to it, and finally will turn in a working program with thorough documentation.
Depending on the scope of the problem, it may be suitable to only solve a particular subset of the possible scenarios. Please begin to think about possible projects as soon as possible and discuss them with me.
Examples of possible projects:
- Compare speech styles of Shakespeare characters in different translations
- Compare the use of hedges in 2 different genres of text
- Compute phonotactic probabilities from a lexicon
- Write a program which performs a complex search of a linguistic corpus or database and computes some sort of statistics about it
- Write a program which translates a corpus from one transcription/tagging system into another
- Write a program which selects stimuli from a database for a psycholinguistic experiment based on a number of different criteria
- Develop a custom GUI application to control psycholinguistic experiments
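To suggest the scale of a starting point, here is a rough sketch for the phonotactic probability idea. It computes only the relative frequency of each biphone (adjacent pair of segments) in a toy lexicon, which is a simplification: published measures are often position-specific and frequency-weighted. The function name, the phoneme symbols, and the three-word lexicon are assumptions made up for this example, and the code assumes Python 3.

```python
from collections import Counter

def biphone_probabilities(lexicon):
    """Map each adjacent pair of segments to its relative frequency in the lexicon."""
    counts = Counter()
    for word in lexicon:
        # zip pairs each segment with its successor, e.g. (k, ae), (ae, t).
        counts.update(zip(word, word[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Toy lexicon: each word is a list of hypothetical phoneme symbols.
lexicon = [["k", "ae", "t"], ["b", "ae", "t"], ["k", "ae", "b"]]
probs = biphone_probabilities(lexicon)
print(probs[("k", "ae")])
```

A real project would read transcriptions from an existing lexicon file and decide how to treat word boundaries and word frequency.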
Calendar (tentative)
date | topic | reading |
---|---|---|
Tue Aug 25 | Why programming and linguistics? Installing necessary programs | none |
Thu Aug 27 | Intro to UNIX: finding, reading, and writing to files | UNIX Ch. 3 |
Tue Sep 1 | More UNIX intro: pipes | UNIX Ch. 5 |
Thu Sep 3 | Regular expressions | |
Tue Sep 8 | Common UNIX utilities | UNIX Chs. 4, 7 |
Thu Sep 10 | Version control and Subversion | |
Tue Sep 15 | Getting started with Python and the NLTK | NLTK 1-1.1 |
Thu Sep 17 | Variables, Strings, and Lists | NLTK 1.2-1.3 |
Tue Sep 22 | Conditionals | NLTK 1.4-1.7 |
Thu Sep 24 | Text Corpora and Frequency | NLTK 2.1-2.2 |
Tue Sep 29 | Functions and modules | NLTK 2.3-2.4 |
Thu Oct 1 | Semantic relations | NLTK 2.5-2.6 |
Tue Oct 6 | Processing raw text | NLTK 3.1, Dive Into Python 10.6, pp. 143-146 |
Thu Oct 8 | String operations | NLTK 3.2 |
Tue Oct 13 | Unicode and regular expressions | NLTK 3.3-3.4 |
Thu Oct 15 | Tokenizing and normalization | NLTK 3.5-3.7 |
Thu Oct 22 | Shell integration | NLTK 3.9 |
Tue Oct 27 | Programming review | NLTK 4.1-4.3 |
Thu Oct 29 | More on functions | NLTK 4.4-4.5 |
Tue Nov 3 | More on modules and algorithms | NLTK 4.6-4.7 |
Thu Nov 5 | Sample modules | NLTK 4.8-4.9 |
Tue Nov 10 | Tagged Corpora | NLTK 5.1-5.3, Dive Into Python 6.1, pp. 64-66 |
Thu Nov 12 | Automatic tagging | NLTK 5.4-5.7 |
Tue Nov 17 | Supervised classification | NLTK 6.1-6.2 |
Thu Nov 19 | Categorization Evaluation and decision trees | NLTK 6.3-6.4 |
Tue Nov 24 | ENJOY YOUR FALL BREAK | |
Thu Nov 26 | HAPPY THANKSGIVING | |
Tue Dec 1 | Bayes and Maximum entropy classifiers | NLTK 6.5-6.7 |
Thu Dec 3 | Information extraction and chunking | NLTK 7.1-7.2 |
Tue Dec 8 | Student presentations | none |
Thu Dec 10 | Student presentations | none |
Other Policies
Students with disabilities
If you qualify for accommodations because of a disability, please submit to me a letter from Disability Services in a timely manner so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities. Contact: 303-492-8671, Willard 322, and http://www.Colorado.EDU/disabilityservices
If you have a temporary medical condition or injury, see guidelines at http://www.colorado.edu/disabilityservices/go.cgi?select=temporary.html
Disability Services’ letters for students with disabilities indicate legally mandated reasonable accommodations. The syllabus statements and answers to Frequently Asked Questions can be found at http://www.colorado.edu/disabilityservices