Instructor

Robert Felty, Ph.D.

Class meeting times

Tuesday and Thursday, 12:30-13:45, MUEN E118

Office

292 Hellems

Office Hours

Tuesday, 10:00-11:00; Thursday, 3:00-4:00 p.m.; or by appointment

Overview

This course is designed to give linguists a practical foundation in programming, which will allow them to efficiently take advantage of existing tools, as well as create their own tools for a variety of linguistic tasks, including searching corpora and databases, compiling statistics from databases, analyzing experimental data, and preparing stimuli for experiments.

The course will focus primarily on the Python programming language, but will also cover some of the commonly used unix utilities which are handy for linguists.

Goals

By the end of the course, you should be able to:

  • Feel comfortable using a unix/linux/mac command line

  • Understand key programming concepts such as conditionals, iteration, recursion, functions, and objects

  • Be able to write your own programs which can help you to answer linguistic questions and solve everyday problems

  • Become acquainted with some computational and corpus linguistics topics such as part of speech tagging, concordance, and regular expressions

Textbooks

For most of the course we will use the NLTK book. We will also use the UNIX book at times (it is a good reference as well) and, occasionally, the Think Python book. The NLTK book and the Think Python book are available for free online, though you can also purchase a paper copy if you wish.

All readings should be done BEFORE class. I reserve the right to hold pop quizzes on the reading as part of the participation grade.

Grading

The focus of this course is on practical applications. The grading also reflects this, in that the majority of the grade will come from homework assignments.

class participation: 10%
homework assignments (11 total; one can be dropped): 60%
final presentation: 10%
final paper/project: 20%

Homework

The homework assignments will mostly be small programming problems of the sort that linguists frequently deal with. Homework will be assigned on Thursdays, is due the following Friday at 5 p.m., and should be submitted electronically. The topics for each assignment will be covered in class and/or in the readings no later than the Tuesday before the assignment is due. Assignments will be returned before class on Tuesdays. Homework should include all source code, with meaningful comments, and input and output where appropriate.

Since we will want to discuss homework solutions in class while they are still fresh in our minds, late homework will not be accepted.

Homework schedule

Due date | Topic
Sep 4 | UNIX basics
Sep 9 | DROP DEADLINE – NO TUITION ASSESSED
Sep 11 | More UNIX and regular expressions
Sep 18 | Basic Python and NLTK
Sep 25 | Conditionals and word frequency
Oct 2 | Functions and wordlists
Oct 7 | DROP DEADLINE – NO PETITION NECESSARY
Oct 9 | Raw text processing
Oct 16 | Strings, Unicode, and regular expressions
Oct 23 | Review of variables, strings, and lists
Oct 30 | Advanced function techniques and recursion
Nov 6 | Final project topics due (not graded)
Nov 13 | Tagged corpora
Nov 20 | More on tagging

Final presentation / project

DUE NO LATER THAN Wednesday, Dec. 16th, 4 p.m.

The culmination of the course will be a final project of your choosing. For this project, you should choose some task or problem relevant to your research interests, and write a program which solves this problem. You will be asked to give a short presentation outlining the problem and your solution to it, and finally will turn in a working program with thorough documentation.

Depending on the scope of the problem, it may be suitable to only solve a particular subset of the possible scenarios. Please begin to think about possible projects as soon as possible and discuss them with me.

Examples of possible projects:

  • Compare speech styles of Shakespeare characters in different translations

  • Compare the use of hedges in two different genres of text

  • Compute phonotactic probabilities from a lexicon

  • Write a program which performs a complex search of a linguistic corpus or database and computes some sort of statistics about it

  • Write a program which translates a corpus from one transcription/tagging system into another

  • Write a program which selects stimuli from a database for a psycholinguistic experiment based on a number of different criteria

  • Develop a custom GUI application to control psycholinguistic experiments
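To give a sense of scale for one of the ideas above (computing phonotactic probabilities from a lexicon), here is a minimal sketch rather than a required approach. It assumes Python 3, a hypothetical toy lexicon in which each character stands for one segment, and a simple definition of biphone probability: the chance of a segment given the segment before it. The function name `biphone_probabilities` is made up for illustration.

```python
from collections import Counter


def biphone_probabilities(lexicon):
    """Return P(second segment | first segment) for each adjacent pair."""
    pair_counts = Counter()   # how often each (first, second) pair occurs
    first_counts = Counter()  # how often each segment starts a pair
    for word in lexicon:
        # Assumption: one character per segment, so adjacent characters
        # are adjacent segments.
        for first, second in zip(word, word[1:]):
            pair_counts[(first, second)] += 1
            first_counts[first] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical toy lexicon, not real data.
toy_lexicon = ["kat", "kap", "tak", "pat"]
probs = biphone_probabilities(toy_lexicon)
print(probs[("a", "t")])  # 0.5: "a" is followed by "t" in 2 of its 4 pairs
```

A real project would read a transcribed lexicon from a file and would probably weight the counts by word frequency, but the core counting step would look much the same.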

Calendar (tentative)

Table 1: Course calendar, showing topics and readings

Date | Topic | Reading
Tue Aug 25 | Why programming and linguistics? Installing necessary programs | none
Thu Aug 27 | Intro to UNIX: finding, reading, and writing to files | UNIX Ch. 3
Tue Sep 1 | More UNIX intro: pipes | UNIX Ch. 5
Thu Sep 3 | Regular expressions | regex.pdf
Tue Sep 8 | Common unix utilities | UNIX Chs. 4, 7
Thu Sep 10 | Version control and Subversion |
Tue Sep 15 | Getting started with Python and the NLTK | NLTK 1-1.1
Thu Sep 17 | Variables, Strings, and Lists; Word frequency | NLTK 1.2-1.3
Tue Sep 22 | Conditionals; Natural Language Processing | NLTK 1.4-1.7
Thu Sep 24 | Text Corpora and Frequency | NLTK 2.1-2.2
Tue Sep 29 | Functions and modules; wordlists | NLTK 2.3-2.4
Thu Oct 1 | Semantic relations | NLTK 2.5-2.6
Tue Oct 6 | Processing raw text | NLTK 3.1; Dive Into Python 10.6, pp. 143-146
Thu Oct 8 | String operations | NLTK 3.2
Tue Oct 13 | Unicode and regular expressions | NLTK 3.3-3.4
Thu Oct 15 | Tokenizing and normalization | NLTK 3.5-3.7
Thu Oct 22 | Shell integration; More on Lists and Strings | NLTK 3.9
Tue Oct 27 | Programming review | NLTK 4.1-4.3
Thu Oct 29 | More on functions | NLTK 4.4-4.5
Tue Nov 3 | More on modules and algorithms | NLTK 4.6-4.7
Thu Nov 5 | Sample modules | NLTK 4.8-4.9
Tue Nov 10 | Tagged Corpora; Python dictionaries; Handling exceptions | NLTK 5.1-5.3; Dive Into Python 6.1, pp. 64-66
Thu Nov 12 | Automatic tagging | NLTK 5.4-5.7
Tue Nov 17 | Supervised classification | NLTK 6.1-6.2
Thu Nov 19 | Categorization Evaluation and decision trees | NLTK 6.3-6.4
Tue Nov 24 | ENJOY YOUR FALL BREAK |
Thu Nov 26 | HAPPY THANKSGIVING |
Tue Dec 1 | Bayes and Maximum entropy classifiers | NLTK 6.5-6.7
Thu Dec 3 | Information extraction and chunking | NLTK 7.1-7.2
Tue Dec 8 | Student presentations | none
Thu Dec 10 | Student presentations | none

Other Policies

Students with disabilities

If you qualify for accommodations because of a disability, please submit to me a letter from Disability Services in a timely manner so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities. Contact: 303-492-8671, Willard 322, and http://www.Colorado.EDU/disabilityservices

If you have a temporary medical condition or injury, see guidelines at http://www.colorado.edu/disabilityservices/go.cgi?select=temporary.html

Disability Services’ letters for students with disabilities indicate legally mandated reasonable accommodations. The syllabus statements and answers to Frequently Asked Questions can be found at http://www.colorado.edu/disabilityservices