\documentclass[12pt, letterpaper]{article}
\newif\ifplastex
\plastexfalse
\usepackage{mathptmx}
\usepackage{ifpdf}
\usepackage{hyperref}
\usepackage{fullpage}
\headsep .2in
\usepackage{longtable}
\ifplastex%
\let\mylink\href
\renewcommand{\hfill}{foo}%
\newcommand{\mysep}{, }
\let\hfill\mysep
\else%
\newcommand{\foo}{bar}
\usepackage{setspace}
% we don't like widows
\widowpenalty=9999
\clubpenalty=9999
\raggedbottom
\makeatletter
\renewcommand\subsection{\@startsection{subsection}{3}{\z@}%
{-1.5ex\@plus -.2ex \@minus -.4ex}%
{.6ex \@plus .7ex \@minus .5ex}%
{\normalfont\large\bfseries}}
\makeatother
\usepackage{tweaklist}
\renewenvironment{description}
{\list{}{\labelwidth\z@ \itemindent-\leftmargin
\let\makelabel\descriptionlabel\descripthook}}
{\endlist}
\renewcommand{\itemhook}{\setstretch{1}
\setlength{\topsep}{0pt}%
\setlength{\parskip}{0pt}%
\setlength{\partopsep}{.5\baselineskip}%
\setlength{\parsep}{0pt}%
\setlength{\itemsep}{0pt}%
}
% enumerate environment lengths
\renewcommand{\enumhook}{\setstretch{1}
\setlength{\topsep}{0pt}%
\setlength{\parskip}{0pt}%
\setlength{\partopsep}{0pt}%
\setlength{\parsep}{0pt}%
\setlength{\itemsep}{0pt}%
}
% don't print out hyperlinks
\newcommand{\mylink}[2]{}
% put in fancy headers
\usepackage{fancyhdr}
\usepackage{lastpage}
\pagestyle{fancy}
\fancyhf{} % delete current setting for header and footer
\fancyhead[R]{Ling 5200 Fall 2009 syllabus\\last updated \today}
\makeatletter
\fancyhead[L]{\theauthor\\\texttt{robfelty@gmail.com}}
%\fancyhead[RE,RO]{\rightmark\qquad\thepage}
%\fancyhead[LE,LO]{\leftmark}
%\fancyfoot[L]{\small \texttt{http://robfelty.com/academic/cv.html}}
\fancyfoot[L]{}
\fancyfoot[R]{\bfseries \thepage~of \pageref{LastPage}}
\renewcommand{\headrulewidth}{0.5pt}
\renewcommand{\footrulewidth}{0pt}
\addtolength{\headheight}{16pt} % make space for the rule
\fancypagestyle{plain}{%
\fancyhead{} % get rid of headers on plain pages
\fancyfoot[C]{\thepage}
\renewcommand{\headrulewidth}{0pt} % and the line
}
\makeatother
\hypersetup{
colorlinks=true,
bookmarksnumbered,
bookmarkstype={toc},
bookmarksopen={true},
bookmarksopenlevel={1},
pdfstartview={FitH},
urlcolor={blue},
citecolor={black},
linkcolor={black},
pagecolor={black},
pdfpagemode={UseOutlines},
breaklinks=true,
%hyperindex=true,
%pagebackref=true
}
\fi
\setcounter{secnumdepth}{0}
\input{../slides/macros.tex}
\title{Linguistics 5200 --- Introduction to Computational Corpus Linguistics}
\author{Robert Felty}
\date{Fall 2009}
\providecommand\theauthor{Robert Felty}
\newcommand\affiliation{University of Colorado}
\providecommand\address{%
\ifplastex\else%
\begin{minipage}{.5\textwidth}%
\fi
%Speech Research Laboratory\\
%Department of Psychological and Brain Sciences\\
\ifplastex\else%
%Indiana University
\end{minipage}%
\begin{minipage}{.5\textwidth}%
\flushright
\fi
%1101 E. 10th St.\\
%Bloomington, IN 47405--1301\\
\ifplastex\else%
\texttt{robfelty@gmail.com}
\end{minipage}%
\fi
}
\begin{document}
\ifplastex
% \begin{tabular}{l r}
% \theauthor & Syllabus\\
% \affiliation & \today
% \label{metainfo}
% \end{tabular}
\else
%\vspace*{-2em}
\maketitle
\fi
\noindent\begin{tabular}{l l}
\textbf{Instructor} & Robert Felty, Ph.D.\\
\textbf{Class meeting times} & Tuesday and Thursday, 12:30-13:45, MUEN E118 \\
\textbf{Office} & 292 Hellems\\
\textbf{Office Hours} & Tuesday, 10:00--11:00 a.m.; Thursday, 3:00--4:00 p.m.; or by
appointment \\
\ifplastex\else%
\textbf{Course website} & \url{http://robfelty.com/teaching/ling5200Fall2009}\\
\fi
\end{tabular}
\subsection{Overview}
This course is designed to give linguists a practical foundation in
programming, which will allow them to efficiently take advantage of existing
tools, as well as create their own tools for a variety of linguistic tasks,
including searching corpora and databases, compiling statistics from
databases, analyzing experimental data, and preparing stimuli for experiments.
The course will focus primarily on the Python programming language, but will
also cover some of the commonly used Unix utilities that are handy for
linguists.
\subsection{Goals}
By the end of the course, you should:
\begin{itemize}
\item Feel comfortable using a unix/linux/mac command line
\item Understand key programming concepts such as conditionals, iteration,
recursion, functions, and objects
\item Be able to write your own programs which can help you to answer
linguistic questions and solve everyday problems
\item Become acquainted with some computational and corpus linguistics
topics such as part of speech tagging, concordance, and regular expressions
\end{itemize}
\subsection{Textbooks}
For most of the course we will use the NLTK book. We will also use the UNIX
book at times (it is a good reference), as well as the Think Python book
occasionally. The NLTK book and the Think Python book are available for free
online, though you can also purchase a paper copy if you wish.
\begin{itemize}
\item UNIX \dash
\href{http://www.amazon.com/Learning-UNIX-Operating-System-Fifth/dp/0596002610/ref=sr_1_3?ie=UTF8&s=books&qid=1248297870&sr=8-3}{Learning the UNIX operating system}
\item NLTK \dash
\href{http://www.amazon.com/Natural-Language-Processing-Python-Steven/dp/0596516495/ref=sr_1_1?ie=UTF8&s=books&qid=1250282549&sr=8-1}{Natural Language Processing with Python}
also available online for free at \url{http://www.nltk.org/book}
\item \href{http://www.greenteapress.com/thinkpython/thinkpython.html}{Think
Python} available online for free
\end{itemize}
\textbf{All readings should be done BEFORE class. I reserve the right to hold
pop quizzes on the reading as part of the participation grade.}
\subsection{Grading}
The focus of this course is on practical applications. The grading also
reflects this, in that the majority of the grade will come from homework
assignments.
\begin{tabular}{l r}
class participation & 10\%\\
homework assignments (11 total --- one can be dropped) & 60\%\\
final presentation & 10\%\\
final paper/project & 20\%\\
\label{grading}
\end{tabular}
\subsubsection{Homework}
The homework assignments will mostly be small programming problems of the sort
that linguists frequently deal with. All homework assignments will be assigned
on Thursday, and will be due the following Friday at 5 p.m., and should be
submitted electronically. Topics for each assignment will be fully covered in
class and/or readings by the Tuesday before the assignment is due at the
latest. Assignments will be returned before class on Tuesdays. Homework should
include all source code, with meaningful comments, and input and output where
appropriate.
Since we will want to discuss homework solutions in class while they are still
fresh in our minds, \textbf{late homework will not be accepted}.
\subsubsection{Homework schedule}
\begin{center}
\begin{tabular}{l l}
\multicolumn{1}{c}{\textbf{Due date}} & \multicolumn{1}{c}{\textbf{Topic}}\\
Sep 4 & UNIX basics\\
\multicolumn{2}{c}{\textbf{Sep 9 - DROP DEADLINE - NO TUITION ASSESSED}}\\
Sep 11 & More UNIX and Regular expressions\\
Sep 18 & Basic Python and NLTK\\
Sep 25 & Conditionals and word frequency\\
Oct 2 & Functions and wordlists\\
\multicolumn{2}{c}{\textbf{Oct 7 - DROP DEADLINE - NO PETITION NECESSARY}}\\
Oct 9 & Raw text processing\\
Oct 16 & Strings, Unicode, and regular expressions\\
Oct 23 & Review of variables, strings and lists\\
Oct 30 & Advanced function techniques and recursion\\
Nov 6 & Final Project topics due (not graded)\\
Nov 13 & Tagged corpora\\
Nov 20 & More on tagging\\
\end{tabular}
\end{center}
\subsection{Final presentation / project}
\textbf{DUE NO LATER THAN Wednesday, Dec. 16th, 4 p.m.}

The culmination of the course will be a final project of your choosing. For
this project, you should choose some task or problem relevant to your research
interests, and write a program which solves this problem. You will be asked to
give a short presentation outlining the problem and your solution to it, and
finally will turn in a working program with thorough documentation.

Depending on the scope of the problem, it may be suitable to only solve a
particular subset of the possible scenarios. Please begin to think about
possible projects as soon as possible and discuss them with me.

Examples of possible projects:
\begin{itemize}
\item Compare speech styles of Shakespeare characters in different
translations
\item Compare the use of hedges in 2 different genres of text
\item Compute phonotactic probabilities from a lexicon
\item Write a program which performs a complex search of a linguistic corpus
or database and computes some sort of statistics about it
\item Write a program which translates a corpus from one
transcription/tagging system into another
\item Write a program which selects stimuli from a database for a
psycholinguistic experiment based on a number of different criteria
\item Develop a custom GUI application to control psycholinguistic
experiments
\end{itemize}
\subsection{Calendar (tentative)}
\begin{longtable}{l p{.4\textwidth} p{.3\textwidth}}
\caption{Course calendar, showing topics and readings}\label{calendar}\\
\textbf{date} & \textbf{topic} & \textbf{reading} \\
\endfirsthead
\textbf{date} & \textbf{topic} & \textbf{reading} \\
\endhead
Tue Aug 25 & Why programming and linguistics? Installing necessary programs & none \\
Thu Aug 27 & Intro to UNIX --- finding, reading, and writing to files& UNIX
Ch. 3\\
Tue Sep 1 & More Unix intro - pipes & UNIX Ch. 5 \\
Thu Sep 3 & Regular expressions &
\href{http://robfelty.com/subversion/ling5200/regex/regex.pdf}{regex.pdf}\\
Tue Sep 8 & Common unix utilities & UNIX Chs. 4, 7\\
Thu Sep 10 & Version control and subversion& \\
Tue Sep 15 & Getting started with Python and the NLTK & NLTK 1-1.1\\
Thu Sep 17 & Variables, Strings, and Lists\newline Word frequency& NLTK 1.2-1.3 \\
Tue Sep 22 & Conditionals\newline Natural Language Processing & NLTK 1.4-1.7\\
Thu Sep 24 & Text Corpora and Frequency& NLTK 2.1-2.2 \\
Tue Sep 29 & Functions and modules\newline wordlists& NLTK 2.3-2.4 \\
Thu Oct 1 & Semantic relations& NLTK 2.5-2.6 \\
Tue Oct 6 & Processing raw text& NLTK 3.1,
\href{http://diveintopython.org/}{Dive Into Python} 10.6, pp. 143--146\\
Thu Oct 8 & String operations& NLTK 3.2 \\
Tue Oct 13 & Unicode and regular expressions & NLTK 3.3-3.4 \\
Thu Oct 15 & Tokenizing and normalization& NLTK 3.5-3.7 \\
Thu Oct 22 & Shell integration\newline More on Lists and Strings& NLTK 3.9\\
Tue Oct 27 & Programming review & NLTK 4.1-4.3 \\
Thu Oct 29 & More on functions& NLTK 4.4-4.5\\
Tue Nov 3 & More on modules and algorithms& NLTK 4.6-4.7 \\
Thu Nov 5 & Sample modules& NLTK 4.8-4.9 \\
Tue Nov 10 & Tagged Corpora\newline Python dictionaries\newline Handling
exceptions& NLTK 5.1-5.3, \href{http://diveintopython.org/}{Dive Into Python}
6.1, pp. 64--66 \\
Thu Nov 12 & Automatic tagging & NLTK 5.4-5.7 \\
Tue Nov 17 & Supervised classification& NLTK 6.1-6.2 \\
Thu Nov 19 & Categorization Evaluation and decision trees& NLTK 6.3-6.4 \\
Tue Nov 24 & ENJOY YOUR FALL BREAK& \\
Thu Nov 26 &HAPPY THANKSGIVING & \\
Tue Dec 1 & Bayes and Maximum entropy classifiers & NLTK 6.5-6.7 \\
Thu Dec 3 & Information extraction and chunking & NLTK 7.1-7.2\\
Tue Dec 8 & Student presentations& none \\
Thu Dec 10 & Student presentations& none \\
\end{longtable}
\section{Other Policies}
\subsection{Students with disabilities}
If you qualify for accommodations because of a disability, please submit to
me a letter from Disability Services in a timely manner so that your needs
can be addressed. Disability Services determines accommodations based on
documented disabilities. Contact: 303-492-8671, Willard 322, and
\url{http://www.Colorado.EDU/disabilityservices}.

If you have a temporary medical condition or injury, see guidelines at
\url{http://www.colorado.edu/disabilityservices/go.cgi?select=temporary.html}.

Disability Services' letters for students with disabilities indicate legally
mandated reasonable accommodations. The syllabus statements and answers to
Frequently Asked Questions can be found at
\url{http://www.colorado.edu/disabilityservices}.
\end{document}