\documentclass[12pt, letterpaper]{article}
\newif\ifplastex
\plastexfalse
\usepackage{mathptmx}
\usepackage{ifpdf}
\usepackage{hyperref}
\usepackage{fullpage}
\headsep .2in
\usepackage{longtable}
\ifplastex%
\let\mylink\href
\renewcommand{\hfill}{foo}%
\newcommand{\mysep}{, }
\let\hfill\mysep
\else%
\newcommand{\foo}{bar}
\usepackage{setspace}
% we don't like widows
\widowpenalty=9999
\clubpenalty=9999
\raggedbottom
\makeatletter
\renewcommand\subsection{\@startsection{subsection}{3}{\z@}%
  {-1.5ex\@plus -.2ex \@minus -.4ex}%
  {.6ex \@plus .7ex \@minus .5ex}%
  {\normalfont\large\bfseries}}
\makeatother
\usepackage{tweaklist}
\renewenvironment{description}
  {\list{}{\labelwidth\z@ \itemindent-\leftmargin
    \let\makelabel\descriptionlabel\descripthook}}
  {\endlist}
\renewcommand{\itemhook}{\setstretch{1}
  \setlength{\topsep}{0pt}%
  \setlength{\parskip}{0pt}%
  \setlength{\partopsep}{.5\baselineskip}%
  \setlength{\parsep}{0pt}%
  \setlength{\itemsep}{0pt}%
}
% enumerate environment lengths
\renewcommand{\enumhook}{\setstretch{1}
  \setlength{\topsep}{0pt}%
  \setlength{\parskip}{0pt}%
  \setlength{\partopsep}{0pt}%
  \setlength{\parsep}{0pt}%
  \setlength{\itemsep}{0pt}%
}
% don't print out hyperlinks
\newcommand{\mylink}[2]{}
% put in fancy headers
\usepackage{fancyhdr}
\usepackage{lastpage}
\pagestyle{fancy}
\fancyhf{} % delete current setting for header and footer
\fancyhead[R]{Ling 5200 Fall 2009 syllabus\\last updated \today}
\makeatletter
\fancyhead[L]{\theauthor\\\texttt{robfelty@gmail.com}}
%\fancyhead[RE,RO]{\rightmark\qquad\thepage}
%\fancyhead[LE,LO]{\leftmark}
%\fancyfoot[L]{\small \texttt{http://robfelty.com/academic/cv.html}}
\fancyfoot[L]{}
\fancyfoot[R]{\bf \thepage~of \pageref{LastPage}}
\renewcommand{\headrulewidth}{0.5pt}
\renewcommand{\footrulewidth}{0pt}
\addtolength{\headheight}{16pt} % make space for the rule
\fancypagestyle{plain}{%
  \fancyhead{} % get rid of headers on plain pages
  \fancyfoot[C]{\thepage}
  \renewcommand{\headrulewidth}{0pt} % and the line
}
\makeatother
\hypersetup{
  colorlinks=true,
  bookmarksnumbered,
  bookmarkstype={toc},
  bookmarksopen={true},
  bookmarksopenlevel={1},
  pdfstartview={FitH},
  urlcolor={blue},
  citecolor={black},
  linkcolor={black},
  pagecolor={black},
  pdfpagemode={UseOutlines},
  breaklinks=true,
  %hyperindex=true,
  %pagebackref=true
}
\fi
\setcounter{secnumdepth}{0}
%
%
% Robert Felty\\
% Speech Research Laboratory\\
% Department of Psychological and Brain Sciences\\
% Indiana University \\
% 1101 E. 10th St.\\
% Bloomington, IN 47405--1301\\
%
% Curriculum Vitae\\
% \\
% \\
%
%
%
%
%
\input{../slides/macros.tex}
\title{Linguistics 5200: Introduction to Computational Corpus Linguistics}
\author{Robert Felty}
\date{Fall 2009}
\providecommand\theauthor{Robert Felty}
\newcommand\affiliation{University of Colorado}
\providecommand\address{%
\ifplastex\else%
\begin{minipage}{.5\textwidth}%
\fi
%Speech Research Laboratory\\
%Department of Psychological and Brain Sciences\\
\ifplastex\else%
%Indiana University
\end{minipage}%
\begin{minipage}{.5\textwidth}%
\flushright
\fi
%1101 E. 10th St.\\
%Bloomington, IN 47405--1301\\
\ifplastex\else%
\texttt{robfelty@gmail.com}
\end{minipage}%
\fi
}
\begin{document}
\ifplastex
% \begin{tabular}{l r}
% \theauthor & Syllabus\\
% \affiliation & \today
% \label{metainfo}
% \end{tabular}
\else
%\vspace*{-2em}
\maketitle
\fi
\noindent\begin{tabular}{l l}
\textbf{Instructor} & Robert Felty, Ph.D.\\
\textbf{Class meeting times} & Tuesday and Thursday, 12:30--1:45 p.m., MUEN E118\\
\textbf{Office} & 292 Hellems\\
\textbf{Office hours} & Tuesday 10:00--11:00 a.m., Thursday 3:00--4:00 p.m., or by appointment\\
\ifplastex\else%
\textbf{Course website} & \url{http://robfelty.com/teaching/ling5200Fall2009}\\
\fi
\end{tabular}

\subsection{Overview}
This course gives linguists a practical foundation in programming, so that they can take efficient advantage of existing tools and also create their own tools for a variety of linguistic tasks, including searching corpora and databases, compiling statistics from databases, analyzing experimental data, and preparing stimuli for experiments. The course focuses primarily on the Python programming language, but also covers some commonly used UNIX utilities that are handy for linguists.

\subsection{Goals}
By the end of the course, you should:
\begin{itemize}
\item Feel comfortable using a UNIX/Linux/Mac command line
\item Understand key programming concepts such as conditionals, iteration, recursion, functions, and objects
\item Be able to write your own programs that help you answer linguistic questions and solve everyday problems
\item Be acquainted with some computational and corpus linguistics topics such as part-of-speech tagging, concordances, and regular expressions
\end{itemize}

\subsection{Textbooks}
For most of the course we will use the NLTK book. We will also use the UNIX book to some extent (it is a good reference as well), and the Think Python book occasionally.
The NLTK book and the Think Python book are available for free online, though you can also purchase a paper copy if you wish.
\begin{itemize}
\item UNIX \dash \href{http://www.amazon.com/Learning-UNIX-Operating-System-Fifth/dp/0596002610/ref=sr_1_3?ie=UTF8&s=books&qid=1248297870&sr=8-3}{Learning the UNIX Operating System}
\item NLTK \dash \href{http://www.amazon.com/Natural-Language-Processing-Python-Steven/dp/0596516495/ref=sr_1_1?ie=UTF8&s=books&qid=1250282549&sr=8-1}{Natural Language Processing with Python}, also available online for free at \url{http://www.nltk.org/book}
\item \href{http://www.greenteapress.com/thinkpython/thinkpython.html}{Think Python}, available online for free
\end{itemize}
\textbf{All readings should be done BEFORE class. I reserve the right to give pop quizzes on the readings as part of the participation grade.}

\subsection{Grading}
The focus of this course is on practical applications, and the grading reflects this: the majority of the grade comes from homework assignments.

\begin{tabular}{l r}
class participation & 10\%\\
homework assignments (11 total; one may be dropped) & 60\%\\
final presentation & 10\%\\
final paper/project & 20\%\\
\label{grading}
\end{tabular}

\subsubsection{Homework}
The homework assignments will mostly be small programming problems of the sort that linguists frequently deal with. Homework will be assigned on Thursdays, will be due at 5 p.m. on the Friday of the following week, and should be submitted electronically. The topics for each assignment will be fully covered in class and/or in the readings no later than the Tuesday before the assignment is due. Assignments will be returned before class on Tuesdays. Homework should include all source code, with meaningful comments, and input and output where appropriate. Since we will want to discuss homework solutions in class while they are still fresh in our minds, \textbf{late homework will not be accepted}.
\subsubsection{Homework schedule}
\begin{center}
\begin{tabular}{l l}
\multicolumn{1}{c}{\textbf{Due date}} & \multicolumn{1}{c}{\textbf{Topic}}\\
Sep 4 & UNIX basics\\
\multicolumn{2}{c}{\textbf{Sep 9: DROP DEADLINE, NO TUITION ASSESSED}}\\
Sep 11 & More UNIX and regular expressions\\
Sep 18 & Basic Python and NLTK\\
Sep 25 & Conditionals and word frequency\\
Oct 2 & Functions and wordlists\\
\multicolumn{2}{c}{\textbf{Oct 7: DROP DEADLINE, NO PETITION NECESSARY}}\\
Oct 9 & Raw text processing\\
Oct 16 & Strings, Unicode, and regular expressions\\
Oct 23 & Review of variables, strings, and lists\\
Oct 30 & Advanced function techniques and recursion\\
Nov 6 & Final project topics due (not graded)\\
Nov 13 & Tagged corpora\\
Nov 20 & More on tagging\\
\end{tabular}
\end{center}

\subsection{Final presentation / project}
\textbf{DUE NO LATER THAN Wednesday, Dec.~16, 4 p.m.}

The culmination of the course will be a final project of your choosing. For this project, you should choose a task or problem relevant to your research interests and write a program that solves it. You will give a short presentation outlining the problem and your solution, and you will then turn in a working program with thorough documentation. Depending on the scope of the problem, it may be appropriate to solve only a particular subset of the possible cases. Please start thinking about possible projects early, and discuss them with me.
Examples of possible projects:
\begin{itemize}
\item Compare speech styles of Shakespeare characters in different translations
\item Compare the use of hedges in two different genres of text
\item Compute phonotactic probabilities from a lexicon
\item Write a program that performs a complex search of a linguistic corpus or database and computes statistics about the results
\item Write a program that translates a corpus from one transcription/tagging system into another
\item Write a program that selects stimuli from a database for a psycholinguistic experiment based on several different criteria
\item Develop a custom GUI application to control psycholinguistic experiments
\end{itemize}

\subsection{Calendar (tentative)}
\begin{longtable}{l p{.4\textwidth} p{.3\textwidth}}
\caption{Course calendar, showing topics and readings}\\
\label{calendar}
\textbf{date} & \textbf{topic} & \textbf{reading} \\
\endfirsthead
\textbf{date} & \textbf{topic} & \textbf{reading} \\
\endhead
Tue Aug 25 & Why programming and linguistics? Installing necessary programs & none\\
Thu Aug 27 & Intro to UNIX: finding, reading, and writing to files & UNIX Ch.~3\\
Tue Sep 1 & More UNIX intro: pipes & UNIX Ch.~5\\
Thu Sep 3 & Regular expressions & \href{http://robfelty.com/subversion/ling5200/regex/regex.pdf}{regex.pdf}\\
Tue Sep 8 & Common UNIX utilities & UNIX Chs.~4, 7\\
Thu Sep 10 & Version control and Subversion & \\
Tue Sep 15 & Getting started with Python and the NLTK & NLTK 1--1.1\\
Thu Sep 17 & Variables, strings, and lists\newline Word frequency & NLTK 1.2--1.3\\
Tue Sep 22 & Conditionals\newline Natural language processing & NLTK 1.4--1.7\\
Thu Sep 24 & Text corpora and frequency & NLTK 2.1--2.2\\
Tue Sep 29 & Functions and modules\newline Wordlists & NLTK 2.3--2.4\\
Thu Oct 1 & Semantic relations & NLTK 2.5--2.6\\
Tue Oct 6 & Processing raw text & NLTK 3.1, \href{http://diveintopython.org/}{Dive Into Python} 10.6, pp.~143--146\\
Thu Oct 8 & String operations & NLTK 3.2\\
Tue Oct 13 & Unicode and regular expressions & NLTK 3.3--3.4\\
Thu Oct 15 & Tokenizing and normalization & NLTK 3.5--3.7\\
Tue Oct 22 & Shell integration\newline More on lists and strings & NLTK 3.9\\
Thu Oct 27 & Programming review & NLTK 4.1--4.3\\
Tue Oct 29 & More on functions & NLTK 4.4--4.5\\
Thu Nov 3 & More on modules and algorithms & NLTK 4.6--4.7\\
Tue Nov 5 & Sample modules & NLTK 4.8--4.9\\
Thu Nov 10 & Tagged corpora\newline Python dictionaries\newline Handling exceptions & NLTK 5.1--5.3, \href{http://diveintopython.org/}{Dive Into Python} 6.1, pp.~64--66\\
Tue Nov 12 & Automatic tagging & NLTK 5.4--5.7\\
Thu Nov 17 & Supervised classification & NLTK 6.1--6.2\\
Tue Nov 19 & Categorization evaluation and decision trees & NLTK 6.3--6.4\\
Tue Nov 24 & ENJOY YOUR FALL BREAK & \\
Thu Nov 26 & HAPPY THANKSGIVING & \\
Tue Dec 1 & Naive Bayes and maximum entropy classifiers & NLTK 6.5--6.7\\
Thu Dec 3 & Information extraction and chunking & NLTK 7.1--7.2\\
Tue Dec 8 & Student presentations & none\\
Thu Dec 10 & Student presentations & none\\
\end{longtable}

\section{Other Policies}
\subsection{Students with disabilities}
If you qualify for accommodations because of a disability, please submit to me a letter from Disability Services in a timely manner so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities. Contact: 303-492-8671, Willard 322, and \url{http://www.Colorado.EDU/disabilityservices}.

If you have a temporary medical condition or injury, see the guidelines at \url{http://www.colorado.edu/disabilityservices/go.cgi?select=temporary.html}.

Disability Services' letters for students with disabilities indicate legally mandated reasonable accommodations. The syllabus statements and answers to Frequently Asked Questions can be found at \url{http://www.colorado.edu/disabilityservices}.
\end{document}