LG 467 Computers in Linguistics (2021/1)

Course information

Instructor:    Sakol Suethanapornkul
Email:         suesakol@staff.tu.ac.th
Office Hours:  W & TH from 1 p.m. to 4 p.m.
Room:          Online via Zoom
Time:          W 9:30-12:30
Credits:       3/48 hours


The course syllabus provides a general plan for the course; some modifications may be necessary in response to students’ needs and classroom interaction.

 

Course Description

General basic concepts in computational linguistics and the importance of using language corpora. Basic practice in using computers to collect and analyze language and linguistic data.

The course introduces students to computational linguistics (CL) and natural language processing (NLP), with a particular emphasis on the concepts, models, and algorithms that aid the analysis of natural languages. It blends theoretical discussion with hands-on training in text processing, including basic programming in Python. We will begin by learning how computers encode written and spoken language, and then examine applications of CL/NLP methods to a range of linguistic phenomena. Topics include tokenization, part-of-speech tagging, parsing, and corpus exploration, among others.

 

Course Objectives

By the end of this course, you should be able to do the following:

  1. demonstrate a basic understanding of how computers work with natural languages and support language-related tasks;
  2. recognize and describe key concepts in CL/NLP behind current real-world applications; and
  3. write programs in Python to manipulate and analyze language data.

 

Classroom-based Expectations

  1. Collaboration is encouraged. Students are strongly encouraged to collaborate on their assignments. However, each student must submit their own original work (both homework assignments and programming exercises). Additionally, students who have worked with their friends must acknowledge their collaboration in their submission, unless stated otherwise in the assignment.

  2. Timely submission of assignments is expected. All work must be submitted by the due date stated in the syllabus. I reserve the right to refuse any late assignment. Please contact me early if you anticipate trouble completing an assignment on time.

  3. Plagiarism is not tolerated. Plagiarizing other people’s work in an assignment results in an automatic zero for that assignment.

  4. Completion of weekly readings before class is critical. You are expected to complete readings before class and come to class prepared to ask questions and/or participate in class discussion.

 

Prerequisites

Successful completion of LG 211 Introduction to Linguistics (with a grade of C or above) is required. You do not need any programming experience to complete this course. You are expected, however, to bring your own laptop to class and ensure that it runs Windows 10 or macOS 10.15 (Catalina) or above. Tablets and smartphones are generally not recommended. Talk to me if you need help with the university’s laptop assistance program. 🥰

 

Textbooks

  1. Dickinson, M. et al. (2013). Language and computers. West Sussex, UK: Wiley-Blackwell. [L & C for short]
  2. Jurafsky, D., & Martin, J. H. (2020). Speech and language processing (3rd ed. online draft). [J & M for short]
  3. Bird, S. et al. Natural language processing with Python (online edition for Python 3). [NLTK for short]

 

Course organization

In most cases, each class meeting will have a lecture component and a practice component. During the first half of each class, we will cover topics presented in the textbooks (L & C or J & M). In the second half, you will receive hands-on training with Python and the Natural Language Toolkit (NLTK).

Course management is done through this course website and MS Teams. You can obtain course materials (syllabus, slides, assignments, etc.) from this website. Assignment submissions, class announcements, and grades will be handled through MS Teams.

 

Assignments and Grading

Grading scale:

In this course, I assign grades based on how well students perform. The grading scheme outlines key letter grades:

| Grade | Points (%) | Grade | Points (%) |
|-------|------------|-------|------------|
| A     | 85-100     | C     | 65-69.99   |
| B+    | 80-84.99   | D+    | 60-64.99   |
| B     | 75-79.99   | D     | 55-59.99   |
| C+    | 70-74.99   | F     | 0-54.99    |

Grade breakdown:

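| Requirements          | Percent | Points                      | Note                                  |
|-----------------------|---------|-----------------------------|---------------------------------------|
| Homework assignments  | 50%     | 10 points each, 6 in total  |                                       |
| Programming exercises | 40%     | 10 points each, 5-6 planned | Exercise with lowest mark dropped [1] |
| Participation         | 5%      |                             |                                       |
| Attendance            | 5%      |                             | Up to one unexcused absence allowed   |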

Overview:

  • Homework (i.e., assignments and exercises) is assigned on an approximately weekly basis. Typically, it will be distributed on Wednesday after class and due Saturday night of the same week. Extensions may be granted only in special circumstances (e.g., family or medical emergencies). Please communicate these to me in advance whenever possible.

  • Both assignments and exercises will be weighted equally in their respective categories. The assignment with the lowest mark will be dropped.

  • To receive full credit for attendance and participation, you are expected to come to class on time and engage meaningfully with the class content, including participating in class discussion.

  • You are allowed no more than one unexcused absence without affecting your course grade. This does not include excused absences due to illness or religious observances.
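To make the arithmetic behind the grade breakdown concrete, here is a minimal sketch of how a final percentage and letter grade could be computed. It is an illustration only, not the official calculation: the function names are hypothetical, and it assumes that each category is averaged as a percentage before weighting, that only the lowest programming exercise is dropped (as the breakdown table notes), and that participation and attendance are each scored from 0 to 100.

```python
def category_percent(scores, max_points=10, drop_lowest=False):
    """Average a list of scores (each out of max_points) as a percentage.

    Assumption: every item in a category carries equal weight.
    """
    kept = sorted(scores)
    if drop_lowest and len(kept) > 1:
        kept = kept[1:]  # discard the single lowest mark
    return 100 * sum(kept) / (max_points * len(kept))


def final_percent(homework, exercises, participation, attendance):
    """Combine the four categories with the weights from the grade breakdown."""
    return (0.50 * category_percent(homework)
            + 0.40 * category_percent(exercises, drop_lowest=True)
            + 0.05 * participation
            + 0.05 * attendance)


# Lower bound of each letter grade, taken from the grading scale.
CUTOFFS = [(85, "A"), (80, "B+"), (75, "B"), (70, "C+"),
           (65, "C"), (60, "D+"), (55, "D"), (0, "F")]


def letter_grade(percent):
    """Return the first letter grade whose lower bound the percentage meets."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


# Made-up scores for one hypothetical student.
homework = [8, 9, 7, 10, 9, 8]   # six homework assignments
exercises = [6, 9, 8, 10, 7]     # five exercises; the 6 is dropped
total = final_percent(homework, exercises, participation=100, attendance=100)
grade = letter_grade(total)
print(round(total, 2), grade)
```

With these made-up scores, both graded categories average 85%, so full participation and attendance credit lifts the total to 86.5%, which falls in the A band.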

 

Class Schedule [2]

| Week | Date     | Topics | Readings | Assignments |
|------|----------|--------|----------|-------------|
| 1    | 08/11/21 | Course introduction [Introduction] | | HW 1: Key terms |
| 2    | 08/18/21 | Encoding language & Python installation [Encoding, Installation] | L & C Ch. 1 (Sec. 1.1 and 1.3) | HW 2: Unicode |
| 3    | 08/25/21 | Python basics: Strings and string methods [The Basics] [Python code: CL3.py] | NLTK Ch. 1 | Exercise 1: Strings |
| 4    | 09/01/21 | Text normalization [Tokenization, Control & Conditions] [Python code: CL4.py] | L & C Ch. 3 (Sec. 3.3 and 3.4) | Exercise 2: Palindrome |
| 5    | 09/08/21 | Text normalization (cont.) [Tokenization (2) & etc.] [Python code: CL5.py] | NLTK Ch. 3 (Sec. 3.1, 3.2, 3.6) | HW 3: Text normalization |
| 6    | 09/15/21 | Regular expressions (Regex) & FSA [Regex] [Python code: CL6.py] | J & M Ch. 2 (Sec. 2.1); L & C Ch. 4 (Sec. 4.4) | Exercise 3: RE |
| 7    | 09/22/21 | Regex (cont.) & Corpora [Regex (2), Corpus (1)] [Python code: CL7.py] | | No assignment |
| 8    | 09/29/21 | Mid-term examination week | | |
| 9    | 10/06/21 | Corpus exploration [Corpus (2)] [Python code: CL8.py] | | HW 4: Frequency counts (HW4.zip) |
| 10   | 10/13/21 | Holiday: King Bhumibol Memorial Day | | |
| 11   | 10/20/21 | Part-of-speech tagging [POS (1)] | NLTK Ch. 5 | HW 5: POS tagging with PTB |
| 12   | 10/27/21 | Part-of-speech tagging [POS (2)] [Python code: CL10.py] | J & M Ch. 8 (Secs. 8.4.1-8.4.4) | Exercise 4: POS tagging |
| 13   | 11/03/21 | Syntactic parsing [Parsing (1)] [Python code: CL11.py] | J & M Ch. 12 (Secs. 12.1, 12.2, 12.4) | HW 6: CFG rules |
| 14   | 11/10/21 | Mental Health Week: Relax, meditate, and catch up on your sleep! ❤️ | | No assignment |
| 15   | 11/17/21 | Syntactic parsing [Parsing (2)] [Python code: CL12.py] | J & M Ch. 14 (Secs. 14.1-14.3) | Exercise 5: Dependency |
| 16   | 11/24/21 | Biases in NLP & looking ahead [Wrap-up] | | No assignment |

 

Resources

 

Footnotes


  1. This depends on the total number of exercises. I will let you know as soon as possible if there is any change to how grades are calculated.

  2. The last day of the semester is November 27, 2021.