Instructor: Sakol Suethanapornkul
Email: suesakol@staff.tu.ac.th
Office Hours: W & TH from 1 p.m. to 4 p.m.
Room: Online via Zoom
Time: W 9:30-12:30
Credits: 3/48 hours
The course syllabus provides a general plan for the course; some modifications may be necessary in response to students’ needs and classroom interaction.
Fundamental concepts in computational linguistics and the importance of language corpora; basic practice in using computers to collect and analyze language and linguistic data.
The course introduces students to computational linguistics (CL)/natural language processing (NLP), with a particular emphasis on concepts, models, and algorithms that aid the analysis of natural languages. The course aims to blend theoretical discussions with hands-on training in text processing using Python. We will begin by learning how computers encode written and spoken language. We will then examine the applications of CL/NLP methods to a range of linguistic phenomena. The topics we will cover include tokenization, part-of-speech tagging, parsing, corpus exploration, and others. Hands-on components of the course include learning basic programming in Python.
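As a small preview of the first topics (text encoding and tokenization), here is a minimal sketch that uses only the Python standard library. The helper names `code_points` and `simple_tokenize` are illustrative assumptions, not functions from the course materials or from NLTK, and the regular expression is a deliberately naive tokenizer.

```python
import re


def code_points(text: str) -> list[str]:
    """Return the Unicode code point of each character, e.g. 'U+0041' for 'A'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def simple_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


word = "ภาษา"  # Thai for "language"
print(code_points(word))
# A character count and a byte count can differ: UTF-8 uses several bytes
# for each Thai character.
print(len(word), "characters,", len(word.encode("utf-8")), "bytes in UTF-8")
print(simple_tokenize("Hello, world!"))
```

Note that this rule relies on spaces and punctuation, so it cannot segment Thai, which is written without spaces between words.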
By the end of this course, you should be able to do the following:
Collaboration is encouraged. Students are strongly encouraged to collaborate on their assignments. However, each student must submit their own original work (both homework assignments and programming exercises). Additionally, students who have worked with their friends must acknowledge their collaboration in their submission, unless stated otherwise in the assignment.
Timely submission of assignments is expected. All work must be submitted by the due date stated on the syllabus. I reserve the right to refuse any late assignment. Please contact me early if you anticipate trouble completing an assignment on time.
Plagiarism is not tolerated. Plagiarizing other people’s work in an assignment results in an automatic zero for that assignment.
Completion of weekly readings before class is critical. You are expected to complete readings before class and come to class prepared to ask questions and/or participate in class discussion.
Successful completion of LG 211 Introduction to Linguistics (with C or above) is required. You do not need any programming experience to complete this course. You are expected, however, to bring your own laptop to class and ensure that it runs Windows 10 or above, or macOS 10.15 (Catalina) or above. Tablets and smartphones are generally not recommended. Talk to me if you need help with the university’s laptop assistance program. 🥰
In most cases, each class meeting will have a lecture component and a practice component. During the first half of each class, we will cover topics presented in the textbooks (L & C or J & M). In the second half, you will receive hands-on training with Python and the Natural Language Toolkit (NLTK).
Course management is done through this course website and MS Teams. You can obtain course materials (syllabus, slides, assignments, etc.) from this website. Assignment submissions, class announcements, and grades will be handled through MS Teams.
In this course, I assign grades based on how well students perform. The grading scheme outlines key letter grades:
Grades | Points (%) | Grades | Points (%) |
---|---|---|---|
A | 85-100 | C | 65-69.99 |
B+ | 80-84.99 | D+ | 60-64.99 |
B | 75-79.99 | D | 55-59.99 |
C+ | 70-74.99 | F | 0-54.99 |
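The grade bands can be read as a simple lookup from a course percentage to a letter. The sketch below assumes that each band starts at its lower bound and that no rounding is applied (so 84.99 is a B+); the names `GRADE_FLOORS` and `letter_grade` are illustrative, not part of any official grading tool.

```python
# Lower bound (in percent) of each letter grade, highest band first.
GRADE_FLOORS = [
    (85, "A"), (80, "B+"), (75, "B"), (70, "C+"),
    (65, "C"), (60, "D+"), (55, "D"), (0, "F"),
]


def letter_grade(percent: float) -> str:
    """Map a course percentage (0-100) to a letter grade."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # The first band whose lower bound is reached is the grade.
    for floor, letter in GRADE_FLOORS:
        if percent >= floor:
            return letter
    raise AssertionError("unreachable: the last band starts at 0")


print(letter_grade(85))     # top band
print(letter_grade(84.99))  # just below the top band
```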
Requirements | Percent | Points | Note |
---|---|---|---|
Homework assignments | 50% | 10 points each, 6 in total | |
Programming exercises | 40% | 10 points each, 5-6 planned | Exercise with lowest mark dropped |
Participation | 5% | | |
Attendance | 5% | | Up to one unexcused absence allowed |
Homework (i.e., assignments and exercises) is assigned on an approximately weekly basis. Typically, it will be distributed on Wednesday after class and due Saturday night of the same week. Extensions may be granted only in special circumstances (e.g., family or medical emergencies). Please communicate these to me in advance whenever possible.
Both assignments and exercises will be weighted equally in their respective categories. The assignment with the lowest mark will be dropped.
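To make the weighting concrete, here is a hedged sketch of how a course percentage could be computed from the 50/40/5/5 split. It assumes every item is marked out of 10, that only the lowest programming exercise is dropped (as the requirements table notes), and that participation and attendance are each expressed as a percentage. The function names and sample marks are hypothetical.

```python
def category_percent(marks, max_points=10, drop_lowest=False):
    """Average a list of marks as a percentage, weighting each item equally."""
    if not marks:
        raise ValueError("at least one mark is required")
    kept = sorted(marks)
    if drop_lowest and len(kept) > 1:
        kept = kept[1:]  # discard the single lowest mark
    return 100 * sum(kept) / (max_points * len(kept))


def course_percent(homework, exercises, participation, attendance):
    """Combine the four categories using the syllabus weights.

    participation and attendance are percentages (0-100); this is an
    assumption, since the syllabus does not say how they are scored.
    """
    return (
        0.50 * category_percent(homework)
        + 0.40 * category_percent(exercises, drop_lowest=True)
        + 0.05 * participation
        + 0.05 * attendance
    )


# Hypothetical marks for one student.
total = course_percent(
    homework=[8, 9, 7, 10, 6, 8],
    exercises=[5, 9, 10, 8, 7],
    participation=100,
    attendance=80,
)
print(round(total, 2))
```

Setting `drop_lowest` on the homework category instead would model a policy in which the lowest homework assignment is the one dropped.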
To receive full credit for attendance and participation, you are expected to come to class on time and engage meaningfully with the class content, including participating in class discussion.
You are allowed no more than one unexcused absence without affecting your course grade. This does not include excused absences due to illness or religious observances.
Weeks | Dates | Topics | Readings | Assignments |
---|---|---|---|---|
1 | 08/11/21 | Course introduction [Introduction] | | HW 1: Key terms |
2 | 08/18/21 | Encoding language & Python installation [Encoding, Installation] | L & C Ch. 1 (Secs. 1.1 and 1.3) | HW 2: Unicode |
3 | 08/25/21 | Python basics: Strings and string methods [The Basics] [Python code: CL3.py] | NLTK Ch. 1 | Exercise 1: Strings |
4 | 09/01/21 | Text normalization [Tokenization, Control & Conditions] [Python code: CL4.py] | L & C Ch. 3 (Secs. 3.3 and 3.4) | Exercise 2: Palindrome |
5 | 09/08/21 | Text normalization (cont.) [Tokenization (2) & etc.] [Python code: CL5.py] | NLTK Ch. 3 (Secs. 3.1, 3.2, 3.6) | HW 3: Text normalization |
6 | 09/15/21 | Regular expressions (Regex) & FSA [Regex] [Python code: CL6.py] | J & M Ch. 2 (Sec. 2.1); L & C Ch. 4 (Sec. 4.4) | Exercise 3: RE |
7 | 09/22/21 | Regex (cont.) & Corpora [Regex (2), Corpus (1)] [Python code: CL7.py] | | No assignment |
8 | 09/29/21 | Mid-term examination week | | |
9 | 10/06/21 | Corpus exploration [Corpus (2)] [Python code: CL8.py] | | HW 4: Frequency counts (HW4.zip) |
10 | 10/13/21 | Holiday: King Bhumibol Memorial Day | | |
11 | 10/20/21 | Part-of-speech tagging [POS (1)] | NLTK Ch. 5 | HW 5: POS tagging with PTB |
12 | 10/27/21 | Part-of-speech tagging [POS (2)] [Python code: CL10.py] | J & M Ch. 8 (Secs. 8.4.1-8.4.4) | Exercise 4: POS tagging |
13 | 11/03/21 | Syntactic parsing [Parsing (1)] [Python code: CL11.py] | J & M Ch. 12 (Secs. 12.1, 12.2, 12.4) | HW 6: CFG rules |
14 | 11/10/21 | Mental Health Week: Relax, meditate, and catch up on your sleep! | ❤️ | No assignment |
15 | 11/17/21 | Syntactic parsing [Parsing (2)] [Python code: CL12.py] | J & M Ch. 14 (Secs. 14.1-14.3) | Exercise 5: Dependency |
16 | 11/24/21 | Biases in NLP & looking ahead [Wrap-up] | | No assignment |