Description

Course Format: Lectures will be conducted in person. Joining lectures remotely by Zoom may be possible but only in exceptional circumstances (e.g., an illness or an emergency).

A seminar course on the evaluation of natural language processing systems. We will survey issues related to validity and reliability of NLP systems, focusing on current practices in the research community. What do researchers aim to capture in their measurements of NLP systems? Do current evaluations actually fulfill the measurement goals of the researchers? Are the evaluations reliable and trustworthy? The course will include student-led presentations and discussions, analyses of existing evaluations, and a final course project.

Topics may include:
• Existing evaluation paradigms and metric development
• Dataset creation
• Reliability and statistical testing
• Documentation efforts
• Measurement theory
• Social science critiques of papers
• Validity frameworks

Learning Outcomes

By the end of the course, students should have an in-depth understanding of the evaluation landscape in modern natural language processing research. They should understand the reasons why evaluations are performed in NLP. They should be able to critically analyze the strengths and weaknesses of evaluation approaches in existing literature. They should understand the trade-offs that must be made when deciding on the best evaluation approach, and be able to propose evaluations of their own.

Evaluation

• In-class participation: 5%
• Paper presentation 1: 15%
• Paper critique 1: 15%
• Paper presentation or critique 2: 15%
• Midterm proposal: 5%
• Final project presentation: 15%
• Final project report: 30%

Students who receive unsatisfactory final grades will not have the option to submit additional work in order to improve their grades.

In-class Participation

Class attendance and participation are mandatory. Active engagement and discussions are expected.

Paper Presentations

Each student will present academic research papers in class during the term. Each presentation will include a summary of the paper, an assessment of its strengths and weaknesses, and a class discussion led by the presenter.

Paper Critique

Students will write a critique of an existing paper in NLP, focusing on its evaluation. They will summarize the paper in brief, then analyze it using an existing framework or approach to thinking about evaluation reliability, validity, or cost.

Final Project

Each student will individually conduct a final research project in the subject area of the course. A written proposal is expected in the middle of the term, followed by a final project presentation and a final report submission at the end of the term.

Late Policy

Late submissions are accepted at the discretion of the instructor. Please inform the instructor in the event of unexpected circumstances that will affect your submission of coursework.

Plagiarism Policy

You must include your name and McGill ID number at the top of each program or module that you implement and submit. By doing so, you are certifying that the program or module is entirely your own, and represents only the result of your own efforts.

Work submitted for this course must represent your own efforts

You must not work in groups unless otherwise stated. You must not copy any other person’s work in any manner (electronically or otherwise), even if this work is in the public domain or you have permission from its author to use it and/or modify it in your own work. The only exceptions are for source code supplied by the instructor explicitly for an assignment, and for the final project, where usual research citation practices apply. Furthermore, you must not give a copy of your work to any other person.

Use of generative AI technologies.

Unless otherwise specified by the instructor, you may use generative AI technologies (e.g., ChatGPT, Bard) only in an assistive manner (e.g., to help understand course content, to search for information, to help brainstorm ideas, or to edit writing). You may not use such technologies as the primary means with which to complete coursework, including your final project. All substantive uses of these tools must be acknowledged in your submissions.

The plagiarism policy is not meant to discourage interaction or discussion among students.

You are encouraged to discuss assignment questions with the instructor, TA (if any), and your fellow students. However, there is a difference between discussing ideas and working in groups or copying someone else’s solution. A good rule of thumb is that when you discuss assignments with your fellow students, you should not leave the discussion with written notes. Also, when you write your solution to an assignment, you should do it on your own.

We may use automated software similarity detection tools to compare your assignment submissions to those of all other students registered in the course, and these tools are very effective at what they have been designed for. However, note that the main use of these tools is to determine which submissions should be manually checked for similarity by an instructor or TA; we will not accuse anyone of copying or working in groups based solely on the output of these tools.

You may also be asked to present and explain your submissions to an instructor at any time.

Students who put their name on programs or modules that are not entirely their own work will be referred to the appropriate university official who will assess the need for disciplinary action.

Language of Submission

“In accord with McGill University’s Charter of Students’ Rights, students in this course have the right to submit in English or in French any written work that is to be graded. This does not apply to courses in which acquiring proficiency in a language is one of the objectives.”

« Conformément à la Charte des droits de l’étudiant de l’Université McGill, chaque étudiant a le droit de soumettre en français ou en anglais tout travail écrit devant être noté (sauf dans le cas des cours dont l’un des objets est la maîtrise d’une langue). »