Squirrel or Skunk? NLP Models Face the Long Tail of Language
Prof. Nathan Schneider
Georgetown University
The NLP Reading Group is delighted to host Prof. Nathan Schneider for a talk on NLP models and long-tail phenomena in language.
Talk Description
Natural language is chock-full of rare events, which can challenge NLP models based on supervised learning. I will present advances in modeling and evaluation of “long tail” phenomena in grammar and meaning: retrieving rare senses of words (BlackboxNLP 2021); tagging words with complex syntactic categories (TACL 2021); and calibrating model confidence scores for sparse tagsets (Findings of EMNLP 2021).
This is joint work with Luke Gessler, Michael Kranzlein, Nelson Liu, Jakob Prange, and Vivek Srikumar.
Speaker Bio
Nathan Schneider is an annotation schemer and computational modeler for natural language. As Associate Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage linguistic analysis: designing linguistic representations of grammar and meaning, annotating them in corpora, and automating them with natural language processing techniques. A central focus in this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. Among his favorite acronyms are AMR, CCG, CxG, GUCL, SNACS, and UD. He is an NSF CAREER award recipient and has served the computational linguistics community in various ways, having chaired SemEval, the Linguistic Annotation Workshop, and GURT/SyntaxFest; served on the board of SIGLEX; and served as an action editor for TACL and ARR. He has inhabited UC Berkeley (BA in Computer Science and Linguistics), Carnegie Mellon University (Ph.D. in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya and leader of NERT, he continues to play with data and algorithms for linguistic meaning.
Logistics
Date: June 26th
Time: 11:00 AM
Location: Auditorium 1 or via Zoom (see email)