At a CAS-hosted workshop, linguists debate approaches to turning text into information that computers can understand.
High on a hilltop overlooking the Oslofjord, the computational linguists were preparing to disagree with one another.
‘So far we agree with everything you have to say,’ Martha Palmer, professor of linguistics and computer science at the University of Colorado at Boulder, said half playfully, half warningly to a speaker during the first of a three-day workshop on approaches to turning sentences into pieces of information that computers can understand.
The agreement didn’t last. By Tuesday, attendees at the workshop were eagerly voicing their criticism and challenging speakers’ methods of dealing with grammatical phenomena.
‘I would flat out disagree with you on that,’ Jerry R. Hobbs, a research professor at the University of Southern California Information Sciences Institute, said during a presentation early on the workshop’s second day.
‘No, that’s just wrong,’ Dan Flickinger, a senior researcher at Stanford University’s Center for the Study of Language and Information, said later that same day.
This was an expected, even welcome turn of events.
The linguists had come to the hilltop -- more specifically a conference hotel outside of Oslo -- to present how they turn sentences into meaning representations. By doing this exercise in a collective setting, the linguists said they hoped they could help one another solve the nagging problems they encounter when analysing sentences that the human brain innately understands but a computer does not, which in turn could lead to new breakthroughs within the field of natural language processing.
The workshop was a part of the CAS project SynSem: From Form to Meaning - Integrating Linguistics and Computing. The project is one of three hosted at the Centre during the 2017/18 academic year, and is led by University of Oslo (UiO) professors Dag Trygve Truslew Haug and Stephan Oepen.
‘By bringing together cutting edge theoretical research into the syntax and semantics of human languages with the technological aspects of computational linguistics, this group of eminent scholars seeks to develop a sophisticated form of machine translation that will greatly benefit artificial intelligence in it various modes and appearances,’ Professor Vigdis Broch-Due, scientific director of CAS, said as she opened the workshop Monday morning.
Most of the workshop sessions followed the same pattern. First, one of the linguists would introduce his or her data set, a large collection of text in which sentences have been annotated with semantic structure (see the thumbnail for an example). Then the presenter would walk through a handful of example sentences, showing how the research team has assigned meaning to individual words and linked those meanings together to form the context of a sentence. Finally (and frequently throughout the presentations), the other linguists would interject with questions and critique.
‘We’re all kind of speaking somewhat different languages,’ Palmer said about the attendees and their research. The workshop, she added, ‘could help nudge us closer together.’
The linguists were frank about the strengths and weaknesses of their data sets and approaches to meaning representation.
‘“Many a linguist has run into this problem,”’ Flickinger said, using a very self-referential sentence as an example of a grammatical phenomenon that is difficult to teach a computer how to understand. ‘I have no idea how to build that “many a.”’
Other attendees chimed in with their own semantic pet peeves, like the ‘nearly’ in the sentence ‘She was nearly run over by a truck,’ the ‘that’ in ‘We knew that they snored,’ and words with spaces such as ‘car park’ or ‘toy car,’ among others.
On the final day of the workshop, Palmer chaired one of the closing sessions. As she flipped through her slideshow presentation, she eventually reached a slide that read ‘Our goal: a whole that is greater than the sum of its parts.’
Though referring to Palmer’s own work at the Center for Computational Language and EducAtion Research (CLEAR), the slide could be seen as an overarching summary of the workshop -- linguists coming together to improve the field of computational semantics by harnessing the strengths of their combined research.
Before heading back to their respective institutions, the linguists made preliminary plans to collaborate on a scholarly paper based on the workshop. Several of them also expressed interest in attending a follow-up workshop in a few years.
Back at CAS a few days after descending from the hilltop, Haug, the co-leader of the project, declared the workshop a success.
‘We saw lots of different approaches to semantic representations -- in other words, methods of representing meaning in a way that computers can understand,’ Haug said. ‘One of our goals was to those approaches together and compare them, and I feel like we absolutely were able to do that.’