Combining Lexical Resources in a Robust Broad-Coverage Semantic Parser

John Dowding (UC/Santa Cruz) and Mathew Purver (Stanford University/CSLI)

We describe an ongoing effort to produce the lexicon for a robust broad-coverage semantic parser by combining syntactic and semantic information from several publicly available lexical resources. This parser is motivated by a need to extract propositional content from human-human meetings, as part of DARPA's CALO project. Extracting this content requires a broad-coverage lexicon, since the meeting topics are not determined in advance. The parser is applied to highly errorful speech recognition results (30%-40% word error rates), so it must be robust. These speech recognition results are represented as Word Confusion Networks, each of which may encode a large number of potential utterance hypotheses, so the parser must also be fast. For these reasons, we decided on an approach that depends heavily on the lexicon and uses only a relatively impoverished set of grammatical rules, focusing on extracting basic predicate-argument structure and paying less attention to more varied syntactic forms.

The resources we are currently using are COMLEX, VerbNet, WordNet, and NOMLEX. Each of these resources provides distinct types of syntactic and semantic information:

- COMLEX aims to provide detailed syntactic information for the 40,000 most common words of English. We extract from COMLEX lexical information for 4,200 adjectives (gradability and subcategorization), 5,665 verbs (subcategorization), 23,195 nouns (mass/count and temporality), and 3,120 adverbs (syntactic distribution), as well as most closed-class lexical categories. COMLEX also provides morphological variants for irregular forms.
- VerbNet provides semantic information for 5,000 verbs. This information includes the verb class, verb frames, thematic roles, syntax-semantics mapping, and selectional restrictions.
- From WordNet we identify another 15,539 nouns, and the semantic class information for all nouns. These semantic classes are hand-aligned to the selectional classes used in VerbNet, based on the upper ontology of EuroWordNet.
- NOMLEX (and NOMLEXPLUS) provide syntactic information for nominalizations, and information for mapping the noun arguments to the corresponding verb syntactic positions. When combined with VerbNet's selectional restrictions on thematic roles, this provides additional selectional restrictions for nominalizations.

The resulting lexical entries and grammar rules are converted to the Prolog-based format used in the Gemini framework, which includes a fast, robust bottom-up parser that interleaves the application of syntactic and semantic information. The semantic rules in this grammar produce a Minimal Recursion Semantics representation. This choice is motivated by a desire to make the semantic features extracted by the parser available as inputs to further machine learning algorithms for identifying higher-level semantic content, such as the action items that have been assigned or the decisions that have been made.

This work is similar to prior work in SPOT, Xerox, and Swift. It differs from that work primarily in the inclusion of NOMLEX and the mapping of nominalizations to verb frames.

References

VerbNet
COMLEX
WordNet
NOMLEX
Gemini
MRS
CALO
ICSI Meeting Corpus
SPOT
Xerox
Swift