DSS: Text Similarity Using Lexical Alignments of Form, Distributional Semantics and Grammatical Relations

Diana McCarthy, Spandana Gella, Siva Reddy

SemEval @ NAACL

Abstract

In this paper we present our systems for the STS task. Our systems are all based on a simple process of identifying the components that correspond between two sentences. Currently we use words (that is word forms), lemmas, distributional similar words and grammatical relations identified with a dependency parser. We submitted three systems. All systems only use open class words. Our first system (alignheuristic) tries to obtain a mapping between every open class token using all the above sources of information. Our second system (wordsim) uses a different algorithm and unlike alignheuristic, it does not use the dependency information. The third system (average) simply takes the average of the scores for each item from the other two systems to take advantage of the merits of both systems. For this reason we only provide a brief description of that. The results are promising, with Pearson’s coefficients on each individual dataset ranging from .3765 to .7761 for our relatively simple heuristics based systems that do not require training on different datasets. We provide some analysis of the results and also provide results for our data using Spearman’s, which as a nonparametric measure which we argue is better able to reflect the merits of the different systems (average is ranked between the others).