|
Search CALO Publications:
Author and Title
Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning.
Abstract
Relational data has two characteristics: first, statistical dependencies exist between the entities we wish to model, and second, each entity often has a rich set of features that can aid classification. For example, when classifying Web documents, the page’s text provides much information about the class label, but hyperlinks define a relationship between pages that can improve classification [Taskar et al., 2002]. Graphical models are a natural formalism for exploiting the dependence structure among entities. Traditionally, graphical models have been used to represent the
joint probability distribution p(y, x), where the variables y represent the attributes of the entities that we wish to predict, and the input variables x represent our observed knowledge about the entities. But modeling the joint distribution can lead to difficulties when using the rich local features that can occur in relational data, because it requires modeling the distribution p(x), which can include complex dependencies. Modeling these dependencies among inputs can lead to intractable models, but ignoring them can lead to reduced performance. A solution to this problem is to directly model the conditional distribution p(y|x), which is sufficient for classification. This is the approach taken by conditional random fields [Lafferty et al., 2001]. A conditional random field is simply a conditional
distribution p(y|x) with an associated graphical structure. Because the model is conditional, dependencies among the input variables x do not need to be explicitly represented, affording the use of rich, global features of the input. For example, in natural language tasks, useful features include neighboring words and word bigrams, prefixes and suffixes, capitalization, membership in domain-specific lexicons, and semantic information from sources such as WordNet. Recently there has been an explosion of interest in CRFs, with successful applications including text processing [Taskar et al., 2002, Peng and McCallum, 2004, Settles, 2005, Sha and Pereira, 2003], bioinformatics [Sato and Sakakibara, 2005, Liu et al., 2005], and computer vision [He et al., 2004, Kumar and Hebert, 2003].
This chapter is divided into two parts. First, we present a tutorial on current training and inference techniques for conditional random fields. We discuss the important special case of linear-chain CRFs, and then we generalize these to arbitrary graphical structures. We include a brief discussion of techniques for practical CRF implementations.
Second, we present an example of applying a general CRF to a practical relational learning problem. In particular, we discuss the problem of information extraction, that is, automatically building a relational database from information contained in unstructured text. Unlike linear-chain models, general CRFs can capture long distance dependencies between labels. For example, if the same name is mentioned more than once in a document, all mentions probably have the same label, and it
is useful to extract them all, because each mention may contain different complementary information about the underlying entity. To represent these long-distance dependencies, we propose a skip-chain CRF, a model that jointly performs segmentation and collective labeling of extracted mentions. On a standard problem of extracting speaker names from seminar announcements, the skip-chain CRF has better performance than a linear-chain CRF.
Download
|