
Information Extraction

Named Entity Recognition

Create semantic annotations of text by marking entity names with XML tags.
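As a rough illustration of what such annotation could look like, the sketch below wraps known character spans in XML tags. The tag names (PERSON, ORG, LOC), the sample sentence, and the annotate helper are assumptions made for this example, not a fixed standard.

```python
from xml.sax.saxutils import escape

def annotate(text, spans):
    """Wrap each (start, end, label) character span of text in an XML tag.

    Spans are assumed not to overlap.
    """
    out = []
    pos = 0
    for start, end, label in sorted(spans):
        out.append(escape(text[pos:start]))          # plain text before the entity
        out.append(f"<{label}>{escape(text[start:end])}</{label}>")
        pos = end
    out.append(escape(text[pos:]))                   # plain text after the last entity
    return "".join(out)

# Hypothetical sentence and hand-picked spans.
text = "Alice Smith joined Acme Corp in Paris."
spans = [(0, 11, "PERSON"), (19, 28, "ORG"), (32, 37, "LOC")]
annotated = annotate(text, spans)
print(annotated)
# <PERSON>Alice Smith</PERSON> joined <ORG>Acme Corp</ORG> in <LOC>Paris</LOC>.
```

Finding the spans automatically is the actual recognition task; the approaches below address that part.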

Approaches

Rule-Based

  • Uses lexicons (lists of words and phrases)
  • Rules are also used to verify candidate names and to find new entity names
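The two ingredients above can be sketched in a few lines: a lexicon lookup for known names, plus one pattern rule that picks up names the lexicon does not contain. The lexicon entries, the title rule, and the labels are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon: known entity names mapped to a label.
LEXICON = {"Paris": "LOC", "Acme Corp": "ORG"}

# Hypothetical rule: a title followed by capitalized words is a person name.
TITLE_RULE = re.compile(r"\b(?:Dr|Mr|Ms|Prof)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")

def find_entities(text):
    """Return (name, label) pairs in order of appearance."""
    found = []
    # Lexicon lookup: exact matches of known names.
    for name, label in LEXICON.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), m.group(0), label))
    # Rule: discovers names that are not in the lexicon.
    for m in TITLE_RULE.finditer(text):
        found.append((m.start(1), m.group(1), "PERSON"))
    return [(name, label) for _, name, label in sorted(found)]

entities = find_entities("Dr. Jane Doe visited Acme Corp in Paris.")
print(entities)
# [('Jane Doe', 'PERSON'), ('Acme Corp', 'ORG'), ('Paris', 'LOC')]
```

Here "Jane Doe" is found by the rule alone, which is how a rule-based system can grow its lexicon over time.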

Statistical

Use machine learning techniques to develop rules.

  • training data (manually annotated text)
  • Hidden Markov Model (HMM)
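One simple way annotated text could be turned into HMM parameters is by counting: how often each tag follows another (transitions) and how often each tag produces each word (emissions). The sketch below assumes a tiny, made-up training set and the tag names PER, LOC, and O (for "other"); real systems would also smooth the counts.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each token paired with a hand-assigned tag.
TRAIN = [
    [("Jane", "PER"), ("visited", "O"), ("Paris", "LOC")],
    [("Paris", "PER"), ("visited", "O"), ("London", "LOC")],
]

def normalize(table):
    """Turn nested counts into relative frequencies."""
    return {
        key: {item: n / sum(counts.values()) for item, n in counts.items()}
        for key, counts in table.items()
    }

def estimate(tagged_sentences):
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    for sentence in tagged_sentences:
        prev = "<s>"              # assumed sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return normalize(trans), normalize(emit)

trans_p, emit_p = estimate(TRAIN)
print(trans_p["O"])    # {'LOC': 1.0}
print(emit_p["LOC"])   # {'Paris': 0.5, 'London': 0.5}
```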

HMM for Extraction

Resolve ambiguity in a word using its context (the words around it).

An HMM is a collection of states and the transitions between them.
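To make the role of context concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the most probable state sequence in an HMM. The states, the probabilities, and the example sentence are hypothetical values picked by hand so that the word "Paris" is ambiguous between a person and a location.

```python
# Hypothetical model: states are entity tags, O means "other".
STATES = ["PER", "LOC", "O"]
START = {"PER": 0.6, "LOC": 0.2, "O": 0.2}
TRANS = {  # TRANS[prev][next] = probability of moving from prev to next
    "PER": {"PER": 0.1, "LOC": 0.1, "O": 0.8},
    "LOC": {"PER": 0.1, "LOC": 0.1, "O": 0.8},
    "O":   {"PER": 0.2, "LOC": 0.6, "O": 0.2},
}
EMIT = {   # EMIT[state][word] = probability that state produces word
    "PER": {"Paris": 0.5, "Jane": 0.5},
    "LOC": {"Paris": 0.5, "London": 0.5},
    "O":   {"visited": 1.0},
}

def viterbi(words):
    """Return the most probable state sequence for a non-empty word list."""
    # best[s] = (probability, path) of the best sequence ending in state s
    best = {s: (START[s] * EMIT[s].get(words[0], 0.0), [s]) for s in STATES}
    for word in words[1:]:
        step = {}
        for s in STATES:
            # Pick the predecessor that maximizes the probability of reaching s.
            prev = max(STATES, key=lambda p: best[p][0] * TRANS[p][s])
            prob = best[prev][0] * TRANS[prev][s] * EMIT[s].get(word, 0.0)
            step[s] = (prob, best[prev][1] + [s])
        best = step
    return max(best.values(), key=lambda entry: entry[0])[1]

tags = viterbi(["Paris", "visited", "Paris"])
print(tags)  # ['PER', 'O', 'LOC']
```

The same word receives two different tags because the transitions favor a person at the start of the sentence and a location after "visited". Practical implementations usually work with log probabilities to avoid numerical underflow on long sentences.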
