Information Extraction
Named Entity Recognition
Create semantic annotations of text by marking entities with XML tags.
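As a rough illustration of what such annotation looks like, the sketch below wraps known entity spans in XML tags. The function name `annotate`, the tag names (`person`, `organization`, `location`), and the sample sentence are assumptions made for this example, not part of any standard.

```python
from xml.sax.saxutils import escape

def annotate(text, entities):
    """Wrap each (start, end, label) character span in <label>...</label>.

    Spans are assumed to be non-overlapping.
    """
    out = []
    pos = 0
    for start, end, label in sorted(entities):
        out.append(escape(text[pos:start]))          # text before the entity
        out.append(f"<{label}>{escape(text[start:end])}</{label}>")
        pos = end
    out.append(escape(text[pos:]))                   # text after the last entity
    return "".join(out)

text = "Fred Smith works at IBM in New York."
# Hypothetical spans, as a recognizer might produce them
spans = [(0, 10, "person"), (20, 23, "organization"), (27, 35, "location")]
annotated = annotate(text, spans)
print(annotated)
```

The recognizer's real job is to find the spans; turning them into tags is the easy part shown here.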
Approaches
Rule-Based
- Uses lexicons (lists of words and phrases)
- Rules also used to verify and find new entity names
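A minimal sketch of the rule-based approach, assuming a tiny hand-made lexicon and one hypothetical rule (a title such as "Dr." followed by capitalized words is a person name). The lexicon entries, the rule, and the name `find_entities` are illustrative only.

```python
import re

# Lexicon: known words and phrases mapped to entity types (illustrative)
LEXICON = {
    "IBM": "organization",
    "New York": "location",
}

# Hypothetical rule for finding new names: title + capitalized word(s)
TITLE_RULE = re.compile(
    r"\b(?:Mr|Ms|Dr)\.\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)

def find_entities(text):
    """Return sorted (start, end, label, surface_text) tuples."""
    found = []
    # 1. Lexicon lookup
    for phrase, label in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((m.start(), m.end(), label, m.group()))
    # 2. Rule: names not in the lexicon, discovered from context
    for m in TITLE_RULE.finditer(text):
        found.append((m.start(1), m.end(1), "person", m.group(1)))
    return sorted(found)

entities = find_entities("Dr. Jane Doe joined IBM in New York.")
for start, end, label, surface in entities:
    print(label, surface)
```

Here "Jane Doe" is not in the lexicon; the rule finds it from the preceding title.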
Statistical
Use machine learning techniques to develop rules.
- training data (manually annotated text)
- Hidden Markov Model (HMM)
HMM for Extraction
Resolve ambiguity in a word using context (the words around it)
An HMM models a process as a collection of states with transitions between them.
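To make the idea concrete, here is a small sketch of Viterbi decoding over a toy HMM in which "Washington" may be a person or a location, and the preceding word decides which. The states, all probabilities, and the `FLOOR` smoothing value are invented for illustration; in practice they would be estimated from manually annotated training data, and log probabilities would be used for longer inputs.

```python
FLOOR = 1e-6  # probability assumed for anything not listed (illustrative)

STATES = ["other", "title", "prep", "person", "location"]

# All numbers below are made up for this example
START = {"other": 0.4, "title": 0.3, "prep": 0.3}
TRANS = {
    "other":    {"other": 0.4, "title": 0.3, "prep": 0.3},
    "title":    {"person": 0.9, "other": 0.1},
    "prep":     {"location": 0.8, "person": 0.1, "other": 0.1},
    "person":   {"other": 0.8, "person": 0.2},
    "location": {"other": 0.8, "location": 0.2},
}
EMIT = {
    "other":    {"visited": 0.5, "spoke": 0.5},
    "title":    {"Mr.": 1.0},
    "prep":     {"in": 1.0},
    "person":   {"Washington": 0.5, "Smith": 0.5},
    "location": {"Washington": 0.5, "Paris": 0.5},
}

def viterbi(words):
    """Return the most probable state sequence for a non-empty word list."""
    # best[s]: probability of the best path so far that ends in state s
    best = {s: START.get(s, FLOOR) * EMIT[s].get(words[0], FLOOR) for s in STATES}
    back = []  # one dict of back-pointers per word after the first
    for word in words[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] * TRANS[p].get(s, FLOOR))
            new_best[s] = (best[prev] * TRANS[prev].get(s, FLOOR)
                           * EMIT[s].get(word, FLOOR))
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final state
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

print(viterbi(["Mr.", "Washington", "spoke"]))
print(viterbi(["spoke", "in", "Washington"]))
```

The same word receives different labels because the transition from the previous state ("title" versus "prep") favors a different state for it.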