Search Engine Technology
This unit is about Web Search & Text Analysis.
Weeks | Content |
---|---|
Weeks 1-6 | Basic concepts for Information Retrieval (IR) and Web Search |
Weeks 7-12 | Text Analysis & NLP |
Information Retrieval
We discuss "structured" and "unstructured" data, the applications and tasks that search engines can perform, and the main issues associated with information retrieval and search engines.
text-statistics
Text Processing
work in progress
Information Extraction
work in progress
Abstract Model of Ranking
abstract-model-of-ranking
A More Concrete Ranking Model
a-more-concrete-model-of-ranking
Inverted Indexes
4 items
Auxiliary Structures
Inverted lists are usually stored together in a single file for efficiency.
index-construction
Query Processing
Explore query processing techniques: document-at-a-time and term-at-a-time.
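As a rough illustration of the two strategies named above, the sketch below scores a query against a tiny in-memory inverted index, once term-at-a-time and once document-at-a-time. The index contents, the function names `term_at_a_time`, `document_at_a_time` and `rank`, and the scoring rule (a plain sum of term frequencies) are assumptions made for this example, not definitions taken from the unit.

```python
from collections import defaultdict

# Hypothetical toy index: term -> postings list of (doc_id, term_frequency),
# sorted by doc_id. The contents are invented for illustration.
INDEX = {
    "tropical": [(1, 1), (2, 1), (3, 2)],
    "fish": [(1, 2), (3, 1), (4, 3)],
}


def term_at_a_time(index, query_terms):
    """Read one whole postings list per term, updating per-document accumulators."""
    accumulators = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            # Scores stay partial until every query term has been processed.
            accumulators[doc_id] += tf
    return dict(accumulators)


def document_at_a_time(index, query_terms):
    """Walk all postings lists in parallel, completing one document's score at a time."""
    lists = [index.get(term, []) for term in query_terms]
    positions = [0] * len(lists)
    scores = {}
    while True:
        # The smallest unprocessed doc_id across all lists is the next document.
        candidates = [
            lst[pos][0] for lst, pos in zip(lists, positions) if pos < len(lst)
        ]
        if not candidates:
            break
        current = min(candidates)
        score = 0
        for i, lst in enumerate(lists):
            if positions[i] < len(lst) and lst[positions[i]][0] == current:
                score += lst[positions[i]][1]
                positions[i] += 1
        # The score for this document is final once all lists have been checked.
        scores[current] = score
    return scores


def rank(scores):
    """Sort documents by descending score, breaking ties by doc_id."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Both functions produce the same scores for the same query; what differs is the order in which the postings are read. Term-at-a-time keeps an accumulator for every candidate document, whereas document-at-a-time finishes each document's score before moving on to the next.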
Part One: Web Search & IR
- Architecture of a search engine
- Basic concepts for text processing
- Information Retrieval
- Evaluate search results & IR models
Part Two: Text Analysis
- Supervised methods:
  - Information filtering
  - Text classification
  - Relevance discovery
- Unsupervised methods:
  - Text feature selection
  - Topic modelling
  - Sentiment analysis
  - Document summarization
Why Do We Care?
- More than 80% of data contains a large amount of knowledge that is still waiting to be extracted.
- There are many different types of data. They extend beyond structured data to include unstructured data such as:
  - text
  - audio
  - video
  - log files
HTML vs. XML
HTML is a language for marking up text for presentation.
XML (eXtensible Markup Language) is a language for describing data and content. In other words, it does not describe how to present that content. This is what makes data on the Internet machine-readable.
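To make the machine-readability point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `<book>` record, its element names and values, and the helper `extract_book` are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record: the tags name what each value *is*,
# not how it should look on screen.
XML_RECORD = """
<book id="b1">
  <title>Example Title</title>
  <year>2020</year>
</book>
"""


def extract_book(xml_text):
    """Read fields out of the record by tag name."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),                 # attribute on <book>
        "title": root.findtext("title"),      # text of the <title> child
        "year": int(root.findtext("year")),   # text of <year>, converted to int
    }
```

By contrast, an HTML fragment such as `<b>Example Title</b> (2020)` only says how the text should be displayed, so a program would have to guess which part is the title and which is the year.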
Appendix
Readings
Weekly Schedule
Vocabulary
prong
noun
- A thin, pointed, projecting part, as of an antler or a fork or similar tool. A tine.
- A branch; a fork.
- The penis.
verb
- To pierce or poke with, or as if with, a prong.
antler
noun
- A branching and bony structure on the head of deer, moose and elk, normally in pairs. They are grown and shed each year. (Compare with horn, which is generally not shed.)