This unit is about Web Search & Text Analysis.
|Weeks 1-6||Basic concepts for Information Retrieval (IR) and Web Search.|
|Weeks 7-12||Text Analysis & NLP |
📄️ Information Retrieval
We discuss "structured" and "unstructured" data, the applications and tasks that search engines can perform, and the main issues associated with information retrieval and search engines.
📄️ Text Processing
work in progress
📄️ Information Extraction
work in progress
📄️ Abstract Model of Ranking
📄️ A More Concrete Ranking Model
🗃️ Inverted Indexes
📄️ Auxiliary Structures
Inverted lists are usually stored together in a single file for efficiency.
📄️ Query Processing
Explore query processing techniques: document-at-a-time and term-at-a-time.
Part One: Web Search & IR
- Architecture of a search engine
- Basic concepts for text processing
- Information Retrieval
- Evaluate search results & IR models
Part Two: Text Analysis
- Supervised methods:
- Information filtering
- Text classification
- Relevance discovery
- Unsupervised methods:
- Text feature selection
- Topic modelling
- Sentiment analysis
- Document summarization
Why Do We Care?
- More than 80% of data contains a large amount of knowledge that is still waiting to be extracted.
- There are many different types of data. They extend beyond structured data to include unstructured data, such as:
- log files
HTML vs. XML
HTML is a language for marking up text for presentation.
XML (eXtensible Markup Language) is a language for describing data and content. In other words, it does not describe how to present that content. It therefore makes Internet data machine-readable.
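To illustrate what "machine-readable" means here, the following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `<book>` record, its tag names, and its values are hypothetical and chosen only for illustration. Because the tags name the data rather than its appearance, a program can pull out fields by name.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record (not from the unit material):
# the tags describe WHAT each value is, not how to display it.
xml_doc = """
<book>
  <title>Example Title</title>
  <year>2009</year>
</book>
""".strip()

# Parse the string into an element tree and read fields by tag name.
root = ET.fromstring(xml_doc)
record = {child.tag: child.text for child in root}

print(root.tag)
print(record)
```

An HTML fragment such as `<b>Example Title</b>` would only say that the text should be bold. It would not say that the text is a title, so a program would have to guess at the meaning.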