- Speaker Diarisation answers the question "who spoke when?"
- For a corpus of audio, determine which clips of audio belong to the same speaker.
- Face Diarisation performs a similar task for faces.
- For a corpus of video/images, group faces based on identities.
Face Diarisation Pipeline
For each frame to process:
- Run face detection
- Extract cropped faces
- Encode cropped faces to obtain an embedding to represent the face
Finally, cluster embeddings.
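
The final clustering step can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the notes do not name a clustering algorithm, so the sketch assumes single-linkage clustering on cosine distance with a hypothetical threshold of 0.3. The embeddings are made-up 3-dimensional toy vectors standing in for the output of a real face encoder, and the names `cosine_distance` and `cluster_embeddings` are illustrative only. Face detection, cropping, and encoding are not shown.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, threshold=0.3):
    """Group embeddings by identity using single-linkage clustering.

    Any two embeddings whose cosine distance is below `threshold`
    (an assumed value, to be tuned for a real encoder) are linked,
    and linked embeddings end up in the same cluster.
    Returns one integer cluster label per embedding.
    """
    n = len(embeddings)
    parent = list(range(n))  # union-find forest, one node per embedding

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_distance(embeddings[i], embeddings[j]) < threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive ids in order of first appearance.
    labels, ids = [], {}
    for i in range(n):
        root = find(i)
        labels.append(ids.setdefault(root, len(ids)))
    return labels


# Hypothetical toy embeddings: faces 0, 1 and 4 are one identity,
# faces 2 and 3 are another.
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.95, 0.15, 0.05],
]
labels = cluster_embeddings(toy_embeddings)
print(labels)  # [0, 0, 1, 1, 0]
```

The pairwise loop is quadratic in the number of faces, which is fine for a sketch but would likely need replacing with a library clustering routine for a large corpus.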