Sequential Topic Segmentation / Session Chapters

## Use Case
_Please provide a use case to help us understand your request in context_

YouTube has a "Video Chapters" feature that splits the timeline bar into chapters based on timestamps found in the video description. Example:

![chapter-example](https://user-images.githubusercontent.com/17132317/114111212-4d9f5e00-988e-11eb-8034-efb1da87cdee.png)

Similarly, it would be incredibly useful to jump around a meeting video / transcript based on the minutes items of the meeting.

## Solution
_Please describe your ideal solution_

This is going to take a lot of work on the backend and a bit of work on the frontend.

We could be fancy and train a topic model or use some sort of seeded clustering, and we likely will at some point, but as a first-pass implementation it may be interesting to see how far the following gets us:

* Look for common phrases ("Moving on to...", "Call the roll", "Attendance", etc.) and apply breakpoints there.
* Additionally, parse all the minutes item attachments (docs, presentations, etc.) for every minutes item in an event and store the list of words UNIQUE to each minutes item. Then search the transcript for those words and find the breakpoints by taking a moving window sum of the counts of each minutes item's unique words across the transcript.

For example:

```
{
  "minutes_item_1": ["municipal", "broadband", "light"],
  "minutes_item_2": ["it", "department"]
}
```

```
lets talk about the municipal broadband bill that would enable seattle city light to serve customers with broadband...
...
moving on to funding the seattle IT department...
...
```

The moving window word count should show that at some point we switch from using words specific to minutes item 1 to using words specific to minutes item 2. If we can combine that with looking for the "section splitter sequences" ("moving on", "call the roll", etc.), I think it may be a good first-pass chapter identifier that is fast and cheap.
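A minimal sketch of the unique-word and moving-window steps, using the example above. The function names, the whitespace tokenization, and the window size of 5 tokens are all assumptions made for illustration; none of this is existing CDP code.

```python
from collections import Counter


def unique_words(item_words):
    """Keep only the words that appear in exactly one minutes item."""
    counts = Counter(w for words in item_words.values() for w in set(words))
    return {
        item: {w for w in words if counts[w] == 1}
        for item, words in item_words.items()
    }


def moving_window_scores(tokens, uniques, window=5):
    """For each token index, count hits per item in the window ending there."""
    scores = []
    for i in range(len(tokens)):
        chunk = tokens[max(0, i - window + 1) : i + 1]
        scores.append(
            {item: sum(t in words for t in chunk) for item, words in uniques.items()}
        )
    return scores


def find_breakpoints(scores):
    """Return (token_index, item) whenever a different item takes the lead."""
    breakpoints = []
    current = None
    for i, row in enumerate(scores):
        best = max(row, key=row.get)
        # Switch only when the leader strictly beats the current item's score.
        if best != current and row[best] > row.get(current, 0):
            breakpoints.append((i, best))
            current = best
    return breakpoints


item_words = {
    "minutes_item_1": ["municipal", "broadband", "light"],
    "minutes_item_2": ["it", "department"],
}
transcript = (
    "lets talk about the municipal broadband bill that would enable seattle "
    "city light to serve customers with broadband moving on to funding the "
    "seattle it department"
)
tokens = transcript.lower().split()
uniques = unique_words(item_words)
breakpoints = find_breakpoints(moving_window_scores(tokens, uniques, window=5))
print(breakpoints)
```

On a real transcript, short common words such as "it" would match constantly, so some stop-word filtering or n-gram matching would likely be needed before the counts are meaningful.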

Then store chapter identifiers as annotations in the transcript for the frontend to parse.

## Alternatives
_Please describe any alternatives you've considered, even if you've dismissed them_

Topic modeling? Clustering?

Additionally, whatever pipeline we create should be able to skip this step if chapter starts are provided by the user, as Seattle Channel event descriptions include them in most cases nowadays.
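As a rough sketch of the "skip if provided" path, the pipeline could first try to parse chapter starts out of the event description and only fall back to detection when none are found. The line format assumed here (`H:MM:SS Title` or `MM:SS Title`, one chapter per line, as in YouTube descriptions) is a guess; the actual Seattle Channel description format would need to be checked. `parse_chapter_starts` and `get_chapters` are hypothetical names.

```python
import re

# Assumed format: optional hours, then minutes:seconds, then a title.
TIMESTAMP_LINE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.+?)\s*$")


def parse_chapter_starts(description):
    """Return (start_seconds, title) for each line that looks like a chapter."""
    chapters = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((start, title))
    return chapters


def get_chapters(description, detect):
    """Use provided chapter starts if any exist, otherwise call `detect()`."""
    provided = parse_chapter_starts(description)
    return provided if provided else detect()


description = """City Council briefing
0:00 Call to order
12:34 Municipal broadband
1:02:30 Public comment
"""
print(parse_chapter_starts(description))
```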

## Stakeholders
_Please add any individual people or teams that should be brought in for discussion on the project_

Frontend to actually make the video chapters viewer.
Backend for both pipeline and transcript mutation.

## Major Components
_Please add any major components that need to be done for this project_
* [ ] Function to get unique n-grams from minutes item attachments for all minutes items in event
* [ ] Function to apply moving window sum using unique n-grams
* [ ] Function to find "common" section splitter sequences
* [ ] Function to weight and merge moving window sum and section splitter sequences
* [ ] Function to store into transcript as annotation
* [ ] Pipeline to wrap the whole thing (may include as part of primary event processing with option to not?)
* [ ] Frontend video timeline to parse and use chapter annotations
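For the splitter-sequence and merge components, one possible starting point is sketched below. It assumes breakpoints are `(token_index, minutes_item)` pairs produced by a moving-window step, and it uses a simple "snap back to the nearest earlier splitter phrase" rule rather than a true weighted merge. The function names, the phrase list, and the `max_distance` of 10 tokens are illustrative assumptions.

```python
# Phrases taken from the examples in this issue; a real list would be longer.
SPLITTER_PHRASES = ["moving on to", "call the roll", "attendance"]


def find_splitter_positions(tokens, phrases=SPLITTER_PHRASES):
    """Return the sorted token indices where any splitter phrase starts."""
    positions = []
    for phrase in phrases:
        parts = phrase.split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i : i + n] == parts:
                positions.append(i)
    return sorted(positions)


def snap_to_splitters(breakpoints, splitters, max_distance=10):
    """Move each breakpoint back to the closest earlier splitter, if one is near."""
    snapped = []
    for index, item in breakpoints:
        earlier = [s for s in splitters if 0 <= index - s <= max_distance]
        snapped.append((max(earlier) if earlier else index, item))
    return snapped


tokens = (
    "lets talk about the municipal broadband bill that would enable seattle "
    "city light to serve customers with broadband moving on to funding the "
    "seattle it department"
).split()

# Hypothetical output of the moving-window step for this transcript.
window_breakpoints = [(4, "minutes_item_1"), (24, "minutes_item_2")]

splitters = find_splitter_positions(tokens)
chapters = snap_to_splitters(window_breakpoints, splitters)
print(splitters, chapters)
```

Snapping moves the second chapter start from the first topical word back to the "moving on to" phrase, which is usually where a viewer would expect the chapter to begin.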

## Dependencies
_Please add any other major or minor project dependencies here_
* [ ] **(required)** CouncilDataProject/cdp-roadmap#6

## Other Notes
_Please add any extra notes here_

My one concern is how to handle multi-session events. We only store minutes items at the event level, not at the session level, so we will need to find a way to handle this gracefully.
