[Feature] Enhancing face id Correlation for Video #5936

zacharyvmm · 2023-12-22T22:13:39Z

zacharyvmm
Dec 22, 2023

The feature

In my observations, it appears that the correlation with face id is established based on the first frame of the video. While this is a straightforward and easily implementable solution, there are some potential issues.

Notably, a video may not begin with a person in frame, for various reasons: the camera may be out of focus, or the video may start by showing the floor (as is common in many family videos).

To address this, two potential solutions come to mind:

1. Generate multiple thumbnails for each video.
2. Allow users to manually associate videos and images in cases where the AI doesn't automatically pick up on the relevant content.

Ideally, the implementation would incorporate both solutions.

Platform

Server
Web
Mobile

mertalev · 2023-12-22T22:26:08Z

mertalev
Dec 22, 2023
Collaborator

We're considering (1) for both smart search and facial recognition. The advantages you've raised apply to both. Additionally, for smart search, it would mean you could search for a specific scene in a video.

Another possible feature could be to do machine learning on-the-fly when pausing a video, showing the detected faces for that frame. This could also be used to add OCR for a frame once that machine learning task has been added.

1 reply

adasium Mar 16, 2025

I think (2) is equally important, if not more so, because:

AI might not detect faces
- I assume this can happen no matter which frame is used, so I'd be left with missing faces. I don't want to meddle with face detection settings, search for the split second in which the person was visible, or rerun jobs just to tag one photo when a manual fix would be faster.
I'd like to tag people who are not in the photo (for example the cameraman, or people I was with when the video was taken who are not in the frame)
- this would make searching for specific photos easier

DaanHend · 2024-03-14T15:48:35Z

DaanHend
Mar 14, 2024

Would be a nice addition to have facial recognition work on videos 💯

#7953

0 replies

aviv926 · 2024-03-29T12:52:49Z

aviv926
Mar 29, 2024
Collaborator

See also corresponding thread on Discord:
https://discord.com/channels/979116623879368755/1222898437490479104

0 replies

Azuraell · 2024-05-15T16:25:54Z

Azuraell
May 15, 2024

This feature would be amazing on Immich. I am currently searching for pictures of a deceased cherished relative, and it is so much work. Also I know I am missing a lot because I can't watch every single video.

0 replies

tit1 · 2024-06-19T16:43:56Z

tit1
Jun 19, 2024

One approach that IMO may be fairly simple is to use ffmpeg to extract screenshots from the video and run those through the same systems you have in place for face recognition and smart search. The images could be removed afterwards or saved as a stack of reference images for the video. The tricky part is determining the time interval for grabbing images. You could have this as a system setting changed by the user, or a fixed rate that could be changed in the UI video by video.
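
For illustration, the extraction step could look roughly like the sketch below. It assumes ffmpeg is available on PATH; the function names, the output file pattern, and the 5-second default interval are hypothetical and not taken from Immich.

```python
import subprocess
from pathlib import Path


def build_frame_extract_cmd(video: Path, out_dir: Path, interval_s: int = 5) -> list[str]:
    """Build an ffmpeg command that writes one JPEG every `interval_s` seconds.

    The interval is the user-facing setting discussed above (hypothetical default).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be a positive number of seconds")
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps=1/{interval_s}",      # one output frame per interval
        "-f", "image2",                    # write a numbered image sequence
        str(out_dir / "frame_%08d.jpg"),   # hypothetical naming pattern
    ]


def extract_frames(video: Path, out_dir: Path, interval_s: int = 5) -> list[Path]:
    """Run ffmpeg and return the extracted frames in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_frame_extract_cmd(video, out_dir, interval_s), check=True)
    return sorted(out_dir.glob("frame_*.jpg"))
```

The returned files could then be handed to the existing face recognition and smart search jobs and deleted afterwards, or kept as reference images.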

1 reply

mrAndream Apr 5, 2025

Add face detection for entire videos:

Extract frames every X seconds (configurable).
Detect faces using existing models.
Cluster same faces into one person.
Cluster the timings of the same face into time slots.
Save timestamps and coordinates.
Delete extracted frames after processing.

Notes
Use FFmpeg for frame extraction.
Extend existing face recognition system.
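
A rough sketch of the "time slots" step, assuming face detection and clustering have already produced (person, timestamp) pairs. The function name, the input shape, and the `max_gap_s` threshold are hypothetical.

```python
from collections import defaultdict


def merge_into_slots(detections, max_gap_s=2.0):
    """Group per-person detection timestamps into contiguous time slots.

    detections: iterable of (person_id, timestamp_in_seconds).
    Returns {person_id: [(start, end), ...]}; detections closer together
    than `max_gap_s` are merged into the same slot.
    """
    by_person = defaultdict(list)
    for person_id, ts in detections:
        by_person[person_id].append(ts)

    slots = {}
    for person_id, stamps in by_person.items():
        stamps.sort()
        merged = [[stamps[0], stamps[0]]]
        for ts in stamps[1:]:
            if ts - merged[-1][1] <= max_gap_s:
                merged[-1][1] = ts        # extend the current slot
            else:
                merged.append([ts, ts])   # gap too large: start a new slot
        slots[person_id] = [(start, end) for start, end in merged]
    return slots
```

With frames sampled every X seconds, `max_gap_s` would probably need to be at least X, otherwise every detection ends up in its own slot.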

austinbaccus · 2025-01-13T20:10:12Z

austinbaccus
Jan 13, 2025

Any updates on this?

1 reply

marvil6 Jul 25, 2025

I would love this feature as well.

TKaluza · 2025-03-17T18:53:45Z

TKaluza
Mar 17, 2025

Maybe this is interesting:
https://arxiv.org/pdf/2502.03183

4 replies

bo0tzz Mar 17, 2025
Maintainer

That's a very neat approach! Do I understand correctly that it still samples frames by count or keyframe at the start, and so avoids generating an unused embedding for all/many frames?

DaanHend Mar 18, 2025

Interesting!

It seems like it creates vectors for each frame, then picks the ones that best represent the whole 'vector timeline.' Those key frames could be really useful for ML in Immich. Seems more efficient than processing every single frame with ML. If that vector conversion would be simple then...

bo0tzz Mar 18, 2025
Maintainer

It seems like it creates vectors for each frame

That's not my impression of the paper. Doing that would be very slow because generating the vector embedding is expensive.

mertalev Mar 18, 2025
Collaborator

It does generate embeddings with CLIP first. I think the point of that paper is to optimize for fewer frames passed to the VLLM, which is much slower than CLIP. This apparently increases the VLLM result quality as well. For our use, if you sample one frame every second and generate an embedding for it, then you're already done without needing anything in this paper.
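
To illustrate the "one embedding per sampled second" idea: once each sampled frame has an embedding, finding a scene is a nearest-neighbour lookup against the query embedding. The sketch below uses plain Python lists as placeholder vectors; in practice they would come from the CLIP model, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_matching_second(frame_embeddings, query_embedding):
    """Return (second, score) of the sampled frame closest to the query.

    frame_embeddings[i] is assumed to be the embedding of the frame
    sampled at second i of the video.
    """
    if not frame_embeddings:
        raise ValueError("no frame embeddings to search")
    scores = [cosine(e, query_embedding) for e in frame_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```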

ireun · 2025-05-23T19:06:05Z

ireun
May 23, 2025

I think what Immich needs has already been implemented in Jellyfin: it is called Trickplay, and it is used to help with seeking on the timeline. It creates a file with frames every x seconds, and I think such an image could be passed into the face detection/recognition.

Here is how it looks:

You may want to look at this (and related) commit jellyfin/jellyfin@ca7d1a1

https://deepwiki.com/search/explain-trickplay_863319de-15e3-4cc6-83d4-048c6773bff2

3 replies

tylers-username Jun 15, 2025

There's something more to consider—voice detection.

Often, as in this example, Mom or Dad is holding the phone and talking to the subject of the video, but you never see them. The sentimental value, and the job to be done here, is that if, say, Mom or Dad passes, you'd be able to find videos of them that would otherwise have remained buried.

ireun Sep 18, 2025

I've looked at the Jellyfin implementation and it seems pretty simple, all done through ffmpeg:

1. Take a frame every N seconds with the fps filter (https://ffmpeg.org/ffmpeg-filters.html#fps-1), simply fps=1/N
1.1. Or maybe take N frames per video (0:00:00, the end, and N-2 in between)? Though I think the option above would be better
2. Generate JPEGs using the image2 muxer (https://ffmpeg.org/ffmpeg-formats.html#image2_002c-image2pipe)
3. Feed the JPEGs to the ML model (buffalo or whatnot)

Example: ffmpeg.exe -i .\35f4037bcf569a1c63d7aeed2faf744c.mp4 -vf fps=1/1 -f image2 "out%08d.jpg"

More work would probably be needed to prepare the API/model/UI to make it work nicely (show only one face per person in the UI? maybe lower the max distance for matching faces coming from one video?)
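
For the "N frames per video" alternative (0:00:00, the end, and N-2 in between), the timestamps could be computed as in this small sketch. The function name is hypothetical, and the video duration is assumed to be known already (for example from ffprobe).

```python
def evenly_spaced_timestamps(duration_s: float, n: int) -> list[float]:
    """Return n timestamps in seconds: the start, the end, and n-2 evenly spaced in between."""
    if duration_s < 0:
        raise ValueError("duration_s must not be negative")
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [0.0]
    step = duration_s / (n - 1)
    return [i * step for i in range(n)]


# Example: a 60-second video sampled at 5 points.
print(evenly_spaced_timestamps(60, 5))  # [0.0, 15.0, 30.0, 45.0, 60.0]
```

One caveat: seeking to exactly the end of a file may not yield a frame, so the last timestamp might need to be pulled back slightly in practice.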

DankMemeGuy Oct 31, 2025

There's something more to consider—voice detection.

Often, as in this example, Mom or Dad is holding the phone and talking to the subject of the video, but you never see them. The sentimental value, and the job to be done here, is that if, say, Mom or Dad passes, you'd be able to find videos of them that would otherwise have remained buried.

I think this is unrelated to what this discussion is about. But using Whisper to transcribe audio so that Smart Search can allow you to search words would help, and I think you might be able to identify voices using Diarization: https://www.youtube.com/watch?v=aC4LYberXys

All that being said, the issues are these:

Face recognition only occurs for the first frame -> problem that requires every frame (or every nth) to be scanned
Videos don't have much scanning for anything deep -> problem that requires scanning so you can search the transcript, scene, speaker

The two aren't distinctly related.

unpleased · 2025-11-14T10:01:41Z

unpleased
Nov 14, 2025

Any update on this? It's been years

9 replies

bo0tzz Nov 14, 2025
Maintainer

@devasheeshG as long as you're the same person that was asking about this feature on Discord today then you're the only one, go for it!

btw he was just asking if there has been any update or not on this feature

If there are any updates it'll be plenty clear from discussion on this thread; if there are none, asking for "any updates" unnecessarily notifies everyone subscribed to this thread.

devasheeshG Nov 14, 2025

@devasheeshG as long as you're the same person that was asking about this feature on Discord today then you're the only one, go for it!

no, i was not asking for any updates on discord.

ok then, i'll look into it, draft a solution, and share a summary here, then you can confirm if that's good, then i'll just go ahead and implement that

If there are any updates it'll be plenty clear from discussion on this thread; if there are none, asking for "any updates" unnecessarily notifies everyone subscribed to this thread.

ig he was thinking maybe this is already implemented bts; i also thought so initially lmao

eWOOD29 Dec 8, 2025

@devasheeshG as long as you're the same person that was asking about this feature on Discord today then you're the only one, go for it!

no, i was not asking for any updates on discord.

ok then, i'll look into it, draft a solution, and share a summary here, then you can confirm if that's good, then i'll just go ahead and implement that

If there are any updates it'll be plenty clear from discussion on this thread; if there are none, asking for "any updates" unnecessarily notifies everyone subscribed to this thread.

ig he was thinking maybe this is already implemented bts; i also thought so initially lmao

Any progress on implementing this? I would love this feature.

bo0tzz Dec 8, 2025
Maintainer

@eWOOD29 that question was already asked and answered above. https://justinmayer.com/posts/any-updates/

ireun Dec 8, 2025

@bo0tzz maybe it's time to start marking some comments here as 'off-topic'

Replies: 9 comments · 19 replies
