πŸ“Ή Video Analysis with GPT-4o

GitHub Repository

A Streamlit application that leverages AI vision capabilities to analyze video content, extract insights, and enable interactive conversations about visual content.

[Demo: Video Analysis with LLMs]

✨ Features

🎬 Upload

  • Upload local video files (MP4, AVI, MOV) for AI-powered analysis
  • Use a convenient sample video for quick testing and demonstration
  • Analyze videos from URLs (YouTube, etc.) with automatic metadata extraction
  • Reuse previous analyses to save time and processing resources
  • Configure detailed processing parameters for customized analysis

πŸ” Analyze

  • Automated segmentation of videos for detailed frame-by-frame analysis
  • Advanced vision model integration for sophisticated visual understanding
  • Optional audio transcription to incorporate spoken content into analysis
  • Adjustable analysis parameters (segment length, frame rate) for performance optimization
  • Save frames for later reference and review
  • Analyze specific time ranges within longer videos

πŸ’¬ Chat

  • Discuss analysis results with the AI in natural language
  • Ask detailed questions about specific visual content, scenes, or objects
  • Cross-reference insights across different video segments
  • Explore patterns and observations with AI-assisted interpretation

πŸš€ Getting Started

Prerequisites

  • Python 3.8 or higher
  • Azure OpenAI API access with a vision-capable GPT-4o deployment
  • Authentication service (optional)

Installation

  1. Clone the repository:

    git clone https://github.com/john-carroll-sw/video-analysis-with-gpt-4o.git
    cd video-analysis-with-gpt-4o
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Set up your .env file with your Azure OpenAI credentials:

    # Copy the sample environment file and edit it with your credentials
    cp .env.sample .env
    # Now edit the .env file with your preferred editor
    nano .env  # or use any text editor you prefer
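
The exact variable names are defined in .env.sample; as a rough illustration only (names and values below are placeholders, defer to .env.sample), an Azure OpenAI configuration typically looks like:

AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-api-key>
AZURE_OPENAI_API_VERSION=<api-version>
AZURE_OPENAI_DEPLOYMENT=<your-gpt-4o-deployment-name>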

Running the Application

Run the Streamlit application:

streamlit run Video_Analysis.py

Open your web browser to http://localhost:8501 to use the application.

πŸ“– Usage Guide

Video Upload

  1. Select your video source (File or URL) in the sidebar
  2. For file upload:
    • Click "Use Sample Video" for quick testing without uploading, OR
    • Upload your own video file through the file uploader
  3. For URL analysis:
    • Paste a YouTube or other video URL in the input field
    • Note: YouTube has protective measures against web scraping that may block access
  4. Review detailed video information in the expandable section
  5. If the video was analyzed previously, choose to load the existing analysis or re-analyze

Video Analysis

  1. Configure analysis parameters in the sidebar (a worked example follows these steps):
    • Segment interval: Duration of each video segment for analysis
    • Frames per second: Rate at which frames are captured (0.1-30)
    • Audio transcription: Enable to include spoken content in analysis
    • Frame resize ratio: Reduce image size for processing efficiency
    • Temperature: Adjust AI creativity level
  2. Customize system and user prompts to guide the analysis direction
  3. Optionally specify a time range to analyze only part of the video
  4. Click "Continue to Analysis" to begin processing
  5. View segment-by-segment analysis results as they are generated
  6. Compare AI insights with visual content for each segment
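
As a rough guide to how the step 1 parameters interact, the estimate below uses made-up numbers (not the app's defaults) to show the frame workload a given configuration produces:

import math

video_length = 120.0     # seconds of video to analyze
segment_interval = 30.0  # sidebar: seconds per segment
frames_per_second = 1.0  # sidebar: frames extracted per second

segments = math.ceil(video_length / segment_interval)
frames_per_segment = segment_interval * frames_per_second
print(f"{segments} segments x {frames_per_segment:.0f} frames = "
      f"{segments * frames_per_segment:.0f} frames sent to the vision model")
# 4 segments x 30 frames = 120 frames sent to the vision model

Note that the total frame count depends only on the frame rate and video length; the segment interval controls how those frames are batched into model requests.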

Chat Interface

  1. Navigate to the Chat tab after analysis is complete
  2. Ask open-ended questions about the video content
  3. Request specific information about scenes, objects, or activities
  4. Compare different segments or request summary insights
  5. The AI will reference analyzed frames to provide context-aware responses
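
Conceptually, the chat step feeds the accumulated segment analyses back to the model along with your question. The sketch below is a minimal illustration using the openai Python SDK, assuming AZURE_OPENAI_* environment variables are set and using placeholder segment text; see components/chat.py for the app's actual implementation:

from openai import AzureOpenAI

client = AzureOpenAI()  # reads endpoint, key, and API version from the environment

segment_analyses = [  # placeholder results from the Analyze tab
    "Segment 1 (0:00-0:30): close-up of a circuit board; camera pans left.",
    "Segment 2 (0:30-1:00): a soldering iron enters the frame.",
]

response = client.chat.completions.create(
    model="gpt-4o",  # your Azure deployment name
    messages=[
        {"role": "system",
         "content": "Answer questions about a video using these segment analyses:\n"
                    + "\n".join(segment_analyses)},
        {"role": "user", "content": "What objects appear in the first segment?"},
    ],
)
print(response.choices[0].message.content)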

πŸ” Authentication

The application includes an optional authentication system that:

  1. Secures access using an external authentication service
  2. Automatically detects if running locally or in a deployed environment
  3. Properly handles login/logout flows and session management
  4. Can be enabled or disabled based on your requirements

Configuring Authentication

To enable authentication:

  1. Set VITE_AUTH_ENABLED=true in your .env file
  2. Configure VITE_AUTH_URL to point to your authentication service
  3. Set FRONTEND_URL if deploying to a custom domain
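
For example, a setup with authentication enabled might carry these lines in .env (URLs are placeholders):

VITE_AUTH_ENABLED=true
VITE_AUTH_URL=https://auth.example.com
# Optional, only needed when deploying to a custom domain:
FRONTEND_URL=https://video-analysis.example.com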

To disable authentication:

  1. Set VITE_AUTH_ENABLED=false in your .env file

🧰 How It Works

The application processes videos in a multi-stage pipeline:

  1. Video Processing: Videos are segmented into manageable chunks and frames are extracted at specified intervals (a simplified sketch follows this list).

  2. Frame Analysis: The AI vision model examines frames from each segment to understand visual content.

  3. Optional Audio Transcription: If enabled, the audio is transcribed to provide additional context.

  4. AI Analysis: The extracted frames and transcriptions are analyzed using AI models with customized prompts.

  5. Interactive Interface: Results are presented in a segment-by-segment view with the option to chat about insights.
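
For the video-processing stage, the sketch below shows what frame extraction with OpenCV can look like; parameter names are illustrative, and the app's actual implementation lives in utils/video_processing.py:

import cv2  # OpenCV, which the app uses for video handling

def extract_frames(video_path: str, frames_per_second: float = 1.0,
                   resize_ratio: float = 1.0):
    """Sample frames at a target rate and optionally downscale them."""
    capture = cv2.VideoCapture(video_path)
    native_fps = capture.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(native_fps / frames_per_second)), 1)

    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:  # keep every Nth frame to hit the target rate
            if resize_ratio != 1.0:
                frame = cv2.resize(frame, None, fx=resize_ratio, fy=resize_ratio)
            frames.append(frame)
        index += 1
    capture.release()
    return frames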

πŸš€ Deployment

You can deploy this application to a server using the included deployment script:

Quick Deployment

For standard deployment with default settings:

./deployment/deploy.sh cool-app-name

Custom Deployment

For deployment with specific parameters:

./deployment/deploy.sh \
  --env-file .env \
  --dockerfile deployment/Dockerfile \
  --context . \
  --entry-file Video_Analysis.py \
  cool-app-name

The deployment script parameters:

  • --env-file: Path to your environment file with API keys and configuration
  • --dockerfile: Path to the Dockerfile for containerization
  • --context: Build context for Docker
  • --entry-file: Main Python file to run
  • cool-app-name: Name for your deployed application (required)

After deployment, you'll receive a URL where your application is hosted.

πŸ“ Project Structure

video-analysis-with-gpt-4o/
β”œβ”€β”€ Video_Analysis.py         # Main application entry point
β”œβ”€β”€ components/               # UI components
β”‚   β”œβ”€β”€ upload.py             # Video upload functionality
β”‚   β”œβ”€β”€ analyze.py            # Analysis component
β”‚   └── chat.py               # Chat interface
β”œβ”€β”€ models/                   # Data models
β”‚   └── session_state.py      # Session state management
β”œβ”€β”€ utils/                    # Utility functions
β”‚   β”œβ”€β”€ api_clients.py        # API client initialization
β”‚   β”œβ”€β”€ auth.py               # Authentication handling
β”‚   β”œβ”€β”€ analysis_cache.py     # Previous analysis caching
β”‚   β”œβ”€β”€ logging_utils.py      # Logging configuration
β”‚   └── video_processing.py   # Video handling utilities
β”œβ”€β”€ media/                    # Media assets
β”‚   β”œβ”€β”€ microsoft.png         # Brand assets
β”‚   └── sample-video-circuit-board.mp4 # Sample video
β”œβ”€β”€ config.py                 # Configuration settings
β”œβ”€β”€ deployment/               # Deployment scripts
β”œβ”€β”€ requirements.txt          # Project dependencies
β”œβ”€β”€ .env                      # Environment variables (not tracked)
β”œβ”€β”€ CONTRIBUTING.md           # Contribution guidelines
└── README.md                 # Project documentation

βš™οΈ Configuration Options

The sidebar provides numerous configuration options:

  • Audio Transcription: Enable to transcribe and analyze the video's audio track (see the sketch after this list)
  • Segment Interval: Set the duration of analysis segments (in seconds)
  • Frames Per Second: Control how many frames are extracted (0.1-30)
  • Frame Resize Ratio: Optionally reduce frame size for processing
  • Temperature: Adjust AI response creativity (0.0-1.0)
  • Custom Prompts: Modify system and user prompts for tailored analysis
  • Time Range: Analyze only specific portions of longer videos
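
As an illustration of the audio option, transcription with the openai SDK and a Whisper deployment can look roughly like this (deployment name and file path are placeholders):

from openai import AzureOpenAI

client = AzureOpenAI()  # assumes AZURE_OPENAI_* environment variables are set

with open("segment_audio.mp3", "rb") as audio_file:  # placeholder path
    transcript = client.audio.transcriptions.create(
        model="whisper",  # your Whisper deployment name on Azure
        file=audio_file,
    )
print(transcript.text)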

⚠️ Notes

  • YouTube and other media sites have protective measures against web scraping that may block video access
  • For more reliable results, consider downloading videos and uploading the files directly
  • Processing large videos may take significant time and API resources
  • Adjust the frame rate and segment interval to balance between analysis detail and processing time

🀝 Contributing

Please see the CONTRIBUTING.md file for details on how to contribute to this project.

πŸ“„ License

This project is licensed under the MIT License.

πŸ™ Acknowledgements

  • Azure OpenAI for providing the powerful AI models
  • Streamlit for the simple web application framework
  • OpenCV for video processing capabilities
