A Streamlit application that leverages AI vision capabilities to analyze video content, extract insights, and enable interactive conversations about visual content.
- Upload local video files (MP4, AVI, MOV) for AI-powered analysis
- Use a convenient sample video for quick testing and demonstration
- Analyze videos from URLs (YouTube, etc.) with automatic metadata extraction
- Reuse previous analyses to save time and processing resources
- Configure detailed processing parameters for customized analysis
- Automated segmentation of videos for detailed frame-by-frame analysis
- Advanced vision model integration for sophisticated visual understanding
- Optional audio transcription to incorporate spoken content into analysis
- Adjustable analysis parameters (segment length, frame rate) for performance optimization
- Save frames for later reference and review
- Analyze specific time ranges within longer videos
- Discuss analysis results with the AI in natural language
- Ask detailed questions about specific visual content, scenes, or objects
- Cross-reference insights across different video segments
- Explore patterns and observations with AI-assisted interpretation
- Python 3.8 or higher
- Azure OpenAI access with GPT-4o vision capabilities
- Authentication service (optional)
1. Clone the repository:

   ```shell
   git clone https://github.com/john-carroll-sw/video-analysis-with-gpt-4o.git
   cd video-analysis-with-gpt-4o
   ```

2. Create and activate a virtual environment:

   ```shell
   python -m venv venv
   source venv/bin/activate  # On Windows, use: venv\Scripts\activate
   ```

3. Install the required dependencies:

   ```shell
   pip install -r requirements.txt
   ```

4. Set up your `.env` file with your Azure OpenAI credentials:

   ```shell
   # Copy the sample environment file and edit it with your credentials
   cp .env.sample .env
   # Now edit the .env file with your preferred editor
   nano .env  # or use any text editor you prefer
   ```

5. Run the Streamlit application:

   ```shell
   streamlit run Video_Analysis.py
   ```

   Open your web browser to http://localhost:8501 to use the application.
- Select your video source (File or URL) in the sidebar
- For file upload:
  - Click "Use Sample Video" for quick testing without uploading, OR
  - Upload your own video file through the file uploader
- For URL analysis:
  - Paste a YouTube or other video URL in the input field
  - Note: YouTube has protective measures against web scraping that may block access
- Review detailed video information in the expandable section
- If the video was analyzed previously, choose to load the existing analysis or re-analyze
- Configure analysis parameters in the sidebar:
- Segment interval: Duration of each video segment for analysis
- Frames per second: Rate at which frames are captured (0.1-30)
- Audio transcription: Enable to include spoken content in analysis
- Frame resize ratio: Reduce image size for processing efficiency
- Temperature: Adjust AI creativity level
- Customize system and user prompts to guide the analysis direction
- Optionally specify a time range to analyze only part of the video
- Click "Continue to Analysis" to begin processing
- View segment-by-segment analysis results as they are generated
- Compare AI insights with visual content for each segment
- Navigate to the Chat tab after analysis is complete
- Ask open-ended questions about the video content
- Request specific information about scenes, objects, or activities
- Compare different segments or request summary insights
- The AI will reference analyzed frames to provide context-aware responses
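
Under the hood, a chat turn like this can bundle the question with base64-encoded frames. The sketch below is a simplified stand-in (not the actual code in `components/chat.py`) showing one way a GPT-4o-style message payload with inline frame images could be assembled:

```python
import base64

def build_chat_messages(question, frames):
    """Assemble a GPT-4o-style chat request pairing a user question
    with base64-encoded JPEG frames from the analyzed segments."""
    content = [{"type": "text", "text": question}]
    for frame_bytes in frames:
        b64 = base64.b64encode(frame_bytes).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return [
        {"role": "system",
         "content": "You answer questions about an analyzed video."},
        {"role": "user", "content": content},
    ]

# One text part plus one image part per supplied frame
messages = build_chat_messages("What happens in segment 2?", [b"\xff\xd8fake-jpeg"])
```

Sending frames as data URIs keeps the request self-contained, at the cost of payload size, which is one reason the frame resize ratio matters.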
The application includes an optional authentication system that:
- Secures access using an external authentication service
- Automatically detects if running locally or in a deployed environment
- Properly handles login/logout flows and session management
- Can be enabled or disabled based on your requirements
To enable authentication:
- Set `VITE_AUTH_ENABLED=true` in your `.env` file
- Configure `VITE_AUTH_URL` to point to your authentication service
- Set `FRONTEND_URL` if deploying to a custom domain
To disable authentication:
- Set `VITE_AUTH_ENABLED=false` in your `.env` file
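
For illustration, the enable/disable flag above can be read in Python roughly like this (a sketch; the real handling lives in `utils/auth.py` and may differ):

```python
import os

def auth_enabled():
    """Treat the VITE_AUTH_ENABLED environment variable as a boolean flag.
    Anything other than 'true' (case-insensitive) leaves auth disabled."""
    return os.getenv("VITE_AUTH_ENABLED", "false").strip().lower() == "true"

os.environ["VITE_AUTH_ENABLED"] = "true"
print(auth_enabled())  # True
```

Defaulting to "disabled" when the variable is unset keeps local development friction-free.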
The application uses a sophisticated multi-stage approach:
1. **Video Processing**: Videos are segmented into manageable chunks and frames are extracted at specified intervals.
2. **Frame Analysis**: The AI vision model examines frames from each segment to understand visual content.
3. **Optional Audio Transcription**: If enabled, the audio is transcribed to provide additional context.
4. **AI Analysis**: The extracted frames and transcriptions are analyzed using AI models with customized prompts.
5. **Interactive Interface**: Results are presented in a segment-by-segment view with the option to chat about insights.
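
The segmentation and frame-extraction planning in the first stage can be sketched as follows. This is an illustrative helper with hypothetical names, not the actual `utils/video_processing.py` implementation:

```python
def plan_extraction(duration_s, segment_interval_s, fps):
    """Split a video into fixed-length segments and list the timestamps
    (in seconds) at which frames would be captured within each segment."""
    segments = []
    start = 0.0
    step = 1.0 / fps  # seconds between captured frames
    while start < duration_s:
        end = min(start + segment_interval_s, duration_s)
        t = start
        timestamps = []
        while t < end:
            timestamps.append(round(t, 3))
            t += step
        segments.append({"start": start, "end": end, "frames": timestamps})
        start = end
    return segments

# A 25-second video, 10-second segments, one frame every 2 seconds:
plan = plan_extraction(duration_s=25, segment_interval_s=10, fps=0.5)
# 3 segments: [0, 10), [10, 20), [20, 25)
```

Each planned timestamp then maps to one frame handed to the vision model, which is why the segment interval and frame rate dominate processing cost.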
You can deploy this application to a server using the included deployment script:
For standard deployment with default settings:
```shell
./deployment/deploy.sh cool-app-name
```

For deployment with specific parameters:

```shell
./deployment/deploy.sh \
  --env-file .env \
  --dockerfile deployment/Dockerfile \
  --context . \
  --entry-file Video_Analysis.py \
  cool-app-name
```

The deployment script parameters:

- `--env-file`: Path to your environment file with API keys and configuration
- `--dockerfile`: Path to the Dockerfile for containerization
- `--context`: Build context for Docker
- `--entry-file`: Main Python file to run
- `cool-app-name`: Name for your deployed application (required)
After deployment, you'll receive a URL where your application is hosted.
```
video-analysis-with-gpt-4o/
├── Video_Analysis.py       # Main application entry point
├── components/             # UI components
│   ├── upload.py           # Video upload functionality
│   ├── analyze.py          # Analysis component
│   └── chat.py             # Chat interface
├── models/                 # Data models
│   └── session_state.py    # Session state management
├── utils/                  # Utility functions
│   ├── api_clients.py      # API client initialization
│   ├── auth.py             # Authentication handling
│   ├── analysis_cache.py   # Previous analysis caching
│   ├── logging_utils.py    # Logging configuration
│   └── video_processing.py # Video handling utilities
├── media/                  # Media assets
│   ├── microsoft.png       # Brand assets
│   └── sample-video-circuit-board.mp4 # Sample video
├── config.py               # Configuration settings
├── deployment/             # Deployment scripts
├── requirements.txt        # Project dependencies
├── .env                    # Environment variables (not tracked)
├── CONTRIBUTING.md         # Contribution guidelines
└── README.md               # Project documentation
```
The sidebar provides numerous configuration options:
- Audio Transcription: Enable to transcribe and analyze video audio
- Segment Interval: Set the duration of analysis segments (in seconds)
- Frames Per Second: Control how many frames are extracted (0.1-30)
- Frame Resize Ratio: Optionally reduce frame size for processing
- Temperature: Adjust AI response creativity (0.0-1.0)
- Custom Prompts: Modify system and user prompts for tailored analysis
- Time Range: Analyze only specific portions of longer videos
- YouTube and other media sites have protective measures against web scraping that may block video access
- For more reliable results, consider downloading videos and uploading the files directly
- Processing large videos may take significant time and API resources
- Adjust the frame rate and segment interval to balance between analysis detail and processing time
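
To reason about that balance, it helps to estimate how many frames a given configuration will send to the vision model. A rough, illustrative helper (not part of the project):

```python
import math

def estimate_frames(duration_s, fps, segment_interval_s):
    """Back-of-the-envelope count of segments and extracted frames.
    Each extracted frame becomes one image sent to the vision model."""
    total_frames = math.ceil(duration_s * fps)
    segments = math.ceil(duration_s / segment_interval_s)
    return {
        "segments": segments,
        "total_frames": total_frames,
        "frames_per_segment": math.ceil(total_frames / segments),
    }

# A 10-minute video at 0.5 fps with 30-second segments:
print(estimate_frames(600, 0.5, 30))
# {'segments': 20, 'total_frames': 300, 'frames_per_segment': 15}
```

Halving the frame rate roughly halves the number of images processed, so it is usually the first knob to turn for long videos.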
Please see the CONTRIBUTING.md file for details on how to contribute to this project.
This project is licensed under the MIT License.
- Azure OpenAI for providing the powerful AI models
- Streamlit for the simple web application framework
- OpenCV for video processing capabilities
