H5ai Downloader (Go Version)

A Go rewrite of the h5ai downloader with concurrent download support and additional features.

Features

Concurrent Downloads: Uses multiple goroutines for parallel file downloads
Export Only Mode: Save URLs to file instead of downloading
Flexible Output: Control directory structure and output files
Caching: HTTP response caching to avoid redundant requests
Progress Tracking: Track download progress and resume interrupted downloads
Multiple URL Support: Process single URLs or files containing multiple URLs

Installation

# Build from source
go build -o h5ai_downloader

# Or run directly
go run main.go [options]

Usage

Basic Usage

# Download from a single URL to default directory (./files)
./h5ai_downloader -url "http://example.com/files/" -depth 3 -workers 8

# Download to custom directory
./h5ai_downloader -url "http://example.com/files/" -output "./downloads" -workers 4

# Download from multiple URLs in a file
./h5ai_downloader -file urls.txt -depth 2 -workers 4

Export Only Mode

# Export URLs to default file (urls.txt)
./h5ai_downloader -url "http://example.com/files/" -export-only

# Export URLs to custom file with flat structure
./h5ai_downloader -url "http://example.com/files/" -export-only -flat -output my_urls.txt

# Export with directory structure preserved
./h5ai_downloader -url "http://example.com/files/" -export-only -output detailed_urls.txt

Command Line Options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--url` | `-u` | Single URL to scrape | - |
| `--file` | `-f` | File containing URLs to scrape | - |
| `--depth` | `-d` | Maximum depth for scraping | 4 |
| `--workers` | | Number of concurrent download workers | 4 |
| `--export-only` | | Save URLs to file instead of downloading | false |
| `--flat` | | Skip directory structure | false |
| `--output` | | Output directory for downloads, or filename for export | `./files` (download) / `urls.txt` (export) |

Input File Format

When using the --file option, provide a text file with one URL per line. To set a custom depth for a URL, append the depth after it, separated by a space:

http://example1.com/files/
http://example2.com/data/ 5
http://example3.com/docs/ 2
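
The line format above could be parsed along the following lines. This is a minimal sketch, not the tool's actual code: the `target` type and `parseLine` function are hypothetical names, and falling back to a default depth when a line has no depth is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// target is a hypothetical pairing of a URL with its crawl depth.
type target struct {
	URL   string
	Depth int
}

// parseLine splits "URL [depth]" on whitespace. A line without a depth
// gets defaultDepth (assumed behavior, not confirmed by the README).
func parseLine(line string, defaultDepth int) (target, error) {
	fields := strings.Fields(line)
	switch len(fields) {
	case 1:
		return target{URL: fields[0], Depth: defaultDepth}, nil
	case 2:
		d, err := strconv.Atoi(fields[1])
		if err != nil {
			return target{}, fmt.Errorf("invalid depth %q: %w", fields[1], err)
		}
		return target{URL: fields[0], Depth: d}, nil
	default:
		return target{}, fmt.Errorf("expected \"URL [depth]\", got %q", line)
	}
}

func main() {
	lines := []string{
		"http://example1.com/files/",
		"http://example2.com/data/ 5",
	}
	for _, line := range lines {
		t, err := parseLine(line, 4)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s depth=%d\n", t.URL, t.Depth)
	}
}
```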

Features Comparison

| Feature | Python Version | Go Version |
| --- | --- | --- |
| Basic h5ai crawling | ✅ | ✅ |
| Download tracking | ✅ | ✅ |
| HTTP caching | ✅ | ✅ |
| Multiple URLs | ✅ | ✅ |
| Concurrent downloads | ❌ | ✅ |
| Export-only mode | ❌ | ✅ |
| Flat export option | ❌ | ✅ |
| Custom output file | ❌ | ✅ |
| Worker pool control | ❌ | ✅ |

Performance

The Go version is intended to outperform the Python version in the following ways:

Concurrent Downloads: Download multiple files simultaneously using configurable worker pools
Better Memory Usage: More efficient memory management compared to Python
Faster Startup: No interpreter overhead
Built-in HTTP Client: Uses Go's standard-library HTTP client, so no external HTTP library is required
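
To illustrate what a configurable worker pool looks like in Go, here is a minimal sketch. It is not taken from main.go: `runPool` is a hypothetical name, and the `fetch` callback stands in for the real download logic.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool starts `workers` goroutines that receive URLs from a channel and
// call fetch on each one. It returns the number of fetches that succeeded.
func runPool(urls []string, workers int, fetch func(string) error) int {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no consumers
	}
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				if err := fetch(u); err != nil {
					continue // a real tool would log or retry here
				}
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return succeeded
}

func main() {
	urls := []string{
		"http://example.com/files/a.txt",
		"http://example.com/files/b.txt",
		"http://example.com/files/c.txt",
	}
	// Placeholder fetch: a real one would issue an HTTP GET and write the body to disk.
	n := runPool(urls, 4, func(u string) error { return nil })
	fmt.Println("fetched", n, "of", len(urls))
}
```

With this shape, the value of -workers would only change how many goroutines read from the channel; the producer loop stays the same.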

Architecture

Core Components

Cache System: Stores HTTP responses in .gob files for quick retrieval
URL Collector: Thread-safe collection of downloadable URLs during crawling
Download Tracker: Persistent tracking of completed downloads to enable resuming
Worker Pool: Configurable number of goroutines for concurrent downloads
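
As a rough illustration of two of the components above, the sketch below stores a response in a .gob file and collects URLs behind a mutex. All type and function names are hypothetical, and naming cache files after a SHA-256 hash of the URL is an assumption; the real layout of url_cache/ may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/gob"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// cacheEntry is a hypothetical record for one cached HTTP response.
type cacheEntry struct {
	URL  string
	Body []byte
}

// cachePath names the cache file after the SHA-256 of the URL (assumed scheme).
func cachePath(dir, url string) string {
	sum := sha256.Sum256([]byte(url))
	return filepath.Join(dir, hex.EncodeToString(sum[:])+".gob")
}

// saveEntry gob-encodes e into a file under dir.
func saveEntry(dir string, e cacheEntry) error {
	f, err := os.Create(cachePath(dir, e.URL))
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(e); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// loadEntry reads the cached entry for url back from dir.
func loadEntry(dir, url string) (cacheEntry, error) {
	var e cacheEntry
	f, err := os.Open(cachePath(dir, url))
	if err != nil {
		return e, err
	}
	defer f.Close()
	err = gob.NewDecoder(f).Decode(&e)
	return e, err
}

// roundTrip saves and reloads one entry in a temporary directory.
func roundTrip(url string, body []byte) (cacheEntry, error) {
	dir, err := os.MkdirTemp("", "url_cache")
	if err != nil {
		return cacheEntry{}, err
	}
	defer os.RemoveAll(dir)
	if err := saveEntry(dir, cacheEntry{URL: url, Body: body}); err != nil {
		return cacheEntry{}, err
	}
	return loadEntry(dir, url)
}

// collector gathers URLs from concurrent crawl goroutines behind a mutex.
type collector struct {
	mu   sync.Mutex
	urls []string
}

func (c *collector) add(u string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.urls = append(c.urls, u)
}

// collectConcurrently adds n URLs from n goroutines and returns how many were stored.
func collectConcurrently(n int) int {
	var c collector
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.add(fmt.Sprintf("http://example.com/file%d", i))
		}(i)
	}
	wg.Wait()
	return len(c.urls)
}

func main() {
	e, err := roundTrip("http://example.com/files/", []byte("<html>listing</html>"))
	if err != nil {
		fmt.Println("cache error:", err)
		return
	}
	fmt.Printf("cached %d bytes for %s\n", len(e.Body), e.URL)
	fmt.Println("collected URLs:", collectConcurrently(10))
}
```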

Directory Structure

├── main.go             # Main application code
├── go.mod              # Go module definition
├── url_cache/          # HTTP response cache (created automatically)
├── downloaded_db/      # Download completion tracking (created automatically)
└── [downloaded files]  # Downloaded content preserving directory structure

Examples

Example 1: Basic Download to Custom Directory

./h5ai_downloader -url "http://files.example.com/" -depth 2 -workers 8 -output "./my_downloads"

Example 2: Export URLs Only

./h5ai_downloader -url "http://files.example.com/" -export-only -output backup_urls.txt

Example 3: Multiple URLs with Different Depths

Create sites.txt:

http://site1.com/files/ 3
http://site2.com/data/ 5
http://site3.com/docs/

Run:

./h5ai_downloader -file sites.txt -workers 6 -output "./downloads"

Example 4: Flat Export (URLs only, no directory info)

./h5ai_downloader -url "http://files.example.com/" -export-only -flat -output flat_urls.txt

Notes

Output Parameter Behavior

The -output parameter has dual functionality:

Download Mode (default): Specifies the output directory where files will be downloaded
- Default: ./files
- Example: -output "./my_downloads" creates directory structure under my_downloads/
Export Mode (-export-only): Specifies the filename for the exported URL list
- Default: urls.txt
- Example: -output "backup_urls.txt" creates a file named backup_urls.txt
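
The dual behavior described above can be expressed as a small default-resolution step. The following is a sketch under assumptions, not the code in main.go: `resolveOutput` is a hypothetical helper, and using an empty string to mean "flag not set" is an illustrative choice. Only the flag names and defaults come from this README.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveOutput returns the effective -output value: the user's value if one
// was given, otherwise the default for the current mode.
func resolveOutput(output string, exportOnly bool) string {
	if output != "" {
		return output
	}
	if exportOnly {
		return "urls.txt" // export mode: default filename
	}
	return "./files" // download mode: default directory
}

func main() {
	exportOnly := flag.Bool("export-only", false, "save URLs to file instead of downloading")
	output := flag.String("output", "", "output directory (download) or filename (export)")
	flag.Parse()

	dest := resolveOutput(*output, *exportOnly)
	if *exportOnly {
		fmt.Println("exporting URL list to file:", dest)
	} else {
		fmt.Println("downloading into directory:", dest)
	}
}
```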

Directory Structure

When flat=false (default): Maintains the original directory structure from the server
When flat=true: Downloads all files to the output directory root (no subdirectories)
The Go version maintains compatibility with the Python version's cache and download tracking
Default worker count is 4, but can be adjusted based on your system and network capacity
Export-only mode is useful for creating backup lists or processing URLs with external tools
The flat option in export mode outputs just the URLs without directory structure information
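
To make the effect of the flat setting on download paths concrete, here is a minimal sketch of mapping a file URL to a local path. `localPath` is a hypothetical function, and mirroring the full URL path (rather than the path relative to the starting URL) is an assumption about how the directory structure is preserved.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"path/filepath"
)

// localPath maps a file URL to a destination under outDir.
// With flat=false the URL path is mirrored as subdirectories;
// with flat=true only the file's base name is kept.
func localPath(outDir, rawURL string, flat bool) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Cleaning a rooted path removes any ".." segments, so the
	// result cannot escape outDir.
	rel := path.Clean("/" + u.Path)
	if flat {
		rel = path.Base(rel)
	}
	return filepath.Join(outDir, filepath.FromSlash(rel)), nil
}

func main() {
	const fileURL = "http://files.example.com/docs/a/report.pdf"
	for _, flat := range []bool{false, true} {
		p, err := localPath("files", fileURL, flat)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("flat=%v -> %s\n", flat, p)
	}
}
```

In a sketch like this, flat mode sends files that share a name in different server directories to the same local path, so a real implementation would need to decide how to handle such collisions.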
