@Umang01-hash
Member

Pull Request Template

Description:

  • Added GCS file provider in pkg/datasource/file/gcs.
  • Implemented Create, Remove, ReadDir, Open, Stat and MakeDir using cloud.google.com/go/storage.
  • Simulates directories using object key prefixes.
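
For readers unfamiliar with prefix-based directories, here is a minimal sketch of the idea using only the standard library. It is not code from this PR: `listDir` and the sample keys are hypothetical, and a real provider would ask the storage service to filter by prefix instead of scanning keys in memory.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listDir is a hypothetical helper (not taken from the PR). Given flat object
// keys and a directory prefix such as "logs/", it returns the files directly
// under that prefix and the names of its immediate "sub-directories".
func listDir(keys []string, prefix string) (files, dirs []string) {
	seen := make(map[string]bool)

	for _, key := range keys {
		if !strings.HasPrefix(key, prefix) {
			continue
		}

		rest := strings.TrimPrefix(key, prefix)
		if rest == "" {
			continue // marker object that represents the directory itself
		}

		if i := strings.Index(rest, "/"); i >= 0 {
			// Everything before the first "/" is an immediate sub-directory.
			if dir := rest[:i]; !seen[dir] {
				seen[dir] = true
				dirs = append(dirs, dir)
			}

			continue
		}

		files = append(files, rest)
	}

	sort.Strings(files)
	sort.Strings(dirs)

	return files, dirs
}

func main() {
	keys := []string{
		"logs/",
		"logs/app.log",
		"logs/2025/oct.log",
		"logs/2025/nov.log",
		"data/users.csv",
	}

	files, dirs := listDir(keys, "logs/")
	fmt.Println(files, dirs) // prints: [app.log] [2025]
}
```

The bucket stays flat throughout; "directories" exist only as a reading of the "/" separators in object keys.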

Checklist:

  • I have formatted my code using goimports and golangci-lint.
  • All new code is covered by unit tests.
  • This PR does not decrease the overall code coverage.
  • I have reviewed the code comments and documentation for clarity.

Fixes #2013

NOTE: This PR is a continuation of PR #2117. Most of the work here belongs to Suryakantdsa. I only extended it to include a common logger and metrics for file systems.

Suryakantdsa and others added 30 commits October 3, 2025 23:53
@Umang01-hash
Member Author

@Suryakantdsa I have updated this PR with the Seek, ReadAt and WriteAt methods implemented by you.

I think we have addressed almost all of the review comments given on PR #2013. Moreover, the GCS implementation now uses the common components (logger, metrics).

@akshat-kumar-singhal Could you do a final review of this?

@Suryakantdsa

Suryakantdsa commented Oct 28, 2025

Thank you so much @Umang01-hash for taking the time to review and implement the required GCS changes. Really appreciate your help!

Contributor

@akshat-kumar-singhal akshat-kumar-singhal left a comment

We are in a good state structurally - some improvements have been shared. Summarising:

  • Common capabilities should not be present in the file provider (gcs) package
  • Establish consistency in the logging pattern in terms of log level and avoid redundancy

Contributor

Most of the code here can be promoted to the file package as it's a common implementation, agnostic to the file provider

Member Author

I have moved the common utility functions (GetBucketName, GetObjectName) to file/fs.go since they're provider-agnostic.

However, the actual directory operation implementations (Mkdir, ReadDir, Stat, etc.) remain in gcs/fs_dir.go because they contain GCS-specific logic (client calls, error handling, observability with GCS context).

@Umang01-hash
Member Author

Umang01-hash commented Oct 31, 2025

Further in this PR, I have made the following changes:

We’ve introduced a new StorageProvider interface to unify low-level storage operations across multiple backends like GCS, S3, FTP, and SFTP. This abstraction allows us to plug in new storage types with minimal disruption to the core logic.

To streamline file and directory operations, we’ve added two key components:

  • common_fs.go: Handles shared directory-level operations such as Mkdir, MkdirAll, ReadDir, Stat, RemoveAll, ChDir, and Getwd.

  • common_file.go: Manages common file-level operations like Open, Create, Delete, Copy, and Move.

Both layers delegate actual storage actions to the StorageProvider interface, which defines:

// StorageProvider is the unified interface for all cloud storage providers (GCS, S3, FTP, SFTP).
// It abstracts low-level storage operations, allowing common implementations for directory operations,
// metadata handling, and observability.
//
// All providers (GCS, S3, FTP, SFTP) implement this interface to integrate with GoFr's file system.
type StorageProvider interface {
	Connect(ctx context.Context) error
	Health(ctx context.Context) error
	Close() error

	NewReader(ctx context.Context, name string) (io.ReadCloser, error)
	NewRangeReader(ctx context.Context, name string, offset, length int64) (io.ReadCloser, error)
	NewWriter(ctx context.Context, name string) io.WriteCloser

	DeleteObject(ctx context.Context, name string) error
	CopyObject(ctx context.Context, src, dst string) error
	StatObject(ctx context.Context, name string) (*ObjectInfo, error)

	ListObjects(ctx context.Context, prefix string) ([]string, error)
	ListDir(ctx context.Context, prefix string) ([]ObjectInfo, []string, error)
}

Each backend implements this interface via its own storage_adapter.go, making the system modular, extensible, and easier to test.

┌─────────────────────────────────────────┐
│        USER CODE (GoFr App)             │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│    common_fs.go (Directory Operations)  │
│    • Mkdir, MkdirAll, ReadDir, Stat     │
│    • RemoveAll, ChDir, Getwd            │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│    common_file.go (File Operations)     │
│    • Open, Create, Delete, Copy, Move   │
│    • ValidateSeekOffset, ObserveOp      │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│    StorageProvider Interface            │
│    • Connect, NewReader, NewWriter      │
│    • DeleteObject, StatObject           │
│    • ListObjects, ListDir               │
└────────────┬────────────────────────────┘
             │
     ┌───────┴───────┬─────────┬──────────┐
     ▼               ▼         ▼          ▼
┌─────────┐    ┌─────────┐ ┌───────┐ ┌────────┐
│ GCS     │    │ S3      │ │ FTP   │ │ SFTP   │
│ storage_│    │ storage_│ │storage│ │storage_│
│ adapter │    │ adapter │ │adapter│ │adapter │
└─────────┘    └─────────┘ └───────┘ └────────┘

So the updated GCS implementation includes:

  • storage_adapter.go - Implements StorageProvider (low-level GCS SDK calls)
  • fs.go - Connection lifecycle (authentication, retry, embeds CommonFileSystem)
  • file.go - Stateful file operations (tracks open readers/writers, file position)

Reason: StorageProvider is stateless (no auth, no file state), so we need:

  • fs.go for connection management + embedding common directory ops
  • file.go for stateful file operations (Read/Write/Seek/Close)

I request all reviewers to have a look at the architecture and code; after that, I can complete the PR with updated tests and documentation. Please let me know if any changes to the current implementation are required.

@akshat-kumar-singhal @Suryakantdsa

Development

Successfully merging this pull request may close these issues.

Google Cloud Storage (GCS) integration as File System

4 participants