
Commit f10b322

Docs for adding lakeFS Iceberg REST catalog as a Dremio source (#9652)
* docs for adding lakeFS IRC as a Dremio source
* fixes
* unify docs style
* document permissions required for direct storage access
docs/src/integrations/dremio.md

Lines changed: 87 additions & 8 deletions
[Dremio](https://www.dremio.com/) is a next-generation data lake engine that liberates your data with live,
interactive queries directly on cloud data lake storage, including S3 and lakeFS.

## Iceberg REST Catalog

The lakeFS Iceberg REST Catalog allows you to use lakeFS as a [spec-compliant](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) Apache [Iceberg REST catalog](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/iceberg/main/open-api/rest-catalog-open-api.yaml),
enabling Dremio to manage and access tables through a standard REST API.

![lakeFS Iceberg REST Catalog](../assets/img/lakefs_iceberg_rest_catalog.png)

This is the recommended way to use lakeFS with Dremio, as it allows lakeFS to stay completely outside the data path: the data itself is read and written by Dremio executors directly against the underlying object store. Metadata is managed by Iceberg at the table level, while lakeFS keeps track of new snapshots to provide versioning and isolation.

[Read more about using the Iceberg REST Catalog](./iceberg.md#iceberg-rest-catalog).

### Configuration
To configure Dremio to work with the lakeFS Iceberg REST Catalog, add an [Iceberg REST Catalog source in Dremio](https://docs.dremio.com/current/data-sources/lakehouse-catalogs/iceberg-rest-catalog/):

1. On the Datasets page, to the right of **Sources** in the left panel, click `+`.
1. In the **Add Data Source** dialog, under **Lakehouse Catalogs**, select **Iceberg REST Catalog**. The **New Iceberg REST Catalog Source** dialog appears.
1. In the **General** tab:
    - Enter a name for your Iceberg REST Catalog source and specify the endpoint URI (e.g. `https://lakefs.example.com/iceberg/api`).
    - Uncheck **Use vended credentials**.
1. In **Advanced Options** → **Catalog Properties**, add the following key-value pairs (left = key, right = value):

    | Key                               | Value                                                     | Notes                                                    |
    | --------------------------------- | --------------------------------------------------------- | -------------------------------------------------------- |
    | `oauth2-server-uri`               | `https://lakefs.example.com/iceberg/api/v1/oauth/tokens`  | Your lakeFS OAuth2 token endpoint (not the catalog URL). |
    | `credential`                      | `<lakefs_access_key>:<lakefs_secret_key>`                 | Your lakeFS credentials.                                 |
    | `fs.s3a.aws.credentials.provider` | `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider`   | Use static AWS credentials.                              |
    | `fs.s3a.access.key`               | `<aws_access_key_id>`                                     | AWS key with read/write access to your data bucket.      |
    | `fs.s3a.secret.key`               | `<aws_secret_access_key>`                                 | AWS secret key.                                          |
    | `dremio.s3.list.all.buckets`      | `false`                                                   | Avoid listing all buckets during initialization.         |

1. Click **Save** to create the Iceberg REST Catalog source.
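
After saving the source, you can optionally confirm that the catalog endpoint and credentials respond as expected from outside Dremio. The sketch below is a minimal, optional check using `pyiceberg` (not part of the Dremio setup and not mentioned above); the endpoint, token URI, and credential placeholders are the same values used in the steps above, and the catalog name is arbitrary.

```python
# Optional sanity check of the lakeFS Iceberg REST Catalog, independent of Dremio.
# Assumes `pip install pyiceberg` and the placeholder endpoint/credentials from the
# configuration steps above.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakefs",  # arbitrary local name for this catalog
    **{
        "type": "rest",
        "uri": "https://lakefs.example.com/iceberg/api",
        "oauth2-server-uri": "https://lakefs.example.com/iceberg/api/v1/oauth/tokens",
        "credential": "<lakefs_access_key>:<lakefs_secret_key>",
    },
)

# If the endpoint and credentials are correct, this prints the namespaces the
# catalog exposes.
print(catalog.list_namespaces())
```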
#### Data Bucket Permissions

The lakeFS Iceberg Catalog manages table metadata, while Dremio reads and writes data files directly in your underlying
storage (for example, Amazon S3).

You must ensure that the IAM role or user Dremio uses has read/write access to your data bucket.
The following AWS IAM policy provides the required permissions for direct access:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DremioIcebergAccess",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/",
        "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/*"
      ]
    },
    {
      "Sid": "BucketLevelRequiredForDremio",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::<lakefs_repo_storage_namespace_bucket_name>"
    }
  ]
}
```
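Replace `<lakefs_repo_storage_namespace>` with your repository's storage namespace (the bucket name plus any prefix) and `<lakefs_repo_storage_namespace_bucket_name>` with the bucket name alone; the bucket-level statement is separate because `s3:GetBucketLocation` and `s3:ListBucket` act on the bucket itself rather than on objects under a prefix.
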
!!! tip
    To learn more about the Iceberg REST Catalog, see the [Iceberg REST Catalog](./iceberg.md#iceberg-rest-catalog) documentation.

## Using Dremio with the S3 Gateway
Alternatively, you can use the S3 Gateway to read and write data in lakeFS from Dremio.

While flexible, this approach puts lakeFS in the data path, which can be less efficient than the Iceberg REST Catalog approach: lakeFS has to proxy all data operations through the lakeFS server. This is particularly noticeable for large datasets, where the extra hop adds network overhead.

### Configuration

Starting from version 3.2.3, Dremio supports MinIO as an [experimental S3-compatible plugin](https://docs.dremio.com/current/sonar/data-sources/object/s3/#configuring-s3-for-minio).
In the same way, you can connect Dremio to lakeFS.

Suppose you already have both lakeFS and Dremio deployed, and want to use Dremio to query data in your lakeFS repositories.
Follow the steps below to configure the source in the Dremio UI:
1. Click **Add Data Lake**.
1. Under **File Stores**, choose **Amazon S3**.
1. Under **Advanced Options**, check **Enable compatibility mode (experimental)**.
1. Under **Advanced Options** > **Connection Properties**, add `fs.s3a.path.style.access` and set the value to `true`.
1. Under **Advanced Options** > **Connection Properties**, add `fs.s3a.endpoint` and set its value to your lakeFS S3 endpoint.
1. Under the **General** tab, specify the **access_key_id** and **secret_access_key** provided by the lakeFS server.
1. Click **Save**; you should now be able to browse lakeFS repositories in Dremio.
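
If browsing fails, it can help to rule out endpoint or credential issues outside Dremio. The sketch below is a minimal, optional check using `boto3` against the lakeFS S3 gateway; it is not part of the Dremio setup. The endpoint URL, repository name (`example-repo`), and branch (`main`) are placeholders, and the client settings mirror the `fs.s3a.endpoint` and `fs.s3a.path.style.access` connection properties above.

```python
# Optional sanity check of the lakeFS S3 gateway, independent of Dremio.
# Assumes `pip install boto3` and the placeholder endpoint, credentials, and repository.
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # same host you set as fs.s3a.endpoint
    aws_access_key_id="<lakefs_access_key>",
    aws_secret_access_key="<lakefs_secret_key>",
    config=Config(s3={"addressing_style": "path"}),  # mirrors fs.s3a.path.style.access=true
)

# Through the S3 gateway, the repository acts as the bucket and the branch is the
# first element of the object key.
resp = s3.list_objects_v2(Bucket="example-repo", Prefix="main/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```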
