Commit 90673a5

Memgraph 3.7 (#1428)

Authored by: matea16, gitbuda, colinbarry, as51340, andrejtonev
* init
* Add the LOAD PARQUET release note
* Add most of the memgraph release notes up until today
* feat: Add docs for `frequencies_as_map` MAGE function (#1461)
* feat: Add support for memgraph-lab chart secrets (#1462)
  * feat: Add support for secrets to memgraph-lab chart
  * fix: Typo
* feat: Add libStorageAccessMode to the standalone chart (#1463)
* New fine-grained label based access (#1452)
  * chore: Empty commit just to create PR
  * doc: Update syntax for multi-tenancy setup
  * docs: Changes for new fine-grained access permissions
  * docs: Add example for combining rules
  * docs: Add example for global permissions being overriden
  * docs: Tidy LBAC docs
  * docs: Add link to combining rules
  * docs: Add migration to v3.7 LBAC guide
  * docs: Apply some minor formatting
  * docs: Fix minor typos
  * docs: Add Enterprise to migration page
  * Apply suggestions from code review
* feat: Add support for side containers to HA chart (#1464)
* Update info regarding .old backups (#1465)
  * old dir info
  * Apply suggestions from code review
* Make memgraph release notes up to date
* Make MAGE release notes up to date
* Update debugging docs for heaptrack (#1466)
* Run memgraph docker in gdb (#1467)
* Make release notes up-to-date
* Update the breaking changes
* feat: Support loading Parquet files from the local disk and from s3 (#1448)
  * docs: Describe LOAD PARQUET clause usage
  * docs: Add more details
  * docs: Add details about LoadParquet clause
  * docs: Remove TODOs
  * docs: Add command-line args and runtime flags
  * docs: Add Parquet example
  * docs: Document cast to int64_t
  * docs: Fix People->Person
  * update structure
  * docs: Add LOAD PARQUET clause, fix the link
  * docs: Add page
  * Update pages/querying/clauses/load-parquet.mdx
* update release date
* SSO - Allow self signed identity server (#1471)
  * sso docs
  * Update pages/database-management/authentication-and-authorization/auth-system-integrations.mdx
* Mlbac migration upgrade (#1472)
  * Add updated docs
  * Update example
  * Added example scenario
  * Add disclaimer
  * Apply suggestions from code review
  * fix typo

Co-authored-by: Marko Budiselic <[email protected]>
Co-authored-by: colinbarry <[email protected]>
Co-authored-by: Andi Skrgat <[email protected]>
Co-authored-by: andrejtonev <[email protected]>
Co-authored-by: Dr Matt James <[email protected]>
Co-authored-by: Ivan Milinović <[email protected]>
Co-authored-by: Josipmrden <[email protected]>
Co-authored-by: Matea Pesic <[email protected]>
1 parent d556607 commit 90673a5

File tree

22 files changed: +1192 additions, −102 deletions


pages/advanced-algorithms/available-algorithms.mdx

Lines changed: 1 addition & 0 deletions
@@ -139,6 +139,7 @@ This table shows the mapping between APOC functions/procedures and their Memgraph
 | apoc.coll.removeAll | Removes defined elements from the input list | [collections.remove_all()](/advanced-algorithms/available-algorithms/collections#remove_all) |
 | apoc.coll.contains | Verifies the existence of an input value in an input list | [collections.contains()](/advanced-algorithms/available-algorithms/collections#contains) |
 | apoc.coll.flatten | Returns flattened list of inputs provided | [collections.flatten()](/advanced-algorithms/available-algorithms/collections#flatten) |
+| apoc.coll.frequenciesAsMap | Returns a map of frequencies of the items in the collection | [collections.frequencies_as_map()](/advanced-algorithms/available-algorithms/collections#frequencies_as_map) |
 | apoc.coll.pairs | Creates pairs of neighbor elements within an input list | [collections.pairs()](/advanced-algorithms/available-algorithms/collections#pairs) |
 | apoc.coll.toSet | Converts the input list to a set | [collections.to_set()](/advanced-algorithms/available-algorithms/collections#to_set) |
 | apoc.coll.sum | Calculates the sum of listed elements | [collections.sum()](/advanced-algorithms/available-algorithms/collections#sum) |

pages/advanced-algorithms/available-algorithms/collections.mdx

Lines changed: 33 additions & 0 deletions
@@ -331,6 +331,39 @@ RETURN collections_module.flatten(input_list) as result
 +---------------------------------------------------------+
 ```
 
+### `frequencies_as_map()`
+
+Returns a map of frequencies of the items in the collection.
+
+<Callout type="info">
+This function is equivalent to **apoc.coll.frequenciesAsMap**.
+</Callout>
+
+{<h4 className="custom-header"> Input: </h4>}
+
+- `coll: List[Any]` ➡ The collection whose item frequencies will be counted.
+
+{<h4 className="custom-header"> Output: </h4>}
+
+- `Map[String, Integer]` ➡ A map where keys are string representations of the
+items, and values are their frequencies.
+
+{<h4 className="custom-header"> Usage: </h4>}
+
+The following query will count the frequency of each element in the list:
+
+```cypher
+RETURN collections.frequencies_as_map([1, 1, 2, 1, 3, 4, 1, 3]) AS result;
+```
+
+```plaintext
++---------------------------------------------------------+
+| result                                                  |
++---------------------------------------------------------+
+| {"1": 4, "2": 1, "3": 2, "4": 1}                        |
++---------------------------------------------------------+
+```
+
 ### `pairs()`
 
 Creates pairs of neighbor elements within an input list.

pages/clustering/high-availability.mdx

Lines changed: 2 additions & 1 deletion
@@ -619,7 +619,8 @@ the cluster:
   synchronize its state.
 - The replica's old durability files will be preserved in a `.old` directory in
   `data_directory/snapshots` and `data_directory/wal` folders, allowing admins
-  to manually recover data if needed.
+  to manually recover data if needed. The `.old` directory is reused for subsequent
+  recovery operations, meaning **only a single backup is maintained at any time**.
 
 Depending on the replication mode used, there are different levels of data loss
 that can happen upon the failover. With the default `SYNC` replication mode,

pages/data-migration.mdx

Lines changed: 7 additions & 2 deletions
@@ -15,7 +15,7 @@ instance. Whether your data is structured in files, relational databases, or
 other graph databases, Memgraph provides the flexibility to integrate and
 analyze your data efficiently.
 
-Memgraph supports file system imports like CSV files, offering efficient and
+Memgraph supports file system imports like Parquet and CSV files, offering efficient and
 structured data ingestion. **However, if you want to migrate directly from
 another data source, you can use the [`migrate`
 module](/advanced-algorithms/available-algorithms/migrate)** from Memgraph MAGE
@@ -37,6 +37,11 @@ that leverages the LLM to automate the process of modeling and migration.
 
 ## File types
 
+### Parquet files
+
+Parquet files can be imported efficiently from the local disk and from s3:// using the
+[LOAD PARQUET clause](/querying/clauses/load-parquet).
+
 ### CSV files
 
 CSV files provide a simple and efficient way to import tabular data into Memgraph
@@ -268,4 +273,4 @@ nonsense or sales pitch, just tech.
 />
 </Cards>
 
-<CommunityLinks/>
+<CommunityLinks/>
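The Parquet import added to this page reduces to a single clause plus a write clause. A minimal sketch (the file path and `:Person` label below are placeholders, not part of the diff):

```cypher
LOAD PARQUET FROM "/data/people.parquet" AS row
CREATE (n:Person) SET n += row;
```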

pages/data-migration/_meta.ts

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 export default {
   "best-practices": "Best practices",
   "csv": "CSV",
+  "parquet": "PARQUET",
   "json": "JSON",
   "cypherl": "CYPHERL",
   "migrate-from-neo4j": "Migrate from Neo4j",

pages/data-migration/best-practices.mdx

Lines changed: 1 addition & 1 deletion
@@ -572,4 +572,4 @@ For more information about `Delta` objects, check the
 information on the [IN_MEMORY_TRANSACTIONAL storage mode](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default).
 
 
-<CommunityLinks/>
+<CommunityLinks/>

pages/data-migration/parquet.mdx

Lines changed: 287 additions & 0 deletions
@@ -0,0 +1,287 @@
---
title: Import data from Parquet files
description: Leverage Parquet files in Memgraph operations. Our detailed guide simplifies the process for an enhanced graph computing journey.
---

import { Callout } from 'nextra/components'
import { Steps } from 'nextra/components'
import { Tabs } from 'nextra/components'
import {CommunityLinks} from '/components/social-card/CommunityLinks'

# Import data from Parquet files

Data from Parquet files can be imported from the local disk and from S3 using
the [`LOAD PARQUET` Cypher clause](#load-parquet-cypher-clause).

## `LOAD PARQUET` Cypher clause

The `LOAD PARQUET` clause uses a background thread that reads the Parquet file
in column batches, assembles them into row batches of 64K rows, and places those
batches into a queue. The main thread then pulls each batch from the queue and
processes it row by row. For every row, it binds the parsed values to the
specified variables and either populates the database (if it is empty) or
appends the new rows to an existing dataset.

### `LOAD PARQUET` clause syntax

The syntax of the `LOAD PARQUET` clause is:

```cypher
LOAD PARQUET FROM <parquet-location> ( WITH CONFIG configs=configMap )? AS <variable-name>
```

- `<parquet-location>` is a string that specifies where the Parquet file is
  located.<br/>
  If the path **does not** start with `s3://`, it is treated as a local file
  path. If it **does** start with `s3://`, Memgraph retrieves the file from the
  S3-compatible storage using the provided URI. There are no restrictions on the
  file's location within your local file system, as long as the path is valid
  and the file exists. If you are using Docker to run Memgraph, you will need to
  [copy the files from your local directory into the Docker
  container](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container)
  where Memgraph can access them.<br/>

- `configs` is an optional configuration map through which you can specify the
  following options:
  - `aws_region`: The region in which your S3 service is located.
  - `aws_access_key`: The access key used to connect to the S3 service.
  - `aws_secret_key`: The secret key used to connect to the S3 service.
  - `aws_endpoint_url`: Optional; can be used to set the URL of an
    S3-compatible storage.

- `<variable-name>` is a symbolic name representing the variable to which the
  contents of each parsed row will be bound, enabling access to the row
  contents later in the query. The variable doesn't have to be used in any
  subsequent clause.
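As a sketch of the optional configuration map described above (the bucket name, region, and credential values are placeholders, not real values), an S3 import with inline configuration might look like:

```cypher
LOAD PARQUET FROM "s3://example-bucket/people.parquet"
WITH CONFIG configs={aws_region: "eu-west-1",
                     aws_access_key: "<access-key>",
                     aws_secret_key: "<secret-key>"}
AS row
CREATE (p:Person) SET p += row;
```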
### `LOAD PARQUET` clause specificities

When using the `LOAD PARQUET` clause, please keep in mind:

- **Type handling:** <br/>
  The parser reads each value using its native Parquet type, so you should
  receive the same data type inside Memgraph. The following types are supported:
  **BOOL, INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64, HALF_FLOAT,
  FLOAT, DOUBLE, STRING, LARGE_STRING, STRING_VIEW, DATE32, DATE64, TIME32,
  TIME64, TIMESTAMP, DURATION, DECIMAL128, DECIMAL256, BINARY, LARGE_BINARY,
  FIXED_SIZE_BINARY, LIST, MAP.** <br/>
  Any unsupported types are automatically stored as strings. Note that
  `UINT64` values are cast to `INT64` because Memgraph does not support
  unsigned 64-bit integers, and the Cypher standard only defines 64-bit signed
  integers.

- **Authentication parameters:** <br/>
  Parameters for accessing S3-compatible storage (`aws_region`,
  `aws_access_key`, `aws_secret_key`, and `aws_endpoint_url`) can be provided in
  three ways:

  1. Directly in the `LOAD PARQUET` query, using the `WITH CONFIG` clause.
  2. Through environment variables: `AWS_REGION`, `AWS_ACCESS_KEY`,
     `AWS_SECRET_KEY`, and `AWS_ENDPOINT_URL`.
  3. Through run-time database settings, using `SET DATABASE SETTING <key> TO
     <value>;`. The corresponding setting keys are `aws.access_key`,
     `aws.region`, `aws.secret_key`, and `aws.endpoint_url`.
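The run-time settings route can be sketched as follows; the region and key values are placeholders:

```cypher
SET DATABASE SETTING "aws.region" TO "eu-west-1";
SET DATABASE SETTING "aws.access_key" TO "<access-key>";
SET DATABASE SETTING "aws.secret_key" TO "<secret-key>";
```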
- **The `LOAD PARQUET` clause is not a standalone clause**, meaning a valid query
  must contain at least one more clause, for example:

```cypher
LOAD PARQUET FROM "/people.parquet" AS row
CREATE (p:Person) SET p += row;
```

  In this regard, the following query will throw an exception:

```cypher
LOAD PARQUET FROM "/file.parquet" AS row;
```

  **Adding a `MATCH` or `MERGE` clause before `LOAD PARQUET`** allows you to
  match certain entities in the graph before running `LOAD PARQUET`, optimizing
  the process as matched entities do not need to be searched for every row in
  the Parquet file.

  However, a `MATCH` or `MERGE` clause can be used prior to the `LOAD PARQUET`
  clause only if it returns a single row. Returning multiple rows before
  calling the `LOAD PARQUET` clause will cause a Memgraph runtime error.

- **The `LOAD PARQUET` clause can be used at most once per query**, so queries
  like the one below will throw an exception:

```cypher
LOAD PARQUET FROM "/x.parquet" AS x
LOAD PARQUET FROM "/y.parquet" AS y
CREATE (n:A {p1 : x, p2 : y});
```
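Because `LOAD PARQUET` may appear only once per query, a workload like the rejected two-file example has to be split into separate queries, one per file. A hedged sketch (the paths, the `:A` label, and the idea of correlating the two files through a shared `id` column are assumptions for illustration):

```cypher
LOAD PARQUET FROM "/x.parquet" AS x
CREATE (:A {id: x.id, p1: x.p1});
```

```cypher
LOAD PARQUET FROM "/y.parquet" AS y
MATCH (n:A {id: y.id})
SET n.p2 = y.p2;
```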
### Increase import speed

You can significantly increase data-import speed when using the `LOAD PARQUET`
clause by taking advantage of indexing, batching, and analytical storage mode.

#### 1. Create indexes

`LOAD PARQUET` can establish relationships much faster if
[indexes](/fundamentals/indexes) on nodes or node properties are created *after*
loading the associated nodes:

```cypher
CREATE INDEX ON :Node(id);
```

If `LOAD PARQUET` is **merging** existing data rather than creating new records,
then create the indexes **before** running the import.

#### 2. Use periodic commits

The `USING PERIODIC COMMIT <BATCH_SIZE>` construct optimizes memory allocation
and can improve import speed by **25–35%** based on our benchmarks.

```cypher
USING PERIODIC COMMIT 1024
LOAD PARQUET FROM "/x.parquet" AS x
CREATE (n:A) SET n += x;
```
#### 3. Switch to analytical storage mode

Import performance can also improve by switching Memgraph to [analytical storage
mode](/fundamentals/storage-memory-usage#storage-modes), which relaxes ACID
guarantees except for manually created snapshots. Once the import is complete,
you can switch back to transactional mode to restore full ACID guarantees.

Switch storage modes within a session:

```cypher
STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
```

#### 4. Run imports in parallel

When using `IN_MEMORY_ANALYTICAL` mode and storing nodes and relationships in
separate Parquet files, you can run multiple concurrent `LOAD PARQUET` queries
to accelerate the import even further.

For best performance:

1. Split node and relationship data into smaller files.
2. Run all `LOAD PARQUET` statements that **create nodes** first.
3. Then run all `LOAD PARQUET` statements that **create relationships**.
### Usage example

In this example, we will import multiple Parquet files with distinct graph
objects. The data is split across four files; each file contains nodes of a
single label or relationships of a single type.

<Steps>

{<h3 className="custom-header">Parquet files</h3>}

- [`people_nodes.parquet`](s3://download.memgraph.com/asset/docs/people_nodes.parquet) is used to create nodes labeled `:Person`.<br/> The file contains the following data:
  ```csv
  id,name,age,city
  100,Daniel,30,London
  101,Alex,15,Paris
  102,Sarah,17,London
  103,Mia,25,Zagreb
  104,Lucy,21,Paris
  ```
- [`restaurants_nodes.parquet`](s3://download.memgraph.com/asset/docs/restaurants_nodes.parquet) is used to create nodes labeled `:Restaurant`.<br/> The file contains the following data:
  ```csv
  id,name,menu
  200,Mc Donalds,Fries;BigMac;McChicken;Apple Pie
  201,KFC,Fried Chicken;Fries;Chicken Bucket
  202,Subway,Ham Sandwich;Turkey Sandwich;Foot-long
  203,Dominos,Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust
  ```
- [`people_relationships.parquet`](s3://download.memgraph.com/asset/docs/people_relationships.parquet) is used to connect people with the `:IS_FRIENDS_WITH` relationship.<br/> The file contains the following data:
  ```csv
  first_person,second_person,met_in
  100,102,2014
  103,101,2021
  102,103,2005
  101,104,2005
  104,100,2018
  101,102,2017
  100,103,2001
  ```
- [`restaurants_relationships.parquet`](s3://download.memgraph.com/asset/docs/restaurants_relationships.parquet) is used to connect people with restaurants using the `:ATE_AT` relationship.<br/> The file contains the following data:
  ```csv
  PERSON_ID,REST_ID,liked
  100,200,true
  103,201,false
  104,200,true
  101,202,false
  101,203,false
  101,200,true
  102,201,true
  ```

{<h3 className="custom-header">Import nodes</h3>}

Each row will be parsed as a map, and its fields can be accessed using the
property lookup syntax (e.g. `id: row.id`). Files can be imported directly from
S3, or downloaded first and then accessed from the local disk.

The following query will load the file row by row and create a new node for
each row, with properties based on the parsed row values:

```cypher
LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/people_nodes.parquet" AS row
CREATE (n:Person {id: row.id, name: row.name, age: row.age, city: row.city});
```

In the same manner, the following query will create a new node for each restaurant:

```cypher
LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/restaurants_nodes.parquet" AS row
CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu});
```

{<h3 className="custom-header">Create indexes</h3>}

Creating an [index](/fundamentals/indexes) on a property used to connect nodes
with relationships, in this case the `id` property of the `:Person` nodes,
will speed up the import of relationships, especially with large datasets:

```cypher
CREATE INDEX ON :Person(id);
```

{<h3 className="custom-header">Import relationships</h3>}

The following query will create relationships between the people nodes:

```cypher
LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/people_relationships.parquet" AS row
MATCH (p1:Person {id: row.first_person})
MATCH (p2:Person {id: row.second_person})
CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2)
SET f.met_in = row.met_in;
```

The following query will create relationships between people and the restaurants where they ate:

```cypher
LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/restaurants_relationships.parquet" AS row
MATCH (p1:Person {id: row.PERSON_ID})
MATCH (re:Restaurant {id: row.REST_ID})
CREATE (p1)-[ate:ATE_AT]->(re)
SET ate.liked = toBoolean(row.liked);
```

{<h3 className="custom-header">Final result</h3>}

Run the following query to see how the imported data looks as a graph:

```cypher
MATCH p=()-[]-() RETURN p;
```

![](/pages/data-migration/csv/load_csv_restaurants_relationships.png)

</Steps>

<CommunityLinks/>

pages/database-management/authentication-and-authorization/_meta.ts

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 export default {
   "users": "Users",
   "role-based-access-control": "Role-based access control",
+  "mlbac-migration-guide": "Migrating to v3.7 LBAC",
   "multiple-roles": "Multiple roles per user and multi-tenant roles",
   "auth-system-integrations": "Auth system integrations",
   "impersonate-user": "Impersonate user",
