Merged
4 changes: 2 additions & 2 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
@@ -37,7 +37,7 @@ jobs:
loadtest:
strategy:
matrix:
kind: ['csv_agg', 'csv_agg_delim', 'csv_agg_delim_bom', 'postgrest']
kind: ['postgrest', 'csv_agg', 'csv_agg_options']
name: Loadtest
runs-on: ubuntu-24.04
steps:
@@ -55,7 +55,7 @@ jobs:
authToken: ${{ secrets.CACHIX_AUTH_TOKEN }}

- name: Run loadtest
run: nix-shell --run "./bench/loadtest.sh ${{ matrix.kind }}" >> "$GITHUB_STEP_SUMMARY"
run: nix-shell --run "pg_csv-loadtest ${{ matrix.kind }}" >> "$GITHUB_STEP_SUMMARY"

coverage:

1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ pgbench_log.*
.history
pg_csv--*.sql
!pg_csv--*--*.sql
tags
4 changes: 2 additions & 2 deletions Makefile
@@ -26,7 +26,7 @@ else
endif

EXTENSION = pg_csv
EXTVERSION = 0.4
EXTVERSION = 1.0

DATA = $(wildcard sql/*--*.sql)

@@ -68,7 +68,7 @@ $(BUILD_DIR)/$(EXTENSION).$(SHARED_EXT): $(EXTENSION).$(SHARED_EXT)
sql/$(EXTENSION)--$(EXTVERSION).sql: sql/$(EXTENSION).sql
cp $< $@

$(EXTENSION).control:
$(EXTENSION).control: $(EXTENSION).control.in
sed "s/@EXTVERSION@/$(EXTVERSION)/g" $(EXTENSION).control.in > $@

PGXS := $(shell $(PG_CONFIG) --pgxs)
132 changes: 97 additions & 35 deletions README.md
@@ -4,6 +4,17 @@
[![Coverage Status](https://coveralls.io/repos/github/PostgREST/pg_csv/badge.svg)](https://coveralls.io/github/PostgREST/pg_csv)
[![Tests](https://github.com/PostgREST/pg_csv/actions/workflows/ci.yaml/badge.svg)](https://github.com/PostgREST/pg_csv/actions)

Postgres has CSV support on the [COPY](https://www.postgresql.org/docs/current/sql-copy.html) command, but `COPY` has problems:

- It uses a special protocol, so it doesn't work with other standard features like [prepared statements](https://www.postgresql.org/docs/current/sql-prepare.html), [pipeline mode](https://www.postgresql.org/docs/current/libpq-pipeline-mode.html#LIBPQ-PIPELINE-USING) or [pgbench](https://www.postgresql.org/docs/current/pgbench.html).
- It is not composable. You can't use `COPY` inside CTEs, subqueries, or view definitions, or pass it as a function argument.

`pg_csv` offers flexible CSV processing as a solution.

- Includes a CSV aggregate that composes with SQL expressions.
- Native C extension, almost 2 times faster than SQL queries that try to output CSV (see our [CI results](https://github.com/PostgREST/pg_csv/actions/runs/17367407744)).
- No dependencies except Postgres.

## Installation

Clone this repo and run:
@@ -20,70 +31,121 @@ create extension pg_csv;

## csv_agg

Aggregate that builds a CSV as per [RFC 4180](https://www.ietf.org/rfc/rfc4180.txt), quoting as required.
Aggregate that builds a CSV respecting [RFC 4180](https://www.ietf.org/rfc/rfc4180.txt), quoting as required.

```sql
create table projects as
select *
from (
values
(1, 'Death Star OS', 1),
(2, 'Windows 95 Rebooted', 1),
(3, 'Project "Comma,Please"', 2),
(4, 'Escape ""Plan""', 2),
(NULL, 'NULL & Void', NULL)
) as _(id, name, client_id);
```

```psql
```sql
select csv_agg(x) from projects x;
csv_agg
-------------------
id,name,client_id+
1,Windows 7,1 +
2,Windows 10,1 +
3,IOS,2 +
4,OSX,2 +
5,Orphan,
csv_agg
--------------------------------
id,name,client_id +
1,Death Star OS,1 +
2,Windows 95 Rebooted,1 +
3,"Project ""Comma,Please""",2+
4,"Escape """"Plan""""",2 +
,NULL & Void,
(1 row)
```
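As a rough cross-check of the quoting rules, the sketch below renders the same sample rows with Python's standard `csv` module rather than with `pg_csv` itself. It assumes minimal quoting, `\n` line endings, and `None` standing in for SQL NULL; these choices were made to mirror the psql output above and are not taken from the extension's source. The helper name `to_csv` is hypothetical.

```python
import csv
import io

# Sample rows mirroring the `projects` table; None stands in for SQL NULL.
ROWS = [
    (1, "Death Star OS", 1),
    (2, "Windows 95 Rebooted", 1),
    (3, 'Project "Comma,Please"', 2),
    (4, 'Escape ""Plan""', 2),
    (None, "NULL & Void", None),
]


def to_csv(header, rows, delimiter=","):
    """Render rows as CSV text, quoting only the fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote fields containing the delimiter, a quote or a newline
        lineterminator="\n",        # assumption: LF line endings, as the psql output suggests
    )
    writer.writerow(header)
    writer.writerows(rows)  # None is written as an empty field
    # Drop the final line terminator so the text ends on the last record.
    return buf.getvalue().rstrip("\n")


if __name__ == "__main__":
    print(to_csv(("id", "name", "client_id"), ROWS))
```

Embedded double quotes are doubled and the whole field is wrapped in quotes, which is the escaping RFC 4180 describes.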

### Custom Delimiter

You can use a custom delimiter.
Custom delimiters can be used to produce different formats like pipe-separated values, tab-separated values or semicolon-separated values.

```psql
```sql
select csv_agg(x, csv_options(delimiter := '|')) from projects x;
csv_agg
-------------------
id|name|client_id+
1|Windows 7|1 +
2|Windows 10|1 +
3|IOS|2 +
4|OSX|2 +
5|Orphan|
csv_agg
-----------------------------
id|name|client_id +
1|Death Star OS|1 +
2|Windows 95 Rebooted|1 +
3|Open Source Lightsabers|2+
4|Galactic Payroll System|2+
7|Bugzilla Revival|3
(1 row)

select csv_agg(x, csv_options(delimiter := E'\t')) from projects x;
csv_agg
-----------------------------------
id name client_id +
1 Death Star OS 1 +
2 Windows 95 Rebooted 1+
3 Open Source Lightsabers 2+
4 Galactic Payroll System 2+
7 Bugzilla Revival 3
(1 row)
```

> [!NOTE]
> Newline, carriage return and double quotes are not supported as delimiters to maintain the integrity of the separated values format.
> - Newline, carriage return and double quotes are not supported as delimiters to maintain the integrity of the separated values format.
> - The delimiter must be a single character; if a longer string is specified, only the first character is used.
> - Why use a `csv_options` constructor function instead of extra arguments? Aggregates don't support named arguments in Postgres; see the discussion at https://github.com/PostgREST/pg_csv/pull/2#issuecomment-3155740589.

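The delimiter rules in the note can be pictured with a small Python sketch. This is an illustration only, not the extension's C code: the function name `normalize_delimiter` is hypothetical, and raising an error for unsupported or empty values is an assumption, since the note only says those delimiters are not supported.

```python
# Characters the note lists as unsupported delimiters.
FORBIDDEN_DELIMITERS = {"\n", "\r", '"'}


def normalize_delimiter(value: str) -> str:
    """Return the single-character delimiter implied by `value`."""
    if not value:
        # Assumption: an empty delimiter is treated as an error in this sketch.
        raise ValueError("delimiter must not be empty")
    first = value[0]  # only the first character of a longer string is used
    if first in FORBIDDEN_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {first!r}")
    return first


if __name__ == "__main__":
    print(repr(normalize_delimiter("|")))    # pipe-separated values
    print(repr(normalize_delimiter("\t")))   # tab-separated values
    print(repr(normalize_delimiter(";;;")))  # longer string: first character only
```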
### BOM

You can include a byte-order mark (BOM) to make the CSV compatible with Excel.

```psql
```sql
select csv_agg(x, csv_options(bom := true)) from projects x;

csv_agg
-------------------
id,name,client_id+
1,Windows 7,1 +
2,Windows 10,1 +
3,IOS,2 +
4,OSX,2 +
5,Orphan,
1,Death Star OS,1
2,Windows 95 Rebooted,1
3,Open Source Lightsabers,2
4,Galactic Payroll System,2
5,Bugzilla Revival,3
(1 row)
```
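The BOM is invisible in the psql output above, so a byte-level view may help. The Python sketch below assumes the mark is the UTF-8 BOM (bytes `EF BB BF`) placed before the first character of the CSV; the README does not state the encoding, and the helper name `with_bom` is hypothetical.

```python
import codecs


def with_bom(csv_text: str) -> bytes:
    """Encode CSV text as UTF-8 with a leading byte-order mark."""
    return codecs.BOM_UTF8 + csv_text.encode("utf-8")


if __name__ == "__main__":
    payload = with_bom("id,name,client_id\n1,Death Star OS,1")
    print(payload[:3].hex())            # hex of the BOM bytes
    print(payload.decode("utf-8-sig"))  # "utf-8-sig" strips the BOM on decode
```

Readers that do not expect a BOM will see it as a stray character at the start of the first header name, which is one reason to keep the option off unless the consumer needs it.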

### Header

You can omit or include the CSV header.

```psql
```sql
select csv_agg(x, csv_options(header := false)) from projects x;
csv_agg
-------------------
1,Windows 7,1 +
2,Windows 10,1 +
3,IOS,2 +
4,OSX,2 +
5,Orphan,

csv_agg
-----------------------------
1,Death Star OS,1 +
2,Windows 95 Rebooted,1 +
3,Open Source Lightsabers,2+
4,Galactic Payroll System,2+
7,Bugzilla Revival,3
(1 row)
```

### Null string

NULL values are represented by an empty string by default. This can be changed with the `nullstr` option.

```sql
SELECT csv_agg(x, csv_options(nullstr:='<NULL>')) AS body
FROM projects x;

body
--------------------------------
id,name,client_id +
1,Death Star OS,1 +
2,Windows 95 Rebooted,1 +
3,"Project ""Comma,Please""",2+
4,"Escape """"Plan""""",2 +
<NULL>,NULL & Void,<NULL>
(1 row)
```

## Limitations

- For large bulk exports and imports, `COPY ... CSV` should still be preferred, as it's faster due to streaming support.
2 changes: 1 addition & 1 deletion bench/csv_agg.sql
@@ -1,5 +1,5 @@
\set lim random(1000, 2000)

select csv_agg(t) from (
select * from student_emotion_assessments limit :lim
select * from orders_customers limit :lim
) as t;
5 changes: 0 additions & 5 deletions bench/csv_agg_delim.sql

This file was deleted.

5 changes: 0 additions & 5 deletions bench/csv_agg_delim_bom.sql

This file was deleted.

5 changes: 5 additions & 0 deletions bench/csv_agg_options.sql
@@ -0,0 +1,5 @@
\set lim random(1000, 2000)

select csv_agg(t, csv_options(delimiter:='|', bom:=true, header:=false, nullstr:='<NULL>')) from (
select * from orders_customers limit :lim
) as t;