
Commit 5529351

remove some todos
1 parent 818287a commit 5529351

11 files changed

+825
-839
lines changed

README.md

Lines changed: 15 additions & 23 deletions
@@ -6,13 +6,17 @@
66

77
`funsql` is a python library to write SQL queries in a way that is more composable.
88

9-
SQL is tricky to write in a modular fashion since it is a DSL with its own grammar. The straightforward way to compose SQL query fragments then must rely on string interpolation/concatenation, extended with a templating language like Jinja. FunSQL exposes the full expressive power of SQL by implementing the SQL verbs _(FROM, WHERE, GROUP BY, ...)_ as regular python objects with compositional semantics. This approach is particularly useful for building applications that programmatically construct SQL queries.
9+
SQL is tricky to write in a modular fashion since it is a DSL with its own grammar. The straightforward way to compose SQL query fragments then must rely on string interpolation/concatenation, extended with a templating language like Jinja.
1010

11-
This implementation closely follows the original Julia library [FunSQL.jl](https://github.com/MechanicalRabbit/FunSQL.jl/). Thanks to the original authors, Clark Evans and Kyrylo Simonov, who have been refining the idea for some time; you should check their previous work [here](https://querycombinators.org/). Here is a [presentation](https://www.youtube.com/watch?v=rGWwmuvRUYk) talking about `FunSQL.jl` from Juliacon.
11+
FunSQL exposes the full expressive power of SQL by implementing the SQL verbs _(FROM, WHERE, GROUP BY, ...)_ as regular python objects with compositional semantics. Specifically when you need to construct SQL queries programmatically, the pipeline style of composing queries can be very useful.
1212

13-
Please continue below for notes on how to use the python library, and how FunSQL works.
13+
This implementation closely follows the original Julia library `FunSQL.jl`. Thanks to the original authors, Clark Evans and Kyrylo Simonov, who have been refining the idea for some time; you should check their previous work [here](https://querycombinators.org/).
14+
1. Presentation from JuliaCon talking about FunSQL - [youtube](https://www.youtube.com/watch?v=rGWwmuvRUYk) | [slides](https://github.com/MechanicalRabbit/FunSQL.jl/files/7465997/FunSQL-JuliaCon2021.pdf)
15+
2. Julia library repo - [FunSQL.jl](https://github.com/MechanicalRabbit/FunSQL.jl/)
16+
17+
18+
Please continue below for notes on using the python library, and how FunSQL works.
1419

15-
<br/>
1620

1721
## Contents
1822

@@ -171,39 +175,29 @@ The `docs` directory has more notes on how the compiler works, and the debugging
171175

172176
## More notes
173177

174-
<details>
175-
<summary>Supported SQL subset? </summary>
176-
178+
**Supported SQL subset?**
177179

178180
Window functions, nested queries, lateral joins, and CTEs are all supported. Aggregation queries like Cube/Rollup, Grouping Sets, etc. haven't been implemented yet.
179181
FunSQL is oblivious to the specific UDF/aggregate functions supported by database engines, if they fit the `Fun` node syntax, FunSQL can include it in the output SQL query.
180-
</details>
181-
182182

183-
<details>
184-
<summary>Supported database engines? </summary>
185183

184+
**Supported database engines?**
186185

187186
FunSQL is not a database connector and only produces the SQL query string. Currently, it can produce queries in the Sqlite/Postgres dialect. Maybe MySQL, but I have never used it.
188187

189188
As noted above, FunSQL models the shape of the data, and its namespace through different tabular operations. After resolving column references, and verifying the query is legitimate, FunSQL compiles the input tree of SQL nodes to a tree of SQL clause objects. These directly translate to SQL text, only abstracting over spaces and dialect specific punctuation.
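
As a rough illustration of the compile step described above, here is a minimal, self-contained sketch. It is not FunSQL's actual implementation or API: the `From`, `Where`, `Select`, and `SelectClause` classes and the `compile_node` and `render` functions are hypothetical names made up for this example, and the backtick quoting for a `mysql` dialect is an assumption used only to show dialect-specific punctuation.

```python
from __future__ import annotations

from dataclasses import dataclass


# --- Hypothetical query nodes (illustrative only, not the funsql API) ---
@dataclass
class From:
    table: str
    columns: tuple  # column names the table exposes


@dataclass
class Where:
    over: object  # the input node
    column: str
    value: str


@dataclass
class Select:
    over: object  # the input node
    columns: tuple


# --- Hypothetical clause object: a flat description of one SELECT ---
@dataclass
class SelectClause:
    columns: tuple
    table: str
    predicates: list


def visible_columns(node) -> tuple:
    """Columns a node exposes to whatever consumes it (its namespace)."""
    if isinstance(node, Where):
        return visible_columns(node.over)
    return node.columns  # From exposes the table's columns, Select narrows them


def check(node) -> None:
    """Resolve column references and reject ones the input does not expose."""
    if isinstance(node, From):
        return
    check(node.over)
    available = visible_columns(node.over)
    wanted = (node.column,) if isinstance(node, Where) else node.columns
    missing = [c for c in wanted if c not in available]
    if missing:
        raise ValueError(f"unknown column(s): {missing}")


def _build(node) -> SelectClause:
    if isinstance(node, From):
        return SelectClause(columns=node.columns, table=node.table, predicates=[])
    clause = _build(node.over)
    if isinstance(node, Where):
        clause.predicates.append((node.column, node.value))
    else:
        clause.columns = node.columns
    return clause


def compile_node(node) -> SelectClause:
    """Validate the node tree, then translate it to a clause object."""
    check(node)
    return _build(node)


def quote(name: str, dialect: str) -> str:
    # Assumed punctuation: backticks for "mysql", double quotes otherwise.
    return f"`{name}`" if dialect == "mysql" else f'"{name}"'


def literal(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"


def render(clause: SelectClause, dialect: str = "postgres") -> str:
    """Turn a clause object into SQL text; only spacing and quoting vary."""
    cols = ", ".join(quote(c, dialect) for c in clause.columns)
    sql = f"SELECT {cols} FROM {quote(clause.table, dialect)}"
    if clause.predicates:
        conds = " AND ".join(
            f"{quote(c, dialect)} = {literal(v)}" for c, v in clause.predicates
        )
        sql += f" WHERE {conds}"
    return sql


customers = From("customers", ("name", "city"))
query = Select(Where(customers, "city", "Mumbai"), ("name",))
print(render(compile_node(query)))
print(render(compile_node(query), dialect="mysql"))
```

The sketch keeps validation (`check`) separate from translation (`_build`) and from text generation (`render`), so a reference to a column that the input does not expose fails before any SQL text is produced.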
190189

191-
However, SQL dialects are plenty and projects like [Apache Calcite](https://calcite.apache.org/) already exist, that can write to different SQL dialects. A better idea is to compile the FunSQL query tree to the relational node structure `Calcite` works with. That would let us support the popular database engines (and I can delete 1000 lines from the code).
190+
However, SQL dialects are plenty and projects like [Apache Calcite](https://calcite.apache.org/) already exist, that can write to different variants of SQL. A better idea is to compile the FunSQL query tree to the relational node structure `Calcite` works with. That would let us support the popular database engines (and I can delete 1000 lines from the code).
192191

193192
The blocker is that `Calcite` is a Java library; I have never written Java, and don't know how to compile it to a native extension that is usable from python without installing a JVM. When projects like [Substrait](https://substrait.io/) are further along, it might be a good idea to use that as a backend instead.
194-
</details>
195193

196-
<details>
197-
<summary>Supported languages? </summary>
198194

195+
**Supported languages?**
199196

200197
This repository implements a python library, while the original implementation of FunSQL is in Julia. The core idea of tracking column references and data shape is not a lot of code and easy enough to port. Once we can integrate with the Substrait/Calcite projects, I intend to write a Rust implementation, so individual language bindings are even shorter.
201198

202-
</details>
203-
204-
<details>
205-
<summary>Similar projects? </summary>
206199

200+
**Similar projects?**
207201

208202
There are multiple libraries/languages that make writing SQL easier. The comparison below is not fully accurate since I haven't used the non-python tools significantly.
209203

@@ -222,7 +216,7 @@ There are multiple libraries/languages that make writing SQL easier. The compari
222216
Pypika converts a data structure assembled in python to a SQL query string, and shares the scope of FunSQL. However, it is a thin wrapper around SQL expressions and doesn't model the semantics of SQL operations, resulting in incorrect output.
223217

224218
```python
225-
from pypika import Query, Table
219+
from pypika import Query, Table
226220
c = Table("customers")
227221
q1 = Query.from_(c).limit(100).where(c.city == "Mumbai").select(c.name)
228222
q2 = Query.from_(c).where(c.city == "Mumbai").limit(100).select(c.name)
@@ -235,10 +229,8 @@ There are multiple libraries/languages that make writing SQL easier. The compari
235229

236230
* Other projects: [Malloy](https://github.com/looker-open-source/malloy) is a super cool project that models relational data and queries against it, using a single language. Queries are constructed as reusable fragments that can be composed/nested arbitrarily, and get compiled to SQL at execution time.
237231

238-
FunSQL operators are similar in that they can be arbitrarily composed, though it doesn't implement the NEST operator yet. It should be possible to use FunSQL for implementing a watered down version of Malloy in the language of your choice, though Malloy is pretty comprehensive (database connectors, built in graphing, tracking lineage) and you should use it.
239-
</details>
232+
FunSQL operators are similar in that they can be arbitrarily composed, though it doesn't implement the NEST operator yet. It should be fun to use FunSQL for implementing a watered down version of Malloy in the language of your choice. Though Malloy is pretty comprehensive (database connectors, built in graphing, tracking lineage) and you should use it!
240233

241-
<br>
242234

243235
## Installation
244236
