Commit 695f52f

clarify readme
1 parent 5529351 commit 695f52f

4 files changed: +837 -830 lines changed

README.md

Lines changed: 38 additions & 31 deletions
@@ -59,11 +59,11 @@ visit_occurence = SQLTable(
 people_in_grp = From(person) >> Where(Fun("between", Get.year_of_birth, 1930, 1940))
 people_in_il = people_in_grp >> Join(
     From(location) >> Where(Fun("=", Get.state, "IL")) >> As(S.loc),
-    on=Fun("=", Get.location_id, Get.loc.location_id),
+    on = Fun("=", Get.location_id, Get.loc.location_id),
 )
 people_visits = people_in_il >> Join(
     From(visit_occurence) >> Group(Get.person_id) >> As(S.visit_grp),
-    on=Fun("=", Get.person_id, Get.visit_grp.person_id),
+    on = Fun("=", Get.person_id, Get.visit_grp.person_id),
     left=True,
 )
 people_last_visits = people_visits >> Select(
@@ -106,43 +106,46 @@ LEFT JOIN (
 
 <br>
 
-FunSQL models the SQL semantics as a set of operations on tabular data. SQL clauses like `FROM`, `WHERE`, and `JOIN` are represented using instances of `From`, `Where`, and `Join` classes, and they are applied in sequence by connecting them with the `>>` operator. Note the absence of a FunSQL counterpart to nested `SELECT` clauses; when necessary, FunSQL automatically adds nested subqueries and
-threads column references and aggregate expressions through them.
-
-Scalar expressions are represented using:
+FunSQL models the SQL semantics as a set of operations on tabular data. SQL clauses like `FROM`, `WHERE`, and `JOIN` are represented using instances of `From`, `Where`, and `Join` classes, and they are applied in sequence by connecting them with the `>>` operator. Scalar expressions are represented as:
 * `Get.person_id` is a reference to a column.
 * `Get.loc.person_id` refers to a column fenced by `As(S.loc)`. Aliasing helps disambiguate column references.
 * `Fun.between` and `Fun("==", ...)` is how FunSQL represents SQL functions and operators.
 * `Agg.max` is a notation for aggregate functions.
 
-FunSQL queries and their intermediate components are first-class python objects. So, they can be constructed independently, passed around as values, and freely composed together.
+This doesn't look unlike many pipelined query languages. There are a few things of note however.
 
-You'd also note writing expressions isn't particularly convenient; `Fun("between", Get.year_of_birth, 1930, 1940)` is too verbose for a data manipulation DSL. While part of the reason is, operator overloading might surface bugs I haven't thought through, it also illustrates the usefulness of FunSQL being just a python library; you can build your own abstractions!
+* FunSQL queries and their intermediate components are first-class python objects. So, they can be constructed independently, passed around as values, and freely composed together.
 
-<br>
+* Note the absence of a FunSQL counterpart to nested `SELECT` clauses; Or that the `Group` operation didn't ask you to specify the corresponding aggregation at the same place.
 
-<details>
-<summary>Writing your own primitives</summary>
+This helps a lot with code sharing across queries. When necessary, FunSQL automatically adds nested subqueries and threads column references and aggregate expressions through them.
 
-```python
-# A left-join operator, for when passing an extra arg is tedious
-def LeftJoin(*args, **kwargs):
-    return Join(*args, left=True, **kwargs)
-
-# shorthand for an equality expression
-def eq(a, b):
-    return Fun("=", a, b)
-
-# this can directly be subbed as arguments in a `Select` node
-def get_stats(col):
-    return [
-        Agg.max(col) >> As("max_val"),
-        Agg.min(col) >> As("min_val"),
-        Agg.mean(col) >> As("mean_val"),
-        Agg.stddev(col) >> As("stddev_val"),
-    ]
-```
-</details>
+* You'd also note writing expressions isn't particularly convenient; `Fun("between", Get.year_of_birth, 1930, 1940)` is too verbose for a data manipulation DSL.
+
+While part of the reason is, operator overloading might surface bugs I haven't thought through, it also illustrates the usefulness of FunSQL being just a python library; you can build your own abstractions!
+
+<details>
+<summary>Writing your own primitives</summary>
+
+```python
+# A left-join operator, for when passing an extra arg is tedious
+def LeftJoin(*args, **kwargs):
+    return Join(*args, left=True, **kwargs)
+
+# shorthand for an equality expression
+def eq(a, b):
+    return Fun("=", a, b)
+
+# this can directly be subbed as arguments in a `Select` node
+def get_stats(col):
+    return [
+        Agg.max(col) >> As("max_val"),
+        Agg.min(col) >> As("min_val"),
+        Agg.mean(col) >> As("mean_val"),
+        Agg.stddev(col) >> As("stddev_val"),
+    ]
+```
+</details>
 
 <br>
 
@@ -155,7 +158,9 @@ The [funsql-examples](https://github.com/ananis25/funsql-examples/) repository a
 
 ## Concept
 
-Writing a FunSQL query is much like assmembling the logical query plan in a SQL engine; `Where`, `Join`, `Select` _functions_ correspond to `FILTER`, `JOIN`, `PROJECTION` nodes in a query plan. The useful bit FunSQL improves at, is allowing column references (including aggregates) to be specified as late as possible. When a query is rendered, FunSQL goes over the full query pipeline and asserts if it is valid. Consider a segment of the example query above, where we want to query over visits made by each patient.
+Writing a FunSQL query is much like assmembling the logical query plan in a SQL engine; `Where`, `Join`, `Select` _functions_ correspond to `FILTER`, `JOIN`, `PROJECTION` nodes in a query plan. The useful bit FunSQL improves at, is allowing column references (including aggregates) to be specified as late as possible. When a query is rendered, FunSQL goes over the full query pipeline and asserts if it is valid.
+
+Consider a segment of the example query above, where we want to query over visits made by each patient.
 
 ```python
 q = (
@@ -211,6 +216,8 @@ There are multiple libraries/languages that make writing SQL easier. The compari
 
 ORMs simplify interaction with databases by letting us define language constructs like python classes mapping to database tables, and then writing queries by calling methods on them. I would expect the SQLAlchemy core library can be used to build queries incrementally, but haven't delved into it much.
 
+Note however that ORMs are great at something, **static typing**. Static analysis at build-time/editing eliminates a ton of bugs. FunSQL, in contrast, relies on runtime execution to ascertain that the query is legitimate and all variable references can be resolved. It is thus more suited for analytic contexts, like running notebooks where any errors are immediately surfaced.
+
 * Query Builders: [PyPika](https://github.com/kayak/pypika).
 
 Pypika converts a data structure assembled in python to a SQL query string, and shares the scope of FunSQL. However, it is a thin wrapper around SQL expressions and doesn't model the semantics of SQL operations, resulting in incorrect output.
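
The README text in this diff describes queries and their intermediate components as first-class python objects that are composed with the `>>` operator. As a rough illustration of that idea only, here is a toy sketch in plain Python. It does not use or reproduce FunSQL's implementation or API; `Node` and `render` are hypothetical names made up for this example, and the "queries" are just labelled steps.

```python
# Toy illustration only: this is NOT FunSQL's implementation or API.
# `Node` and `render` are hypothetical names invented for this sketch.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One step of a pipeline; `over` is the step it is applied on top of."""

    kind: str
    arg: str
    over: Optional["Node"] = None

    def __rshift__(self, other: "Node") -> "Node":
        # `a >> b` builds a new node: `b` applied on top of `a`.
        # If `b` is itself a chain, re-root the whole chain onto `a`.
        # Neither operand is mutated, so fragments can be reused freely.
        base = self if other.over is None else self >> other.over
        return Node(other.kind, other.arg, base)


def render(node: Optional[Node]) -> str:
    """Walk the chain from the last step back to the first and print it in order."""
    steps = []
    while node is not None:
        steps.append(f"{node.kind}({node.arg})")
        node = node.over
    return " >> ".join(reversed(steps))


# Fragments are plain values: build them separately, then compose.
born_in_30s = Node("Where", "year_of_birth between 1930 and 1940")
people = Node("From", "person") >> born_in_30s
query = people >> Node("Select", "person_id")

print(render(query))
```

Because each `>>` returns a new object, `born_in_30s` can be attached to several different pipelines without one use affecting another, which is the property the README points to when it says components can be passed around as values.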
