FunSQL models the SQL semantics as a set of operations on tabular data. SQL clauses like `FROM`, `WHERE`, and `JOIN` are represented using instances of `From`, `Where`, and `Join` classes, and they are applied in sequence by connecting them with the `>>` operator. Note the absence of a FunSQL counterpart to nested `SELECT` clauses; when necessary, FunSQL automatically adds nested subqueries and
110
-
threads column references and aggregate expressions through them.
111
-
112
-
Scalar expressions are represented using:
109
+
FunSQL models the SQL semantics as a set of operations on tabular data. SQL clauses like `FROM`, `WHERE`, and `JOIN` are represented using instances of `From`, `Where`, and `Join` classes, and they are applied in sequence by connecting them with the `>>` operator. Scalar expressions are represented as:
113
110
* `Get.person_id` is a reference to a column.
114
111
* `Get.loc.person_id` refers to a column fenced by `As(S.loc)`. Aliasing helps disambiguate column references.
115
112
* `Fun.between` and `Fun("==", ...)` are how FunSQL represents SQL functions and operators.
116
113
* `Agg.max` is a notation for aggregate functions.
117
114
118
-
FunSQL queries and their intermediate components are first-class python objects. So, they can be constructed independently, passed around as values, and freely composed together.
115
+
This looks much like many pipelined query languages. A few things are worth noting, however.
119
116
120
-
You'd also note writing expressions isn't particularly convenient; `Fun("between", Get.year_of_birth, 1930, 1940)` is too verbose for a data manipulation DSL. While part of the reason is, operator overloading might surface bugs I haven't thought through, it also illustrates the usefulness of FunSQL being just a python library; you can build your own abstractions!
117
+
* FunSQL queries and their intermediate components are first-class Python objects, so they can be constructed independently, passed around as values, and freely composed together.
121
118
122
-
<br>
119
+
* Note the absence of a FunSQL counterpart to nested `SELECT` clauses, and that the `Group` operation doesn't ask you to specify the corresponding aggregation in the same place.
123
120
124
-
<details>
125
-
<summary>Writing your own primitives</summary>
121
+
This helps a lot with code sharing across queries. When necessary, FunSQL automatically adds nested subqueries and threads column references and aggregate expressions through them.
126
122
127
-
```python
128
-
# A left-join operator, for when passing an extra arg is tedious
129
-
def LeftJoin(*args, **kwargs):
130
-
    return Join(*args, left=True, **kwargs)
131
-
132
-
# shorthand for an equality expression
133
-
def eq(a, b):
134
-
    return Fun("=", a, b)
135
-
136
-
# this can directly be subbed as arguments in a `Select` node
137
-
def get_stats(col):
138
-
    return [
139
-
        Agg.max(col) >> As("max_val"),
140
-
        Agg.min(col) >> As("min_val"),
141
-
        Agg.mean(col) >> As("mean_val"),
142
-
        Agg.stddev(col) >> As("stddev_val"),
143
-
    ]
144
-
```
145
-
</details>
123
+
* You'd also note that writing expressions isn't particularly convenient; `Fun("between", Get.year_of_birth, 1930, 1940)` is too verbose for a data manipulation DSL.
124
+
125
+
Part of the reason is that operator overloading might surface bugs I haven't thought through, but it also illustrates the usefulness of FunSQL being just a Python library: you can build your own abstractions!
126
+
127
+
<details>
128
+
<summary>Writing your own primitives</summary>
129
+
130
+
```python
131
+
# A left-join operator, for when passing an extra arg is tedious
132
+
def LeftJoin(*args, **kwargs):
133
+
    return Join(*args, left=True, **kwargs)
134
+
135
+
# shorthand for an equality expression
136
+
def eq(a, b):
137
+
    return Fun("=", a, b)
138
+
139
+
# this can directly be subbed as arguments in a `Select` node
140
+
def get_stats(col):
141
+
    return [
142
+
        Agg.max(col) >> As("max_val"),
143
+
        Agg.min(col) >> As("min_val"),
144
+
        Agg.mean(col) >> As("mean_val"),
145
+
        Agg.stddev(col) >> As("stddev_val"),
146
+
    ]
147
+
```
148
+
</details>
146
149
147
150
<br>
148
151
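To make the "first-class Python objects" point concrete, here is a toy sketch of how `>>` chaining can be built on plain objects. The `Node` class and the `From`, `Where`, and `Select` helpers below are hypothetical stand-ins written for this illustration; they are not FunSQL's actual classes and they render nothing to SQL.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a query node; not FunSQL's real class."""

    name: str
    args: tuple = ()
    over: Optional["Node"] = None  # the upstream node this one is applied to

    def __rshift__(self, other: "Node") -> "Node":
        # `a >> b` attaches `a` as the input of the leftmost node in `b`.
        if other.over is None:
            return replace(other, over=self)
        return replace(other, over=self >> other.over)

    def describe(self) -> str:
        head = f"{self.name}({', '.join(map(str, self.args))})"
        if self.over is None:
            return head
        return f"{self.over.describe()} >> {head}"


def From(table: str) -> Node:
    return Node("From", (table,))


def Where(condition: str) -> Node:
    return Node("Where", (condition,))


def Select(*columns: str) -> Node:
    return Node("Select", columns)


# A fragment is built on its own, then attached to a source later.
adults = Where("age >= 18") >> Select("person_id")
q = From("person") >> adults
print(q.describe())
```

Because a fragment like `adults` is an ordinary value, it can be stored, passed to functions, and reused across several queries, which is the property the helper functions in this section rely on.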
@@ -155,7 +158,9 @@ The [funsql-examples](https://github.com/ananis25/funsql-examples/) repository a
155
158
156
159
## Concept
157
160
158
-
Writing a FunSQL query is much like assmembling the logical query plan in a SQL engine; `Where`, `Join`, `Select`_functions_ correspond to `FILTER`, `JOIN`, `PROJECTION` nodes in a query plan. The useful bit FunSQL improves at, is allowing column references (including aggregates) to be specified as late as possible. When a query is rendered, FunSQL goes over the full query pipeline and asserts if it is valid. Consider a segment of the example query above, where we want to query over visits made by each patient.
161
+
Writing a FunSQL query is much like assembling the logical query plan in a SQL engine; the `Where`, `Join`, and `Select` _functions_ correspond to `FILTER`, `JOIN`, and `PROJECTION` nodes in a query plan. Where FunSQL improves on this is in allowing column references (including aggregates) to be specified as late as possible. When a query is rendered, FunSQL goes over the full query pipeline and checks that it is valid.
162
+
163
+
Consider a segment of the example query above, where we want to query over visits made by each patient.
159
164
160
165
```python
161
166
q = (
@@ -211,6 +216,8 @@ There are multiple libraries/languages that make writing SQL easier. The compari
211
216
212
217
ORMs simplify interaction with databases by letting us define language constructs like python classes mapping to database tables, and then writing queries by calling methods on them. I would expect the SQLAlchemy core library can be used to build queries incrementally, but haven't delved into it much.
213
218
219
+
Note, however, that ORMs are great at one thing: **static typing**. Static analysis at build or edit time eliminates a ton of bugs. FunSQL, in contrast, relies on runtime execution to ascertain that the query is legitimate and all variable references can be resolved. It is thus better suited to analytic contexts, like notebooks, where any errors surface immediately.
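As a rough illustration of what checking references at runtime means, the sketch below walks a pipeline and rejects a step that reads a column its input does not provide. Everything here (`validate`, `UnresolvedReference`, the tuple layout of a step) is hypothetical and made up for this example; FunSQL's own resolution logic is more involved.

```python
from typing import List, Optional, Set, Tuple


class UnresolvedReference(Exception):
    """Raised when a step refers to a column its input does not provide."""


# Each step is (label, columns it reads, columns it outputs).
# An output of None means the step passes its input columns through.
Step = Tuple[str, Set[str], Optional[Set[str]]]


def validate(table_columns: Set[str], steps: List[Step]) -> Set[str]:
    """Walk the pipeline in order and return the final set of columns."""
    available = set(table_columns)
    for label, reads, outputs in steps:
        missing = reads - available
        if missing:
            raise UnresolvedReference(f"{label}: cannot resolve {sorted(missing)}")
        if outputs is not None:
            available = set(outputs)
    return available


person = {"person_id", "year_of_birth"}

# Filtering before projecting keeps `year_of_birth` in scope.
ok = [
    ("Where", {"year_of_birth"}, None),
    ("Select", {"person_id"}, {"person_id"}),
]

# Projecting first drops `year_of_birth`, so the later filter cannot resolve it.
bad = [
    ("Select", {"person_id"}, {"person_id"}),
    ("Where", {"year_of_birth"}, None),
]

print(validate(person, ok))
try:
    validate(person, bad)
except UnresolvedReference as err:
    print("rejected:", err)
```

The mistake in `bad` is only caught when `validate` runs, which is the trade-off against static typing described above.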
Pypika converts a data structure assembled in python to a SQL query string, and shares the scope of FunSQL. However, it is a thin wrapper around SQL expressions and doesn't model the semantics of SQL operations, resulting in incorrect output.