-
Notifications
You must be signed in to change notification settings - Fork 212
Make str.format more Pythonic
#3412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
`str.format` has a number of surprising behaviours that aren't Pythonic:
* Numerical replacement fields use a different syntax to named
replacement fields (i.e. `${0}`).
* Automatic and manual replacement field names can be used within the
same format string.
* Replacement field names are passed through if the corresponding
keyword argument is undefined.
* Format strings are allowed to contain unbalanced delimiters.
Some of these were introduced in thought-machine#3146, along with some regressions that
quietly broke substitution and delimiter escaping (see thought-machine#3356).
Make Please's `str.format` implementation more Pythonic by aligning it
with [Bazel's](https://bazel.build/rules/lib/core/string#format):
* Permit the use of by-order, by-position or by-name replacement field
referencing, but do not allow them to be combined in the same format
string.
* Parse all occurrences of `{` and `}` in the format string as if they
are delimiters, except for those escaped via `{{` and `}}` - do not
pass them through to the return value, and raise errors if they appear
inappropriately.
* Perform more robust checks on replacement field references.
The new error messages are intentionally reminiscent of those returned
by Python's `str.format`, and are identical in some cases.
The correctness and robustness of the new implementation comes at the
cost of performance - it is hard to compare them fairly against the
degenerate case in `src/parse/asp/builtins_bench_test.go`, but in the
typical case, the new implementation is about 75% slower and performs
four times more allocations:
```
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12 5117605 257.0 ns/op 80 B/op 2 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12 512914 2474 ns/op 7568 B/op 3 allocs/op
PASS
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12 3261982 454.1 ns/op 176 B/op 8 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12 358680 9921 ns/op 7712 B/op 12 allocs/op
PASS
```
| self = self[start+2:] | ||
| continue | ||
| } | ||
| // We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "field name" the right word here (given it may be {} or {1})? Looks like"replacement field" is the terminology used by python
| return newPyInt(strings.LastIndex(string(self), string(needle))) | ||
| } | ||
|
|
||
| // strFormat implements the str.format function. It interpolates a format string using the arguments passed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check, is this used both for f-strings and explicit calls to .format()?
| buf.WriteString(args[arg].String()) | ||
| arg++ | ||
| } else if val, present := s.locals[key]; present { | ||
| // Extract the replacement field's name, excluding the delimiters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Extract the replacement field's name, excluding the delimiters. | |
| // Extract the replacement field's name or index if present, excluding the delimiters. |
| // be more of them than there are positional arguments (although there may ultimately be fewer). Internally, the | ||
| // first positional argument is the format string - the positional arguments from the caller's perspective are | ||
| // shifted one to the right in args. | ||
| s.Assert(fieldName == "", "cannot switch from automatic field numbering to manual field specification") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned that this error message will be very unclear to somebody who is unfamiliar with how this is implemented - I think a better phrasing would be something like "cannot mix replacement field specification types in one format string", and possibly include examples of the mismatched replacement fields
| case strFormatByName: | ||
| // With named replacement fields, output the string value of the keyword argument with the given name. | ||
| s.Assert(fieldName != "", "must use named replacement fields with keyword arguments") | ||
| val, exists := s.locals[fieldName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are keyword arguments implemented as locals?
| s, err := parseString(fmt.Sprintf(`ret = "%s".format(%s)`, test.FormatStr, test.Args)) | ||
| if test.Error == "" { | ||
| assert.NotNil(t, s) | ||
| assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we should assert.NoError first
| "Dangling opening delimiter at start of string": { | ||
| FormatStr: `{{} {} {}`, | ||
| Args: `"one", "two", "three"`, | ||
| Error: "single '}' encountered in format string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include the position in this error?
| // Find the corresponding "}" delimiter... | ||
| end := strings.IndexByte(self[start+1:], '}') | ||
| // ...and if there isn't one, this must be a malformed format string. | ||
| s.Assert(end != -1, "single '{' encountered in format string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| s.Assert(end != -1, "single '{' encountered in format string") | |
| s.Assert(end != -1, "unmatched and unescaped '{' encountered in format string") |
| continue | ||
| } | ||
| // We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter. | ||
| s.Assert(self[start] == '{', "single '}' encountered in format string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| s.Assert(self[start] == '{', "single '}' encountered in format string") | |
| s.Assert(self[start] == '{', "unmatched and unescaped '}' encountered in format string") |
| "Numerical field name": { | ||
| FormatStr: `1={one} 2={2} 0={zero}`, | ||
| Args: `zero="a", one="b", two="c"`, | ||
| Error: "unspecified keyword argument '2'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to special-case this to get the more meaningful error about mixing format types
str.formathas a number of surprising behaviours that aren't Pythonic:${0}).Some of these were introduced in #3146, along with some regressions that quietly broke substitution and delimiter escaping (see #3356).
Make Please's
str.formatimplementation more Pythonic by aligning it with Bazel's:{and}in the format string as if they are delimiters, except for those escaped via{{and}}- do not pass them through to the return value, and raise errors if they appear inappropriately.The new error messages are intentionally reminiscent of those returned by Python's
str.format, and are identical in some cases.The correctness and robustness of the new implementation comes at the cost of performance - it is hard to compare them fairly against the degenerate case in
src/parse/asp/builtins_bench_test.go, but in the typical case, the new implementation is about 75% slower and performs four times more allocations: