Skip to content

Conversation

@chrisnovakovic
Copy link
Contributor

str.format has a number of surprising behaviours that aren't Pythonic:

  • Numerical replacement fields use a different syntax to named replacement fields (i.e. ${0}).
  • Automatic and manual replacement field names can be used within the same format string.
  • Replacement field names are passed through if the corresponding keyword argument is undefined.
  • Format strings are allowed to contain unbalanced delimiters.

Some of these were introduced in #3146, along with some regressions that quietly broke substitution and delimiter escaping (see #3356).

Make Please's str.format implementation more Pythonic by aligning it with Bazel's:

  • Permit the use of by-order, by-position or by-name replacement field referencing, but do not allow them to be combined in the same format string.
  • Parse all occurrences of { and } in the format string as if they are delimiters, except for those escaped via {{ and }} - do not pass them through to the return value, and raise errors if they appear inappropriately.
  • Perform more robust checks on replacement field references.

The new error messages are intentionally reminiscent of those returned by Python's str.format, and are identical in some cases.

The correctness and robustness of the new implementation comes at the cost of performance - it is hard to compare them fairly against the degenerate case in src/parse/asp/builtins_bench_test.go, but in the typical case, the new implementation is about 75% slower and performs four times more allocations:

cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12            5117605               257.0 ns/op            80 B/op          2 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12          512914              2474 ns/op            7568 B/op          3 allocs/op
PASS

cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12            3261982               454.1 ns/op           176 B/op          8 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12          358680              9921 ns/op            7712 B/op         12 allocs/op
PASS

`str.format` has a number of surprising behaviours that aren't Pythonic:

* Numerical replacement fields use a different syntax to named
  replacement fields (i.e. `${0}`).
* Automatic and manual replacement field names can be used within the
  same format string.
* Replacement field names are passed through if the corresponding
  keyword argument is undefined.
* Format strings are allowed to contain unbalanced delimiters.

Some of these were introduced in thought-machine#3146, along with some regressions that
quietly broke substitution and delimiter escaping (see thought-machine#3356).

Make Please's `str.format` implementation more Pythonic by aligning it
with [Bazel's](https://bazel.build/rules/lib/core/string#format):

* Permit the use of by-order, by-position or by-name replacement field
  referencing, but do not allow them to be combined in the same format
  string.
* Parse all occurrences of `{` and `}` in the format string as if they
  are delimiters, except for those escaped via `{{` and `}}` - do not
  pass them through to the return value, and raise errors if they appear
  inappropriately.
* Perform more robust checks on replacement field references.

The new error messages are intentionally reminiscent of those returned
by Python's `str.format`, and are identical in some cases.

The correctness and robustness of the new implementation comes at the
cost of performance - it is hard to compare them fairly against the
degenerate case in `src/parse/asp/builtins_bench_test.go`, but in the
typical case, the new implementation is about 75% slower and performs
four times more allocations:

```
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12            5117605               257.0 ns/op            80 B/op          2 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12          512914              2474 ns/op            7568 B/op          3 allocs/op
PASS

cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12            3261982               454.1 ns/op           176 B/op          8 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12          358680              9921 ns/op            7712 B/op         12 allocs/op
PASS
```
self = self[start+2:]
continue
}
// We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "field name" the right word here (given it may be {} or {1})? Looks like"replacement field" is the terminology used by python

return newPyInt(strings.LastIndex(string(self), string(needle)))
}

// strFormat implements the str.format function. It interpolates a format string using the arguments passed to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check, is this used both for f-strings and explicit calls to .format()?

buf.WriteString(args[arg].String())
arg++
} else if val, present := s.locals[key]; present {
// Extract the replacement field's name, excluding the delimiters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Extract the replacement field's name, excluding the delimiters.
// Extract the replacement field's name or index if present, excluding the delimiters.

// be more of them than there are positional arguments (although there may ultimately be fewer). Internally, the
// first positional argument is the format string - the positional arguments from the caller's perspective are
// shifted one to the right in args.
s.Assert(fieldName == "", "cannot switch from automatic field numbering to manual field specification")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned that this error message will be very unclear to somebody who is unfamiliar with how this is implemented - I think a better phrasing would be something like "cannot mix replacement field specification types in one format string", and possibly include examples of the mismatched replacement fields

case strFormatByName:
// With named replacement fields, output the string value of the keyword argument with the given name.
s.Assert(fieldName != "", "must use named replacement fields with keyword arguments")
val, exists := s.locals[fieldName]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are keyword arguments implemented as locals?

s, err := parseString(fmt.Sprintf(`ret = "%s".format(%s)`, test.FormatStr, test.Args))
if test.Error == "" {
assert.NotNil(t, s)
assert.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should assert.NoError first

"Dangling opening delimiter at start of string": {
FormatStr: `{{} {} {}`,
Args: `"one", "two", "three"`,
Error: "single '}' encountered in format string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include the position in this error?

// Find the corresponding "}" delimiter...
end := strings.IndexByte(self[start+1:], '}')
// ...and if there isn't one, this must be a malformed format string.
s.Assert(end != -1, "single '{' encountered in format string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s.Assert(end != -1, "single '{' encountered in format string")
s.Assert(end != -1, "unmatched and unescaped '{' encountered in format string")

continue
}
// We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter.
s.Assert(self[start] == '{', "single '}' encountered in format string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s.Assert(self[start] == '{', "single '}' encountered in format string")
s.Assert(self[start] == '{', "unmatched and unescaped '}' encountered in format string")

"Numerical field name": {
FormatStr: `1={one} 2={2} 0={zero}`,
Args: `zero="a", one="b", two="c"`,
Error: "unspecified keyword argument '2'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to special-case this to get the more meaningful error about mixing format types

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants