Make `str.format` more Pythonic #3412

chrisnovakovic · 2025-09-21T19:17:05Z

str.format has a number of surprising behaviours that aren't Pythonic:

* Numerical replacement fields use a different syntax to named replacement fields (i.e. ${0}).
* Automatic and manual replacement field names can be used within the same format string.
* Replacement field names are passed through if the corresponding keyword argument is undefined.
* Format strings are allowed to contain unbalanced delimiters.

Some of these were introduced in #3146, along with some regressions that quietly broke substitution and delimiter escaping (see #3356).

Make Please's str.format implementation more Pythonic by aligning it with Bazel's (https://bazel.build/rules/lib/core/string#format):

* Permit the use of by-order, by-position or by-name replacement field referencing, but do not allow them to be combined in the same format string.
* Parse all occurrences of { and } in the format string as if they are delimiters, except for those escaped via {{ and }} - do not pass them through to the return value, and raise errors if they appear inappropriately.
* Perform more robust checks on replacement field references.

The new error messages are intentionally reminiscent of those returned by Python's str.format, and are identical in some cases.
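For reference, here is a small sketch of how Python's own str.format behaves in the cases listed above. It only exercises CPython, not Please; the `try_format` helper and the `results` dictionary are names made up for this illustration.

```python
def try_format(fmt, *args, **kwargs):
    """Return the formatted string, or 'ExceptionType: message' if formatting fails."""
    try:
        return fmt.format(*args, **kwargs)
    except (ValueError, IndexError, KeyError) as exc:
        return f"{type(exc).__name__}: {exc}"


# Each entry pairs a short label with the outcome of one format call in CPython.
results = {
    "by order": try_format("{} {}", "a", "b"),
    "by position": try_format("{1} {0}", "a", "b"),
    "by name": try_format("{x} {y}", x="a", y="b"),
    "escaped braces": try_format("{{}} {}", "a"),
    "order then position": try_format("{} {0}", "a"),
    "dangling opening brace": try_format("{", "a"),
    "dangling closing brace": try_format("}", "a"),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Note that Python capitalises "Single" in its unbalanced-delimiter messages, whereas the expected errors in this PR's tests use a lowercase "single".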

The correctness and robustness of the new implementation come at the cost of performance. It is hard to compare the two implementations fairly against the degenerate case in src/parse/asp/builtins_bench_test.go, but in the typical case, the new implementation is about 75% slower and performs four times as many allocations:

cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12            5117605               257.0 ns/op            80 B/op          2 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12          512914              2474 ns/op            7568 B/op          3 allocs/op
PASS

cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12            3261982               454.1 ns/op           176 B/op          8 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12          358680              9921 ns/op            7712 B/op         12 allocs/op
PASS
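
The quoted figures can be checked with a few lines of arithmetic. This sketch assumes the first benchmark block above is the old implementation and the second is the new one (the blocks are not labelled); the `old`, `new` and `compare` names are made up for this illustration.

```python
# (ns/op, B/op, allocs/op) copied from the two benchmark runs above.
# Assumption: the first run is the old implementation, the second the new one.
old = {"BenchmarkStrFormat": (257.0, 80, 2), "BenchmarkStrFormatBig": (2474.0, 7568, 3)}
new = {"BenchmarkStrFormat": (454.1, 176, 8), "BenchmarkStrFormatBig": (9921.0, 7712, 12)}


def compare(name):
    """Return (percent slower, bytes ratio, allocation ratio) for one benchmark."""
    old_ns, old_bytes, old_allocs = old[name]
    new_ns, new_bytes, new_allocs = new[name]
    percent_slower = (new_ns / old_ns - 1) * 100
    return percent_slower, new_bytes / old_bytes, new_allocs / old_allocs


for name in old:
    slower, bytes_ratio, allocs_ratio = compare(name)
    print(f"{name}: {slower:.0f}% slower, {bytes_ratio:.2f}x bytes, {allocs_ratio:.0f}x allocs")
```

Under that assumption, BenchmarkStrFormat comes out at roughly 77% slower with four times as many allocations, which matches the "about 75%" figure in the description.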


toastwaffle · 2025-09-26T11:59:00Z

src/parse/asp/builtins.go

+			self = self[start+2:]
+			continue
 		}
+		// We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter.


Is "field name" the right term here (given it may be {} or {1})? It looks like "replacement field" is the terminology used by Python.

toastwaffle · 2025-09-26T11:59:39Z

src/parse/asp/builtins.go

 	return newPyInt(strings.LastIndex(string(self), string(needle)))
 }

+// strFormat implements the str.format function. It interpolates a format string using the arguments passed to


Just to check, is this used both for f-strings and explicit calls to .format()?

toastwaffle · 2025-09-26T12:00:52Z

src/parse/asp/builtins.go

-			buf.WriteString(args[arg].String())
-			arg++
-		} else if val, present := s.locals[key]; present {
+		// Extract the replacement field's name, excluding the delimiters.


Suggested change

-// Extract the replacement field's name, excluding the delimiters.
+// Extract the replacement field's name or index if present, excluding the delimiters.

toastwaffle · 2025-09-26T12:13:16Z

src/parse/asp/builtins.go

+			// be more of them than there are positional arguments (although there may ultimately be fewer). Internally, the
+			// first positional argument is the format string - the positional arguments from the caller's perspective are
+			// shifted one to the right in args.
+			s.Assert(fieldName == "", "cannot switch from automatic field numbering to manual field specification")


I am concerned that this error message will be very unclear to somebody who is unfamiliar with how this is implemented - I think a better phrasing would be something like "cannot mix replacement field specification types in one format string", and possibly include examples of the mismatched replacement fields

toastwaffle · 2025-09-26T12:13:49Z

src/parse/asp/builtins.go

+		case strFormatByName:
+			// With named replacement fields, output the string value of the keyword argument with the given name.
+			s.Assert(fieldName != "", "must use named replacement fields with keyword arguments")
+			val, exists := s.locals[fieldName]


Are keyword arguments implemented as locals?

toastwaffle · 2025-09-26T12:34:34Z

src/parse/asp/interpreter_test.go

+					s, err := parseString(fmt.Sprintf(`ret = "%s".format(%s)`, test.FormatStr, test.Args))
+					if test.Error == "" {
+						assert.NotNil(t, s)
+						assert.NoError(t, err)


nit: I think we should assert.NoError first

toastwaffle · 2025-09-26T12:35:30Z

src/parse/asp/interpreter_test.go

+			"Dangling opening delimiter at start of string": {
+				FormatStr: `{{} {} {}`,
+				Args:      `"one", "two", "three"`,
+				Error:     "single '}' encountered in format string",


Can we include the position in this error?

toastwaffle · 2025-09-26T12:36:59Z

src/parse/asp/builtins.go

+		// Find the corresponding "}" delimiter...
+		end := strings.IndexByte(self[start+1:], '}')
+		// ...and if there isn't one, this must be a malformed format string.
+		s.Assert(end != -1, "single '{' encountered in format string")


Suggested change

-s.Assert(end != -1, "single '{' encountered in format string")
+s.Assert(end != -1, "unmatched and unescaped '{' encountered in format string")

toastwaffle · 2025-09-26T12:37:04Z

src/parse/asp/builtins.go

+			continue
 		}
+		// We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter.
+		s.Assert(self[start] == '{', "single '}' encountered in format string")


Suggested change

-s.Assert(self[start] == '{', "single '}' encountered in format string")
+s.Assert(self[start] == '{', "unmatched and unescaped '}' encountered in format string")

toastwaffle · 2025-09-26T12:44:39Z

src/parse/asp/interpreter_test.go

+			"Numerical field name": {
+				FormatStr: `1={one} 2={2} 0={zero}`,
+				Args:      `zero="a", one="b", two="c"`,
+				Error:     "unspecified keyword argument '2'",


It'd be good to special-case this to get the more meaningful error about mixing format types
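
For comparison, a quick check of what Python itself does with the format string from this test case. This runs CPython's str.format only, not Please's implementation; the `fmt` and `outcome` names are made up for this illustration.

```python
# Same format string and keyword arguments as the "Numerical field name" test case.
fmt = "1={one} 2={2} 0={zero}"

try:
    outcome = fmt.format(zero="a", one="b", two="c")
except (IndexError, KeyError, ValueError) as exc:
    # Record only the exception type; the message text varies between Python versions.
    outcome = type(exc).__name__

print(outcome)
```

Python treats {2} as a positional index and fails because no positional arguments were given, so it does not report this as a mixing error either.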

chrisnovakovic requested a review from toastwaffle September 21, 2025 19:17

chrisnovakovic added bug enhancement labels Sep 21, 2025

Format src/parse/asp/builtins_bench_test.go

62547dc

toastwaffle requested changes Sep 26, 2025


toastwaffle mentioned this pull request Nov 3, 2025

Refactor brace handling in strFormat() #3446

Open
