Commit e0f3e2b
mbstring: Make encoding detection stricter
PHP 8.3 changed how source encoding detection works:
https://www.php.net/manual/en/migration83.other-changes.php#migration83.other-changes.functions.mbstring
Most locales only consider `ASCII` and `UTF-8` (see `mb_detect_order()`),
and when a byte sequence invalid in both tested encodings (such as 0x91 for ‘ in Windows-1252) is encountered,
one of them might now be chosen as the most fitting encoding.
(This is done using the heuristics introduced in PHP 8.1:
php/php-src@28b346b)
Compare the output of the following script across PHP versions:
<?php
$result = hex2bin("91");
var_dump(mb_detect_encoding($result));
var_dump(mb_detect_encoding($result, 'auto', true));
var_dump(mb_convert_encoding($result, 'UTF-8', 'auto'));
Let’s run `mb_detect_encoding()` ourselves with the `$strict` argument set to `true` to ensure consistent behaviour across all PHP versions.
This might cause a regression in some cases, though I am not sure.
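For illustration only, a minimal sketch of what strict detection followed by conversion could look like. The helper name `toUtf8()` and the default candidate list are assumptions made for this example; this is not the code changed in this commit.

```php
<?php
// Hypothetical helper for illustration; not the code from this commit.
function toUtf8(string $input, array $candidates = ['ASCII', 'UTF-8']): ?string
{
    // With $strict = true, mb_detect_encoding() returns false unless the
    // whole string is valid in one of the candidate encodings.
    $encoding = mb_detect_encoding($input, $candidates, true);
    if ($encoding === false) {
        // No candidate fits; leave the decision to the caller.
        return null;
    }

    // $encoding is a name mbstring just returned, so it is a valid encoding.
    return mb_convert_encoding($input, 'UTF-8', $encoding);
}

$plain = toUtf8('plain ascii');    // valid ASCII, converted unchanged
$broken = toUtf8(hex2bin('91'));   // 0x91 is invalid in both ASCII and UTF-8
```

Returning `null` for undetectable input is just one possible design; a caller could equally fall back to a fixed encoding such as Windows-1252.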
Additionally, since we are now ensuring all encodings are valid, we can drop the warning capture mechanism.
It does not work on PHP ≥ 8.0 anyway, since those versions throw a `ValueError` instead of emitting a warning when an invalid encoding is provided.
https://www.php.net/manual/en/function.mb-convert-encoding.php#refsect1-function.mb-convert-encoding-errors
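As a rough illustration of why warning capture no longer helps, the sketch below passes a deliberately invalid encoding name (`no-such-encoding`, made up for this example) and records what happens. It assumes PHP ≥ 8.0; on older versions the `catch` branch never matches and a warning is emitted instead.

```php
<?php
// Illustration only: observe how an invalid source encoding is reported.
try {
    // 'no-such-encoding' is a made-up name that mbstring does not know.
    mb_convert_encoding('text', 'UTF-8', 'no-such-encoding');
    $outcome = 'no exception';
} catch (\ValueError $e) {
    // PHP >= 8.0 throws here, so a warning handler is never invoked.
    $outcome = 'ValueError';
}
```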
Also adjust the confusing string in tests.
https://www.php.net/manual/en/function.mb-convert-encoding.php
https://www.php.net/manual/en/function.mb-detect-encoding.php
https://www.php.net/manual/en/function.mb-detect-order.php
1 parent 293bc26 commit e0f3e2b
3 files changed, +23 -20 lines changed