This repository was archived by the owner on Aug 5, 2024. It is now read-only.
Commit 50f1542
Python3: Stop breaking surrogate pairs in toDelta()
Resolves #69 for Python3
Sometimes we can find a common prefix that runs into the middle of a
surrogate pair and we split that pair when building our diff groups.
This is fine as long as we are operating on UTF-16 code units. It
becomes problematic when we start trying to treat those substrings as
valid Unicode (or UTF-8) sequences.
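
The split described above can be reproduced with a small sketch. This is not code from the patch: `utf16_units` and `common_prefix_len` are hypothetical helpers, and the two sample characters (U+1F170 and U+1F171) were picked only because they share a high surrogate.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def common_prefix_len(a, b):
    """Count the leading items that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


old_units = utf16_units("\U0001F170")  # [0xD83C, 0xDD70]
new_units = utf16_units("\U0001F171")  # [0xD83C, 0xDD71]

# The common prefix is one code unit long, so it stops inside the pair.
shared = common_prefix_len(old_units, new_units)

# The last unit of that prefix is a lone high surrogate.
dangling = chr(old_units[shared - 1])

# A lone surrogate is legal inside a Python str but cannot be encoded as UTF-8.
try:
    dangling.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

Here `shared` is 1 and `encodable` ends up `False`: the prefix is harmless as a run of code units, and only fails once something tries to encode it.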
When we pass these split groups into `toDelta()` we do just that and the
library crashes. In this patch we're post-processing the diff groups
before encoding them to make sure that we un-split the surrogate pairs.
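
A minimal sketch of that kind of post-processing follows, assuming diffs are `(op, text)` tuples using the library's `-1`/`0`/`1` op codes and that the original texts were valid Unicode. The function `rejoin_surrogates` and its helpers are hypothetical and are not the code added by this commit. It only handles a high surrogate left dangling at the end of an equality, not a low surrogate at the start of one.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def is_high(ch):
    return "\ud800" <= ch <= "\udbff"


def is_low(ch):
    return "\udc00" <= ch <= "\udfff"


def combine(text):
    """Merge adjacent high/low surrogates into single code points."""
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def rejoin_surrogates(diffs):
    """Move a high surrogate that ends an equality onto the groups after it."""
    out = []
    carry = ""  # high surrogate stripped from the most recent equality
    for op, text in diffs:
        if op == DIFF_EQUAL:
            carry = ""
            if text and is_high(text[-1]):
                carry, text = text[-1], text[:-1]
        elif carry and text and is_low(text[0]):
            # Both the delete and the insert after the equality get the carry.
            text = carry + text
        if text:
            out.append((op, combine(text)))
    return out


split = [(DIFF_EQUAL, "a\ud83c"), (DIFF_DELETE, "\udd70"), (DIFF_INSERT, "\udd71")]
fixed = rejoin_surrogates(split)
```

After this pass every group in `fixed` holds whole code points, so each one can be encoded as UTF-8 (for example with `urllib.parse.quote`) without raising.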
The post-processed diffs should produce the same output when applied.
The diff string itself will be different but should not change that
much - only by a single character at surrogate boundaries.
1 parent db1cbba commit 50f1542
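
The equivalence claim in the commit message can be checked on a hand-written example. The two diff lists below are illustrative, not output of the library, and `source_text`, `target_text` and `normalize` are hypothetical helpers.

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def source_text(diffs):
    """Rebuild the original text: everything except insertions."""
    return "".join(text for op, text in diffs if op != DIFF_INSERT)


def target_text(diffs):
    """Rebuild the new text: everything except deletions."""
    return "".join(text for op, text in diffs if op != DIFF_DELETE)


def normalize(text):
    """Merge adjacent high/low surrogates into single code points."""
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# Same change, before and after un-splitting the surrogate pair.
split = [(DIFF_EQUAL, "a\ud83c"), (DIFF_DELETE, "\udd70"), (DIFF_INSERT, "\udd71")]
joined = [(DIFF_EQUAL, "a"), (DIFF_DELETE, "\U0001F170"), (DIFF_INSERT, "\U0001F171")]

same_source = normalize(source_text(split)) == source_text(joined)
same_target = normalize(target_text(split)) == target_text(joined)

# The equality shrinks by exactly one unit at the surrogate boundary.
equality_shrink = len(split[0][1]) - len(joined[0][1])
```

Both lists describe the same source and target text. Only the group boundaries move, by one unit.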
2 files changed, +82 -7 lines changed