#scrub lines to gracefully handle invalid UTF-8 byte sequences #569

maximevaillancourt · 2025-10-29T13:25:36Z

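Scrubs each line so that PDF::Reader::PageLayout#to_s no longer raises Encoding::CompatibilityError (invalid byte sequence in UTF-8) when the page text contains an invalid UTF-8 byte sequence.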
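For readers unfamiliar with String#scrub, here is a minimal sketch of what it does to a string holding an invalid byte. The sample string and variable names are made up for illustration and are not taken from the PR.

```ruby
# Illustrative input: 0xFF can never appear in well-formed UTF-8.
line = "  caf\xFF  ".dup.force_encoding(Encoding::UTF_8)

line.valid_encoding?   # false, the string carries an invalid byte

# String#scrub returns a copy in which each invalid sequence is replaced.
# For Unicode encodings the default replacement is U+FFFD.
clean = line.scrub

clean.valid_encoding?  # true
clean.strip            # "caf\uFFFD"
clean.strip.length     # 4
```

Because scrub returns a new string, the original text extracted from the PDF is left untouched; only the copy used for layout is normalized.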

This new spec would fail on main:

Failures:

  1) PDF::Reader::PageLayout#to_s with an A4 page with one word that includes an invalid UTF-8 byte sequence returns a correct string with replacement character
     Failure/Error: line_lengths = rows.map { |l| l.strip.length }
     
     Encoding::CompatibilityError:
       invalid byte sequence in UTF-8
     # ./lib/pdf/reader/page_layout.rb:75:in 'String#strip'
     # ./lib/pdf/reader/page_layout.rb:75:in 'block in PDF::Reader::PageLayout#interesting_rows'
     # ./lib/pdf/reader/page_layout.rb:75:in 'Array#map'
     # ./lib/pdf/reader/page_layout.rb:75:in 'PDF::Reader::PageLayout#interesting_rows'
     # ./lib/pdf/reader/page_layout.rb:52:in 'PDF::Reader::PageLayout#to_s'
     # ./spec/page_layout_spec.rb:226:in 'block (5 levels) in <top (required)>'

Finished in 0.00598 seconds (files took 0.2636 seconds to load)
18 examples, 1 failure
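
The line named in the trace, `rows.map { |l| l.strip.length }`, could be guarded roughly as follows. This is a sketch under assumptions, not the PR's actual diff: `stripped_lengths` is a hypothetical helper and the sample rows are invented.

```ruby
# Hypothetical helper mirroring the failing line from the trace.
# Each row is scrubbed before strip is called on it.
def stripped_lengths(rows)
  rows.map { |row| row.scrub.strip.length }
end

# Invented sample rows; the second one contains an invalid byte (0xFF).
rows = [
  "  Hello  ",
  "wor\xFFld".dup.force_encoding(Encoding::UTF_8),
  ""
]

lengths = stripped_lengths(rows)
# lengths is [5, 6, 0]; the invalid byte counts as one replacement character
```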

Here's an example PDF to test: invalid-byte-sequence.pdf

Commit ee142dc: #scrub lines to gracefully handle invalid UTF-8 byte sequences
