
@maximevaillancourt commented Oct 29, 2025

Prevents `PDF::Reader::PageLayout#to_s` from raising `Encoding::CompatibilityError: invalid byte sequence in UTF-8` when the page text contains an invalid UTF-8 byte sequence.
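One way to avoid the error, sketched here as an assumption rather than the exact change in this PR, is to pass each row through Ruby's built-in `String#scrub` before calling `strip`. The helpers `sanitize_row` and `line_lengths` are hypothetical names; `line_lengths` only mirrors the `rows.map { |l| l.strip.length }` expression from the failing line in `interesting_rows`.

```ruby
# Hypothetical sketch, not the actual diff in this PR.
# U+FFFD REPLACEMENT CHARACTER, which the new spec's name refers to.
REPLACEMENT_CHAR = "\uFFFD"

# Returns a copy of +row+ in which every invalid byte sequence is
# replaced with U+FFFD, so a later String#strip call cannot raise.
def sanitize_row(row)
  row.scrub(REPLACEMENT_CHAR)
end

# Mirrors `rows.map { |l| l.strip.length }` from interesting_rows,
# but sanitizes each row first.
def line_lengths(rows)
  rows.map { |l| sanitize_row(l).strip.length }
end

rows = ["Hello  ", "caf\xE9  ", "   "]
p line_lengths(rows) # => [5, 4, 0]
```

Replacing the bad byte with a visible U+FFFD, rather than dropping it with `scrub("")`, keeps one character per invalid sequence, which matches the spec's expectation of a string "with replacement character".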

This new spec would fail on main:

```
Failures:

  1) PDF::Reader::PageLayout#to_s with an A4 page with one word that includes an invalid UTF-8 byte sequence returns a correct string with replacement character
     Failure/Error: line_lengths = rows.map { |l| l.strip.length }

     Encoding::CompatibilityError:
       invalid byte sequence in UTF-8
     # ./lib/pdf/reader/page_layout.rb:75:in 'String#strip'
     # ./lib/pdf/reader/page_layout.rb:75:in 'block in PDF::Reader::PageLayout#interesting_rows'
     # ./lib/pdf/reader/page_layout.rb:75:in 'Array#map'
     # ./lib/pdf/reader/page_layout.rb:75:in 'PDF::Reader::PageLayout#interesting_rows'
     # ./lib/pdf/reader/page_layout.rb:52:in 'PDF::Reader::PageLayout#to_s'
     # ./spec/page_layout_spec.rb:226:in 'block (5 levels) in <top (required)>'

Finished in 0.00598 seconds (files took 0.2636 seconds to load)
18 examples, 1 failure
```
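The root cause can be reproduced without pdf-reader. The snippet below is an illustration, not part of this PR: it builds a UTF-8 string containing the lone byte `0xE9` (a Latin-1 "é"), which is not valid UTF-8, and calls `strip` on it. The trace above shows Ruby raising `Encoding::CompatibilityError`; the snippet also rescues `ArgumentError`, on the assumption that other Ruby versions may report the same problem with that class.

```ruby
# Illustration only: reproduces the root cause outside pdf-reader.
# "\xE9" is "é" in Latin-1, but on its own it is not valid UTF-8.
row = "caf\xE9  "

puts row.encoding        # the literal is tagged with the source encoding, UTF-8 by default
puts row.valid_encoding? # false: the bytes do not form valid UTF-8

error =
  begin
    row.strip
    nil
  rescue ArgumentError, Encoding::CompatibilityError => e
    # The exact exception class may depend on the Ruby version (assumption).
    e
  end

if error
  puts "#{error.class}: #{error.message}"
else
  puts "strip did not raise on this Ruby version"
end

# Scrubbing first replaces the bad byte with U+FFFD, so strip is safe.
puts row.scrub.strip.inspect
```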

Here's an example PDF to test: invalid-byte-sequence.pdf
