fix: stop storing HTML entities and decode legacy entity-encoded data #702
Summary
This PR prevents text fields from being stored as HTML entities by switching internal/admin write paths from getSanitisedInput() to trimmed raw input. It also normalizes escaping in a few templates where affected fields were echoed without the standard escaping helper.

To keep existing installations consistent, this PR adds an installer schema upgrade that detects entity-encoded values in common text columns (job orders, companies, contacts, candidates, activities, calendar events) and decodes them back to raw UTF-8.
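The detect-and-decode idea behind the upgrade step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PHP installer code in this PR: the company table, the name column, the entity regex and the helpers looks_entity_encoded() and decode_column() are all hypothetical, and Python's html.unescape() stands in for whatever decoder the installer actually uses (PHP's html_entity_decode() is the closest built-in counterpart).

```python
import html
import re
import sqlite3

# Hypothetical detection heuristic: named (&amp;), decimal (&#233;)
# and hex (&#xE9;) entity references, each terminated by ";".
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def looks_entity_encoded(value):
    """True if the value contains an entity reference that decoding would change."""
    return (
        isinstance(value, str)
        and ENTITY_RE.search(value) is not None
        and html.unescape(value) != value
    )


def decode_column(conn, table, column, key="id"):
    """Decode one level of entities in a text column; return the number of rows updated.

    table, column and key are interpolated into SQL, so they must come from a
    fixed, trusted list of columns (never from user input).
    """
    rows = conn.execute(f"SELECT {key}, {column} FROM {table}").fetchall()
    updated = 0
    for row_id, value in rows:
        if looks_entity_encoded(value):
            conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE {key} = ?",
                (html.unescape(value), row_id),
            )
            updated += 1
    return updated


if __name__ == "__main__":
    # Hypothetical demo table; the real schema is not modelled here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO company (name) VALUES (?)",
        [("Smith &amp; Sons",), ("M&uuml;ller GmbH",), ("AT&T",), (None,)],
    )
    print(decode_column(conn, "company", "name"))  # prints 2
    print([r[0] for r in conn.execute("SELECT name FROM company ORDER BY id")])
```

A row is only rewritten when it contains something that looks like an entity reference and decoding actually changes it; plain values, including a bare "AT&T", are left untouched. Because each call decodes one level, a value that was encoded twice (for example "&amp;amp;") would need a second pass, and text a user genuinely typed as "&amp;" is decoded as well, which is why this kind of step can only be best-effort.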
Note: The public Careers Portal (modules/careers/CareersUI.php) is intentionally left functionally unchanged in this PR. Related input/output hardening is being handled in PR #697. Once #697 lands, we can reassess whether the Careers Portal needs further adjustments, without the two PRs overlapping.
Motivation
While working on the special-character issues addressed in #701, it became apparent that some code paths still transform user input before it reaches the database, so HTML entities end up persisted. This can lead to double-escaping once templates escape the value again on output (for example, a company name stored as "Smith &amp; Sons" is displayed with a literal "&amp;"), as well as inconsistent rendering of special characters across modules and environments.
By standardizing on "store raw, escape on output" and providing a best-effort upgrade step to normalize legacy data, this PR reduces the risk of character corruption. Formatting concerns such as line-break rendering are not addressed yet.
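To make the double-escaping problem and the "store raw, escape on output" rule concrete, here is a small Python sketch. sanitise_on_input() is a hypothetical stand-in for an input filter that entity-encodes on write (it is not the real getSanitisedInput()), store_raw() and render() are likewise illustrative names, and html.escape() plays the role of the templates' escaping helper.

```python
import html


def sanitise_on_input(value: str) -> str:
    # Hypothetical old write path: entity-encode before storing.
    # Not the real getSanitisedInput(); it only mimics that effect here.
    return html.escape(value.strip())


def store_raw(value: str) -> str:
    # New write path: persist the trimmed raw text.
    return value.strip()


def render(stored: str) -> str:
    # Output path: escape exactly once, at render time.
    return html.escape(stored)


user_input = "  Tom & Jerry <Ltd>  "

old_html = render(sanitise_on_input(user_input))
# old_html == "Tom &amp;amp; Jerry &amp;lt;Ltd&amp;gt;"
# browser displays: Tom &amp; Jerry &lt;Ltd&gt;

new_html = render(store_raw(user_input))
# new_html == "Tom &amp; Jerry &lt;Ltd&gt;"
# browser displays: Tom & Jerry <Ltd>
```

Only the second path round-trips cleanly: the stored value is the plain text "Tom & Jerry <Ltd>", and escaping happens exactly once, when the page is rendered.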