Conversation

@Tschuppi81
Contributor

@Tschuppi81 Tschuppi81 commented Nov 10, 2025

Org: Ensure mime type validator on file upload fields in form code

TYPE: Feature
LINK: ogc-2738

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 97.61905% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.35%. Comparing base (fc831a5) to head (c31d0e3).
⚠️ Report is 10 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/onegov/form/validators.py | 96.42% | 1 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| src/onegov/agency/forms/agency.py | 97.26% <100.00%> (ø) |
| src/onegov/election_day/forms/election.py | 98.14% <100.00%> (ø) |
| src/onegov/election_day/forms/election_compound.py | 97.76% <100.00%> (ø) |
| src/onegov/election_day/forms/upload/common.py | 100.00% <ø> (ø) |
| src/onegov/election_day/forms/vote.py | 97.68% <100.00%> (ø) |
| src/onegov/form/fields.py | 94.74% <100.00%> (+0.02%) ⬆️ |
| src/onegov/landsgemeinde/forms/agenda.py | 47.66% <100.00%> (-0.27%) ⬇️ |
| src/onegov/landsgemeinde/forms/assembly.py | 94.91% <100.00%> (-0.09%) ⬇️ |
| src/onegov/org/forms/event.py | 91.88% <ø> (ø) |
| src/onegov/org/forms/parliamentarian.py | 98.59% <100.00%> (ø) |

... and 6 more

... and 10 files with indirect coverage changes

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

'image/x-ms-bmp',
'text/plain',
'text/csv'
*MIME_TYPES_DOCUMENT,
Contributor Author

@Tschuppi81 Tschuppi81 Dec 4, 2025

I excluded JSON files by default. Does that make sense, or can we allow JSON files anyway?

Member

JSON is pretty harmless, so we could allow it. On the other hand, we have no existing uploads with that mimetype, so it can wait until someone explicitly needs it.

people_source = UploadMultipleField(
label=_('People Data (JSON)'),
description=_('JSON file containing parliamentarian data.'),
validators=[WhitelistedMimeType(MIME_TYPES_JSON)]
Contributor Author

But of course JSON files are allowed if explicitly enabled.

Member

Since these import files aren't stored, we could also just drop the validator here, since it could lead to false positives. There's nothing dangerous about a JSON parser opening these files, whatever they may contain.

@Tschuppi81 Tschuppi81 requested a review from Daverball December 4, 2025 12:03
@Tschuppi81
Contributor Author

I saw that file types are handled differently for view_upload_file_by_json in handle_file_upload. Basically all file types are allowed. Shall we keep this?

@Tschuppi81
Contributor Author

Should I completely remove the type application/octet-stream? It is mostly used in conjunction with application/zip.

@Daverball
Member

> I saw that file types are handled differently for view_upload_file_by_json in handle_file_upload. Basically all file types are allowed. Shall we keep this?

We can make sure to set supported_content_types on GeneralFileCollection. That's the only one that would allow anything to be uploaded currently through those views.
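
A rough sketch of what setting supported_content_types could look like. The class below is a simplified stand-in, not the real GeneralFileCollection from onegov, and both the whitelist contents and the `is_supported` helper are assumptions made up for illustration:

```python
# Hypothetical stand-in: the real GeneralFileCollection lives in onegov
# and has a different base class and more behaviour.
MIME_TYPES_PDF = {'application/pdf'}  # assumed contents


class GeneralFileCollection:
    # Restrict uploads to an explicit whitelist instead of accepting anything.
    supported_content_types = frozenset({
        *MIME_TYPES_PDF,
        'image/png',
        'text/plain',
    })


def is_supported(collection: type, content_type: str) -> bool:
    """Hypothetical helper: check a detected content type against the whitelist."""
    return content_type in collection.supported_content_types
```

With something along these lines, the JSON upload views would presumably reject anything outside the whitelist, the same way the other collections already do.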

Member

@Daverball Daverball left a comment

Looks good overall, but there are a couple of details we should iron out.

Comment on lines +146 to +155
'application/msword', # doc
'application/rtf',
*MIME_TYPES_PDF,
'application/vnd.ms-excel', # xls
('application/vnd.openxmlformats-officedocument.'
'presentationml.presentation'), # pptx
('application/vnd.openxmlformats-officedocument.'
'spreadsheetml.sheet'), # xlsx
('application/vnd.openxmlformats-officedocument.'
'wordprocessingml.document'), # docx
Member

We might need to allow some extra mimetypes: application/CDFV2 and application/CDFV2-unknown, which I believe can be reported for some really old Word files, and application/x-ole-storage, which can be reported for some old Excel files. This matters especially since we did have some of those.

You could take a look at some example files to verify that these are indeed legitimate. loxo has some files with those mimetypes, so it should be fairly quick to check those three instances.
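
One quick way to sanity-check such files, as a minimal sketch: legacy Word and Excel files are OLE2 compound documents, which start with a fixed 8-byte signature (presumably what the CDFV2 and x-ole-storage types refer to). The helper name is made up here, and a matching signature only shows that the container format is plausible, not that the content is harmless:

```python
# Signature shared by all OLE2 / Compound File Binary containers,
# the format used by legacy .doc and .xls files.
OLE2_SIGNATURE = bytes.fromhex('d0cf11e0a1b11ae1')


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data starts with the OLE2 container signature."""
    return data.startswith(OLE2_SIGNATURE)
```

Reading the first eight bytes of each of the three files and passing them to `looks_like_ole2` should be enough for a first check.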

Comment on lines +171 to +181
MIME_TYPES_IMAGE = {
'image/bmp',
'image/gif',
'image/jpeg', # jpeg, jpg
'image/png',
'image/svg',
'image/svg+xml',
'image/tiff',
'image/webp', # shall we allow it?
'image/x-ms-bmp',
}
Member

We could be more generous and allow onegov.file.get_supported_image_mime_types instead (you may need to manually add image/svg+xml to that list, since we don't process SVG files with Pillow).

Also, we currently only sanitize image/svg+xml, not image/svg, which would make the latter unsafe. I assume this means our mimetype detection can never return image/svg, but we could expand our checks in onegov.file.attachments to include image/svg just to be extra safe, or remove it from the whitelist here.
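
A sketch of how the image whitelist could be derived under those assumptions. The function below is only a stand-in for onegov.file.get_supported_image_mime_types; its return value here is invented, and the real one depends on what Pillow supports:

```python
def get_supported_image_mime_types() -> set[str]:
    # Stand-in for onegov.file.get_supported_image_mime_types; these
    # entries are assumed, the real function may return a different set.
    return {'image/gif', 'image/jpeg', 'image/png', 'image/webp'}


# SVG is added manually, since it is not processed with Pillow.
# 'image/svg' is left out on purpose: only 'image/svg+xml' is sanitized.
MIME_TYPES_IMAGE = frozenset(
    get_supported_image_mime_types() | {'image/svg+xml'}
)
```

This takes the "remove it from the whitelist" route for image/svg; extending the sanitization checks instead would be the alternative.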

'text/csv',
'text/plain',
}),
WhitelistedMimeType(),
Member

This is not a file we store that could be downloaded by unsuspecting users after the fact, so the whitelist being strict isn't that important. That being said, we could probably trim it a little, since all we seem to accept for event imports are .xls and .xlsx files. It might be worth adding application/x-ole-storage for old Excel files, though, and application/octet-stream is probably fine here as well.

So I would keep the original whitelist, get rid of the bottom three and add application/x-ole-storage.

action: Literal['keep', 'replace', 'delete']
file: IO[bytes] | None
filename: str | None
validators = [WhitelistedMimeType()]
Member

This is not very robust; we should definitely override __init__ instead. The only remaining question is whether we want to add an extra parameter allowed_mimetypes or change the default of the validators argument to (WhitelistedMimeType(),).

I kind of like the extra parameter better, since it means we don't need to import WhitelistedMimeType everywhere.

You can then pass it on to super().__init__ as validators=[*(validators or ()), WhitelistedMimeType(allowed_mimetypes)].
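
Roughly like this, as a self-contained sketch. `Field` and `WhitelistedMimeType` below are minimal stand-ins for the real wtforms base class and our validator, so the signatures are simplified assumptions; only the way the validators are combined is the point:

```python
from collections.abc import Collection, Sequence
from typing import Any, Optional


class WhitelistedMimeType:
    """Stand-in for the real validator; it only stores the whitelist."""

    def __init__(self, whitelist: Optional[Collection[str]] = None) -> None:
        self.whitelist = whitelist


class Field:
    """Stand-in for the wtforms base field."""

    def __init__(
        self,
        validators: Optional[Sequence[Any]] = None,
        **kwargs: Any,
    ) -> None:
        self.validators = list(validators or ())


class UploadField(Field):
    def __init__(
        self,
        *,
        validators: Optional[Sequence[Any]] = None,
        allowed_mimetypes: Optional[Collection[str]] = None,
        **kwargs: Any,
    ) -> None:
        # Always append the mimetype check, so callers cannot drop it
        # by passing their own validators.
        super().__init__(
            validators=[
                *(validators or ()),
                WhitelistedMimeType(allowed_mimetypes),
            ],
            **kwargs,
        )
```

A form would then only need `UploadField(allowed_mimetypes=MIME_TYPES_JSON)` and would not have to import WhitelistedMimeType itself.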


upload_field_class: type[UploadField] = UploadField
upload_widget: Widget[UploadField] = UploadWidget()
validators = [WhitelistedMimeType()]
Member

Same thing here

widget=widget, # type:ignore[arg-type]
render_kw=render_kw,
name=name,
validators=validators,
Member

Why are you passing the validators to both the list and each field in the list? Is there something that didn't work right when it was only passed to each field in the list?

@Daverball
Member

Daverball commented Dec 4, 2025

> Should I completely remove the type application/octet-stream? It is mostly used in conjunction with application/zip.

It's probably fine to remove it for now. There may, however, be the rare false positive for files that libmagic cannot identify correctly. Generally, PDFs, ZIPs and any other binary file format can end up as application/octet-stream; it's a generic catch-all content type for binary data that couldn't be detected as anything else.
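
To illustrate the catch-all behaviour, here is a toy sketch. It is not how libmagic works internally, and the function name and signature table are made up; it only shows why unrecognized binary data ends up as application/octet-stream:

```python
# A few well-known file signatures (magic bytes).
SIGNATURES = {
    b'%PDF-': 'application/pdf',
    b'PK\x03\x04': 'application/zip',
    b'\x89PNG\r\n\x1a\n': 'image/png',
}


def detect_content_type(data: bytes) -> str:
    """Toy detector: return the first matching type, else the catch-all."""
    for signature, content_type in SIGNATURES.items():
        if data.startswith(signature):
            return content_type
    # Nothing matched, so fall back to the generic binary type.
    return 'application/octet-stream'
```

A damaged or unusual PDF whose header isn't recognized would fall through to the last line, which is the kind of false positive we would see once application/octet-stream is removed from the whitelist.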
