Rework mime type white list #2198
base: master
    'image/x-ms-bmp',
    'text/plain',
    'text/csv'
    *MIME_TYPES_DOCUMENT,
I excluded JSON files by default. Does that make sense, or should we allow JSON files anyway?
JSON is pretty harmless, so we could allow it. On the other hand, we have no existing uploads with that mimetype, so it can wait until someone explicitly needs it.
    people_source = UploadMultipleField(
        label=_('People Data (JSON)'),
        description=_('JSON file containing parliamentarian data.'),
        validators=[WhitelistedMimeType(MIME_TYPES_JSON)]
But of course JSON files are allowed if explicitly enabled.
Since these import files aren't stored, we could also just drop the validator here, as it could lead to false positives. There's nothing dangerous about a JSON parser opening these files, whatever they may contain.
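To illustrate the point about leaving validation to the parser, here is a minimal sketch using only the standard library. The function name `parse_people_source` is hypothetical and not taken from the PR; it only shows that a file that is not JSON fails cleanly at parse time, regardless of the mimetype that was detected for it.

```python
import json
from typing import Any


def parse_people_source(raw: bytes) -> Any:
    """Parse an uploaded import file as JSON (hypothetical helper).

    No mimetype check is involved: anything that is not valid JSON is
    rejected by the parser itself.
    """
    try:
        # json.loads accepts bytes and detects UTF-8/16/32 on its own.
        return json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError('Uploaded file is not valid JSON') from exc
```

A form could turn the `ValueError` into a regular validation error, so a mislabelled but valid JSON file would still import.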
I saw that file types are handled differently for |
Should I completely remove type |
We can make sure to set |
Daverball left a comment
Looks good overall, but there are a couple of details we should iron out.
    'application/msword',  # doc
    'application/rtf',
    *MIME_TYPES_PDF,
    'application/vnd.ms-excel',  # xls
    ('application/vnd.openxmlformats-officedocument.'
     'presentationml.presentation'),  # pptx
    ('application/vnd.openxmlformats-officedocument.'
     'spreadsheetml.sheet'),  # xlsx
    ('application/vnd.openxmlformats-officedocument.'
     'wordprocessingml.document'),  # docx
We might need to allow some extra mimetypes: application/CDFV2 and application/CDFV2-unknown, which I believe can be reported for some really old Word files, and application/x-ole-storage, which can be reported for some old Excel files. We did have some of those among our existing uploads.
You could take a look at some example files to verify that these are indeed legitimate files. loxo has some files with those mimetypes, so it should be fairly quick to check those three instances.
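One way to spot-check such files is to compare their first bytes against the OLE2 compound file signature, which legacy .doc and .xls files start with. The sketch below assumes that approach; the names `LEGACY_OFFICE_MIME_TYPES`, `looks_like_ole2` and `is_plausible_legacy_office_file` are hypothetical and not part of the PR.

```python
# Signature at the start of every OLE2 compound file (legacy .doc/.xls/.ppt).
OLE2_SIGNATURE = bytes.fromhex('d0cf11e0a1b11ae1')

# The three extra mimetypes mentioned in the review comment.
LEGACY_OFFICE_MIME_TYPES = frozenset({
    'application/CDFV2',
    'application/CDFV2-unknown',
    'application/x-ole-storage',
})


def looks_like_ole2(header: bytes) -> bool:
    """Return True if the data starts with the OLE2 container signature."""
    return header.startswith(OLE2_SIGNATURE)


def is_plausible_legacy_office_file(mimetype: str, header: bytes) -> bool:
    """Check a detected mimetype against the file's actual first bytes."""
    return mimetype in LEGACY_OFFICE_MIME_TYPES and looks_like_ole2(header)
```

A passing check only confirms the container format, not that the content is a Word or Excel document, so opening a few of the files by hand is still worthwhile.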
    MIME_TYPES_IMAGE = {
        'image/bmp',
        'image/gif',
        'image/jpeg',  # jpeg, jpg
        'image/png',
        'image/svg',
        'image/svg+xml',
        'image/tiff',
        'image/webp',  # shall we allow it?
        'image/x-ms-bmp',
    }
We could be more generous and allow onegov.file.get_supported_image_mime_types instead (you may need to add image/svg+xml to that list manually, since we don't process SVG files with Pillow).
Also, we currently only sanitize image/svg+xml, not image/svg, which would make the latter unsafe. I assume this means our mimetype detection can never return image/svg, but we could expand our checks in onegov.file.attachments to include image/svg just to be extra safe, or remove it from the whitelist here.
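A rough sketch of both suggestions, under the assumption that the Pillow-supported mimetypes are available as an iterable of strings. The names `build_image_whitelist` and `needs_svg_sanitizing` are hypothetical, and nothing here reflects the actual code in onegov.file.

```python
from collections.abc import Iterable

# SVG is not handled by Pillow, so it has to be added explicitly.
SVG_MIME_TYPE = 'image/svg+xml'

# Treat both spellings as SVG when deciding whether to sanitize.
SVG_LIKE_MIME_TYPES = frozenset({'image/svg', 'image/svg+xml'})


def build_image_whitelist(pillow_supported: Iterable[str]) -> frozenset[str]:
    """Combine the Pillow-supported mimetypes with image/svg+xml."""
    return frozenset(pillow_supported) | {SVG_MIME_TYPE}


def needs_svg_sanitizing(mimetype: str) -> bool:
    """Return True for any mimetype that should go through SVG sanitizing."""
    return mimetype in SVG_LIKE_MIME_TYPES
```

With this shape, image/svg stays off the whitelist but would still be sanitized if detection ever reported it.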
    'text/csv',
    'text/plain',
    }),
    WhitelistedMimeType(),
This is not a file that we store and that unsuspecting users could download after the fact, so a strict whitelist isn't that important here. That said, we could probably trim it a little, since all we seem to accept for event imports are .xls and .xlsx files. It might be worth adding application/x-ole-storage for old Excel files, though, and application/octet-stream is probably fine here as well.
So I would keep the original whitelist, drop the bottom three entries and add application/x-ole-storage.
    action: Literal['keep', 'replace', 'delete']
    file: IO[bytes] | None
    filename: str | None
    validators = [WhitelistedMimeType()]
This is not very robust. We should definitely override __init__ instead. The only remaining question is whether we want to add an extra parameter allowed_mimetypes or change the default of the validators argument to (WhitelistedMimeType(),).
I kind of like the extra parameter better, since it means we don't need to import WhitelistedMimeType everywhere.
You can then pass it on to super().__init__ as validators=[*(validators or ()), WhitelistedMimeType(allowed_mimetypes)].
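A minimal sketch of the suggested shape, assuming the extra-parameter variant. `BaseField` and `WhitelistedMimeType` are simplified stand-ins written for this example, not the real wtforms or onegov classes; only the `validators=[...]` expression is taken from the comment above.

```python
from __future__ import annotations

from collections.abc import Collection, Sequence
from typing import Any


class WhitelistedMimeType:
    """Stand-in validator; None means 'use the default whitelist'."""

    def __init__(self, whitelist: Collection[str] | None = None) -> None:
        self.whitelist = whitelist


class BaseField:
    """Stand-in for the form field base class; it only stores validators."""

    def __init__(self, validators: Sequence[Any] | None = None) -> None:
        self.validators = list(validators or ())


class UploadField(BaseField):
    def __init__(
        self,
        validators: Sequence[Any] | None = None,
        allowed_mimetypes: Collection[str] | None = None,
    ) -> None:
        # Always append the mimetype check, whatever the caller passed in.
        super().__init__(
            validators=[
                *(validators or ()),
                WhitelistedMimeType(allowed_mimetypes),
            ],
        )
```

With this shape, `UploadField(allowed_mimetypes={'application/json'})` restricts a single field without the calling module importing the validator, and caller-supplied validators no longer replace the mimetype check.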
    upload_field_class: type[UploadField] = UploadField
    upload_widget: Widget[UploadField] = UploadWidget()
    validators = [WhitelistedMimeType()]
Same thing here
    widget=widget,  # type:ignore[arg-type]
    render_kw=render_kw,
    name=name,
    validators=validators,
Why are you passing the validators to both the list and each field in the list? Is there something that didn't work right when it was only passed to each field in the list?
It's probably fine to remove it for now. There may however be the rare false positive for any files that cannot be identified correctly by libmagic. Generally pdfs, zips and any other binary file formats can end up as |
Org: Ensure mime type validator on file upload fields in form code
TYPE: Feature
LINK: ogc-2738