-
Notifications
You must be signed in to change notification settings - Fork 1
[DRAFT][PRMP-188] Refactor bulk upload #763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| logger.info( | ||
| "Cannot validate patient due to PDS responded with Too Many Requests" | ||
| ) | ||
| logger.info("Cannot process for now due to PDS rate limit reached.") | ||
| logger.info( | ||
| "All remaining messages in this batch will be returned to sqs queue to retry later." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary true, not every pds failure is rate limit failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a refactor of bulk upload, so no logic was/can be changed in this ticket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you still remove/edit the logs? that is not a logic change.
| return StagingMetadata.model_validate_json(staging_metadata_json) | ||
| except (pydantic.ValidationError, KeyError) as e: | ||
| logger.error(f"Got incomprehensible message: {message}") | ||
| logger.error(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You log the error here and then raise it and log it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to keep the bulk upload logic as much possible
| staging_metadata_json = message["body"] | ||
| return StagingMetadata.model_validate_json(staging_metadata_json) | ||
| except (pydantic.ValidationError, KeyError) as e: | ||
| logger.error(f"Got incomprehensible message: {message}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would error when body does exist, so this log does not reflect the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to keep the bulk upload logic as much possible
| file_names = [ | ||
| os.path.basename(metadata.file_path) for metadata in staging_metadata.files | ||
| ] | ||
| request_context.patient_nhs_no = staging_metadata.nhs_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you set this, every log will have this nhs number in them until the value is reset. Which mean this will be in the logs of the next message that belongs to another nhs number. You need to reset this to avoid confusing info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a refactor so cant change logic in this ticket
| if self.unhandled_messages: | ||
| logger.info("Unable to process the following messages:") | ||
| for message in self.unhandled_messages: | ||
| message_body = json.loads(message.get("body", "{}")) | ||
| request_context.patient_nhs_no = message_body.get( | ||
| "NHS-NO", "no number found" | ||
| ) | ||
| logger.info(message_body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens to these messages, do we a dlq setup for thme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ticket created for it
| This method performs the following steps: | ||
| 1. Parses the message and constructs staging metadata. | ||
| 2. Validates the NHS number and file names. | ||
| 3. Performs additional validation checks such as patient access conditions | ||
| (e.g., deceased, restricted) and virus scan results. | ||
| 4. Initiates transactional operations and transfers the validated files. | ||
| 5. Removes the ingested files from the staging bucket. | ||
| 6. Logs the completion of ingestion and writes the report to DynamoDB. | ||
| 7. Sends metadata to the stitching queue for further processing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is doing 7 steps then it is doing too much.
Functions Should Do One Thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this service need breaking down to smaller services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for new we will keep it as is, but that is something that needs to be done
Co-authored-by: Mohammad Iqbal <[email protected]>
|
Unit tests are failing. |
|
|
||
| logger.info( | ||
| "NHS Number and filename validation complete." | ||
| "Validated strict mode, and patient information is accessible (e.g. patient not deceased/restricted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing closing quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
MohammadIqbalAD-NHS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit tests failing.
|
I thought the V2 code was going to be added before this refactor, this way we won't have visibility/ traceability of what's been refactored once it all gets squashed |
bethany-kish-nhs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you compare with bulk upload on main it looks like changes have been made there and the V2 code here is out of date
eg. custodian logic has been changed on main here:
f9941b0
We need to make sure this ticket only refactors what's there on main and doesn't miss anything, we'll need to have a process for keeping both up to date as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that you have a PR with a title of Refactor Bulk Upload but I'm unsure what the intention is?
While I have seen the JIRA ticket which does have information on what your attempting to accomplish, I think your trying to do too much at once.
I suggest you breakdown the work further so the PR's have intent e.g. I'm doing XYZ
Created a new class called bulk_upload_service_v2
Refactored this class to better follow good practices
https://nhsd-jira.digital.nhs.uk/browse/PRMT-580