403 error on resource_copy #244

@dvdokkum

Describe the Bug

When running the process-events job in the Event Gather workflow, the resource copy task fails with a 403 error while trying to copy the video file.

Expected Behavior

The video file is publicly accessible (link in the logs below), so I would expect the copy to succeed rather than hit a 403.
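One way to narrow this down is to check whether the server's answer depends on how the request identifies itself. The snippet below is only a diagnostic sketch under that assumption, not a confirmed cause or a fix: it uses the standard library's `urllib` rather than `requests` (so its default User-Agent differs from the one cdp-backend sends), and `BROWSER_UA`, `probe_status`, and `compare_user_agents` are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string, used only for comparison.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/123.0 Safari/537.36"
)


def probe_status(url, user_agent=None, timeout=10.0):
    """Return the HTTP status code for a GET on `url` without reading the body."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def compare_user_agents(url):
    """Probe `url` with the default and a browser-like User-Agent."""
    return {
        "default": probe_status(url),
        "browser": probe_status(url, user_agent=BROWSER_UA),
    }
```

Calling `compare_user_agents(...)` with the video URL from the logs would show whether the two requests get different status codes. If both return 403 from the GitHub Actions runner but 200 from a local machine, the block is more likely tied to the runner's network than to request headers.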

Reproduction

This is a cookiecutter CDP instance using the Legistar chapelhill client. Link to the failed run logs: https://github.com/triangleblogblog/cdp-ch/actions/runs/8516893019/job/23326788209

Error logs:

```
[INFO: file_utils: 203 2024-04-02 03:30:10,750] Beginning resource copy from: https://archive-video.granicus.com/chapelhill/chapelhill_bf9e87e2-e776-11ee-98bb-0050569183fa.mp4
/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'archive-video.granicus.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
[ERROR: file_utils: 263 2024-04-02 03:30:10,836] Something went wrong during resource copy. Attempted copy from: 'https://archive-video.granicus.com/chapelhill/chapelhill_bf9e87e2-e776-11ee-98bb-0050569183fa.mp4', resulted in error.
[ERROR: task_runner: 910 2024-04-02 03:30:10,836] Task 'resource_copy_task': Exception encountered during task execution!
Traceback (most recent call last):
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/cdp_backend/pipeline/event_gather_pipeline.py", line 272, in resource_copy_task
    return file_utils.resource_copy(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/cdp_backend/utils/file_utils.py", line 267, in resource_copy
    raise e
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/cdp_backend/utils/file_utils.py", line 246, in resource_copy
    response.raise_for_status()
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://archive-video.granicus.com/chapelhill/chapelhill_bf9e87e2-e776-11ee-98bb-0050569183fa.mp4
[2024-04-02 03:30:10+0000] ERROR - prefect.TaskRunner | Task 'resource_copy_task': Exception encountered during task execution!
Traceback (most recent call last):
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/cdp_backend/pipeline/event_gather_pipeline.py", line 272, in resource_copy_task
    return file_utils.resource_copy(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/cdp_backend/utils/file_utils.py", line 267, in resource_copy
    raise e
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/cdp_backend/utils/file_utils.py", line 246, in resource_copy
    response.raise_for_status()
  File "/__w/_tool/Python/3.11.8/x64/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://archive-video.granicus.com/chapelhill/chapelhill_bf9e87e2-e776-11ee-98bb-0050569183fa.mp4
[INFO: task_runner: 335 2024-04-02 03:30:10,844] Task 'resource_copy_task': Finished task run for task with final state: 'Retrying'
```

Environment

- cdp-backend version: latest
