-
Notifications
You must be signed in to change notification settings - Fork 5
Dansubak/202510 authoring jupyterhub #3793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@blarghmatey or @Ardiea - this still needs work, but I'd appreciate if ya'll could take a look. Of note, I'd love your opinions on the open questions listed in the description; guessing I'll need another pass once I've got some of those answers! |
blarghmatey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would probably make sense is to wrap the majority of the resource provisioning into a function call as a sibling Python module which can then be executed using inputs to provide the customization that we want on a per-deploy basis. Most of the resources just need a '-authoring` change, so this would cut down a lot on copy-🍝
Copilot should be able to take what you've got here and figure out the majority of the refactoring.
|
Regarding the RDS, I think it makes sense to keep that on a single piece of hardware since the majority of load will be from the students and not the course authors. |
|
@blarghmatey Alright, wrapped most of the deployment related stuff into a function - let me know if this is what you had in mind. Output looks about right at a high level - it's mostly a no-op for the existing deployment and the new one looks plausibly correct. I did end up with one more open question w/r/t downtime which I've put into the description since I'm definitely not up to speed on the APISix resources; would appreciate some additional guidance! |
4d8f3d4 to
d5ab680
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR sets up a separate JupyterHub authoring environment for course content creation, running alongside the existing learner-facing JupyterHub deployment. The changes refactor the JupyterHub deployment code into a reusable function to support both environments.
Key changes:
- Refactored JupyterHub deployment into a parameterized
provision_jupyterhub_deployment()function - Added separate
jupyter-authoringnamespace and deployment with its own domain and configuration - Both deployments share the same physical RDS database but use separate logical databases and Vault backends
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
deployment.py |
New reusable function to provision JupyterHub with all AWS/K8s resources |
__main__.py |
Refactored to use new deployment function for both main and authoring environments |
author_menu_override.json |
Empty JSON config placeholder for authoring environment menu customization |
author_disabled_extensions.json |
Empty JSON config placeholder for authoring environment extension settings |
| EKS Pulumi configs (CI/QA/Production) | Added jupyter-authoring namespace to each environment |
| JupyterHub Pulumi configs (CI/QA/Production) | Added authoring_domain configuration for each environment |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Args: | ||
| base_name: Base name for the deployment (e.g., "jupyterhub" or | ||
| "jupyterhub-authoring"). This name is used to derive all resource | ||
| names throughout the deployment. | ||
| domain_name: Domain name for the deployment (e.g., "jupyter.mitlearn.mit.edu") | ||
| namespace: Kubernetes namespace for the deployment | ||
| stack_info: Stack information from parse_stack() | ||
| jupyterhub_config: Pulumi config for jupyterhub | ||
| vault_config: Pulumi config for vault | ||
| cluster_stack: EKS cluster stack reference | ||
| application_labels: Labels to apply to Kubernetes resources | ||
| k8s_global_labels: Global Kubernetes labels | ||
| extra_images: List of extra images to pre-pull (optional) | ||
| menu_override_json: JSON contents for menu override Jupyter config | ||
| disabled_extensions_json: JSON contents for disabled extensions Jupyter config | ||
| extra_config: Extra configuration values to merge into the Helm chart values |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is missing the db_config and app_db parameters which are required arguments in the function signature (lines 45 and 47). These should be documented to explain their purpose in configuring the database connection.
| f"ol-jupyterhub-vault-policy-{stack_info.env_suffix}", | ||
| name="jupyterhub", | ||
| policy=Path(__file__).parent.joinpath("jupyterhub_policy.hcl").read_text(), | ||
| jupyterhub_authoring_db_config = OLPostgresDBConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're not creating another physical RDS instance, the change that is necessary to allow for an additional logical database is actually to manage that via the Vault role definitions. You can see an example of how to modify that in https://github.com/mitodl/ol-infrastructure/blob/main/src/ol_infrastructure/applications/mit_learn/__main__.py#L539-L752
In the role definition you'll want to add a 'create databasestatement along the lines ofSELECT 'CREATE DATABASE my_database' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'my_database')\gexec;`
You'll then need to reference that role in the dynamic credentials used by the authoring hub deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright @blarghmatey, I pattern matched against the link so it looks plausibly correct now. That said, I'm definitely a bit out of my depth, having done very little with postgres or vault so I've got a few questions to run by ya!
Correct me if I'm misreading anything but I'm not quite sure this'll work as written based on a bit of research - I think gexec is specific to psql and vault will use a driver so we won't have that, and since CREATE DATABASE won't run in a transaction and it seems like Vault's PG plugin appears to use txns I don't think I can conditionalize it using DO.
One suggestion was to use dblink_exec (this actually also references gexec as well) - do you know if that's installed and available to vault?
FWIW, I'm happy to give this a rip on CI and see what happens to start with as long as you think that's safe enough, but any preliminary advice is greatly appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running it as is gets me the following error:
Diagnostics:
pulumi:pulumi:Stack (ol-infrastructure-jupyterhub-application-applications.jupyterhub.CI):
error: update failed
error: kubernetes:yaml/v2:ConfigGroup resource 'jupyterhub-authoring-vso-resources' has a problem: grpc: the client connection is closing
error: kubernetes:yaml/v2:ConfigGroup resource 'OLVaultK8SSecret-jupyter-authoring-jupyterhub-authoring-app-db-creds' has a problem: grpc: the client connection is closing
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1764097778.993076 40670945 fork_posix.cc:71] Other threads are currently calling into gRPC, skipping fork() handlers
vault:database:SecretBackendConnection (jupyterhub_authoring-database-connection):
error: sdk-v2/provider2.go:572: sdk.helper_schema: error configuring database connection "postgres-jupyterhub-authoring/config/jupyterhub_authoring": Error making API request.
URL: PUT https://vault-ci.odl.mit.edu/v1/postgres-jupyterhub-authoring/config/jupyterhub_authoring
Code: 400. Errors:
* error creating database object: error verifying connection: ping failed: failed to connect to `host=jupyterhub-db-ci.cbnm7ajau6mi.us-east-1.rds.amazonaws.com user=oldevops database=jupyterhub_authoring`: server error (FATAL: database "jupyterhub_authoring" does not exist (SQLSTATE 3D000)): [email protected]
error: 1 error occurred:
* error configuring database connection "postgres-jupyterhub-authoring/config/jupyterhub_authoring": Error making API request.
URL: PUT https://vault-ci.odl.mit.edu/v1/postgres-jupyterhub-authoring/config/jupyterhub_authoring
Code: 400. Errors:
* error creating database object: error verifying connection: ping failed: failed to connect to `host=jupyterhub-db-ci.cbnm7ajau6mi.us-east-1.rds.amazonaws.com user=oldevops database=jupyterhub_authoring`: server error (FATAL: database "jupyterhub_authoring" does not exist (SQLSTATE 3D000))
That's not entirely surprising if I'm reading this right and understanding when everything executes - the database wouldn't exist until the role statements run (assuming the role statements work) so just attempting to connect as the admin user chokes.
Unfortunately that does leave me a bit unsure of how to proceed.
- I could specify the application jupyter database in the
OLPostgresDBConfig, which I think will let it connect and execute the role creation statements, but will that cause issues down the line? I think I can specify the deployment-specific database value in theOLVaultK8SDynamicSecretConfigbut I'm not sure if there's other times it'd connect using the vault info that I should be aware of. - I can make the database manually if we'd like - it'd work, but it also means that we'd have a manual step every time we instantiated more than one deployment per stack with distinct databases
- I could maybe wire up a database resource directly? It's not entirely clear to me if this'll work since I've only been working w/ the OL component resource wrappers, but if I can connect to it and that represents a logical db, I think I can get that going
- I could try and shove everything into the same logical database. Probably like 50/50 odds that'd work at the outset, but I am pretty sure that it'll be a question of when, not if it comes back to bite us.
…physical hardware w/ a new logical database
… though since we want one physical instance
…e statements correct and vault bits in place though
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
283f9aa to
75b56f0
Compare
Description (What does it do?)
This is a first, rough cut at getting a jupyterhub authoring environment set up. It's definitely not complete but it provides a starting point.
How can this be tested?
Currently it cannot and probably doesn't work. Notably, I expect that we may need some changes to the APISix and Keycloak. However I wanted to put this up as early as possible so I could get eyes on it.
I have not currently run this, but want to run this against
applications.jupyterhub.CIonce it has gotten some preliminary eyes. I believe I'll need to run againstinfrastructure.aws.eks.applications.CIbefore I do so to get the namespace provisionedThe
pulumi upcommand output is as follows.Details can be found in the following.
jupyterhub_authoring_pulumi_ouput.txt
The only resource it wants to destroy is
ol-k8s-apisix-olapisixoidcresources-ci- this is because it's now being replaced with an identical one namedol-k8s-apisix-jupyterhub-olapisixoidcresources-ci.Open Questions
hub)ol-k8s-apisix-olapisixoidcresources-cian online operation or will that result in downtime?Resolved Questions
Database: We could probably use the same one and just do two logical databases if we want. I can't see an obvious reason it wouldn't work - question is if we'd prefer hardware-level separation or not.We're going to cram everything into one physical database.