-
Notifications
You must be signed in to change notification settings - Fork 356
Add user friendly messages to snakemake workflow #1846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi @willu47, Thanks for this PR! It will certainly improve the readability of the logs. However, I'm concerned that removing the rule name from the logs when the rule starts cause inconvenience. I'm used to searching for rule names in the logs when debugging. Using the example in the description, I would only find finished jobs. This makes it more difficult to determine when the rule started, and where to look for logs from a specific rule. This is particularly important when multiple rules are logging at the same time. |
|
NB this could be mitigated by adding the rule name into the rule my_cool_rule:
message:
"This is a cool message for {rule}."gives Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count
------------ -------
my_cool_rule 1
total 1
Select jobs to execute...
Execute 1 jobs...
[Mon Oct 6 09:31:38 2025]
Job 0: This is a cool message for my_cool_rule.
Reason: Rules with neither input nor output files are always executed.
[Mon Oct 6 09:31:38 2025]
Finished jobid: 0 (Rule: my_cool_rule)edit: rule my_cool_rule:
message:
"This is a cool message for {rule}."
# Make sure not to use `rule` as a variable name
for name, r in rules._rules.items():
print(r.rule.message)
r.rule.message += " And this is an extra message for {rule}."output: Building DAG of jobs...
Job stats:
job count
------------ -------
my_cool_rule 1
total 1
[Mon Oct 6 09:40:40 2025]
Job 0: This is a cool message for my_cool_rule. And this is an extra message for my_cool_rule.
Reason: Rules with neither input nor output files are always executed.
Job stats:
job count
------------ -------
my_cool_rule 1
total 1
Reasons:
(check individual jobs above for details)
neither input nor output:
my_cool_rule
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution. |
|
On the advice of @euronion, I've appended the rule names to the end of the messages. Here's what the job start and end points look like in the logs now: [Mon Oct 6 09:59:23 2025]
Job 30: Retrieving synthetic electricity demand data (Rule: retrieve_synthetic_electricity_demand)
Reason: Missing output files: data/load_synthetic_raw.csv
[Mon Oct 6 09:59:28 2025]
Finished jobid: 30 (Rule: retrieve_synthetic_electricity_demand)
3 of 45 steps (7%) done
[Mon Oct 6 09:59:23 2025]
Job 38: Retrieving shipping raster data (Rule: retrieve_ship_raster)
Reason: Missing output files: data/shipdensity_global.zip
[Mon Oct 6 09:59:28 2025]
Finished jobid: 38 (Rule: retrieve_ship_raster)
4 of 45 steps (9%) done
[Mon Oct 6 09:59:28 2025]
Job 8: Retrieving Marine Regions data (Rule: retrieve_eez)
Reason: Missing output files: data/eez/World_EEZ_v12_20231025_LR/eez_v12_lowres.gpkg
[Mon Oct 6 09:59:28 2025]
Finished jobid: 8 (Rule: retrieve_eez)
5 of 45 steps (11%) done |
lkstrp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @willu47 !
I would rather vote for putting it at the front:
[Mon Oct 6 09:59:23 2025]
Job 30: retrieve_synthetic_electricity_demand - Retrieving synthetic electricity demand data.
Reason: Missing output files: data/load_synthetic_raw.csv
For some rules, the messages contain useful additional information (e.g. when wildcards are used). But for others, there is no added value (e.g. simplify_network vs "Simplifying network"). The trade off is fine for me, but I think the rule name, which is the most important information, should still be at the beginning
Snakefile
Outdated
|
|
||
| # Appends the name of the rule to each rule's message for easier identification in logs | ||
| for name, r in rules._rules.items(): | ||
| if f"{name}" not in r.rule.message: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check isn't very robust for short rule names. If we follow the proposed format, the following might be better:
| if f"{name}" not in r.rule.message: | |
| if not r.rule.message.startswith(f"{rule} - ") |
I think this is a valid point by @lkstrp . Overwriting the Although looking at the code in |
|
I very much support the idea of descriptive messages in the console output! At the same time, I agree with the concerns about rulename, wildcard, input, output, and resource information, which should not be lost. Let's try to have both? |
No, not a bug. It's just quite complicated, the logging system shuttles this extra struct down to finally: This actually sent me on an exploration spree which resulted in the following experiment: https://github.com/coroa/snakemake-logger-plugin-tui You can install that with And then trial run it with: It integrates with the messages here, so feel free to use this branch. |
|
Thanks all for the comments. If I try to summarise the discussion so far:
I for one am surprised that snakemake does not seem to use a two tier logging structure out of the box. Normally INFO level messages are much less verbose than what snakemake prints out, and would correspond to a more user-friendly description level. A DEBUG level would then include more verbose input and output information corresponding to what a power user might want. Personally, I would also appreciate a less verbose INFO level print out to the console, and a more verbose DEBUG output streamed to the filesystem. Way forwardOption 1Incorporate all information needed for logging into the descriptive message. Messages can include wildcards (this is already implemented); and we could add a custom function reproducing the existing logging behaviour of snakemake, only adding the descriptive message. Option 2A custom logging plugin (TUI), enabling different logging levels, the redirection of verbose DEBUG messages to the filesystem and descriptive INFO level messages to the console. Option 3Leave it as it is! Close this pull request and save this conversation for posterity! |
I would favour Option 1: Reproducing the original |
The logger plugin registers a derived class of snakemake's DefaultFormatter which shows the bold message in front of the regular job info block.
This reverts commit c4fd02e.
|
The minimal variant of Option 2, is not difficult, demonstration in #1855 . |
for more information, see https://pre-commit.ci
|
@coroa I have merged/cherry-picked your commits implementing the plugin logger and then renamed it to PyPSA. You can close #1855 without merging. I noticed that the log file being produced does not contain sufficient information for debugging - but I think that is a separate issue. For now, I believe the PR is ready for final review and merge. |
lkstrp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it !
The profile and plugin setup is great. I could add another plugin to that structure soon and was searching for a good solution.
Thanks everyone! Some minor comments and then good to go, if everyone is happy
| pass | ||
|
|
||
|
|
||
| class PypsaFormatter(DefaultFormatter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Chief of Branding, I suffer from reading Pypsa, but I'll be fine 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See what you mean, but you're strong.
|
Glad, you like it. It will be a small nightmare or maybe a good claude job to align with #1869 . |
Co-authored-by: Lukas Trippe <[email protected]>
Thanks for that @willu47 . It looks good, i would say. Yes, it is a known issue that the logfile does not contain all relevant output, because snakemake.log only contains logging output from the main snakemake executor not the launched job subprocesses. It's a thing that has been long on my list to try to improve the job log output. But agreed not something for this PR. I would like to hold off with merging this one until we get the retrieval updates in. |

Closes #1844 .
Changes proposed in this Pull Request
Adds user-readable messages describing what each rule does in understandable language.
Instead of:
we see
Checklist
envs/environment.yaml.config/config.default.yaml.doc/configtables/*.csv.doc/data_sources.rst.doc/release_notes.rstis added.TODO