Skip to content

Conversation

@willu47
Copy link

@willu47 willu47 commented Oct 3, 2025

Closes #1844 .

Changes proposed in this Pull Request

Adds user-readable messages describing what each rule does in understandable language.

Instead of:

[Fri Oct  3 17:11:26 2025]
Job 0: plot_power_networks_clustered
Reason: Input files updated by another job: resources/test-elec-tyndp/maps/power-network-s-all.pdf
[Fri Oct  3 17:11:26 2025]
Finished jobid: 0 (Rule: plot_power_networks_clustered)

we see

[Fri Oct  3 17:11:26 2025]
Job 0: Plotting clustered power network topology
Reason: Input files updated by another job: resources/test-elec-tyndp/maps/power-network-s-all.pdf
[Fri Oct  3 17:11:26 2025]
Finished jobid: 0 (Rule: plot_power_networks_clustered)

Checklist

  • I tested my contribution locally and it works as intended.
  • Code and workflow changes are sufficiently documented.
  • Changed dependencies are added to envs/environment.yaml.
  • Changes in configuration options are added in config/config.default.yaml.
  • Changes in configuration options are documented in doc/configtables/*.csv.
  • Sources of newly added data are documented in doc/data_sources.rst.
  • A release note doc/release_notes.rst is added.

TODO

@willu47 willu47 marked this pull request as ready for review October 3, 2025 15:59
@willu47 willu47 changed the title Feature/1844 snakemake message Add user friendly messages to snakemake workflow Oct 3, 2025
@tgilon
Copy link
Collaborator

tgilon commented Oct 5, 2025

Hi @willu47, Thanks for this PR! It will certainly improve the readability of the logs. However, I'm concerned that removing the rule name from the logs when the rule starts cause inconvenience. I'm used to searching for rule names in the logs when debugging. Using the example in the description, I would only find finished jobs. This makes it more difficult to determine when the rule started, and where to look for logs from a specific rule. This is particularly important when multiple rules are logging at the same time.

@euronion
Copy link
Contributor

euronion commented Oct 6, 2025

NB this could be mitigated by adding the rule name into the message. There is a placeholder for it as well:

rule my_cool_rule:
    message:
        "This is a cool message for {rule}."

gives

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count
------------  -------
my_cool_rule        1
total               1

Select jobs to execute...
Execute 1 jobs...

[Mon Oct  6 09:31:38 2025]
Job 0: This is a cool message for my_cool_rule.
Reason: Rules with neither input nor output files are always executed.
[Mon Oct  6 09:31:38 2025]
Finished jobid: 0 (Rule: my_cool_rule)

edit:
Or also dynamically, could just prepend it after all rules:

rule my_cool_rule:
    message:
        "This is a cool message for {rule}."

# Make sure not to use `rule` as a variable name
for name, r in rules._rules.items():
    print(r.rule.message)
    r.rule.message += " And this is an extra message for {rule}."

output:

Building DAG of jobs...
Job stats:
job             count
------------  -------
my_cool_rule        1
total               1


[Mon Oct  6 09:40:40 2025]
Job 0: This is a cool message for my_cool_rule. And this is an extra message for my_cool_rule.
Reason: Rules with neither input nor output files are always executed.
Job stats:
job             count
------------  -------
my_cool_rule        1
total               1

Reasons:
    (check individual jobs above for details)
    neither input nor output:
        my_cool_rule
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

@willu47
Copy link
Author

willu47 commented Oct 6, 2025

On the advice of @euronion, I've appended the rule names to the end of the messages. Here's what the job start and end points look like in the logs now:

[Mon Oct  6 09:59:23 2025]
Job 30: Retrieving synthetic electricity demand data (Rule: retrieve_synthetic_electricity_demand)
Reason: Missing output files: data/load_synthetic_raw.csv
[Mon Oct  6 09:59:28 2025]
Finished jobid: 30 (Rule: retrieve_synthetic_electricity_demand)
3 of 45 steps (7%) done

[Mon Oct  6 09:59:23 2025]
Job 38: Retrieving shipping raster data (Rule: retrieve_ship_raster)
Reason: Missing output files: data/shipdensity_global.zip
[Mon Oct  6 09:59:28 2025]
Finished jobid: 38 (Rule: retrieve_ship_raster)
4 of 45 steps (9%) done

[Mon Oct  6 09:59:28 2025]
Job 8: Retrieving Marine Regions data (Rule: retrieve_eez)
Reason: Missing output files: data/eez/World_EEZ_v12_20231025_LR/eez_v12_lowres.gpkg
[Mon Oct  6 09:59:28 2025]
Finished jobid: 8 (Rule: retrieve_eez)
5 of 45 steps (11%) done

@tgilon
Copy link
Collaborator

tgilon commented Oct 6, 2025

On the advice of @euronion, I've appended the rule names to the end of the messages. Here's what the job start and end points look like in the logs now:

That's definitely more practical for me, at least. Thank you, @willu47!

Copy link
Member

@lkstrp lkstrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @willu47 !
I would rather vote for putting it at the front:

[Mon Oct  6 09:59:23 2025]
Job 30: retrieve_synthetic_electricity_demand - Retrieving synthetic electricity demand data.
Reason: Missing output files: data/load_synthetic_raw.csv

For some rules, the messages contain useful additional information (e.g. when wildcards are used). But for others, there is no added value (e.g. simplify_network vs "Simplifying network"). The trade off is fine for me, but I think the rule name, which is the most important information, should still be at the beginning

Snakefile Outdated

# Appends the name of the rule to each rule's message for easier identification in logs
for name, r in rules._rules.items():
if f"{name}" not in r.rule.message:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check isn't very robust for short rule names. If we follow the proposed format, the following might be better:

Suggested change
if f"{name}" not in r.rule.message:
if not r.rule.message.startswith(f"{rule} - ")

@euronion
Copy link
Contributor

euronion commented Oct 8, 2025

Thanks @willu47 ! I would rather vote for putting it at the front:

[Mon Oct  6 09:59:23 2025]
Job 30: retrieve_synthetic_electricity_demand - Retrieving synthetic electricity demand data.
Reason: Missing output files: data/load_synthetic_raw.csv

For some rules, the messages contain useful additional information (e.g. when wildcards are used). But for others, there is no added value (e.g. simplify_network vs "Simplifying network"). The trade off is fine for me, but I think the rule name, which is the most important information, should still be at the beginning

I think this is a valid point by @lkstrp . Overwriting the message also hides information not only about the wildcards, but also about the resources and outputs.
I'd prefer it if we could keep the standard output message from snakemake and with the additional messages you have prepared @willu47 .

Although looking at the code in snakemake it seems more like a bug that the additional information is hidden. Can I summon @coroa on this briefly?

@fneum
Copy link
Member

fneum commented Oct 8, 2025

I very much support the idea of descriptive messages in the console output!

At the same time, I agree with the concerns about rulename, wildcard, input, output, and resource information, which should not be lost.

Let's try to have both?

@coroa
Copy link
Member

coroa commented Oct 10, 2025

Although looking at the code in snakemake it seems more like a bug that the additional information is hidden. Can I summon @coroa on this briefly?

No, not a bug. It's just quite complicated, the logging system shuttles this extra struct down to finally:
https://github.com/snakemake/snakemake/blob/2eb9079dae29f271aeacff75e59f6c2f6a0d352d/src/snakemake/logging.py#L187-L201
where if a message is given, it overrides the default job info, that's sad, but not really a good way around it except for writing a custom logger plugin for snakemake. See https://github.com/snakemake/snakemake-interface-logger-plugins for details.

This actually sent me on an exploration spree which resulted in the following experiment: https://github.com/coroa/snakemake-logger-plugin-tui

You can install that with pip install -e https://github.com/coroa/snakemake-logger-plugin-tui in your pypsa-eur env.

And then trial run it with: snakemake --logger tui -c4. It is neat.
unfortunately it does not receive any information about the downloads, so while everything downloads it just looks stuck, error handling is also not there yet, and i think the approach is limited and it would be good to move the tui out of the logging handler actually. We can set up a quick session to discuss if someone is interested.

It integrates with the messages here, so feel free to use this branch.
image

@willu47
Copy link
Author

willu47 commented Oct 10, 2025

Thanks all for the comments. If I try to summarise the discussion so far:

  • We would like to see more descriptive messages for each of the PyPSA-Eur rules, so that it is easier to understand the purpose of each rule, particularly for new users. Some of the rule names are quite cryptic while others are straightforward.
  • [Power/all] users need to see information on inputs, outputs, rule names and other key rule parameters to rules. This information is by default printed to the console as well as to the log.
  • Using the message attribute overwrites all existing console output, essentially hiding the key metadata associated with a rule, including the rule name. This makes debugging harder.

I for one am surprised that snakemake does not seem to use a two tier logging structure out of the box. Normally INFO level messages are much less verbose than what snakemake prints out, and would correspond to a more user-friendly description level. A DEBUG level would then include more verbose input and output information corresponding to what a power user might want. Personally, I would also appreciate a less verbose INFO level print out to the console, and a more verbose DEBUG output streamed to the filesystem.

Way forward

Option 1

Incorporate all information needed for logging into the descriptive message. Messages can include wildcards (this is already implemented); and we could add a custom function reproducing the existing logging behaviour of snakemake, only adding the descriptive message.

Option 2

A custom logging plugin (TUI), enabling different logging levels, the redirection of verbose DEBUG messages to the filesystem and descriptive INFO level messages to the console.

Option 3

Leave it as it is! Close this pull request and save this conversation for posterity!

@euronion
Copy link
Contributor

Thanks all for the comments. If I try to summarise the discussion so far:

  • We would like to see more descriptive messages for each of the PyPSA-Eur rules, so that it is easier to understand the purpose of each rule, particularly for new users. Some of the rule names are quite cryptic while others are straightforward.
  • [Power/all] users need to see information on inputs, outputs, rule names and other key rule parameters to rules. This information is by default printed to the console as well as to the log.
  • Using the message attribute overwrites all existing console output, essentially hiding the key metadata associated with a rule, including the rule name. This makes debugging harder.

I for one am surprised that snakemake does not seem to use a two tier logging structure out of the box. Normally INFO level messages are much less verbose than what snakemake prints out, and would correspond to a more user-friendly description level. A DEBUG level would then include more verbose input and output information corresponding to what a power user might want. Personally, I would also appreciate a less verbose INFO level print out to the console, and a more verbose DEBUG output streamed to the filesystem.

Way forward

Option 1

Incorporate all information needed for logging into the descriptive message. Messages can include wildcards (this is already implemented); and we could add a custom function reproducing the existing logging behaviour of snakemake, only adding the descriptive message.

Option 2

A custom logging plugin (TUI), enabling different logging levels, the redirection of verbose DEBUG messages to the filesystem and descriptive INFO level messages to the console.

Option 3

Leave it as it is! Close this pull request and save this conversation for posterity!

I would favour Option 1: Reproducing the original snakemake messages should not be to difficult, since all placeholders like {resources}, {input}, {output}, ... work in a rule's message.

coroa added 2 commits October 10, 2025 18:04
The logger plugin registers a derived class of snakemake's DefaultFormatter
which shows the bold message in front of the regular job info block.
@coroa
Copy link
Member

coroa commented Oct 10, 2025

The minimal variant of Option 2, is not difficult, demonstration in #1855 .

@willu47
Copy link
Author

willu47 commented Oct 22, 2025

@coroa I have merged/cherry-picked your commits implementing the plugin logger and then renamed it to PyPSA. You can close #1855 without merging. I noticed that the log file being produced does not contain sufficient information for debugging - but I think that is a separate issue. For now, I believe the PR is ready for final review and merge.

Copy link
Member

@lkstrp lkstrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it !
The profile and plugin setup is great. I could add another plugin to that structure soon and was searching for a good solution.

Thanks everyone! Some minor comments and then good to go, if everyone is happy

pass


class PypsaFormatter(DefaultFormatter):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Chief of Branding, I suffer from reading Pypsa, but I'll be fine 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See what you mean, but you're strong.

@coroa
Copy link
Member

coroa commented Oct 27, 2025

Glad, you like it. It will be a small nightmare or maybe a good claude job to align with #1869 .

Co-authored-by: Lukas Trippe <[email protected]>
@coroa
Copy link
Member

coroa commented Oct 27, 2025

@coroa I have merged/cherry-picked your commits implementing the plugin logger and then renamed it to PyPSA. You can close #1855 without merging. I noticed that the log file being produced does not contain sufficient information for debugging - but I think that is a separate issue. For now, I believe the PR is ready for final review and merge.

Thanks for that @willu47 . It looks good, i would say. Yes, it is a known issue that the logfile does not contain all relevant output, because snakemake.log only contains logging output from the main snakemake executor not the launched job subprocesses. It's a thing that has been long on my list to try to improve the job log output. But agreed not something for this PR.

I would like to hold off with merging this one until we get the retrieval updates in.

@willu47 willu47 mentioned this pull request Nov 12, 2025
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use snakemake message: attribute to document rules

6 participants