@mskiptr mskiptr commented Oct 23, 2025

Hello. I've been pleasantly surprised to discover that nbconvert can now clear metadata fields. That makes for much nicer diffs when you're trying to revision-control your notebooks.

But then I soon noticed that not all of the metadata is actually being removed. More specifically, markdown cells seemed to retain their ids at least. So today, after looking through the source code and doing some Git archaeology, (I think) I've finally figured out why.

When the ClearMetadataPreprocessor was added in #805, it was based on the ClearOutputPreprocessor, which also performs some metadata transformations. Removing cell output only makes sense for code cells, so there is a check for that. But the same check is now done for general metadata clearing, which leads to this unexpected behavior.
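To illustrate the effect of that check, here is a minimal sketch using plain dictionaries rather than nbconvert itself. The function `clear_cell_metadata`, its `code_cells_only` parameter, and the sample metadata values are all hypothetical and exist only for this illustration; the real preprocessor is structured differently.

```python
def clear_cell_metadata(cell, preserve=frozenset(), code_cells_only=False):
    """Return a copy of `cell` whose metadata keeps only the keys in `preserve`.

    Hypothetical helper: `code_cells_only=True` mimics the cell-type check
    inherited from the output-clearing logic.
    """
    if code_cells_only and cell.get("cell_type") != "code":
        # Non-code cells are skipped entirely, so their metadata survives.
        return dict(cell)
    cleaned = dict(cell)
    cleaned["metadata"] = {
        key: value
        for key, value in cell.get("metadata", {}).items()
        if key in preserve
    }
    return cleaned


# Illustrative cells; the metadata values are made up.
cells = [
    {"cell_type": "code", "metadata": {"collapsed": True}, "source": "1 + 1"},
    {"cell_type": "markdown", "metadata": {"id": "abc123"}, "source": "# Title"},
]

with_check = [clear_cell_metadata(c, code_cells_only=True) for c in cells]
without_check = [clear_cell_metadata(c) for c in cells]

print(with_check[1]["metadata"])     # the markdown cell keeps its id
print(without_check[1]["metadata"])  # the markdown cell is cleared too
```

With the cell-type check in place, only the code cell is cleaned. Dropping the check clears every cell type, which is the behavior this PR aims for.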

While trying to fix it, I also ended up making a patch that clarifies some of the documentation present in that file. And then I recalled that at first it wasn't really clear to me how I should go about enabling the ClearMetadataPreprocessor, so I also wrote a patch to add a --clear-metadata command line flag that works just like --clear-output.
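As a rough sketch of what "works just like --clear-output" could mean, the snippet below models a command line flag as a pair of a config dictionary and a help string, which is the general shape of traitlets-style flag tables. The helper `in_place_preprocessor_flag`, the exact config keys, and the help strings are assumptions made for illustration, not a copy of the actual patch.

```python
def in_place_preprocessor_flag(preprocessor, help_text):
    """Build a (config, help) pair that enables `preprocessor` and rewrites
    the notebook in place. Hypothetical helper; keys are assumed."""
    config = {
        # Assumed settings: write a notebook back over the input file.
        "NbConvertApp": {"use_output_suffix": False, "export_format": "notebook"},
        "FilesWriter": {"build_directory": ""},
        # Turn on the preprocessor that does the actual clearing.
        preprocessor: {"enabled": True},
    }
    return config, help_text


flags = {
    "clear-output": in_place_preprocessor_flag(
        "ClearOutputPreprocessor",
        "Clear output of the current file and save in place.",
    ),
    "clear-metadata": in_place_preprocessor_flag(
        "ClearMetadataPreprocessor",
        "Clear metadata of the current file and save in place.",
    ),
}

config, _help = flags["clear-metadata"]
print(config["ClearMetadataPreprocessor"])
```

Without such a flag, the preprocessor can presumably still be enabled through the generic config syntax, for example by passing --ClearMetadataPreprocessor.enabled=True on the command line.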

I marked this PR as a draft, because:

  • I still need to test that this truly works correctly (update: I have now built and tested it in a guix shell jupyter --with-patch=python-nbconvert=metadata.diff)
  • I have not extended the unit tests to cover this (help appreciated)
  • the way ClearOutputPreprocessor.remove_metadata_fields and ClearMetadataPreprocessor.preserve_cell_metadata_mask interact is rather confusing, so I would like to fix that
  • maybe there's something I'm missing and we shouldn't actually clear metadata for certain cells
