Docrepobib #167

pahjbo · 2025-11-04T10:53:28Z

This update adds some more tools for managing the docrepo.bib file, and to some extent it is going in a different direction to that implied by suggest-bibupgrade.py.

Firstly, fetch_from_ads.py adds a URL field that points to the IVOA document landing page. The aim is to reduce the number of clicks needed to follow a reference: currently it often takes 4 to get from the text of one standard to the text of another. It also shows up a problem with 2017ivoa.spec.0517B: the SODA landing page is missing.
Secondly, annotate_docrepo_bib.py adds version keys and an ids field that biblatex can use as a citation key alias. This would allow expressing the intention of pointing at the latest version by using ivoadoc:shortname as the citation key.

I don't think that this breaks anything current, as it is just adding more information to docrepo.bib
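
As a rough illustration of the kind of annotation described above, the sketch below adds `url`, `version`, and `ids` fields to one entry held as a plain dictionary. The helper names `annotate_entry` and `to_bibtex`, the example bibcode, the document name, the version string, and the landing-page URL pattern are all assumptions made for illustration; they are not taken from the scripts in this PR.

```python
def annotate_entry(entry, docname, version, landing_url):
    """Return a copy of a bib entry with url, version and ids fields added.

    Fields that are already present are left alone, so this only ever
    adds information to an entry.
    """
    annotated = dict(entry)
    annotated.setdefault("url", landing_url)
    annotated.setdefault("version", version)
    # biblatex reads the ids field as a list of citation key aliases.
    annotated.setdefault("ids", f"ivoadoc:{docname}")
    return annotated


def to_bibtex(key, entry_type, fields):
    """Render one entry as BibTeX text (no escaping; illustration only)."""
    lines = [f"@{entry_type}{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical entry: bibcode, docname, version and URL are made up.
doc = annotate_entry(
    {"title": "An Example Standard"},
    docname="ExampleDoc",
    version="1.0",
    landing_url="https://www.ivoa.net/documents/ExampleDoc/20990101/",
)
print(to_bibtex("2099ivoa.spec.0101X", "MISC", doc))
```

Because `setdefault` never replaces an existing value, running the annotation again on an already annotated entry leaves it unchanged.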

this is done to allow a more direct form of addressing than always having to go via ADS

this is done to allow some more automated processing of the bib entries, and possibly in the future if using biblatex for processing then using a cite alias that points to the most current version.

msdemlei

I think this is a good idea; thanks for contributing this.

I was a bit skeptical about introducing an extra dependency, bibtexparser, but it's in Debian, and it's only necessary for people running make docrepo.bib occasionally, and that's people who shouldn't be concerned about another installed package. So, for me that's fine.

However, I'd request a few changes.

(1) A minor one is to be PEP8-compliant with function arguments, i.e., write func(a, b) rather than func(a,b).

(2) Then I'd prefer if the functions and functionality from annotate_docrepo_bib.py were merged into fetch_from_ads.py. I'm trying to keep the number of files in ivoatex low, and the annotation has to reliably run after downloading anyway, so I don't think there is much to be won by the split-up.

(3) The thing I like least, though, is the request orgy at the end of the new fetch_from_ads.py. Running one request per IVOA document to obtain the document URL seems too high a price to pay. So, can you perhaps check the ADS API docs to see if we can get the document URL with the ADS-provided BibTeX, and if not, contact them and ask if they'd put in a hack for us to let us retrieve BibTeX with URLs to begin with?

(4) Also, I don't think we should check all URLs all the time; attempting an update of the bibliography should be quick and lightweight. I'd not argue against a flag that enables link checking, to be run perhaps once a year, but it shouldn't be the default behaviour.
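
An opt-in switch of the kind suggested in (4) could be sketched with `argparse` as follows. The flag name `--check-links` and the functions `parse_args` and `update_bibliography` are hypothetical; the actual script may organise this differently.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; link checking is strictly opt-in."""
    parser = argparse.ArgumentParser(
        description="Update docrepo.bib from ADS (sketch).")
    parser.add_argument(
        "--check-links",  # hypothetical flag name
        action="store_true",
        help="also verify that every url field resolves (slow)")
    return parser.parse_args(argv)


def update_bibliography(check_links):
    """Return the list of steps that would run for this invocation."""
    steps = ["fetch new entries"]
    if check_links:
        # Only reached when the user explicitly asked for it.
        steps.append("check all URLs")
    return steps


# Default run: no link checking.
print(update_bibliography(parse_args([]).check_links))
# Occasional maintenance run with the flag set.
print(update_bibliography(parse_args(["--check-links"]).check_links))
```

With `action="store_true"` the option defaults to `False`, so a plain update stays quick and the expensive check only happens on request.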

Thanks!

pahjbo · 2025-11-10T11:28:46Z

> (2) Then I'd prefer if the functions and functionality from annotate_docrepo_bib.py were merged into fetch_from_ads.py. I'm trying to keep the number of files in ivoatex low, and the annotation has to reliably run after downloading anyway, so I don't think there is much to be won by the split-up.

One of the reasons for the split was related to the points that you make below, and now I might be tempted to do the opposite and put the URL discovery into annotate_docrepo_bib.py, given that docrepo.bib from this PR already contains most of these URLs. I think that the ideal workflow might be to

* run what would be the original fetch_from_ads.py
* run git mergetool on docrepo.bib and selectively edit in any new entries, but leave existing ones with the extra annotation (an alternative would be to parse the ADS response with bibtexparser and do the merge automatically; I slightly worry that this might miss other important updates, though)
* then run a modified annotate_docrepo_bib.py that accepts the bibkeys to annotate as arguments.
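
The last step of this workflow could look roughly like the sketch below, which annotates only the bibkeys it is given. The entries are modelled as a plain dictionary keyed by bibkey; `annotate_selected`, `add_marker`, and the example bibcodes are hypothetical stand-ins rather than code from this PR.

```python
def annotate_selected(entries, bibkeys, annotate):
    """Apply annotate() only to the entries named in bibkeys.

    entries  -- dict mapping bibkey -> dict of fields (modified in place)
    bibkeys  -- iterable of keys to annotate
    annotate -- callable taking (bibkey, fields) and returning new fields
    Returns the keys that were requested but not found.
    """
    missing = []
    for key in bibkeys:
        if key in entries:
            entries[key] = annotate(key, entries[key])
        else:
            missing.append(key)
    return missing


def add_marker(key, fields):
    """Stand-in annotation: tag the entry so the effect is visible."""
    return {**fields, "annotated": "yes"}


# Made-up bibliography with two entries.
entries = {
    "2099ivoa.spec.0101X": {"title": "Example A"},
    "2099ivoa.spec.0202Y": {"title": "Example B"},
}
missing = annotate_selected(
    entries, ["2099ivoa.spec.0101X", "2099ivoa.spec.9999Z"], add_marker)
print(missing)
```

In a command-line script the list of bibkeys would typically come from `sys.argv[1:]`; entries that are not named stay exactly as they were.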

> (3) The thing I like least, though, is the request orgy at the end of the new fetch_from_ads.py. Running one request per IVOA document to obtain the document URL seems too high a price to pay. So, can you perhaps check the ADS API docs to see if we can get the document URL with the ADS-provided BibTeX, and if not, contact them and ask if they'd put in a hack for us to let us retrieve BibTeX with URLs to begin with?

I did have a fairly extensive look at their API documentation and did not see any better way to do this. However, if you think about the bigger picture: this script is run infrequently, and what it does is cut down on effectively the same API call being made far more often, namely every time someone has to go via the ADS landing page after clicking on the only reference that they have from the IVOA document.

As far as I can tell, the request orgy will not become a problem until there are 5000 IVOA documents.

> (4) Also, I don't think we should check all URLs all the time; attempting an update of the bibliography should be quick and lightweight. I'd not argue against a flag that enables link checking, to be run perhaps once a year, but it shouldn't be the default behaviour.

pahjbo · 2025-11-10T11:42:12Z

I will make some changes, but probably not until after the Interop

pahjbo · 2025-11-11T12:45:34Z

What I want out of this whole exercise is a mapping between bibcode -> DOCNAME (as defined in the top level Makefile of a document prepared with ivoatex)

msdemlei · 2025-11-11T14:14:49Z

On Tue, Nov 11, 2025 at 04:46:04AM -0800, Paul Harrison wrote:

> What I want out of this whole exercise is a mapping between bibcode -> DOCNAME (as defined in the top level Makefile of a document prepared with ivoatex)

The stuff that does this is in https://github.com/ivoa/docrepoToADS. In there, it would be easy to build the mapping you are looking for, and it could be either pushed into ivoatex or, probably preferably, just be put at some bespoke URI in the docrepo. Actually, I've just implemented that and, for testing, put the resulting document at <https://docs.g-vo.org/bibcode_mapping.json>. Can you work with that? We'd still have to establish a workflow to keep this document up to date in the docrepo (but perhaps that will be easily done by the new docrepo infrastructure that Kalyani is building), but I feel this is doable, in particular because there are globally only two people who run docrepoToADS (perhaps regrettably).

pahjbo · 2025-11-12T12:19:46Z

> Actually, I've just implemented that and, for testing, put the resulting document at https://docs.g-vo.org/bibcode_mapping.json. Can you work with that? We'd still have to establish a workflow to keep this document up to date in the docrepo (but perhaps that will be easily done by the new docrepo infrastructure that Kalyani is building), but I feel this is doable, in particular because there are globally only two people who run docrepoToADS (perhaps regrettably).

that is essentially the mapping - though I would really like the docname case to match what is used in both the GitHub project name and the part of the URL where the standard is published to. And I would still like to get this into the bib file as keys similarly to how I have done in this PR, as then some standard bibliography processing can know about it too.

msdemlei · 2025-11-12T13:30:01Z

On Wed, Nov 12, 2025 at 04:21:36AM -0800, Paul Harrison wrote:

> that is essentially the mapping - though I would really like the docname case to match what is used in both the GitHub project name and the part of the URL where the standard is published to. And I

The reason that I'm lowercasing things is that the docrepo treats the paths case-insensitively, and hence there is no guarantee that there is consistent capitalisation. I'm now not lowercasing any more; could you have a close look at the result to see whether anything is wrong now?

> would still like to get this into the bib file as keys similarly to how I have done in this PR, as then some standard bibliography processing can know about it too.

Yeah, the idea has been that rather than doing n requests to ADS you would in the future just do one request to the doc repo (or, until we've moved things there, docs.g-vo.org) to add these tags. Wouldn't this just work? Or should I invert the mapping to make it go from bibcode to short name?
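
Inverting such a mapping on the client side is cheap in any case. The sketch below assumes, without having checked the actual file, that the JSON document maps each short name to either one bibcode or a list of bibcodes; the function name `invert_mapping` and the sample data are made up.

```python
import json


def invert_mapping(by_docname):
    """Turn docname -> bibcode(s) into bibcode -> docname.

    Values may be a single bibcode string or a list of bibcodes
    (this shape is an assumption about the JSON document).
    """
    by_bibcode = {}
    for docname, bibcodes in by_docname.items():
        if isinstance(bibcodes, str):
            bibcodes = [bibcodes]
        for bibcode in bibcodes:
            by_bibcode[bibcode] = docname
    return by_bibcode


# Made-up sample standing in for the downloaded JSON document.
sample = json.loads(
    '{"ExampleDoc": ["2099ivoa.spec.0101X", "2098ivoa.spec.0101X"],'
    ' "OtherDoc": "2097ivoa.spec.0303Z"}')
by_bibcode = invert_mapping(sample)
print(by_bibcode["2098ivoa.spec.0101X"])
```

A single download followed by this inversion would give the bibcode -> DOCNAME lookup without any per-document requests.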

pahjbo · 2025-11-12T14:17:35Z

> The reason that I'm lowercasing things is that the docrepo treats the paths case-insensitively, and hence there is no guarantee that there is consistent capitalisation. I'm now not lowercasing any more; could you have a close look at the result to see whether anything is wrong now?

that is better - there are still some names that have the full file name though. Before you spend too much more time on that, I did have a plan for altering my original script so that it only did the ADS call for new bibkeys

> > would still like to get this into the bib file as keys similarly to how I have done in this PR, as then some standard bibliography processing can know about it too.
>
> Yeah, the idea has been that rather than doing n requests to ADS you would in the future just do one request to the doc repo (or, until we've moved things there, docs.g-vo.org) to add these tags. Wouldn't this just work? Or should I invert the mapping to make it go from bibcode to short name?

If I alter the script so that the ADS call only happens for new bibkeys, then the storm of calls has already happened. I think that it is important not to overwrite the existing bib file indiscriminately, as we can then retain the ability to make manual adjustments that override cases where the heuristics in these scripts have come up with the wrong answers. As you are OK with the use of the bibtexparser package, that is feasible in the fetch_from_ads.py script, which can then do this all in one pass.
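
The merge policy described here (add only new entries, never touch existing ones, and report where ADS disagrees) can be sketched on plain dictionaries as below. This is not the code in fetch_from_ads.py and deliberately avoids the bibtexparser API; `merge_new_entries` and the sample data are hypothetical.

```python
def merge_new_entries(existing, fetched):
    """Merge ADS results into the existing bibliography.

    existing -- dict bibkey -> fields, as currently in docrepo.bib
    fetched  -- dict bibkey -> fields, as freshly returned by ADS
    Returns (merged, new_keys, differences). Existing entries are kept
    untouched so that manual corrections are not overwritten.
    """
    merged = dict(existing)
    new_keys = []
    differences = {}
    for key, fields in fetched.items():
        if key not in existing:
            merged[key] = fields
            new_keys.append(key)
            continue
        # Compare only fields ADS supplies; local extras (url, ids) are fine.
        changed = {
            name: (existing[key].get(name), value)
            for name, value in fields.items()
            if existing[key].get(name) != value
        }
        if changed:
            differences[key] = changed
    return merged, new_keys, differences


# Made-up data: one known entry with a manual edit, one new entry.
existing = {"2099ivoa.spec.0101X": {"title": "Old title", "url": "u"}}
fetched = {
    "2099ivoa.spec.0101X": {"title": "New title"},
    "2099ivoa.spec.0202Y": {"title": "Another"},
}
merged, new_keys, differences = merge_new_entries(existing, fetched)
print(new_keys, differences)
```

Only the keys in `new_keys` would then need the per-document URL lookup, while `differences` gives a human something to review instead of silently replacing local edits.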

msdemlei · 2025-11-12T15:38:21Z

On Wed, Nov 12, 2025 at 06:33:42AM -0800, Paul Harrison wrote:

> that is better - there are still some names that have the full file name though. Before you spend too much more time on that, I did have a plan for altering my original script so that it only did the ADS call for new bibkeys

Ok, that sounds reasonable.

> scripts have come up with the wrong answers. As you are OK with the use of the bibtexparser package that is feasible in the `fetch_from_ads.py` script which can then do this all in one pass.

Let's do it like this, then. I'll retain the code in docrepoToADS, though. It may become useful one of these days.

* only add new entries to docrepo.bib
* report on any differences between ADS and docrepo.bib

pahjbo · 2025-11-17T13:32:13Z

I have combined all the functionality into the one script now, and it will only do the full URL lookup on new entries

msdemlei

Looks like a reasonable addition -- thanks!

The one thing we need to take some stance on is that document rebuilds at a later time (think: errata applied) will yield different references when people do "cite most recent" citations. This doesn't feel too good to me, but not bad enough to reject what seems a rather reasonable thing to do.

msdemlei · 2025-11-18T10:27:03Z

On Mon, Nov 17, 2025 at 05:32:36AM -0800, Paul Harrison wrote:

> I have combined all the functionality into the one script now, and it will only do the full URL lookup on new entries

Thanks -- that's merged in now. Paul, would you document the new features in ivoatexDoc? I think just mentioning it in 3.3.1 would basically be enough, perhaps together with a few words on when people should cite version-sharp and when cite-latest is the right strategy. To mitigate my concerns about changing bibliographies on erratum rebuilds, an obvious solution would be to check out the version of ivoatex that was in use at the original REC build. Do you have thoughts on how one would sensibly do this? Ideally, we'd record this in the tag created; perhaps we should just include the ivoatex SHA in the make tag message?

pahjbo · 2025-11-19T08:46:14Z

> Thanks -- that's merged in now. Paul, would you document the new features in ivoatexDoc? I think just mentioning it in 3.3.1 would basically be enough, perhaps together with a few words on when people should cite version-sharp and when cite-latest is the right strategy.

I will update the document - however, I think that the cite-latest functionality will only work if biblatex is used rather than BibTeX and that has not yet been introduced into the build process - what I have done so far is only being exercised in https://github.com/ivoa/IvoaDocViewSite

> To mitigate my concerns about changing bibliographies on erratum rebuilds, an obvious solution would be to check out the version of ivoatex that was in use at the original REC build. Do you have thoughts on how one would sensibly do this? Ideally, we'd record this in the tag created; perhaps we should just include the ivoatex SHA in the make tag message?

that seems like a good idea anyway for reproducibility
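
Recording the ivoatex revision in a tag message might be sketched as follows. The directory name `ivoatex`, the message layout, and both function names are assumptions for illustration; how `make tag` would actually be changed is not settled in this thread.

```python
import subprocess


def ivoatex_sha(ivoatex_dir="ivoatex"):
    """Return the commit SHA currently checked out in ivoatex_dir.

    Assumes ivoatex_dir is a git checkout (for instance a submodule).
    """
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=ivoatex_dir, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def tag_message(docname, version, sha):
    """Build a tag message that records the ivoatex revision used."""
    return f"{docname} {version}\n\nivoatex: {sha}"


# With a real checkout one would pass ivoatex_sha(); a dummy SHA is used here.
print(tag_message("ExampleDoc", "1.0", "0123abc"))
```

For an erratum rebuild, the recorded SHA could then be checked out in the ivoatex directory before building, so that the bibliography matches the original REC build.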

pahjbo added 2 commits November 4, 2025 09:07

update the fetch script to add in the url to the IVOA documentation site

2ea0c6e

this is done to allow a more direct form of addressing than always having to go via ADS

infer the docName and the latest version

c8a8dea

this is done to allow some more automated processing of the bib entries, and possibly in the future if using biblatex for processing then using a cite alias that points to the most current version.

msdemlei requested changes Nov 5, 2025

combine the update functionality into one script

5d26570

* only add new entries to docrepo.bib
* report on any differences between ADS and docrepo.bib

msdemlei approved these changes Nov 18, 2025

msdemlei merged commit c5c7e22 into ivoa-std:master Nov 18, 2025
