@pahjbo
Member

@pahjbo pahjbo commented Nov 4, 2025

This update adds some more tools for managing the docrepo.bib file, and to some extent it is going in a different direction to that implied by suggest-bibupgrade.py.

  • firstly, fetch_from_ads.py adds a URL field that points to the IVOA document landing page. This is done with the aim of potentially reducing the number of clicks needed to follow references: currently it often takes 4 to get from the text of one standard to the text of another. It also exposes a problem with 2017ivoa.spec.0517B: the SODA landing page is missing.

  • secondly, annotate_docrepo_bib.py adds version keys and an ids field that biblatex can use as a citation key alias. This would make it possible to express the intention of pointing at the latest version by using ivoadoc:shortname as the citation key.
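
As a rough illustration of the second point, the following sketch gives the newest entry of each document an ids alias of the form ivoadoc:shortname. It works on plain dicts rather than on parsed BibTeX, and the field names docname and version, the sample keys, and both helper functions are assumptions made for this sketch, not the code in this PR.

```python
from collections import defaultdict


def version_key(version):
    """Turn a version string such as '1.1' into a sortable tuple (1, 1)."""
    return tuple(int(part) for part in version.split("."))


def add_latest_alias(entries):
    """Give the newest entry of each document an 'ids' alias.

    Each entry is a dict with 'ID', 'docname' and 'version' keys; the
    last two are hypothetical field names chosen for this sketch.
    """
    by_doc = defaultdict(list)
    for entry in entries:
        by_doc[entry["docname"]].append(entry)
    for docname, versions in by_doc.items():
        # Numeric comparison, so that 1.10 sorts after 1.9.
        latest = max(versions, key=lambda e: version_key(e["version"]))
        latest["ids"] = "ivoadoc:" + docname
    return entries


# Made-up entries, only to show the effect.
entries = [
    {"ID": "example-key-v1.0", "docname": "ExampleDoc", "version": "1.0"},
    {"ID": "example-key-v1.1", "docname": "ExampleDoc", "version": "1.1"},
]
add_latest_alias(entries)
```

Only the 1.1 entry ends up with an ids field, so a citation of ivoadoc:ExampleDoc would resolve to it when biblatex processes the file.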

I don't think that this breaks anything current, as it is just adding more information to docrepo.bib

this is done to allow a more direct form of addressing than always having to go via ADS
this is done to allow some more automated processing of the bib entries, and possibly in the future if using biblatex for processing then using a cite alias that points to the most current version.
Collaborator

@msdemlei msdemlei left a comment

I think this is a good idea; thanks for contributing this.

I was a bit skeptical about introducing an extra dependency, bibtexparser, but it's in Debian, and it's only necessary for people running make docrepo.bib occasionally, and that's people who shouldn't be concerned about another installed package. So, for me that's fine.

However, I'd request a few changes.

(1) A minor one is to be PEP8-compliant with function arguments, i.e., write func(a, b) rather than func(a,b).

(2) Then I'd prefer if the functions and functionality from annotate_docrepo_bib.py were merged into fetch_from_ads.py. I'm trying to keep the number of files in ivoatex low, and the annotation has to reliably run after downloading anyway, so I don't think there is much to be won by the splitup.

(3) The thing I like least, though, is the request orgy at the end of the new fetch_from_ads.py. Running one request per IVOA document to obtain the document URL seems too high a price to pay. So, can you perhaps check the ADS API docs to see if we can get the document URL with the ADS-provided BibTeX, and if not, contact them and ask if they'd put in a hack for us to let us retrieve BibTeX with URLs to begin with?

(4) Also, I don't think we should check all URLs all the time; attempting an update of the bibliography should be quick and lightweight. I'd not argue against a flag that enables link checking, to be run perhaps once a year, but it shouldn't be the default behaviour.
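
An opt-in flag of the kind described in (4) could look roughly like the sketch below. The flag name --check-links and the parse_args helper are assumptions made for illustration; they are not taken from the script in this PR.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; link checking is off unless requested."""
    parser = argparse.ArgumentParser(
        description="Update docrepo.bib from ADS (sketch).")
    parser.add_argument(
        "--check-links", action="store_true",
        help="also verify every URL in the bib file (slow, run rarely)")
    return parser.parse_args(argv)


# A default run stays lightweight; the yearly check passes the flag.
default_run = parse_args([])
yearly_run = parse_args(["--check-links"])
```

With store_true the attribute check_links defaults to False, so an ordinary update never triggers the link check.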

Thanks!

@pahjbo
Member Author

pahjbo commented Nov 10, 2025

> (2) Then I'd prefer if the functions and functionality from annotate_docrepo_bib.py were merged into fetch_from_ads.py. I'm trying to keep the number of files in ivoatex low, and the annotation has to reliably run after downloading anyway, so I don't think there is much to be won by the splitup.

One of the reasons for the split was related to the points that you make below, and now I might be tempted to do the opposite and put the URL discovery into annotate_docrepo_bib.py, given that docrepo.bib from this PR already contains most of these URLs. I think that the ideal workflow might be to

  • run what would be the original fetch_from_ads.py
  • run git mergetool on docrepo.bib and selectively edit in any new entries, but leave existing ones with the extra annotation (an alternative would be to do the parsing of the ADS response with bibtexparser and do the merge automatically - I slightly worry that might miss other important updates though)
  • then run a modified annotate_docrepo_bib.py that accepted bibkeys to annotate as arguments.

> (3) The thing I like least, though, is the request orgy at the end of the new fetch_from_ads.py. Running one request per IVOA document to obtain the document URL seems too high a price to pay. So, can you perhaps check the ADS API docs to see if we can get the document URL with the ADS-provided BibTeX, and if not, contact them and ask if they'd put in a hack for us to let us retrieve BibTeX with URLs to begin with?

I did have a fairly extensive look at their API documentation and did not see any better way to do this. However, think about the bigger picture: this script is run infrequently, and it replaces what is effectively the same lookup being made, hopefully much more frequently, every time someone has to go via the ADS landing page after clicking on the only reference that they have from the IVOA document.

As far as I can tell the request orgy will not become a problem until there are 5000 IVOA documents.

> (4) Also, I don't think we should check all URLs all the time; attempting an update of the bibliography should be quick and lightweight. I'd not argue against a flag that enables link checking, to be run perhaps once a year, but it shouldn't be the default behaviour.

@pahjbo
Member Author

pahjbo commented Nov 10, 2025

I will make some changes, but probably not until after the Interop

@pahjbo
Member Author

pahjbo commented Nov 11, 2025

What I want out of this whole exercise is a mapping from bibcode to DOCNAME (as defined in the top level Makefile of a document prepared with ivoatex).

@msdemlei
Collaborator

msdemlei commented Nov 11, 2025 via email

@pahjbo
Member Author

pahjbo commented Nov 12, 2025

> Actually, I've just implemented that and for testing, put the resulting document to https://docs.g-vo.org/bibcode_mapping.json. Can you work with that? We'd still have to establish a workflow to keep this document up to date in the docrepo (but perhaps that will be easily done by the new docrepo infrastructure that Kalyani is building), but I feel this is doable, in particular because there are globally only two people who run docrepoToADS (perhaps regrettably).

that is essentially the mapping - though I would really like the case of the docname to match what is used in both the GitHub project name and the part of the URL under which the standard is published. And I would still like to get this into the bib file as keys, similarly to how I have done in this PR, as then some standard bibliography processing can know about it too.
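
To make the "into the bib file as keys" idea concrete, here is a minimal sketch that copies a document name from such a mapping into bib entries. It assumes the JSON is a flat object from bibcode to document name, that citation keys are bibcodes, and that the extra field is called docname; the actual layout of bibcode_mapping.json is not shown in this thread, so all three are assumptions.

```python
import json


def annotate_with_docname(entries, mapping):
    """Add a 'docname' field to every entry whose key is in mapping.

    Entries that already carry a 'docname' are left alone, so manual
    corrections in the bib file survive a re-run.
    """
    for entry in entries:
        docname = mapping.get(entry["ID"])
        if docname is not None:
            entry.setdefault("docname", docname)
    return entries


# Illustrative mapping content only; the SODA bibcode is the one
# mentioned at the top of this PR.
mapping = json.loads('{"2017ivoa.spec.0517B": "SODA"}')
entries = [{"ID": "2017ivoa.spec.0517B"}, {"ID": "unmapped-key"}]
annotate_with_docname(entries, mapping)
```

Using setdefault rather than plain assignment is a deliberate choice here: an entry that was fixed by hand keeps its value.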

@msdemlei
Collaborator

msdemlei commented Nov 12, 2025 via email

@pahjbo
Member Author

pahjbo commented Nov 12, 2025

> The reason that I'm lowercasing things is that the docrepo treats the paths case-insensitively, and hence there is no guarantee that there is consistent capitalisation. I'm now not lowercasing any more; could you have a close look at the result to see whether anything is wrong now?

that is better - there are still some names that have the full file name though. Before you spend too much more time on that, I did have a plan for altering my original script so that it only did the ADS call for new bibkeys

> > would still like to get this into the bib file as keys similarly to how I have done in this PR, as then some standard bibliography processing can know about it too.
>
> Yeah, the idea has been that rather than doing n requests to ADS you would in the future just do one request to the doc repo (or, until we've moved things there, doc.g-vo.org) to add these tags. Wouldn't this just work? Or should I invert the mapping to make it go from bibcode to short name?

If I alter the script so that the ADS call only happens for new bibkeys, then the storm of calls has already happened. I think that it is important not to overwrite the existing bib file indiscriminately, as we then retain the ability to make manual adjustments that override cases where some of the heuristics in these scripts have come up with the wrong answers. As you are OK with the use of the bibtexparser package, that is feasible in the fetch_from_ads.py script, which can then do this all in one pass.

@msdemlei
Collaborator

msdemlei commented Nov 12, 2025 via email

* only add new entries to docrepo.bib
* report on any differences between ADS and docrepo.bib
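
The two points above could be sketched as follows. This is not the code from the PR: it works on plain dicts mapping citation key to fields (in the real script these would presumably come from bibtexparser), and merge_bib and the sample keys are made up for illustration.

```python
def merge_bib(existing, fetched):
    """Merge freshly fetched entries into the existing ones.

    Both arguments map citation key -> dict of fields. Existing entries
    are never overwritten; differences are only reported.
    """
    merged = dict(existing)
    added, differing = [], []
    for key, new_fields in fetched.items():
        if key not in existing:
            merged[key] = new_fields
            added.append(key)
            continue
        old_fields = existing[key]
        # Compare only the fields the fetch supplies, so that local
        # annotations in the existing entry do not count as differences.
        changed = sorted(
            name for name, value in new_fields.items()
            if old_fields.get(name) != value)
        if changed:
            differing.append((key, changed))
    return merged, added, differing


# Made-up data: key-a exists with a local URL annotation, key-b is new.
existing = {"key-a": {"title": "Old title", "url": "https://example.org/a"}}
fetched = {"key-a": {"title": "New title"}, "key-b": {"title": "B"}}
merged, added, differing = merge_bib(existing, fetched)
```

Here key-b is added, key-a keeps its old title and its url field, and the title difference is reported for a human to resolve.
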
@pahjbo
Member Author

pahjbo commented Nov 17, 2025

I have combined all the functionality into the one script now, and it will only do the full URL lookup on new entries

Collaborator

@msdemlei msdemlei left a comment

Looks like a reasonable addition -- thanks!

The one thing we need to take some stance on is that document rebuilds at a later time (think: errata applied) will yield different references when people do "cite most recent" citations. This doesn't feel too good to me, but not bad enough to reject what seems a rather reasonable thing to do.

@msdemlei msdemlei merged commit c5c7e22 into ivoa-std:master Nov 18, 2025
@msdemlei
Collaborator

msdemlei commented Nov 18, 2025 via email

@pahjbo
Member Author

pahjbo commented Nov 19, 2025

> Thanks -- that's merged in now. Paul, would you document the new features in ivoatexDoc? I think just mentioning it in 3.3.1 would basically be enough, perhaps together with a few words on when people should cite version-sharp and when cite-latest is the right strategy.

I will update the document - however, I think that the cite-latest functionality will only work if biblatex is used rather than BibTeX and that has not yet been introduced into the build process - what I have done so far is only being exercised in https://github.com/ivoa/IvoaDocViewSite

> To mitigate my concerns about changing bibliographies on erratum rebuilds, an obvious solution would be to check out the version of ivoatex that was in use at the original REC build. Do you have thoughts on how one would sensibly do this? Ideally, we'd record this in the tag created; perhaps we should just include the ivoatex SHA in the make tag message?

that seems like a good idea anyway for reproducibility
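
One way to record the ivoatex revision could look like the sketch below. It assumes ivoatex is checked out as a git repository in a subdirectory named ivoatex; the helper names and the wording of the message are invented for illustration and do not reflect how make tag currently works.

```python
import subprocess


def ivoatex_sha(ivoatex_dir="ivoatex"):
    """Return the commit SHA currently checked out in ivoatex_dir."""
    result = subprocess.run(
        ["git", "-C", ivoatex_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()


def tag_message(docname, version, sha):
    """Compose a tag message that records the ivoatex revision."""
    return f"{docname} {version} (built with ivoatex {sha})"


# The SHA is passed in explicitly here so the example needs no repository.
message = tag_message("ExampleDoc", "1.0", "abc1234")
```

For an erratum rebuild, the recorded SHA could then be checked out in the ivoatex directory before building.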
