Docrepobib #167
this is done to allow a more direct form of addressing than always having to go via ADS
this is done to allow some more automated processing of the bib entries, and possibly in the future if using biblatex for processing then using a cite alias that points to the most current version.
msdemlei
left a comment
I think this is a good idea; thanks for contributing this.
I was a bit skeptical about introducing an extra dependency, bibtexparser, but it's in Debian, and it's only necessary for people running make docrepo.bib occasionally, and that's people who shouldn't be concerned about another installed package. So, for me that's fine.
However, I'd request a few changes.
(1) A minor one is to be PEP8-compliant with function arguments, i.e., write func(a, b) rather than func(a,b).
(2) Then I'd prefer if the functions and functionality from annotate_docrepo_bib.py were merged into fetch_from_ads.py. I'm trying to keep the number of files in ivoatex low, and the annotation has to reliably run after downloading anyway, so I don't think there is much to be won by the split.
(3) The thing I like least, though, is the request orgy at the end of the new fetch_from_ads.py. Running one request per IVOA document to obtain the document URL seems too high a price to pay. So, can you perhaps check the ADS API docs to see if we can get the document URL with the ADS-provided BibTeX, and if not, contact them and ask if they'd put in a hack for us to let us retrieve BibTeX with URLs to begin with?
(4) Also, I don't think we should check all URLs all the time; attempting an update of the bibliography should be quick and lightweight. I'd not argue against a flag that enables link checking, to be run perhaps once a year, but it shouldn't be the default behaviour.
Thanks!
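As an illustration of point (4), an opt-in link check could look roughly like the sketch below. The flag name `--check-links`, the helper names, and the representation of entries as plain dictionaries are all assumptions made for this example; none of them are taken from the actual fetch_from_ads.py.

```python
import argparse
import urllib.request


def parse_args(argv=None):
    """Parse the command line; link checking is off unless requested."""
    parser = argparse.ArgumentParser(
        description="Update docrepo.bib (illustrative sketch only).")
    # Hypothetical flag: the default run stays quick and lightweight.
    parser.add_argument(
        "--check-links", action="store_true",
        help="also verify that every url field resolves (slow)")
    return parser.parse_args(argv)


def url_is_alive(url, timeout=10):
    """Return True if a HEAD request to url succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


def broken_links(entries, probe=url_is_alive):
    """Return the keys of all entries whose url field fails the probe.

    entries maps a bib key to a dict of fields; entries without a url
    field are skipped.
    """
    return [key for key, fields in entries.items()
            if "url" in fields and not probe(fields["url"])]
```

With this layout the update path never touches the network for link checking unless `parse_args().check_links` is true, and `broken_links` can be exercised offline by passing a stub as `probe`.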
One of the reasons for the split was related to the points that you make below, and now I might be tempted to do the opposite and put the URL discovery into
I did have a fairly extensive look at their API documentation and did not see any better way to do this. However, if you think about the bigger picture, this script is run infrequently, and what it does is reduce the number of times that effectively the same API call is made by the hopefully much more frequent occurrence of someone currently having to go via the ADS landing page when clicking on the only reference that they have from the IVOA document. As far as I can tell, the request orgy will not become a problem until there are 5000 IVOA documents.
I will make some changes, but probably not until after the Interop.

What I want out of this whole exercise is a mapping between bibcode -> DOCNAME (as defined in the top level Makefile of a document prepared with ivoatex).
On Tue, Nov 11, 2025 at 04:46:04AM -0800, Paul Harrison wrote:
pahjbo left a comment (ivoa-std/ivoatex#167)
> What I want out of this whole exercise is a mapping between bibcode
> -> DOCNAME (as defined in the top level Makefile of a document
> prepared with ivoatex)
The stuff that does this is in
https://github.com/ivoa/docrepoToADS
In there, it would be easy to build the mapping you are looking for,
and it could be either pushed into ivoatex or, probably preferably,
just be put to some bespoke URI in the docrepo.
Actually, I've just implemented that and for testing, put the
resulting document to <https://docs.g-vo.org/bibcode_mapping.json>.
Can you work with that? We'd still have to establish a workflow to
keep this document up to date in the docrepo (but perhaps that will
be easily done by the new docrepo infrastructure that Kalyani is
building), but I feel this is doable, in particular because there are
globally only two people who run docrepoToADS (perhaps regrettably).
that is essentially the mapping - though I would really like the docname case to match what is used in both the GitHub project name and the part of the URL where the standard is published to. And I would still like to get this into the bib file as keys similarly to how I have done in this PR, as then some standard bibliography processing can know about it too.
On Wed, Nov 12, 2025 at 04:21:36AM -0800, Paul Harrison wrote:
pahjbo left a comment (ivoa-std/ivoatex#167)
> that is essentially the mapping - though I would really like the
> docname case to match what is used in both the GitHub project name
> and the part of the URL where the standard is published to. And I
The reason that I'm lowercasing things is that the docrepo treats the
paths case-insensitively, and hence there is no guarantee that there
is consistent capitalisation. I'm now not lowercasing any more;
could you have a close look at the result to see whether anything is
wrong now?
> would still like to get this into the bib file as keys similarly to
> how I have done in this PR, as then some standard bibliography
> processing can know about it too.
Yeah, the idea has been that rather than doing n requests to ADS you
would in the future just do one request to the doc repo (or, until
we've moved things there, doc.g-vo.org) to add these tags. Wouldn't
this just work? Or should I invert the mapping to make it go from
bibcode to short name?
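For reference, inverting such a mapping on the client side is cheap, so either direction would probably do. The sketch below assumes the JSON document is a flat object from short name to bibcode; the actual layout of bibcode_mapping.json is not described in this thread, so treat that structure and the function name as assumptions.

```python
def invert_mapping(name_to_bibcode):
    """Turn a short name -> bibcode mapping into bibcode -> short name.

    Raises ValueError if two short names share a bibcode, since the
    inverse would then be ambiguous.
    """
    inverted = {}
    for name, bibcode in name_to_bibcode.items():
        if bibcode in inverted:
            raise ValueError(
                f"bibcode {bibcode} maps to both "
                f"{inverted[bibcode]} and {name}")
        inverted[bibcode] = name
    return inverted


# The SODA bibcode is the one mentioned in the PR description;
# the flat structure of the mapping is an assumption.
by_bibcode = invert_mapping({"SODA": "2017ivoa.spec.0517B"})
```

The duplicate check matters here because the docrepo treats paths case-insensitively, so two spellings of one short name could otherwise silently overwrite each other.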
that is better - there are still some names that have the full file name though. Before you spend too much more time on that, I did have a plan for altering my original script so that it only did the ADS call for new bibkeys.
If I alter the script so that the ADS call only happens for new bibkeys then the storm of calls has already happened. I think that it is important not to completely overwrite the existing bib file indiscriminately, as we can then retain the ability to make manual adjustments to override where some of the heuristics in these scripts have come up with the wrong answers. As you are OK with the use of the bibtexparser package, that is feasible in the `fetch_from_ads.py` script, which can then do this all in one pass.
On Wed, Nov 12, 2025 at 06:33:42AM -0800, Paul Harrison wrote:
pahjbo left a comment (ivoa-std/ivoatex#167)
> that is better - there are still some names that have the full file
> name though. Before you spend too much more time on that, I did
> have a plan for altering my original script so that it only did the
> ADS call for new bibkeys
Ok, that sounds reasonable.
> scripts have come up with the wrong answers. As you are OK with the
> use of the bibtexparser package that is feasible in the
> `fetch_from_ads.py` script which can then do this all in one pass.
Let's do it like this, then. I'll retain the code in docrepoToADS,
though. It may become useful one of these days.
* only add new entries to docrepo.bib
* report on any differences between ADS and docrepo.bib

I have combined all the functionality into the one script now, and it will only do the full URL lookup on new entries.
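The merge behaviour described here (add only new entries, keep existing ones untouched, report differences) can be sketched as follows. This is not the code in fetch_from_ads.py: entries are modelled as plain dictionaries rather than bibtexparser objects, and the function name is made up for the illustration.

```python
def merge_entries(existing, fetched):
    """Merge freshly fetched entries into the existing bibliography.

    Both arguments map a bib key to a dict of fields. Existing entries
    are never modified, so manual corrections survive. Returns a tuple
    (merged, added, differences), where differences maps a key to
    {field: (existing_value, fetched_value)} for every field on which
    the two sources disagree.
    """
    merged = dict(existing)
    added = []
    differences = {}
    for key, new_fields in fetched.items():
        if key not in existing:
            # New entry: only these would need the extra URL lookup.
            merged[key] = dict(new_fields)
            added.append(key)
            continue
        old_fields = existing[key]
        changed = {
            name: (old_fields.get(name), value)
            for name, value in new_fields.items()
            if old_fields.get(name) != value
        }
        if changed:
            differences[key] = changed
    return merged, added, differences
```

Reporting differences instead of applying them leaves the decision with the maintainer, which is what keeps manual overrides of the heuristics safe.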
msdemlei
left a comment
Looks like a reasonable addition -- thanks!
The one thing we need to take some stance on is that document rebuilds at a later time (think: errata applied) will yield different references when people do "cite most recent" citations. This doesn't feel too good to me, but not bad enough to reject what seems a rather reasonable thing to do.
On Mon, Nov 17, 2025 at 05:32:36AM -0800, Paul Harrison wrote:
pahjbo left a comment (ivoa-std/ivoatex#167)
> I have combined all the functionality into the one script now, and
> it will only do the full URL lookup on new entries
Thanks -- that's merged in now. Paul, would you document the new
features in ivoatexDoc?
I think just mentioning it in 3.3.1 would basically be enough,
perhaps together with a few words on when people should cite
version-sharp and when cite-latest is the right strategy.
To mitigate my concerns about changing bibliographies on erratum
rebuilds, an obvious solution would be to check out the version of
ivoatex that was in use at the original REC build. Do you have
thoughts on how one would sensibly do this? Ideally, we'd record
this in the tag created; perhaps we should just include the ivoatex
SHA in the make tag message?
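One way to record the SHA could look like the sketch below. The `ivoatex-sha:` line, the message layout, and the function names are purely hypothetical; nothing like this exists in the ivoatex Makefile as far as this thread shows.

```python
import subprocess


def current_sha(repo_dir):
    """Return the commit SHA currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()


def tag_message(docname, version, ivoatex_sha):
    """Build a tag message that records the ivoatex revision used."""
    # Hypothetical layout: a summary line, a blank line, then the SHA.
    return f"{docname} {version}\n\nivoatex-sha: {ivoatex_sha}"


def sha_from_message(message):
    """Recover the recorded ivoatex SHA from a tag message, or None."""
    for line in message.splitlines():
        if line.startswith("ivoatex-sha:"):
            return line.split(":", 1)[1].strip()
    return None
```

An erratum rebuild could then read the message of the original tag, call `sha_from_message`, and check out that revision of ivoatex before building.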
I will update the document - however, I think that the cite-latest functionality will only work if biblatex is used rather than BibTeX and that has not yet been introduced into the build process - what I have done so far is only being exercised in https://github.com/ivoa/IvoaDocViewSite
That seems like a good idea anyway for reproducibility.
This update adds some more tools for managing the docrepo.bib file, and to some extent it goes in a different direction from that implied by `suggest-bibupgrade.py`.

* Firstly, `fetch_from_ads.py` adds a URL field that points to the IVOA document landing page. This is done with the aim of reducing the number of clicks needed to follow references; currently it often takes 4 to get from the text of one standard to the text of another. It also exposes a problem with `2017ivoa.spec.0517B`: the SODA landing page is missing.
* Secondly, `annotate_docrepo_bib.py` adds version keys and an `ids` field that biblatex can use as a citation key alias. This would allow pointing at the latest version by using `ivoadoc:shortname` as the citation key.

I don't think that this breaks anything current, as it is just adding more information to docrepo.bib.
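To make the alias idea concrete, the sketch below adds an `ivoadoc:shortname` alias to the `ids` field of one entry. biblatex reads `ids` as a comma-separated list of citation key aliases; the function name and the dictionary representation of an entry are assumptions for this example and do not reproduce annotate_docrepo_bib.py.

```python
def add_latest_alias(fields, shortname):
    """Return a copy of fields whose ids list includes ivoadoc:<shortname>.

    fields is a dict of BibTeX field names to string values. Aliases
    already present in ids are preserved, and the new alias is not
    added twice.
    """
    alias = f"ivoadoc:{shortname}"
    aliases = [item.strip()
               for item in fields.get("ids", "").split(",")
               if item.strip()]
    if alias not in aliases:
        aliases.append(alias)
    updated = dict(fields)
    updated["ids"] = ", ".join(aliases)
    return updated
```

Only the entry for the most recent version of a document should carry the alias, so a caller would have to select that entry first; that selection is not shown here.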