Description
Web resources are currently downloaded to rpath, which is constructed by combining a unique id (if requested) with the file name extracted from the URL. However, some URLs don't include a file name, e.g.
src = 'https://pubchem.ncbi.nlm.nih.gov/sdq/sdqagent.cgi?infmt=json&outfmt=csv&query={%22download%22:%22*%22,%22collection%22:%22pathway%22,%22order%22:[%22relevancescore,desc%22],%22start%22:1,%22limit%22:10000000,%22downloadfilename%22:%22PubChem_pathway_text_Reactome%22,%22where%22:{%22ands%22:[{%22*%22:%22Reactome%22},{%22source%22:%22Reactome%22}]}}'
In this case the URL contains JSON, so I think the download fails because the file name generated for rpath isn't valid. More generally, any URL that returns a file but doesn't end in a file name could end up with an unwieldy file name in the cache folder.
I tried to work around this by using bfcupdate to change rpath before downloading, but that fails because bfcupdate changes the rtype to "local".
One option would be to add an argument to bfcadd that lets the user override the default file name for rpath, e.g. rpath_filename = "new_filename.xyz", and construct rpath from that instead of trying to extract it from the URL.
Alternatively, you could try to extract the intended file name from the httr::GET response, if there is one.
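To illustrate that second option, here is a minimal sketch of how a file name might be pulled from a response's Content-Disposition header. The helper name filename_from_disposition is hypothetical and not part of BiocFileCache or httr, and I haven't confirmed that the PubChem endpoint above actually sends this header, so treat it as an assumption rather than a working fix.

```r
# Hypothetical helper (not part of BiocFileCache or httr): extract the
# file name from a Content-Disposition header value, or NA if none is found.
filename_from_disposition <- function(header) {
  # A missing header arrives as NULL when indexing the header list.
  if (is.null(header) || length(header) != 1L || is.na(header)) {
    return(NA_character_)
  }
  # Match filename="name" or filename=name (up to the next quote or semicolon).
  # The extended form filename*=... is deliberately not handled here.
  pattern <- 'filename[[:space:]]*=[[:space:]]*"?([^";]+)"?'
  m <- regmatches(header, regexec(pattern, header, ignore.case = TRUE))[[1]]
  if (length(m) < 2L) {
    return(NA_character_)
  }
  # basename() drops any directory part a server might include.
  basename(trimws(m[2]))
}

# Possible usage, assuming the server answers HEAD requests with the header:
# resp <- httr::HEAD(src)
# filename_from_disposition(httr::headers(resp)[["content-disposition"]])
```

If the header is absent the helper returns NA, so a caller would still need a fallback such as a user-supplied file name.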
Is there a workaround for this that doesn't need an update to BiocFileCache?