-
Notifications
You must be signed in to change notification settings - Fork 8
WIP: New hashing schema #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@abulte I would need a review here. I left several comments in the code after implementing the double cache feature. A work around would be to change the endpoint: This would work but would it make the URL hash useless and not relevant? |
abulte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About upload view: I think we can rely on file hash only in this case, and store a url hash = None.
| else: | ||
| raise RuntimeError('Func get_db_info need at least one not none argument') | ||
|
|
||
| res = c.fetchone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code below can probably be made cleaner/shorter by getting column values in a dict instead of a list eg res[‘uuid’].
| logger=app.logger, | ||
| sniff_limit=app.config.get('CSV_SNIFF_LIMIT'), | ||
| max_file_size=app.config.get('MAX_FILE_SIZE') | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
abulte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests :-)
| encoding = detect_encoding(filepath) if not encoding else encoding | ||
| table = from_csv(filepath, encoding=encoding, sniff_limit=sniff_limit) | ||
| return to_sql(table, urlhash, storage) | ||
| return to_sql(table, urlhash, filehash, storage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing line at EOF
| filehash = X.hexdigest() | ||
| logger.debug('* Downloaded %s', filehash) | ||
| if not is_hash_relevant(urlhash, filehash): | ||
| print("HASH IS NOT RELEVANT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
| except Exception as e: | ||
| raise APIError('Error parsing CSV: %s' % e) | ||
| else: | ||
| print("HASH IS RELEVANT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
| raise APIError('Error parsing CSV: %s' % e) | ||
| else: | ||
| app.logger.info(f"{urlhash}.db already exists, skipping parse.") | ||
| print("AGE IS OK") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
No description provided.