
Measure the time of update operations separately #125

@szarnyasg

Description


To help analyze the results, it would be interesting to measure the time spent on update operations separately. E.g., for Umbra this entails measuring the time spent on 22 operations: 14 inserts and 8 deletes:

```python
insert_nodes = ["Comment", "Forum", "Person", "Post"]
insert_edges = ["Comment_hasTag_Tag", "Forum_hasMember_Person", "Forum_hasTag_Tag", "Person_hasInterest_Tag", "Person_knows_Person", "Person_likes_Comment", "Person_likes_Post", "Person_studyAt_University", "Person_workAt_Company", "Post_hasTag_Tag"]

delete_nodes = ["Comment", "Post", "Forum", "Person"]
delete_edges = ["Forum_hasMember_Person", "Person_knows_Person", "Person_likes_Comment", "Person_likes_Post"]
```
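As a sanity check on the counts above, and as a minimal sketch of how per-operation timing could be captured (the `timed` helper and its `operation` argument are hypothetical names, not part of the benchmark driver):

```python
import time

# The 14 insert and 8 delete operations listed above.
insert_nodes = ["Comment", "Forum", "Person", "Post"]
insert_edges = ["Comment_hasTag_Tag", "Forum_hasMember_Person", "Forum_hasTag_Tag",
                "Person_hasInterest_Tag", "Person_knows_Person", "Person_likes_Comment",
                "Person_likes_Post", "Person_studyAt_University", "Person_workAt_Company",
                "Post_hasTag_Tag"]
delete_nodes = ["Comment", "Post", "Forum", "Person"]
delete_edges = ["Forum_hasMember_Person", "Person_knows_Person",
                "Person_likes_Comment", "Person_likes_Post"]

def timed(operation, *args):
    # Run one update operation and return its wall-clock duration in seconds.
    start = time.perf_counter()
    operation(*args)
    return time.perf_counter() - start
```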

It's rather impractical to save the parameters of the entities in the timings file (it would grow massive with billions of operations on SF10,000), so a good schema for storing the results in timings.csv would be:

```
tool|sf|day|batch_type|q|parameters|time
Umbra|10|2012-12-26|power|insert|Person_knows_Person|123.45
Umbra|10|2012-12-26|power|delete|Forum|67.89
```
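A minimal sketch of appending one row in this pipe-delimited schema; the `append_timing` function name is hypothetical, and times are formatted to two decimals as in the example rows:

```python
import csv

def append_timing(path, tool, sf, day, batch_type, q, parameters, seconds):
    # Append one row in the proposed tool|sf|day|batch_type|q|parameters|time schema.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow([tool, sf, day, batch_type, q, parameters, f"{seconds:.2f}"])
```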

This would allow us to see the decomposition of the writes (inserts, deletes, precomputations).
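The decomposition could then be computed directly from timings.csv, e.g. by summing the `time` column grouped by the `q` column. A minimal sketch, assuming the schema line is present as a header row (the `decompose` name is hypothetical):

```python
import csv
from collections import defaultdict

def decompose(path):
    # Sum the time column per operation kind (the q column: insert, delete, ...).
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="|"):
            totals[row["q"]] += float(row["time"])
    return dict(totals)
```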

Metadata


Labels: enhancement (New feature or request)
