-
Notifications
You must be signed in to change notification settings - Fork 4
Replace tile metadata files with layouts database #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: TASM-interface
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great! Feel free to use/ignore any of the style/naming comments, and then let me know if you have any questions about the transaction or prepared statement stuff.
I think we can also probably delete the TileConfiguration.proto file at this point.
tasm/tiles/include/LayoutDatabase.h
Outdated
|
|
||
| class LayoutDatabase { | ||
| public: | ||
| static std::shared_ptr<LayoutDatabase> &instance() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is a pointer, we don't need to return a reference. However, if we want to enforce that callers of this can't change it, we can add some const annotations.
Option 1: return a const reference const LayoutDatabase&
Option 2: return a shared_ptr to a constant object: std::shared_ptr<const LayoutDatabase>
Option 3: return a constant shared_ptr to a non-constant object: const std::shared_ptr<LayoutDatabase>
Option 4: return a constant shared_ptr to a constant object: const std::shared_ptr<const LayoutDatabase>
Option 4 means that a caller can't re-assign the pointer, and they also can't modify the pointed-to object.
I haven't been good about putting const in or around pointers in the rest of the project, so I don't feel super strongly about having to add them here.
tasm/tiles/include/LayoutDatabase.h
Outdated
| class LayoutDatabase { | ||
| public: | ||
| static std::shared_ptr<LayoutDatabase> &instance() { | ||
| if (instance_ == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be shortened to if (!instance_)
tasm/tiles/include/LayoutDatabase.h
Outdated
| public: | ||
| static std::shared_ptr<LayoutDatabase> &instance() { | ||
| if (instance_ == nullptr) { | ||
| instance_ = std::make_shared<LayoutDatabase>(LayoutDatabase()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not positive, but I think that the compiler may be able to figure out how to construct the LayoutDatabase based on the template, so you may be able to shorten this to std::make_shared<LayoutDatabase>();
| public: | ||
| TiledVideoManager(std::shared_ptr<TiledEntry> entry) | ||
| : entry_(entry), | ||
| layouts_(LayoutDatabase::instance()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nit and feel free to ignore, but layouts_ makes me think that it is some sort of container (because of the plural-ness). What about layoutDB_?
tasm/tiles/src/LayoutDatabase.cc
Outdated
| } | ||
|
|
||
| std::vector<int> LayoutDatabase::tileLayoutIdsForFrame(const std::string &video, unsigned int frameNumber) const { | ||
| std::string selectLayoutIds = "SELECT id FROM layouts WHERE video = ? AND ? BETWEEN first_frame AND last_frame"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are presumably going to be executing this query multiple times, it may make sense initialize it as a prepared statement, and then do bind/reset for each invocation. An example of this is in SemanticIndexSQLite::initializeStatements(), ::destroyStatements(), ::addMetadata()
(Ditto for the rest of the queries in this class)
tasm/tiles/src/LayoutDatabase.cc
Outdated
|
|
||
| std::vector<unsigned int> heights = heightsForId(video, ids.at(0)); | ||
| unsigned int totalHeight = std::accumulate(heights.begin(), heights.end(), 0); | ||
| for (unsigned int id : ids) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't verify that the height is the same for all IDs, then we could do this aggregation in the query: "SELECT sum(height) FROM heights WHERE video=? and id=?". On the other hand, it's nice to re-use the heightsForId function, despite it adding a bit to the complexity. It's up to you how you want to do this.
| return layouts_->locationOfTileForId(entry_, tileNumber, id); | ||
| } | ||
|
|
||
| } // namespace tasm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to see all of the deletions in this file!
| #include "Video.h" | ||
| #include <gtest/gtest.h> | ||
| #include "sqlite3.h" | ||
| #include "LayoutDatabase.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this import actually needed?
tasm-test/src/DatabaseTest.cc
Outdated
| unsigned int numColumns = 2; | ||
| unsigned int numRows = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could switch these to widths.size(), heights.size() like in the other tests.
tasm-test/src/DatabaseTest.cc
Outdated
|
|
||
| std::experimental::filesystem::path expectedLocation = ""; | ||
|
|
||
| std::experimental::filesystem::path tileLocation = layoutDatabase->locationOfTileForId(entry, tileNumber, id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this test actually checking? It doesn't look like expectedLocation and tileLocation are ever used.
Load resources from test-inputs in SetUp and remove resources in TearDown, ensuring each test begins at the same state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! The only thing is the issue with the videos, but for now I recommend just removing them from the commit, and then the next time we meet we can figure out whether a generic video like red10 would be a suitable replacement.
| auto config = EnvironmentConfiguration::instance(); | ||
| std::experimental::filesystem::remove_all(config.defaultLayoutDatabasePath()); | ||
| std::experimental::filesystem::remove_all(config.catalogPath()); | ||
| std::experimental::filesystem::remove_all(config.defaultLabelsDatabasePath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible future enhancement could be to move these copy/remove calls that are common among the test cases to utility functions that can be called by each test.
| TEST_F(TasmTestFixture, testScanBirdsFullFrame) { | ||
| tasm::TASM tasm(SemanticIndex::IndexType::XY, "/home/maureen/home_videos/birds_tasm.db"); | ||
| auto selection = tasm.selectFrames("birds-birds", "bird", 0, 5, "birds"); | ||
| auto selection = tasm.selectFrames("birdsincage-2x2", "bird", 0, 5, "birds"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about copyright issues, and I don't believe any of the non-red10 videos have a copyright that allows us to embed it in this repository. Would it be possible to switch all uses of birds/elfuente/any other to use "red10"? Since we don't care about the actual contents of the videos, hopefully all of the tests still work. Let me know if you want me to generate red10 videos with the same dimensions/frame rate/number of frames as these other videos!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's easiest for now, you can leave the tests as-is and just remove the videos from the commits.
tasm-test/src/LayoutBenchmarkTest.cc
Outdated
| // output data to csv file | ||
| printStat(outputFile, experiment, "total", videoName, frameNumber, iteration, totalDuration); | ||
| printStat(outputFile, experiment, "constructor", videoName, frameNumber, iteration, constructorDuration); | ||
| printStat(outputFile, experiment, videoName, "processing", frameNumber, iteration, functionDuration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should videoName and "processing" be switched?
tasm-test/src/LayoutBenchmarkTest.cc
Outdated
| std::string labels[NUM_VIDEOS] VIDEO_LABELS; | ||
| std::string names[NUM_VIDEOS] VIDEO_NAMES; | ||
| int numFrames[NUM_VIDEOS] NUM_FRAMES; | ||
| int ids[NUM_VIDEOS] IDS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could define all of these constants as their actual types at the top of the file and just use them here.
tasm-test/src/LayoutBenchmarkTest.cc
Outdated
| #define VIDEO_LABELS {"bird", "car"} | ||
| #define VIDEO_NAMES {"birdsincage", "traffic-2k-001_yolo"} | ||
| #define NUM_FRAMES {180, 27000} | ||
| #define IDS {197, 8476} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to add a comment specifying how we picked these IDs if it's important so future us knows how to update them if we modify the tests.
| int id = ids[i]; | ||
|
|
||
| std::experimental::filesystem::remove(config.defaultLayoutDatabasePath()); | ||
| LayoutDatabase::instance()->open(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the only two lines that are specific to the layout database instance?
| void printStat(std::ofstream &file, std::string experiment, std::string step, std::string video, int frame, | ||
| int iteration, long time) { | ||
| file << IMPLEMENTATION << SEP << experiment << SEP << step << SEP << video << SEP << frame << SEP << iteration << SEP << time << "\n"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to also flush the ofstream so that even if something fails and the tests don't finish, we should hopefully have partial results in the output file.
tasm-test/src/LayoutBenchmarkTest.cc
Outdated
| #define IDS {197, 8476} | ||
| #define NUM_VIDEOS 2 | ||
| #define NUM_ITERATIONS 3 | ||
| #define FUNCTION_REPETITIONS 50 // for functions that are too fast to measure without repetition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should also include FUNCTION_REPETITIONS in the output file? (With a value of 1 for the non-repeated ones) My thought is that it's possible this could change over time, so it could be useful to know for a specific set of results what its value was.
|
Benchmarking showed that this implementation makes the constructor of TiledVideoManager much faster and less impacted by video size than the metadata file version. However, tileLayoutForId, locationOfTileForId, and tileLayoutIdsForFrame are slower with this version. A high (~5,000 to 10,000) number of these queries would outweigh the benefit of the faster constructor. Video storage performance is similar for both implementations. |
Instead of traversing entire resource file system to load tile layouts, store tile layouts in a single database.
LayoutDatabase handles updates to the database, and operations that dealt with the metadata files have been replaced with calls to the corresponding LayoutDatabase functions.