Skip to content

Conversation

@kirstenmg
Copy link
Collaborator

Instead of traversing entire resource file system to load tile layouts, store tile layouts in a single database.
LayoutDatabase handles updates to the database, and operations that dealt with the metadata files have been replaced with calls to the corresponding LayoutDatabase functions.

@kirstenmg kirstenmg requested a review from maureendaum March 1, 2021 03:52
Copy link
Collaborator

@maureendaum maureendaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great! Feel free to use/ignore any of the style/naming comments, and then let me know if you have any questions about the transaction or prepared statement stuff.

I think we can also probably delete the TileConfiguration.proto file at this point.


class LayoutDatabase {
public:
static std::shared_ptr<LayoutDatabase> &instance() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a pointer, we don't need to return a reference. However, if we want to enforce that callers of this can't change it, we can add some const annotations.
Option 1: return a const reference const LayoutDatabase&
Option 2: return a shared_ptr to a constant object: std::shared_ptr<const LayoutDatabase>
Option 3: return a constant shared_ptr to a non-constant object: const std::shared_ptr<LayoutDatabase>
Option 4: return a constant shared_ptr to a constant object: const std::shared_ptr<const LayoutDatabase>
Option 4 means that a caller can't re-assign the pointer, and they also can't modify the pointed-to object.

I haven't been good about putting const in or around pointers in the rest of the project, so I don't feel super strongly about having to add them here.

class LayoutDatabase {
public:
static std::shared_ptr<LayoutDatabase> &instance() {
if (instance_ == nullptr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be shortened to if (!instance_)

public:
static std::shared_ptr<LayoutDatabase> &instance() {
if (instance_ == nullptr) {
instance_ = std::make_shared<LayoutDatabase>(LayoutDatabase());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not positive, but I think that the compiler may be able to figure out how to construct the LayoutDatabase based on the template, so you may be able to shorten this to std::make_shared<LayoutDatabase>();

public:
TiledVideoManager(std::shared_ptr<TiledEntry> entry)
: entry_(entry),
layouts_(LayoutDatabase::instance()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit and feel free to ignore, but layouts_ makes me think that it is some sort of container (because of the plural-ness). What about layoutDB_?

}

std::vector<int> LayoutDatabase::tileLayoutIdsForFrame(const std::string &video, unsigned int frameNumber) const {
std::string selectLayoutIds = "SELECT id FROM layouts WHERE video = ? AND ? BETWEEN first_frame AND last_frame";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are presumably going to be executing this query multiple times, it may make sense initialize it as a prepared statement, and then do bind/reset for each invocation. An example of this is in SemanticIndexSQLite::initializeStatements(), ::destroyStatements(), ::addMetadata()

(Ditto for the rest of the queries in this class)


std::vector<unsigned int> heights = heightsForId(video, ids.at(0));
unsigned int totalHeight = std::accumulate(heights.begin(), heights.end(), 0);
for (unsigned int id : ids) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't verify that the height is the same for all IDs, then we could do this aggregation in the query: "SELECT sum(height) FROM heights WHERE video=? and id=?". On the other hand, it's nice to re-use the heightsForId function, despite it adding a bit to the complexity. It's up to you how you want to do this.

return layouts_->locationOfTileForId(entry_, tileNumber, id);
}

} // namespace tasm
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see all of the deletions in this file!

#include "Video.h"
#include <gtest/gtest.h>
#include "sqlite3.h"
#include "LayoutDatabase.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this import actually needed?

Comment on lines 90 to 91
unsigned int numColumns = 2;
unsigned int numRows = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could switch these to widths.size(), heights.size() like in the other tests.


std::experimental::filesystem::path expectedLocation = "";

std::experimental::filesystem::path tileLocation = layoutDatabase->locationOfTileForId(entry, tileNumber, id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test actually checking? It doesn't look like expectedLocation and tileLocation are ever used.

Copy link
Collaborator

@maureendaum maureendaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The only thing is the issue with the videos, but for now I recommend just removing them from the commit, and then the next time we meet we can figure out whether a generic video like red10 would be a suitable replacement.

auto config = EnvironmentConfiguration::instance();
std::experimental::filesystem::remove_all(config.defaultLayoutDatabasePath());
std::experimental::filesystem::remove_all(config.catalogPath());
std::experimental::filesystem::remove_all(config.defaultLabelsDatabasePath());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible future enhancement could be to move these copy/remove calls that are common among the test cases to utility functions that can be called by each test.

TEST_F(TasmTestFixture, testScanBirdsFullFrame) {
tasm::TASM tasm(SemanticIndex::IndexType::XY, "/home/maureen/home_videos/birds_tasm.db");
auto selection = tasm.selectFrames("birds-birds", "bird", 0, 5, "birds");
auto selection = tasm.selectFrames("birdsincage-2x2", "bird", 0, 5, "birds");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about copyright issues, and I don't believe any of the non-red10 videos have a copyright that allows us to embed it in this repository. Would it be possible to switch all uses of birds/elfuente/any other to use "red10"? Since we don't care about the actual contents of the videos, hopefully all of the tests still work. Let me know if you want me to generate red10 videos with the same dimensions/frame rate/number of frames as these other videos!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's easiest for now, you can leave the tests as-is and just remove the videos from the commits.

// output data to csv file
printStat(outputFile, experiment, "total", videoName, frameNumber, iteration, totalDuration);
printStat(outputFile, experiment, "constructor", videoName, frameNumber, iteration, constructorDuration);
printStat(outputFile, experiment, videoName, "processing", frameNumber, iteration, functionDuration);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should videoName and "processing" be switched?

std::string labels[NUM_VIDEOS] VIDEO_LABELS;
std::string names[NUM_VIDEOS] VIDEO_NAMES;
int numFrames[NUM_VIDEOS] NUM_FRAMES;
int ids[NUM_VIDEOS] IDS;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could define all of these constants as their actual types at the top of the file and just use them here.

#define VIDEO_LABELS {"bird", "car"}
#define VIDEO_NAMES {"birdsincage", "traffic-2k-001_yolo"}
#define NUM_FRAMES {180, 27000}
#define IDS {197, 8476}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to add a comment specifying how we picked these IDs if it's important so future us knows how to update them if we modify the tests.

int id = ids[i];

std::experimental::filesystem::remove(config.defaultLayoutDatabasePath());
LayoutDatabase::instance()->open();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the only two lines that are specific to the layout database instance?

void printStat(std::ofstream &file, std::string experiment, std::string step, std::string video, int frame,
int iteration, long time) {
file << IMPLEMENTATION << SEP << experiment << SEP << step << SEP << video << SEP << frame << SEP << iteration << SEP << time << "\n";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to also flush the ofstream so that even if something fails and the tests don't finish, we should hopefully have partial results in the output file.

#define IDS {197, 8476}
#define NUM_VIDEOS 2
#define NUM_ITERATIONS 3
#define FUNCTION_REPETITIONS 50 // for functions that are too fast to measure without repetition
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should also include FUNCTION_REPETITIONS in the output file? (With a value of 1 for the non-repeated ones) My thought is that it's possible this could change over time, so it could be useful to know for a specific set of results what its value was.

@kirstenmg
Copy link
Collaborator Author

Benchmarking showed that this implementation makes the constructor of TiledVideoManager much faster and less impacted by video size than the metadata file version. However, tileLayoutForId, locationOfTileForId, and tileLayoutIdsForFrame are slower with this version. A high (~5,000 to 10,000) number of these queries would outweigh the benefit of the faster constructor. Video storage performance is similar for both implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants