-
Notifications
You must be signed in to change notification settings - Fork 110
Description
In SourceManager the persistFiles() method unloads the FileContent for the cached File entities for (unverified) memory management considerations:
void SourceManager::persistFiles()
{
std::lock_guard<std::mutex> guard(_createFileMutex);
_transaction([&, this]() {
for (const auto& p : _files)
{
if (_persistedFiles.find(p.second->id) == _persistedFiles.end())
_persistedFiles.insert(p.second->id);
else
continue;
try
{
// Directories don't have content.
if (p.second->content &&
_persistedContents.find(p.second->content.object_id()) ==
_persistedContents.end())
{
p.second->content.load();
_db->persist(*p.second->content);
_persistedContents.insert(p.second->content.object_id());
}
_db->persist(*p.second);
// TODO: The memory consumption should be checked to see if not
// unloading the lazy shared pointer keeps the file content in memory.
// If so then this line should be uncommented. The reason for not
// unloading is that some parsers may want to read the file contents and
// if this can be done through the File object then the file is not
// needed to be read from disk.
p.second->content.unload();
}
catch (const odb::object_already_persistent&)
{
}
}
});
}Based on the given comment it was assumed that the file's content could be reloaded later if desired. However File::content is an odb::lazy_shared_ptr<FileContent> and in case we parse it for the first time it is loaded from the disk by the SourceManager::getCreateFileEntry() and the pointer is transient (not persisted). According to the ODB manual the unload() method "for transient objects is equivalent to reset()" and therefore reloading the pointer later with load() is not possible.
Follow through the explanation among the code lines:
for (const auto& p : _files)
{
// not relevant code parts omitted ...
try
{
// This loads the lazy_ptr. Note that this lazy_ptr instance is transient
// if it was read from the disk and not loaded from the database.
p.second->content.load();
// This persists the FileContent record in the database,
// but the lazy pointer is still transient, since the persist() method
// does not marks is persisted. (For the persist() method the
// shared_ptr behind lazy_ptr was passed in the first place).
_db->persist(*p.second->content);
}
_db->persist(*p.second);
// This unloads the lazy_ptr, which is equal to reset() for transient pointers
p.second->content.unload();
}
}This results to bugs, e.g. with the following scenario:
- Run searchparser
- Loads all files into SourceManager from disk as transient objects.
- Persist the objects. The lazy pointer for
FileContententities are still transient. - Unloads the transient lazy pointers, which is an irreversible action.
- Run metricsparser
- Loads files from SourceManager.
- The lazy pointers for the
FileContententities are now invalid and cannot be loaded. - Metrics is skipped for most files, because it assumes the files have no content.
As a solution these transient lazy pointers:
- should not be unloaded upon persistence; or
- should be replaced with an unloaded, but persisted lazy pointer; or
- functionality should be added into
SourceManager::getCreateFileEntry()to handle these invalidated lazy pointers when required.
Open for discussion!