@benjirewis
Member

@benjirewis benjirewis commented Jan 14, 2026

RSDK-11788

Adds a last_shutdown timestamp to the version cache. On startup, the agent runs journalctl -u viam-agent -t systemd -o json -S [last_shutdown] and forwards to app every systemd-identified log from the viam-agent systemd service written since the last shutdown. This allows logs like the following to make it up to app:

image

Member Author

@benjirewis benjirewis left a comment

This PR is still in "draft" since I'm adding tests and want to make sure "OOM killed" logs specifically get saved, but feel free to review, @danielbotros + @jmatth.

@benjirewis benjirewis force-pushed the recent-systemd-logs-forwarding branch from 103c870 to 6759c78 Compare January 21, 2026 16:31
@benjirewis
Member Author

Tests added; now ready for full review, @jmatth + @danielbotros.

@benjirewis benjirewis marked this pull request as ready for review January 21, 2026 16:32
Comment on lines +39 to +45
// A nil lastShutdown value means none was saved in the cache. Bail in this case. We
// _could_ attempt to upload all historical systemd agent logs if we do not know when
// the agent last shut down, but there may be quite a few of these logs, and doing so
// could slow startup.
if s.lastShutdown == nil {
	return nil
}
Member

I think there are some edge cases here:

  • If agent gets OOM killed (or similar) during its first run, no last_shutdown has been cached yet, so we won't upload any logs on the next start.
  • If agent gets OOM killed (or similar) after one or more clean shutdowns, we're going to upload all the systemd logs since the last clean shutdown. I'm assuming agent has average uptimes of days or weeks, so that could still be a lot of logs. Or are viam-agent.systemd logs only created very rarely during the lifetime of the process?

journalCmd *exec.Cmd
cancelFunc context.CancelFunc
noJournald bool
lastShutdown *time.Time
Member

Needing to keep this extra value on the struct even though it's only really used during startup is kind of annoying. I don't see a good way around it given the current code though. Just flagging it as one more reason we should get rid of the weird subsystem duck typing at some point.

}

func TestVersionCacheJSONRoundtrip(t *testing.T) {
someTime, err := time.Parse("2006-01-02 15:04:05", "2011-11-11 00:00:00" /* https://tinyurl.com/dm4ytr3c */)
Member
