
[Filestore] TStorageServiceActor overhead optimization #4853

@qkrorlqr

Description

There are at least 2 big issues:

  • a bug causes TStorageServiceActor to be spammed with EvPingSessionRequests: in high-iops tests (high iodepth and/or numjobs combined with a small request size) we see TStorageServiceActor 100% busy, with 20% of that time spent processing EvPingSessionRequests
  • another 30% is spent in TStorageServiceActor::CompleteRequest, which mostly does statistics accounting - this work can be moved to a separate actor that would not sit on the critical path for IO processing
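The second bullet, taking stats accounting off the IO path, could look roughly like the sketch below. It uses a plain std::thread and a mutex-protected queue instead of the actual actor framework; TStatsOffloader, TRequestStats, and the accounting itself are illustrative names for this issue, not existing filestore code.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

// Illustrative record of what the IO path would hand off per request.
struct TRequestStats {
    uint64_t Bytes = 0;
    uint64_t DurationUs = 0;
};

// Takes stats accounting off the hot path: CompleteRequest() only enqueues,
// a background worker (standing in for a separate stats actor) does the rest.
class TStatsOffloader {
public:
    TStatsOffloader()
        : Worker([this] { Run(); })
    {}

    ~TStatsOffloader() {
        {
            std::lock_guard<std::mutex> g(Lock);
            Stopped = true;
        }
        CV.notify_one();
        Worker.join();  // the worker drains the queue before exiting
    }

    // Hot path: a cheap enqueue instead of full statistics accounting.
    void CompleteRequest(TRequestStats stats) {
        {
            std::lock_guard<std::mutex> g(Lock);
            Queue.push(stats);
        }
        CV.notify_one();
    }

    uint64_t TotalBytes() const {
        return TotalBytesAccounted.load();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> g(Lock);
        for (;;) {
            CV.wait(g, [this] { return Stopped || !Queue.empty(); });
            while (!Queue.empty()) {
                const TRequestStats s = Queue.front();
                Queue.pop();
                g.unlock();
                // The expensive accounting happens here, off the IO path.
                TotalBytesAccounted += s.Bytes;
                g.lock();
            }
            if (Stopped) {
                return;
            }
        }
    }

    std::mutex Lock;
    std::condition_variable CV;
    std::queue<TRequestStats> Queue;
    bool Stopped = false;
    std::atomic<uint64_t> TotalBytesAccounted{0};
    std::thread Worker;  // declared last so it starts after the other members
};
```

In the real service this would be a dedicated stats actor receiving events from TStorageServiceActor; the point is only that the hot path does an O(1) hand-off instead of the accounting itself.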

If we don't hit any other bottlenecks, freeing the half of TStorageServiceActor's time that is currently spent processing pings and stats, so that it can do IO instead, may let us roughly double our max iops.
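As a sanity check of that estimate: with the shares above, only half of the actor's time is left for IO today, so removing the overhead from the critical path roughly doubles IO capacity (assuming, as stated, no other bottleneck appears). A back-of-the-envelope calculation:

```cpp
// If an actor is 100% busy and spends pingPct + statsPct percent of its
// time on overhead, moving that overhead off the critical path scales its
// IO capacity by 100 / (100 - overhead).
double EstimatedSpeedup(int pingPct, int statsPct) {
    const int ioPct = 100 - pingPct - statsPct;  // share left for IO today
    return 100.0 / ioPct;
}
// With the numbers from this issue: EstimatedSpeedup(20, 30) == 2.0
```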

A simple fio config that I used to trigger the issue:

[global]
filesize=2G
time_based=1
startdelay=5
exitall_on_error=1
create_serialize=0
filename_format=$filenum/$jobnum
group_reporting=1
clocksource=gettimeofday
ioengine=libaio
disk_util=0
direct=1

[init]
blocksize=1Mi
rw=write
size=100%
numjobs=64
time_based=0
description='pre-create files'

[read-4k]
stonewall
description='Read iops workload'
iodepth=8
bs=4k
rw=randread
numjobs=64
runtime=120

Another interesting thing worth testing here is how we scale if we increase the guest queue count and max_background.
