@AkmalFairuz
Contributor

Description

VarInt encoding is a hot path in the Minecraft Bedrock network protocol, so it should be as fast as possible. After some testing and benchmarking, I found that unrolling the loop makes VarInt writes faster. In most cases we write 1- or 2-byte VarInts. For 2-byte VarInts the speed is about the same, but for 1-byte VarInts the unrolled version is around 13% faster. As the benchmark results show, the gain grows with length for VarInts of 3 bytes or more: the unrolled write takes roughly 23-25% less time at 3 bytes and is close to 4x faster at 10 bytes, because the loop's cost grows by about 2 ns per byte while the unrolled version stays at around 5 ns.

Test & Benchmark Code: https://gist.github.com/AkmalFairuz/daadc682603ae5f1d0f77b7fd748853e
Inspired by: https://steinborn.me/posts/performance/how-fast-can-you-write-a-varint/
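To illustrate the idea, here is a minimal sketch of the two approaches for an unsigned 32-bit VarInt. It is not the code from this PR (that lives in the gist above); the function names `appendVaruint32Loop` and `appendVaruint32Unrolled` are hypothetical and chosen only for this example. The unrolled form decides the encoded length with range checks and emits all bytes in a single `append`, instead of shifting and branching once per byte.

```go
package main

import (
	"bytes"
	"fmt"
)

// appendVaruint32Loop is the conventional form: emit 7 bits per byte and
// set the high (continuation) bit while more bits remain.
func appendVaruint32Loop(dst []byte, x uint32) []byte {
	for x >= 0x80 {
		dst = append(dst, byte(x)|0x80)
		x >>= 7
	}
	return append(dst, byte(x))
}

// appendVaruint32Unrolled (hypothetical name) picks the encoded length up
// front and writes every byte in one append call, with no loop.
func appendVaruint32Unrolled(dst []byte, x uint32) []byte {
	switch {
	case x < 1<<7:
		return append(dst, byte(x))
	case x < 1<<14:
		return append(dst, byte(x)|0x80, byte(x>>7))
	case x < 1<<21:
		return append(dst, byte(x)|0x80, byte(x>>7)|0x80, byte(x>>14))
	case x < 1<<28:
		return append(dst, byte(x)|0x80, byte(x>>7)|0x80, byte(x>>14)|0x80, byte(x>>21))
	default:
		return append(dst, byte(x)|0x80, byte(x>>7)|0x80, byte(x>>14)|0x80, byte(x>>21)|0x80, byte(x>>28))
	}
}

func main() {
	// Check that both forms agree at each length boundary.
	for _, v := range []uint32{0, 127, 128, 16383, 16384, 2097151, 2097152, 268435455, 268435456, 4294967295} {
		a := appendVaruint32Loop(nil, v)
		b := appendVaruint32Unrolled(nil, v)
		fmt.Printf("%10d -> % x (match: %v)\n", v, b, bytes.Equal(a, b))
	}
}
```

The same pattern extends to 64-bit values with ten cases. Whether it actually wins depends on the compiler and CPU, so it is worth benchmarking on the target platform.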

Benchmark Result

goos: darwin
goarch: arm64
pkg: varint-asap/asap
cpu: Apple M1 Pro
BenchmarkWriteVaruint64_Loop
BenchmarkWriteVaruint64_Loop/1B
BenchmarkWriteVaruint64_Loop/1B-2    	371880081	         3.066 ns/op
BenchmarkWriteVaruint64_Loop/2B
BenchmarkWriteVaruint64_Loop/2B-2    	275919524	         4.348 ns/op
BenchmarkWriteVaruint64_Loop/3B
BenchmarkWriteVaruint64_Loop/3B-2    	186233250	         6.439 ns/op
BenchmarkWriteVaruint64_Loop/4B
BenchmarkWriteVaruint64_Loop/4B-2    	139436500	         8.583 ns/op
BenchmarkWriteVaruint64_Loop/5B
BenchmarkWriteVaruint64_Loop/5B-2    	100000000	        10.71 ns/op
BenchmarkWriteVaruint64_Loop/6B
BenchmarkWriteVaruint64_Loop/6B-2    	93968712	        12.85 ns/op
BenchmarkWriteVaruint64_Loop/7B
BenchmarkWriteVaruint64_Loop/7B-2    	79125879	        14.97 ns/op
BenchmarkWriteVaruint64_Loop/8B
BenchmarkWriteVaruint64_Loop/8B-2    	69467226	        17.17 ns/op
BenchmarkWriteVaruint64_Loop/9B
BenchmarkWriteVaruint64_Loop/9B-2    	60143468	        19.27 ns/op
BenchmarkWriteVaruint64_Loop/10B
BenchmarkWriteVaruint64_Loop/10B-2   	55211548	        21.36 ns/op
BenchmarkWriteVaruint64_Unrolled
BenchmarkWriteVaruint64_Unrolled/1B
BenchmarkWriteVaruint64_Unrolled/1B-2         	452714943	         2.650 ns/op
BenchmarkWriteVaruint64_Unrolled/2B
BenchmarkWriteVaruint64_Unrolled/2B-2         	275627757	         4.349 ns/op
BenchmarkWriteVaruint64_Unrolled/3B
BenchmarkWriteVaruint64_Unrolled/3B-2         	240281890	         4.977 ns/op
BenchmarkWriteVaruint64_Unrolled/4B
BenchmarkWriteVaruint64_Unrolled/4B-2         	262098102	         4.588 ns/op
BenchmarkWriteVaruint64_Unrolled/5B
BenchmarkWriteVaruint64_Unrolled/5B-2         	249383619	         4.929 ns/op
BenchmarkWriteVaruint64_Unrolled/6B
BenchmarkWriteVaruint64_Unrolled/6B-2         	248166111	         4.807 ns/op
BenchmarkWriteVaruint64_Unrolled/7B
BenchmarkWriteVaruint64_Unrolled/7B-2         	235628038	         5.031 ns/op
BenchmarkWriteVaruint64_Unrolled/8B
BenchmarkWriteVaruint64_Unrolled/8B-2         	230897334	         5.175 ns/op
BenchmarkWriteVaruint64_Unrolled/9B
BenchmarkWriteVaruint64_Unrolled/9B-2         	223593786	         5.323 ns/op
BenchmarkWriteVaruint64_Unrolled/10B
BenchmarkWriteVaruint64_Unrolled/10B-2        	219333084	         5.450 ns/op
BenchmarkWriteVarint64_Loop
BenchmarkWriteVarint64_Loop/1B
BenchmarkWriteVarint64_Loop/1B-2              	392168236	         3.057 ns/op
BenchmarkWriteVarint64_Loop/2B
BenchmarkWriteVarint64_Loop/2B-2              	273143738	         4.371 ns/op
BenchmarkWriteVarint64_Loop/3B
BenchmarkWriteVarint64_Loop/3B-2              	184677511	         6.470 ns/op
BenchmarkWriteVarint64_Loop/4B
BenchmarkWriteVarint64_Loop/4B-2              	139671231	         8.567 ns/op
BenchmarkWriteVarint64_Loop/5B
BenchmarkWriteVarint64_Loop/5B-2              	100000000	        10.78 ns/op
BenchmarkWriteVarint64_Loop/6B
BenchmarkWriteVarint64_Loop/6B-2              	91436408	        12.91 ns/op
BenchmarkWriteVarint64_Loop/7B
BenchmarkWriteVarint64_Loop/7B-2              	78226857	        14.99 ns/op
BenchmarkWriteVarint64_Loop/8B
BenchmarkWriteVarint64_Loop/8B-2              	69691129	        17.07 ns/op
BenchmarkWriteVarint64_Loop/9B
BenchmarkWriteVarint64_Loop/9B-2              	61970401	        19.34 ns/op
BenchmarkWriteVarint64_Loop/10B
BenchmarkWriteVarint64_Loop/10B-2             	55047448	        21.39 ns/op
BenchmarkWriteVarint64_Unrolled
BenchmarkWriteVarint64_Unrolled/1B
BenchmarkWriteVarint64_Unrolled/1B-2          	445155319	         2.664 ns/op
BenchmarkWriteVarint64_Unrolled/2B
BenchmarkWriteVarint64_Unrolled/2B-2          	274548861	         4.350 ns/op
BenchmarkWriteVarint64_Unrolled/3B
BenchmarkWriteVarint64_Unrolled/3B-2          	247848409	         4.822 ns/op
BenchmarkWriteVarint64_Unrolled/4B
BenchmarkWriteVarint64_Unrolled/4B-2          	256862097	         4.706 ns/op
BenchmarkWriteVarint64_Unrolled/5B
BenchmarkWriteVarint64_Unrolled/5B-2          	257703038	         4.622 ns/op
BenchmarkWriteVarint64_Unrolled/6B
BenchmarkWriteVarint64_Unrolled/6B-2          	244516525	         4.869 ns/op
BenchmarkWriteVarint64_Unrolled/7B
BenchmarkWriteVarint64_Unrolled/7B-2          	238123762	         5.013 ns/op
BenchmarkWriteVarint64_Unrolled/8B
BenchmarkWriteVarint64_Unrolled/8B-2          	224577495	         5.326 ns/op
BenchmarkWriteVarint64_Unrolled/9B
BenchmarkWriteVarint64_Unrolled/9B-2          	217174342	         5.481 ns/op
BenchmarkWriteVarint64_Unrolled/10B
BenchmarkWriteVarint64_Unrolled/10B-2         	211523164	         5.636 ns/op
BenchmarkWriteVaruint32_Loop
BenchmarkWriteVaruint32_Loop/1B
BenchmarkWriteVaruint32_Loop/1B-2             	394702759	         3.035 ns/op
BenchmarkWriteVaruint32_Loop/2B
BenchmarkWriteVaruint32_Loop/2B-2             	274779033	         4.390 ns/op
BenchmarkWriteVaruint32_Loop/3B
BenchmarkWriteVaruint32_Loop/3B-2             	184443898	         6.480 ns/op
BenchmarkWriteVaruint32_Loop/4B
BenchmarkWriteVaruint32_Loop/4B-2             	139482516	         8.579 ns/op
BenchmarkWriteVaruint32_Loop/5B
BenchmarkWriteVaruint32_Loop/5B-2             	100000000	        10.74 ns/op
BenchmarkWriteVaruint32_Unrolled
BenchmarkWriteVaruint32_Unrolled/1B
BenchmarkWriteVaruint32_Unrolled/1B-2         	444092460	         2.689 ns/op
BenchmarkWriteVaruint32_Unrolled/2B
BenchmarkWriteVaruint32_Unrolled/2B-2         	273265885	         4.378 ns/op
BenchmarkWriteVaruint32_Unrolled/3B
BenchmarkWriteVaruint32_Unrolled/3B-2         	234322006	         5.001 ns/op
BenchmarkWriteVaruint32_Unrolled/4B
BenchmarkWriteVaruint32_Unrolled/4B-2         	260303686	         4.548 ns/op
BenchmarkWriteVaruint32_Unrolled/5B
BenchmarkWriteVaruint32_Unrolled/5B-2         	251496226	         4.894 ns/op
BenchmarkWriteVarint32_Loop
BenchmarkWriteVarint32_Loop/1B
BenchmarkWriteVarint32_Loop/1B-2              	392076781	         3.037 ns/op
BenchmarkWriteVarint32_Loop/2B
BenchmarkWriteVarint32_Loop/2B-2              	272029398	         4.389 ns/op
BenchmarkWriteVarint32_Loop/3B
BenchmarkWriteVarint32_Loop/3B-2              	184441194	         6.516 ns/op
BenchmarkWriteVarint32_Loop/4B
BenchmarkWriteVarint32_Loop/4B-2              	139376882	         8.583 ns/op
BenchmarkWriteVarint32_Loop/5B
BenchmarkWriteVarint32_Loop/5B-2              	100000000	        10.78 ns/op
BenchmarkWriteVarint32_Unrolled
BenchmarkWriteVarint32_Unrolled/1B
BenchmarkWriteVarint32_Unrolled/1B-2          	441986576	         2.717 ns/op
BenchmarkWriteVarint32_Unrolled/2B
BenchmarkWriteVarint32_Unrolled/2B-2          	272128358	         4.409 ns/op
BenchmarkWriteVarint32_Unrolled/3B
BenchmarkWriteVarint32_Unrolled/3B-2          	245919234	         4.926 ns/op
BenchmarkWriteVarint32_Unrolled/4B
BenchmarkWriteVarint32_Unrolled/4B-2          	265668538	         4.511 ns/op
BenchmarkWriteVarint32_Unrolled/5B
BenchmarkWriteVarint32_Unrolled/5B-2          	261839175	         4.620 ns/op
PASS
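For reference, the percentages quoted in the description can be recomputed from the ns/op columns. The helper below is only an illustration (`timeSaved` is a hypothetical name); the inputs are copied from the Varuint64 rows above.

```go
package main

import "fmt"

// timeSaved returns the percentage reduction in ns/op when going from the
// loop version to the unrolled version.
func timeSaved(loopNs, unrolledNs float64) float64 {
	return (loopNs - unrolledNs) / loopNs * 100
}

func main() {
	// ns/op values copied from the BenchmarkWriteVaruint64 results.
	fmt.Printf("1B:  %.1f%% less time per op\n", timeSaved(3.066, 2.650))
	fmt.Printf("2B:  %.1f%% less time per op\n", timeSaved(4.348, 4.349))
	fmt.Printf("3B:  %.1f%% less time per op\n", timeSaved(6.439, 4.977))
	fmt.Printf("10B: %.1f%% less time per op\n", timeSaved(21.36, 5.450))
}
```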
