This repository contains artifacts to reproduce the paper "Fray: An Efficient General-Purpose Concurrency Testing Platform for the JVM". This README only includes the instructions to run the evaluation; the documentation of the Fray project is available in the Fray repository and its docs folder. We claim the following badges for the paper:
- Artifact Available: Both Fray and Fray Benchmark are available on GitHub. Fray uses Nix to provide reproducible builds.
- Reusable: Fray is designed to be reusable in real-world applications. It can be used as a JUnit test to find concurrency bugs in Java applications. The Fray debugger plugin is also available for IntelliJ IDEA. The Fray documentation provides detailed instructions on how to use Fray in real-world applications and how to extend Fray to support new concurrency testing algorithms.
- Results Reproduced: The results of the paper are reproduced in this repository.
To run the evaluation, you need:

- x86 CPU with at least 6 cores
- 32 GB of RAM
- You can run the evaluation on any Linux or macOS system
- Linux bare-metal machine with Linux kernel version 4.7 or higher (required to evaluate RR)
  - If you do not have a Linux system or are running the evaluation in a VM, you cannot evaluate the results with RR.
  - You need to set `kernel.perf_event_paranoid` to `1` or lower to use RR. You can do this by running the following command:

    ```bash
    echo 'kernel.perf_event_paranoid=1' | sudo tee '/etc/sysctl.d/51-enable-perf-events.conf'
    ```

  - If you are using NixOS, you can set the `kernel.perf_event_paranoid` option in your `configuration.nix` file: `boot.kernel.sysctl."kernel.perf_event_paranoid" = 1;`
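  - To apply the setting immediately and confirm it took effect, you can use the standard `sysctl` tool (a quick check we suggest; it is not part of the repository's scripts):

    ```bash
    # Apply the new value right away, without waiting for a reboot.
    sudo sysctl -w kernel.perf_event_paranoid=1
    # Verify: the printed value must be 1 or lower for RR to work.
    sysctl kernel.perf_event_paranoid
    ```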
 
 
- An Intel CPU is recommended for better performance
- For AMD CPUs, extra configuration is needed
 
- You may clone this repository: `git clone --recurse-submodules -j8 https://github.com/cmu-pasta/fray-benchmark`
- The project uses Nix to manage its dependencies. You may download Nix from here.
 
- You may also use the pre-configured container image that includes all the dependencies and tools to run the evaluation:

  ```bash
  podman run -p 8888:8888 -it --privileged leeleo3x/fray-benchmark bash
  ```

  - We need the `--privileged` flag to run Nix and RR.
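  - If you have Docker instead of Podman, the equivalent command should also work, since Podman's CLI is Docker-compatible (our suggestion; only the Podman command above comes from the original instructions):

    ```bash
    # Same flags as the podman command above; --privileged is still
    # required so that Nix and RR work inside the container.
    docker run -p 8888:8888 -it --privileged leeleo3x/fray-benchmark bash
    ```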
 
- Enter the project directory: `cd fray-benchmark`. In the pre-configured container image, you are already in the project directory (`/fray-benchmark`).
- Next, you need to enter the dev shell: `nix develop`.
- If you are not using the pre-configured container image, run the following command to build all projects: `./scripts/build.sh`.
- You can run the following command to run the `sctbench` benchmark with each technique for 1 minute (mini RQ 1): `bash ./scripts/run_kickthetire.sh`.
- Next, you can run the Jupyter notebook to visualize the results: `uv run --with jupyter jupyter lab --allow-root --ip 0.0.0.0 --no-browser`. The notebook server listens on port 8888 by default; if you started the container with the command above, that port is already published to the host.
- Go to the "Kick The Tire" section in the notebook to see the results of the mini RQ 1.
- The source code of the benchmark applications is available in the `bms/{benchmark_name}` directory.
- Test cases are defined in the `fray_benchmark/assets/{benchmark_name}.txt` file.
- (~2 hours) Reproduce the benchmark results (RQ1 and RQ2): `bash ./scripts/run_benchmark.sh`
  - This script runs the benchmark with only 1 repetition for each technique.
  - If you have a powerful machine, you can run the evaluation with more concurrent jobs by using the `--cpu NUM_OF_JOBS` option. For example, `bash ./scripts/run_benchmark.sh --cpu 24` will run the evaluation with 24 concurrent jobs.
 
- (~30 minutes) Reproduce real-world bugs found by Fray (RQ3 and RQ4): `bash ./scripts/run_realworld.sh`
  - By default, this script only reproduces failures using the POS algorithm. You may add the `--full-evaluation` option to run all algorithms (PCT3, PCT15, and Random) and all techniques (JPF and RR). The full evaluation takes ~4 hours to complete.
  - Note that depending on the hardware, Fray may find more or fewer bugs than reported in the original paper.
  - If you want to run all collected concurrency tests, replacing `fray_benchmark/assets/{benchmark_name}.txt` with `fray_benchmark/assets/{benchmark_name}.full.txt` will run all tests; see the example below. Note that this will take days to complete.
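  - For example, assuming the Kafka benchmark's test lists are named `kafka.txt` and `kafka.full.txt` (hypothetical names; check `fray_benchmark/assets/` for the exact file names in your checkout), the switch could look like:

    ```bash
    # Overwrite the curated test list with the full list for one benchmark.
    # WARNING: running the full list takes days to complete.
    cp fray_benchmark/assets/kafka.full.txt fray_benchmark/assets/kafka.txt
    ```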
 
- All results will be saved in the `output` directory.
- You can find the RQ1 and RQ2 results in the `output/benchmark` directory and the RQ3 and RQ4 results in the `output/realworld` directory.
  - `{benchmark_name}/{technique}/iter-0/{run_id}/` contains the output of each technique for each test case.
  - For Fray, the `report` folder contains the output. `report/fray.log` contains the log of Fray and error information if Fray finds a bug; see the example below.
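  - For example, to inspect Fray's log for one SCTBench run (the path follows the pattern above; the technique and run id on your machine may differ):

    ```bash
    # Show the Fray log for run 0 of the random technique on sctbench.
    # If Fray found a bug, the error information appears in this file.
    cat output/benchmark/sctbench/random/iter-0/0/report/fray.log
    ```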
 
- We provide a Jupyter notebook to analyze the results. You can run the notebook by using the following command: `uv run --with jupyter jupyter lab --allow-root --ip 0.0.0.0 --no-browser`. The notebook is located in `fray_benchmark/visualizer/visuralize_result.ipynb`.
Below are the real-world bugs found by Fray in each benchmark, with the corresponding run IDs and issue reports:

- Kafka
  - Run 11: #17112 StreamThread shutdown calls completeShutdown only in CREATED state
  - Run 10: #17113 Flaky Test in GlobalStreamThreadTest#shouldThrowStreamsExceptionOnStartupIfExceptionOccurred
  - Run 9: #17114 DefaultStateUpdater::handleRuntimeException should update isRunning before calling addToExceptionsAndFailedTasksThenClearUpdatingAndPausedTasks
  - Run 8: #17162 DefaultTaskManagerTest may leak AwaitingRunnable thread
  - Run 7: #17354 StreamThread::setState race condition causes java.lang.RuntimeException: State mismatch PENDING_SHUTDOWN different from STARTING
  - Run 6: #17371 Flaky test in DefaultTaskExecutorTest.shouldUnassignTaskWhenRequired
  - Run 5: #17379 KafkaStreams: Unexpected state transition from ERROR to PENDING_SHUTDOWN
  - Run 4: #17394 Flaky test in DefaultTaskExecutorTest.shouldSetUncaughtStreamsException
  - Run 3: #17402 Test failure: DefaultStateUpdaterTest.shouldGetTasksFromRestoredActiveTasks expected: <2> but was: <3>
  - Run 2: #17929 `awaitProcessableTasks` is not safe in the presence of spurious wakeups.
  - Run 1: #17946 Flaky test DeafultStateUpdaterTest::shouldResumeStandbyTask due to concurrency issue
  - Run 0: #18418 Flaky test in KafkaStreamsTest::shouldThrowOnCleanupWhileShuttingDownStreamClosedWithCloseOptionLeaveGroupFalse
 
- Lucene
  - Run 1: #13547 Flaky Test in TestMergeSchedulerExternal#testSubclassConcurrentMergeScheduler
  - Run 4: #13552 Test TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce Failed
  - Run 0, 3: #13571 DocumentsWriterDeleteQueue.getNextSequenceNumber assertion failure seqNo=9 vs maxSeqNo=8
  - Run 2: #13593 ConcurrentMergeScheduler may spawn more merge threads than specified
 
- Guava
  - Run 0, 1, 2: #7319 Lingering threads in multiple tests
 
The Fray repository contains the main documentation of Fray's design and implementation.
- Usage Guide provides instructions on how to use Fray in normal applications as JUnit tests.
- Debugger Plugin provides instructions on how to use the Fray debugger plugin.
- Architecture provides an overview of the Fray architecture and design.
- Debugging Guide provides instructions on how to debug Fray and its components.
- We also wrote blog posts showing how to use Fray to find real-world bugs [1, 2, 3].
To reproduce a bug found by Fray, you can use the following command:
```bash
python -m fray_benchmark replay RUN_DIR
```

For example, you can run `python3 -m fray_benchmark replay ./output/benchmark/sctbench/random/iter-0/0` to reproduce the StringBufferJDK bug found in SCTBench.
Note that real-world applications have sources of randomness other than concurrency, so you may not be able to reproduce a bug deterministically. This commit shows the changes required to reproduce the bug found in Lucene.
Please follow the Fray documentation to use Fray in any real-world applications. Good luck finding real-world bugs!
RR is very sensitive to the CPU and kernel version. You may see slightly different results if you are using a different CPU or kernel version.
All required revision changes concern paper writing only and do not affect Fray's implementation or evaluation, so we expect no changes to the evaluation results.