-
-
Notifications
You must be signed in to change notification settings - Fork 114
Description
Hey there,
I've recently done a bit of work on physics-related tasks and in the process spent some thoughts on how Farseer performance could be improved.
Performance Observations
First, some assumptions on certain performance aspects that hopefully are uncontroversial enough to agree on as a precondition:
- The fastest way to process data is to iterate over an array of structs.
- It tends to be faster to bulk-process batches of data in a tight loop than to process one item at a time.
- Parallelization has a synchronization overhead, except where no synchronization is necessary.
- Executing user code has a guard overhead, except where no context-specific guard is necessary.
- There is a non-zero overhead to abstractions like
List<T>vs.T[]when it comes to bulk-processing large amounts of data.
Potential Performance Sinks
Next, here is a list of Farseer system design decisions that potentially drag performance below the theoretical maximum:
- Shape, Joint and Body are all classes stored in lists or similar data structures.
- Users can subscribe to events that are invoked from the middle of the physics simulation.
- User events are invoked callback style, item-by-item.
- Parallelizing physics simulation internally (i.e. not by putting Farseer into its own thread) is complicated because every object may alter the state of every other.
Design Decision Draft for Maximizing Efficiency
Finally, I'll just throw in some rather radical ideas on how to restructure the overall system to address the above issues:
- Make Body a struct and store all bodies in a World-global (potentially sparse) array.
- Make Joint a struct (generalized) and store all joints in a World-global (potentially sparse) array.
- Make Shape a struct (discriminated union) and store all shapes in a Body-local (dense) array.
- Double-buffer all of the above data within World, so a physics step / update can strictly read-only from buffer A (current frame) and write-only to buffer B (next frame). This enables internal parallelization without synchronization using
Parallel.Forand similar. - Grant users direct access to all of the above World-global arrays so they can perform efficient bulk operations.
- References between Body, Joint and Shape take the form of
int-based handles that represent the access index in their world. - Instead of event callbacks, gather all occurrences (structs!) to "subscribed" events in an array over an update step and deliver them in bulk once the update step is finished. This enables users to do efficient batch-processing and frees Velcro from the need for state guards and their overhead. See issue Move Towards Bulk Collision / Separation / Solve Event Handling #8.
- User code that absolutely requires to be executed within the physics frame (such as collision filters, but be really strict here) is required to consist of a pure function that gets all required data read-only via parameter and returns a value.
As a side effect to improved efficiency, less overhead and internal parallel execution, the above changes could also make serialization (see issue #6) a lot easier. It would also make pooling (see issue #5) obsolete.
On the downside, this would require users to adopt a new API and potentially be more careful due to less safety options being around. There will be a bigger need for good documentation so users are primed for certain aspects of the system. This might also generate a large amount of issues that need to be resolved in order to adapt to the new design, but if any of these changes should make it into Velcro, now would likely be a better opportunity to tackle them than later.
I'm totally aware that the above are quite radical suggestions and that I don't have any idea about most of the internal structures and design decisions of Farseer, so please read it as a collection of ideas by an interested observer :)