Summary
This issue tracks two related improvements in FlashInfer-Bench:
- Reduce Python-side apply() overhead so it’s negligible compared to kernel runtime.
- Improve the Adapter API so it’s easier to use.
Motivation
1. apply() overhead
- apply() overhead makes it harder to trust end-to-end latency numbers for very fast solutions.
- The Python orchestration cost around apply() is currently ~2% on Llama 3.1 8B, and can be further reduced (see the measurement sketch below).
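A minimal sketch of how the orchestration overhead could be isolated from kernel runtime; `measure_apply_overhead` and the two callables are illustrative helpers, not part of FlashInfer-Bench:

```python
import time
from typing import Callable, Dict

def measure_apply_overhead(run_e2e: Callable[[], None],
                           run_kernel_only: Callable[[], None],
                           iters: int = 100) -> Dict[str, float]:
    """Estimate Python-side orchestration overhead by comparing an
    end-to-end call (dispatch + kernel) against a kernel-only call.

    For GPU kernels, the callables should synchronize internally
    (e.g. torch.cuda.synchronize()) so wall-clock times are meaningful.
    """
    def avg_time(fn: Callable[[], None]) -> float:
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        return (time.perf_counter() - start) / iters

    e2e = avg_time(run_e2e)
    kernel = avg_time(run_kernel_only)
    overhead = e2e - kernel
    return {
        "e2e_s": e2e,
        "kernel_s": kernel,
        "overhead_s": overhead,
        "overhead_pct": 100.0 * overhead / e2e if e2e else 0.0,
    }
```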
2. Adapter usability
- Writing a new Adapter currently requires understanding several internal concepts, such as the dispatch workflow.
- We’d like a smoother path (sketched after this list) for:
  - Adding a new adapter.
  - Configuring existing adapters.
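A rough sketch of what that smoother path could look like; all names here (Adapter, register_adapter, get_adapter, the config dict) are hypothetical and do not reflect the current FlashInfer-Bench API:

```python
# Hypothetical sketch only: Adapter, register_adapter, and get_adapter are
# illustrative names, not the actual FlashInfer-Bench API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

_ADAPTER_REGISTRY: Dict[str, "Adapter"] = {}

@dataclass
class Adapter:
    name: str
    apply_fn: Callable[..., Any]                           # wraps the solution's kernel call
    config: Dict[str, Any] = field(default_factory=dict)   # user-tunable knobs

def register_adapter(adapter: Adapter) -> None:
    """Make an adapter discoverable by name, hiding the dispatch workflow."""
    _ADAPTER_REGISTRY[adapter.name] = adapter

def get_adapter(name: str) -> Adapter:
    return _ADAPTER_REGISTRY[name]

# Adding a new adapter would be a single call:
register_adapter(Adapter(name="my_attention", apply_fn=lambda *a, **kw: None))

# Configuring an existing adapter would be a plain dict update:
get_adapter("my_attention").config.update({"num_warmup_iters": 3})
```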