Fail to Reproduce WebArena Results with GenericAgent-GPT-4o #249

@bpucla

Dear AgentLab Authors,

Thank you for the great work! I'm trying to reproduce the WebArena results with GenericAgent-GPT-4o using the code below, which follows AgentLab's defaults throughout. However, the score I got is 25, which is significantly lower than the 31.4 shown on the BrowserGym Leaderboard. Do you have any suggestions for reproducing it? Is there any code available that reproduces a score of about 31?

Thanks again for your great contribution to the community!

from agentlab.agents.generic_agent import AGENT_4o
from agentlab.experiments.study import make_study

# Build a WebArena study with the default GPT-4o GenericAgent configuration.
study = make_study(
    benchmark="webarena",
    agent_args=[AGENT_4o],
    comment="repo 4o agent",
)

# Run the study with 5 parallel jobs.
study.run(n_jobs=5)
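
As a rough check on whether the gap is larger than sampling noise, here is a minimal sketch. It assumes both numbers are success rates in percent over the full 812-task WebArena set, and it treats each task as an independent pass/fail trial. The variable and function names are my own for illustration and are not part of AgentLab.

```python
import math

# Assumption: the run covered the full WebArena benchmark (812 tasks).
N_TASKS = 812

# Assumption: both scores are success rates expressed in percent.
observed_rate = 25.0 / 100.0   # my run
reported_rate = 31.4 / 100.0   # BrowserGym Leaderboard


def binomial_std_error(success_rate: float, n_tasks: int) -> float:
    """Standard error of a success rate estimated from n independent tasks."""
    return math.sqrt(success_rate * (1.0 - success_rate) / n_tasks)


std_error = binomial_std_error(observed_rate, N_TASKS)
gap = reported_rate - observed_rate
gap_in_std_errors = gap / std_error

print(f"standard error of my run: {100 * std_error:.2f} points")
print(f"gap to leaderboard: {100 * gap:.1f} points "
      f"({gap_in_std_errors:.1f} standard errors)")
```

Under these assumptions the gap is several standard errors wide, so it does not look like ordinary run-to-run variation. This ignores the uncertainty on the leaderboard number itself and any correlation between tasks.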
