Many agent benchmarks require participants to upload their execution trajectories to ensure transparency and reproducibility. Would it be interesting to support this in the BrowserGym leaderboard? Would it also be possible to share the execution trajectories of the existing models for broader analysis?
Examples:
SWE-bench
WebArena
Thanks!