Here are the scores on the standard test set of AgentBench. While LLMs are beginning to show proficiency as agents (LLM-as-Agent), the gaps between models and the distance to practical usability are ...