Rigorous Tools for Evaluating LLM-Generated Code
Evalplus is a tool designed to rigorously evaluate code generated by large language models (LLMs). Its associated research was presented at NeurIPS 2023 and COLM 2024, and it provides a comprehensive testing framework that helps researchers and developers understand and improve the quality of LLM-generated code.
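As a quick illustration of how the framework is typically used, the sketch below generates solutions for EvalPlus's HumanEval+ problems and writes them to a JSONL file for scoring. It follows the usage documented in the EvalPlus repository; generate_with_your_llm is a hypothetical placeholder for whatever model is being evaluated.

from evalplus.data import get_human_eval_plus, write_jsonl

def generate_with_your_llm(prompt: str) -> str:
    # Placeholder: call the LLM under evaluation and return a completed solution.
    raise NotImplementedError

# Build one sample per HumanEval+ task and save them in the JSONL format
# that the evalplus evaluator expects.
samples = [
    {"task_id": task_id, "solution": generate_with_your_llm(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)

# The samples can then be scored with the command-line evaluator, e.g.:
#   evalplus.evaluate --dataset humaneval --samples samples.jsonl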
This is the machine-readable structured data for this agent. AI systems and search engines use this to understand the agent's capabilities.
[
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"@id": "https://agentsignals.ai/agents/evalplus",
"name": "evalplus",
"description": "Evalplus is a tool designed to rigorously evaluate code generated by large language models (LLMs). It participated in the NeurIPS conference in 2023 and the COLM conference in 2024, providing a comprehensive testing framework to help researchers and developers understand and improve the quality of code generated by LLMs.",
"url": "https://agentsignals.ai/agents/evalplus",
"applicationCategory": "研究",
"operatingSystem": "GitHub",
"sameAs": "https://github.com/evalplus/evalplus",
"installUrl": "https://github.com/evalplus/evalplus",
"offers": {
"@type": "Offer",
"price": "0",
"priceCurrency": "USD",
"description": "免费",
"availability": "https://schema.org/InStock"
},
"featureList": [
"Supports multiple programming languages",
"Integrated test cases",
"Performance evaluation metrics"
],
"datePublished": "2025-12-05T17:19:00.275717+00:00",
"dateModified": "2025-12-19T19:23:23.730072+00:00",
"publisher": {
"@type": "Organization",
"name": "Agent Signals",
"url": "https://agentsignals.ai"
}
},
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "Home",
"item": "https://agentsignals.ai"
},
{
"@type": "ListItem",
"position": 2,
"name": "Agents",
"item": "https://agentsignals.ai/agents"
},
{
"@type": "ListItem",
"position": 3,
"name": "evalplus",
"item": "https://agentsignals.ai/agents/evalplus"
}
]
},
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is evalplus?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Rigorous Tools for Evaluating LLM-Generated Code"
}
},
{
"@type": "Question",
"name": "What features does evalplus offer?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Supports multiple programming languages, Integrated test cases, Performance evaluation metrics"
}
},
{
"@type": "Question",
"name": "What are the use cases for evalplus?",
"acceptedAnswer": {
"@type": "Answer",
"text": "AI research, code quality assessment, model optimization"
}
},
{
"@type": "Question",
"name": "What are the advantages of evalplus?",
"acceptedAnswer": {
"@type": "Answer",
"text": "高度可定制的测试环境, 支持最新的LLM模型, 社区活跃,持续更新"
}
},
{
"@type": "Question",
"name": "What are the limitations of evalplus?",
"acceptedAnswer": {
"@type": "Answer",
"text": "对于初学者可能有较高的学习曲线, 资源消耗较大"
}
}
]
}
]
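Because the listing above is plain JSON-LD, it can also be consumed programmatically. The short sketch below shows how a crawler or AI system might read it to recover the agent's name, features, and repository link; agent.jsonld is a hypothetical local copy of the structured data.

import json

# Load a local copy of the JSON-LD array shown above (hypothetical file name).
with open("agent.jsonld", encoding="utf-8") as f:
    entries = json.load(f)

# Pick out the SoftwareApplication entry and read a few fields from it.
app = next(e for e in entries if e.get("@type") == "SoftwareApplication")
print(app["name"])         # "evalplus"
print(app["featureList"])  # ["Supports multiple programming languages", ...]
print(app["sameAs"])       # "https://github.com/evalplus/evalplus"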