text-extract-api

API for processing documents using modern OCR and Ollama models, supporting anonymization and PII removal.

GitHub · Development · Free

About

text-extract-api is a project hosted on GitHub for extracting and parsing document content with modern OCR and models served through Ollama. It handles document formats such as PDF, Word, and PPTX, and converts documents or images into structured JSON or Markdown. It can also anonymize documents by removing personally identifiable information (PII), which helps protect data privacy and security.
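As a rough illustration of how a client might submit a document, the sketch below builds (but does not send) a multipart upload request using only Python's standard library. The base URL `http://localhost:8000`, the `/ocr/upload` path, and the form-field names `file` and `strategy` are assumptions for illustration; check the project's README on GitHub for the actual endpoints and parameters before relying on them.

```python
import uuid
import urllib.request

# ASSUMPTION: host, port, and path are hypothetical; verify against the project README.
BASE_URL = "http://localhost:8000"
UPLOAD_PATH = "/ocr/upload"


def build_upload_request(filename: str, content: bytes, fields: dict[str, str]) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST carrying one file plus form fields."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    # Plain form fields; the field names passed in are assumptions, not a confirmed API.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The document itself, sent under the (assumed) field name "file".
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + content
        + b"\r\n"
    )
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        BASE_URL + UPLOAD_PATH,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Against a running instance, a request such as `build_upload_request("scan.pdf", pdf_bytes, {"strategy": "easyocr"})` could be sent with `urllib.request.urlopen(req)`; the strategy value shown is hypothetical, and the response format should be inspected rather than assumed. Keeping request construction separate from sending makes the payload easy to check offline.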

Key Features

• Supports multiple document formats (PDF, Word, PPTX, and others)
• Uses modern OCR and Ollama models to extract content
• Anonymizes documents by removing personally identifiable information (PII)
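To make the anonymization idea concrete, here is a minimal rule-based redactor. It is a simple regex stand-in, not text-extract-api's own implementation (which this page does not describe); the patterns and placeholder labels are hypothetical and will miss many kinds of PII, such as names and street addresses.

```python
import re

# Hypothetical patterns for illustration only. Order matters: the SSN pattern
# runs before the broader phone pattern, which would also match "123-45-6789".
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact_pii(text: str) -> str:
    """Replace each regex match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.com or +1 (555) 123-4567."))
```

Running it prints `Contact [EMAIL] or [PHONE].`. Regexes are predictable and fast but brittle (the phone pattern, for instance, also matches ISO dates like `2025-12-05`); a model-based approach trades that predictability for broader coverage.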

Use Cases

• Automated Document Processing
• Data Extraction and Structuring
• Privacy Protection and Data Security

JSON-LD Structured Data

This is the machine-readable structured data for this agent. AI systems and search engines use this to understand the agent's capabilities.


[
  {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "@id": "https://agentsignals.ai/agents/text-extract-api",
    "name": "text-extract-api",
    "description": "text-extract-api is a GitHub-based project aimed at extracting and parsing document content through advanced OCR technology and models supported by Ollama. The API can handle various document formats such as PDF, Word, and PPTX, and supports converting documents or images into structured JSON or Markdown formats. Additionally, it provides anonymization and removal of personally identifiable information (PII) to ensure data security and privacy.",
    "url": "https://agentsignals.ai/agents/text-extract-api",
    "applicationCategory": "Developer Tools",
    "operatingSystem": "GitHub",
    "sameAs": "https://github.com/CatchTheTornado/text-extract-api",
    "installUrl": "https://github.com/CatchTheTornado/text-extract-api",
    "offers": {
      "@type": "Offer",
      "price": "0",
      "priceCurrency": "USD",
      "description": "Free",
      "availability": "https://schema.org/InStock"
    },
    "featureList": [
      "Supports multiple document formats (PDF, Word, PPTX...)",
      "Uses modern OCR and Ollama models to extract content",
      "Anonymizes documents by removing personally identifiable information (PII)"
    ],
    "datePublished": "2025-12-05T17:14:07.507797+00:00",
    "dateModified": "2025-12-19T05:09:33.058893+00:00",
    "publisher": {
      "@type": "Organization",
      "name": "Agent Signals",
      "url": "https://agentsignals.ai"
    }
  },
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": "https://agentsignals.ai"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Agents",
        "item": "https://agentsignals.ai/agents"
      },
      {
        "@type": "ListItem",
        "position": 3,
        "name": "text-extract-api",
        "item": "https://agentsignals.ai/agents/text-extract-api"
      }
    ]
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "What is text-extract-api?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "API for processing documents using modern OCR and Ollama models, supporting anonymization and PII removal."
        }
      },
      {
        "@type": "Question",
        "name": "What features does text-extract-api offer?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Supports multiple document formats (PDF, Word, PPTX, and others); uses modern OCR and Ollama models to extract content; anonymizes documents by removing personally identifiable information (PII)."
        }
      },
      {
        "@type": "Question",
        "name": "What are the use cases for text-extract-api?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Automated Document Processing, Data Extraction and Structuring, Privacy Protection and Data Security"
        }
      },
      {
        "@type": "Question",
        "name": "What are the advantages of text-extract-api?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Supports multiple document formats, uses advanced OCR and AI models, and provides privacy protection features."
        }
      },
      {
        "@type": "Question",
        "name": "What are the limitations of text-extract-api?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Integration and use may require some technical knowledge, and processing speed may depend on document complexity."
        }
      }
    ]
  }
]
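Because the array above is plain JSON, a consumer can read it with a standard parser. The helper names below (`entries_by_type`, `faq_pairs`) are hypothetical; the sketch relies only on the `@type`, `mainEntity`, `name`, `acceptedAnswer`, and `text` keys that appear in the data above.

```python
import json


def entries_by_type(jsonld_text: str, type_name: str) -> list[dict]:
    """Return every top-level JSON-LD object whose @type equals type_name."""
    data = json.loads(jsonld_text)
    # The block above is a top-level array; a lone object is wrapped for uniformity.
    items = data if isinstance(data, list) else [data]
    return [item for item in items if item.get("@type") == type_name]


def faq_pairs(jsonld_text: str) -> list[tuple[str, str]]:
    """Collect (question, answer) pairs from every FAQPage entry."""
    pairs = []
    for page in entries_by_type(jsonld_text, "FAQPage"):
        for question in page.get("mainEntity", []):
            pairs.append((question["name"], question["acceptedAnswer"]["text"]))
    return pairs
```

Filtering on `@type` rather than on array position keeps the reader robust if the publisher reorders or adds entries.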