pdf — community Claude_skills_zh-CN, community, ide skills, Claude Code, Cursor, Windsurf

v1.0.0

About this Skill

Perfect for Document Analysis Agents needing advanced PDF processing capabilities. A Chinese-language study version of anthropics/skills. Source: https://github.com/anthropics/skills

LeastBit
Updated: 2/20/2026

Agent Capability Analysis

The pdf skill by LeastBit is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

Perfect for Document Analysis Agents needing advanced PDF processing capabilities.

Core Value

Empowers agents to extract text, merge, and manipulate PDF documents using Python libraries like pypdf, enabling efficient document workflow automation and data extraction from PDF files.

Capabilities Granted for pdf

Extracting text from PDF documents
Merging multiple PDF files into a single document
Automating PDF form filling and data extraction

! Prerequisites & Limits

  • Requires a Python environment
  • Limited to basic PDF operations; advanced features require additional libraries or tools

pdf

Install pdf, an AI agent skill for AI agent workflows and automation. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md

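PDF Processing Guide

Overview

This guide covers basic PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.

Quick Start

python
from pypdf import PdfReader, PdfWriter

# Read the PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text from every page
text = ""
for page in reader.pages:
    text += page.extract_text()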

Python Libraries

pypdf - Basic Operations

Merge PDFs

python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Split a PDF

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    # Write each page to its own single-page file
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
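Splitting often means pulling out a specific page range rather than writing one file per page. The following is a minimal sketch of that idea, not part of the original skill: `parse_page_ranges` and `extract_pages` are hypothetical helper names, and the spec format ("1-3,5", 1-based) is an assumption chosen for illustration. pypdf is imported lazily, so the parsing helper can be used and tested without it.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into sorted zero-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(
                f"invalid page range {part!r} for a {total_pages}-page document"
            )
        indices.update(range(start - 1, end))
    return sorted(indices)


def extract_pages(src: str, dst: str, spec: str) -> None:
    """Copy the pages named by spec from src into a new file dst."""
    from pypdf import PdfReader, PdfWriter  # lazy import; requires pypdf

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as output:
        writer.write(output)
```

For example, `extract_pages("input.pdf", "subset.pdf", "1-3,5")` would write pages 1, 2, 3, and 5 to subset.pdf, assuming input.pdf has at least five pages.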

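Extract Metadata

python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
meta = reader.metadata
# metadata is None when the file has no document information dictionary
if meta is not None:
    print(f"Title: {meta.title}")
    print(f"Author: {meta.author}")
    print(f"Subject: {meta.subject}")
    print(f"Creator: {meta.creator}")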

Rotate Pages

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)
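The example above rotates only the first page and writes only that page. A possible extension, sketched here under assumptions, rotates selected pages while keeping the rest of the document. `normalize_rotation` and `rotate_pages` are hypothetical names introduced for illustration. The sketch relies on pypdf's `page.rotate` accepting clockwise multiples of 90 degrees, so other angles are rejected up front.

```python
from typing import Iterable, Optional


def normalize_rotation(angle: int) -> int:
    """Map any multiple of 90 (including negatives) to 0, 90, 180, or 270."""
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90 degrees, got {angle}")
    return angle % 360


def rotate_pages(src: str, dst: str, angle: int,
                 pages: Optional[Iterable[int]] = None) -> None:
    """Rotate the given zero-based pages (all pages when None) and write dst."""
    from pypdf import PdfReader, PdfWriter  # lazy import; requires pypdf

    clockwise = normalize_rotation(angle)
    selected = None if pages is None else set(pages)
    reader = PdfReader(src)
    writer = PdfWriter()
    for index, page in enumerate(reader.pages):
        # Skip the rotate call entirely when the effective rotation is 0
        if clockwise and (selected is None or index in selected):
            page.rotate(clockwise)
        writer.add_page(page)
    with open(dst, "wb") as output:
        writer.write(output)
```

With this sketch, `rotate_pages("input.pdf", "rotated.pdf", -90, pages=[0])` would turn the first page a quarter turn counterclockwise and copy every other page unchanged.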

pdfplumber - Text and Table Extraction

Extract Text with Layout

python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)

Extract Tables

python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)

Advanced Table Extraction

python
import pandas as pd
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:  # Skip empty tables
                # Treat the first row as the header
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine all tables (writing .xlsx requires openpyxl)
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("extracted_tables.xlsx", index=False)
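pdfplumber returns each table as a list of rows whose cells may be `None`, and real tables often contain blank rows or ragged row lengths. The sketch below shows one way to clean such a table before handing it to pandas. `table_to_records` is a hypothetical helper, and the `column_N` fallback header names are an assumption made for illustration.

```python
def table_to_records(table):
    """Convert a pdfplumber-style table (list of rows) into a list of dicts.

    The first row is treated as the header. None cells become "", blank
    rows are dropped, short rows are padded, and cells beyond the header
    width are ignored.
    """
    if not table or len(table) < 2:
        return []
    header = [
        (cell or "").strip() or f"column_{position}"
        for position, cell in enumerate(table[0], start=1)
    ]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip rows that contain no text at all
        cells += [""] * (len(header) - len(cells))  # pad short rows
        records.append(dict(zip(header, cells)))
    return records
```

The resulting list of dicts can be passed directly to `pd.DataFrame(records)` if pandas is available.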

reportlab - Create PDFs

Basic PDF Creation

python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter

# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This PDF was created with reportlab")

# Add a line
c.line(100, height - 140, 400, height - 140)

# Save
c.save()

Create a Multi-Page PDF

python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))

body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())

# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content of page 2", styles['Normal']))

# Build the PDF
doc.build(story)

Command-Line Tools

pdftotext (poppler-utils)

bash
# Extract text
pdftotext input.pdf output.txt

# Extract text and preserve layout
pdftotext -layout input.pdf output.txt

# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt  # pages 1-5

qpdf

bash
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf

# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1  # rotate page 1 by 90 degrees

# Remove a password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
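When an agent drives qpdf from Python, building the argument list explicitly avoids shell-quoting problems with file names. The sketch below wraps the merge command shown above. `build_qpdf_merge_cmd` and `merge_with_qpdf` are hypothetical helper names, and the sketch assumes the `qpdf` binary is on the PATH.

```python
import subprocess


def build_qpdf_merge_cmd(inputs, output):
    """Return the argv list for: qpdf --empty --pages <inputs...> -- <output>."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def merge_with_qpdf(inputs, output):
    """Run the merge; raises CalledProcessError on a non-zero exit status."""
    # No shell is involved, so spaces in file names need no quoting.
    subprocess.run(build_qpdf_merge_cmd(inputs, output), check=True)
```

Note that `check=True` treats any non-zero exit status as a failure. qpdf may also signal warnings through its exit status, so a caller that wants to tolerate warnings would need to inspect the return code instead.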

pdftk (if available)

bash
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf

# Split
pdftk input.pdf burst

# Rotate
pdftk input.pdf rotate 1east output rotated.pdf

Common Tasks

Extract Text from Scanned PDFs

python
# Requires: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path

# Convert the PDF to images
images = convert_from_path('scanned.pdf')

# Run OCR on each page
text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)
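OCR is slow, so it can help to run it only on pages where direct text extraction comes back nearly empty. The following sketch shows that fallback pattern. `needs_ocr` and `extract_text_with_ocr_fallback` are hypothetical helpers, and the 20-character threshold is an arbitrary assumption that would need tuning for real documents.

```python
def needs_ocr(extracted_text, min_chars=20):
    """Heuristic: treat a page as scanned when it yields almost no visible text."""
    visible = "".join((extracted_text or "").split())
    return len(visible) < min_chars


def extract_text_with_ocr_fallback(pdf_path, min_chars=20):
    """Return one string per page, using OCR only where direct extraction fails."""
    # Lazy imports: pypdf, pdf2image, and pytesseract must be installed separately.
    from pypdf import PdfReader
    import pytesseract
    from pdf2image import convert_from_path

    page_texts = []
    for number, page in enumerate(PdfReader(pdf_path).pages, start=1):
        text = page.extract_text() or ""
        if needs_ocr(text, min_chars):
            # Render only this page (1-based numbering) to an image, then OCR it.
            images = convert_from_path(pdf_path, first_page=number, last_page=number)
            text = "".join(pytesseract.image_to_string(image) for image in images)
        page_texts.append(text)
    return page_texts
```

The threshold check counts only non-whitespace characters, so pages that extract to blank lines still fall through to OCR.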

Add a Watermark

python
from pypdf import PdfReader, PdfWriter

# Create a watermark (or load an existing one)
watermark = PdfReader("watermark.pdf").pages[0]

# Apply it to every page
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("watermarked.pdf", "wb") as output:
    writer.write(output)

Extract Images

bash
# Use pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix

# JPEG-encoded images are written as output_prefix-000.jpg, output_prefix-001.jpg, etc.;
# other images are written in PBM/PPM format

Password Protection

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# Add a password
writer.encrypt("userpassword", "ownerpassword")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)

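Quick Reference

| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Merge PDFs | pypdf | writer.add_page(page) |
| Split a PDF | pypdf | One file per page |
| Extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Create PDFs | reportlab | Canvas or Platypus |
| Command-line merge | qpdf | qpdf --empty --pages ... |
| OCR scanned PDFs | pytesseract | Convert to images first |
| Fill PDF forms | pdf-lib or pypdf (see forms.md) | See forms.md |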

Next Steps

  • For advanced pypdfium2 usage, see reference.md
  • For JavaScript libraries (pdf-lib), see reference.md
  • If you need to fill out a PDF form, follow the instructions in forms.md
  • For the troubleshooting guide, see reference.md

FAQ & Installation Steps


? Frequently Asked Questions

What is pdf?

Perfect for Document Analysis Agents needing advanced PDF processing capabilities. A Chinese-language study version of anthropics/skills. Source: https://github.com/anthropics/skills

How do I install pdf?

Run the command: npx killer-skills add LeastBit/Claude_skills_zh-CN/pdf. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for pdf?

Key use cases include: Extracting text from PDF documents, Merging multiple PDF files into a single document, Automating PDF form filling and data extraction.

Which IDEs are compatible with pdf?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for pdf?

Requires a Python environment. Limited to basic PDF operations; advanced features require additional libraries or tools.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add LeastBit/Claude_skills_zh-CN/pdf. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use pdf immediately in the current project.
