doc-cleaner

Category: Docs & Knowledge | Uploader: notoriouslab | Downloads: 0 | Version: v1.0 (Latest)

Convert PDF, DOCX, XLSX, and text files to clean, structured Markdown. CJK-friendly, table-friendly, privacy-first.
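To make "table-friendly" concrete, here is a minimal sketch of turning spreadsheet-style rows into a Markdown pipe table. It is an illustration only, not doc-cleaner's actual code: the function name `rows_to_markdown` and its input shape (a list of rows, first row as header) are assumptions made for this example.

```python
def rows_to_markdown(rows):
    """Render rows (first row = header) as a Markdown pipe table.

    Hypothetical helper for illustration; not part of doc-cleaner.
    """
    if not rows:
        return ""

    # Ragged rows are padded to the widest row so every line has equal columns.
    width = max(len(row) for row in rows)

    def cell(value):
        # Escape pipes and flatten newlines so a cell cannot break the table.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ").strip()

    padded = [[cell(v) for v in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]

    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    # CJK text passes through unchanged because Python strings are Unicode.
    print(rows_to_markdown([["項目", "金額"], ["A|B", 10], ["備註"]]))
```

The sketch does not pad cells to a visual width; Markdown renderers do not require aligned columns, and this avoids having to measure double-width CJK characters.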

Changelog: Source: GitHub https://github.com/notoriouslab/doc-cleaner

Directory Structure

  • 📁 .claude/
    • 📁 commands/
      • 📁 spectra/
        • 📄 apply.md 9.6 KB
        • 📄 archive.md 5.3 KB
        • 📄 ask.md 7.2 KB
        • 📄 audit.md 9.5 KB
        • 📄 debug.md 4.4 KB
        • 📄 discuss.md 5.2 KB
        • 📄 ingest.md 9.9 KB
        • 📄 propose.md 10.5 KB
  • 📁 ai/
    • 📄 __init__.py 0 B
    • 📄 base.py 3.3 KB
    • 📄 gemini.py 1.9 KB
    • 📄 groq.py 3.9 KB
    • 📄 mlx.py 3.2 KB
    • 📄 nvidia.py 4.0 KB
    • 📄 ollama.py 3.9 KB
  • 📁 classifiers/
    • 📄 __init__.py 0 B
    • 📄 noise.py 3.5 KB
    • 📄 pdf_classifier.py 4.7 KB
    • 📄 pii.py 3.8 KB
  • 📁 output/
    • 📄 __init__.py 0 B
    • 📄 markdown.py 2.7 KB
  • 📁 parsers/
    • 📄 __init__.py 0 B
    • 📄 docx.py 3.6 KB
    • 📄 pdf.py 6.4 KB
    • 📄 text.py 1.2 KB
    • 📄 xlsx.py 3.1 KB
  • 📁 prompts/
    • 📄 __init__.py 0 B
    • 📄 default.txt 835 B
    • 📄 finance.txt 1.2 KB
  • 📄 .env.example 275 B
  • 📄 .gitignore 462 B
  • 📄 CLAUDE.md 1.1 KB
  • 📄 cleaner.py 22.0 KB
  • 📄 config.example.json 928 B
  • 📄 CONTRIBUTING.md 2.5 KB
  • 📄 LICENSE 1.0 KB
  • 📄 README.en.md 13.3 KB
  • 📄 README.md 14.0 KB
  • 📄 requirements.txt 761 B
  • 📄 SECURITY.md 3.7 KB
  • 📄 SKILL.md 2.8 KB
