📚 User Guide
Welcome to the PDF Keyword Search Tool! This guide will help you understand how to use the tool effectively and what to do when you encounter quality warnings.
⚡ Quick Start:
1️⃣ Upload your PDFs
2️⃣ Add your keywords
3️⃣ Choose Simple Search (recommended for most users)
4️⃣ Click Search PDFs
5️⃣ If you see quality warnings, try the Enhanced Search button that appears
🔍 Basic Usage
1. Upload PDFs
- Single PDF: Select one PDF file to search
- Batch Processing: Select multiple PDFs to search them all at once
- File Size Limit: 100MB total across all uploaded files. An error such as "Error: Failed to execute 'json' on 'Response'. Unexpected end of JSON input" usually means too much data was submitted; upload fewer PDFs per search and try again.
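If a set of PDFs exceeds the limit, one way to plan smaller uploads is to group the files into batches before searching. The sketch below is illustrative only and is not part of the tool: the function name `split_into_batches` is hypothetical, and it assumes "100MB" means 100 * 1024 * 1024 bytes.

```python
# Assumption: the 100MB limit is read as 100 * 1024 * 1024 bytes.
MAX_TOTAL_BYTES = 100 * 1024 * 1024


def split_into_batches(files, limit=MAX_TOTAL_BYTES):
    """Greedily group (name, size_in_bytes) pairs into batches.

    Files are kept in their given order, and each batch's combined
    size stays within `limit`.
    """
    batches, current, current_total = [], [], 0
    for name, size in files:
        if size > limit:
            raise ValueError(f"{name} alone exceeds the upload limit")
        # Start a new batch when adding this file would exceed the limit.
        if current and current_total + size > limit:
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        batches.append(current)
    return batches
```

On a real folder, the sizes could come from `os.path.getsize(path)` for each PDF; each resulting batch would then be uploaded as a separate search.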
2. Add Keywords
You can add keywords in two ways, or combine them:
- Upload Keywords File: Upload a .txt file with one keyword per line
- Type Keywords: Enter keywords directly in the text area (one per line)
- Combine Both: Use both methods - the tool will combine all keywords
Keyword Format Examples:
✅ Good: obesity
✅ Good: machine-learning
✅ Good: research methods
❌ Avoid: obesity, diversity, health (use separate lines instead)
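The one-keyword-per-line rule can be pictured with a small sketch. This is not the tool's actual code: the function names are hypothetical, and dropping case-insensitive duplicates when combining the two sources is an assumption.

```python
def parse_keywords(text):
    """Split text into keywords: one per line, trimmed, blank lines dropped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def combine_keywords(file_text, typed_text):
    """Merge keywords from the uploaded file and the text area.

    Assumption: duplicates are dropped case-insensitively, keeping
    the first spelling seen.
    """
    seen, combined = set(), []
    for keyword in parse_keywords(file_text) + parse_keywords(typed_text):
        key = keyword.lower()
        if key not in seen:
            seen.add(key)
            combined.append(keyword)
    return combined
```

Under this reading, a line such as "obesity, diversity, health" is treated as a single keyword containing commas, which is why each term belongs on its own line.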
3. Choose Search Mode
Select between two search methods based on your PDF quality and needs:
🟢 Simple Search (Recommended)
How it works: Uses the best single text extraction method for each PDF page
Best for: Most PDFs created from Word, Excel, or other digital documents
Results: Accurate keyword counts without inflation
When to use: Start here for all searches - it works well for 90% of PDFs
🟡 Enhanced Search
How it works: Uses 4 different text extraction methods simultaneously and combines results
Best for: Scanned PDFs, image-heavy documents, or PDFs with formatting issues
Results: Finds more keywords but may inflate counts (same keyword counted multiple times)
When to use: When Simple Search misses keywords you know exist, or for known problematic PDFs
4. Search and Download
- Click "Search PDFs" to start the search
- View results directly on the page
- Download reports in multiple formats: Text, JSON, CSV, or ZIP (for batch)
⚠️ PDF Quality Warnings
The tool analyzes your PDFs and may display quality warnings. Here's what they mean:
🚨 Poor Quality PDFs (Red Warning)
What it means: The PDF is likely scanned, image-based, or has very little searchable text.
Impact: Keywords may be missed entirely.
What to do: Try to recreate the PDF from the original document using "Print to PDF".
⚠️ Minor Issues (Yellow Warning)
What it means: The PDF has some formatting issues but should work reasonably well.
Impact: Most keywords will be found, but some might be missed.
What to do: Check results for accuracy; recreate PDF if results seem wrong.
✅ Good Quality PDFs (No Warning)
What it means: The PDF has clean, searchable text.
Impact: Keywords should be found accurately.
What to do: Nothing - results should be reliable.
🔧 Search Modes Explained
🟢 Simple Search (Recommended)
Technical Details:
• Tries 4 text extraction methods: Standard, Block-by-block, HTML, and XML
• For each page, keeps the output of only one method: the best one that actually extracted text
• Priority order: Standard → Blocks → HTML → XML
• Provides accurate, non-inflated keyword counts
• Handles hyphenated words and PDF line breaks automatically
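The per-page selection can be sketched as follows. This is a simplified illustration, not the tool's implementation: it assumes "best" means the first method in the priority order that returned non-blank text, and the method keys and the function name `pick_page_text` are hypothetical.

```python
# Assumed priority order, taken from the list above.
PRIORITY = ["standard", "blocks", "html", "xml"]


def pick_page_text(extractions, priority=PRIORITY):
    """Return (method, text) for the first method that produced non-blank text.

    `extractions` maps a method name to the text it extracted from one page.
    Returns (None, "") when every method came back empty.
    """
    for method in priority:
        text = extractions.get(method, "")
        if text.strip():
            return method, text
    return None, ""


page = {"standard": "  ", "blocks": "obesity study", "html": "<p>obesity study</p>"}
method, text = pick_page_text(page)  # Standard is blank, so Blocks is used
```

Because only one method's text is searched per page, each occurrence of a keyword is counted once.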
🟡 Enhanced Search (For Problematic PDFs)
Technical Details:
• Uses the same 4 text extraction methods as Simple Search
• Searches for keywords in ALL extraction methods simultaneously
• Combines results from all methods, which may count the same keyword multiple times
• Useful when PDFs have layers, hidden text, or complex formatting
• Trade-off: Finds more keywords but inflates counts
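A small sketch shows why combining methods inflates counts. It assumes, for illustration only, that the Enhanced total is the sum of the matches found in each method's output; the function names and sample text are hypothetical.

```python
import re


def count_keyword(text, keyword):
    """Count case-insensitive occurrences of `keyword` in `text`."""
    return len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))


def enhanced_count(extractions, keyword):
    """Sum the matches from every extraction method (assumed combining rule)."""
    return sum(count_keyword(text, keyword) for text in extractions.values())


# The same page text, as seen by each extraction method.
page = {
    "standard": "Obesity rates. Obesity risk.",
    "blocks": "Obesity rates. Obesity risk.",
    "html": "<p>Obesity rates.</p><p>Obesity risk.</p>",
    "xml": "",
}

simple_total = count_keyword(page["standard"], "obesity")  # 2
enhanced_total = enhanced_count(page, "obesity")           # 2 + 2 + 2 + 0 = 6
```

The page contains the word twice, yet the combined total is six, because three of the methods each saw the same two occurrences.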
📈 Progressive Enhancement Workflow
Even if you start with Simple Search, you can still apply Enhanced Search afterwards:
- Start with Simple Search - Most reliable for clean PDFs
- Review quality warnings - Tool identifies problematic PDFs
- Try Enhanced for problematic PDFs only - Click the enhancement button that appears
- Compare results - Enhanced results are shown separately from Simple results
Smart Enhancement: When you use the "Try Enhanced Search for Problematic PDFs" button after a Simple search, only PDFs with quality issues are re-processed with Enhanced mode. Clean PDFs keep their accurate Simple Search results.
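The selective re-processing described above can be pictured like this. It is a sketch under assumptions, not the tool's code: `enhance_problematic` and `run_enhanced` are hypothetical names, and results are modeled as plain dictionaries of keyword counts.

```python
def enhance_problematic(simple_results, flagged, run_enhanced):
    """Re-run Enhanced Search only for PDFs that were flagged with quality issues.

    `simple_results` maps a PDF name to its Simple Search results and is
    left untouched. The Enhanced results come back in a separate dict,
    mirroring how the two result sets are shown separately.
    """
    return {
        name: run_enhanced(name)
        for name in simple_results
        if name in flagged
    }


simple = {"clean.pdf": {"obesity": 3}, "scan.pdf": {"obesity": 0}}
flagged = {"scan.pdf"}  # PDFs that received a quality warning

# Stand-in for the real Enhanced Search, used here only for illustration.
enhanced = enhance_problematic(simple, flagged, lambda name: {"obesity": 4})
```

Here only scan.pdf is re-processed; clean.pdf keeps its Simple Search result.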
📊 Understanding Results
Report Information
- Matches Found: Total number of keyword occurrences
- Pages: Which pages contain each keyword
- Quality Score: How confident the tool is in the results (0-100%)
Download Formats
- Text Report: Human-readable summary
- JSON: Machine-readable data for further processing
- CSV: Spreadsheet format for analysis (opens in Excel)
- ZIP: All formats bundled together (batch processing)
💡 Best Practices
For Best Results:
- Use clean PDFs: Create PDFs using "Print to PDF" from Word, not "Save as PDF"
- Remove track changes: Accept all changes and delete comments before creating PDF
- Avoid scanned documents: Use original digital files when possible
- Test your PDFs: If you can't copy/paste text normally, recreate the PDF
Keyword Tips:
- Use specific terms rather than very common words
- Include variations: Both "behavior" and "behaviour"
- Hyphenated words are handled automatically:
  - Searching "machine-learning" will find both "machine-learning" and "machinelearning"
  - Searching "machine" will also find "machine-learning" and "machine learning"
  - Words split across PDF line breaks are matched too (e.g., "div-ersity" or "div- ersity" for "diversity")
- Searches are case-insensitive:
  - Searching "OBESITY" will find "obesity", "Obesity", and "OBESITY"
  - Searching "DNA" will find "dna", "Dna", and "DNA"
  - Mixed case like "COVID-19" works the same as "covid-19"
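One way to get the hyphen and case behavior described above is to normalize both the text and the keyword before comparing them. The sketch below is an assumption about how such matching could work, not the tool's actual implementation, and the function names are hypothetical.

```python
import re

# A hyphen between two word characters, plus any whitespace (including a
# line break) that follows it.
_HYPHEN = re.compile(r"(?<=\w)-\s*(?=\w)")


def normalize(text):
    """Lowercase the text and remove in-word hyphens and line-break splits."""
    return _HYPHEN.sub("", text).lower()


def count_matches(text, keyword):
    """Count non-overlapping occurrences of `keyword` after normalizing both sides."""
    needle = normalize(keyword)
    if not needle:
        return 0
    return normalize(text).count(needle)


count_matches("Machine-learning and machinelearning", "machine-learning")  # 2
count_matches("div-ersity and div- ersity", "diversity")                   # 2
count_matches("Obesity, OBESITY, obesity", "OBESITY")                      # 3
```

Matching is by substring here, so "machine" is also found inside "machine-learning" and "machine learning".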
❓ Troubleshooting
Keywords Not Found?
- First, check whether the PDF has quality warnings - keywords may be missed in low-quality PDFs
- Try Enhanced Search mode - Either from the start or use the post-search enhancement button
- Verify keyword spelling - Check for typos or alternative spellings
- Test PDF manually - If you can't copy/paste text from the PDF, keywords won't be found
- Check hyphenation - Tool handles this automatically, but verify the keyword exists as expected
Too Many Matches?
- Enhanced mode inflates counts - This is expected behavior when using Enhanced Search
- Use Simple Search for accuracy - Provides precise counts for normal PDFs
- Compare modes - Use Simple first, then Enhanced to compare results
- Recreate problematic PDFs - Best long-term solution for accurate results
Which Search Mode Should I Use?
- Start with Simple Search - Works well for 90% of PDFs
- Switch to Enhanced if:
  - Simple Search shows quality warnings
  - You know keywords exist but aren't found
  - PDFs are scanned or image-heavy
  - PDFs have complex formatting or layers
- Use Progressive Enhancement - Simple Search first, then enhance only problematic PDFs
Upload Issues?
- Check file size (100MB total limit)
- Ensure files are actually PDF format
- Try uploading files one at a time to isolate issues
Still having issues? The tool is designed to handle most PDF types, but some heavily formatted or corrupted files may not work well. When in doubt, recreate the PDF from the original source document.