Complete Metadata Extraction
View all standard and custom metadata fields: author, creation date, modification date, PDF producer, software version, and custom keys (e.g., document ID, copyright, classification). Identify when and how the PDF was created.
- Author, title, subject, keywords
- Creation & modification timestamps (including timezone)
- Custom XMP metadata and hidden properties
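As an illustration of the timestamp handling above, here is a minimal sketch that converts a PDF date string (the `D:YYYYMMDDHHmmSS+HH'mm'` form used by creation and modification dates in the document information dictionary) into a timezone-aware `datetime`. The function name `parse_pdf_date` is hypothetical, and the sketch assumes the raw string has already been read from the file.

```python
import re
from datetime import datetime, timedelta, timezone

# Everything after the four-digit year is optional in a PDF date string.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(raw: str) -> datetime:
    """Parse a PDF date string; returns a naive datetime if no offset is given."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {raw!r}")
    g = match.groupdict()

    tz = None
    if g["sign"] == "Z":
        tz = timezone.utc
    elif g["sign"] in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(-offset if g["sign"] == "-" else offset)

    # Missing fields fall back to the first month/day and to midnight.
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tz,
    )


created = parse_pdf_date("D:20240115103000+05'30'")
print(created.isoformat())
```

A missing offset is kept as a naive value rather than assumed to be UTC, so a report can show that the producer did not record a timezone.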
Text & Content Analysis
Extract all text from the PDF with position information. Analyze word count, character count, font usage, and reading difficulty. Detect whether each page has a searchable text layer or is only a scanned image. Identify hidden or invisible text.
- Full text extraction with page-by-page breakdown
- Detect OCR quality and text layer presence
- Highlight invisible or white-on-white hidden text
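One way the hidden-text check above could work is sketched below. It assumes text has already been extracted into spans that carry a text rendering mode and a fill color; the `TextSpan` record and `hidden_reason` function are hypothetical names for illustration. Rendering mode 3 is the PDF mode that neither fills nor strokes glyphs, so the text is present but not painted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextSpan:
    """Hypothetical record for one run of extracted text."""
    text: str
    page: int
    render_mode: int                      # PDF text rendering mode; 3 = invisible
    fill_rgb: tuple                       # (r, g, b), each in 0.0-1.0
    background_rgb: tuple = (1.0, 1.0, 1.0)  # assumed white page by default


def hidden_reason(span: TextSpan, tolerance: float = 0.02) -> Optional[str]:
    """Return why a span is likely invisible to a reader, or None if it is visible."""
    if span.render_mode == 3:
        return "invisible render mode"
    same_color = all(
        abs(fill - back) <= tolerance
        for fill, back in zip(span.fill_rgb, span.background_rgb)
    )
    if same_color:
        return "text color matches background"
    return None


spans = [
    TextSpan("Quarterly report", 1, 0, (0.0, 0.0, 0.0)),
    TextSpan("internal draft", 1, 0, (1.0, 1.0, 1.0)),
    TextSpan("ocr layer text", 2, 3, (0.0, 0.0, 0.0)),
]
for span in spans:
    reason = hidden_reason(span)
    if reason:
        print(f"page {span.page}: {span.text!r} ({reason})")
```

Invisible text is not always suspicious: OCR layers placed over scanned pages normally use rendering mode 3, so a flag here is a prompt for review rather than a verdict.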
Embedded Images & Objects
List every image inside the PDF: encoding (JPEG, JPEG 2000, CCITT, JBIG2, Flate), resolution, color space, compression level, and byte size. Detect embedded videos, 3D objects, JavaScript, or attachments – crucial for security audits.
- Image count, dimensions, DPI, compression type
- Identify suspicious embedded files or scripts
- Extract and preview images inline
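The DPI figure listed above is not stored in the image itself; it follows from the image's pixel dimensions and the size at which it is placed on the page. A minimal sketch of that arithmetic, assuming the placed size is already known in points (1 inch = 72 points); the function name `effective_dpi` is hypothetical.

```python
POINTS_PER_INCH = 72.0


def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Return (horizontal, vertical) DPI of an image as placed on the page."""
    if placed_width_pt <= 0 or placed_height_pt <= 0:
        raise ValueError("placed size must be positive")
    return (
        pixel_width * POINTS_PER_INCH / placed_width_pt,
        pixel_height * POINTS_PER_INCH / placed_height_pt,
    )


# A 1500 x 2100 pixel image placed at 360 x 504 points (5 x 7 inches).
print(effective_dpi(1500, 2100, 360, 504))  # (300.0, 300.0)
```

The same image placed at twice the size would report 150 DPI, which is why one image can have different effective resolutions on different pages.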
Font & Typography Deep Dive
Discover all fonts used in the document – including embedded, subset, and system fonts. Check for missing fonts, identify each font's type (TrueType, Type 1, OpenType), and see which font actually renders which text.
- List of font names, types, and embedding status
- Detect font substitution risks (for print reliability)
- Verify whether fonts are fully embedded (important for archiving)
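Subset fonts can usually be recognized by name alone: the PDF convention is to prefix a subset font's name with a tag of six uppercase letters and a plus sign, as in `ABCDEF+Helvetica`. The sketch below classifies embedding status from that convention; the `(name, embedded)` input pairs and both function names are hypothetical, standing in for whatever a PDF parser reports.

```python
import re

_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None when the name has no subset prefix."""
    match = _SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


def embedding_status(font_name, embedded):
    """Classify a font as 'not embedded', 'embedded subset', or 'fully embedded'."""
    if not embedded:
        return "not embedded"
    tag, _ = split_subset_tag(font_name)
    return "embedded subset" if tag else "fully embedded"


fonts = [("ABCDEF+Helvetica", True), ("TimesNewRoman", True), ("Arial", False)]
for name, embedded in fonts:
    print(split_subset_tag(name)[1], "->", embedding_status(name, embedded))
```

Fonts reported as "not embedded" are the ones at risk of substitution on another machine or at the print shop.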
Document Structure & Navigation
Analyze bookmarks (outline tree), page labels, logical page order, article threads, and internal/external links. Understand how the document is organized – essential for e-book validation.
- Bookmark hierarchy and target page numbers
- Broken internal links detection
- Page transition effects and presentation settings
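Broken-target detection can be sketched as a walk over the bookmark tree that checks every target against the page count. The nested-dictionary layout below (`title`, `page`, `children`) is a hypothetical representation of an already-parsed outline, not the format of any particular library.

```python
def find_broken_targets(outline, page_count, path=()):
    """Return (bookmark path, target) pairs whose target page does not exist."""
    broken = []
    for entry in outline:
        here = path + (entry["title"],)
        page = entry.get("page")  # 1-based page number, or None if unresolved
        if page is None or not 1 <= page <= page_count:
            broken.append((" > ".join(here), page))
        # Recurse into nested bookmarks, carrying the path for readable reports.
        broken.extend(find_broken_targets(entry.get("children", []), page_count, here))
    return broken


outline = [
    {"title": "Part I", "page": 1, "children": [
        {"title": "Chapter 1", "page": 3},
        {"title": "Chapter 2", "page": 42},
    ]},
    {"title": "Appendix", "page": None},
]
print(find_broken_targets(outline, page_count=10))
```

The same check applies to internal link annotations, since they also resolve to a page that may have been deleted in a later edit.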
Security & Hidden Risk Detection
Check for encryption, password protection, and permission flags (printing, copying, editing). Detect potentially malicious elements: JavaScript, launch actions, embedded files, or forms that submit data to external servers – critical for zero-trust document workflows.
- Encryption level (RC4, AES-128/256) and password presence
- Flag suspicious actions (URI, JavaScript, SubmitForm)
- Identify PDF/A compliance and digital signatures
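A first-pass triage for the elements above can be done without a full parser by counting risky name tokens in the raw bytes. This is a rough sketch under stated assumptions: the keyword list and function names are illustrative choices, and the scan cannot see inside compressed object streams, so a clean result is not proof of safety.

```python
import re

# Illustrative keyword list; extend it to match your own audit policy.
SUSPICIOUS_NAMES = [
    b"/JavaScript", b"/JS", b"/Launch", b"/OpenAction", b"/AA",
    b"/SubmitForm", b"/EmbeddedFile", b"/URI", b"/Encrypt",
]


def _decode_name_escapes(data):
    """Undo #xx hex escapes, which can disguise names such as /J#61vaScript."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)


def count_suspicious_names(data):
    """Count occurrences of each risky name token in raw PDF bytes."""
    normalized = _decode_name_escapes(data)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops /JS from matching a longer name such as /JSFoo.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9_.\-])"
        counts[name.decode()] = len(re.findall(pattern, normalized))
    return counts


sample = b"<< /Type /Action /S /J#61vaScript /JS (app.alert(1)) >> << /OpenAction 5 0 R >>"
flagged = {name: n for name, n in count_suspicious_names(sample).items() if n}
print(flagged)
```

For a real file, pass the bytes read with `open(path, "rb").read()`. Binary stream data can produce false positives, so treat nonzero counts as a reason to inspect the document with a proper parser.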
Form Fields & Annotation Analysis
Extract all interactive form fields: text inputs, checkboxes, radio buttons, dropdowns, and signature fields. See field names, default values, validation scripts, and calculation order.
- Count and list all form fields per page
- Detect hidden fields or pre-filled data
- Analyze annotation types (sticky notes, highlights, stamps)
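Hidden fields are typically marked through the annotation flags integer (the `/F` entry) on the field's widget. The sketch below decodes that bit field using the flag positions defined in the PDF specification; the function names are hypothetical, and reading the integer out of the file is assumed to have happened already.

```python
# Annotation flag bits as defined in the PDF specification (bit 1 is value 1).
ANNOTATION_FLAGS = {
    1: "Invisible",
    2: "Hidden",
    4: "Print",
    8: "NoZoom",
    16: "NoRotate",
    32: "NoView",
    64: "ReadOnly",
    128: "Locked",
}


def decode_annotation_flags(value):
    """Return the names of all flags set in an annotation's /F integer."""
    return [name for bit, name in ANNOTATION_FLAGS.items() if value & bit]


def is_hidden_on_screen(value):
    """True if the annotation is not shown on screen (Hidden or NoView set)."""
    return bool(value & (2 | 32))


print(decode_annotation_flags(6))   # ['Hidden', 'Print']
print(is_hidden_on_screen(6))       # True
print(is_hidden_on_screen(4))       # False
```

A field with NoView and Print both set is invisible on screen but appears on paper, which is worth surfacing separately in a review.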
Page Dimensions & Quality Metrics
Get detailed per-page statistics: page size (e.g., A4, Letter), orientation, rotation, content complexity, number of objects, compression efficiency, and estimated file size per page.
- Page dimensions in points, mm, inches
- Identify unusually large pages (a common cause of slow rendering)
- Detect mixed page sizes in one document
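The unit conversions and the mixed-size check above come down to simple arithmetic on page boxes measured in points (72 per inch, 25.4 mm per inch). A minimal sketch, assuming page widths and heights are already available in points and ignoring any page rotation entry; the function names and the short table of named sizes are illustrative.

```python
MM_PER_POINT = 25.4 / 72

# Named sizes as (short side, long side) in millimetres; extend as needed.
NAMED_SIZES_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}


def describe_page(width_pt, height_pt, tolerance_mm=1.0):
    """Summarize one page's size in points, mm, and inches, plus a size name."""
    width_mm, height_mm = width_pt * MM_PER_POINT, height_pt * MM_PER_POINT
    short, long_ = sorted((width_mm, height_mm))
    name = next(
        (label for label, (s, l) in NAMED_SIZES_MM.items()
         if abs(short - s) <= tolerance_mm and abs(long_ - l) <= tolerance_mm),
        None,
    )
    return {
        "points": (width_pt, height_pt),
        "mm": (round(width_mm, 1), round(height_mm, 1)),
        "inches": (round(width_pt / 72, 2), round(height_pt / 72, 2)),
        "orientation": "portrait" if height_pt >= width_pt else "landscape",
        "name": name,
    }


def has_mixed_sizes(pages):
    """True if the document contains more than one page size (whole points)."""
    return len({(round(w), round(h)) for w, h in pages}) > 1


print(describe_page(612, 792))
print(has_mixed_sizes([(612, 792), (612, 792), (595, 842)]))  # True
```

Matching with a tolerance matters because producers round A4 differently (595 x 842 points is common, while the exact value is fractional).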
Document Comparison (Version Diff)
Upload two versions of a PDF and instantly visualize differences: added/deleted text, moved images, changed metadata, or altered annotations. Ideal for contract review and revision tracking.
- Text-level diff highlighting (add/remove/modify)
- Metadata and structure comparison
- Export comparison report as JSON or HTML
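A text-level diff with a JSON export can be approximated with Python's standard `difflib` module. This is only a sketch of the idea, assuming the text of both versions has already been extracted; the function name `diff_text` and the report layout are hypothetical, not the tool's actual output format.

```python
import difflib
import json

_OP_NAMES = {"insert": "add", "delete": "remove", "replace": "modify"}


def diff_text(old, new):
    """Return word-level changes between two extracted text strings."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged runs are not part of the report
        changes.append({
            "op": _OP_NAMES[tag],
            "old": " ".join(old_words[i1:i2]),
            "new": " ".join(new_words[j1:j2]),
        })
    return changes


old_version = "Payment is due within 30 days"
new_version = "Payment is due within 45 days of invoice"
print(json.dumps(diff_text(old_version, new_version), indent=2))
```

Comparing words rather than characters keeps the report readable for contract review, where a changed number or name is the unit a reviewer cares about.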
Best Practices for PDF Analysis
Always analyze PDFs from untrusted sources before opening them in a viewer. Treat metadata as one signal of document authenticity rather than proof, since it is easy to edit. For e-books, check text layer quality and font embedding. For legal documents, run security audits to detect hidden edits.
- Scan suspicious PDFs for JavaScript and launch actions
- Validate PDF/A compliance for long-term archiving
- Compare signed vs unsigned versions to detect tampering
- Run analysis before redaction to locate all sensitive data, including hidden text and metadata