Documents Feature Guide

Last Updated: November 15, 2025
Feature Status: ✅ Production Ready
User Role: Organization Owner/Admin

Overview

The Documents feature allows you to upload various file types (PDFs, Word documents, text files, etc.) to your chatbot's knowledge base. Once uploaded, your AI assistant can search through these documents to answer customer questions with accurate, contextual information.

Purpose

Why Upload Documents?

Provide your chatbot with company-specific knowledge
Answer questions based on your policies, procedures, and documentation
Reduce manual responses by automating FAQs from existing documents
Keep information up-to-date without retraining the AI

Use Cases:

Company policies and procedures
Product manuals and specifications
FAQs and help articles
Training materials
Legal documents (terms, privacy policy)
Standard operating procedures (SOPs)

How It Works

1. Document Upload Process

graph LR
    A[Upload File] --> B[Parse Content]
    B --> C[Extract Text]
    C --> D[Process with AI]
    D --> E[Index & Store]
    E --> F[Ready for Search]

Step-by-Step:

Upload: User uploads file via UI
Parse: System extracts text from file format (PDF, DOCX, etc.)
Process: AI processes document using LlamaParse or simple text extraction
Index: Document content is indexed for fast search
Store: Metadata and embeddings stored in database
Ready: Chatbot can now search this document when responding

2. Document Search During Chat

graph TB
    A[User Asks Question] --> B[Chatbot Analyzes Query]
    B --> C[Search Documents]
    C --> D{Relevant Docs?}
    D -->|Yes| E[Use Document Context]
    D -->|No| F[Use Base Knowledge]
    E --> G[Generate Response]
    F --> G

How Search Works:

User sends a message to chatbot
AI analyzes the question
Searches uploaded documents for relevant information
Uses document context to formulate accurate response
Responds with information from your documents

Accessing the Documents Feature

Navigation

Go to AI Assistants page
Click on your chatbot
Navigate to Settings
Click Knowledge Base tab (shows document management)

Direct URL: /ai-assistants/[chatbot-id]/settings/documents

Uploading Documents

Supported File Types

File Type	Extension	Max Size	Notes
PDF	`.pdf`	10MB	Best for formatted documents
Word Document	`.docx`, `.doc`	10MB	Preserves formatting
Text File	`.txt`	5MB	Plain text only
Markdown	`.md`	5MB	Formatted text
CSV	`.csv`	5MB	Data tables
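
The limits in the table can be checked before a file is sent. The following is a minimal sketch, not the product's actual validation: `validateUpload` and `MAX_SIZE_BY_EXTENSION` are hypothetical names, and treating 1MB as 1024 × 1024 bytes is an assumption. Subscription tiers may impose lower limits than the table.

```javascript
// Limits copied from the table above; 1MB = 1024 * 1024 bytes is an assumption.
const MB = 1024 * 1024;
const MAX_SIZE_BY_EXTENSION = {
  ".pdf": 10 * MB,
  ".docx": 10 * MB,
  ".doc": 10 * MB,
  ".txt": 5 * MB,
  ".md": 5 * MB,
  ".csv": 5 * MB,
};

// Hypothetical helper: returns { ok, reason } for a file name and size in bytes.
function validateUpload(fileName, sizeInBytes) {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  const maxSize = MAX_SIZE_BY_EXTENSION[extension];
  if (maxSize === undefined) {
    return { ok: false, reason: `Unsupported file type: ${extension || "(none)"}` };
  }
  if (sizeInBytes > maxSize) {
    return { ok: false, reason: `File exceeds the ${maxSize / MB}MB limit for ${extension}` };
  }
  return { ok: true, reason: null };
}
```

Running a check like this in the browser avoids a round trip for files the server would probably reject with 413 or 415.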

Upload Methods

Method 1: File Upload (Recommended)

Steps:

Click Upload Document button
Select file from your computer
Wait for upload and processing
Document appears in list when ready

Processing Time:

Small files (<1MB): 5-10 seconds
Medium files (1-5MB): 30-60 seconds
Large files (5-10MB): 1-3 minutes

Status Indicators:

🔄 Parsing: Extracting text from file
⏳ Processing: AI is analyzing content
✅ Ready: Document is searchable
❌ Failed: Processing error occurred

Method 2: URL Import

Steps:

Click Import from URL
Paste document URL
System downloads and processes
Document added to list

Supported URLs:

Public PDFs
Google Docs (public view)
Publicly accessible text files

Method 3: Copy & Paste

Steps:

Click Add Text
Paste content directly
Give it a title
Save

Best For:

Quick FAQ additions
Copied content from websites
Short policies or procedures

Managing Documents

Document List View

Columns:

Name: Document filename
Type: File extension
Size: File size
Status: Processing status
Uploaded: Date/time uploaded
Actions: View, Download, Edit Metadata, Delete

Document Actions

View Document

What It Shows:

Original filename
File size
Upload date
Processing status
Extracted text preview
Metadata (pages, word count, etc.)

Use Case: Verify content was extracted correctly

Download Document

Function: Download original uploaded file

Use Case: Get a copy of the original file you uploaded

Edit Metadata

What You Can Edit:

Document title (for display)
Description (internal note)
Tags (for organization)

Cannot Edit: Extracted content (re-upload to change)

Delete Document

What Happens:

Document removed from knowledge base
Chatbot can no longer search this document
Original file deleted from storage
Cannot be undone

Confirmation Required: Yes (type document name to confirm)

Document Processing

How Documents Are Processed

Simple Text Extraction

Used For: Plain text files, simple PDFs

Process:

Extract text from file
Split into chunks (for efficient search)
Store chunks in database
Create searchable index

Advantages:

Fast processing
No external dependencies
Works offline
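
The chunking step above can be sketched as a fixed-size splitter with overlap. This guide does not state the actual chunk size or splitting strategy, so `chunkText`, the 500-character size, and the 50-character overlap are all illustrative assumptions.

```javascript
// Hypothetical splitter: fixed-size character chunks with overlap.
// chunkSize and overlap defaults are assumptions, not documented values.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      index: chunks.length,
      start,
      content: text.slice(start, start + chunkSize),
    });
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.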

LlamaParse Cloud (Advanced)

Used For: Complex PDFs with tables, images, forms

Process:

Upload to LlamaParse Cloud service
AI parses complex layouts
Extracts tables, text, metadata
Returns structured data
Indexed and stored

Advantages:

Better accuracy for complex documents
Handles tables and forms
Preserves document structure

Requirements:

LlamaParse API key
Internet connection

Configuration: Set in .env:

LLAMA_CLOUD_API_KEY=your_api_key_here

Processing Status

Status Codes

Status	Meaning	Action Required
`pending`	Waiting to process	Wait
`parsing`	Extracting text	Wait
`processing`	AI analyzing	Wait
`ready`	Available for search	None
`failed`	Error occurred	Check logs, re-upload
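
Client code usually needs to know which statuses are final. The sketch below encodes the table as a lookup; `STATUS_INFO`, `describeStatus`, and `isTerminal` are hypothetical names, and treating `ready` and `failed` as the only terminal states is an inference from the table rather than documented behavior.

```javascript
// Status table from above. The `terminal` flag is an inference: only
// "ready" and "failed" have an action other than "Wait".
const STATUS_INFO = new Map([
  ["pending", { meaning: "Waiting to process", action: "Wait", terminal: false }],
  ["parsing", { meaning: "Extracting text", action: "Wait", terminal: false }],
  ["processing", { meaning: "AI analyzing", action: "Wait", terminal: false }],
  ["ready", { meaning: "Available for search", action: "None", terminal: true }],
  ["failed", { meaning: "Error occurred", action: "Check logs, re-upload", terminal: true }],
]);

// Look up a status; unknown values are rejected rather than silently ignored.
function describeStatus(status) {
  const info = STATUS_INFO.get(status);
  if (!info) throw new Error(`Unknown status: ${status}`);
  return info;
}

// True when no further processing is expected for this document.
function isTerminal(status) {
  return describeStatus(status).terminal;
}
```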

Webhook Notifications

How It Works:

Document uploaded
Processing starts asynchronously
Webhook sent to /api/knowledge/webhook/parse-complete
Status updated in database
UI refreshes to show "ready"

Endpoint: POST /api/knowledge/webhook/parse-complete

Payload:

{
  "documentId": "uuid",
  "status": "ready",
  "extractedText": "...",
  "metadata": {
    "pages": 10,
    "wordCount": 5000
  }
}
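
A webhook receiver would typically validate the payload before updating the database. The following is a minimal sketch under stated assumptions: `validateParseCompletePayload` is a hypothetical helper, and which fields are required is not specified in this guide, so the rules below (non-empty `documentId`, known `status`, `extractedText` present when the status is `ready`) are guesses.

```javascript
const KNOWN_STATUSES = new Set(["pending", "parsing", "processing", "ready", "failed"]);

// Hypothetical validator for the parse-complete payload shown above.
// The required-field rules are assumptions, not documented behavior.
function validateParseCompletePayload(payload) {
  if (payload === null || typeof payload !== "object") {
    return { valid: false, errors: ["payload must be an object"] };
  }
  const errors = [];
  if (typeof payload.documentId !== "string" || payload.documentId.length === 0) {
    errors.push("documentId must be a non-empty string");
  }
  if (!KNOWN_STATUSES.has(payload.status)) {
    errors.push(`unknown status: ${String(payload.status)}`);
  }
  if (payload.status === "ready" && typeof payload.extractedText !== "string") {
    errors.push("extractedText must be a string when status is ready");
  }
  if (payload.metadata !== undefined &&
      (payload.metadata === null || typeof payload.metadata !== "object")) {
    errors.push("metadata must be an object when present");
  }
  return { valid: errors.length === 0, errors };
}
```

Because the endpoint accepts POST requests that change document state, a production handler would also want to authenticate the caller; how that is done is outside what this guide describes.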

Search & Retrieval

How Chatbot Searches Documents

Vector Search (Default)

Process:

User question converted to embedding (vector)
Search for similar document chunks
Rank by relevance score
Return top N results
Use in AI response

Example:

User: "What's your return policy?"
→ Converts the question to an embedding and finds chunks with similar meaning (e.g., text about returns, refunds, exchanges)
→ Finds "Return Policy" document
→ Extracts relevant section
→ Chatbot responds: "Our return policy allows..."
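
The ranking step can be illustrated with cosine similarity over toy vectors. This guide does not say which similarity metric or embedding model is used, so cosine similarity, the two-dimensional vectors, and the names `cosineSimilarity` and `topMatches` are assumptions for illustration only.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the top `limit`.
function topMatches(queryEmbedding, chunks, limit = 5) {
  return chunks
    .map((chunk) => ({
      ...chunk,
      relevanceScore: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.relevanceScore - x.relevanceScore)
    .slice(0, limit);
}

// Toy data: real embeddings come from a model and have hundreds of dimensions.
const sampleChunks = [
  { content: "We accept returns within 30 days...", embedding: [1, 0] },
  { content: "Orders ship within 2 business days...", embedding: [0, 1] },
  { content: "Refunds are issued to the original payment method...", embedding: [1, 1] },
];
const matches = topMatches([1, 0], sampleChunks, 2);
console.log(matches.map((m) => m.content));
```

A linear scan like this is only practical for small collections; a vector database typically uses an approximate nearest-neighbor index instead.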

Keyword Search (Fallback)

Used When: Vector search finds nothing

Process:

Extract keywords from question
Full-text search on document content
Return matching documents
Use in AI response
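
The fallback steps above can be sketched with a stop-word filter and substring matching. This is an illustration, not the product's search: `extractKeywords`, `keywordSearch`, and the stop-word list are hypothetical, the tokenizer handles unaccented English only, and plain substring matching stands in for a real full-text index.

```javascript
// Small illustrative stop-word list; a real system would use a fuller one.
const STOP_WORDS = new Set([
  "a", "an", "the", "is", "are", "what", "whats", "your", "our", "my",
  "do", "does", "how", "i", "to", "of", "for", "and", "or", "in", "on",
]);

// Lowercase, strip punctuation, split on whitespace, drop stop words.
function extractKeywords(question) {
  return question
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .split(/\s+/)
    .filter((word) => word.length > 1 && !STOP_WORDS.has(word));
}

// Rank documents by how many distinct keywords appear in their content.
function keywordSearch(question, documents) {
  const keywords = extractKeywords(question);
  return documents
    .map((doc) => {
      const text = doc.content.toLowerCase();
      const hits = keywords.filter((keyword) => text.includes(keyword)).length;
      return { documentName: doc.name, hits };
    })
    .filter((result) => result.hits > 0)
    .sort((a, b) => b.hits - a.hits);
}
```

For example, `extractKeywords("What's your return policy?")` yields `["return", "policy"]`, which would match a document containing "Our return policy allows...".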

Search Endpoint

API Route: GET /api/knowledge/search

Query Parameters:

query (string): Search query
knowledgeBaseId (uuid): Which KB to search
limit (number): Max results (default 5)

Response:

{
  "results": [
    {
      "documentId": "uuid",
      "documentName": "Return Policy.pdf",
      "content": "We accept returns within 30 days...",
      "relevanceScore": 0.92,
      "metadata": {
        "page": 2,
        "section": "Returns"
      }
    }
  ]
}
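
A request URL for this endpoint can be assembled with the standard `URL` API. This is a minimal sketch: `buildSearchUrl` is a hypothetical helper, `https://example.com` is a placeholder host, and treating `query` and `knowledgeBaseId` as required is an assumption.

```javascript
// Hypothetical helper that builds the search URL from the documented parameters.
function buildSearchUrl(baseUrl, { query, knowledgeBaseId, limit = 5 }) {
  if (!query || !knowledgeBaseId) {
    throw new Error("query and knowledgeBaseId are required");
  }
  const url = new URL("/api/knowledge/search", baseUrl);
  // searchParams handles percent-encoding of spaces and special characters.
  url.searchParams.set("query", query);
  url.searchParams.set("knowledgeBaseId", knowledgeBaseId);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

const searchUrl = buildSearchUrl("https://example.com", {
  query: "return policy",
  knowledgeBaseId: "kb-123", // placeholder; real IDs are UUIDs
});
console.log(searchUrl);
```

The resulting string can be passed to `fetch`, and the `results` array in the response read as shown above.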

API Reference

Upload Document

Endpoint: POST /api/knowledge/upload-document

Content-Type: multipart/form-data

Body:

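// fileBlob is the File or Blob to upload; kbId is the target knowledge base ID.
const formData = new FormData();
formData.append('file', fileBlob);
formData.append('knowledgeBaseId', kbId);
formData.append('title', 'Optional Title'); // optional display title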

Response:

{
  "success": true,
  "document": {
    "id": "uuid",
    "name": "document.pdf",
    "size": 1024000,
    "status": "pending",
    "createdAt": "2025-11-15T10:00:00Z"
  }
}

Errors:

400: Missing file or KB ID
413: File too large
415: Unsupported file type
500: Upload failed
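
The error codes above can be translated into user-facing messages on the client. The sketch below is illustrative: `describeUploadError` is a hypothetical helper, the message wording is invented, and marking only 500 as worth retrying is an assumption.

```javascript
// Messages keyed by the documented HTTP status codes. The `retryable`
// flag is an assumption: client errors need a different file or request.
const UPLOAD_ERRORS = new Map([
  [400, { message: "A file and a knowledge base ID are both required.", retryable: false }],
  [413, { message: "The file is larger than the allowed size.", retryable: false }],
  [415, { message: "This file type is not supported.", retryable: false }],
  [500, { message: "The upload failed on the server. Please try again.", retryable: true }],
]);

// Hypothetical helper: map a response status code to a message and retry hint.
function describeUploadError(statusCode) {
  return (
    UPLOAD_ERRORS.get(statusCode) ?? {
      message: `Unexpected error (HTTP ${statusCode}).`,
      retryable: false,
    }
  );
}
```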

List Documents

Endpoint: GET /api/knowledge/documents?knowledgeBaseId={id}

Response:

{
  "documents": [
    {
      "id": "uuid",
      "name": "document.pdf",
      "type": "pdf",
      "size": 1024000,
      "status": "ready",
      "uploadedAt": "2025-11-15T10:00:00Z"
    }
  ],
  "total": 10
}

Delete Document

Endpoint: DELETE /api/knowledge/documents/{id}

Response:

{
  "success": true,
  "message": "Document deleted successfully"
}

Check Processing Status

Endpoint: GET /api/knowledge/check-status/{documentId}

Response:

{
  "status": "ready",
  "progress": 100,
  "metadata": {
    "pages": 10,
    "wordCount": 5000
  }
}
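
When webhooks are not available to a client, the status endpoint can be polled until processing finishes. This is a sketch under stated assumptions: `waitForDocument` is a hypothetical helper, the caller supplies `fetchStatus` (a function that requests the endpoint above and returns the parsed JSON), and the 2-second interval and 30-attempt cap are arbitrary defaults.

```javascript
// Poll until the document reaches "ready" or "failed", or give up.
// fetchStatus(documentId) must return a promise for { status, ... }.
async function waitForDocument(documentId, fetchStatus, options = {}) {
  const { intervalMs = 2000, maxAttempts = 30 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchStatus(documentId);
    if (result.status === "ready" || result.status === "failed") {
      return result;
    }
    // Sleep between attempts, but not after the final one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Document ${documentId} is still processing after ${maxAttempts} checks`);
}
```

In a browser, `fetchStatus` could be as small as a function that calls `fetch` on `/api/knowledge/check-status/` followed by the document ID and returns `response.json()`. Injecting it keeps the polling logic testable without a server.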

Best Practices

Document Organization

Tips:

Use Descriptive Names: "Return_Policy_2025.pdf" is better than "doc1.pdf"
Add Descriptions: Help identify documents later
Tag Documents: Group related docs (e.g., "policies", "products")
Regular Updates: Replace outdated documents with new versions

Content Guidelines

What Works Best:

✅ Clear, well-structured content
✅ Headings and sections
✅ Factual information
✅ FAQs in Q&A format
✅ Consistent terminology

What Doesn't Work Well:

❌ Scanned images without OCR
❌ Heavily formatted tables (use CSV instead)
❌ Password-protected files
❌ Corrupted files

Performance Optimization

For Faster Search:

Break Large Docs: Split 100-page PDFs into sections
Remove Fluff: Upload only relevant content
Update Regularly: Remove outdated docs
Use Tags: Helps narrow search scope

Limits:

Max documents: Based on subscription tier (see Subscription Limits)
Max file size: Based on file type and subscription tier (10MB on Pro)
Max total storage: Based on subscription tier

Troubleshooting

Document Won't Upload

Issue: Upload fails immediately

Solutions:

Check file size against the limit for the file type and your subscription tier
Verify file type is supported
Try a different browser
Check internet connection
Contact support if the problem persists

Processing Stuck

Issue: Document stays in "processing" status

Solutions:

Wait 5 minutes (large files take time)
Refresh page
Check webhook endpoint is accessible
Review server logs
Re-upload document

Search Not Finding Content

Issue: Chatbot doesn't use document

Solutions:

Verify document status is "ready"
Check document content (View action)
Test with exact phrases from document
Ensure KB is linked to chatbot
Try different wording in question

Extracted Text Looks Wrong

Issue: Text garbled or missing

Solutions:

Use LlamaParse for complex PDFs
Convert to text format first
Use OCR for scanned images
Try different file format
Manually copy/paste content

Security & Privacy

Data Storage

Where Documents Are Stored:

Files: Supabase Storage (encrypted at rest)
Text Content: PostgreSQL database
Embeddings: Vector database

Access Control:

Only organization members can view
No cross-organization access
Admins can manage all documents

Data Deletion

What Gets Deleted:

Original file from storage
Extracted text from database
All embeddings and vectors
Document metadata

Permanent: Cannot be recovered

Compliance

GDPR:

Users can request document deletion
Export feature available
Audit logs tracked

Data Retention:

Documents stored until manually deleted
No automatic expiration
Can set custom retention policies

Subscription Limits

Free Tier

Documents: 10 documents
Max File Size: 5MB
Total Storage: 50MB
Processing: Simple extraction only

Pro Tier

Documents: 100 documents
Max File Size: 10MB
Total Storage: 1GB
Processing: LlamaParse included

Enterprise Tier

Documents: Unlimited
Max File Size: 50MB
Total Storage: Unlimited
Processing: Priority processing
Extras: Custom integrations
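
The tier limits above can be expressed as a lookup that is checked before an upload. This is a sketch, not the product's enforcement logic: `TIER_LIMITS` and `canUpload` are hypothetical names, the tier keys are invented, and 1GB is taken as 1024MB.

```javascript
// Values copied from the tier lists above; 1GB = 1024MB is an assumption.
const TIER_LIMITS = {
  free: { maxDocuments: 10, maxFileSizeMB: 5, totalStorageMB: 50 },
  pro: { maxDocuments: 100, maxFileSizeMB: 10, totalStorageMB: 1024 },
  enterprise: { maxDocuments: Infinity, maxFileSizeMB: 50, totalStorageMB: Infinity },
};

// Hypothetical check: usage is { documentCount, storageUsedMB }.
function canUpload(tier, usage, fileSizeMB) {
  const limits = Object.prototype.hasOwnProperty.call(TIER_LIMITS, tier)
    ? TIER_LIMITS[tier]
    : null;
  if (!limits) throw new Error(`Unknown tier: ${tier}`);
  if (usage.documentCount >= limits.maxDocuments) {
    return { allowed: false, reason: "Document limit reached" };
  }
  if (fileSizeMB > limits.maxFileSizeMB) {
    return { allowed: false, reason: "File exceeds the tier's maximum file size" };
  }
  if (usage.storageUsedMB + fileSizeMB > limits.totalStorageMB) {
    return { allowed: false, reason: "Not enough storage remaining" };
  }
  return { allowed: true, reason: null };
}
```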
