Documents Feature Guide
Last Updated: November 15, 2025
Feature Status: ✅ Production Ready
User Role: Organization Owner/Admin
Overview
The Documents feature allows you to upload various file types (PDFs, Word documents, text files, etc.) to your chatbot's knowledge base. Once uploaded, your AI assistant can search through these documents to answer customer questions with accurate, contextual information.
Purpose
Why Upload Documents?
- Provide your chatbot with company-specific knowledge
- Answer questions based on your policies, procedures, and documentation
- Reduce manual responses by automating FAQs from existing documents
- Keep information up-to-date without retraining the AI
Use Cases:
- Company policies and procedures
- Product manuals and specifications
- FAQs and help articles
- Training materials
- Legal documents (terms, privacy policy)
- Standard operating procedures (SOPs)
How It Works
1. Document Upload Process
graph LR
A[Upload File] --> B[Parse Content]
B --> C[Extract Text]
C --> D[Process with AI]
D --> E[Index & Store]
E --> F[Ready for Search]
Step-by-Step:
- Upload: User uploads file via UI
- Parse: System extracts text from file format (PDF, DOCX, etc.)
- Process: The document is processed with LlamaParse (LlamaIndex's cloud parser) or with simple text extraction
- Index: Document content is indexed for fast search
- Store: Metadata and embeddings stored in database
- Ready: Chatbot can now search this document when responding
2. Document Search During Chat
graph TB
A[User Asks Question] --> B[Chatbot Analyzes Query]
B --> C[Search Documents]
C --> D{Relevant Docs?}
D -->|Yes| E[Use Document Context]
D -->|No| F[Use Base Knowledge]
E --> G[Generate Response]
F --> G
How Search Works:
- User sends a message to chatbot
- AI analyzes the question
- Searches uploaded documents for relevant information
- Uses document context to formulate accurate response
- Responds with information from your documents
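The branch in the diagram above (use document context when relevant results exist, otherwise fall back to base knowledge) can be sketched roughly as follows. This is an illustration only, not the product's implementation: the `buildPrompt` helper, the `minScore` cutoff of 0.5, and the prompt wording are all assumptions. The result fields (`documentName`, `content`, `relevanceScore`) follow the search response shown in the Search Endpoint section.

```javascript
// Hypothetical helper: decide whether to answer from document context
// or from the model's base knowledge.
// minScore is an assumed cutoff; the guide does not specify one.
function buildPrompt(question, results, minScore = 0.5) {
  const relevant = results.filter((r) => r.relevanceScore >= minScore);
  if (relevant.length === 0) {
    // No relevant documents: fall back to base knowledge.
    return { usedDocuments: false, prompt: question };
  }
  // Prefix each chunk with its source document so the answer can be traced.
  const context = relevant
    .map((r) => `[${r.documentName}] ${r.content}`)
    .join("\n");
  return {
    usedDocuments: true,
    prompt: `Answer using the context below.\n\n${context}\n\nQuestion: ${question}`,
  };
}
```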
Accessing the Documents Feature
Navigation
- Go to AI Assistants page
- Click on your chatbot
- Navigate to Settings
- Click Knowledge Base tab (shows document management)
Direct URL: /ai-assistants/[chatbot-id]/settings/documents
Uploading Documents
Supported File Types
| File Type | Extension | Max Size | Notes |
|---|---|---|---|
| PDF | .pdf | 10MB | Best for formatted documents |
| Word Document | .docx, .doc | 10MB | Preserves formatting |
| Text File | .txt | 5MB | Plain text only |
| Markdown | .md | 5MB | Formatted text |
| CSV | .csv | 5MB | Data tables |
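A client can check a file against these limits before uploading, which avoids a round trip that would end in an error. The sketch below is a hypothetical pre-upload check: `validateUpload` is not part of the product, and it assumes 1MB means 1024 × 1024 bytes, which the guide does not state.

```javascript
// Limits copied from the Supported File Types table.
// Assumption: 1MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;
const MAX_BYTES_BY_EXTENSION = {
  ".pdf": 10 * MB,
  ".docx": 10 * MB,
  ".doc": 10 * MB,
  ".txt": 5 * MB,
  ".md": 5 * MB,
  ".csv": 5 * MB,
};

// Hypothetical pre-upload check; returns { ok, reason }.
function validateUpload(fileName, sizeBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  const limit = MAX_BYTES_BY_EXTENSION[ext];
  if (limit === undefined) {
    return { ok: false, reason: `Unsupported file type: ${ext || "(none)"}` };
  }
  if (sizeBytes > limit) {
    return { ok: false, reason: `File exceeds the ${limit / MB}MB limit for ${ext}` };
  }
  return { ok: true, reason: null };
}
```

In a browser this could be called as `validateUpload(file.name, file.size)` on the selected `File` object.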
Upload Methods
Method 1: File Upload (Recommended)
Steps:
- Click Upload Document button
- Select file from your computer
- Wait for upload and processing
- Document appears in list when ready
Processing Time:
- Small files (<1MB): 5-10 seconds
- Medium files (1-5MB): 30-60 seconds
- Large files (5-10MB): 1-3 minutes
Status Indicators:
- 🔄 Parsing: Extracting text from file
- ⏳ Processing: AI is analyzing content
- ✅ Ready: Document is searchable
- ❌ Failed: Processing error occurred
Method 2: URL Import
Steps:
- Click Import from URL
- Paste document URL
- System downloads and processes
- Document added to list
Supported URLs:
- Public PDFs
- Google Docs (public view)
- Publicly accessible text files
Method 3: Copy & Paste
Steps:
- Click Add Text
- Paste content directly
- Give it a title
- Save
Best For:
- Quick FAQ additions
- Copied content from websites
- Short policies or procedures
Managing Documents
Document List View
Columns:
- Name: Document filename
- Type: File extension
- Size: File size
- Status: Processing status
- Uploaded: Date/time uploaded
- Actions: View, Download, Delete
Document Actions
View Document
What It Shows:
- Original filename
- File size
- Upload date
- Processing status
- Extracted text preview
- Metadata (pages, word count, etc.)
Use Case: Verify content was extracted correctly
Download Document
Function: Download original uploaded file
Use Case: Get a copy of the original file you uploaded
Edit Metadata
What You Can Edit:
- Document title (for display)
- Description (internal note)
- Tags (for organization)
Cannot Edit: Extracted content (re-upload to change)
Delete Document
What Happens:
- Document removed from knowledge base
- Chatbot can no longer search this document
- Original file deleted from storage
- Cannot be undone
Confirmation Required: Yes (type document name to confirm)
Document Processing
How Documents Are Processed
Simple Text Extraction
Used For: Plain text files, simple PDFs
Process:
- Extract text from file
- Split into chunks (for efficient search)
- Store chunks in database
- Create searchable index
Advantages:
- Fast processing
- No external dependencies
- Works offline
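The "split into chunks" step of simple text extraction can be illustrated with a fixed-size window that overlaps its neighbor, so a sentence cut at a boundary still appears whole in one chunk. This is a sketch under stated assumptions: the guide does not say how chunks are formed, and `chunkText`, the 500-character size, and the 50-character overlap are hypothetical.

```javascript
// Hypothetical chunker: fixed-size character windows with overlap.
// chunkSize and overlap are assumed defaults, not documented values.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```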
LlamaParse Cloud (Advanced)
Used For: Complex PDFs with tables, images, forms
Process:
- Upload to LlamaParse Cloud service
- AI parses complex layouts
- Extracts tables, text, metadata
- Returns structured data
- Indexed and stored
Advantages:
- Better accuracy for complex documents
- Handles tables and forms
- Preserves document structure
Requirements:
- LlamaParse API key
- Internet connection
Configuration: Set in .env:
LLAMA_CLOUD_API_KEY=your_api_key_here
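One way a server could choose between the two processing paths is to check whether the key is configured. The sketch below is an assumption about how such a switch might look, not the actual selection logic: `chooseParser` is hypothetical, and it simplifies "complex PDFs" to "any PDF".

```javascript
// Hypothetical selector: use LlamaParse only when an API key is set and
// the file is a PDF; otherwise fall back to simple text extraction.
// Simplification: treats every PDF as a candidate for LlamaParse.
function chooseParser(env, fileExtension) {
  const key = env.LLAMA_CLOUD_API_KEY;
  const hasKey = typeof key === "string" && key.trim() !== "";
  const isPdf = fileExtension.toLowerCase() === ".pdf";
  return hasKey && isPdf ? "llamaparse" : "simple";
}
```

In Node.js this would typically be called as `chooseParser(process.env, ".pdf")`.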
Processing Status
Status Codes
| Status | Meaning | Action Required |
|---|---|---|
| pending | Waiting to process | Wait |
| parsing | Extracting text | Wait |
| processing | AI analyzing | Wait |
| ready | Available for search | None |
| failed | Error occurred | Check logs, re-upload |
Webhook Notifications
How It Works:
- Document uploaded
- Processing starts asynchronously
- Webhook sent to /api/knowledge/webhook/parse-complete
- Status updated in database
- UI refreshes to show "ready"
Endpoint: POST /api/knowledge/webhook/parse-complete
Payload:
{
"documentId": "uuid",
"status": "ready",
"extractedText": "...",
"metadata": {
"pages": 10,
"wordCount": 5000
}
}
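A receiver for this webhook should validate the payload before updating the database. The following is a minimal sketch, assuming the payload arrives as a JSON string; `parseWebhookPayload` is a hypothetical helper, and the accepted status values are taken from the Status Codes table.

```javascript
// Status values taken from the Status Codes table in this guide.
const KNOWN_STATUSES = ["pending", "parsing", "processing", "ready", "failed"];

// Hypothetical validator for the parse-complete payload; throws on bad input.
function parseWebhookPayload(rawBody) {
  const body = JSON.parse(rawBody);
  if (body === null || typeof body !== "object") {
    throw new Error("payload must be a JSON object");
  }
  if (typeof body.documentId !== "string" || body.documentId === "") {
    throw new Error("documentId is required");
  }
  if (!KNOWN_STATUSES.includes(body.status)) {
    throw new Error(`Unknown status: ${body.status}`);
  }
  return {
    documentId: body.documentId,
    status: body.status,
    // Optional fields get safe defaults.
    extractedText: typeof body.extractedText === "string" ? body.extractedText : "",
    metadata: body.metadata ?? {},
  };
}
```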
Search & Retrieval
How Chatbot Searches Documents
Vector Search (Default)
Process:
- User question converted to embedding (vector)
- Search for similar document chunks
- Rank by relevance score
- Return top N results
- Use in AI response
Example:
User: "What's your return policy?"
→ Converts the question to an embedding
→ Finds document chunks with similar meaning (returns, refunds, exchanges)
→ Top match comes from the "Return Policy" document
→ Chatbot responds: "Our return policy allows..."
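The ranking step of vector search is commonly done with cosine similarity between the question embedding and each chunk embedding. The sketch below shows that technique on plain arrays; it is an assumption about the scoring method, since the guide does not name one, and `cosineSimilarity` and `topN` are hypothetical helpers. A production system would normally delegate this to the vector database.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical ranking step: score every chunk, sort descending, keep top n.
// Each chunk is assumed to carry an `embedding` array.
function topN(queryEmbedding, chunks, n = 5) {
  return chunks
    .map((c) => ({ ...c, relevanceScore: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.relevanceScore - x.relevanceScore)
    .slice(0, n);
}
```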
Keyword Search (Fallback)
Used When: Vector search finds nothing
Process:
- Extract keywords from question
- Full-text search on document content
- Return matching documents
- Use in AI response
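The keyword fallback can be sketched as stop-word removal followed by a text match. This is a simplified stand-in, not the actual implementation: a real full-text search would usually run in the database, whereas this example uses a plain substring match. The stop-word list and both function names are assumptions.

```javascript
// Small assumed stop-word list, for illustration only.
const STOP_WORDS = new Set([
  "a", "an", "the", "is", "are", "what", "whats", "your", "our",
  "do", "does", "how", "i", "to", "of", "for",
]);

// Lowercase, strip punctuation, and drop stop words.
// Limitation: the regex keeps only ASCII letters and digits.
function extractKeywords(question) {
  return question
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .split(/\s+/)
    .filter((w) => w !== "" && !STOP_WORDS.has(w));
}

// Return documents whose content contains at least one keyword.
function keywordSearch(question, documents) {
  const keywords = extractKeywords(question);
  return documents.filter((doc) => {
    const text = doc.content.toLowerCase();
    return keywords.some((k) => text.includes(k));
  });
}
```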
Search Endpoint
API Route: GET /api/knowledge/search
Query Parameters:
- query (string): Search query
- knowledgeBaseId (uuid): Which KB to search
- limit (number): Max results (default 5)
Response:
{
"results": [
{
"documentId": "uuid",
"documentName": "Return Policy.pdf",
"content": "We accept returns within 30 days...",
"relevanceScore": 0.92,
"metadata": {
"page": 2,
"section": "Returns"
}
}
]
}
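A search request URL can be assembled with the standard `URL` API, which takes care of encoding the query text. In this sketch `buildSearchUrl` and the `https://example.com` base are placeholders; only the route and parameter names come from this guide.

```javascript
// Hypothetical helper: build the search URL from the documented parameters.
// limit defaults to 5, matching the documented default.
function buildSearchUrl(baseUrl, { query, knowledgeBaseId, limit = 5 }) {
  const url = new URL("/api/knowledge/search", baseUrl);
  url.searchParams.set("query", query);
  url.searchParams.set("knowledgeBaseId", knowledgeBaseId);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}
```

The returned string can be passed to `fetch`, and the `results` array read from the parsed JSON response.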
API Reference
Upload Document
Endpoint: POST /api/knowledge/upload-document
Content-Type: multipart/form-data
Body:
const formData = new FormData();
formData.append('file', fileBlob);
formData.append('knowledgeBaseId', kbId);
formData.append('title', 'Optional Title');
Response:
{
"success": true,
"document": {
"id": "uuid",
"name": "document.pdf",
"size": 1024000,
"status": "pending",
"createdAt": "2025-11-15T10:00:00Z"
}
}
Errors:
- 400: Missing file or KB ID
- 413: File too large
- 415: Unsupported file type
- 500: Upload failed
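Putting the request body and the error codes together, an upload call might look like the sketch below. `uploadDocument` is a hypothetical wrapper, and `fetchFn` is injected (pass the global `fetch` in practice) so the function can be exercised without a server. It assumes error responses carry only the status code, since the guide does not show an error body.

```javascript
// Messages mirror the documented error codes.
const UPLOAD_ERRORS = {
  400: "Missing file or KB ID",
  413: "File too large",
  415: "Unsupported file type",
  500: "Upload failed",
};

// Hypothetical wrapper around POST /api/knowledge/upload-document.
async function uploadDocument(fetchFn, baseUrl, fileBlob, kbId, title) {
  const formData = new FormData();
  formData.append("file", fileBlob);
  formData.append("knowledgeBaseId", kbId);
  if (title) formData.append("title", title); // title is optional

  // Do not set Content-Type manually; fetch adds the multipart boundary.
  const res = await fetchFn(`${baseUrl}/api/knowledge/upload-document`, {
    method: "POST",
    body: formData,
  });
  if (!res.ok) {
    throw new Error(UPLOAD_ERRORS[res.status] ?? `Unexpected status ${res.status}`);
  }
  const data = await res.json();
  return data.document; // { id, name, size, status, createdAt }
}
```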
List Documents
Endpoint: GET /api/knowledge/documents?knowledgeBaseId={id}
Response:
{
"documents": [
{
"id": "uuid",
"name": "document.pdf",
"type": "pdf",
"size": 1024000,
"status": "ready",
"uploadedAt": "2025-11-15T10:00:00Z"
}
],
"total": 10
}
Delete Document
Endpoint: DELETE /api/knowledge/documents/{id}
Response:
{
"success": true,
"message": "Document deleted successfully"
}
Check Processing Status
Endpoint: GET /api/knowledge/check-status/{documentId}
Response:
{
"status": "ready",
"progress": 100,
"metadata": {
"pages": 10,
"wordCount": 5000
}
}
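A client that cannot rely on the UI refresh can poll this endpoint until processing finishes. The sketch below is one possible approach: `waitUntilProcessed` is hypothetical, `fetchFn` is injected for testability (pass the global `fetch` in practice), and the defaults of 2 seconds × 90 attempts are assumptions chosen to cover the documented upper bound of about 3 minutes for large files.

```javascript
// Hypothetical poller for GET /api/knowledge/check-status/{documentId}.
// Resolves with the status body once the document is "ready" or "failed".
async function waitUntilProcessed(
  fetchFn,
  baseUrl,
  documentId,
  { intervalMs = 2000, maxAttempts = 90 } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(`${baseUrl}/api/knowledge/check-status/${documentId}`);
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`);
    }
    const body = await res.json();
    if (body.status === "ready" || body.status === "failed") {
      return body;
    }
    // Still pending, parsing, or processing: wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for document processing");
}
```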
Best Practices
Document Organization
Tips:
- Use Descriptive Names: "Return_Policy_2025.pdf" is better than "doc1.pdf"
- Add Descriptions: Help identify documents later
- Tag Documents: Group related docs (e.g., "policies", "products")
- Regular Updates: Replace outdated documents with new versions
Content Guidelines
What Works Best:
- ✅ Clear, well-structured content
- ✅ Headings and sections
- ✅ Factual information
- ✅ FAQs in Q&A format
- ✅ Consistent terminology
What Doesn't Work Well:
- ❌ Scanned images without OCR
- ❌ Heavily formatted tables (use CSV instead)
- ❌ Password-protected files
- ❌ Corrupted files
Performance Optimization
For Faster Search:
- Break Large Docs: Split 100-page PDFs into sections
- Remove Fluff: Upload only relevant content
- Update Regularly: Remove outdated docs
- Use Tags: Helps narrow search scope
Limits:
- Max documents: 10 (Free), 100 (Pro), unlimited (Enterprise)
- Max file size: 5MB (Free), 10MB (Pro), 50MB (Enterprise)
- Max total storage: Based on subscription tier
Troubleshooting
Document Won't Upload
Issue: Upload fails immediately
Solutions:
- Check file size (10MB for PDF and Word files; 5MB for text, Markdown, and CSV files; lower on the Free tier)
- Verify file type is supported
- Try different browser
- Check internet connection
- Contact support if the problem persists
Processing Stuck
Issue: Document stays in "processing" status
Solutions:
- Wait 5 minutes (large files take time)
- Refresh page
- Check webhook endpoint is accessible
- Review server logs
- Re-upload document
Search Not Finding Content
Issue: Chatbot doesn't use document
Solutions:
- Verify document status is "ready"
- Check document content (View action)
- Test with exact phrases from document
- Ensure KB is linked to chatbot
- Try different wording in question
Extracted Text Looks Wrong
Issue: Text garbled or missing
Solutions:
- Use LlamaParse for complex PDFs
- Convert to text format first
- Use OCR for scanned images
- Try different file format
- Manually copy/paste content
Security & Privacy
Data Storage
Where Documents Are Stored:
- Files: Supabase Storage (encrypted at rest)
- Text Content: PostgreSQL database
- Embeddings: Vector database
Access Control:
- Only organization members can view
- No cross-organization access
- Admins can manage all documents
Data Deletion
What Gets Deleted:
- Original file from storage
- Extracted text from database
- All embeddings and vectors
- Document metadata
Permanent: Cannot be recovered
Compliance
GDPR:
- Users can request document deletion
- Export feature available
- Audit logs tracked
Data Retention:
- Documents stored until manually deleted
- No automatic expiration
- Can set custom retention policies
Subscription Limits
Free Tier
- Documents: 10 documents
- Max File Size: 5MB
- Total Storage: 50MB
- Processing: Simple extraction only
Pro Tier
- Documents: 100 documents
- Max File Size: 10MB
- Total Storage: 1GB
- Processing: LlamaParse included
Enterprise Tier
- Documents: Unlimited
- Max File Size: 50MB
- Total Storage: Unlimited
- Processing: Priority processing
- Custom integrations
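The tier limits above can be expressed as a lookup table plus a check that runs before an upload is accepted. This is an illustrative sketch only: `checkTierLimits` and the `usage` object shape are hypothetical, and 1MB is assumed to be 1024 × 1024 bytes (so 1GB is 1024MB).

```javascript
// Assumption: binary units (1MB = 1024 * 1024 bytes, 1GB = 1024MB).
const BYTES_PER_MB = 1024 * 1024;

// Limits copied from the Free, Pro, and Enterprise tier lists.
const TIER_LIMITS = {
  free: { maxDocuments: 10, maxFileBytes: 5 * BYTES_PER_MB, maxStorageBytes: 50 * BYTES_PER_MB },
  pro: { maxDocuments: 100, maxFileBytes: 10 * BYTES_PER_MB, maxStorageBytes: 1024 * BYTES_PER_MB },
  enterprise: { maxDocuments: Infinity, maxFileBytes: 50 * BYTES_PER_MB, maxStorageBytes: Infinity },
};

// Hypothetical check; usage = { documentCount, storageBytes }.
function checkTierLimits(tier, usage, fileBytes) {
  const limits = TIER_LIMITS[tier];
  if (!limits) {
    return { allowed: false, reason: `Unknown tier: ${tier}` };
  }
  if (usage.documentCount + 1 > limits.maxDocuments) {
    return { allowed: false, reason: "Document limit reached" };
  }
  if (fileBytes > limits.maxFileBytes) {
    return { allowed: false, reason: "File exceeds the tier's max file size" };
  }
  if (usage.storageBytes + fileBytes > limits.maxStorageBytes) {
    return { allowed: false, reason: "Storage limit exceeded" };
  }
  return { allowed: true, reason: null };
}
```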