Dezcry Platform
Documentation
Everything you need to know about using Dezcry — from ingesting documents through to disclosure-ready export.
Getting Started
Platform Overview
Dezcry is a self-service, AI-powered eDiscovery platform for privacy, legal, and compliance teams. It provides a complete workflow to ingest documents, review responsive material, apply AI-powered redactions, classify documents, search, and export disclosure-ready sets — all with a full audit trail and role-based access controls.
Unlike heavyweight eDiscovery suites, Dezcry is designed for internal teams who need a streamlined, defensible process without specialist eDiscovery admins or outsourced review support. All AI models run on internal infrastructure within the same Azure environment — no document data is sent to third-party AI services.
- Ingest 100+ file types including PST, EML, ZIP, Office, PDF, images, audio, and video
- Automatic deduplication, email threading, and NIST filtering
- AI-assisted redaction with a 5-layer detection pipeline
- AI-assisted classification with custom fields and confidence scoring
- eDiscovery-grade keyword search (Elasticsearch-powered, dtSearch equivalent)
- LLM-powered document summaries and conversational document Q&A
- AI OCR for image-heavy documents
- Production-ready export with Bates numbering, load files, and burned redactions
- Complete audit trail logging every action for regulatory defensibility
- Role-based access control with matter-level permissions
Key Concepts
| Concept | Description |
|---|---|
| Matter | A container for a single DSAR or investigation. All documents, redactions, classifications, exports, and audit logs are scoped to a matter. Matters have a unique code, client name, type, and status. |
| Document | A single file within a matter — an email, attachment, PDF, spreadsheet, image, audio, or video file. Each document has extracted text, metadata, a preview, and can carry reviewer decisions. |
| Family | A group of related documents — typically an email and its attachments. The parent email and child attachments share a family ID for grouped review. |
| Custodian | The person or data source from which documents were collected. Tracked per upload batch for chain-of-custody purposes. |
| Saved Search | A reusable query with filters that can be used as the scope for redaction, classification, export, or search term reports. |
| Redaction Set | A batch AI redaction job that processes a scope of documents through the 5-layer pipeline, producing redaction entries for review. |
| Classification Set | A batch AI classification job that applies custom decision fields to documents with confidence scoring. |
| Export Set | A configured export template with numbering, branding, and output settings that produces disclosure-ready packages. |
| Audit Log | An immutable record of every significant action taken in the platform, providing a defensible trail for regulators. |
Signing In
Navigate to your Dezcry instance's login page and enter your email address and password. If your organisation has enabled two-factor authentication (2FA), you will be prompted to enter a time-based one-time password (TOTP) from your authenticator app after entering your credentials.
If you have been invited to join Dezcry, you will receive an email with a unique invitation link. Click the link to set up your password and configure 2FA. Invitation links are single-use and expire after a set period.
Sessions automatically expire after 30 minutes of inactivity. Your session token is refreshed automatically every 20 minutes while you are active. If your session expires, a full-screen overlay will prompt you to sign in again — any unsaved work in progress is preserved in your browser.
Matters
Creating a Matter
A matter is the top-level container in Dezcry. Each DSAR, investigation, or review project is organised as a separate matter with its own documents, workflows, users, and audit trail.
To create a matter, navigate to the Matters page and click Create Matter (admin role required). You will be asked to provide:
| Field | Description |
|---|---|
| Name | A descriptive name for the matter (e.g. "Smith DSAR - Q1 2025"). |
| Matter Code | A unique 6-character alphanumeric code, auto-generated but editable. |
| Client Name | The organisation or client the matter relates to. |
| Matter Type | One of: DSAR, Investigation, Litigation, Cyber, or Other. |
| Description | Optional long-form description of the matter scope and objectives. |
| Summary Language | The language for AI-generated summaries (e.g. English, German, French). |
| Hosting Location | The Azure region for data residency (e.g. Australia, Switzerland, Germany, UK). |
Matter Dashboard
Clicking into a matter takes you to the matter dashboard — the central workspace for that matter. The dashboard shows a searchable, filterable table of all documents in the matter, along with access to all matter-scoped features via the sidebar navigation:
- Documents — browse, search, filter, and review all documents
- Upload — ingest new documents into the matter
- Redaction — create and manage AI redaction sets
- Classification — configure and run AI classification jobs
- Export — build and run disclosure-ready export packages
- Search Terms — create keyword search term sets and reports
- AI OCR — run optical character recognition on image documents
- Password Bank — manage passwords for encrypted files
- Audit — view the complete audit trail for this matter
- Reporting — view analytics dashboards and metrics
- Billing — view storage usage and costs for this matter
The document table supports bulk actions — select multiple documents to apply batch operations such as tagging, classification, or status changes. A background task tray shows the status of any running jobs (redaction, classification, export) in the matter.
Matter Settings
Matter settings control the behaviour of AI features and reviewer workflows within the matter. Administrators can configure:
- Decision fields — custom fields that reviewers can set on each document (e.g. "Relevance", "Privilege Status", "Data Category"). Fields can be single-select, multi-select, or free text.
- Summary language — the language used for AI-generated document summaries.
- Matter status — open, closed, or archived. Closed matters are read-only; archived matters are hidden from the default view.
Document Ingestion
Uploading Documents
Navigate to the Upload page within a matter to ingest documents. Dezcry supports drag-and-drop file upload or traditional file selection. You can upload individual files or container files (PST, ZIP, 7Z, RAR, TAR, GZ) which will be automatically extracted.
Before processing begins, configure the following options:
| Option | Description |
|---|---|
| Deduplication Mode | Choose "Global" to automatically identify and flag duplicate files across the entire matter using MD5 hashing. Duplicates are removed from the active review set but retained for audit, saving reviewer time. |
| NIST Filtering | Enable to automatically filter out known system and runtime files (from the NIST National Software Reference Library) that are never relevant to review. |
| OCR | Enable to run Optical Character Recognition on image-based documents, extracting searchable text from scanned PDFs, photographs, and image files. |
| Email Threading | Enable to group related emails into conversation threads, identifying which messages are "inclusive" (contain unique content) versus non-inclusive duplicates. |
| Inclusive Only | When email threading is enabled, optionally exclude non-inclusive emails from the review workspace to reduce volume. |
You may also specify custodian information and data source metadata for chain-of-custody tracking. Available data sources include: Laptop, Desktop, Server, O365 Email, O365 OneDrive, SharePoint, Google Workspace, Mobile Device, External Hard Drive, USB Drive, Network Share, Cloud Storage, Backup Tape, Database, and Other.
Supported File Types
Dezcry supports over 100 file types out of the box. During ingestion, all files are extracted, their text content is parsed, metadata is captured, and they are indexed for search.
| Category | Formats |
|---|---|
| Email | PST, OST, EML, MSG, MBOX |
| Documents | DOCX, DOC, PDF, RTF, TXT, ODT |
| Spreadsheets | XLSX, XLS, CSV, ODS |
| Presentations | PPTX, PPT, ODP |
| Archives | ZIP, RAR, 7Z, TAR, GZ |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF (with OCR) |
| Audio | MP3, WAV, M4A, OGG, FLAC |
| Video | MP4, AVI, MOV, MKV, WEBM |
| Web / Data | HTML, XML, JSON, CSV |
Deduplication
When global deduplication is enabled, Dezcry performs top-level exact deduplication — the standard approach used in eDiscovery. This is an important distinction: Dezcry identifies and removes files that are byte-for-byte identical based on their MD5 hash, but it does so at the top level of the document hierarchy.
In eDiscovery, "top-level" deduplication means dedup is applied to standalone documents and parent containers (emails, archives) rather than to individual attachments or child items in isolation. When a top-level file is identified as a duplicate, the entire document and its family (including all attachments) are removed together — preserving the integrity of document families.
This differs from "attachment-level" deduplication, which would independently remove individual attachments that appear across multiple emails. Top-level dedup preserves the complete context of each email and its attachments as a unit, which is critical for defensible review — a reviewer always sees the full email with all of its attachments intact, never a partial family.
It also differs from near-deduplication, which identifies files that are similar but not identical (e.g. different versions of the same document). Dezcry's deduplication is strictly exact-match — only byte-for-byte identical files are flagged.
Deduplication is scoped globally across the entire matter, meaning a file uploaded by one custodian will be deduplicated against files from all other custodians in the same matter. The first instance ingested is kept as the master document, and all subsequent identical copies are removed from the active review set. Deduplication results include:
- Master document — the first instance of each unique file, retained in the review set with full metadata and family relationships
- Duplicate group — all copies of the same file, linked back to the master for audit purposes
- Bytes saved — total storage savings from removing duplicate copies
- Custodian tracking — the system records which custodians held copies of each deduplicated file, preserving chain-of-custody information even though the duplicate copies are removed from the active review set
The upload summary report details every duplicate group with file names, sizes, and the master document reference. This provides a defensible record of exactly what was deduplicated and why.
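The grouping logic described above can be sketched in a few lines of Python. This is a minimal illustration of top-level exact deduplication, not Dezcry's implementation — the tuple layout and the returned structure are assumptions for the example:

```python
import hashlib
from collections import OrderedDict

def dedupe_top_level(files):
    """Group top-level files by content hash; the first instance of each
    hash becomes the master, later copies become duplicates.

    `files` is a list of (name, custodian, content_bytes) tuples for
    top-level documents only -- attachments travel with their parent.
    """
    groups = OrderedDict()  # hash -> {"master", "custodians", "dupes"}
    for name, custodian, content in files:
        digest = hashlib.md5(content).hexdigest()  # exact, byte-for-byte match
        if digest not in groups:
            groups[digest] = {"master": name, "custodians": [custodian], "dupes": []}
        else:
            group = groups[digest]
            group["dupes"].append(name)            # removed from active review set
            group["custodians"].append(custodian)  # custody record is preserved
    return groups
```

Note how the custodian list keeps growing even for removed copies — that is the chain-of-custody record the upload summary reports.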
Email Threading
Email threading groups related emails into conversation threads, identifying the hierarchical reply chain. Threading is applied at the point of ingestion, which means non-inclusive emails are identified and can be excluded from the review workspace before any downstream processing occurs. This is a deliberate design choice — by filtering out redundant emails upfront, organisations save significantly on hosting costs (less storage, smaller search indices) and AI processing costs (fewer documents to classify, redact, and summarise).
Each email in a thread is classified as:
- Inclusive — contains unique content or attachments not present in later messages in the thread. These are the messages reviewers should focus on, as they represent the most complete version of each point in the conversation.
- Non-inclusive — the full content of this email is already contained in a later, more complete message in the thread. Reviewing these would be redundant, as the inclusive message already captures everything.
When the Inclusive Only option is enabled during upload, non-inclusive emails are excluded from the active review workspace entirely. They are still retained in the system for audit purposes, but they do not count toward hosting storage, are not indexed for search, and are not processed by AI classification, redaction, or summarisation — directly reducing costs.
Threading uses email headers (Message-ID, In-Reply-To, References) and the Microsoft Exchange Conversation Index to build accurate thread trees. The threading summary reports:
- Total emails processed and how many were threadable
- Number of inclusive vs. non-inclusive messages
- Non-inclusive emails excluded from the review workspace
- Thread groups identified
- Any threading errors encountered
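The core inclusiveness test can be illustrated with a simplified sketch. This stand-in only checks whether an email's body text is quoted in a later message; the real engine also compares attachments and uses the headers described above:

```python
def classify_inclusiveness(thread):
    """Mark each email in a thread as inclusive or non-inclusive.

    `thread` is a list of dicts with "id" and "body", ordered oldest
    first.  An email is non-inclusive here when its full body text is
    contained in a later message -- a simplified stand-in for the
    header- and attachment-aware analysis described above.
    """
    results = {}
    for i, email in enumerate(thread):
        later_bodies = [e["body"] for e in thread[i + 1:]]
        contained = any(email["body"] in later for later in later_bodies)
        results[email["id"]] = "non_inclusive" if contained else "inclusive"
    return results
```

In this model the last message in a chain is always inclusive, and an earlier message becomes non-inclusive only when a later reply fully quotes it.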
NIST Filtering
NIST filtering removes known system files, operating system components, and software runtime files from the review set. These files are identified by matching their hash values against the NIST National Software Reference Library (NSRL) — a comprehensive database of known, non-relevant system files.
NIST-filtered files are flagged and excluded from the active review workspace but are retained in the system for audit purposes. The upload summary reports the count and details of filtered files.
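Mechanically, de-NISTing is a hash-set membership test. A minimal sketch (the NSRL set would in practice be loaded from the published reference data, and Dezcry's internal representation may differ):

```python
import hashlib

def denist(documents, nsrl_hashes):
    """Split documents into an active review set and NIST-filtered files.

    `documents` is a list of (name, content_bytes) pairs; `nsrl_hashes`
    is a set of known-file MD5 digests.  Matching files are flagged
    rather than deleted, so the audit trail stays complete.
    """
    review, filtered = [], []
    for name, content in documents:
        digest = hashlib.md5(content).hexdigest()
        (filtered if digest in nsrl_hashes else review).append(name)
    return review, filtered
```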
Processing Exceptions
During ingestion, some documents may encounter processing exceptions. Dezcry tracks and reports these in the upload summary:
| Exception Type | Description |
|---|---|
| Encrypted | Password-protected files that could not be decrypted. Add passwords to the Password Bank and re-process. |
| Corrupt | Files that are malformed, truncated, or otherwise unreadable. |
| Unsupported Format | File types that Dezcry does not currently support for text extraction. |
| Text Extraction Failed | Files where the content could not be extracted despite being a supported format. |
Each exception includes the document ID, filename, exception type, and a descriptive message to help diagnose and resolve the issue.
Upload Batches
Every upload creates a processing batch with a unique display ID (e.g. UPL-001). Navigate to the Uploads page to view all batches for the matter, including:
- Batch status (processing, completed, failed)
- Total files submitted and processed
- Counts by outcome (processed OK, encrypted, corrupt, duplicates, NIST-filtered)
- Decryption results (successful, failed)
- Children extracted (attachments from container files)
- File type distribution
- Processing duration
- Upload set MD5 hash for chain-of-custody verification
Click into any batch to see the detailed processing report, including per-document exception details, deduplication groups, and threading statistics.
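One plausible way to derive a single batch-level hash from per-file hashes is to hash the sorted digests — an assumption for illustration, not Dezcry's documented scheme:

```python
import hashlib

def upload_set_md5(file_hashes):
    """Compute one chain-of-custody hash for an upload batch.

    Hashing the *sorted* per-file MD5 digests makes the result
    independent of upload order, while any added, removed, or altered
    file changes it.  (Illustrative scheme, not Dezcry's documented one.)
    """
    combined = "".join(sorted(file_hashes)).encode("ascii")
    return hashlib.md5(combined).hexdigest()
```

The useful property is verifiability: anyone holding the same file set can recompute the batch hash and confirm nothing was added or dropped in transit.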
Document Review
Document List
The main matter workspace displays all documents in a searchable, sortable table. Each row shows the document's filename, type, status, size, custodian, and any applied tags or decisions. Key features include:
- Full-text search — keyword search across document content, filenames, and email metadata using eDiscovery-grade Elasticsearch
- Column filters — filter by status, file type, custodian, date ranges, tags, relevance coding, and custom decision fields
- Bulk selection — select multiple documents for batch operations like tagging, decision coding, or export
- Sort — sort by any column including filename, date, size, relevance, or type
- Saved searches — save any combination of search query and filters for reuse
Document Viewer
Click any document to open the full document viewer. The viewer provides a rich, multi-panel interface for reviewing individual documents:
- Document display — native rendering of the document with zoom controls (0.25x to 3x)
- Three viewing tabs: Original (native format), Markup (with redaction overlays), and Text (extracted plain text with search highlighting)
- Metadata panel — document properties, email headers, file hashes, and processing info
- Decisions panel — set relevance, hot-document flag, comments, and custom decision fields
- Family panel — view parent/child relationships (e.g. email and attachments)
- Chat panel — ask questions about the document using AI
- Navigation — previous/next buttons with keyboard shortcuts for rapid sequential review
The document viewer uses a prefetch cache that pre-loads adjacent documents (previous and next) in the background. This provides near-instant navigation when reviewing documents sequentially. The cache holds up to 50 documents with a 2-minute TTL.
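A cache with those characteristics can be sketched as follows. The capacity and TTL come from the text above; the eviction policy shown here (least-recently-used) is an assumption:

```python
import time
from collections import OrderedDict

class PrefetchCache:
    """Bounded document cache with a per-entry TTL, in the spirit of the
    viewer's prefetch cache.  Capacity and TTL match the values above;
    the LRU eviction policy is an assumption for this sketch."""

    def __init__(self, capacity=50, ttl_seconds=120):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # doc_id -> (expires_at, payload)

    def put(self, doc_id, payload):
        self._entries.pop(doc_id, None)
        self._entries[doc_id] = (time.monotonic() + self.ttl, payload)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least-recently-used

    def get(self, doc_id):
        entry = self._entries.get(doc_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if time.monotonic() > expires_at:
            del self._entries[doc_id]  # expired entry counts as a miss
            return None
        self._entries.move_to_end(doc_id)  # refresh LRU position
        return payload
```

When the reviewer opens document *N*, the viewer would `put` documents *N−1* and *N+1* in the background, so the next `get` during sequential review is a hit.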
Native File Viewers
Dezcry includes purpose-built viewers for every supported file type, rendering documents directly in the browser without requiring any plugins or downloads:
| Viewer | File Types | Features |
|---|---|---|
| PDF Viewer | PDF files | Page-by-page rendering, zoom, scroll, text selection, search highlighting |
| Image Viewer | PNG, JPG, TIFF, BMP, GIF | Pan and zoom, fit-to-width/height, full-resolution display |
| DOCX Viewer | Word documents (DOCX) | Formatted text rendering with styles, headers, lists, and tables |
| PPTX Viewer | PowerPoint (PPTX) | Slide-by-slide rendering with layouts and formatting |
| Spreadsheet Viewer | XLSX, XLS, CSV | Multi-sheet tabs, column/row headers, cell formatting, frozen panes |
| Text Viewer | TXT, LOG, HTML, XML, JSON | Syntax-highlighted text with line numbers and search |
| Audio Viewer | MP3, WAV, M4A | Audio player with waveform, playback controls, and timestamp display |
| Video Viewer | MP4, AVI, MOV | Video player with playback controls, full-screen mode |
| Markup Viewer | Any document with redactions | Redaction overlay rendering with colour-coded entity categories |
Metadata Panel
The metadata panel displays all extracted properties for the current document. For email files, this includes:
- From, To, CC, BCC addresses
- Subject line
- Date sent and date received
- Message-ID and conversation threading references
- Attachment count and list
For all documents, the metadata panel shows:
- File size, MIME type, and document type
- MD5 and SHA-256 hashes (for integrity verification)
- Created and modified dates
- Author (when available from document properties)
- Source folder path from the original container
- OCR status and AI summary (when available)
- Processing status and any error messages
Decisions Panel
The decisions panel is where reviewers record their assessments. Every decision is timestamped and logged in the audit trail. Available fields:
- Relevance — mark the document as Responsive, Non-Responsive, or other custom values
- Hot Document — flag important or significant documents for attention
- Decision Comment — free-text annotation explaining the reviewer's reasoning
- Custom Decision Fields — any additional fields configured at the matter level (single-select, multi-select, or free text)
Dezcry uses optimistic locking on document decisions to prevent overwrite conflicts when multiple reviewers work on the same matter. Each document tracks a version number that is incremented on every update. If two reviewers attempt to save changes to the same document simultaneously, the second save will receive a conflict error and be asked to refresh before re-applying their changes.
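The version-check-and-increment pattern looks like this in miniature. The function and field names are illustrative, not Dezcry's API:

```python
class ConflictError(Exception):
    """Raised when a save is based on a stale document version."""

def save_decision(document, decision, expected_version):
    """Apply a reviewer decision only if no one else saved in between.

    `document` is a dict carrying a "version" counter; the caller passes
    the version it loaded.  A mismatch means another reviewer saved
    first, so the caller must refresh and re-apply its changes.
    """
    if document["version"] != expected_version:
        raise ConflictError("document was modified by another reviewer")
    document.update(decision)
    document["version"] += 1  # every successful save bumps the counter
    return document["version"]
```

The second writer is never silently overwritten: its save fails fast with a conflict, and it re-applies against the fresh state.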
Family Documents
Documents extracted from container files (emails with attachments, ZIP archives) are automatically grouped into families. A family consists of a parent document (e.g. an email) and its child documents (e.g. attachments).
The family panel in the document viewer shows all related documents, allowing reviewers to quickly navigate between a parent email and its attachments. Family relationships are preserved throughout all workflows — search results can include family expansion, and exports can group family members together.
Tagging
Documents can be tagged with relevance codes and custom decision field values. Tags are set through the decisions panel in the document viewer or via bulk actions on the document list. All tagging actions are logged in the audit trail with the reviewer's identity and timestamp.
Metadata
Overview
Every document ingested into a matter has a rich set of metadata fields automatically extracted during processing. Dezcry captures over 60 metadata fields per document — covering everything from basic file properties and email headers to AI-generated summaries and reviewer decisions. These fields are available for filtering, sorting, column display, search, and export throughout the platform.
Metadata is extracted at the point of ingestion with no manual effort required. For email files, Dezcry parses all standard headers including threading references. For Office documents and PDFs, embedded properties such as author, title, and creation date are captured. For images, EXIF data including camera make, GPS coordinates, and timestamps is preserved. All dates are normalised to UTC for consistent cross-timezone analysis.
Metadata is critical for defensible review workflows. Fields like hash values (MD5, SHA-256) provide chain-of-custody integrity. Date fields enable precise date-range filtering to narrow review sets. Email threading metadata allows reviewers to focus only on inclusive messages. And custodian tracking across duplicates ensures nothing is lost even when redundant copies are removed. All metadata fields listed below are available in load file exports (DAT, CSV, XLSX) for downstream use in Relativity, Nuix, or other review platforms.
Core Document Fields
These fields are present on every document regardless of file type. They provide the fundamental identifiers, file properties, and processing information needed for document management and chain-of-custody tracking.
| Field | Type | Description |
|---|---|---|
| doc_id | String | Unique document identifier within the matter (e.g. DOC-000001). This is the primary reference used across the platform — in search results, exports, audit logs, and cross-references. |
| doc_seq | Integer | Sequential number assigned during ingestion, used for sorting and Bates-style numbering in exports. Sequences are unique within each matter and assigned in upload order. |
| filename | String | Original filename of the document as it existed in the source data. Preserved exactly as found for defensibility — no renaming or sanitisation is applied. |
| mime | String | MIME type of the file (e.g. application/pdf, message/rfc822). Determined by both file extension and magic-byte analysis for accurate identification. |
| document_type | String | Enriched document category — Email, PDF, Word, Excel, PowerPoint, Image, Text, Archive, Audio, Video, or Other. Useful for filtering the document list by file type. |
| size_bytes | Integer | File size in bytes. Displayed in human-readable format (KB, MB) in the UI. Useful for identifying unusually large or suspiciously small files. |
| source_folder | String | Original folder path within the source container — e.g. the PST folder hierarchy (Inbox/Projects/2024), ZIP directory path, or nested archive structure. Preserves the organisational context of the original data. |
| date_created_utc | DateTime | File creation date in UTC. For office documents, extracted from embedded document properties. For other files, derived from filesystem timestamps or container metadata. |
| date_modified_utc | DateTime | File last-modified date in UTC. Critical for date-range filtering in review workflows and for establishing document timelines. |
| md5 | String | MD5 hash of the file content (32 hex characters). Used for deduplication across the matter and for chain-of-custody integrity verification in exports. |
| sha256 | String | SHA-256 hash of the file content (64 hex characters). Provides a cryptographically strong integrity fingerprint for defensible production. |
| status | String | Processing status — queued (awaiting processing), processing (currently being ingested), ready (successfully processed and available for review), or failed (encountered an error). |
| processing_error | String | Detailed error message if processing failed. Helps diagnose issues such as password-protected files, corrupted archives, or unsupported formats. |
| processing_dataset | String | Upload batch identifier (e.g. UPL-001) linking the document to its ingestion batch. Useful for tracking which upload set a document belongs to and viewing batch-level statistics. |
Family & Hierarchy Fields
Documents extracted from container files — such as emails with attachments, ZIP archives, or nested PST folders — are automatically grouped into families. Family relationships are critical for defensible review: reviewers see each email alongside its attachments, and exports can group family members into the same volume for production.
| Field | Type | Description |
|---|---|---|
| family_id | String | Family group identifier. For parent documents (e.g. an email), this equals the document's own doc_id. For child documents (e.g. attachments), this inherits the parent's family_id — linking the entire family together for grouping, export, and review. |
| parent_id | UUID | ID of the parent document (e.g. the email that contained this attachment). Null for top-level standalone documents. Enables the family tree view in the document viewer, where reviewers can navigate between a parent and all of its children. |
When exporting documents, Dezcry preserves family relationships in the load file. Parent documents and their children are linked via the family_id and parent_id fields, allowing downstream review platforms (Relativity, Nuix, etc.) to reconstruct the family hierarchy. The export wizard also supports family-based volume grouping to keep related documents together.
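Downstream reconstruction from those two fields is straightforward. A sketch of how a consuming platform might rebuild families from load-file rows (the row dict layout is an assumption):

```python
from collections import defaultdict

def group_families(rows):
    """Rebuild parent/child families from load-file rows.

    Each row carries doc_id, family_id, and parent_id (None for a
    top-level document), mirroring the fields in the table above.
    Returns {family_id: {"parent": doc_id, "children": [doc_ids]}}.
    """
    families = defaultdict(lambda: {"parent": None, "children": []})
    for row in rows:
        family = families[row["family_id"]]
        if row["parent_id"] is None:
            family["parent"] = row["doc_id"]  # parent's family_id == its own doc_id
        else:
            family["children"].append(row["doc_id"])
    return dict(families)
```

A standalone document simply forms a family of one: it is its own parent with no children.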
Email Fields
Email is often the most important data type in eDiscovery. Dezcry extracts a comprehensive set of email metadata from both EML and MSG formats, including messages extracted from PST, OST, and MBOX containers. These fields are stored as first-class database columns for efficient filtering, sorting, and field-specific search (e.g. from:john@acme.com).
| Field | Type | Description |
|---|---|---|
| email_from | String | Sender email address and display name (e.g. "John Smith <john@acme.com>"). Searchable via the from: field prefix in keyword search. |
| email_to | String | Recipient email addresses (semicolon-separated). Supports multiple recipients. Searchable via the to: field prefix. |
| email_cc | String | CC (carbon copy) recipient email addresses (semicolon-separated). Searchable via the cc: field prefix. |
| email_bcc | String | BCC (blind carbon copy) recipient email addresses (semicolon-separated). Searchable via the bcc: field prefix. Only available when the source data includes BCC headers (typically only in the sender's mailbox). |
| email_subject | String | Email subject line. Searchable via the subject: field prefix. Commonly used for keyword search and classification workflows. |
| email_message_id | String | RFC 2822 Message-ID header — a globally unique identifier assigned by the sending mail server. Used internally for email threading and deduplication. |
| email_date_sent_utc | DateTime | Date and time the email was sent, normalised to UTC. This is the primary date field used for email date-range filtering and timeline analysis. |
| email_date_received_utc | DateTime | Date and time the email was received, normalised to UTC. May differ from date_sent due to delivery delays or timezone differences between sender and recipient servers. |
| email_attachments_json | JSON | Structured attachment summary containing the count and list of filenames (e.g. {count: 3, names: ["report.pdf", "data.xlsx", "photo.jpg"]}). Useful for quickly identifying emails with specific attachments without opening them. |
| email_in_reply_to | String | Message-ID of the email this is a direct reply to. Used by the threading engine to build the conversation tree. |
| email_references | String | Ordered chain of Message-IDs representing the full conversation history. Each reply appends its parent's Message-ID, creating a breadcrumb trail through the thread. |
| email_conversation_index | String | Microsoft Exchange PR_CONVERSATION_INDEX — a hex-encoded binary value present in Outlook/Exchange-originated messages. Provides precise thread positioning even when standard headers are missing or unreliable. |
| email_thread_index | String | Hierarchical thread position path computed by Dezcry (e.g. "a1b2c3d4+0001+0002"). Encodes the exact tree position for correct chronological sort order and branch identification within conversation views. |
All email metadata fields are indexed in the search engine. You can use field-specific search prefixes to target individual fields — for example, from:john@acme.com AND subject:"quarterly report" or to:legal@company.com AND date >= 2024-01-01. See the Search Syntax section for the full list of supported field prefixes and operators.
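The shape of a field-prefixed query can be shown with a toy parser. This is a deliberately simplified illustration — the real query language also handles AND/OR operators, quoted phrases, and date comparisons, which a whitespace split cannot:

```python
def parse_field_query(query):
    """Split a query like 'from:john@acme.com subject:report' into
    field-specific terms and free-text terms.

    A simplified illustration of field-prefixed search; quoted phrases
    and boolean operators are out of scope for this sketch.
    """
    fields, free_text = {}, []
    for token in query.split():
        if ":" in token:
            field, _, value = token.partition(":")
            fields.setdefault(field, []).append(value)
        else:
            free_text.append(token)
    return fields, free_text
```

Each field bucket then targets the corresponding indexed column (email_from, email_subject, and so on), while free-text terms run against the full extracted content.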
Email Threading Fields
These fields are computed by Dezcry's email threading engine during ingestion. Threading groups related messages into conversation trees and identifies which messages are inclusive (containing unique content a reviewer must see) versus non-inclusive (redundant messages whose content is fully captured by a later reply). This can reduce the review set by 40–60% in email-heavy matters, directly lowering review time and AI processing costs.
| Field | Type | Description |
|---|---|---|
| email_thread_group_id | UUID | Identifier of the conversation thread group this email belongs to. All emails in the same conversation share this ID, enabling thread-level grouping and navigation in the document viewer. |
| email_thread_indentation | Integer | Depth within the thread tree (0 = the root/original message, 1 = a direct reply, 2 = a reply to a reply, etc.). Used for visual indentation in conversation views. |
| is_inclusive_email | Boolean | Whether this email is inclusive — meaning it contains unique message content or attachments not present in any later message in the thread. Null if threading was not enabled for this document. Inclusive emails are the minimum set a reviewer needs to see. |
| inclusive_reason | String | Explains why the email is inclusive: unique_message_content (body text not found in later replies), unique_attachment (has an attachment not in later messages), unanalyzed_attachment (attachment could not be compared), root_message (first message in thread), or threading_error (could not determine inclusiveness). |
When "Inclusive Only" is enabled during upload, non-inclusive emails are excluded from the active review workspace entirely. They are still retained in the system and can be accessed via the thread view for context, but they do not appear in the main document list, are not processed by AI classification or redaction, and do not count toward storage. This is the recommended approach for matters with large email volumes where cost efficiency is a priority.
OCR Fields
Dezcry automatically detects documents that contain no extractable text — such as scanned PDFs, photographs of documents, and image files — and flags them for OCR (Optical Character Recognition). Once OCR is run, the extracted text becomes fully searchable and available for AI processing.
| Field | Type | Description |
|---|---|---|
| ocr_required | Boolean | Whether the document requires OCR to extract searchable text. Automatically set to true during ingestion for scanned PDFs, image-only PDFs, and image files (JPEG, PNG, TIFF, BMP). Documents with existing embedded text are set to false. |
| ocr_status | String | Current OCR processing status: not_applicable (document has embedded text, OCR not needed), completed (OCR finished successfully, text extracted), failed (OCR attempted but encountered an error), partial (some pages processed successfully), or skipped (OCR not run yet despite being required). |
Deduplication Fields
When global deduplication is enabled during upload, Dezcry identifies byte-for-byte identical files across the entire matter using hash matching. The first instance is retained as the master document and subsequent copies are flagged as duplicates. Deduplication is applied at the top level — meaning entire families (email + attachments) are deduplicated as a unit, preserving family integrity. See the Deduplication section for full details.
| Field | Type | Description |
|---|---|---|
| is_duplicate | Boolean | Whether this document is a duplicate of another document in the matter. Duplicate documents are excluded from the active review set but retained for audit and export purposes. |
| duplicate_of_id | UUID | ID of the master document this is a duplicate of. Allows reviewers and exports to trace back to the retained copy. The master document is always the first instance ingested. |
| duplicate_custodian_info | String | Records which custodians held copies of this document. Critical for defensibility — even though duplicate copies are removed from the review set, this field preserves a complete record of who possessed the document across all data sources. |
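For illustration, family-level hash deduplication can be sketched as follows. This is a simplified model, not Dezcry's implementation: the `family_hash` and `deduplicate` functions and the dictionary field names are hypothetical, and real matching uses the stored MD5/SHA-256 hashes.

```python
import hashlib


def family_hash(documents):
    """Compute one hash per family by combining the SHA-256 of each
    member file, sorted so the result is order-independent."""
    members = sorted(hashlib.sha256(doc["content"]).hexdigest()
                     for doc in documents)
    return hashlib.sha256("".join(members).encode()).hexdigest()


def deduplicate(families):
    """Keep the first-ingested family as master; flag later
    byte-identical families as duplicates (families are processed
    in ingestion order, so the master is always the first instance)."""
    seen = {}  # family hash -> master family id
    for fam_id, docs in families:
        h = family_hash(docs)
        if h in seen:
            for doc in docs:
                doc["is_duplicate"] = True
                doc["duplicate_of_id"] = seen[h]  # simplified: family-level id
        else:
            seen[h] = fam_id
            for doc in docs:
                doc["is_duplicate"] = False
    return families
```

Hashing the sorted member hashes means an email plus attachments is treated as a unit, so a family is only a duplicate when every member matches, which is the family-integrity behaviour described above.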
NIST Filtering Fields
NIST filtering (also known as "de-NISTing") removes known system files, operating system components, and application runtime files from the review set by matching file hashes against the NIST National Software Reference Library (NSRL). This is a standard eDiscovery practice that eliminates files that are never relevant to review — such as Windows DLLs, Office templates, and browser cache files — often removing 10–30% of a dataset before review begins.
| Field | Type | Description |
|---|---|---|
| is_nist_filtered | Boolean | Whether this file was identified as a known system or application file via NIST NSRL hash matching. Filtered files are excluded from the active review workspace but retained in the system for audit and reporting. |
| nist_product_name | String | Name of the software product the file belongs to according to the NSRL database (e.g. Microsoft Windows 11, Adobe Acrobat Reader, Google Chrome). Helps identify why a file was filtered and provides context in exception reports. |
Encryption & Integrity Fields
Dezcry performs detailed analysis of every file during ingestion to detect encryption, corruption, and file-type mismatches. These fields provide a complete picture of each document's integrity status — essential for eDiscovery exception reporting and ensuring no documents are silently missed during processing.
| Field | Type | Description |
|---|---|---|
| is_encrypted | Boolean | Whether the document is encrypted or password-protected. Encrypted files cannot be processed until decrypted — add the password to the Password Bank and re-process, or note the exception in reporting. |
| encryption_type | String | Detailed encryption classification: password_protected (standard Office/PDF password), drm_protected (Digital Rights Management), pgp_encrypted (PGP/GPG encryption), smime_encrypted (S/MIME email encryption), or bitlocker (full-disk encryption artefact). Helps IT teams determine the appropriate decryption method. |
| is_corrupt | Boolean | Whether the document is corrupted or malformed. Corrupt files are flagged as processing exceptions and included in exception reports for transparency. |
| corruption_type | String | Detailed corruption classification: truncated (file cut short), malformed_header (invalid file header), invalid_structure (internal structure errors), or zero_byte (empty file). Provides actionable detail for troubleshooting or re-collection from the source. |
| file_signature | String | File magic-bytes signature detected by inspecting the file's binary header (e.g. "PDF-1.4", "PK (ZIP)", "JPEG/JFIF"). Independent of file extension — provides the true format identity. |
| file_signature_mismatch | Boolean | Whether the file extension does not match the actual content detected by magic bytes (e.g. a .docx file that is actually a renamed .exe). Important for identifying potentially suspicious or mis-labelled files in forensic review. |
| is_decrypted | Boolean | Whether the document was successfully decrypted during processing using a password from the Password Bank or provided at upload time. |
| decryption_method | String | How the document was decrypted: global_password_bank (matched against the matter's stored passwords) or upload_password (password provided during the upload that contained this file). Provides an audit trail for decryption actions. |
Dezcry inspects the binary magic bytes of every file to determine its true format, independent of the file extension. When a mismatch is detected (e.g. a .xlsx file that is actually a ZIP archive, or a .pdf that is actually a JPEG image), the file_signature_mismatch flag is set. This is valuable for identifying files that have been intentionally renamed to evade review, a common tactic in investigations and litigation.
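A minimal sketch of magic-byte detection, assuming a tiny signature table (a production engine recognises far more formats, and the function names here are illustrative, not Dezcry's API):

```python
# Illustrative subset of magic-byte signatures
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",        # also DOCX/XLSX/PPTX containers
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"MZ": "exe",
}

# Which detected format each extension legitimately maps to
EXPECTED = {"pdf": "pdf", "jpg": "jpeg", "jpeg": "jpeg", "png": "png",
            "zip": "zip", "docx": "zip", "xlsx": "zip", "pptx": "zip"}


def detect_signature(header: bytes):
    """Return the format implied by the file's leading bytes, or None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None


def signature_mismatch(filename: str, header: bytes) -> bool:
    """Flag when the extension claims a format the magic bytes refute,
    e.g. a .docx that is actually a renamed .exe."""
    ext = filename.rsplit(".", 1)[-1].lower()
    detected = detect_signature(header)
    # Unknown extension or undetected signature: no mismatch claimed
    return detected is not None and EXPECTED.get(ext) not in (None, detected)
```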
Processing Exception Fields
In any eDiscovery matter, a percentage of documents will encounter processing issues. Dezcry categorises every exception with a type and action, providing the structured data needed for defensible exception reporting. These fields are included in exports and processing batch reports so that legal teams have a complete record of what was — and was not — successfully processed.
| Field | Type | Description |
|---|---|---|
| exception_type | String | The category of processing exception: encryption (password-protected or encrypted file), corruption (malformed or damaged file), unsupported_format (file type not supported for text extraction), or text_extraction_failed (supported format but extraction encountered an error). Used for filtering and reporting on processing outcomes. |
| exception_action | String | The action Dezcry took in response to the exception: processed_with_errors (partial processing completed with some issues noted), skipped (document could not be processed at all), partial_extraction (some content was extracted but the process did not complete fully), or placeholder_created (a placeholder entry was created for tracking and reporting purposes). Provides transparency for legal teams assessing completeness. |
AI & Processing Fields
Dezcry uses AI to automatically generate document summaries, apply redactions, and produce document previews. These fields track the status and outputs of each AI-powered workflow, allowing reviewers to quickly see which documents have been summarised, redacted, or are still awaiting processing.
| Field | Type | Description |
|---|---|---|
| llm_summary | String | AI-generated 1–2 sentence summary of the document's content. Summaries are produced automatically after ingestion and displayed in the document list and viewer. Useful for quickly triaging documents without opening them — reviewers can scan summaries to identify relevant documents faster. |
| markup_status | String | Redaction and annotation workflow status: not_started (no redactions applied), pending (redaction in progress), complete (all redactions applied and markup generated), or failed (an error occurred during markup generation). Documents with markup_status of "complete" have a fully redacted preview available. |
| markup_page_count | Integer | Total number of pages in the markup document. Populated after markup generation completes. Useful for estimating review effort and for page-level redaction tracking in production reports. |
| preview_status | String | Document preview generation status: none (no preview requested), queued (awaiting generation), generating (currently being converted), ready (preview available for viewing), or error (generation failed). Previews convert native formats to viewable HTML/PDF for in-browser document review. |
Reviewer Decision Fields
These fields are set by reviewers during document review through the Decisions Panel in the document viewer, or via bulk actions on the document list. Every change to these fields is timestamped, attributed to the reviewer, and logged in the audit trail for full defensibility. Optimistic locking prevents conflicting edits when multiple reviewers work on the same matter simultaneously.
| Field | Type | Description |
|---|---|---|
| relevance | String | Relevance classification assigned by the reviewer — typically Responsive, Non-Responsive, or Privileged, but fully customisable at the matter level. This is the primary coding field used to separate relevant documents from the rest of the dataset. |
| hot_document | Boolean | Flag indicating the document is particularly significant — a "smoking gun" or key evidence that warrants elevated attention. Hot documents are visually highlighted in the document list and can be filtered for quick access. |
| decision_comment | String | Free-text annotation where reviewers explain their reasoning for the relevance decision. Useful for quality control, second-pass review, and providing context to senior reviewers or legal counsel. |
| relevance_coded_at | DateTime | Timestamp of when the relevance decision was last recorded. Used for review progress tracking, productivity metrics, and audit trail purposes. Updated each time the reviewer modifies their decision. |
In addition to the built-in fields above, matters can be configured with custom decision fields — single-select dropdowns, multi-select tags, or free-text fields — to capture matter-specific coding such as issue codes, privilege categories, or confidentiality designations. Custom fields are fully exportable and appear in the decisions panel alongside the standard fields. See Custom Fields for configuration details.
Extended Metadata (metadata_json)
In addition to the first-class fields above, each document contains an extended metadata object with format-specific properties organised by namespace. These fields capture the full depth of information embedded within each file type — from PDF authoring tools to image EXIF geolocation data to email authentication results. Extended metadata is viewable in the metadata panel and included in exports.
| Namespace | Document Types | Fields |
|---|---|---|
| general | All documents | filename, extension, mime, document_type, size_bytes, upload_batch_id. Present on every document as the baseline property set. |
| email | EML, MSG | from, to, cc, bcc, subject, message_id, in_reply_to, references, conversation_index, date_sent_utc, date_received_utc, attachments (count and names). Also includes email authentication results: dkim_result, spf_result, and dmarc_result — useful for identifying spoofed or unauthenticated messages. |
| pdf | PDF files | title, author, subject, producer (the application that generated the PDF), creator (the originating application), creation_date_utc, modification_date_utc, page_count, is_encrypted. Extracted from both the PDF info dictionary and XMP metadata streams when available. |
| ooxml | Word, Excel, PowerPoint (DOCX, XLSX, PPTX) | Core properties: created, modified, title, subject, creator, lastModifiedBy, revision, keywords, description, category. Application properties: application (e.g. Microsoft Excel), company, template. These are the properties visible in a file's "Properties" dialog in Microsoft Office. |
| image | JPEG, PNG, TIFF, BMP, GIF | format (e.g. JPEG, PNG), mode (e.g. RGB, RGBA), width, height. EXIF data (when available): DateTimeOriginal, DateTimeDigitized, Make (camera manufacturer), Model (camera model), Software, Orientation, XResolution, YResolution, and GPSInfo (latitude, longitude, altitude). EXIF geolocation data can be critical in investigations involving photographs. |
For email documents, Dezcry extracts the authentication results from email headers when present. DKIM (DomainKeys Identified Mail) verifies the email was not altered in transit. SPF (Sender Policy Framework) checks that the sending server is authorised for the domain. DMARC (Domain-based Message Authentication, Reporting and Conformance) combines both checks. These results can help identify spoofed or potentially fraudulent emails during an investigation.
Search
Keyword Search
Dezcry provides eDiscovery-grade keyword search powered by Elasticsearch, delivering capabilities equivalent to dtSearch at scale. The search engine supports millions of documents with sub-second query response times.
Search is available from the main document list via the search bar. Results are ranked by relevance with hit highlighting, and all searches return exact counts (never approximate). Search results can be filtered further using column filters and saved for reuse.
The following fields are indexed and searchable:
- Full document text content
- Filename and file path
- Email fields: subject, from, to, cc, bcc
- Author, custodian, document type, MIME type
- MD5 and SHA-256 hashes
- Tags, dates (created, modified, sent, received)
Search Syntax
Dezcry supports the full range of eDiscovery search syntax:
| Syntax | Example | Description |
|---|---|---|
| Boolean AND | contract AND liability | Both terms must appear in the document |
| Boolean OR | merger OR acquisition | Either term must appear |
| Boolean NOT | confidential NOT public | First term must appear, second must not |
| Grouping | (merger OR acquisition) AND confidential | Parentheses control operator precedence |
| Phrase | "privileged communication" | Exact phrase match, preserving word order |
| Proximity | "contract breach"~5 | Terms must appear within 5 words of each other |
| W/N (dtSearch) | merger W/5 acquisition | dtSearch-style proximity — terms within N words |
| Wildcard (prefix) | priv* | Matches privilege, privileged, privacy, etc. |
| Wildcard (suffix) | *mail | Matches email, voicemail, etc. |
| Wildcard (single) | h?t | Matches hat, hit, hot, hut, etc. |
| Fuzzy | colour~ | Matches similar spellings (Levenshtein distance) |
| Fuzzy (explicit) | colour~2 | Matches within edit distance of 2 |
| Field-specific | subject:"quarterly earnings" | Search within a specific field |
| Field (email) | from:john@acme.com | Search the From email field |
| Field (filename) | filename:report.xlsx | Search by filename |
| Date range | date >= 2020-01-01 | Filter by date |
| Date range | date:2020-01-01..2022-12-31 | Date range with start and end |
Searches automatically apply stemming — searching for "run" will also match "running", "ran", and "runs". This is handled by the Elasticsearch analyzer and provides more comprehensive results without requiring wildcard syntax.
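As a rough sketch, a query written in the syntax above could be passed through an Elasticsearch `query_string` query, which natively supports Boolean operators, phrases, proximity, wildcards, fuzzy matching, and `field:value` terms. The field list and `default_operator` choice below are assumptions, not Dezcry's actual index mapping; `track_total_hits` is the standard Elasticsearch option for exact (never approximate) counts:

```python
def build_search_body(query: str, size: int = 50) -> dict:
    """Assemble an illustrative Elasticsearch request body for a
    dtSearch-style keyword query."""
    return {
        "query": {
            "query_string": {
                "query": query,
                # Assumed field names; the real index mapping is internal
                "fields": ["content", "filename", "subject",
                           "from", "to", "cc", "bcc"],
                "default_operator": "AND",
            }
        },
        "highlight": {"fields": {"content": {}}},  # hit highlighting
        "track_total_hits": True,                  # exact result counts
        "size": size,
    }
```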
Search Term Sets
Search Term Reports allow you to define a set of keywords and run them against a scope of documents to measure hit rates. This is commonly used for:
- Validating keyword lists before full review
- Measuring the prevalence of specific topics in the collection
- Producing defensible search term hit reports for regulators
- Identifying which custodians or data sources contain relevant material
To create a search term report, navigate to Search Terms within a matter:
1. Create a report — Give it a name and select the scope (all documents or a saved search).
2. Add search terms — Enter your keywords one at a time. Each term can be up to 450 characters and supports the full search syntax.
3. Configure options — Enable "Include family hits" to count documents whose family members match. Enable "Tag hits" to create per-document hit records.
4. Run the report — Dezcry executes each search term against the scope and records hit counts.
Search Term Reports
Once a search term report has completed, you can view detailed results:
- Per-term hit counts — number of documents matching each search term (direct and family hits)
- Unique hits — documents that match only this specific term
- Colour-coded highlighting — each term can be assigned a custom highlight colour for visual identification in the document viewer
- Scope summary — total documents in scope, total documents with at least one hit
- Term status — individual status tracking for each term (pending, running, completed, error)
When tag hits is enabled, you can filter the document list to show only documents that matched a specific search term, enabling targeted review of keyword-responsive material. Search term highlights persist in the document viewer text tab, showing matching terms with their assigned colours.
Saved Searches
Any combination of search query and column filters can be saved as a named search for later reuse. Saved searches are a core building block in Dezcry — they serve as the scope selector for redaction, classification, export, and search term reports.
| Property | Description |
|---|---|
| Name | A unique name within the matter for easy identification |
| Description | Optional long-form description of what the search captures |
| Visibility | Shared (visible to all matter users) or Private (creator only) |
| Pinned | Pin frequently-used searches to the top of the list |
| Tags | Categorise searches (e.g. "Privilege", "Review", "Production") |
| Query + Filters | The full search query and column filter configuration |
When a saved search is used as the scope for a job (redaction, classification, or export), the document set is frozen at the time the job starts. This means the job processes the documents that matched at that moment, even if new documents are added to the matter later — providing defensibility and reproducibility.
AI Classification
Overview
AI Classification lets you automatically categorise documents using custom decision fields defined by your team. Unlike manual review, AI classification processes entire document sets in minutes, producing predictions with calibrated confidence scores so reviewers can focus their attention on genuinely ambiguous items while high-confidence predictions are applied automatically.
Classification runs on large language models within the same Azure environment as the rest of the platform — no document data leaves your deployment. The system includes confidence debiasing to correct for known LLM overconfidence, a verification pass for borderline predictions using a separate model, and intelligent document chunking for long documents. Every prediction includes a calibrated confidence score and rationale, and all decisions are logged in the audit trail.
Classification and redaction serve different purposes. Classification assigns labels to entire documents — categorising them by type, relevance, sensitivity, or any custom taxonomy your team defines. Redaction identifies and removes specific text within documents. Classification helps your team decide what to do with a document; redaction helps you prepare it for disclosure.
Custom Fields
Before running a classification job, you define the decision fields that the AI should predict. These are entirely customisable — you define the field names, types, options, and instructions that are specific to your review. Navigate to Classification within a matter to configure fields.
| Field Type | Description | Example |
|---|---|---|
| Single Select | The AI chooses exactly one value from a predefined list of options. Best for mutually exclusive categories. | Relevance: Responsive / Non-Responsive / Partially Responsive |
| Multi Select | The AI can select one or more applicable values from a list. Best for non-exclusive labels. | Data Categories: Financial / Medical / Employment / Personal |
| Boolean | A simple yes/no decision. | Contains PII: true / false |
| Free Text | The AI provides a short free-text response. Best for summaries or descriptions. | Key Topics: One-sentence description of the document content |
For each field, you provide natural-language instructions that tell the AI exactly how to evaluate documents. The quality of these instructions directly affects classification accuracy. Dezcry provides a real-time quality indicator as you write:
| Quality Level | Length | Guidance |
|---|---|---|
| Poor | Under 10 characters | Too short to be useful — the AI has no context for making decisions. Add specific criteria, examples, and edge case guidance. |
| Fair | 10–50 characters | Basic direction, but lacks nuance. Adding more detail about what qualifies for each option and how to handle ambiguous cases will improve accuracy. |
| Good | 50–200 characters | The AI has enough context to make reliable predictions. Consider adding examples of borderline cases. |
| Excellent | 200+ characters | Detailed instructions with clear criteria, examples, and edge case handling. This produces the most accurate and consistent results. |
Good classification instructions should include:
- Clear criteria — what makes a document qualify for each option
- Examples — concrete examples of what belongs in each category
- Edge cases — how to handle ambiguous or borderline documents
- Context — relevant background about the matter, industry, or regulatory framework
- Negative examples — what should not be classified as a given category
For example, instead of "Is this relevant?", write: "Classify as Responsive if the document contains information about the data subject's employment history, salary, performance reviews, or HR communications. Classify as Non-Responsive if the document is a system-generated notification, marketing material, or relates to a different individual. Classify as Partially Responsive if the document contains some relevant content mixed with unrelated material."
Classification Sets
A classification set is a reusable configuration that defines which fields to predict, how the AI should behave, and what confidence thresholds to apply. Classification sets can be run multiple times — for example, after adding new documents to the matter. To create and run a classification:
1. Select scope — Choose all documents or a saved search to define which documents to classify. The scope is frozen at run time — new documents added later won't be included in this run.
2. Name the set — Give the classification set a descriptive name for tracking and audit purposes.
3. Configure fields — Define one or more custom decision fields with types, options, and natural-language AI instructions.
4. Set thresholds — Configure the auto-accept threshold (default 0.85) and review threshold (default 0.50) to control how predictions are routed.
5. System prompt (optional) — Provide an optional system-level prompt that applies to all fields — useful for setting overall context like the matter type, jurisdiction, or review protocol.
6. Optional sampling — For large document sets, configure prevalence sampling to validate classification quality on a subset before committing to a full run.
7. Review and launch — Review all settings in a summary view and start the classification job.
Confidence Thresholds and Routing
Dezcry uses a three-tier routing system based on calibrated confidence scores to determine how each prediction is handled:
| Confidence Range | Routing | Description |
|---|---|---|
| Above auto-accept (default: > 0.85) | Auto-applied | The prediction is applied automatically without requiring human review. The AI is highly confident and the prediction is defensible. |
| Between review and auto-accept (default: 0.50–0.85) | Flagged for review | The prediction is saved but flagged as needs_review. A human reviewer must approve, correct, or reject it before it is applied. |
| Below review threshold (default: < 0.50) | Indeterminate | The AI could not make a reliable prediction. The document is flagged for manual coding by a reviewer. |
Both thresholds are configurable per classification set, allowing teams to tune the trade-off between automation and human oversight based on the risk profile of the review. A high-stakes privilege review might use a higher auto-accept threshold (e.g. 0.95) to route more predictions to human review, while a routine document-type classification might use a lower threshold (e.g. 0.80) to maximise automation.
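The three-tier routing can be summarised in a few lines. This sketch mirrors the table above; the function name and return labels are illustrative:

```python
def route_prediction(confidence: float,
                     auto_accept: float = 0.85,
                     review: float = 0.50) -> str:
    """Route a calibrated confidence score to one of three tiers.
    Thresholds are per-classification-set settings; defaults mirror
    the documented values."""
    if confidence > auto_accept:
        return "auto_applied"    # applied without human review
    if confidence >= review:
        return "needs_review"    # saved, but flagged for a reviewer
    return "indeterminate"       # flagged for manual coding
```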
Confidence Calibration (Debiasing)
LLMs are known to be systematically overconfident — they tend to report confidence scores of 0.90 or 0.95 even when their actual accuracy is closer to 0.80–0.85. This is particularly problematic in eDiscovery where confidence thresholds drive review decisions.
Dezcry applies empirical confidence debiasing — a calibration layer that adjusts raw LLM confidence scores to better reflect true accuracy. The calibration is:
- Monotonic — higher raw confidence always produces higher calibrated confidence (preserves ranking)
- Deterministic — the same input always produces the same output (defensible in regulatory contexts)
- Conservative — systematically pulls overconfident scores toward empirical accuracy curves
The calibration is based on published research on LLM confidence calibration and fitted to eDiscovery-specific accuracy measurements. It compresses the overconfident tail (0.85–0.99) more aggressively than the well-calibrated low-confidence range (0.05–0.50).
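The fitted calibration curve itself is not published; purely to illustrate the monotonic, deterministic, tail-compressing shape, a piecewise-linear sketch with invented breakpoints might look like this:

```python
def debias(raw: float) -> float:
    """Monotonic, deterministic calibration sketch: leave the
    low-confidence range roughly unchanged and compress the
    overconfident tail toward empirical accuracy.
    (Breakpoints below are invented for illustration; the real
    curve is fitted to measured accuracy data.)"""
    # piecewise-linear control points: (raw, calibrated)
    points = [(0.0, 0.0), (0.5, 0.5), (0.85, 0.72), (0.99, 0.88), (1.0, 1.0)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if raw <= x1:
            t = (raw - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return 1.0
```

Because the control points increase in both coordinates, ranking is preserved (monotonic), and the same input always yields the same output (deterministic), matching the properties listed above.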
Verification Pass
For predictions that fall in a borderline confidence range (0.35–0.70 by default), Dezcry automatically triggers a verification pass — a second classification attempt using a different model deployment. This functions as a quality control layer:
- The verification pass uses a different prompt persona ("QC reviewer") to challenge the initial classification
- It uses a separate model deployment for model diversity, reducing correlated errors
- If the verification agrees with the first pass, the confidence scores are averaged (typically increasing the final confidence)
- If the verification disagrees, the lower confidence score is used, the verification's classification is adopted, and the result is force-flagged for human review
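The agree/disagree handling above can be sketched as follows, with hypothetical field names (`label`, `confidence`, `force_review`):

```python
def combine_verification(first: dict, second: dict) -> dict:
    """Merge an initial prediction with its verification pass.
    Agreement averages the confidence scores; disagreement adopts
    the verifier's label, keeps the lower confidence, and forces
    human review."""
    if first["label"] == second["label"]:
        conf = (first["confidence"] + second["confidence"]) / 2
        return {"label": first["label"], "confidence": conf,
                "force_review": False}
    conf = min(first["confidence"], second["confidence"])
    return {"label": second["label"], "confidence": conf,
            "force_review": True}
```

Note `force_review` only records the disagreement override; threshold-based routing still applies separately to the combined confidence.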
Document Chunking for Long Documents
Documents that exceed the model's context budget (default: ~112,000 characters) are automatically split into deterministic chunks for processing. Chunking is designed to maintain classification accuracy:
- Sentence-boundary aware — chunks are split at sentence boundaries, never mid-sentence, preserving semantic coherence
- Overlapping — adjacent chunks share ~200 characters of overlap, ensuring context continuity across chunk boundaries
- Deterministic — the same document always produces the same chunks, ensuring reproducible results
- Fallback splitting — if a single sentence exceeds the chunk limit, it falls back to word-boundary splitting with overlap
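A simplified version of sentence-boundary chunking with character overlap (the word-boundary fallback for oversized sentences noted above is omitted for brevity; the sentence splitter here is a deliberately naive regex):

```python
import re


def chunk_text(text: str, max_chars: int = 112_000, overlap: int = 200):
    """Deterministic, sentence-boundary-aware chunking sketch.
    Chunks never split mid-sentence, and each chunk carries roughly
    `overlap` trailing characters of its predecessor for context."""
    if len(text) <= max_chars:
        return [text]
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            # seed the next chunk with the tail of this one
            current = current[-overlap:] + " " + sent
        else:
            current = (current + " " + sent) if current else sent
    if current:
        chunks.append(current)
    return chunks
```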
When a document is chunked, each chunk is classified independently, and results are aggregated using a weighted voting system:
- Each chunk's prediction is weighted by its confidence score
- Chunks that return null (no classifiable content) are excluded from the vote, not counted as evidence
- The winning prediction is determined by total confidence-weighted score, with tie-breaking by peak single-chunk confidence
- A unanimity bonus increases confidence when all chunks agree; disagreement reduces it
- A dissent penalty is applied when any dissenting chunk has high confidence (≥ 0.70), with a note recommending manual review
When different chunks of a document produce different classifications, this is flagged as chunk disagreement and the document is automatically flagged for human review. This is an important quality signal — it often indicates that a document contains mixed content (e.g. a partially responsive document where some sections are relevant and others are not). The aggregated rationale includes a note about the dissenting chunks and their confidence levels.
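The confidence-weighted vote can be sketched as below. The bonus and penalty magnitudes are illustrative assumptions; only the structure (null exclusion, confidence-weighted totals, peak tie-breaking, unanimity and dissent handling) follows the description above:

```python
def aggregate_chunks(predictions):
    """Aggregate per-chunk predictions, each given as
    (label_or_None, confidence). Returns (label, confidence,
    needs_review)."""
    votes = {}
    for label, conf in predictions:
        if label is None:            # no classifiable content: not evidence
            continue
        total, peak = votes.get(label, (0.0, 0.0))
        votes[label] = (total + conf, max(peak, conf))
    if not votes:
        return None, 0.0, False
    # winner by total weight; tuple comparison breaks ties by peak
    winner = max(votes, key=lambda lbl: votes[lbl])
    voted = [(lbl, c) for lbl, c in predictions if lbl is not None]
    confidence = votes[winner][0] / sum(c for _, c in voted)
    unanimous = all(lbl == winner for lbl, _ in voted)
    high_dissent = any(lbl != winner and c >= 0.70 for lbl, c in voted)
    if unanimous:
        confidence = min(1.0, confidence + 0.05)  # illustrative bonus
    # any disagreement (chunk disagreement) triggers human review
    needs_review = (not unanimous) or high_dissent
    return winner, confidence, needs_review
```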
Classification sets track runs with detailed progress reporting: total documents, documents processed, errors encountered, and token usage for cost attribution. Completed runs automatically create a saved search containing the classified documents for downstream processing.
Classification runs support parallel processing — multiple documents are classified concurrently (default: 6 simultaneous LLM calls) to maximise throughput while staying within AI rate limits. Runs can be cancelled at any time, and cancellation takes effect cleanly after the current document finishes processing.
The classification progress view shows real-time processing with a live console, document-by-document results including confidence scores, and estimated time remaining. You can continue working while classification runs in the background.
Reviewing Predictions
After a classification run completes, reviewers can examine the results. Each document receives a result for every configured field, containing:
| Field | Description |
|---|---|
| Predicted Value | The AI's chosen classification for this field (e.g. "Responsive", "Financial"). Null if the AI could not determine a classification. |
| Confidence Score | A calibrated 0.0–1.0 score reflecting the AI's certainty. Debiased to correct for LLM overconfidence. |
| Rationale | A short natural-language explanation of why the AI made this prediction, referencing specific content in the document. |
| Needs Review | Boolean flag — true if the confidence is below the auto-accept threshold, if chunks disagreed, or if the verification pass overrode the initial classification. |
| Chunk Count | How many chunks the document was split into (1 for short documents that fit in a single context window). |
| Chunk Disagreement | Whether different chunks of the document produced different predictions — a signal that the document may contain mixed content. |
| Verification Status | Whether the verification pass was triggered and whether it agreed or disagreed with the initial classification. |
Reviewers can take the following actions on any prediction:
- Approve — accept the AI's prediction as the final decision for this document and field
- Correct — override the AI's prediction with a different value chosen by the reviewer. The correction is logged alongside the original AI prediction for audit purposes.
- Reject — dismiss the prediction entirely, leaving the field uncoded for this document
All review actions are logged in the audit trail with the reviewer's identity, timestamp, the original AI prediction, and the reviewer's decision. This provides a defensible record of how every classification decision was made — whether by AI with human approval, by human correction of an AI suggestion, or by purely manual coding.
Prevalence Sampling
For large document sets, Dezcry supports prevalence sampling — classifying a statistically representative subset of documents before committing to a full run. This allows teams to:
- Validate that the classification instructions produce accurate results before processing the full set
- Estimate the prevalence of each category in the collection (e.g. "approximately 30% of documents are responsive")
- Calculate precision and recall metrics by comparing AI predictions against manual coding on the sample
- Refine instructions based on sample results before running the full classification
Sampling results are stored as ClassificationSample records, preserving both the AI prediction and the human-coded ground truth for quality measurement and defensibility.
AI Redaction
Overview
AI Redaction is Dezcry's flagship feature — a 5-layer detection pipeline that identifies personal data, sensitive content, and legally privileged material for redaction. The system is designed as a reviewer aid, not an autonomous tool: every AI suggestion is reviewable, editable, and logged before it is applied.
Redaction runs on large language models within the same Azure environment. No document data is sent to any third-party service. The pipeline combines deterministic pattern matching with LLM analysis and cross-document entity resolution for comprehensive coverage.
Redaction Types
Dezcry supports three redaction protocols, each tailored to a different use case:
| Type | Purpose | Configuration |
|---|---|---|
| DSAR | Remove the data subject's personal information from documents being disclosed. Uses a whitelist approach — you specify the data subject's name, email addresses, and phone numbers, and the AI identifies all instances. | Data subject first/last name, known email addresses, known phone numbers |
| Privilege | Identify and redact legally privileged communications (attorney-client privilege, work product doctrine). Uses domain and keyword filtering to detect privileged material. | Privileged individuals, law firm domains, privilege keywords, custom instructions |
| Ad Hoc | Custom redaction with free-form instructions. Use for any redaction task that doesn't fit the DSAR or privilege templates. | Free-text instructions describing what to redact |
Redaction Models
When creating a redaction set, you select which entity categories the AI should detect. Each category has a distinct colour for visual identification in the review interface:
| Model | Description | Colour |
|---|---|---|
| Names | Personal names, first/last names, initials, nicknames | Red |
| Emails | Email addresses | Orange |
| Phone Numbers | Phone numbers, fax numbers, mobile numbers | Amber |
| Identifiers | SSN, passport numbers, driver licence numbers, national IDs | Green |
| Employment | Job titles, employee IDs, salary information, work history | Blue |
| Company IDs | Company registration numbers, tax IDs, ABN/ACN | Purple |
| Locations | Physical addresses, postal codes, GPS coordinates | Magenta |
| Political Opinions | Political affiliations, party membership, voting records | Light Purple |
| Health Information | Medical conditions, treatments, diagnoses, medications | Red |
| Sexual Orientation | Gender identity, sexual orientation details | Pink |
| Financial | Bank account numbers, credit card numbers, financial data | Green |
| Auth Credentials | Passwords, PINs, API keys, security tokens | Cyan |
| Family Associations | Relationships, dependents, family member details | Light Red |
| Device IDs | IP addresses, MAC addresses, device identifiers | Light Blue |
Sensitive categories — health information, sexual orientation, political opinions, and auth credentials — use a lower default auto-apply confidence threshold (0.70) to ensure more conservative handling.
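As an illustrative sketch of how per-category thresholds might be resolved: the 0.70 value for sensitive categories comes from the text above, while the 0.85 default and the category keys are placeholders, not documented settings.

```python
# Illustrative only: 0.70 for sensitive categories is documented;
# the 0.85 default is a placeholder, not a documented value.

DEFAULT_AUTO_APPLY = 0.85  # placeholder default threshold
SENSITIVE_CATEGORIES = {
    "health_information", "sexual_orientation",
    "political_opinions", "auth_credentials",
}

def auto_apply_threshold(category: str) -> float:
    """Sensitive categories auto-apply at lower confidence (more
    conservative: more borderline items get redacted)."""
    return 0.70 if category in SENSITIVE_CATEGORIES else DEFAULT_AUTO_APPLY
```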
5-Layer Pipeline
Dezcry processes each document through a 5-layer redaction pipeline, combining multiple detection methods for comprehensive coverage:
| Layer | Name | Method | Description |
|---|---|---|---|
| L1 | Pattern Scan | NER engine (deterministic) | Pattern-matching engine that detects structured PII using regex rules and named entity recognition. Provides a fast, deterministic baseline — catches email addresses, phone numbers, credit card numbers, and standard identifier formats. |
| L2 | AI Analysis | Large language model | The primary AI detection pass. The LLM analyses each document with context from L1 and L4 results, identifying contextual personal data that pattern matching alone would miss — such as names mentioned in natural language, implied relationships, and sensitive content. |
| L3 | AI Double-Check | Independent LLM verification | An independent verification layer using a separate model deployment. Acts as a "senior eDiscovery QA reviewer" — adversarially examines L2 results to confirm, reject, or upgrade redaction entries. Catches false positives and missed items. |
| L4 | Cross-Reference | Entity Resolution (algorithmic) | Fuzzy clustering of entity variants across all documents in the scope. Groups different spellings and formats of the same entity (e.g. "J. Smith", "John Smith", "john.smith@acme.com") into clusters with a canonical form. Ensures consistent redaction across the entire document set. |
| L5 | Smart Routing | Confidence Routing (algorithmic) | Routes each redaction entry based on its confidence score: high-confidence items are auto-applied, medium-confidence items go to the human review queue, and low-confidence items are flagged for manual inspection. |
The layers execute in the order: L4 (entity resolution) → L1 (pattern scan) → L2 (AI analysis) → L3 (verification) → L5 (routing). L4 runs first to build the entity index, which provides context for the subsequent AI layers. Progress is tracked per-phase with real-time status updates in the UI.
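The kind of deterministic matching L1 performs can be sketched as follows. The patterns below are simplified illustrations, not Dezcry's production rules, and the category names are borrowed from the model table above.

```python
import re

# Simplified sketch of an L1-style deterministic pattern scan.
# Real rules are far more extensive; these regexes are illustrative.
PATTERNS = {
    "emails": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone_numbers": re.compile(r"\+?\d[\d ()-]{7,}\d"),
    "financial": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # card-like digit runs
}

def pattern_scan(text):
    """Return every match as a category/text/span record."""
    hits = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append({"category": category, "text": m.group(), "span": m.span()})
    return hits
```

A deterministic layer like this gives the later LLM passes a reliable baseline for structured identifiers, which is the reason the pipeline runs it before the contextual analysis.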
Reviewing Redactions
After a redaction set completes processing, navigate to the Review page to examine and approve the AI's suggestions. The review queue presents each detected entity with:
- Original text — the exact text the AI identified for redaction
- Model category — the entity type (names, emails, etc.) with colour-coded badge
- Source layer — which pipeline layer detected it (L1, L2, L3, L4)
- Confidence score — how certain the AI is that this is a genuine entity
- Verification status — confirmed, rejected, upgraded, or new (from L3)
- Page location — the page number and pixel coordinates within the document
Reviewers can filter the queue by layer, model category, and confidence threshold. For each entry, reviewers can:
- Approve — accept the redaction and apply it to the document
- Reject — dismiss the suggestion as a false positive
- Flag for review — escalate to a senior reviewer for a second opinion
The review queue paginates at 100 entries per page. All review decisions are logged in the audit trail with the reviewer's identity, timestamp, and action taken.
Manual Redactions
In addition to AI-assisted redaction, reviewers can manually draw redaction boxes on any document using the markup viewer. Manual redactions are applied directly to the document's markup images and are tracked alongside AI redactions in the audit trail.
For spreadsheet documents, Dezcry provides a specialised spreadsheet markup viewer that allows cell-level redaction — reviewers can select individual cells or ranges to redact.
AI Summaries & Chat
Document Summaries
Dezcry automatically generates LLM-powered summaries for every document in a matter. Summaries are 1–2 sentence overviews that give reviewers quick context to assess relevance, decide on inclusion or exclusion, and move through large review sets faster.
Summaries are generated by a dedicated language model running on GPU infrastructure within the same Azure environment. No document data is sent to third-party services. Summaries are generated in the background and are available alongside the document in the metadata panel.
- Summaries are generated automatically on upload and during background backfill
- The summary language is configurable per matter (English, German, French, Spanish, etc.)
- Summaries are searchable and appear in the document metadata panel
- Administrators can trigger summary regeneration for any document or batch
Document Chat
The Document Chat panel provides conversational AI for asking questions about documents. Available from the document viewer, chat uses Retrieval-Augmented Generation (RAG) to find relevant content and generate accurate answers with source citations.
How it works:
1. Ask a question — Type a natural-language question in the chat panel (e.g. "What are the key dates mentioned in this document?")
2. Hybrid search — Dezcry searches for relevant content using both keyword search (Elasticsearch) and semantic search (vector embeddings), combining results via Reciprocal Rank Fusion.
3. AI generates answer — The LLM reads the relevant document chunks and generates a response with inline citations referencing specific documents.
4. Source verification — Each response includes clickable source document references (e.g. [DOC-00028]) so reviewers can verify the AI's answer.
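The Reciprocal Rank Fusion step can be sketched as follows; the k=60 constant is the conventional RRF value, assumed here rather than a documented Dezcry setting.

```python
# Sketch of Reciprocal Rank Fusion over two ranked result lists.
# k=60 is the conventional RRF constant, assumed, not a Dezcry setting.

def rrf_fuse(keyword_results, semantic_results, k=60):
    """Each argument is a list of document IDs, best match first.
    A document's fused score is the sum of 1/(k + rank) over the
    lists it appears in, so agreement between lists boosts it."""
    scores = {}
    for results in (keyword_results, semantic_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by both keyword and semantic search will typically outrank one that only a single method found, which is the property that makes RRF a good fit for hybrid retrieval.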
Chat is rate-limited to 20 queries per minute per user and 60 queries per minute per matter to ensure fair resource allocation across teams.
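A per-user sliding-window limiter of the kind described could look like this sketch; only the 20-queries-per-minute figure comes from the text, and everything else (class shape, eviction strategy) is an assumption.

```python
import time
from collections import deque

# Sketch of a sliding-window rate limiter. Only the 20/minute figure
# is documented; the implementation shape is an assumption.
class SlidingWindowLimiter:
    def __init__(self, max_events, window_seconds=60):
        self.max_events = max_events
        self.window = window_seconds
        self.events = deque()  # timestamps of accepted events

    def allow(self, now=None):
        """Accept the event if fewer than max_events fall in the window."""
        now = time.monotonic() if now is None else now
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()  # drop events outside the window
        if len(self.events) < self.max_events:
            self.events.append(now)
            return True
        return False
```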
AI OCR
Overview
AI OCR (Optical Character Recognition) extracts searchable text from image-based documents — scanned PDFs, photographs, screenshots, and other image files that don't contain embedded text. Dezcry uses the Azure Computer Vision Read API for high-accuracy text extraction.
OCR can be enabled automatically during upload (as a processing option) or run manually on specific documents or batches after ingestion.
Running OCR
Navigate to the AI OCR page within a matter to manage OCR jobs:
1. Create a job — Select the scope — all documents or a saved search — and start the OCR job.
2. Processing — Dezcry sends each image document to the Azure Computer Vision API for text extraction. Progress is tracked in real-time with 4-second polling intervals.
3. Results — Extracted text is stored in the document record and immediately becomes searchable. Per-document results include pages extracted, characters extracted, confidence scores, and processing duration.
OCR job results track each document individually, reporting:
- Pages and characters extracted per document
- Per-document status (completed, failed, skipped)
- Error messages for failed documents
- Processing duration per document
Jobs can be cancelled while running or queued. The AI OCR dashboard shows aggregate metrics: total jobs, completed jobs, active jobs, and total documents processed.
Password Bank
Overview
The Password Bank stores passwords and credentials for encrypted documents within a matter. When Dezcry encounters password-protected files during ingestion (encrypted PDFs, password-protected ZIPs, protected Office documents, encrypted PST files), it attempts to decrypt them using passwords from the Password Bank.
Managing Passwords
Navigate to the Password Bank page within a matter to manage credentials:
- Add passwords — enter passwords with optional labels and tags for organisation
- Labels — human-readable hints to identify what the password is for (the label is visible, the password itself is hidden)
- Tags — categorise passwords (e.g. "client", "custodian-smith", "batch-3")
- Usage tracking — each password tracks when it was last used and how many times it has been applied
- Edit and delete — update or remove passwords with confirmation dialogs
Passwords are reusable across all uploads within the matter. When new documents are uploaded, all passwords in the bank are tried against any encrypted files. The upload summary reports how many files were successfully decrypted and how many failed decryption.
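The try-every-password flow might look like this sketch; `decrypt` stands in for whatever file-type-specific decryption routine applies, and the bank record fields are illustrative rather than Dezcry's schema.

```python
# Hypothetical sketch of the try-every-password flow. `decrypt` is a
# stand-in for the file-type-specific routine; it should return the
# decrypted content or raise on a wrong password.

def try_passwords(decrypt, bank):
    """bank: list of dicts with 'password' and 'use_count' keys.
    Returns (content, entry) on success, (None, None) if all fail."""
    for entry in bank:
        try:
            content = decrypt(entry["password"])
        except ValueError:  # wrong password for this file
            continue
        entry["use_count"] += 1  # usage tracking on the matched entry
        return content, entry
    return None, None
```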
Export
Overview
Dezcry's Export system produces disclosure-ready output packages with Bates numbering, metadata load files, burned redactions, and full decision history. Exports are configured through a multi-step wizard and can be re-run with updated settings.
Two export types are supported:
- Production — formal disclosure packages with Bates numbering, branded headers/footers, and structured volume organisation. Used for regulatory submissions and formal DSAR responses.
- Review — simpler packages for internal review or transfer to external counsel, without production-level numbering requirements.
Export Wizard
The export wizard guides you through a 6-step configuration process:
1. Scope — Select which documents to export — all documents in the matter or a saved search.
2. Name & Type — Name the export set and choose Production or Review type.
3. Output Components — Select which output types to include: metadata load file, natives, images, text files, and/or PDFs.
4. Numbering & Branding — Configure Bates numbering (prefix, suffix, start number, padding) and optional header/footer branding.
5. Load File & Volumes — Configure the metadata load file format, encoding, date formats, and volume organisation settings.
6. Review & Run — Review all settings in a summary view and launch the export.
Scope Selection
Export scope defines which documents are included in the output package. You can choose:
- All documents — exports every document in the matter
- Saved search — exports only documents matching a previously saved search query and filters
The wizard displays a document count for the selected scope so you can verify the volume before proceeding. The scope is frozen at run time — new documents added to the matter after the export starts will not be included.
Output Components
Select which output types to include in the export package:
| Component | Description |
|---|---|
| Metadata Load File | A structured data file (DAT, CSV, or HTML) containing all document metadata, decisions, and Bates numbers. Compatible with Relativity, Concordance, and other review platforms. |
| Natives | Original source files in their native format (DOCX, PDF, XLSX, etc.) |
| Images | Rendered document images (single-page or multi-page TIFF) with optional Opticon or IPRO load files for image cross-referencing. |
| Text Files | Extracted plain text content for each document, useful for downstream text analytics or cross-referencing. |
| PDFs | Rendered PDF versions of each document, optionally with burned-in redactions and Bates number branding. |
Numbering & Branding
Production exports support Bates-style document numbering:
| Setting | Description | Example |
|---|---|---|
| Prefix | Text prepended to every Bates number | ACME- |
| Suffix | Text appended to every Bates number | -PROD |
| Start Number | The first number in the sequence | 1 |
| Digit Padding | Zero-padding width for the numeric portion | 7 → 0000001 |
| Numbering Mode | Document-level (one number per document) or page-level (one number per page) | Document-level |
| Page Separator | Character between document number and page number in page-level mode | _ → ACME-0000001_001 |
| Attachment Grouping | Keep parent documents and attachments numbered sequentially | Enabled |
| Sort Order | How documents are ordered for numbering (sequential, family group, or by field) | doc_seq |
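Document-level Bates numbering from the settings above can be sketched as follows; the values in the test call are the table's examples, not defaults.

```python
# Sketch of document-level Bates number generation from the settings
# above (prefix, suffix, start number, zero-padding).

def bates_numbers(doc_count, prefix="", suffix="", start=1, padding=7):
    """One number per document: prefix + zero-padded sequence + suffix."""
    return [f"{prefix}{start + i:0{padding}d}{suffix}" for i in range(doc_count)]
```

In page-level mode, a page component would be appended using the configured separator (e.g. `ACME-0000001_001`); that variant is omitted here for brevity.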
Optional branding adds headers and footers to PDF output:
- Header and footer with left, centre, and right sections
- Template tokens: {BatesNumber}, {PageX}, {PageY}
- Default footer: "CONFIDENTIAL"
Load Files & Volumes
Load file settings control the metadata output format:
| Setting | Default | Description |
|---|---|---|
| Format | DAT | Load file format — DAT (Concordance), CSV, HTML, or custom TXT |
| Encoding | UTF-8 | Character encoding for the load file |
| Date Format | MM/dd/yyyy | Format for date fields in the load file |
| Time Format | HH:mm:ss | Format for time fields |
Volume settings control the physical organisation of the export package:
| Setting | Default | Description |
|---|---|---|
| Volume Prefix | VOL | Prefix for volume folder names (VOL001, VOL002, etc.) |
| Start Number | 1 | First volume number |
| Digit Padding | 3 | Zero-padding for volume numbers |
| Max Volume Size | 4500 MB | Maximum size per volume folder before splitting |
| Max Files Per Folder | 5000 | Maximum files in a single subfolder |
| File Naming | Control Number | How files are named — by Bates/control number or original filename |
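Volume assignment under the size and file-count caps can be sketched as follows; the greedy fill strategy and field names are illustrative, not a description of Dezcry's internals.

```python
# Sketch of volume assignment: files fill VOL001, VOL002, ... and a new
# volume starts when the size or file-count cap would be exceeded.
# The greedy strategy here is an assumption.

def assign_volumes(files, max_bytes, max_files, prefix="VOL", padding=3, start=1):
    """files: list of (name, size_bytes). Returns {volume_name: [names]}."""
    volumes, current, current_bytes, n = {}, [], 0, start
    for name, size in files:
        if current and (current_bytes + size > max_bytes or len(current) >= max_files):
            volumes[f"{prefix}{n:0{padding}d}"] = current  # close this volume
            current, current_bytes, n = [], 0, n + 1
        current.append(name)
        current_bytes += size
    if current:
        volumes[f"{prefix}{n:0{padding}d}"] = current
    return volumes
```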
Downloading Exports
Once an export run completes, the output package is available for download. The export page shows:
- Run status — running, completed, failed, or cancelled
- Progress — documents processed vs. total
- Output size — total size of the generated package
- Duration — time taken to generate the export
- Error and warning counts — per-document issues encountered
- Settings snapshot — the exact configuration used for this run
Redaction integration allows you to burn redactions into the export output. Select a completed redaction set and choose the placeholder mode:
- None — no redaction placeholders (redacted areas are simply blacked out)
- Brackets — redacted text replaced with category labels in brackets
- Redaction block — solid black boxes over redacted content
All export actions — creation, run start, download — are logged in the audit trail.
Audit & Reporting
Audit Log
Every significant action in Dezcry is recorded in an immutable audit log, providing a defensible trail for regulators, legal review, and internal governance. The audit log captures:
| Category | Actions Tracked |
|---|---|
| Documents | Viewed, uploaded, downloaded, deleted, summaries regenerated |
| Decisions | Relevance coding updates, bulk decision changes, tag modifications |
| Redactions (Manual) | Redaction boxes drawn, updated, or deleted on documents |
| Redaction Review | AI redaction entries approved, rejected, or escalated |
| Redaction Jobs | Sets created/deleted, runs started/completed/cancelled/failed |
| Classification | Sets created/deleted, runs started/completed/cancelled/failed |
| Export | Sets created/updated/deleted/cloned, runs started/cancelled, downloads |
| Markup | Preview and markup images generated or failed |
| Downloads | PDF downloads, batch PDF downloads, redacted spreadsheet downloads |
| Search | Saved searches created, updated, or deleted |
| Chat | Messages sent, conversations created/updated/deleted |
| Indexing | Documents indexed, matter re-indexed, index cleared |
| Auth | Login success/failure, password changes, account locks |
| Admin | Users created/updated, roles changed, matter access granted/revoked |
| Billing | Usage recalculated, invoices generated |
Each audit entry includes: the action type, target (which document, set, or resource was affected), user identity (who performed it), timestamp, and details (rich context including file names, counts, old/new values). The audit log is filterable by action type, target type, user, and date range, with pagination at 50 entries per page.
Matter-level audit is accessible from the Audit page within each matter. System-wide audit is available to administrators from the Admin section.
Reporting Dashboard
The Reporting page provides analytics dashboards with visualisations across eight tabs:
| Tab | Metrics |
|---|---|
| Overview | Executive summary KPIs — document counts, completion rates, activity summary |
| Processing | Ingestion batch history, volume growth over time, processing throughput |
| Redaction | Redaction runs, entities detected by model, layer statistics, coverage rates |
| Classification | Classification runs, field outcomes, confidence score distributions |
| AI Performance | Token usage, cost attribution, model accuracy and quality metrics |
| Review | Review queue depth, items pending review, reviewer turnaround times |
| Activity | User action trends, audit log summaries, active reviewer counts |
| Exports | Export history, production statistics, deliverable sizes |
Dashboards include KPI cards, bar charts, line charts, pie charts, and area charts. Reports can be exported as PDF with embedded charts, matter information headers, and generation timestamps.
Billing & Usage
The Billing page shows storage usage and costs for each matter. Storage is broken down into seven categories:
| Category | Description |
|---|---|
| Documents | Original uploaded files in their native format |
| Extracted Text | Plain text extracted during processing and OCR |
| Markup Images | Rendered page images for the redaction workflow |
| Redacted PDFs | PDF versions with burned-in redactions and branding |
| Indices | Elasticsearch search indices for the matter |
| Embeddings | Vector embeddings used for AI chat and semantic search |
| Other | Miscellaneous processing artifacts |
The billing dashboard shows current usage (total GB and projected monthly cost), storage breakdown by category, usage history over time, and invoice details. Pricing is per-GB with regional variations and volume tier discounts.
Administration
User Management
The Admin page (accessible to admin and super_admin roles) provides a central interface for managing all users in the organisation. The user list shows:
- Email address and full name
- Assigned role
- Account status (active, inactive, pending, invited, locked, deactivated)
- 2FA/MFA enablement status
- Last login date
- Number of matter assignments
Administrators can search by email or name, and filter by status or role. Available actions include creating users, editing details, changing roles, sending invitations, resetting passwords, and activating or deactivating accounts.
Roles & Permissions
Dezcry uses a hierarchical role-based access control (RBAC) system with four roles. Roles are hierarchical — each role inherits all permissions from the roles below it. Access is enforced at two levels: role-level (what actions a user can perform across the platform) and matter-level (which specific matters a user can access).
Role Hierarchy
| Role | Description | Matter Access |
|---|---|---|
| Super Admin | Full platform control. Can manage all users (including other admins), delete matters, configure system-wide settings, and access every feature. Intended for platform owners and IT administrators. | Implicit access to all matters across the tenant — no explicit assignment required. |
| Admin | Organisation-level management. Can create matters, invite and manage users, assign users to matters, view audit logs, manage the password bank, and configure billing. Cannot delete matters or manage other admins. | Implicit access to all matters across the tenant — no explicit assignment required. |
| Reviewer | The primary working role for legal, privacy, and compliance team members. Can upload documents, review and code documents, run AI classification and redaction jobs, create and manage exports, manage saved searches, and run search term reports. | Must be explicitly assigned to each matter. Can only see and work within matters they have been granted access to. |
| Read Only | View-only access for stakeholders, external counsel, or auditors who need visibility but should not make changes. Can browse documents, view metadata, read reports, use chat, and download exports — but cannot upload, modify, or run any jobs. | Must be explicitly assigned to each matter. Can only see matters they have been granted access to. |
Detailed Permission Matrix
The following table shows the minimum role required for each action in the platform. Higher roles automatically inherit all permissions from lower roles.
| Feature Area | Action | Minimum Role |
|---|---|---|
| Matters | View matters | Read Only |
| Matters | Create new matters | Admin |
| Matters | Update matter settings | Admin |
| Matters | Delete matters | Super Admin |
| Documents | View and search documents | Read Only |
| Documents | Upload documents | Reviewer |
| Documents | Update decisions, tags, and coding | Reviewer |
| Documents | Delete documents | Admin |
| AI Classification | View classification results | Read Only |
| AI Classification | Create sets and run classification jobs | Reviewer |
| AI Redaction | View redaction results | Read Only |
| AI Redaction | Create sets, run jobs, and review entries | Reviewer |
| Export | View export sets and download packages | Read Only |
| Export | Create export sets and run exports | Reviewer |
| Search | View saved searches | Read Only |
| Search | Create and manage saved searches | Reviewer |
| Search Term Reports | View search term reports | Read Only |
| Search Term Reports | Create and run reports | Reviewer |
| Chat / AI Q&A | Ask questions and view chat history | Read Only |
| Reporting | View analytics dashboards | Read Only |
| Billing | View billing and usage | Read Only |
| Billing | Manage billing settings | Admin |
| Password Bank | View stored passwords | Admin |
| Password Bank | Add, edit, and delete passwords | Admin |
| Audit Log | View matter and system audit logs | Admin |
| User Management | View and manage users | Admin |
| User Management | Invite users and assign roles | Admin |
| System Admin | Manage other admins, delete matters, system config | Super Admin |
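The hierarchical check behind this matrix can be sketched as a simple rank comparison; the role keys below are illustrative spellings, not Dezcry's internal identifiers.

```python
# Sketch of the hierarchical role check: a user's role satisfies a
# required minimum role if it ranks at or above it. Role key spellings
# are illustrative.

ROLE_RANK = {"read_only": 0, "reviewer": 1, "admin": 2, "super_admin": 3}

def has_permission(user_role, minimum_role):
    """Higher roles automatically inherit lower roles' permissions."""
    return ROLE_RANK[user_role] >= ROLE_RANK[minimum_role]
```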
Matter-Level Access Control
Access to individual matters is controlled separately from role permissions:
- Super Admin and Admin roles have implicit access to every matter in the tenant. They do not need to be explicitly assigned — they can see and manage all matters automatically.
- Reviewer and Read Only roles require explicit assignment to each matter. An administrator must grant access by assigning the user to the matter. Until assigned, the matter is completely invisible to the user — it does not appear in their matter list and cannot be accessed via direct URL.
This two-level model enables organisations to enforce segregation of duties and need-to-know access. For example, a reviewer handling HR DSARs can be restricted to only HR-related matters, while a different reviewer handles customer DSARs — even though both have the same role, they see entirely different matter sets.
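The matter-access half of the model can be sketched as follows; the user record shape is an assumption for illustration.

```python
# Sketch of matter-level access: admins and super admins have implicit
# access to every matter, other roles need explicit assignment.
# The user dict shape is an assumption.

def matter_access_allowed(user, matter_id):
    if user["role"] in ("admin", "super_admin"):
        return True  # implicit access, no assignment required
    return matter_id in user["matters"]  # explicit assignment required
```

Under this check, an unassigned matter is simply absent from a reviewer's view, matching the behaviour described above where it neither appears in the matter list nor resolves via direct URL.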
Tenant Isolation
All access controls operate within a tenant boundary. Every database query is scoped to the authenticated user's tenant, and every matter-level operation verifies that the matter belongs to the same tenant. Cross-tenant access is architecturally impossible — there is no mechanism in the application layer to access another organisation's data, even with a Super Admin role.
Document-Level Access
Access to individual documents follows the matter access model. If a user has access to a matter, they can see all documents within that matter (subject to their role permissions for viewing vs. editing). There is no per-document access restriction — access is controlled at the matter level, which is the standard approach in eDiscovery and DSAR review workflows where reviewers need to see the full context of a matter to make defensible decisions.
Permissions are enforced server-side on every API request, not just in the UI. Even if a user manipulates the frontend or constructs API requests directly, the backend validates their role and matter access before processing any operation. Denied requests receive a structured 403 Forbidden response with a clear explanation of why access was refused.
Inviting Users
Administrators invite new users by providing their email address, name, and assigned role. The invitee receives an email with a single-use invitation link that guides them through:
1. Set password — Create a strong password (minimum 12 characters, must include uppercase, lowercase, and a number).
2. Configure 2FA — Scan a QR code with an authenticator app (Google Authenticator, Authy, etc.) and enter the verification code.
3. Complete setup — Account is activated and the user can sign in.
Invitation links are single-use and have an expiration date. The invitation tracks who created it, when it was used, and the IP address of the accepting user.
Admin Dashboard
The Admin Dashboard provides tenant-wide analytics and operational oversight:
- Users overview — total, active, locked, invited users; 2FA adoption rate; role distribution; currently online users
- Matters overview — total matters; status distribution (open/closed/archived); type distribution; document count and storage per matter
- Documents overview — total document count; total storage; status distribution; encrypted, corrupt, and duplicate counts
- Processing status — recent upload batches; active classification, redaction, and export runs
- Storage breakdown — detailed storage usage by category across all matters
- Recent audit activity — latest system-wide audit entries
System Audit
The System Audit page in the Admin section provides a tenant-wide view of all audit log entries across all matters. This allows administrators to monitor platform-wide activity, investigate security events, and produce compliance reports. The same filtering and search capabilities from the matter-level audit are available at the system level.
Security & Compliance
Data Security
Dezcry is hosted entirely on Microsoft Azure, using Azure Container Apps, Azure PostgreSQL, and Azure Storage. All infrastructure runs within a single resource group with network-level isolation. The GPU worker service that handles AI inference runs on internal-only ingress and is not accessible from the public internet.
The platform operates a logically isolated multi-tenant architecture. Each organisation's data — documents, metadata, reviewer decisions, and audit logs — is segregated at the application and database level. Uploaded files are stored in organisation-scoped storage paths. Cross-tenant data access is not possible through the application layer.
Encryption
All data is encrypted in transit using TLS 1.2+ for all connections between services, storage, and the database. Data is encrypted at rest using Azure-managed encryption keys via Azure Storage Service Encryption and Azure Database encryption. Uploaded files, processed outputs, and database records are all covered.
Data Residency
Dezcry supports regional data residency — each matter can be hosted in a specific Azure region to meet local data protection requirements:
- Australia East — default region
- Switzerland North — for Swiss data protection requirements
- Germany — for German/EU data residency
- United Kingdom — for UK data protection requirements
AI models are deployed regionally — Australian data uses Australian AI endpoints, Swiss data uses Swiss endpoints, and so on. Enterprise customers can discuss deployment in additional regions or dedicated/on-premises environments.
AI Data Handling
Dezcry runs its own AI models for redaction, classification, and summarisation. No document data is sent to third-party AI services. All AI inference happens within the same Azure environment as the rest of the platform:
- Classification and redaction use large language models deployed within the Azure environment
- Chat and summaries use a dedicated language model running on GPU infrastructure
- Embeddings are generated on CPU within the same container environment
AI-assisted redaction is designed as a reviewer aid, not an autonomous system. The AI surfaces likely sensitive content for human review. Reviewers approve, reject, or edit every suggestion before it is applied. All AI-generated suggestions and reviewer decisions are logged in the audit trail.
Customer data is never used to train or fine-tune models shared across tenants.