Dezcry Platform
Documentation
Everything you need to know about using Dezcry — from ingesting documents through to disclosure-ready export.
Getting Started
Platform Overview
Dezcry is a self-service, AI-powered eDiscovery platform for privacy, legal, and compliance teams. It provides a complete workflow to ingest documents, review responsive material, apply AI-powered redactions, classify documents, search, and export disclosure-ready sets — all with a full audit trail and role-based access controls.
Unlike heavyweight eDiscovery suites, Dezcry is designed for internal teams who need a streamlined, defensible process without specialist eDiscovery admins or outsourced review support. All AI models run on internal infrastructure within the same Azure environment — no document data is sent to third-party AI services.
- Ingest 100+ file types including PST, EML, ZIP, Office, PDF, images, audio, and video
- Automatic deduplication, email threading, and NIST filtering
- AI-assisted redaction with a 5-layer detection pipeline
- AI-assisted classification with custom fields and confidence scoring
- eDiscovery-grade keyword search (Elasticsearch-powered, dtSearch equivalent)
- LLM-powered document summaries and conversational document Q&A
- AI OCR for image-heavy documents
- Production-ready export with Bates numbering, load files, and burned redactions
- Complete audit trail logging every action for regulatory defensibility
- Role-based access control with matter-level permissions
Key Concepts
| Concept | Description |
|---|---|
| Matter | A container for a single DSAR or investigation. All documents, redactions, classifications, exports, and audit logs are scoped to a matter. Matters have a unique code, client name, type, and status. |
| Document | A single file within a matter — an email, attachment, PDF, spreadsheet, image, audio, or video file. Each document has extracted text, metadata, a preview, and can carry reviewer decisions. |
| Family | A group of related documents — typically an email and its attachments. The parent email and child attachments share a family ID for grouped review. |
| Custodian | The person or data source from which documents were collected. Tracked per upload batch for chain-of-custody purposes. |
| Saved Search | A reusable query with filters that can be used as the scope for redaction, classification, export, or search term reports. |
| Redaction Set | A batch AI redaction job that processes a scope of documents through the 5-layer pipeline, producing redaction entries for review. |
| Classification Set | A batch AI classification job that applies custom decision fields to documents with confidence scoring. |
| Export Set | A configured export template with numbering, branding, and output settings that produces disclosure-ready packages. |
| Audit Log | An immutable record of every significant action taken in the platform, providing a defensible trail for regulators. |
Signing In
Navigate to your Dezcry instance's login page and enter your email address and password. If your organisation has enabled two-factor authentication (2FA), you will be prompted to enter a time-based one-time password (TOTP) from your authenticator app after entering your credentials.
If you have been invited to join Dezcry, you will receive an email with a unique invitation link. Click the link to set up your password and configure 2FA. Invitation links are single-use and expire after a set period.
Sessions automatically expire after 30 minutes of inactivity. Your session token is refreshed automatically every 20 minutes while you are active. If your session expires, a full-screen overlay will prompt you to sign in again — any unsaved work in progress is preserved in your browser.
Matters
Creating a Matter
A matter is the top-level container in Dezcry. Each DSAR, investigation, or review project is organised as a separate matter with its own documents, workflows, users, and audit trail.
To create a matter, navigate to the Matters page and click Create Matter (admin role required). You will be asked to provide:
| Field | Description |
|---|---|
| Name | A descriptive name for the matter (e.g. "Smith DSAR - Q1 2025"). |
| Matter Code | A unique 6-character alphanumeric code, auto-generated but editable. |
| Client Name | The organisation or client the matter relates to. |
| Matter Type | One of: DSAR, Investigation, Litigation, Cyber, or Other. |
| Description | Optional long-form description of the matter scope and objectives. |
| Summary Language | The language for AI-generated summaries (e.g. English, German, French). |
| Hosting Location | The Azure region for data residency (e.g. Australia, Switzerland, Germany, UK). |
Matter Dashboard
Clicking into a matter takes you to the matter dashboard — the central workspace for that matter. The dashboard shows a searchable, filterable table of all documents in the matter, along with access to all matter-scoped features via the sidebar navigation:
- Documents — browse, search, filter, and review all documents
- Upload — ingest new documents into the matter
- Redaction — create and manage AI redaction sets
- Classification — configure and run AI classification jobs
- Export — build and run disclosure-ready export packages
- Search Terms — create keyword search term sets and reports
- AI OCR — run optical character recognition on image documents
- Password Bank — manage passwords for encrypted files
- Audit — view the complete audit trail for this matter
- Reporting — view analytics dashboards and metrics
- Billing — view storage usage and costs for this matter
The document table supports bulk actions — select multiple documents to apply batch operations such as tagging, classification, or status changes. A background task tray shows the status of any running jobs (redaction, classification, export) in the matter.
Matter Settings
Matter settings control the behaviour of AI features and reviewer workflows within the matter. Administrators can configure:
- Decision fields — custom fields that reviewers can set on each document (e.g. "Relevance", "Privilege Status", "Data Category"). Fields can be single-select, multi-select, or free text.
- Summary language — the language used for AI-generated document summaries.
- Matter status — open, closed, or archived. Closed matters are read-only; archived matters are hidden from the default view.
Document Ingestion
Uploading Documents
Navigate to the Upload page within a matter to ingest documents. Dezcry supports drag-and-drop file upload or traditional file selection. You can upload individual files or container files (PST, ZIP, 7Z, RAR, TAR, GZ) which will be automatically extracted.
Before processing begins, configure the following options:
| Option | Description |
|---|---|
| Deduplication Mode | Choose "Global" to automatically identify and flag duplicate files across the entire matter using MD5 hashing. Duplicates are removed from the active review set but retained for audit, saving reviewer time. |
| NIST Filtering | Enable to automatically filter out known system and runtime files (from the NIST National Software Reference Library) that are never relevant to review. |
| OCR | Enable to run Optical Character Recognition on image-based documents, extracting searchable text from scanned PDFs, photographs, and image files. |
| Email Threading | Enable to group related emails into conversation threads, identifying which messages are "inclusive" (contain unique content) versus non-inclusive duplicates. |
| Inclusive Only | When email threading is enabled, optionally exclude non-inclusive emails from the review workspace to reduce volume. |
You may also specify custodian information and data source metadata for chain-of-custody tracking. Available data sources include: Laptop, Desktop, Server, O365 Email, O365 OneDrive, SharePoint, Google Workspace, Mobile Device, External Hard Drive, USB Drive, Network Share, Cloud Storage, Backup Tape, Database, and Other.
Supported File Types
Dezcry supports over 100 file types out of the box. During ingestion, all files are extracted, their text content is parsed, metadata is captured, and they are indexed for search.
| Category | Formats |
|---|---|
| Email | PST, OST, EML, MSG, MBOX |
| Documents | DOCX, DOC, PDF, RTF, TXT, ODT |
| Spreadsheets | XLSX, XLS, CSV, ODS |
| Presentations | PPTX, PPT, ODP |
| Archives | ZIP, RAR, 7Z, TAR, GZ |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF (with OCR) |
| Audio | MP3, WAV, M4A, OGG, FLAC |
| Video | MP4, AVI, MOV, MKV, WEBM |
| Web / Data | HTML, XML, JSON, CSV |
Deduplication
When global deduplication is enabled, Dezcry performs top-level exact deduplication — the standard approach used in eDiscovery. This is an important distinction: Dezcry identifies and removes files that are byte-for-byte identical based on their MD5 hash, but it does so at the top level of the document hierarchy.
In eDiscovery, "top-level" deduplication means dedup is applied to standalone documents and parent containers (emails, archives) rather than to individual attachments or child items in isolation. When a top-level file is identified as a duplicate, the entire document and its family (including all attachments) are removed together — preserving the integrity of document families.
This differs from "attachment-level" deduplication, which would independently remove individual attachments that appear across multiple emails. Top-level dedup preserves the complete context of each email and its attachments as a unit, which is critical for defensible review — a reviewer always sees the full email with all of its attachments intact, never a partial family.
It also differs from near-deduplication, which identifies files that are similar but not identical (e.g. different versions of the same document). Dezcry's deduplication is strictly exact-match — only byte-for-byte identical files are flagged.
Deduplication is scoped globally across the entire matter, meaning a file uploaded by one custodian will be deduplicated against files from all other custodians in the same matter. The first instance ingested is kept as the master document, and all subsequent identical copies are removed from the active review set. Deduplication results include:
- Master document — the first instance of each unique file, retained in the review set with full metadata and family relationships
- Duplicate group — all copies of the same file, linked back to the master for audit purposes
- Bytes saved — total storage savings from removing duplicate copies
- Custodian tracking — the system records which custodians held copies of each deduplicated file, preserving chain-of-custody information even though the duplicate copies are removed from the active review set
The upload summary report details every duplicate group with file names, sizes, and the master document reference. This provides a defensible record of exactly what was deduplicated and why.
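The grouping logic described above can be sketched in a few lines of Python. This is a minimal illustration of top-level exact deduplication, not Dezcry's implementation — the tuple layout and the returned structure are assumptions for the example:

```python
import hashlib
from collections import OrderedDict

def dedupe_top_level(files):
    """Group top-level files by content hash; the first instance of each
    hash becomes the master, later copies become duplicates.

    `files` is a list of (name, custodian, content_bytes) tuples for
    top-level documents only -- attachments travel with their parent.
    """
    groups = OrderedDict()  # hash -> {"master", "custodians", "dupes"}
    for name, custodian, content in files:
        digest = hashlib.md5(content).hexdigest()  # exact, byte-for-byte match
        if digest not in groups:
            groups[digest] = {"master": name, "custodians": [custodian], "dupes": []}
        else:
            group = groups[digest]
            group["dupes"].append(name)            # removed from active review set
            group["custodians"].append(custodian)  # custody record is preserved
    return groups
```

Note how the custodian list keeps growing even for removed copies — that is the chain-of-custody record the upload summary reports.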
Email Threading
Email threading groups related emails into conversation threads, identifying the hierarchical reply chain. Threading is applied at the point of ingestion, which means non-inclusive emails are identified and can be excluded from the review workspace before any downstream processing occurs. This is a deliberate design choice — by filtering out redundant emails upfront, organisations save significantly on hosting costs (less storage, smaller search indices) and AI processing costs (fewer documents to classify, redact, and summarise).
Each email in a thread is classified as:
- Inclusive — contains unique content or attachments not present in later messages in the thread. These are the messages reviewers should focus on, as they represent the most complete version of each point in the conversation.
- Non-inclusive — the full content of this email is already contained in a later, more complete message in the thread. Reviewing these would be redundant, as the inclusive message already captures everything.
When the Inclusive Only option is enabled during upload, non-inclusive emails are excluded from the active review workspace entirely. They are still retained in the system for audit purposes, but they do not count toward hosting storage, are not indexed for search, and are not processed by AI classification, redaction, or summarisation — directly reducing costs.
Threading uses email headers (Message-ID, In-Reply-To, References) and the Microsoft Exchange Conversation Index to build accurate thread trees. The threading summary reports:
- Total emails processed and how many were threadable
- Number of inclusive vs. non-inclusive messages
- Non-inclusive emails excluded from the review workspace
- Thread groups identified
- Any threading errors encountered
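The core inclusiveness test can be illustrated with a simplified sketch. This stand-in only checks whether an email's body text is quoted in a later message; the real engine also compares attachments and uses the headers described above:

```python
def classify_inclusiveness(thread):
    """Mark each email in a thread as inclusive or non-inclusive.

    `thread` is a list of dicts with "id" and "body", ordered oldest
    first.  An email is non-inclusive here when its full body text is
    contained in a later message -- a simplified stand-in for the
    header- and attachment-aware analysis described above.
    """
    results = {}
    for i, email in enumerate(thread):
        later_bodies = [e["body"] for e in thread[i + 1:]]
        contained = any(email["body"] in later for later in later_bodies)
        results[email["id"]] = "non_inclusive" if contained else "inclusive"
    return results
```

In this model the last message in a chain is always inclusive, and an earlier message becomes non-inclusive only when a later reply fully quotes it.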
NIST Filtering
NIST filtering removes known system files, operating system components, and software runtime files from the review set. These files are identified by matching their hash values against the NIST National Software Reference Library (NSRL) — a comprehensive database of known, non-relevant system files.
NIST-filtered files are flagged and excluded from the active review workspace but are retained in the system for audit purposes. The upload summary reports the count and details of filtered files.
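Mechanically, de-NISTing is a hash-set membership test. A minimal sketch (the NSRL set would in practice be loaded from the published reference data, and Dezcry's internal representation may differ):

```python
import hashlib

def denist(documents, nsrl_hashes):
    """Split documents into an active review set and NIST-filtered files.

    `documents` is a list of (name, content_bytes) pairs; `nsrl_hashes`
    is a set of known-file MD5 digests.  Matching files are flagged
    rather than deleted, so the audit trail stays complete.
    """
    review, filtered = [], []
    for name, content in documents:
        digest = hashlib.md5(content).hexdigest()
        (filtered if digest in nsrl_hashes else review).append(name)
    return review, filtered
```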
Processing Exceptions
During ingestion, some documents may encounter processing exceptions. Dezcry tracks and reports these in the upload summary:
| Exception Type | Description |
|---|---|
| Encrypted | Password-protected files that could not be decrypted. Add passwords to the Password Bank and re-process. |
| Corrupt | Files that are malformed, truncated, or otherwise unreadable. |
| Unsupported Format | File types that Dezcry does not currently support for text extraction. |
| Text Extraction Failed | Files where the content could not be extracted despite being a supported format. |
Each exception includes the document ID, filename, exception type, and a descriptive message to help diagnose and resolve the issue.
Upload Batches
Every upload creates a processing batch with a unique display ID (e.g. UPL-001). Navigate to the Uploads page to view all batches for the matter, including:
- Batch status (processing, completed, failed)
- Total files submitted and processed
- Counts by outcome (processed OK, encrypted, corrupt, duplicates, NIST-filtered)
- Decryption results (successful, failed)
- Children extracted (attachments from container files)
- File type distribution
- Processing duration
- Upload set MD5 hash for chain-of-custody verification
Click into any batch to see the detailed processing report, including per-document exception details, deduplication groups, and threading statistics.
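One plausible way to derive a single batch-level hash from per-file hashes is to hash the sorted digests — an assumption for illustration, not Dezcry's documented scheme:

```python
import hashlib

def upload_set_md5(file_hashes):
    """Compute one chain-of-custody hash for an upload batch.

    Hashing the *sorted* per-file MD5 digests makes the result
    independent of upload order, while any added, removed, or altered
    file changes it.  (Illustrative scheme, not Dezcry's documented one.)
    """
    combined = "".join(sorted(file_hashes)).encode("ascii")
    return hashlib.md5(combined).hexdigest()
```

The useful property is verifiability: anyone holding the same file set can recompute the batch hash and confirm nothing was added or dropped in transit.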
Document Review
Document List
The main matter workspace displays all documents in a searchable, sortable table. Each row shows the document's filename, type, status, size, custodian, and any applied tags or decisions. Key features include:
- Full-text search — keyword search across document content, filenames, and email metadata using eDiscovery-grade Elasticsearch
- Column filters — filter by status, file type, custodian, date ranges, tags, relevance coding, and custom decision fields
- Bulk selection — select multiple documents for batch operations like tagging, decision coding, or export
- Sort — sort by any column including filename, date, size, relevance, or type
- Saved searches — save any combination of search query and filters for reuse
Document Viewer
Click any document to open the full document viewer. The viewer provides a rich, multi-panel interface for reviewing individual documents:
- Document display — native rendering of the document with zoom controls (0.25x to 3x)
- Three viewing tabs: Original (native format), Markup (with redaction overlays), and Text (extracted plain text with search highlighting)
- Metadata panel — document properties, email headers, file hashes, and processing info
- Decisions panel — set relevance, hot-document flag, comments, and custom decision fields
- Family panel — view parent/child relationships (e.g. email and attachments)
- Chat panel — ask questions about the document using AI
- Navigation — previous/next buttons with keyboard shortcuts for rapid sequential review
The document viewer uses a prefetch cache that pre-loads adjacent documents (previous and next) in the background. This provides near-instant navigation when reviewing documents sequentially. The cache holds up to 50 documents with a 2-minute TTL.
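A cache with those characteristics can be sketched as follows. The capacity and TTL come from the text above; the eviction policy shown here (least-recently-used) is an assumption:

```python
import time
from collections import OrderedDict

class PrefetchCache:
    """Bounded document cache with a per-entry TTL, in the spirit of the
    viewer's prefetch cache.  Capacity and TTL match the values above;
    the LRU eviction policy is an assumption for this sketch."""

    def __init__(self, capacity=50, ttl_seconds=120):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # doc_id -> (expires_at, payload)

    def put(self, doc_id, payload):
        self._entries.pop(doc_id, None)
        self._entries[doc_id] = (time.monotonic() + self.ttl, payload)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least-recently-used

    def get(self, doc_id):
        entry = self._entries.get(doc_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if time.monotonic() > expires_at:
            del self._entries[doc_id]  # expired entry counts as a miss
            return None
        self._entries.move_to_end(doc_id)  # refresh LRU position
        return payload
```

When the reviewer opens document *N*, the viewer would `put` documents *N−1* and *N+1* in the background, so the next `get` during sequential review is a hit.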
Native File Viewers
Dezcry includes purpose-built viewers for every supported file type, rendering documents directly in the browser without requiring any plugins or downloads:
| Viewer | File Types | Features |
|---|---|---|
| PDF Viewer | PDF files | Page-by-page rendering, zoom, scroll, text selection, search highlighting |
| Image Viewer | PNG, JPG, TIFF, BMP, GIF | Pan and zoom, fit-to-width/height, full-resolution display |
| DOCX Viewer | Word documents (DOCX) | Formatted text rendering with styles, headers, lists, and tables |
| PPTX Viewer | PowerPoint (PPTX) | Slide-by-slide rendering with layouts and formatting |
| Spreadsheet Viewer | XLSX, XLS, CSV | Multi-sheet tabs, column/row headers, cell formatting, frozen panes |
| Text Viewer | TXT, LOG, HTML, XML, JSON | Syntax-highlighted text with line numbers and search |
| Audio Viewer | MP3, WAV, M4A | Audio player with waveform, playback controls, and timestamp display |
| Video Viewer | MP4, AVI, MOV | Video player with playback controls, full-screen mode |
| Markup Viewer | Any document with redactions | Redaction overlay rendering with colour-coded entity categories |
Metadata Panel
The metadata panel displays all extracted properties for the current document. For email files, this includes:
- From, To, CC, BCC addresses
- Subject line
- Date sent and date received
- Message-ID and conversation threading references
- Attachment count and list
For all documents, the metadata panel shows:
- File size, MIME type, and document type
- MD5 and SHA-256 hashes (for integrity verification)
- Created and modified dates
- Author (when available from document properties)
- Source folder path from the original container
- OCR status and AI summary (when available)
- Processing status and any error messages
Decisions Panel
The decisions panel is where reviewers record their assessments. Every decision is timestamped and logged in the audit trail. Available fields:
- Relevance — mark the document as Responsive, Non-Responsive, or other custom values
- Hot Document — flag important or significant documents for attention
- Decision Comment — free-text annotation explaining the reviewer's reasoning
- Custom Decision Fields — any additional fields configured at the matter level (single-select, multi-select, or free text)
Dezcry uses optimistic locking on document decisions to prevent overwrite conflicts when multiple reviewers work on the same matter. Each document tracks a version number that is incremented on every update. If two reviewers attempt to save changes to the same document simultaneously, the second save will receive a conflict error and be asked to refresh before re-applying their changes.
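The version-check-and-increment pattern looks like this in miniature. The function and field names are illustrative, not Dezcry's API:

```python
class ConflictError(Exception):
    """Raised when a save is based on a stale document version."""

def save_decision(document, decision, expected_version):
    """Apply a reviewer decision only if no one else saved in between.

    `document` is a dict carrying a "version" counter; the caller passes
    the version it loaded.  A mismatch means another reviewer saved
    first, so the caller must refresh and re-apply its changes.
    """
    if document["version"] != expected_version:
        raise ConflictError("document was modified by another reviewer")
    document.update(decision)
    document["version"] += 1  # every successful save bumps the counter
    return document["version"]
```

The second writer is never silently overwritten: its save fails fast with a conflict, and it re-applies against the fresh state.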
Family Documents
Documents extracted from container files (emails with attachments, ZIP archives) are automatically grouped into families. A family consists of a parent document (e.g. an email) and its child documents (e.g. attachments).
The family panel in the document viewer shows all related documents, allowing reviewers to quickly navigate between a parent email and its attachments. Family relationships are preserved throughout all workflows — search results can include family expansion, and exports can group family members together.
Tagging
Documents can be tagged with relevance codes and custom decision field values. Tags are set through the decisions panel in the document viewer or via bulk actions on the document list. All tagging actions are logged in the audit trail with the reviewer's identity and timestamp.
Metadata
Overview
Every document ingested into a matter has a rich set of metadata fields automatically extracted during processing. Dezcry captures over 60 metadata fields per document — covering everything from basic file properties and email headers to AI-generated summaries and reviewer decisions. These fields are available for filtering, sorting, column display, search, and export throughout the platform.
Metadata is extracted at the point of ingestion with no manual effort required. For email files, Dezcry parses all standard headers including threading references. For Office documents and PDFs, embedded properties such as author, title, and creation date are captured. For images, EXIF data including camera make, GPS coordinates, and timestamps is preserved. All dates are normalised to UTC for consistent cross-timezone analysis.
Metadata is critical for defensible review workflows. Fields like hash values (MD5, SHA-256) provide chain-of-custody integrity. Date fields enable precise date-range filtering to narrow review sets. Email threading metadata allows reviewers to focus only on inclusive messages. And custodian tracking across duplicates ensures nothing is lost even when redundant copies are removed. All metadata fields listed below are available in load file exports (DAT, CSV, XLSX) for downstream use in Relativity, Nuix, or other review platforms.
Core Document Fields
These fields are present on every document regardless of file type. They provide the fundamental identifiers, file properties, and processing information needed for document management and chain-of-custody tracking.
| Field | Type | Description |
|---|---|---|
| doc_id | String | Unique document identifier within the matter (e.g. DOC-000001). This is the primary reference used across the platform — in search results, exports, audit logs, and cross-references. |
| doc_seq | Integer | Sequential number assigned during ingestion, used for sorting and Bates-style numbering in exports. Sequences are unique within each matter and assigned in upload order. |
| filename | String | Original filename of the document as it existed in the source data. Preserved exactly as found for defensibility — no renaming or sanitisation is applied. |
| mime | String | MIME type of the file (e.g. application/pdf, message/rfc822). Determined by both file extension and magic-byte analysis for accurate identification. |
| document_type | String | Enriched document category — Email, PDF, Word, Excel, PowerPoint, Image, Text, Archive, Audio, Video, or Other. Useful for filtering the document list by file type. |
| size_bytes | Integer | File size in bytes. Displayed in human-readable format (KB, MB) in the UI. Useful for identifying unusually large or suspiciously small files. |
| source_folder | String | Original folder path within the source container — e.g. the PST folder hierarchy (Inbox/Projects/2024), ZIP directory path, or nested archive structure. Preserves the organisational context of the original data. |
| date_created_utc | DateTime | File creation date in UTC. For office documents, extracted from embedded document properties. For other files, derived from filesystem timestamps or container metadata. |
| date_modified_utc | DateTime | File last-modified date in UTC. Critical for date-range filtering in review workflows and for establishing document timelines. |
| md5 | String | MD5 hash of the file content (32 hex characters). Used for deduplication across the matter and for chain-of-custody integrity verification in exports. |
| sha256 | String | SHA-256 hash of the file content (64 hex characters). Provides a cryptographically strong integrity fingerprint for defensible production. |
| status | String | Processing status — queued (awaiting processing), processing (currently being ingested), ready (successfully processed and available for review), or failed (encountered an error). |
| processing_error | String | Detailed error message if processing failed. Helps diagnose issues such as password-protected files, corrupted archives, or unsupported formats. |
| processing_dataset | String | Upload batch identifier (e.g. UPL-001) linking the document to its ingestion batch. Useful for tracking which upload set a document belongs to and viewing batch-level statistics. |
Family & Hierarchy Fields
Documents extracted from container files — such as emails with attachments, ZIP archives, or nested PST folders — are automatically grouped into families. Family relationships are critical for defensible review: reviewers see each email alongside its attachments, and exports can group family members into the same volume for production.
| Field | Type | Description |
|---|---|---|
| family_id | String | Family group identifier. For parent documents (e.g. an email), this equals the document's own doc_id. For child documents (e.g. attachments), this inherits the parent's family_id — linking the entire family together for grouping, export, and review. |
| parent_id | UUID | ID of the parent document (e.g. the email that contained this attachment). Null for top-level standalone documents. Enables the family tree view in the document viewer, where reviewers can navigate between a parent and all of its children. |
When exporting documents, Dezcry preserves family relationships in the load file. Parent documents and their children are linked via the family_id and parent_id fields, allowing downstream review platforms (Relativity, Nuix, etc.) to reconstruct the family hierarchy. The export wizard also supports family-based volume grouping to keep related documents together.
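Downstream reconstruction from those two fields is straightforward. A sketch of how a consuming platform might rebuild families from load-file rows (the row dict layout is an assumption):

```python
from collections import defaultdict

def group_families(rows):
    """Rebuild parent/child families from load-file rows.

    Each row carries doc_id, family_id, and parent_id (None for a
    top-level document), mirroring the fields in the table above.
    Returns {family_id: {"parent": doc_id, "children": [doc_ids]}}.
    """
    families = defaultdict(lambda: {"parent": None, "children": []})
    for row in rows:
        family = families[row["family_id"]]
        if row["parent_id"] is None:
            family["parent"] = row["doc_id"]  # parent's family_id == its own doc_id
        else:
            family["children"].append(row["doc_id"])
    return dict(families)
```

A standalone document simply forms a family of one: it is its own parent with no children.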
Email Fields
Email is often the most important data type in eDiscovery. Dezcry extracts a comprehensive set of email metadata from both EML and MSG formats, including messages extracted from PST, OST, and MBOX containers. These fields are stored as first-class database columns for efficient filtering, sorting, and field-specific search (e.g. from:john@acme.com).
| Field | Type | Description |
|---|---|---|
| email_from | String | Sender email address and display name (e.g. "John Smith <john@acme.com>"). Searchable via the from: field prefix in keyword search. |
| email_to | String | Recipient email addresses (semicolon-separated). Supports multiple recipients. Searchable via the to: field prefix. |
| email_cc | String | CC (carbon copy) recipient email addresses (semicolon-separated). Searchable via the cc: field prefix. |
| email_bcc | String | BCC (blind carbon copy) recipient email addresses (semicolon-separated). Searchable via the bcc: field prefix. Only available when the source data includes BCC headers (typically only in the sender's mailbox). |
| email_subject | String | Email subject line. Searchable via the subject: field prefix. Commonly used for keyword search and classification workflows. |
| email_message_id | String | RFC 2822 Message-ID header — a globally unique identifier assigned by the sending mail server. Used internally for email threading and deduplication. |
| email_date_sent_utc | DateTime | Date and time the email was sent, normalised to UTC. This is the primary date field used for email date-range filtering and timeline analysis. |
| email_date_received_utc | DateTime | Date and time the email was received, normalised to UTC. May differ from date_sent due to delivery delays or timezone differences between sender and recipient servers. |
| email_attachments_json | JSON | Structured attachment summary containing the count and list of filenames (e.g. {count: 3, names: ["report.pdf", "data.xlsx", "photo.jpg"]}). Useful for quickly identifying emails with specific attachments without opening them. |
| email_in_reply_to | String | Message-ID of the email this is a direct reply to. Used by the threading engine to build the conversation tree. |
| email_references | String | Ordered chain of Message-IDs representing the full conversation history. Each reply appends its parent's Message-ID, creating a breadcrumb trail through the thread. |
| email_conversation_index | String | Microsoft Exchange PR_CONVERSATION_INDEX — a hex-encoded binary value present in Outlook/Exchange-originated messages. Provides precise thread positioning even when standard headers are missing or unreliable. |
| email_thread_index | String | Hierarchical thread position path computed by Dezcry (e.g. "a1b2c3d4+0001+0002"). Encodes the exact tree position for correct chronological sort order and branch identification within conversation views. |
All email metadata fields are indexed in the search engine. You can use field-specific search prefixes to target individual fields — for example, from:john@acme.com AND subject:"quarterly report" or to:legal@company.com AND date >= 2024-01-01. See the Search Syntax section for the full list of supported field prefixes and operators.
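The shape of a field-prefixed query can be shown with a toy parser. This is a deliberately simplified illustration — the real query language also handles AND/OR operators, quoted phrases, and date comparisons, which a whitespace split cannot:

```python
def parse_field_query(query):
    """Split a query like 'from:john@acme.com subject:report' into
    field-specific terms and free-text terms.

    A simplified illustration of field-prefixed search; quoted phrases
    and boolean operators are out of scope for this sketch.
    """
    fields, free_text = {}, []
    for token in query.split():
        if ":" in token:
            field, _, value = token.partition(":")
            fields.setdefault(field, []).append(value)
        else:
            free_text.append(token)
    return fields, free_text
```

Each field bucket then targets the corresponding indexed column (email_from, email_subject, and so on), while free-text terms run against the full extracted content.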
Email Threading Fields
These fields are computed by Dezcry's email threading engine during ingestion. Threading groups related messages into conversation trees and identifies which messages are inclusive (containing unique content a reviewer must see) versus non-inclusive (redundant messages whose content is fully captured by a later reply). This can reduce the review set by 40–60% in email-heavy matters, directly lowering review time and AI processing costs.
| Field | Type | Description |
|---|---|---|
| email_thread_group_id | UUID | Identifier of the conversation thread group this email belongs to. All emails in the same conversation share this ID, enabling thread-level grouping and navigation in the document viewer. |
| email_thread_indentation | Integer | Depth within the thread tree (0 = the root/original message, 1 = a direct reply, 2 = a reply to a reply, etc.). Used for visual indentation in conversation views. |
| is_inclusive_email | Boolean | Whether this email is inclusive — meaning it contains unique message content or attachments not present in any later message in the thread. Null if threading was not enabled for this document. Inclusive emails are the minimum set a reviewer needs to see. |
| inclusive_reason | String | Explains why the email is inclusive: unique_message_content (body text not found in later replies), unique_attachment (has an attachment not in later messages), unanalyzed_attachment (attachment could not be compared), root_message (first message in thread), or threading_error (could not determine inclusiveness). |
When "Inclusive Only" is enabled during upload, non-inclusive emails are excluded from the active review workspace entirely. They are still retained in the system and can be accessed via the thread view for context, but they do not appear in the main document list, are not processed by AI classification or redaction, and do not count toward storage. This is the recommended approach for matters with large email volumes where cost efficiency is a priority.
OCR Fields
Dezcry automatically detects documents that contain no extractable text — such as scanned PDFs, photographs of documents, and image files — and flags them for OCR (Optical Character Recognition). Once OCR is run, the extracted text becomes fully searchable and available for AI processing.
| Field | Type | Description |
|---|---|---|
| ocr_required | Boolean | Whether the document requires OCR to extract searchable text. Automatically set to true during ingestion for scanned PDFs, image-only PDFs, and image files (JPEG, PNG, TIFF, BMP). Documents with existing embedded text are set to false. |
| ocr_status | String | Current OCR processing status: not_applicable (document has embedded text, OCR not needed), completed (OCR finished successfully, text extracted), failed (OCR attempted but encountered an error), partial (some pages processed successfully), or skipped (OCR not run yet despite being required). |
Deduplication Fields
When global deduplication is enabled during upload, Dezcry identifies byte-for-byte identical files across the entire matter using hash matching. The first instance is retained as the master document and subsequent copies are flagged as duplicates. Deduplication is applied at the top level — meaning entire families (email + attachments) are deduplicated as a unit, preserving family integrity. See the Deduplication section for full details.
| Field | Type | Description |
|---|---|---|
| is_duplicate | Boolean | Whether this document is a duplicate of another document in the matter. Duplicate documents are excluded from the active review set but retained for audit and export purposes. |
| duplicate_of_id | UUID | ID of the master document this is a duplicate of. Allows reviewers and exports to trace back to the retained copy. The master document is always the first instance ingested. |
| duplicate_custodian_info | String | Records which custodians held copies of this document. Critical for defensibility — even though duplicate copies are removed from the review set, this field preserves a complete record of who possessed the document across all data sources. |
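For illustration, family-level hash deduplication can be sketched as follows. This is a simplified model, not Dezcry's implementation: the `family_hash` and `deduplicate` functions and the dictionary field names are hypothetical, and real matching uses the stored MD5/SHA-256 hashes.

```python
import hashlib


def family_hash(documents):
    """Compute one hash per family by combining the SHA-256 of each
    member file, sorted so the result is order-independent."""
    members = sorted(hashlib.sha256(doc["content"]).hexdigest()
                     for doc in documents)
    return hashlib.sha256("".join(members).encode()).hexdigest()


def deduplicate(families):
    """Keep the first-ingested family as master; flag later
    byte-identical families as duplicates (families are processed
    in ingestion order, so the master is always the first instance)."""
    seen = {}  # family hash -> master family id
    for fam_id, docs in families:
        h = family_hash(docs)
        if h in seen:
            for doc in docs:
                doc["is_duplicate"] = True
                doc["duplicate_of_id"] = seen[h]  # simplified: family-level id
        else:
            seen[h] = fam_id
            for doc in docs:
                doc["is_duplicate"] = False
    return families
```

Hashing the sorted member hashes means an email plus attachments is treated as a unit, so a family is only a duplicate when every member matches, which is the family-integrity behaviour described above.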
NIST Filtering Fields
NIST filtering (also known as "de-NISTing") removes known system files, operating system components, and application runtime files from the review set by matching file hashes against the NIST National Software Reference Library (NSRL). This is a standard eDiscovery practice that eliminates files that are never relevant to review — such as Windows DLLs, Office templates, and browser cache files — often removing 10–30% of a dataset before review begins.
| Field | Type | Description |
|---|---|---|
| is_nist_filtered | Boolean | Whether this file was identified as a known system or application file via NIST NSRL hash matching. Filtered files are excluded from the active review workspace but retained in the system for audit and reporting. |
| nist_product_name | String | Name of the software product the file belongs to according to the NSRL database (e.g. Microsoft Windows 11, Adobe Acrobat Reader, Google Chrome). Helps identify why a file was filtered and provides context in exception reports. |
Encryption & Integrity Fields
Dezcry performs detailed analysis of every file during ingestion to detect encryption, corruption, and file-type mismatches. These fields provide a complete picture of each document's integrity status — essential for eDiscovery exception reporting and ensuring no documents are silently missed during processing.
| Field | Type | Description |
|---|---|---|
| is_encrypted | Boolean | Whether the document is encrypted or password-protected. Encrypted files cannot be processed until decrypted — add the password to the Password Bank and re-process, or note the exception in reporting. |
| encryption_type | String | Detailed encryption classification: password_protected (standard Office/PDF password), drm_protected (Digital Rights Management), pgp_encrypted (PGP/GPG encryption), smime_encrypted (S/MIME email encryption), or bitlocker (full-disk encryption artefact). Helps IT teams determine the appropriate decryption method. |
| is_corrupt | Boolean | Whether the document is corrupted or malformed. Corrupt files are flagged as processing exceptions and included in exception reports for transparency. |
| corruption_type | String | Detailed corruption classification: truncated (file cut short), malformed_header (invalid file header), invalid_structure (internal structure errors), or zero_byte (empty file). Provides actionable detail for troubleshooting or re-collection from the source. |
| file_signature | String | File magic-bytes signature detected by inspecting the file's binary header (e.g. "PDF-1.4", "PK (ZIP)", "JPEG/JFIF"). Independent of file extension — provides the true format identity. |
| file_signature_mismatch | Boolean | Whether the file extension does not match the actual content detected by magic bytes (e.g. a .docx file that is actually a renamed .exe). Important for identifying potentially suspicious or mis-labelled files in forensic review. |
| is_decrypted | Boolean | Whether the document was successfully decrypted during processing using a password from the Password Bank or provided at upload time. |
| decryption_method | String | How the document was decrypted: global_password_bank (matched against the matter's stored passwords) or upload_password (password provided during the upload that contained this file). Provides an audit trail for decryption actions. |
Dezcry inspects the binary magic bytes of every file to determine its true format, independent of the file extension. When a mismatch is detected (e.g. a .xlsx file that is actually a ZIP archive, or a .pdf that is actually a JPEG image), the file_signature_mismatch flag is set. This is valuable for identifying files that have been intentionally renamed to evade review, a common tactic in investigations and litigation.
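A minimal sketch of magic-byte detection, assuming a tiny signature table (a production engine recognises far more formats, and the function names here are illustrative, not Dezcry's API):

```python
# Illustrative subset of magic-byte signatures
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",        # also DOCX/XLSX/PPTX containers
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"MZ": "exe",
}

# Which detected format each extension legitimately maps to
EXPECTED = {"pdf": "pdf", "jpg": "jpeg", "jpeg": "jpeg", "png": "png",
            "zip": "zip", "docx": "zip", "xlsx": "zip", "pptx": "zip"}


def detect_signature(header: bytes):
    """Return the format implied by the file's leading bytes, or None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None


def signature_mismatch(filename: str, header: bytes) -> bool:
    """Flag when the extension claims a format the magic bytes refute,
    e.g. a .docx that is actually a renamed .exe."""
    ext = filename.rsplit(".", 1)[-1].lower()
    detected = detect_signature(header)
    # Unknown extension or undetected signature: no mismatch claimed
    return detected is not None and EXPECTED.get(ext) not in (None, detected)
```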
Processing Exception Fields
In any eDiscovery matter, a percentage of documents will encounter processing issues. Dezcry categorises every exception with a type and action, providing the structured data needed for defensible exception reporting. These fields are included in exports and processing batch reports so that legal teams have a complete record of what was — and was not — successfully processed.
| Field | Type | Description |
|---|---|---|
| exception_type | String | The category of processing exception: encryption (password-protected or encrypted file), corruption (malformed or damaged file), unsupported_format (file type not supported for text extraction), or text_extraction_failed (supported format but extraction encountered an error). Used for filtering and reporting on processing outcomes. |
| exception_action | String | The action Dezcry took in response to the exception: processed_with_errors (partial processing completed with some issues noted), skipped (document could not be processed at all), partial_extraction (some content was extracted but the process did not complete fully), or placeholder_created (a placeholder entry was created for tracking and reporting purposes). Provides transparency for legal teams assessing completeness. |
AI & Processing Fields
Dezcry uses AI to automatically generate document summaries, apply redactions, and produce document previews. These fields track the status and outputs of each AI-powered workflow, allowing reviewers to quickly see which documents have been summarised, redacted, or are still awaiting processing.
| Field | Type | Description |
|---|---|---|
| llm_summary | String | AI-generated 1–2 sentence summary of the document's content. Summaries are produced automatically after ingestion and displayed in the document list and viewer. Useful for quickly triaging documents without opening them — reviewers can scan summaries to identify relevant documents faster. |
| markup_status | String | Redaction and annotation workflow status: not_started (no redactions applied), pending (redaction in progress), complete (all redactions applied and markup generated), or failed (an error occurred during markup generation). Documents with markup_status of "complete" have a fully redacted preview available. |
| markup_page_count | Integer | Total number of pages in the markup document. Populated after markup generation completes. Useful for estimating review effort and for page-level redaction tracking in production reports. |
| preview_status | String | Document preview generation status: none (no preview requested), queued (awaiting generation), generating (currently being converted), ready (preview available for viewing), or error (generation failed). Previews convert native formats to viewable HTML/PDF for in-browser document review. |
Reviewer Decision Fields
These fields are set by reviewers during document review through the Decisions Panel in the document viewer, or via bulk actions on the document list. Every change to these fields is timestamped, attributed to the reviewer, and logged in the audit trail for full defensibility. Optimistic locking prevents conflicting edits when multiple reviewers work on the same matter simultaneously.
| Field | Type | Description |
|---|---|---|
| relevance | String | Relevance classification assigned by the reviewer — typically Responsive, Non-Responsive, or Privileged, but fully customisable at the matter level. This is the primary coding field used to separate relevant documents from the rest of the dataset. |
| hot_document | Boolean | Flag indicating the document is particularly significant — a "smoking gun" or key evidence that warrants elevated attention. Hot documents are visually highlighted in the document list and can be filtered for quick access. |
| decision_comment | String | Free-text annotation where reviewers explain their reasoning for the relevance decision. Useful for quality control, second-pass review, and providing context to senior reviewers or legal counsel. |
| relevance_coded_at | DateTime | Timestamp of when the relevance decision was last recorded. Used for review progress tracking, productivity metrics, and audit trail purposes. Updated each time the reviewer modifies their decision. |
In addition to the built-in fields above, matters can be configured with custom decision fields — single-select dropdowns, multi-select tags, or free-text fields — to capture matter-specific coding such as issue codes, privilege categories, or confidentiality designations. Custom fields are fully exportable and appear in the decisions panel alongside the standard fields. See Custom Fields for configuration details.
Extended Metadata (metadata_json)
In addition to the first-class fields above, each document contains an extended metadata object with format-specific properties organised by namespace. These fields capture the full depth of information embedded within each file type — from PDF authoring tools to image EXIF geolocation data to email authentication results. Extended metadata is viewable in the metadata panel and included in exports.
| Namespace | Document Types | Fields |
|---|---|---|
| general | All documents | filename, extension, mime, document_type, size_bytes, upload_batch_id. Present on every document as the baseline property set. |
| email | EML, MSG | from, to, cc, bcc, subject, message_id, in_reply_to, references, conversation_index, date_sent_utc, date_received_utc, attachments (count and names). Also includes email authentication results: dkim_result, spf_result, and dmarc_result — useful for identifying spoofed or unauthenticated messages. |
| pdf | PDF files | title, author, subject, producer (the application that generated the PDF), creator (the originating application), creation_date_utc, modification_date_utc, page_count, is_encrypted. Extracted from both the PDF info dictionary and XMP metadata streams when available. |
| ooxml | Word, Excel, PowerPoint (DOCX, XLSX, PPTX) | Core properties: created, modified, title, subject, creator, lastModifiedBy, revision, keywords, description, category. Application properties: application (e.g. Microsoft Excel), company, template. These are the properties visible in a file's "Properties" dialog in Microsoft Office. |
| image | JPEG, PNG, TIFF, BMP, GIF | format (e.g. JPEG, PNG), mode (e.g. RGB, RGBA), width, height. EXIF data (when available): DateTimeOriginal, DateTimeDigitized, Make (camera manufacturer), Model (camera model), Software, Orientation, XResolution, YResolution, and GPSInfo (latitude, longitude, altitude). EXIF geolocation data can be critical in investigations involving photographs. |
For email documents, Dezcry extracts the authentication results from email headers when present. DKIM (DomainKeys Identified Mail) verifies the email was not altered in transit. SPF (Sender Policy Framework) checks that the sending server is authorised for the domain. DMARC (Domain-based Message Authentication, Reporting and Conformance) combines both checks. These results can help identify spoofed or potentially fraudulent emails during an investigation.
Search
Keyword Search
Dezcry provides eDiscovery-grade keyword search powered by Elasticsearch, delivering capabilities equivalent to dtSearch at scale. The search engine supports millions of documents with sub-second query response times.
Search is available from the main document list via the search bar. Results are ranked by relevance with hit highlighting, and all searches return exact counts (never approximate). Search results can be filtered further using column filters and saved for reuse.
The following fields are indexed and searchable:
- Full document text content
- Filename and file path
- Email fields: subject, from, to, cc, bcc
- Author, custodian, document type, MIME type
- MD5 and SHA-256 hashes
- Tags, dates (created, modified, sent, received)
Search Syntax
Dezcry supports the full range of eDiscovery search syntax:
| Syntax | Example | Description |
|---|---|---|
| Boolean AND | contract AND liability | Both terms must appear in the document |
| Boolean OR | merger OR acquisition | Either term must appear |
| Boolean NOT | confidential NOT public | First term must appear, second must not |
| Grouping | (merger OR acquisition) AND confidential | Parentheses control operator precedence |
| Phrase | "privileged communication" | Exact phrase match, preserving word order |
| Proximity | "contract breach"~5 | Terms must appear within 5 words of each other |
| W/N (dtSearch) | merger W/5 acquisition | dtSearch-style proximity — terms within N words |
| Wildcard (prefix) | priv* | Matches privilege, privileged, privacy, etc. |
| Wildcard (suffix) | *mail | Matches email, voicemail, etc. |
| Wildcard (single) | h?t | Matches hat, hit, hot, hut, etc. |
| Fuzzy | colour~ | Matches similar spellings (Levenshtein distance) |
| Fuzzy (explicit) | colour~2 | Matches within edit distance of 2 |
| Field-specific | subject:"quarterly earnings" | Search within a specific field |
| Field (email) | from:john@acme.com | Search the From email field |
| Field (filename) | filename:report.xlsx | Search by filename |
| Date range | date >= 2020-01-01 | Filter by date |
| Date range | date:2020-01-01..2022-12-31 | Date range with start and end |
Searches automatically apply stemming — searching for "run" will also match "running", "ran", and "runs". This is handled by the Elasticsearch analyzer and provides more comprehensive results without requiring wildcard syntax.
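As a rough sketch, a query written in the syntax above could be passed through an Elasticsearch `query_string` query, which natively supports Boolean operators, phrases, proximity, wildcards, fuzzy matching, and `field:value` terms. The field list and `default_operator` choice below are assumptions, not Dezcry's actual index mapping; `track_total_hits` is the standard Elasticsearch option for exact (never approximate) counts:

```python
def build_search_body(query: str, size: int = 50) -> dict:
    """Assemble an illustrative Elasticsearch request body for a
    dtSearch-style keyword query."""
    return {
        "query": {
            "query_string": {
                "query": query,
                # Assumed field names; the real index mapping is internal
                "fields": ["content", "filename", "subject",
                           "from", "to", "cc", "bcc"],
                "default_operator": "AND",
            }
        },
        "highlight": {"fields": {"content": {}}},  # hit highlighting
        "track_total_hits": True,                  # exact result counts
        "size": size,
    }
```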
Search Term Sets
Search Term Reports allow you to define a set of keywords and run them against a scope of documents to measure hit rates. This is commonly used for:
- Validating keyword lists before full review
- Measuring the prevalence of specific topics in the collection
- Producing defensible search term hit reports for regulators
- Identifying which custodians or data sources contain relevant material
To create a search term report, navigate to Search Terms within a matter:
1. Create a report — Give it a name and select the scope (all documents or a saved search).
2. Add search terms — Enter your keywords one at a time. Each term can be up to 450 characters and supports the full search syntax.
3. Configure options — Enable "Include family hits" to count documents whose family members match. Enable "Tag hits" to create per-document hit records.
4. Run the report — Dezcry executes each search term against the scope and records hit counts.
Search Term Reports
Once a search term report has completed, you can view detailed results:
- Per-term hit counts — number of documents matching each search term (direct and family hits)
- Unique hits — documents that match only this specific term
- Colour-coded highlighting — each term can be assigned a custom highlight colour for visual identification in the document viewer
- Scope summary — total documents in scope, total documents with at least one hit
- Term status — individual status tracking for each term (pending, running, completed, error)
When tag hits is enabled, you can filter the document list to show only documents that matched a specific search term, enabling targeted review of keyword-responsive material. Search term highlights persist in the document viewer text tab, showing matching terms with their assigned colours.
Saved Searches
Any combination of search query and column filters can be saved as a named search for later reuse. Saved searches are a core building block in Dezcry — they serve as the scope selector for redaction, classification, export, and search term reports.
| Property | Description |
|---|---|
| Name | A unique name within the matter for easy identification |
| Description | Optional long-form description of what the search captures |
| Visibility | Shared (visible to all matter users) or Private (creator only) |
| Pinned | Pin frequently-used searches to the top of the list |
| Tags | Categorise searches (e.g. "Privilege", "Review", "Production") |
| Query + Filters | The full search query and column filter configuration |
When a saved search is used as the scope for a job (redaction, classification, or export), the document set is frozen at the time the job starts. This means the job processes the documents that matched at that moment, even if new documents are added to the matter later — providing defensibility and reproducibility.
AI Classification
Overview
AI Classification lets you automatically categorise documents using custom decision fields defined by your team. Unlike manual review, AI classification processes entire document sets in minutes, producing predictions with calibrated confidence scores so reviewers can focus their attention on genuinely ambiguous items while high-confidence predictions are applied automatically.
Classification runs on large language models within the same Azure environment as the rest of the platform — no document data leaves your deployment. The system includes confidence debiasing to correct for known LLM overconfidence, a verification pass for borderline predictions using a separate model, and intelligent document chunking for long documents. Every prediction includes a calibrated confidence score and rationale, and all decisions are logged in the audit trail.
Classification and redaction serve different purposes. Classification assigns labels to entire documents — categorising them by type, relevance, sensitivity, or any custom taxonomy your team defines. Redaction identifies and removes specific text within documents. Classification helps your team decide what to do with a document; redaction helps you prepare it for disclosure.
Custom Fields
Before running a classification job, you define the decision fields that the AI should predict. These are entirely customisable — you define the field names, types, options, and instructions that are specific to your review. Navigate to Classification within a matter to configure fields.
| Field Type | Description | Example |
|---|---|---|
| Single Select | The AI chooses exactly one value from a predefined list of options. Best for mutually exclusive categories. | Relevance: Responsive / Non-Responsive / Partially Responsive |
| Multi Select | The AI can select one or more applicable values from a list. Best for non-exclusive labels. | Data Categories: Financial / Medical / Employment / Personal |
| Boolean | A simple yes/no decision. | Contains PII: true / false |
| Free Text | The AI provides a short free-text response. Best for summaries or descriptions. | Key Topics: One-sentence description of the document content |
For each field, you provide natural-language instructions that tell the AI exactly how to evaluate documents. The quality of these instructions directly affects classification accuracy. Dezcry provides a real-time quality indicator as you write:
| Quality Level | Length | Guidance |
|---|---|---|
| Poor | Under 10 characters | Too short to be useful — the AI has no context for making decisions. Add specific criteria, examples, and edge case guidance. |
| Fair | 10–50 characters | Basic direction, but lacks nuance. Adding more detail about what qualifies for each option and how to handle ambiguous cases will improve accuracy. |
| Good | 50–200 characters | The AI has enough context to make reliable predictions. Consider adding examples of borderline cases. |
| Excellent | 200+ characters | Detailed instructions with clear criteria, examples, and edge case handling. This produces the most accurate and consistent results. |
Good classification instructions should include:
- Clear criteria — what makes a document qualify for each option
- Examples — concrete examples of what belongs in each category
- Edge cases — how to handle ambiguous or borderline documents
- Context — relevant background about the matter, industry, or regulatory framework
- Negative examples — what should not be classified as a given category
For example, instead of "Is this relevant?", write: "Classify as Responsive if the document contains information about the data subject's employment history, salary, performance reviews, or HR communications. Classify as Non-Responsive if the document is a system-generated notification, marketing material, or relates to a different individual. Classify as Partially Responsive if the document contains some relevant content mixed with unrelated material."
Classification Sets
A classification set is a reusable configuration that defines which fields to predict, how the AI should behave, and what confidence thresholds to apply. Classification sets can be run multiple times — for example, after adding new documents to the matter. To create and run a classification:
1. Select scope — Choose all documents or a saved search to define which documents to classify. The scope is frozen at run time — new documents added later won't be included in this run.
2. Name the set — Give the classification set a descriptive name for tracking and audit purposes.
3. Configure fields — Define one or more custom decision fields with types, options, and natural-language AI instructions.
4. Set thresholds — Configure the auto-accept threshold (default 0.85) and review threshold (default 0.50) to control how predictions are routed.
5. System prompt (optional) — Provide an optional system-level prompt that applies to all fields — useful for setting overall context like the matter type, jurisdiction, or review protocol.
6. Optional sampling — For large document sets, configure prevalence sampling to validate classification quality on a subset before committing to a full run.
7. Review and launch — Review all settings in a summary view and start the classification job.
Confidence Thresholds and Routing
Dezcry uses a three-tier routing system based on calibrated confidence scores to determine how each prediction is handled:
| Confidence Range | Routing | Description |
|---|---|---|
| Above auto-accept (default: > 0.85) | Auto-applied | The prediction is applied automatically without requiring human review. The AI is highly confident and the prediction is defensible. |
| Between review and auto-accept (default: 0.50–0.85) | Flagged for review | The prediction is saved but flagged as needs_review. A human reviewer must approve, correct, or reject it before it is applied. |
| Below review threshold (default: < 0.50) | Indeterminate | The AI could not make a reliable prediction. The document is flagged for manual coding by a reviewer. |
Both thresholds are configurable per classification set, allowing teams to tune the trade-off between automation and human oversight based on the risk profile of the review. A high-stakes privilege review might use a higher auto-accept threshold (e.g. 0.95) to route more predictions to human review, while a routine document-type classification might use a lower threshold (e.g. 0.80) to maximise automation.
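The three-tier routing can be summarised in a few lines. This sketch mirrors the table above; the function name and return labels are illustrative:

```python
def route_prediction(confidence: float,
                     auto_accept: float = 0.85,
                     review: float = 0.50) -> str:
    """Route a calibrated confidence score to one of three tiers.
    Thresholds are per-classification-set settings; defaults mirror
    the documented values."""
    if confidence > auto_accept:
        return "auto_applied"    # applied without human review
    if confidence >= review:
        return "needs_review"    # saved, but flagged for a reviewer
    return "indeterminate"       # flagged for manual coding
```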
Confidence Calibration (Debiasing)
LLMs are known to be systematically overconfident — they tend to report confidence scores of 0.90 or 0.95 even when their actual accuracy is closer to 0.80–0.85. This is particularly problematic in eDiscovery where confidence thresholds drive review decisions.
Dezcry applies empirical confidence debiasing — a calibration layer that adjusts raw LLM confidence scores to better reflect true accuracy. The calibration is:
- Monotonic — higher raw confidence always produces higher calibrated confidence (preserves ranking)
- Deterministic — the same input always produces the same output (defensible in regulatory contexts)
- Conservative — systematically pulls overconfident scores toward empirical accuracy curves
The calibration is based on published research on LLM confidence calibration and fitted to eDiscovery-specific accuracy measurements. It compresses the overconfident tail (0.85–0.99) more aggressively than the well-calibrated low-confidence range (0.05–0.50).
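The fitted calibration curve itself is not published; purely to illustrate the monotonic, deterministic, tail-compressing shape, a piecewise-linear sketch with invented breakpoints might look like this:

```python
def debias(raw: float) -> float:
    """Monotonic, deterministic calibration sketch: leave the
    low-confidence range roughly unchanged and compress the
    overconfident tail toward empirical accuracy.
    (Breakpoints below are invented for illustration; the real
    curve is fitted to measured accuracy data.)"""
    # piecewise-linear control points: (raw, calibrated)
    points = [(0.0, 0.0), (0.5, 0.5), (0.85, 0.72), (0.99, 0.88), (1.0, 1.0)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if raw <= x1:
            t = (raw - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return 1.0
```

Because the control points increase in both coordinates, ranking is preserved (monotonic), and the same input always yields the same output (deterministic), matching the properties listed above.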
Verification Pass
For predictions that fall in a borderline confidence range (0.35–0.70 by default), Dezcry automatically triggers a verification pass — a second classification attempt using a different model deployment. This functions as a quality control layer:
- The verification pass uses a different prompt persona ("QC reviewer") to challenge the initial classification
- It uses a separate model deployment for model diversity, reducing correlated errors
- If the verification agrees with the first pass, the confidence scores are averaged (typically increasing the final confidence)
- If the verification disagrees, the lower confidence score is used, the verification's classification is adopted, and the result is force-flagged for human review
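The agree/disagree handling above can be sketched as follows, with hypothetical field names (`label`, `confidence`, `force_review`):

```python
def combine_verification(first: dict, second: dict) -> dict:
    """Merge an initial prediction with its verification pass.
    Agreement averages the confidence scores; disagreement adopts
    the verifier's label, keeps the lower confidence, and forces
    human review."""
    if first["label"] == second["label"]:
        conf = (first["confidence"] + second["confidence"]) / 2
        return {"label": first["label"], "confidence": conf,
                "force_review": False}
    conf = min(first["confidence"], second["confidence"])
    return {"label": second["label"], "confidence": conf,
            "force_review": True}
```

Note `force_review` only records the disagreement override; threshold-based routing still applies separately to the combined confidence.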
Document Chunking for Long Documents
Documents that exceed the model's context budget (default: ~112,000 characters) are automatically split into deterministic chunks for processing. Chunking is designed to maintain classification accuracy:
- Sentence-boundary aware — chunks are split at sentence boundaries, never mid-sentence, preserving semantic coherence
- Overlapping — adjacent chunks share ~200 characters of overlap, ensuring context continuity across chunk boundaries
- Deterministic — the same document always produces the same chunks, ensuring reproducible results
- Fallback splitting — if a single sentence exceeds the chunk limit, it falls back to word-boundary splitting with overlap
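A simplified version of sentence-boundary chunking with character overlap (the word-boundary fallback for oversized sentences noted above is omitted for brevity; the sentence splitter here is a deliberately naive regex):

```python
import re


def chunk_text(text: str, max_chars: int = 112_000, overlap: int = 200):
    """Deterministic, sentence-boundary-aware chunking sketch.
    Chunks never split mid-sentence, and each chunk carries roughly
    `overlap` trailing characters of its predecessor for context."""
    if len(text) <= max_chars:
        return [text]
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            # seed the next chunk with the tail of this one
            current = current[-overlap:] + " " + sent
        else:
            current = (current + " " + sent) if current else sent
    if current:
        chunks.append(current)
    return chunks
```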
When a document is chunked, each chunk is classified independently, and results are aggregated using a weighted voting system:
- Each chunk's prediction is weighted by its confidence score
- Chunks that return null (no classifiable content) are excluded from the vote, not counted as evidence
- The winning prediction is determined by total confidence-weighted score, with tie-breaking by peak single-chunk confidence
- A unanimity bonus increases confidence when all chunks agree; disagreement reduces it
- A dissent penalty is applied when any dissenting chunk has high confidence (≥ 0.70), with a note recommending manual review
When different chunks of a document produce different classifications, this is flagged as chunk disagreement and the document is automatically flagged for human review. This is an important quality signal — it often indicates that a document contains mixed content (e.g. a partially responsive document where some sections are relevant and others are not). The aggregated rationale includes a note about the dissenting chunks and their confidence levels.
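The confidence-weighted vote can be sketched as below. The bonus and penalty magnitudes are illustrative assumptions; only the structure (null exclusion, confidence-weighted totals, peak tie-breaking, unanimity and dissent handling) follows the description above:

```python
def aggregate_chunks(predictions):
    """Aggregate per-chunk predictions, each given as
    (label_or_None, confidence). Returns (label, confidence,
    needs_review)."""
    votes = {}
    for label, conf in predictions:
        if label is None:            # no classifiable content: not evidence
            continue
        total, peak = votes.get(label, (0.0, 0.0))
        votes[label] = (total + conf, max(peak, conf))
    if not votes:
        return None, 0.0, False
    # winner by total weight; tuple comparison breaks ties by peak
    winner = max(votes, key=lambda lbl: votes[lbl])
    voted = [(lbl, c) for lbl, c in predictions if lbl is not None]
    confidence = votes[winner][0] / sum(c for _, c in voted)
    unanimous = all(lbl == winner for lbl, _ in voted)
    high_dissent = any(lbl != winner and c >= 0.70 for lbl, c in voted)
    if unanimous:
        confidence = min(1.0, confidence + 0.05)  # illustrative bonus
    # any disagreement (chunk disagreement) triggers human review
    needs_review = (not unanimous) or high_dissent
    return winner, confidence, needs_review
```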
Classification sets track runs with detailed progress reporting: total documents, documents processed, errors encountered, and token usage for cost attribution. Completed runs automatically create a saved search containing the classified documents for downstream processing.
Classification runs support parallel processing — multiple documents are classified concurrently (default: 6 simultaneous LLM calls) to maximise throughput while staying within AI rate limits. Runs can be cancelled at any time, and cancellation takes effect cleanly after the current document finishes processing.
The classification progress view shows real-time processing with a live console, document-by-document results including confidence scores, and estimated time remaining. You can continue working while classification runs in the background.
Reviewing Predictions
After a classification run completes, reviewers can examine the results. Each document receives a result for every configured field, containing:
| Field | Description |
|---|---|
| Predicted Value | The AI's chosen classification for this field (e.g. "Responsive", "Financial"). Null if the AI could not determine a classification. |
| Confidence Score | A calibrated 0.0–1.0 score reflecting the AI's certainty. Debiased to correct for LLM overconfidence. |
| Rationale | A short natural-language explanation of why the AI made this prediction, referencing specific content in the document. |
| Needs Review | Boolean flag — true if the confidence is below the auto-accept threshold, if chunks disagreed, or if the verification pass overrode the initial classification. |
| Chunk Count | How many chunks the document was split into (1 for short documents that fit in a single context window). |
| Chunk Disagreement | Whether different chunks of the document produced different predictions — a signal that the document may contain mixed content. |
| Verification Status | Whether the verification pass was triggered and whether it agreed or disagreed with the initial classification. |
Reviewers can take the following actions on any prediction:
- Approve — accept the AI's prediction as the final decision for this document and field
- Correct — override the AI's prediction with a different value chosen by the reviewer. The correction is logged alongside the original AI prediction for audit purposes.
- Reject — dismiss the prediction entirely, leaving the field uncoded for this document
All review actions are logged in the audit trail with the reviewer's identity, timestamp, the original AI prediction, and the reviewer's decision. This provides a defensible record of how every classification decision was made — whether by AI with human approval, by human correction of an AI suggestion, or by purely manual coding.
Prevalence Sampling
For large document sets, Dezcry supports prevalence sampling — classifying a statistically representative subset of documents before committing to a full run. This allows teams to:
- Validate that the classification instructions produce accurate results before processing the full set
- Estimate the prevalence of each category in the collection (e.g. "approximately 30% of documents are responsive")
- Calculate precision and recall metrics by comparing AI predictions against manual coding on the sample
- Refine instructions based on sample results before running the full classification
Sampling results are stored as ClassificationSample records, preserving both the AI prediction and the human-coded ground truth for quality measurement and defensibility.
AI Redaction
Overview
AI Redaction is Dezcry's flagship feature — a 5-layer detection pipeline that identifies personal data, sensitive content, and legally privileged material for redaction. The system is designed as a reviewer aid, not an autonomous tool: every AI suggestion is reviewable, editable, and logged before it is applied.
Redaction runs on large language models within the same Azure environment. No document data is sent to any third-party service. The pipeline combines deterministic pattern matching with LLM analysis and cross-document entity resolution for comprehensive coverage.
Redaction Types
Dezcry supports three redaction protocols, each tailored to a different use case:
| Type | Purpose | Configuration |
|---|---|---|
| DSAR | Remove the data subject's personal information from documents being disclosed. Uses a whitelist approach — you specify the data subject's name, email addresses, and phone numbers, and the AI identifies all instances. | Data subject first/last name, known email addresses, known phone numbers |
| Privilege | Identify and redact legally privileged communications (attorney-client privilege, work product doctrine). Uses domain and keyword filtering to detect privileged material. | Privileged individuals, law firm domains, privilege keywords, custom instructions |
| Ad Hoc | Custom redaction with free-form instructions. Use for any redaction task that doesn't fit the DSAR or privilege templates. | Free-text instructions describing what to redact |
Redaction Models
When creating a redaction set, you select which entity categories the AI should detect. Each category has a distinct colour for visual identification in the review interface:
| Model | Description | Colour |
|---|---|---|
| Names | Personal names, first/last names, initials, nicknames | Red |
| Emails | Email addresses | Orange |
| Phone Numbers | Phone numbers, fax numbers, mobile numbers | Amber |
| Identifiers | SSN, passport numbers, driver licence numbers, national IDs | Green |
| Employment | Job titles, employee IDs, salary information, work history | Blue |
| Company IDs | Company registration numbers, tax IDs, ABN/ACN | Purple |
| Locations | Physical addresses, postal codes, GPS coordinates | Magenta |
| Political Opinions | Political affiliations, party membership, voting records | Light Purple |
| Health Information | Medical conditions, treatments, diagnoses, medications | Red |
| Sexual Orientation | Gender identity, sexual orientation details | Pink |
| Financial | Bank account numbers, credit card numbers, financial data | Green |
| Auth Credentials | Passwords, PINs, API keys, security tokens | Cyan |
| Family Associations | Relationships, dependents, family member details | Light Red |
| Device IDs | IP addresses, MAC addresses, device identifiers | Light Blue |
Sensitive categories — health information, sexual orientation, political opinions, and auth credentials — use a lower default auto-apply confidence threshold (0.70) to ensure more conservative handling.
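As an illustrative sketch of how per-category thresholds might be resolved: the 0.70 value for sensitive categories comes from the text above, while the 0.85 default and the category keys are placeholders, not documented settings.

```python
# Illustrative only: 0.70 for sensitive categories is documented;
# the 0.85 default is a placeholder, not a documented value.

DEFAULT_AUTO_APPLY = 0.85  # placeholder default threshold
SENSITIVE_CATEGORIES = {
    "health_information", "sexual_orientation",
    "political_opinions", "auth_credentials",
}

def auto_apply_threshold(category: str) -> float:
    """Sensitive categories auto-apply at lower confidence (more
    conservative: more borderline items get redacted)."""
    return 0.70 if category in SENSITIVE_CATEGORIES else DEFAULT_AUTO_APPLY
```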
5-Layer Pipeline
Dezcry processes each document through a 5-layer redaction pipeline, combining multiple detection methods for comprehensive coverage:
| Layer | Name | Method | Description |
|---|---|---|---|
| L1 | Pattern Scan | NER engine (deterministic) | Pattern-matching engine that detects structured PII using regex rules and named entity recognition. Provides a fast, deterministic baseline — catches email addresses, phone numbers, credit card numbers, and standard identifier formats. |
| L2 | AI Analysis | Large language model | The primary AI detection pass. The LLM analyses each document with context from L1 and L4 results, identifying contextual personal data that pattern matching alone would miss — such as names mentioned in natural language, implied relationships, and sensitive content. |
| L3 | AI Double-Check | Independent LLM verification | An independent verification layer using a separate model deployment. Acts as a "senior eDiscovery QA reviewer" — adversarially examines L2 results to confirm, reject, or upgrade redaction entries. Catches false positives and missed items. |
| L4 | Cross-Reference | Entity Resolution (algorithmic) | Fuzzy clustering of entity variants across all documents in the scope. Groups different spellings and formats of the same entity (e.g. "J. Smith", "John Smith", "john.smith@acme.com") into clusters with a canonical form. Ensures consistent redaction across the entire document set. |
| L5 | Smart Routing | Confidence Routing (algorithmic) | Routes each redaction entry based on its confidence score: high-confidence items are auto-applied, medium-confidence items go to the human review queue, and low-confidence items are flagged for manual inspection. |
The layers execute in the order: L4 (entity resolution) → L1 (pattern scan) → L2 (AI analysis) → L3 (verification) → L5 (routing). L4 runs first to build the entity index, which provides context for the subsequent AI layers. Progress is tracked per-phase with real-time status updates in the UI.
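The kind of deterministic matching L1 performs can be sketched as follows. The patterns below are simplified illustrations, not Dezcry's production rules, and the category names are borrowed from the model table above.

```python
import re

# Simplified sketch of an L1-style deterministic pattern scan.
# Real rules are far more extensive; these regexes are illustrative.
PATTERNS = {
    "emails": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone_numbers": re.compile(r"\+?\d[\d ()-]{7,}\d"),
    "financial": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # card-like digit runs
}

def pattern_scan(text):
    """Return every match as a category/text/span record."""
    hits = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append({"category": category, "text": m.group(), "span": m.span()})
    return hits
```

A deterministic layer like this gives the later LLM passes a reliable baseline for structured identifiers, which is the reason the pipeline runs it before the contextual analysis.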
Reviewing Redactions
After a redaction set completes processing, navigate to the Review page to examine and approve the AI's suggestions. The review queue presents each detected entity with:
- Original text — the exact text the AI identified for redaction
- Model category — the entity type (names, emails, etc.) with colour-coded badge
- Source layer — which pipeline layer detected it (L1, L2, L3, L4)
- Confidence score — how certain the AI is that this is a genuine entity
- Verification status — confirmed, rejected, upgraded, or new (from L3)
- Page location — the page number and pixel coordinates within the document
Reviewers can filter the queue by layer, model category, and confidence threshold. For each entry, reviewers can:
- Approve — accept the redaction and apply it to the document
- Reject — dismiss the suggestion as a false positive
- Flag for review — escalate to a senior reviewer for a second opinion
The review queue paginates at 100 entries per page. All review decisions are logged in the audit trail with the reviewer's identity, timestamp, and action taken.
Manual Redactions
In addition to AI-assisted redaction, reviewers can manually draw redaction boxes on any document using the markup viewer. Manual redactions are applied directly to the document's markup images and are tracked alongside AI redactions in the audit trail.
For spreadsheet documents, Dezcry provides a specialised spreadsheet markup viewer that allows cell-level redaction — reviewers can select individual cells or ranges to redact.
AI Summaries & Chat
Document Summaries
Dezcry automatically generates LLM-powered summaries for every document in a matter. Summaries are 1–2 sentence overviews that give reviewers quick context to assess relevance, decide on inclusion or exclusion, and move through large review sets faster.
Summaries are generated by a dedicated language model running on GPU infrastructure within the same Azure environment. No document data is sent to third-party services. Summaries are generated in the background and are available alongside the document in the metadata panel.
- Summaries are generated automatically on upload and during background backfill
- The summary language is configurable per matter (English, German, French, Spanish, etc.)
- Summaries are searchable and appear in the document metadata panel
- Administrators can trigger summary regeneration for any document or batch
Document Chat
The Document Chat panel provides conversational AI for asking questions about documents. Available from the document viewer, chat uses Retrieval-Augmented Generation (RAG) to find relevant content and generate accurate answers with source citations.
How it works:
1. Ask a question — Type a natural-language question in the chat panel (e.g. "What are the key dates mentioned in this document?")
2. Hybrid search — Dezcry searches for relevant content using both keyword search (Elasticsearch) and semantic search (vector embeddings), combining results via Reciprocal Rank Fusion.
3. AI generates answer — The LLM reads the relevant document chunks and generates a response with inline citations referencing specific documents.
4. Source verification — Each response includes clickable source document references (e.g. [DOC-00028]) so reviewers can verify the AI's answer.
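The Reciprocal Rank Fusion step can be sketched as follows; the k=60 constant is the conventional RRF value, assumed here rather than a documented Dezcry setting.

```python
# Sketch of Reciprocal Rank Fusion over two ranked result lists.
# k=60 is the conventional RRF constant, assumed, not a Dezcry setting.

def rrf_fuse(keyword_results, semantic_results, k=60):
    """Each argument is a list of document IDs, best match first.
    A document's fused score is the sum of 1/(k + rank) over the
    lists it appears in, so agreement between lists boosts it."""
    scores = {}
    for results in (keyword_results, semantic_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by both keyword and semantic search will typically outrank one that only a single method found, which is the property that makes RRF a good fit for hybrid retrieval.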
Chat is rate-limited to 20 queries per minute per user and 60 queries per minute per matter to ensure fair resource allocation across teams.
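A per-user sliding-window limiter of the kind described could look like this sketch; only the 20-queries-per-minute figure comes from the text, and everything else (class shape, eviction strategy) is an assumption.

```python
import time
from collections import deque

# Sketch of a sliding-window rate limiter. Only the 20/minute figure
# is documented; the implementation shape is an assumption.
class SlidingWindowLimiter:
    def __init__(self, max_events, window_seconds=60):
        self.max_events = max_events
        self.window = window_seconds
        self.events = deque()  # timestamps of accepted events

    def allow(self, now=None):
        """Accept the event if fewer than max_events fall in the window."""
        now = time.monotonic() if now is None else now
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()  # drop events outside the window
        if len(self.events) < self.max_events:
            self.events.append(now)
            return True
        return False
```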
AI OCR
Overview
AI OCR (Optical Character Recognition) extracts searchable text from image-based documents — scanned PDFs, photographs, screenshots, and other image files that don't contain embedded text. Dezcry uses the Azure Computer Vision Read API for high-accuracy text extraction.
OCR can be enabled automatically during upload (as a processing option) or run manually on specific documents or batches after ingestion.
Running OCR
Navigate to the AI OCR page within a matter to manage OCR jobs:
1. Create a job — Select the scope — all documents or a saved search — and start the OCR job.
2. Processing — Dezcry sends each image document to the Azure Computer Vision API for text extraction. Progress is tracked in real-time with 4-second polling intervals.
3. Results — Extracted text is stored in the document record and immediately becomes searchable. Per-document results include pages extracted, characters extracted, confidence scores, and processing duration.
OCR job results track each document individually, reporting:
- Pages and characters extracted per document
- Per-document status (completed, failed, skipped)
- Error messages for failed documents
- Processing duration per document
Jobs can be cancelled while running or queued. The AI OCR dashboard shows aggregate metrics: total jobs, completed jobs, active jobs, and total documents processed.
Password Bank
Overview
The Password Bank stores passwords and credentials for encrypted documents within a matter. When Dezcry encounters password-protected files during ingestion (encrypted PDFs, password-protected ZIPs, protected Office documents, encrypted PST files), it attempts to decrypt them using passwords from the Password Bank.
Managing Passwords
Navigate to the Password Bank page within a matter to manage credentials:
- Add passwords — enter passwords with optional labels and tags for organisation
- Labels — human-readable hints to identify what the password is for (the label is visible, the password itself is hidden)
- Tags — categorise passwords (e.g. "client", "custodian-smith", "batch-3")
- Usage tracking — each password tracks when it was last used and how many times it has been applied
- Edit and delete — update or remove passwords with confirmation dialogs
Passwords are reusable across all uploads within the matter. When new documents are uploaded, all passwords in the bank are tried against any encrypted files. The upload summary reports how many files were successfully decrypted and how many failed decryption.
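The try-every-password flow might look like this sketch; `decrypt` stands in for whatever file-type-specific decryption routine applies, and the bank record fields are illustrative rather than Dezcry's schema.

```python
# Hypothetical sketch of the try-every-password flow. `decrypt` is a
# stand-in for the file-type-specific routine; it should return the
# decrypted content or raise on a wrong password.

def try_passwords(decrypt, bank):
    """bank: list of dicts with 'password' and 'use_count' keys.
    Returns (content, entry) on success, (None, None) if all fail."""
    for entry in bank:
        try:
            content = decrypt(entry["password"])
        except ValueError:  # wrong password for this file
            continue
        entry["use_count"] += 1  # usage tracking on the matched entry
        return content, entry
    return None, None
```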
Export
Overview
Dezcry's Export system produces disclosure-ready output packages with Bates numbering, metadata load files, burned redactions, and full decision history. Exports are configured through a multi-step wizard and can be re-run with updated settings.
Two export types are supported:
- Production — formal disclosure packages with Bates numbering, branded headers/footers, and structured volume organisation. Used for regulatory submissions and formal DSAR responses.
- Review — simpler packages for internal review or transfer to external counsel, without production-level numbering requirements.
Export Wizard
The export wizard guides you through a 6-step configuration process:
1. Scope — Select which documents to export — all documents in the matter or a saved search.
2. Name & Type — Name the export set and choose Production or Review type.
3. Output Components — Select which output types to include: metadata load file, natives, images, text files, and/or PDFs.
4. Numbering & Branding — Configure Bates numbering (prefix, suffix, start number, padding) and optional header/footer branding.
5. Load File & Volumes — Configure the metadata load file format, encoding, date formats, and volume organisation settings.
6. Review & Run — Review all settings in a summary view and launch the export.
Scope Selection
Export scope defines which documents are included in the output package. You can choose:
- All documents — exports every document in the matter
- Saved search — exports only documents matching a previously saved search query and filters
The wizard displays a document count for the selected scope so you can verify the volume before proceeding. The scope is frozen at run time — new documents added to the matter after the export starts will not be included.
Output Components
Select which output types to include in the export package:
| Component | Description |
|---|---|
| Metadata Load File | A structured data file (DAT, CSV, or HTML) containing all document metadata, decisions, and Bates numbers. Compatible with Relativity, Concordance, and other review platforms. |
| Natives | Original source files in their native format (DOCX, PDF, XLSX, etc.) |
| Images | Rendered document images (single-page or multi-page TIFF) with optional Opticon or IPRO load files for image cross-referencing. |
| Text Files | Extracted plain text content for each document, useful for downstream text analytics or cross-referencing. |
| PDFs | Rendered PDF versions of each document, optionally with burned-in redactions and Bates number branding. |
Numbering & Branding
Production exports support Bates-style document numbering:
| Setting | Description | Example |
|---|---|---|
| Prefix | Text prepended to every Bates number | ACME- |
| Suffix | Text appended to every Bates number | -PROD |
| Start Number | The first number in the sequence | 1 |
| Digit Padding | Zero-padding width for the numeric portion | 7 → 0000001 |
| Numbering Mode | Document-level (one number per document) or page-level (one number per page) | Document-level |
| Page Separator | Character between document number and page number in page-level mode | _ → ACME-0000001_001 |
| Attachment Grouping | Keep parent documents and attachments numbered sequentially | Enabled |
| Sort Order | How documents are ordered for numbering (sequential, family group, or by field) | doc_seq |
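Document-level Bates numbering from the settings above can be sketched as follows; the values in the test call are the table's examples, not defaults.

```python
# Sketch of document-level Bates number generation from the settings
# above (prefix, suffix, start number, zero-padding).

def bates_numbers(doc_count, prefix="", suffix="", start=1, padding=7):
    """One number per document: prefix + zero-padded sequence + suffix."""
    return [f"{prefix}{start + i:0{padding}d}{suffix}" for i in range(doc_count)]
```

In page-level mode, a page component would be appended using the configured separator (e.g. `ACME-0000001_001`); that variant is omitted here for brevity.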
Optional branding adds headers and footers to PDF output:
- Header and footer with left, centre, and right sections
- Template tokens: {BatesNumber}, {PageX}, {PageY}
- Default footer: "CONFIDENTIAL"
Load Files & Volumes
Load file settings control the metadata output format:
| Setting | Default | Description |
|---|---|---|
| Format | DAT | Load file format — DAT (Concordance), CSV, HTML, or custom TXT |
| Encoding | UTF-8 | Character encoding for the load file |
| Date Format | MM/dd/yyyy | Format for date fields in the load file |
| Time Format | HH:mm:ss | Format for time fields |
Volume settings control the physical organisation of the export package:
| Setting | Default | Description |
|---|---|---|
| Volume Prefix | VOL | Prefix for volume folder names (VOL001, VOL002, etc.) |
| Start Number | 1 | First volume number |
| Digit Padding | 3 | Zero-padding for volume numbers |
| Max Volume Size | 4500 MB | Maximum size per volume folder before splitting |
| Max Files Per Folder | 5000 | Maximum files in a single subfolder |
| File Naming | Control Number | How files are named — by Bates/control number or original filename |
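Volume assignment under the size and file-count caps can be sketched as follows; the greedy fill strategy and field names are illustrative, not a description of Dezcry's internals.

```python
# Sketch of volume assignment: files fill VOL001, VOL002, ... and a new
# volume starts when the size or file-count cap would be exceeded.
# The greedy strategy here is an assumption.

def assign_volumes(files, max_bytes, max_files, prefix="VOL", padding=3, start=1):
    """files: list of (name, size_bytes). Returns {volume_name: [names]}."""
    volumes, current, current_bytes, n = {}, [], 0, start
    for name, size in files:
        if current and (current_bytes + size > max_bytes or len(current) >= max_files):
            volumes[f"{prefix}{n:0{padding}d}"] = current  # close this volume
            current, current_bytes, n = [], 0, n + 1
        current.append(name)
        current_bytes += size
    if current:
        volumes[f"{prefix}{n:0{padding}d}"] = current
    return volumes
```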
Downloading Exports
Once an export run completes, the output package is available for download. The export page shows:
- Run status — running, completed, failed, or cancelled
- Progress — documents processed vs. total
- Output size — total size of the generated package
- Duration — time taken to generate the export
- Error and warning counts — per-document issues encountered
- Settings snapshot — the exact configuration used for this run
Redaction integration allows you to burn redactions into the export output. Select a completed redaction set and choose the placeholder mode:
- None — no redaction placeholders (redacted areas are simply blacked out)
- Brackets — redacted text replaced with category labels in brackets
- Redaction block — solid black boxes over redacted content
All export actions — creation, run start, download — are logged in the audit trail.
Audit & Reporting
Audit Log
Every significant action in Dezcry is recorded in an immutable audit log, providing a defensible trail for regulators, legal review, and internal governance. The audit log captures:
| Category | Actions Tracked |
|---|---|
| Documents | Viewed, uploaded, downloaded, deleted, summaries regenerated |
| Decisions | Relevance coding updates, bulk decision changes, tag modifications |
| Redactions (Manual) | Redaction boxes drawn, updated, or deleted on documents |
| Redaction Review | AI redaction entries approved, rejected, or escalated |
| Redaction Jobs | Sets created/deleted, runs started/completed/cancelled/failed |
| Classification | Sets created/deleted, runs started/completed/cancelled/failed |
| Export | Sets created/updated/deleted/cloned, runs started/cancelled, downloads |
| Markup | Preview and markup images generated or failed |
| Downloads | PDF downloads, batch PDF downloads, redacted spreadsheet downloads |
| Search | Saved searches created, updated, or deleted |
| Chat | Messages sent, conversations created/updated/deleted |
| Indexing | Documents indexed, matter re-indexed, index cleared |
| Auth | Login success/failure, password changes, account locks |
| Admin | Users created/updated, roles changed, matter access granted/revoked |
| Billing | Usage recalculated, invoices generated |
Each audit entry includes: the action type, target (which document, set, or resource was affected), user identity (who performed it), timestamp, and details (rich context including file names, counts, old/new values). The audit log is filterable by action type, target type, user, and date range, with pagination at 50 entries per page.
Matter-level audit is accessible from the Audit page within each matter. System-wide audit is available to administrators from the Admin section.
Reporting Dashboard
The Reporting page provides analytics dashboards with visualisations across eight tabs:
| Tab | Metrics |
|---|---|
| Overview | Executive summary KPIs — document counts, completion rates, activity summary |
| Processing | Ingestion batch history, volume growth over time, processing throughput |
| Redaction | Redaction runs, entities detected by model, layer statistics, coverage rates |
| Classification | Classification runs, field outcomes, confidence score distributions |
| AI Performance | Token usage, cost attribution, model accuracy and quality metrics |
| Review | Review queue depth, items pending review, reviewer turnaround times |
| Activity | User action trends, audit log summaries, active reviewer counts |
| Exports | Export history, production statistics, deliverable sizes |
Dashboards include KPI cards, bar charts, line charts, pie charts, and area charts. Reports can be exported as PDF with embedded charts, matter information headers, and generation timestamps.
Billing & Usage
The Billing page shows storage usage and costs for each matter. Storage is broken down into seven categories:
| Category | Description |
|---|---|
| Documents | Original uploaded files in their native format |
| Extracted Text | Plain text extracted during processing and OCR |
| Markup Images | Rendered page images for the redaction workflow |
| Redacted PDFs | PDF versions with burned-in redactions and branding |
| Indices | Elasticsearch search indices for the matter |
| Embeddings | Vector embeddings used for AI chat and semantic search |
| Other | Miscellaneous processing artifacts |
The billing dashboard shows current usage (total GB and projected monthly cost), storage breakdown by category, usage history over time, and invoice details. Pricing is per-GB with regional variations and volume tier discounts.
Administration
User Management
The Admin page (accessible to admin and super_admin roles) provides a central interface for managing all users in the organisation. The user list shows:
- Email address and full name
- Assigned role
- Account status (active, inactive, pending, invited, locked, deactivated)
- 2FA/MFA enablement status
- Last login date
- Number of matter assignments
Administrators can search by email or name, and filter by status or role. Available actions include creating users, editing details, changing roles, sending invitations, resetting passwords, and activating or deactivating accounts.
Roles & Permissions
Dezcry uses a hierarchical role-based access control (RBAC) system with four roles. Roles are hierarchical — each role inherits all permissions from the roles below it. Access is enforced at two levels: role-level (what actions a user can perform across the platform) and matter-level (which specific matters a user can access).
Role Hierarchy
| Role | Description | Matter Access |
|---|---|---|
| Super Admin | Full platform control. Can manage all users (including other admins), delete matters, configure system-wide settings, and access every feature. Intended for platform owners and IT administrators. | Implicit access to all matters across the tenant — no explicit assignment required. |
| Admin | Organisation-level management. Can create matters, invite and manage users, assign users to matters, view audit logs, manage the password bank, and configure billing. Cannot delete matters or manage other admins. | Implicit access to all matters across the tenant — no explicit assignment required. |
| Reviewer | The primary working role for legal, privacy, and compliance team members. Can upload documents, review and code documents, run AI classification and redaction jobs, create and manage exports, manage saved searches, and run search term reports. | Must be explicitly assigned to each matter. Can only see and work within matters they have been granted access to. |
| Read Only | View-only access for stakeholders, external counsel, or auditors who need visibility but should not make changes. Can browse documents, view metadata, read reports, use chat, and download exports — but cannot upload, modify, or run any jobs. | Must be explicitly assigned to each matter. Can only see matters they have been granted access to. |
Detailed Permission Matrix
The following table shows the minimum role required for each action in the platform. Higher roles automatically inherit all permissions from lower roles.
| Feature Area | Action | Minimum Role |
|---|---|---|
| Matters | View matters | Read Only |
| Matters | Create new matters | Admin |
| Matters | Update matter settings | Admin |
| Matters | Delete matters | Super Admin |
| Documents | View and search documents | Read Only |
| Documents | Upload documents | Reviewer |
| Documents | Update decisions, tags, and coding | Reviewer |
| Documents | Delete documents | Admin |
| AI Classification | View classification results | Read Only |
| AI Classification | Create sets and run classification jobs | Reviewer |
| AI Redaction | View redaction results | Read Only |
| AI Redaction | Create sets, run jobs, and review entries | Reviewer |
| Export | View export sets and download packages | Read Only |
| Export | Create export sets and run exports | Reviewer |
| Search | View saved searches | Read Only |
| Search | Create and manage saved searches | Reviewer |
| Search Term Reports | View search term reports | Read Only |
| Search Term Reports | Create and run reports | Reviewer |
| Chat / AI Q&A | Ask questions and view chat history | Read Only |
| Reporting | View analytics dashboards | Read Only |
| Billing | View billing and usage | Read Only |
| Billing | Manage billing settings | Admin |
| Password Bank | View stored passwords | Admin |
| Password Bank | Add, edit, and delete passwords | Admin |
| Audit Log | View matter and system audit logs | Admin |
| User Management | View and manage users | Admin |
| User Management | Invite users and assign roles | Admin |
| System Admin | Manage other admins, delete matters, system config | Super Admin |
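The hierarchical check behind this matrix can be sketched as a simple rank comparison; the role keys below are illustrative spellings, not Dezcry's internal identifiers.

```python
# Sketch of the hierarchical role check: a user's role satisfies a
# required minimum role if it ranks at or above it. Role key spellings
# are illustrative.

ROLE_RANK = {"read_only": 0, "reviewer": 1, "admin": 2, "super_admin": 3}

def has_permission(user_role, minimum_role):
    """Higher roles automatically inherit lower roles' permissions."""
    return ROLE_RANK[user_role] >= ROLE_RANK[minimum_role]
```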
Matter-Level Access Control
Access to individual matters is controlled separately from role permissions:
- Super Admin and Admin roles have implicit access to every matter in the tenant. They do not need to be explicitly assigned — they can see and manage all matters automatically.
- Reviewer and Read Only roles require explicit assignment to each matter. An administrator must grant access by assigning the user to the matter. Until assigned, the matter is completely invisible to the user — it does not appear in their matter list and cannot be accessed via direct URL.
This two-level model enables organisations to enforce segregation of duties and need-to-know access. For example, a reviewer handling HR DSARs can be restricted to only HR-related matters, while a different reviewer handles customer DSARs — even though both have the same role, they see entirely different matter sets.
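The matter-access half of the model can be sketched as follows; the user record shape is an assumption for illustration.

```python
# Sketch of matter-level access: admins and super admins have implicit
# access to every matter, other roles need explicit assignment.
# The user dict shape is an assumption.

def matter_access_allowed(user, matter_id):
    if user["role"] in ("admin", "super_admin"):
        return True  # implicit access, no assignment required
    return matter_id in user["matters"]  # explicit assignment required
```

Under this check, an unassigned matter is simply absent from a reviewer's view, matching the behaviour described above where it neither appears in the matter list nor resolves via direct URL.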
Tenant Isolation
All access controls operate within a tenant boundary. Every database query is scoped to the authenticated user's tenant, and every matter-level operation verifies that the matter belongs to the same tenant. Cross-tenant access is architecturally impossible — there is no mechanism in the application layer to access another organisation's data, even with a Super Admin role.
Document-Level Access
Access to individual documents follows the matter access model. If a user has access to a matter, they can see all documents within that matter (subject to their role permissions for viewing vs. editing). There is no per-document access restriction — access is controlled at the matter level, which is the standard approach in eDiscovery and DSAR review workflows where reviewers need to see the full context of a matter to make defensible decisions.
Permissions are enforced server-side on every API request, not just in the UI. Even if a user manipulates the frontend or constructs API requests directly, the backend validates their role and matter access before processing any operation. Denied requests receive a structured 403 Forbidden response with a clear explanation of why access was refused.
Inviting Users
Administrators invite new users by providing their email address, name, and assigned role. The invitee receives an email with a single-use invitation link that guides them through:
1. Set password — Create a strong password (minimum 12 characters, must include uppercase, lowercase, and a number).
2. Configure 2FA — Scan a QR code with an authenticator app (Google Authenticator, Authy, etc.) and enter the verification code.
3. Complete setup — Account is activated and the user can sign in.
Invitation links are single-use and have an expiration date. The invitation tracks who created it, when it was used, and the IP address of the accepting user.
Admin Dashboard
The Admin Dashboard provides tenant-wide analytics and operational oversight:
- Users overview — total, active, locked, invited users; 2FA adoption rate; role distribution; currently online users
- Matters overview — total matters; status distribution (open/closed/archived); type distribution; document count and storage per matter
- Documents overview — total document count; total storage; status distribution; encrypted, corrupt, and duplicate counts
- Processing status — recent upload batches; active classification, redaction, and export runs
- Storage breakdown — detailed storage usage by category across all matters
- Recent audit activity — latest system-wide audit entries
System Audit
The System Audit page in the Admin section provides a tenant-wide view of all audit log entries across all matters. This allows administrators to monitor platform-wide activity, investigate security events, and produce compliance reports. The same filtering and search capabilities from the matter-level audit are available at the system level.
Security & Compliance
Data Security
Dezcry is hosted entirely on Microsoft Azure, using Azure Container Apps, Azure PostgreSQL, and Azure Storage. All infrastructure runs within a single resource group with network-level isolation. The GPU worker service that handles AI inference runs on internal-only ingress and is not accessible from the public internet.
The platform operates a logically isolated multi-tenant architecture. Each organisation's data — documents, metadata, reviewer decisions, and audit logs — is segregated at the application and database level. Uploaded files are stored in organisation-scoped storage paths. Cross-tenant data access is not possible through the application layer.
Encryption
All data is encrypted in transit using TLS 1.2+ for all connections between services, storage, and the database. Data is encrypted at rest using Azure-managed encryption keys via Azure Storage Service Encryption and Azure Database encryption. Uploaded files, processed outputs, and database records are all covered.
Data Residency
Dezcry supports regional data residency — each matter can be hosted in a specific Azure region to meet local data protection requirements:
- Australia East — default region
- Switzerland North — for Swiss data protection requirements
- Germany — for German/EU data residency
- United Kingdom — for UK data protection requirements
AI models are deployed regionally — Australian data uses Australian AI endpoints, Swiss data uses Swiss endpoints, and so on. Enterprise customers can discuss deployment in additional regions or dedicated/on-premises environments.
AI Data Handling
Dezcry runs its own AI models for redaction, classification, and summarisation. No document data is sent to third-party AI services. All AI inference happens within the same Azure environment as the rest of the platform:
- Classification and redaction use large language models deployed within the Azure environment
- Chat and summaries use a dedicated language model running on GPU infrastructure
- Embeddings are generated on CPU within the same container environment
AI-assisted redaction is designed as a reviewer aid, not an autonomous system. The AI surfaces likely sensitive content for human review. Reviewers approve, reject, or edit every suggestion before it is applied. All AI-generated suggestions and reviewer decisions are logged in the audit trail.
Customer data is never used to train or fine-tune models shared across tenants.