Dezcry Platform
Documentation

Everything you need to know about using Dezcry — from ingesting documents through to disclosure-ready export.

Getting Started

Platform Overview

Dezcry is a self-service, AI-powered eDiscovery platform for privacy, legal, and compliance teams. It provides a complete workflow to ingest documents, review responsive material, apply AI-powered redactions, classify documents, search, and export disclosure-ready sets — all with a full audit trail and role-based access controls.

Unlike heavyweight eDiscovery suites, Dezcry is designed for internal teams who need a streamlined, defensible process without specialist eDiscovery admins or outsourced review support. All AI models run on internal infrastructure within the same Azure environment — no document data is sent to third-party AI services.

Key Capabilities
  • Ingest 100+ file types including PST, EML, ZIP, Office, PDF, images, audio, and video
  • Automatic deduplication, email threading, and NIST filtering
  • AI-assisted redaction with a 5-layer detection pipeline
  • AI-assisted classification with custom fields and confidence scoring
  • eDiscovery-grade keyword search (Elasticsearch-powered, dtSearch equivalent)
  • LLM-powered document summaries and conversational document Q&A
  • AI OCR for image-heavy documents
  • Production-ready export with Bates numbering, load files, and burned redactions
  • Complete audit trail logging every action for regulatory defensibility
  • Role-based access control with matter-level permissions

Key Concepts

  • Matter — A container for a single DSAR or investigation. All documents, redactions, classifications, exports, and audit logs are scoped to a matter. Matters have a unique code, client name, type, and status.
  • Document — A single file within a matter — an email, attachment, PDF, spreadsheet, image, audio, or video file. Each document has extracted text, metadata, a preview, and can carry reviewer decisions.
  • Family — A group of related documents — typically an email and its attachments. The parent email and child attachments share a family ID for grouped review.
  • Custodian — The person or data source from which documents were collected. Tracked per upload batch for chain-of-custody purposes.
  • Saved Search — A reusable query with filters that can be used as the scope for redaction, classification, export, or search term reports.
  • Redaction Set — A batch AI redaction job that processes a scope of documents through the 5-layer pipeline, producing redaction entries for review.
  • Classification Set — A batch AI classification job that applies custom decision fields to documents with confidence scoring.
  • Export Set — A configured export template with numbering, branding, and output settings that produces disclosure-ready packages.
  • Audit Log — An immutable record of every significant action taken in the platform, providing a defensible trail for regulators.

Signing In

Navigate to your Dezcry instance's login page and enter your email address and password. If your organisation has enabled two-factor authentication (2FA), you will be prompted to enter a time-based one-time password (TOTP) from your authenticator app after entering your credentials.

If you have been invited to join Dezcry, you will receive an email with a unique invitation link. Click the link to set up your password and configure 2FA. Invitation links are single-use and expire after a set period.

Session Management

Sessions automatically expire after 30 minutes of inactivity. Your session token is refreshed automatically every 20 minutes while you are active. If your session expires, a full-screen overlay will prompt you to sign in again — any unsaved work in progress is preserved in your browser.

Matters

Creating a Matter

A matter is the top-level container in Dezcry. Each DSAR, investigation, or review project is organised as a separate matter with its own documents, workflows, users, and audit trail.

To create a matter, navigate to the Matters page and click Create Matter (admin role required). You will be asked to provide:

  • Name — A descriptive name for the matter (e.g. "Smith DSAR - Q1 2025").
  • Matter Code — A unique 6-character alphanumeric code, auto-generated but editable.
  • Client Name — The organisation or client the matter relates to.
  • Matter Type — One of: DSAR, Investigation, Litigation, Cyber, or Other.
  • Description — Optional long-form description of the matter scope and objectives.
  • Summary Language — The language for AI-generated summaries (e.g. English, German, French).
  • Hosting Location — The Azure region for data residency (e.g. Australia, Switzerland, Germany, UK).

Matter Dashboard

Clicking into a matter takes you to the matter dashboard — the central workspace for that matter. The dashboard shows a searchable, filterable table of all documents in the matter, along with access to all matter-scoped features via the sidebar navigation:

  • Documents — browse, search, filter, and review all documents
  • Upload — ingest new documents into the matter
  • Redaction — create and manage AI redaction sets
  • Classification — configure and run AI classification jobs
  • Export — build and run disclosure-ready export packages
  • Search Terms — create keyword search term sets and reports
  • AI OCR — run optical character recognition on image documents
  • Password Bank — manage passwords for encrypted files
  • Audit — view the complete audit trail for this matter
  • Reporting — view analytics dashboards and metrics
  • Billing — view storage usage and costs for this matter

The document table supports bulk actions — select multiple documents to apply batch operations such as tagging, classification, or status changes. A background task tray shows the status of any running jobs (redaction, classification, export) in the matter.

Matter Settings

Matter settings control the behaviour of AI features and reviewer workflows within the matter. Administrators can configure:

  • Decision fields — custom fields that reviewers can set on each document (e.g. "Relevance", "Privilege Status", "Data Category"). Fields can be single-select, multi-select, or free text.
  • Summary language — the language used for AI-generated document summaries.
  • Matter status — open, closed, or archived. Closed matters are read-only; archived matters are hidden from the default view.

Document Ingestion

Uploading Documents

Navigate to the Upload page within a matter to ingest documents. Dezcry supports drag-and-drop file upload or traditional file selection. You can upload individual files or container files (PST, ZIP, 7Z, RAR, TAR, GZ) which will be automatically extracted.

Before processing begins, configure the following options:

  • Deduplication Mode — Choose "Global" to automatically identify and flag duplicate files across the entire matter using cryptographic hash matching. Duplicates are preserved but marked, saving reviewer time.
  • NIST Filtering — Enable to automatically filter out known system and runtime files (from the NIST National Software Reference Library) that are never relevant to review.
  • OCR — Enable to run Optical Character Recognition on image-based documents, extracting searchable text from scanned PDFs, photographs, and image files.
  • Email Threading — Enable to group related emails into conversation threads, identifying which messages are "inclusive" (contain unique content) versus non-inclusive duplicates.
  • Inclusive Only — When email threading is enabled, optionally exclude non-inclusive emails from the review workspace to reduce volume.

You may also specify custodian information and data source metadata for chain-of-custody tracking. Available data sources include: Laptop, Desktop, Server, O365 Email, O365 OneDrive, SharePoint, Google Workspace, Mobile Device, External Hard Drive, USB Drive, Network Share, Cloud Storage, Backup Tape, Database, and Other.

Supported File Types

Dezcry supports over 100 file types out of the box. During ingestion, all files are extracted, their text content is parsed, metadata is captured, and they are indexed for search.

  • Email — PST, OST, EML, MSG, MBOX
  • Documents — DOCX, DOC, PDF, RTF, TXT, ODT
  • Spreadsheets — XLSX, XLS, CSV, ODS
  • Presentations — PPTX, PPT, ODP
  • Archives — ZIP, RAR, 7Z, TAR, GZ
  • Images — PNG, JPG, JPEG, TIFF, BMP, GIF (with OCR)
  • Audio — MP3, WAV, M4A, OGG, FLAC
  • Video — MP4, AVI, MOV, MKV, WEBM
  • Web / Data — HTML, XML, JSON, CSV

Deduplication

When global deduplication is enabled, Dezcry performs top-level exact deduplication — the standard approach used in eDiscovery. This is an important distinction: Dezcry identifies and flags files that are byte-for-byte identical based on their hash values, but it does so at the top level of the document hierarchy.

What "Top-Level" Deduplication Means

In eDiscovery, "top-level" deduplication means dedup is applied to standalone documents and parent containers (emails, archives) rather than to individual attachments or child items in isolation. When a top-level file is identified as a duplicate, the entire document and its family (including all attachments) are removed together — preserving the integrity of document families.

This differs from "attachment-level" deduplication, which would independently remove individual attachments that appear across multiple emails. Top-level dedup preserves the complete context of each email and its attachments as a unit, which is critical for defensible review — a reviewer always sees the full email with all of its attachments intact, never a partial family.

It also differs from near-deduplication, which identifies files that are similar but not identical (e.g. different versions of the same document). Dezcry's deduplication is strictly exact-match — only byte-for-byte identical files are flagged.

Deduplication is scoped globally across the entire matter, meaning a file uploaded by one custodian will be deduplicated against files from all other custodians in the same matter. The first instance ingested is kept as the master document, and all subsequent identical copies are removed from the active review set. Deduplication results include:

  • Master document — the first instance of each unique file, retained in the review set with full metadata and family relationships
  • Duplicate group — all copies of the same file, linked back to the master for audit purposes
  • Bytes saved — total storage savings from removing duplicate copies
  • Custodian tracking — the system records which custodians held copies of each deduplicated file, preserving chain-of-custody information even though the duplicate copies are removed from the active review set

The upload summary report details every duplicate group with file names, sizes, and the master document reference. This provides a defensible record of exactly what was deduplicated and why.
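
As a rough illustration of how top-level exact-match grouping works, the sketch below groups byte-identical files and keeps the first instance as the master. Function and variable names are illustrative; this is not Dezcry's internal implementation, and the platform records both MD5 and SHA-256 rather than a single digest.

```python
import hashlib
from collections import OrderedDict

def file_hash(data: bytes) -> str:
    # Exact-match fingerprint (illustrative choice of algorithm).
    return hashlib.sha256(data).hexdigest()

def deduplicate(files):
    """Group byte-identical files; the first instance seen becomes the master.

    `files` is an iterable of (name, content_bytes) in ingestion order.
    Returns (masters, duplicate_groups), where duplicate_groups maps a
    master filename to the duplicates flagged against it.
    """
    masters = OrderedDict()  # hash -> master filename
    groups = {}              # master filename -> [duplicate filenames]
    for name, content in files:
        digest = file_hash(content)
        if digest in masters:
            # Same bytes already seen: flag as duplicate of the master.
            groups.setdefault(masters[digest], []).append(name)
        else:
            masters[digest] = name
    return list(masters.values()), groups
```

Because grouping is keyed purely on the content hash, files with different names but identical bytes fall into the same duplicate group, while near-identical versions do not, matching the strictly exact-match behaviour described above.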

Email Threading

Email threading groups related emails into conversation threads, identifying the hierarchical reply chain. Threading is applied at the point of ingestion, which means non-inclusive emails are identified and can be excluded from the review workspace before any downstream processing occurs. This is a deliberate design choice — by filtering out redundant emails upfront, organisations save significantly on hosting costs (less storage, smaller search indices) and AI processing costs (fewer documents to classify, redact, and summarise).

Each email in a thread is classified as:

  • Inclusive — contains unique content or attachments not present in later messages in the thread. These are the messages reviewers should focus on, as they represent the most complete version of each point in the conversation.
  • Non-inclusive — the full content of this email is already contained in a later, more complete message in the thread. Reviewing these would be redundant, as the inclusive message already captures everything.

When the Inclusive Only option is enabled during upload, non-inclusive emails are excluded from the active review workspace entirely. They are still retained in the system for audit purposes, but they do not count toward hosting storage, are not indexed for search, and are not processed by AI classification, redaction, or summarisation — directly reducing costs.

Threading uses email headers (Message-ID, In-Reply-To, References) and the Microsoft Exchange Conversation Index to build accurate thread trees. The threading summary reports:

  • Total emails processed and how many were threadable
  • Number of inclusive vs. non-inclusive messages
  • Non-inclusive emails excluded from the review workspace
  • Thread groups identified
  • Any threading errors encountered
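
The header-based grouping can be sketched minimally: given each message's In-Reply-To link, walking up to the root identifies the conversation it belongs to. This is an illustrative simplification; as noted above, real threading also consults the References chain and the Exchange Conversation Index when reply headers are missing or unreliable.

```python
from collections import defaultdict

def build_threads(messages):
    """Group messages into threads by walking In-Reply-To links to the root.

    `messages` maps Message-ID -> In-Reply-To (None for a root message).
    Returns thread groups keyed by the root Message-ID.
    """
    def root_of(mid):
        seen = set()
        while messages.get(mid) and mid not in seen:
            seen.add(mid)  # guard against malformed header cycles
            mid = messages[mid]
        return mid

    threads = defaultdict(list)
    for mid in messages:
        threads[root_of(mid)].append(mid)
    return dict(threads)
```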

NIST Filtering

NIST filtering removes known system files, operating system components, and software runtime files from the review set. These files are identified by matching their hash values against the NIST National Software Reference Library (NSRL) — a comprehensive database of known, non-relevant system files.

NIST-filtered files are flagged and excluded from the active review workspace but are retained in the system for audit purposes. The upload summary reports the count and details of filtered files.
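
Conceptually, de-NISTing is a hash set lookup. The sketch below uses a tiny stand-in hash set and assumes MD5 matching for illustration; the real NSRL contains tens of millions of entries.

```python
import hashlib

# Illustrative stand-in for the NSRL known-file hash set.
NSRL_HASHES = {
    "5d41402abc4b2a76b9719d911017c592",  # md5 of b"hello", as an example entry
}

def is_nist_filtered(content: bytes) -> bool:
    # A file is de-NISTed when its content hash appears in the known-file set.
    return hashlib.md5(content).hexdigest() in NSRL_HASHES
```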

Processing Exceptions

During ingestion, some documents may encounter processing exceptions. Dezcry tracks and reports these in the upload summary:

  • Encrypted — Password-protected files that could not be decrypted. Add passwords to the Password Bank and re-process.
  • Corrupt — Files that are malformed, truncated, or otherwise unreadable.
  • Unsupported Format — File types that Dezcry does not currently support for text extraction.
  • Text Extraction Failed — Files where the content could not be extracted despite being a supported format.

Each exception includes the document ID, filename, exception type, and a descriptive message to help diagnose and resolve the issue.

Upload Batches

Every upload creates a processing batch with a unique display ID (e.g. UPL-001). Navigate to the Uploads page to view all batches for the matter, including:

  • Batch status (processing, completed, failed)
  • Total files submitted and processed
  • Counts by outcome (processed OK, encrypted, corrupt, duplicates, NIST-filtered)
  • Decryption results (successful, failed)
  • Children extracted (attachments from container files)
  • File type distribution
  • Processing duration
  • Upload set MD5 hash for chain-of-custody verification

Click into any batch to see the detailed processing report, including per-document exception details, deduplication groups, and threading statistics.

Document Review

Document List

The main matter workspace displays all documents in a searchable, sortable table. Each row shows the document's filename, type, status, size, custodian, and any applied tags or decisions. Key features include:

  • Full-text search — keyword search across document content, filenames, and email metadata using eDiscovery-grade Elasticsearch
  • Column filters — filter by status, file type, custodian, date ranges, tags, relevance coding, and custom decision fields
  • Bulk selection — select multiple documents for batch operations like tagging, decision coding, or export
  • Sort — sort by any column including filename, date, size, relevance, or type
  • Saved searches — save any combination of search query and filters for reuse

Document Viewer

Click any document to open the full document viewer. The viewer provides a rich, multi-panel interface for reviewing individual documents:

  • Document display — native rendering of the document with zoom controls (0.25x to 3x)
  • Three viewing tabs: Original (native format), Markup (with redaction overlays), and Text (extracted plain text with search highlighting)
  • Metadata panel — document properties, email headers, file hashes, and processing info
  • Decisions panel — set relevance, hot-document flag, comments, and custom decision fields
  • Family panel — view parent/child relationships (e.g. email and attachments)
  • Chat panel — ask questions about the document using AI
  • Navigation — previous/next buttons with keyboard shortcuts for rapid sequential review

Performance

The document viewer uses a prefetch cache that pre-loads adjacent documents (previous and next) in the background. This provides near-instant navigation when reviewing documents sequentially. The cache holds up to 50 documents with a 2-minute TTL.
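
A cache with those characteristics (bounded size plus a time-to-live) can be sketched as follows. The eviction policy shown here, oldest entry first, is an assumption for illustration, not a description of the viewer's actual internals.

```python
import time

class PrefetchCache:
    """Bounded cache with per-entry TTL, mirroring the viewer's stated
    limits (up to 50 documents, 2-minute expiry)."""

    def __init__(self, max_items=50, ttl_seconds=120):
        self.max_items = max_items
        self.ttl = ttl_seconds
        self._store = {}  # doc_id -> (inserted_at, payload)

    def put(self, doc_id, payload):
        if len(self._store) >= self.max_items:
            # Evict the oldest entry to stay within the size bound.
            oldest = min(self._store, key=lambda k: self._store[k][0])
            del self._store[oldest]
        self._store[doc_id] = (time.monotonic(), payload)

    def get(self, doc_id):
        entry = self._store.get(doc_id)
        if entry is None:
            return None
        inserted_at, payload = entry
        if time.monotonic() - inserted_at > self.ttl:
            del self._store[doc_id]  # entry expired past its TTL
            return None
        return payload
```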

Native File Viewers

Dezcry includes purpose-built viewers for every supported file type, rendering documents directly in the browser without requiring any plugins or downloads:

  • PDF Viewer (PDF files) — Page-by-page rendering, zoom, scroll, text selection, search highlighting
  • Image Viewer (PNG, JPG, TIFF, BMP, GIF) — Pan and zoom, fit-to-width/height, full-resolution display
  • DOCX Viewer (Word documents) — Formatted text rendering with styles, headers, lists, and tables
  • PPTX Viewer (PowerPoint) — Slide-by-slide rendering with layouts and formatting
  • Spreadsheet Viewer (XLSX, XLS, CSV) — Multi-sheet tabs, column/row headers, cell formatting, frozen panes
  • Text Viewer (TXT, LOG, HTML, XML, JSON) — Syntax-highlighted text with line numbers and search
  • Audio Viewer (MP3, WAV, M4A) — Audio player with waveform, playback controls, and timestamp display
  • Video Viewer (MP4, AVI, MOV) — Video player with playback controls, full-screen mode
  • Markup Viewer (any document with redactions) — Redaction overlay rendering with colour-coded entity categories

Metadata Panel

The metadata panel displays all extracted properties for the current document. For email files, this includes:

  • From, To, CC, BCC addresses
  • Subject line
  • Date sent and date received
  • Message-ID and conversation threading references
  • Attachment count and list

For all documents, the metadata panel shows:

  • File size, MIME type, and document type
  • MD5 and SHA-256 hashes (for integrity verification)
  • Created and modified dates
  • Author (when available from document properties)
  • Source folder path from the original container
  • OCR status and AI summary (when available)
  • Processing status and any error messages

Decisions Panel

The decisions panel is where reviewers record their assessments. Every decision is timestamped and logged in the audit trail. Available fields:

  • Relevance — mark the document as Responsive, Non-Responsive, or other custom values
  • Hot Document — flag important or significant documents for attention
  • Decision Comment — free-text annotation explaining the reviewer's reasoning
  • Custom Decision Fields — any additional fields configured at the matter level (single-select, multi-select, or free text)

Optimistic Locking

Dezcry uses optimistic locking on document decisions to prevent overwrite conflicts when multiple reviewers work on the same matter. Each document tracks a version number that is incremented on every update. If two reviewers attempt to save changes to the same document simultaneously, the second save will receive a conflict error and be asked to refresh before re-applying their changes.
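
The version-check pattern can be sketched in a few lines. The store shape and names are hypothetical; the point is that a stale expected_version fails the save instead of silently overwriting another reviewer's work.

```python
class ConflictError(Exception):
    """Raised when another reviewer saved first; caller should refresh."""

def save_decision(store, doc_id, expected_version, decision):
    """Apply a reviewer decision only if the document version is unchanged.

    `store` maps doc_id -> {"version": int, "decision": ...} (hypothetical).
    Returns the new version on success; raises ConflictError on a stale save.
    """
    record = store[doc_id]
    if record["version"] != expected_version:
        # Someone else incremented the version since we loaded the document.
        raise ConflictError(
            f"{doc_id}: expected v{expected_version}, "
            f"found v{record['version']}; refresh and retry"
        )
    record["decision"] = decision
    record["version"] += 1
    return record["version"]
```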

Family Documents

Documents extracted from container files (emails with attachments, ZIP archives) are automatically grouped into families. A family consists of a parent document (e.g. an email) and its child documents (e.g. attachments).

The family panel in the document viewer shows all related documents, allowing reviewers to quickly navigate between a parent email and its attachments. Family relationships are preserved throughout all workflows — search results can include family expansion, and exports can group family members together.

Tagging

Documents can be tagged with relevance codes and custom decision field values. Tags are set through the decisions panel in the document viewer or via bulk actions on the document list. All tagging actions are logged in the audit trail with the reviewer's identity and timestamp.

Metadata

Overview

Every document ingested into a matter has a rich set of metadata fields automatically extracted during processing. Dezcry captures over 60 metadata fields per document — covering everything from basic file properties and email headers to AI-generated summaries and reviewer decisions. These fields are available for filtering, sorting, column display, search, and export throughout the platform.

Metadata is extracted at the point of ingestion with no manual effort required. For email files, Dezcry parses all standard headers including threading references. For Office documents and PDFs, embedded properties such as author, title, and creation date are captured. For images, EXIF data including camera make, GPS coordinates, and timestamps is preserved. All dates are normalised to UTC for consistent cross-timezone analysis.

Why Metadata Matters in eDiscovery

Metadata is critical for defensible review workflows. Fields like hash values (MD5, SHA-256) provide chain-of-custody integrity. Date fields enable precise date-range filtering to narrow review sets. Email threading metadata allows reviewers to focus only on inclusive messages. And custodian tracking across duplicates ensures nothing is lost even when redundant copies are removed. All metadata fields listed below are available in load file exports (DAT, CSV, XLSX) for downstream use in Relativity, Nuix, or other review platforms.

Core Document Fields

These fields are present on every document regardless of file type. They provide the fundamental identifiers, file properties, and processing information needed for document management and chain-of-custody tracking.

  • doc_id (String) — Unique document identifier within the matter (e.g. DOC-000001). This is the primary reference used across the platform — in search results, exports, audit logs, and cross-references.
  • doc_seq (Integer) — Sequential number assigned during ingestion, used for sorting and Bates-style numbering in exports. Sequences are unique within each matter and assigned in upload order.
  • filename (String) — Original filename of the document as it existed in the source data. Preserved exactly as found for defensibility — no renaming or sanitisation is applied.
  • mime (String) — MIME type of the file (e.g. application/pdf, message/rfc822). Determined by both file extension and magic-byte analysis for accurate identification.
  • document_type (String) — Enriched document category — Email, PDF, Word, Excel, PowerPoint, Image, Text, Archive, Audio, Video, or Other. Useful for filtering the document list by file type.
  • size_bytes (Integer) — File size in bytes. Displayed in human-readable format (KB, MB) in the UI. Useful for identifying unusually large or suspiciously small files.
  • source_folder (String) — Original folder path within the source container — e.g. the PST folder hierarchy (Inbox/Projects/2024), ZIP directory path, or nested archive structure. Preserves the organisational context of the original data.
  • date_created_utc (DateTime) — File creation date in UTC. For Office documents, extracted from embedded document properties. For other files, derived from filesystem timestamps or container metadata.
  • date_modified_utc (DateTime) — File last-modified date in UTC. Critical for date-range filtering in review workflows and for establishing document timelines.
  • md5 (String) — MD5 hash of the file content (32 hex characters). Used for deduplication across the matter and for chain-of-custody integrity verification in exports.
  • sha256 (String) — SHA-256 hash of the file content (64 hex characters). Provides a cryptographically strong integrity fingerprint for defensible production.
  • status (String) — Processing status — queued (awaiting processing), processing (currently being ingested), ready (successfully processed and available for review), or failed (encountered an error).
  • processing_error (String) — Detailed error message if processing failed. Helps diagnose issues such as password-protected files, corrupted archives, or unsupported formats.
  • processing_dataset (String) — Upload batch identifier (e.g. UPL-001) linking the document to its ingestion batch. Useful for tracking which upload set a document belongs to and viewing batch-level statistics.

Family & Hierarchy Fields

Documents extracted from container files — such as emails with attachments, ZIP archives, or nested PST folders — are automatically grouped into families. Family relationships are critical for defensible review: reviewers see each email alongside its attachments, and exports can group family members into the same volume for production.

  • family_id (String) — Family group identifier. For parent documents (e.g. an email), this equals the document's own doc_id. For child documents (e.g. attachments), this inherits the parent's family_id — linking the entire family together for grouping, export, and review.
  • parent_id (UUID) — ID of the parent document (e.g. the email that contained this attachment). Null for top-level standalone documents. Enables the family tree view in the document viewer, where reviewers can navigate between a parent and all of its children.

Family Integrity in Exports

When exporting documents, Dezcry preserves family relationships in the load file. Parent documents and their children are linked via the family_id and parent_id fields, allowing downstream review platforms (Relativity, Nuix, etc.) to reconstruct the family hierarchy. The export wizard also supports family-based volume grouping to keep related documents together.
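
A downstream consumer of the load file can rebuild families from these two fields alone. A minimal sketch, with hypothetical row shapes:

```python
def group_families(docs):
    """Rebuild family groups from family_id and parent_id load-file columns.

    `docs` is a list of rows shaped {"doc_id", "family_id", "parent_id"}.
    Returns family_id -> {"parent": doc_id, "children": [doc_ids]}.
    """
    families = {}
    for d in docs:
        fam = families.setdefault(d["family_id"], {"parent": None, "children": []})
        if d["parent_id"] is None:
            # A parent's family_id equals its own doc_id.
            fam["parent"] = d["doc_id"]
        else:
            fam["children"].append(d["doc_id"])
    return families
```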

Email Fields

Email is often the most important data type in eDiscovery. Dezcry extracts a comprehensive set of email metadata from both EML and MSG formats, including messages extracted from PST, OST, and MBOX containers. These fields are stored as first-class database columns for efficient filtering, sorting, and field-specific search (e.g. from:john@acme.com).

  • email_from (String) — Sender email address and display name (e.g. "John Smith <john@acme.com>"). Searchable via the from: field prefix in keyword search.
  • email_to (String) — Recipient email addresses (semicolon-separated). Supports multiple recipients. Searchable via the to: field prefix.
  • email_cc (String) — CC (carbon copy) recipient email addresses (semicolon-separated). Searchable via the cc: field prefix.
  • email_bcc (String) — BCC (blind carbon copy) recipient email addresses (semicolon-separated). Searchable via the bcc: field prefix. Only available when the source data includes BCC headers (typically only in the sender's mailbox).
  • email_subject (String) — Email subject line. Searchable via the subject: field prefix. Commonly used for keyword search and classification workflows.
  • email_message_id (String) — RFC 2822 Message-ID header — a globally unique identifier assigned by the sending mail server. Used internally for email threading and deduplication.
  • email_date_sent_utc (DateTime) — Date and time the email was sent, normalised to UTC. This is the primary date field used for email date-range filtering and timeline analysis.
  • email_date_received_utc (DateTime) — Date and time the email was received, normalised to UTC. May differ from date_sent due to delivery delays or timezone differences between sender and recipient servers.
  • email_attachments_json (JSON) — Structured attachment summary containing the count and list of filenames (e.g. {count: 3, names: ["report.pdf", "data.xlsx", "photo.jpg"]}). Useful for quickly identifying emails with specific attachments without opening them.
  • email_in_reply_to (String) — Message-ID of the email this is a direct reply to. Used by the threading engine to build the conversation tree.
  • email_references (String) — Ordered chain of Message-IDs representing the full conversation history. Each reply appends its parent's Message-ID, creating a breadcrumb trail through the thread.
  • email_conversation_index (String) — Microsoft Exchange PR_CONVERSATION_INDEX — a hex-encoded binary value present in Outlook/Exchange-originated messages. Provides precise thread positioning even when standard headers are missing or unreliable.
  • email_thread_index (String) — Hierarchical thread position path computed by Dezcry (e.g. "a1b2c3d4+0001+0002"). Encodes the exact tree position for correct chronological sort order and branch identification within conversation views.

Email Search Capabilities

All email metadata fields are indexed in the search engine. You can use field-specific search prefixes to target individual fields — for example, from:john@acme.com AND subject:"quarterly report" or to:legal@company.com AND date >= 2024-01-01. See the Search Syntax section for the full list of supported field prefixes and operators.
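
As an illustration of how field prefixes might translate into an Elasticsearch query, the sketch below builds a bool query from the same inputs. The field names follow the metadata tables in this document, but the actual index mapping and query construction are internal to Dezcry.

```python
def field_search(from_addr=None, subject=None, sent_after=None):
    """Build an Elasticsearch-style bool query from field-prefix inputs.

    Field names (email_from, email_subject, email_date_sent_utc) are taken
    from this document's metadata tables; the mapping is an assumption.
    """
    must = []
    if from_addr:
        must.append({"match": {"email_from": from_addr}})       # from: prefix
    if subject:
        must.append({"match_phrase": {"email_subject": subject}})  # subject:"..."
    if sent_after:
        must.append({"range": {"email_date_sent_utc": {"gte": sent_after}}})
    return {"query": {"bool": {"must": must}}}
```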

Email Threading Fields

These fields are computed by Dezcry's email threading engine during ingestion. Threading groups related messages into conversation trees and identifies which messages are inclusive (containing unique content a reviewer must see) versus non-inclusive (redundant messages whose content is fully captured by a later reply). This can reduce the review set by 40–60% in email-heavy matters, directly lowering review time and AI processing costs.

  • email_thread_group_id (UUID) — Identifier of the conversation thread group this email belongs to. All emails in the same conversation share this ID, enabling thread-level grouping and navigation in the document viewer.
  • email_thread_indentation (Integer) — Depth within the thread tree (0 = the root/original message, 1 = a direct reply, 2 = a reply to a reply, etc.). Used for visual indentation in conversation views.
  • is_inclusive_email (Boolean) — Whether this email is inclusive — meaning it contains unique message content or attachments not present in any later message in the thread. Null if threading was not enabled for this document. Inclusive emails are the minimum set a reviewer needs to see.
  • inclusive_reason (String) — Explains why the email is inclusive: unique_message_content (body text not found in later replies), unique_attachment (has an attachment not in later messages), unanalyzed_attachment (attachment could not be compared), root_message (first message in thread), or threading_error (could not determine inclusiveness).

Inclusive-Only Review Mode

When "Inclusive Only" is enabled during upload, non-inclusive emails are excluded from the active review workspace entirely. They are still retained in the system and can be accessed via the thread view for context, but they do not appear in the main document list, are not processed by AI classification or redaction, and do not count toward storage. This is the recommended approach for matters with large email volumes where cost efficiency is a priority.

OCR Fields

Dezcry automatically detects documents that contain no extractable text — such as scanned PDFs, photographs of documents, and image files — and flags them for OCR (Optical Character Recognition). Once OCR is run, the extracted text becomes fully searchable and available for AI processing.

  • ocr_required (Boolean) — Whether the document requires OCR to extract searchable text. Automatically set to true during ingestion for scanned PDFs, image-only PDFs, and image files (JPEG, PNG, TIFF, BMP). Documents with existing embedded text are set to false.
  • ocr_status (String) — Current OCR processing status: not_applicable (document has embedded text, OCR not needed), completed (OCR finished successfully, text extracted), failed (OCR attempted but encountered an error), partial (some pages processed successfully), or skipped (OCR not run yet despite being required).

Deduplication Fields

When global deduplication is enabled during upload, Dezcry identifies byte-for-byte identical files across the entire matter using hash matching. The first instance is retained as the master document and subsequent copies are flagged as duplicates. Deduplication is applied at the top level — meaning entire families (email + attachments) are deduplicated as a unit, preserving family integrity. See the Deduplication section for full details.

  • is_duplicate (Boolean): Whether this document is a duplicate of another document in the matter. Duplicate documents are excluded from the active review set but retained for audit and export purposes.
  • duplicate_of_id (UUID): ID of the master document this is a duplicate of. Allows reviewers and exports to trace back to the retained copy. The master document is always the first instance ingested.
  • duplicate_custodian_info (String): Records which custodians held copies of this document. Critical for defensibility — even though duplicate copies are removed from the review set, this field preserves a complete record of who possessed the document across all data sources.
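
The mechanics behind these fields can be sketched in a few lines. The pass below is illustrative only: plain SHA-256, file-level matching, ignoring the family-unit grouping Dezcry applies. The field names mirror the table above, but the function and document shape are assumptions:

```python
import hashlib
from collections import defaultdict

def deduplicate(documents):
    """Group byte-identical files by content hash; first seen becomes master.

    `documents` is a list of dicts with 'id', 'content' (bytes), 'custodian'.
    Annotates each document with is_duplicate / duplicate_of_id /
    duplicate_custodian_info, mirroring the fields described above.
    """
    masters = {}                      # sha256 digest -> master document id
    custodians = defaultdict(list)    # sha256 digest -> custodians with a copy
    for doc in documents:             # ingestion order: first instance wins
        digest = hashlib.sha256(doc["content"]).hexdigest()
        custodians[digest].append(doc["custodian"])
        if digest not in masters:
            masters[digest] = doc["id"]
            doc["is_duplicate"] = False
            doc["duplicate_of_id"] = None
        else:
            doc["is_duplicate"] = True
            doc["duplicate_of_id"] = masters[digest]
        # here a list of custodians (the platform stores it as a string)
        doc["duplicate_custodian_info"] = custodians[digest]
    return documents
```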

NIST Filtering Fields

NIST filtering (also known as "de-NISTing") removes known system files, operating system components, and application runtime files from the review set by matching file hashes against the NIST National Software Reference Library (NSRL). This is a standard eDiscovery practice that eliminates files that are never relevant to review — such as Windows DLLs, Office templates, and browser cache files — often removing 10–30% of a dataset before review begins.

  • is_nist_filtered (Boolean): Whether this file was identified as a known system or application file via NIST NSRL hash matching. Filtered files are excluded from the active review workspace but retained in the system for audit and reporting.
  • nist_product_name (String): Name of the software product the file belongs to according to the NSRL database (e.g. Microsoft Windows 11, Adobe Acrobat Reader, Google Chrome). Helps identify why a file was filtered and provides context in exception reports.

Encryption & Integrity Fields

Dezcry performs detailed analysis of every file during ingestion to detect encryption, corruption, and file-type mismatches. These fields provide a complete picture of each document's integrity status — essential for eDiscovery exception reporting and ensuring no documents are silently missed during processing.

  • is_encrypted (Boolean): Whether the document is encrypted or password-protected. Encrypted files cannot be processed until decrypted — add the password to the Password Bank and re-process, or note the exception in reporting.
  • encryption_type (String): Detailed encryption classification: password_protected (standard Office/PDF password), drm_protected (Digital Rights Management), pgp_encrypted (PGP/GPG encryption), smime_encrypted (S/MIME email encryption), or bitlocker (full-disk encryption artefact). Helps IT teams determine the appropriate decryption method.
  • is_corrupt (Boolean): Whether the document is corrupted or malformed. Corrupt files are flagged as processing exceptions and included in exception reports for transparency.
  • corruption_type (String): Detailed corruption classification: truncated (file cut short), malformed_header (invalid file header), invalid_structure (internal structure errors), or zero_byte (empty file). Provides actionable detail for troubleshooting or re-collection from the source.
  • file_signature (String): File magic-bytes signature detected by inspecting the file's binary header (e.g. "PDF-1.4", "PK (ZIP)", "JPEG/JFIF"). Independent of file extension — provides the true format identity.
  • file_signature_mismatch (Boolean): Whether the file extension does not match the actual content detected by magic bytes (e.g. a .docx file that is actually a renamed .exe). Important for identifying potentially suspicious or mis-labelled files in forensic review.
  • is_decrypted (Boolean): Whether the document was successfully decrypted during processing using a password from the Password Bank or provided at upload time.
  • decryption_method (String): How the document was decrypted: global_password_bank (matched against the matter's stored passwords) or upload_password (password provided during the upload that contained this file). Provides an audit trail for decryption actions.

File Signature Analysis

Dezcry inspects the binary magic bytes of every file to determine its true format, independent of the file extension. When a mismatch is detected (e.g. a .xlsx file that is actually a ZIP archive, or a .pdf that is actually a JPEG image), the file_signature_mismatch flag is set. This is valuable for identifying files that have been intentionally renamed to evade review, a common tactic in investigations and litigation.
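
A minimal sketch of magic-byte detection follows. The signature table is a tiny illustrative subset (a real detector covers far more formats), and the helper names are assumptions:

```python
# A few well-known magic-byte prefixes (illustrative, not exhaustive).
SIGNATURES = {
    b"%PDF":         "pdf",
    b"PK\x03\x04":   "zip",    # also the container for DOCX/XLSX/PPTX
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG":      "png",
}

def detect_signature(header: bytes):
    """Return the true format implied by the file's leading bytes, or None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None

def signature_mismatch(filename: str, header: bytes) -> bool:
    """True when the extension claims a format the magic bytes contradict."""
    ext = filename.rsplit(".", 1)[-1].lower()
    actual = detect_signature(header)
    if actual is None:
        return False                     # unknown signature: no verdict
    # OOXML extensions are legitimately ZIP containers.
    if ext in {"docx", "xlsx", "pptx"} and actual == "zip":
        return False
    return ext != actual and not (ext == "jpg" and actual == "jpeg")
```

A renamed file is caught even when it opens normally: `signature_mismatch("invoice.pdf", jpeg_bytes)` flags the document regardless of what the extension promises.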

Processing Exception Fields

In any eDiscovery matter, a percentage of documents will encounter processing issues. Dezcry categorises every exception with a type and action, providing the structured data needed for defensible exception reporting. These fields are included in exports and processing batch reports so that legal teams have a complete record of what was — and was not — successfully processed.

  • exception_type (String): The category of processing exception: encryption (password-protected or encrypted file), corruption (malformed or damaged file), unsupported_format (file type not supported for text extraction), or text_extraction_failed (supported format but extraction encountered an error). Used for filtering and reporting on processing outcomes.
  • exception_action (String): The action Dezcry took in response to the exception: processed_with_errors (partial processing completed with some issues noted), skipped (document could not be processed at all), partial_extraction (some content was extracted but the process did not complete fully), or placeholder_created (a placeholder entry was created for tracking and reporting purposes). Provides transparency for legal teams assessing completeness.

AI & Processing Fields

Dezcry uses AI to automatically generate document summaries, apply redactions, and produce document previews. These fields track the status and outputs of each AI-powered workflow, allowing reviewers to quickly see which documents have been summarised, redacted, or are still awaiting processing.

  • llm_summary (String): AI-generated 1–2 sentence summary of the document's content. Summaries are produced automatically after ingestion and displayed in the document list and viewer. Useful for quickly triaging documents without opening them — reviewers can scan summaries to identify relevant documents faster.
  • markup_status (String): Redaction and annotation workflow status: not_started (no redactions applied), pending (redaction in progress), complete (all redactions applied and markup generated), or failed (an error occurred during markup generation). Documents with markup_status of "complete" have a fully redacted preview available.
  • markup_page_count (Integer): Total number of pages in the markup document. Populated after markup generation completes. Useful for estimating review effort and for page-level redaction tracking in production reports.
  • preview_status (String): Document preview generation status: none (no preview requested), queued (awaiting generation), generating (currently being converted), ready (preview available for viewing), or error (generation failed). Previews convert native formats to viewable HTML/PDF for in-browser document review.

Reviewer Decision Fields

These fields are set by reviewers during document review through the Decisions Panel in the document viewer, or via bulk actions on the document list. Every change to these fields is timestamped, attributed to the reviewer, and logged in the audit trail for full defensibility. Optimistic locking prevents conflicting edits when multiple reviewers work on the same matter simultaneously.

  • relevance (String): Relevance classification assigned by the reviewer — typically Responsive, Non-Responsive, or Privileged, but fully customisable at the matter level. This is the primary coding field used to separate relevant documents from the rest of the dataset.
  • hot_document (Boolean): Flag indicating the document is particularly significant — a "smoking gun" or key evidence that warrants elevated attention. Hot documents are visually highlighted in the document list and can be filtered for quick access.
  • decision_comment (String): Free-text annotation where reviewers explain their reasoning for the relevance decision. Useful for quality control, second-pass review, and providing context to senior reviewers or legal counsel.
  • relevance_coded_at (DateTime): Timestamp of when the relevance decision was last recorded. Used for review progress tracking, productivity metrics, and audit trail purposes. Updated each time the reviewer modifies their decision.

Custom Decision Fields

In addition to the built-in fields above, matters can be configured with custom decision fields — single-select dropdowns, multi-select tags, or free-text fields — to capture matter-specific coding such as issue codes, privilege categories, or confidentiality designations. Custom fields are fully exportable and appear in the decisions panel alongside the standard fields. See Custom Fields for configuration details.

Extended Metadata (metadata_json)

In addition to the first-class fields above, each document contains an extended metadata object with format-specific properties organised by namespace. These fields capture the full depth of information embedded within each file type — from PDF authoring tools to image EXIF geolocation data to email authentication results. Extended metadata is viewable in the metadata panel and included in exports.

  • general (all documents): filename, extension, mime, document_type, size_bytes, upload_batch_id. Present on every document as the baseline property set.
  • email (EML, MSG): from, to, cc, bcc, subject, message_id, in_reply_to, references, conversation_index, date_sent_utc, date_received_utc, attachments (count and names). Also includes email authentication results: dkim_result, spf_result, and dmarc_result — useful for identifying spoofed or unauthenticated messages.
  • pdf (PDF files): title, author, subject, producer (the application that generated the PDF), creator (the originating application), creation_date_utc, modification_date_utc, page_count, is_encrypted. Extracted from both the PDF info dictionary and XMP metadata streams when available.
  • ooxml (Word, Excel, PowerPoint — DOCX, XLSX, PPTX): Core properties: created, modified, title, subject, creator, lastModifiedBy, revision, keywords, description, category. Application properties: application (e.g. Microsoft Excel), company, template. These are the properties visible in a file's "Properties" dialog in Microsoft Office.
  • image (JPEG, PNG, TIFF, BMP, GIF): format (e.g. JPEG, PNG), mode (e.g. RGB, RGBA), width, height. EXIF data (when available): DateTimeOriginal, DateTimeDigitized, Make (camera manufacturer), Model (camera model), Software, Orientation, XResolution, YResolution, and GPSInfo (latitude, longitude, altitude). EXIF geolocation data can be critical in investigations involving photographs.

Email Authentication (DKIM, SPF, DMARC)

For email documents, Dezcry extracts the authentication results from email headers when present. DKIM (DomainKeys Identified Mail) verifies the email was not altered in transit. SPF (Sender Policy Framework) checks that the sending server is authorised for the domain. DMARC (Domain-based Message Authentication, Reporting and Conformance) combines both checks. These results can help identify spoofed or potentially fraudulent emails during an investigation.
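
A simplified reading of an Authentication-Results header (RFC 8601) could look like the sketch below. The `parse_auth_results` helper is hypothetical and ignores real-world complications such as multiple results per method:

```python
import re

def parse_auth_results(header: str) -> dict:
    """Pull dkim/spf/dmarc verdicts out of an Authentication-Results header.

    A simplified reading of the RFC 8601 format; real headers can carry
    several results per method plus extra properties.
    """
    results = {}
    for method in ("dkim", "spf", "dmarc"):
        match = re.search(rf"\b{method}=(\w+)", header)
        results[f"{method}_result"] = match.group(1) if match else None
    return results
```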

AI Classification

Overview

AI Classification lets you automatically categorise documents using custom decision fields defined by your team. Unlike manual review, AI classification processes entire document sets in minutes, producing predictions with calibrated confidence scores so reviewers can focus their attention on genuinely ambiguous items while high-confidence predictions are applied automatically.

Classification runs on large language models within the same Azure environment as the rest of the platform — no document data leaves your deployment. The system includes confidence debiasing to correct for known LLM overconfidence, a verification pass for borderline predictions using a separate model, and intelligent document chunking for long documents. Every prediction includes a calibrated confidence score and rationale, and all decisions are logged in the audit trail.

How Classification Differs from Redaction

Classification and redaction serve different purposes. Classification assigns labels to entire documents — categorising them by type, relevance, sensitivity, or any custom taxonomy your team defines. Redaction identifies and removes specific text within documents. Classification helps your team decide what to do with a document; redaction helps you prepare it for disclosure.

Custom Fields

Before running a classification job, you define the decision fields that the AI should predict. These are entirely customisable — you define the field names, types, options, and instructions that are specific to your review. Navigate to Classification within a matter to configure fields.

  • Single Select: The AI chooses exactly one value from a predefined list of options. Best for mutually exclusive categories. Example: Relevance: Responsive / Non-Responsive / Partially Responsive
  • Multi Select: The AI can select one or more applicable values from a list. Best for non-exclusive labels. Example: Data Categories: Financial / Medical / Employment / Personal
  • Boolean: A simple yes/no decision. Example: Contains PII: true / false
  • Free Text: The AI provides a short free-text response. Best for summaries or descriptions. Example: Key Topics: one-sentence description of the document content

For each field, you provide natural-language instructions that tell the AI exactly how to evaluate documents. The quality of these instructions directly affects classification accuracy. Dezcry provides a real-time quality indicator as you write:

  • Poor (under 10 characters): Too short to be useful — the AI has no context for making decisions. Add specific criteria, examples, and edge case guidance.
  • Fair (10–50 characters): Basic direction, but lacks nuance. Adding more detail about what qualifies for each option and how to handle ambiguous cases will improve accuracy.
  • Good (50–200 characters): The AI has enough context to make reliable predictions. Consider adding examples of borderline cases.
  • Excellent (200+ characters): Detailed instructions with clear criteria, examples, and edge case handling. This produces the most accurate and consistent results.

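
The quality indicator amounts to a simple length check. A sketch (the exact handling of the 10/50/200 boundaries is an assumption):

```python
def instruction_quality(instructions: str) -> str:
    """Map instruction length to the quality bands described above."""
    n = len(instructions)
    if n < 10:
        return "Poor"
    if n <= 50:
        return "Fair"
    if n <= 200:
        return "Good"
    return "Excellent"
```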
Writing Effective Instructions

Good classification instructions should include:

  • Clear criteria — what makes a document qualify for each option
  • Examples — concrete examples of what belongs in each category
  • Edge cases — how to handle ambiguous or borderline documents
  • Context — relevant background about the matter, industry, or regulatory framework
  • Negative examples — what should not be classified as a given category

For example, instead of "Is this relevant?", write: "Classify as Responsive if the document contains information about the data subject's employment history, salary, performance reviews, or HR communications. Classify as Non-Responsive if the document is a system-generated notification, marketing material, or relates to a different individual. Classify as Partially Responsive if the document contains some relevant content mixed with unrelated material."

Classification Sets

A classification set is a reusable configuration that defines which fields to predict, how the AI should behave, and what confidence thresholds to apply. Classification sets can be run multiple times — for example, after adding new documents to the matter. To create and run a classification:

  1. Select scope: Choose all documents or a saved search to define which documents to classify. The scope is frozen at run time — new documents added later won't be included in this run.
  2. Name the set: Give the classification set a descriptive name for tracking and audit purposes.
  3. Configure fields: Define one or more custom decision fields with types, options, and natural-language AI instructions.
  4. Set thresholds: Configure the auto-accept threshold (default 0.85) and review threshold (default 0.50) to control how predictions are routed.
  5. System prompt (optional): Provide a system-level prompt that applies to all fields — useful for setting overall context like the matter type, jurisdiction, or review protocol.
  6. Optional sampling: For large document sets, configure prevalence sampling to validate classification quality on a subset before committing to a full run.
  7. Review and launch: Review all settings in a summary view and start the classification job.

Confidence Thresholds and Routing

Dezcry uses a three-tier routing system based on calibrated confidence scores to determine how each prediction is handled:

  • Above the auto-accept threshold (default > 0.85): Auto-applied. The prediction is applied automatically without requiring human review. The AI is highly confident and the prediction is defensible.
  • Between the review and auto-accept thresholds (default 0.50–0.85): Flagged for review. The prediction is saved but flagged as needs_review. A human reviewer must approve, correct, or reject it before it is applied.
  • Below the review threshold (default < 0.50): Indeterminate. The AI could not make a reliable prediction. The document is flagged for manual coding by a reviewer.

Both thresholds are configurable per classification set, allowing teams to tune the trade-off between automation and human oversight based on the risk profile of the review. A high-stakes privilege review might raise the auto-accept threshold (e.g. to 0.95) so that more predictions receive human review, while a routine document-type classification might lower it (e.g. to 0.80) to maximise automation.
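
The three-tier routing reduces to a comparison against the two thresholds. A sketch using the default values from the table above (the `route_prediction` name and return labels are illustrative):

```python
def route_prediction(confidence: float,
                     auto_accept: float = 0.85,
                     review: float = 0.50) -> str:
    """Route a calibrated confidence score into one of three tiers."""
    if confidence > auto_accept:
        return "auto_applied"       # applied without human review
    if confidence >= review:
        return "needs_review"       # saved, but a reviewer must confirm
    return "indeterminate"          # flagged for manual coding
```

Raising `auto_accept` per classification set shifts more predictions into the review queue without touching the model itself.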

Confidence Calibration (Debiasing)

LLMs are known to be systematically overconfident — they tend to report confidence scores of 0.90 or 0.95 even when their actual accuracy is closer to 0.80–0.85. This is particularly problematic in eDiscovery where confidence thresholds drive review decisions.

Dezcry applies empirical confidence debiasing — a calibration layer that adjusts raw LLM confidence scores to better reflect true accuracy. The calibration is:

  • Monotonic — higher raw confidence always produces higher calibrated confidence (preserves ranking)
  • Deterministic — the same input always produces the same output (defensible in regulatory contexts)
  • Conservative — systematically pulls overconfident scores toward empirical accuracy curves

The calibration is based on published research on LLM confidence calibration and fitted to eDiscovery-specific accuracy measurements. It compresses the overconfident tail (0.85–0.99) more aggressively than the well-calibrated low-confidence range (0.05–0.50).
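
The shape of such a curve can be illustrated with a piecewise-linear stand-in. The break points and slopes below are invented for illustration (the production curve is fitted to measured accuracy), but the sketch preserves the stated properties: monotonic, deterministic, and conservative in the overconfident tail:

```python
def calibrate(raw: float) -> float:
    """Illustrative monotone calibration of a raw LLM confidence score."""
    if raw <= 0.5:
        return raw                        # well-calibrated region: untouched
    if raw <= 0.85:
        return 0.5 + (raw - 0.5) * 0.9    # mild shrinkage
    return 0.815 + (raw - 0.85) * 0.4     # aggressive compression of the tail
```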

Verification Pass

For predictions that fall in a borderline confidence range (0.35–0.70 by default), Dezcry automatically triggers a verification pass — a second classification attempt using a different model deployment. This functions as a quality control layer:

  • The verification pass uses a different prompt persona ("QC reviewer") to challenge the initial classification
  • It uses a separate model deployment for model diversity, reducing correlated errors
  • If the verification agrees with the first pass, the confidence scores are averaged (typically increasing the final confidence)
  • If the verification disagrees, the lower confidence score is used, the verification's classification is adopted, and the result is force-flagged for human review
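
The agree/disagree merge logic above can be sketched as follows; the dict shape and the `merge_verification` name are assumptions, and threshold-based routing would still apply to the merged result afterwards:

```python
def merge_verification(first: dict, second: dict) -> dict:
    """Combine an initial prediction with its verification pass.

    Agreement averages the two confidences; disagreement adopts the
    verifier's label at the lower confidence and forces human review.
    """
    if first["value"] == second["value"]:
        return {
            "value": first["value"],
            "confidence": (first["confidence"] + second["confidence"]) / 2,
            "needs_review": False,    # still subject to threshold routing
        }
    return {
        "value": second["value"],     # verifier's classification is adopted
        "confidence": min(first["confidence"], second["confidence"]),
        "needs_review": True,         # disagreement is force-flagged
    }
```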

Document Chunking for Long Documents

Documents that exceed the model's context budget (default: ~112,000 characters) are automatically split into deterministic chunks for processing. Chunking is designed to maintain classification accuracy:

  • Sentence-boundary aware — chunks are split at sentence boundaries, never mid-sentence, preserving semantic coherence
  • Overlapping — adjacent chunks share ~200 characters of overlap, ensuring context continuity across chunk boundaries
  • Deterministic — the same document always produces the same chunks, ensuring reproducible results
  • Fallback splitting — if a single sentence exceeds the chunk limit, it falls back to word-boundary splitting with overlap
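
A simplified version of the chunking strategy above: naive regex sentence splitting, greedy packing up to the budget, and a character-level overlap seeding each new chunk. The word-boundary fallback for oversized sentences is omitted for brevity:

```python
import re

def chunk_text(text: str, budget: int = 112_000, overlap: int = 200) -> list:
    """Split text into sentence-aligned chunks with character overlap."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > budget:
            chunks.append(current)
            current = current[-overlap:]   # overlap seeds the next chunk
        current = (current + " " + sentence).strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Because the splitting is purely rule-based, rerunning it on the same document always yields identical chunks, which is what makes the results reproducible.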

When a document is chunked, each chunk is classified independently, and results are aggregated using a weighted voting system:

  • Each chunk's prediction is weighted by its confidence score
  • Chunks that return null (no classifiable content) are excluded from the vote, not counted as evidence
  • The winning prediction is determined by total confidence-weighted score, with tie-breaking by peak single-chunk confidence
  • A unanimity bonus increases confidence when all chunks agree; disagreement reduces it
  • A dissent penalty is applied when any dissenting chunk has high confidence (≥ 0.70), with a note recommending manual review
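
The voting rules above might look like this sketch; the bonus and penalty magnitudes are illustrative values, not Dezcry's actual parameters:

```python
def aggregate_chunks(predictions: list) -> dict:
    """Confidence-weighted vote across per-chunk predictions.

    Null predictions abstain; unanimity boosts the final confidence and a
    high-confidence dissent (>= 0.70) applies a penalty and forces review.
    """
    votes = [p for p in predictions if p["value"] is not None]
    if not votes:
        return {"value": None, "confidence": 0.0, "needs_review": True}
    totals, peak = {}, {}
    for p in votes:
        totals[p["value"]] = totals.get(p["value"], 0.0) + p["confidence"]
        peak[p["value"]] = max(peak.get(p["value"], 0.0), p["confidence"])
    winner = max(totals, key=lambda v: (totals[v], peak[v]))  # peak breaks ties
    support = [p["confidence"] for p in votes if p["value"] == winner]
    dissent = [p["confidence"] for p in votes if p["value"] != winner]
    confidence = sum(support) / len(support)
    if not dissent:
        confidence = min(1.0, confidence + 0.05)   # unanimity bonus
    elif max(dissent) >= 0.70:
        confidence = max(0.0, confidence - 0.10)   # strong-dissent penalty
    return {"value": winner, "confidence": round(confidence, 3),
            "needs_review": bool(dissent)}
```
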
Chunk Disagreement

When different chunks of a document produce different classifications, this is flagged as chunk disagreement and the document is automatically flagged for human review. This is an important quality signal — it often indicates that a document contains mixed content (e.g. a partially responsive document where some sections are relevant and others are not). The aggregated rationale includes a note about the dissenting chunks and their confidence levels.

Classification sets track runs with detailed progress reporting: total documents, documents processed, errors encountered, and token usage for cost attribution. Completed runs automatically create a saved search containing the classified documents for downstream processing.

Classification runs support parallel processing — multiple documents are classified concurrently (default: 6 simultaneous LLM calls) to maximise throughput while staying within AI rate limits. Runs can be cancelled at any time, and cancellation takes effect cleanly after the current document finishes processing.

The classification progress view shows real-time processing with a live console, document-by-document results including confidence scores, and estimated time remaining. You can continue working while classification runs in the background.

Reviewing Predictions

After a classification run completes, reviewers can examine the results. Each document receives a result for every configured field, containing:

  • Predicted Value: The AI's chosen classification for this field (e.g. "Responsive", "Financial"). Null if the AI could not determine a classification.
  • Confidence Score: A calibrated 0.0–1.0 score reflecting the AI's certainty. Debiased to correct for LLM overconfidence.
  • Rationale: A short natural-language explanation of why the AI made this prediction, referencing specific content in the document.
  • Needs Review: Boolean flag — true if the confidence is below the auto-accept threshold, if chunks disagreed, or if the verification pass overrode the initial classification.
  • Chunk Count: How many chunks the document was split into (1 for short documents that fit in a single context window).
  • Chunk Disagreement: Whether different chunks of the document produced different predictions — a signal that the document may contain mixed content.
  • Verification Status: Whether the verification pass was triggered and whether it agreed or disagreed with the initial classification.

Reviewers can take the following actions on any prediction:

  • Approve — accept the AI's prediction as the final decision for this document and field
  • Correct — override the AI's prediction with a different value chosen by the reviewer. The correction is logged alongside the original AI prediction for audit purposes.
  • Reject — dismiss the prediction entirely, leaving the field uncoded for this document

All review actions are logged in the audit trail with the reviewer's identity, timestamp, the original AI prediction, and the reviewer's decision. This provides a defensible record of how every classification decision was made — whether by AI with human approval, by human correction of an AI suggestion, or by purely manual coding.

Prevalence Sampling

For large document sets, Dezcry supports prevalence sampling — classifying a statistically representative subset of documents before committing to a full run. This allows teams to:

  • Validate that the classification instructions produce accurate results before processing the full set
  • Estimate the prevalence of each category in the collection (e.g. "approximately 30% of documents are responsive")
  • Calculate precision and recall metrics by comparing AI predictions against manual coding on the sample
  • Refine instructions based on sample results before running the full classification

Sampling results are stored as ClassificationSample records, preserving both the AI prediction and the human-coded ground truth for quality measurement and defensibility.
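
Given paired AI predictions and human ground truth on a sample, the prevalence, precision, and recall figures fall out directly. A sketch with illustrative field names (the ClassificationSample schema is not specified here):

```python
def sample_metrics(samples: list, positive: str = "Responsive") -> dict:
    """Prevalence, precision, and recall from AI-vs-human sample coding.

    Each sample pairs the AI's `predicted` label with the reviewer's
    `actual` ground truth.
    """
    tp = sum(1 for s in samples
             if s["predicted"] == positive and s["actual"] == positive)
    fp = sum(1 for s in samples
             if s["predicted"] == positive and s["actual"] != positive)
    fn = sum(1 for s in samples
             if s["predicted"] != positive and s["actual"] == positive)
    actual_pos = tp + fn
    return {
        "prevalence": actual_pos / len(samples),
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / actual_pos if actual_pos else None,
    }
```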

AI Redaction

Overview

AI Redaction is Dezcry's flagship feature — a 5-layer detection pipeline that identifies personal data, sensitive content, and legally privileged material for redaction. The system is designed as a reviewer aid, not an autonomous tool: every AI suggestion is reviewable, editable, and logged before it is applied.

Redaction runs on large language models within the same Azure environment. No document data is sent to any third-party service. The pipeline combines deterministic pattern matching with LLM analysis and cross-document entity resolution for comprehensive coverage.

Redaction Types

Dezcry supports three redaction protocols, each tailored to a different use case:

  • DSAR: Remove the data subject's personal information from documents being disclosed. Uses a whitelist approach — you specify the data subject's name, email addresses, and phone numbers, and the AI identifies all instances. Configuration: data subject first/last name, known email addresses, known phone numbers.
  • Privilege: Identify and redact legally privileged communications (attorney-client privilege, work product doctrine). Uses domain and keyword filtering to detect privileged material. Configuration: privileged individuals, law firm domains, privilege keywords, custom instructions.
  • Ad Hoc: Custom redaction with free-form instructions. Use for any redaction task that doesn't fit the DSAR or privilege templates. Configuration: free-text instructions describing what to redact.

Redaction Models

When creating a redaction set, you select which entity categories the AI should detect. Each category has a distinct colour for visual identification in the review interface:

  • Names (red): Personal names, first/last names, initials, nicknames
  • Emails (orange): Email addresses
  • Phone Numbers (amber): Phone numbers, fax numbers, mobile numbers
  • Identifiers (green): SSN, passport numbers, driver licence numbers, national IDs
  • Employment (blue): Job titles, employee IDs, salary information, work history
  • Company IDs (purple): Company registration numbers, tax IDs, ABN/ACN
  • Locations (magenta): Physical addresses, postal codes, GPS coordinates
  • Political Opinions (light purple): Political affiliations, party membership, voting records
  • Health Information (red): Medical conditions, treatments, diagnoses, medications
  • Sexual Orientation (pink): Gender identity, sexual orientation details
  • Financial (green): Bank account numbers, credit card numbers, financial data
  • Auth Credentials (cyan): Passwords, PINs, API keys, security tokens
  • Family Associations (light red): Relationships, dependents, family member details
  • Device IDs (light blue): IP addresses, MAC addresses, device identifiers

Sensitive categories — health information, sexual orientation, political opinions, and auth credentials — use a lower default auto-apply confidence threshold (0.70) to ensure more conservative handling.

5-Layer Pipeline

Dezcry processes each document through a 5-layer redaction pipeline, combining multiple detection methods for comprehensive coverage:

  • L1 Pattern Scan (NER engine, deterministic): Pattern-matching engine that detects structured PII using regex rules and named entity recognition. Provides a fast, deterministic baseline — catches email addresses, phone numbers, credit card numbers, and standard identifier formats.
  • L2 AI Analysis (large language model): The primary AI detection pass. The LLM analyses each document with context from L1 and L4 results, identifying contextual personal data that pattern matching alone would miss — such as names mentioned in natural language, implied relationships, and sensitive content.
  • L3 AI Double-Check (independent LLM verification): An independent verification layer using a separate model deployment. Acts as a "senior eDiscovery QA reviewer" — adversarially examines L2 results to confirm, reject, or upgrade redaction entries. Catches false positives and missed items.
  • L4 Cross-Reference (entity resolution, algorithmic): Fuzzy clustering of entity variants across all documents in the scope. Groups different spellings and formats of the same entity (e.g. "J. Smith", "John Smith", "john.smith@acme.com") into clusters with a canonical form. Ensures consistent redaction across the entire document set.
  • L5 Smart Routing (confidence routing, algorithmic): Routes each redaction entry based on its confidence score: high-confidence items are auto-applied, medium-confidence items go to the human review queue, and low-confidence items are flagged for manual inspection.

Pipeline Execution

The layers execute in the order: L4 (entity resolution) → L1 (pattern scan) → L2 (AI analysis) → L3 (verification) → L5 (routing). L4 runs first to build the entity index, which provides context for the subsequent AI layers. Progress is tracked per-phase with real-time status updates in the UI.

Reviewing Redactions

After a redaction set completes processing, navigate to the Review page to examine and approve the AI's suggestions. The review queue presents each detected entity with:

  • Original text — the exact text the AI identified for redaction
  • Model category — the entity type (names, emails, etc.) with colour-coded badge
  • Source layer — which pipeline layer detected it (L1, L2, L3, L4)
  • Confidence score — how certain the AI is that this is a genuine entity
  • Verification status — confirmed, rejected, upgraded, or new (from L3)
  • Page location — the page number and pixel coordinates within the document

Reviewers can filter the queue by layer, model category, and confidence threshold. For each entry, reviewers can:

  • Approve — accept the redaction and apply it to the document
  • Reject — dismiss the suggestion as a false positive
  • Flag for review — escalate to a senior reviewer for a second opinion

The review queue paginates at 100 entries per page. All review decisions are logged in the audit trail with the reviewer's identity, timestamp, and action taken.

Manual Redactions

In addition to AI-assisted redaction, reviewers can manually draw redaction boxes on any document using the markup viewer. Manual redactions are applied directly to the document's markup images and are tracked alongside AI redactions in the audit trail.

For spreadsheet documents, Dezcry provides a specialised spreadsheet markup viewer that allows cell-level redaction — reviewers can select individual cells or ranges to redact.

AI Summaries & Chat

Document Summaries

Dezcry automatically generates LLM-powered summaries for every document in a matter. Summaries are 1–2 sentence overviews that give reviewers quick context to assess relevance, decide on inclusion or exclusion, and move through large review sets faster.

Summaries are generated by a dedicated language model running on GPU infrastructure within the same Azure environment. No document data is sent to third-party services. Summaries are generated in the background and are available alongside the document in the metadata panel.

  • Summaries are generated automatically on upload and during background backfill
  • The summary language is configurable per matter (English, German, French, Spanish, etc.)
  • Summaries are searchable and appear in the document metadata panel
  • Administrators can trigger summary regeneration for any document or batch

Document Chat

The Document Chat panel provides conversational AI for asking questions about documents. Available from the document viewer, chat uses Retrieval-Augmented Generation (RAG) to find relevant content and generate accurate answers with source citations.

How it works:

  1. Ask a question — Type a natural-language question in the chat panel (e.g. "What are the key dates mentioned in this document?")
  2. Hybrid search — Dezcry searches for relevant content using both keyword search (Elasticsearch) and semantic search (vector embeddings), combining results via Reciprocal Rank Fusion.
  3. AI generates an answer — The LLM reads the relevant document chunks and generates a response with inline citations referencing specific documents.
  4. Source verification — Each response includes clickable source document references (e.g. [DOC-00028]) so reviewers can verify the AI's answer.

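The fusion step can be sketched in a few lines. This is a minimal illustration of standard Reciprocal Rank Fusion, not Dezcry's internal implementation — the function name and the `k` constant are illustrative:

```python
from collections import defaultdict

def reciprocal_rank_fusion(keyword_hits, semantic_hits, k=60):
    """Fuse two ranked result lists into one, rewarding documents that
    rank highly in either list. k=60 is the constant commonly used with RRF."""
    scores = defaultdict(float)
    for hits in (keyword_hits, semantic_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by both searches beats one ranked well by only one.
fused = reciprocal_rank_fusion(
    ["DOC-00028", "DOC-00031", "DOC-00002"],  # keyword (Elasticsearch) ranking
    ["DOC-00031", "DOC-00007", "DOC-00028"],  # semantic (embedding) ranking
)
```

Here `DOC-00031` wins because it appears near the top of both lists, which is the behaviour that makes RRF a good fit for combining keyword and semantic rankings without score normalisation.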
Rate Limiting

Chat is rate-limited to 20 queries per minute per user and 60 queries per minute per matter to ensure fair resource allocation across teams.
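A per-key limit like this is commonly enforced with a sliding window. The sketch below is a hypothetical illustration of the mechanism, not Dezcry's actual limiter:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit, self.window = limit, window
        self.events = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

user_limiter = SlidingWindowLimiter(limit=20)    # 20 queries/min per user
matter_limiter = SlidingWindowLimiter(limit=60)  # 60 queries/min per matter
```

A request would need to pass both the user-level and matter-level limiter before the query is dispatched.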

AI OCR

Overview

AI OCR (Optical Character Recognition) extracts searchable text from image-based documents — scanned PDFs, photographs, screenshots, and other image files that don't contain embedded text. Dezcry uses the Azure Computer Vision Read API for high-accuracy text extraction.

OCR can be enabled automatically during upload (as a processing option) or run manually on specific documents or batches after ingestion.

Running OCR

Navigate to the AI OCR page within a matter to manage OCR jobs:

  1. Create a job — Select the scope (all documents or a saved search) and start the OCR job.
  2. Processing — Dezcry sends each image document to the Azure Computer Vision API for text extraction. Progress is tracked in real time with 4-second polling intervals.
  3. Results — Extracted text is stored in the document record and immediately becomes searchable. Per-document results include pages extracted, characters extracted, confidence scores, and processing duration.
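Tracking a long-running job at a fixed polling interval follows a familiar pattern. This is a hypothetical sketch — `get_status` stands in for whatever call returns the job's current state; Dezcry's actual job API is not shown here:

```python
import time

def poll_job(get_status, interval=4.0, timeout=3600.0):
    """Poll a job-status callable every `interval` seconds until the job
    reaches a terminal state, or raise if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status["state"] in ("completed", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish within the timeout")
```

The 4-second interval is a reasonable balance between responsive progress updates and API load; a terminal-state check like this is also where a cancellation request would be surfaced.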

OCR job results track each document individually, reporting:

  • Pages and characters extracted per document
  • Per-document status (completed, failed, skipped)
  • Error messages for failed documents
  • Processing duration per document

Jobs can be cancelled while running or queued. The AI OCR dashboard shows aggregate metrics: total jobs, completed jobs, active jobs, and total documents processed.

Password Bank

Overview

The Password Bank stores passwords and credentials for encrypted documents within a matter. When Dezcry encounters password-protected files during ingestion (encrypted PDFs, password-protected ZIPs, protected Office documents, encrypted PST files), it attempts to decrypt them using passwords from the Password Bank.

Managing Passwords

Navigate to the Password Bank page within a matter to manage credentials:

  • Add passwords — enter passwords with optional labels and tags for organisation
  • Labels — human-readable hints to identify what the password is for (the label is visible, the password itself is hidden)
  • Tags — categorise passwords (e.g. "client", "custodian-smith", "batch-3")
  • Usage tracking — each password tracks when it was last used and how many times it has been applied
  • Edit and delete — update or remove passwords with confirmation dialogs

Passwords are reusable across all uploads within the matter. When new documents are uploaded, all passwords in the bank are tried against any encrypted files. The upload summary reports how many files were successfully decrypted and how many failed decryption.
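Conceptually, the decryption attempt is a loop over the bank. The sketch below illustrates the idea for password-protected ZIPs using Python's standard `zipfile` module; it is an illustration of the mechanism, not Dezcry's ingestion code:

```python
import zipfile

def try_passwords(zip_file, passwords):
    """Try each password from the bank against an encrypted ZIP.
    Returns the first password that decrypts the archive, or None.
    Assumes the archive contains at least one member."""
    with zipfile.ZipFile(zip_file) as zf:
        first_member = zf.namelist()[0]
        for pwd in passwords:
            try:
                zf.read(first_member, pwd=pwd.encode("utf-8"))
                return pwd  # decryption succeeded
            except (RuntimeError, zipfile.BadZipFile):
                continue    # wrong password; try the next one
    return None
```

In practice the same loop runs per encrypted file during upload, which is what feeds the decrypted/failed counts in the upload summary.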

Export

Overview

Dezcry's Export system produces disclosure-ready output packages with Bates numbering, metadata load files, burned redactions, and full decision history. Exports are configured through a multi-step wizard and can be re-run with updated settings.

Two export types are supported:

  • Production — formal disclosure packages with Bates numbering, branded headers/footers, and structured volume organisation. Used for regulatory submissions and formal DSAR responses.
  • Review — simpler packages for internal review or transfer to external counsel, without production-level numbering requirements.

Export Wizard

The export wizard guides you through a 6-step configuration process:

  1. Scope — Select which documents to export: all documents in the matter or a saved search.
  2. Name & Type — Name the export set and choose the Production or Review type.
  3. Output Components — Select which output types to include: metadata load file, natives, images, text files, and/or PDFs.
  4. Numbering & Branding — Configure Bates numbering (prefix, suffix, start number, padding) and optional header/footer branding.
  5. Load File & Volumes — Configure the metadata load file format, encoding, date formats, and volume organisation settings.
  6. Review & Run — Review all settings in a summary view and launch the export.

Scope Selection

Export scope defines which documents are included in the output package. You can choose:

  • All documents — exports every document in the matter
  • Saved search — exports only documents matching a previously saved search query and filters

The wizard displays a document count for the selected scope so you can verify the volume before proceeding. The scope is frozen at run time — new documents added to the matter after the export starts will not be included.

Output Components

Select which output types to include in the export package:

Component | Description
Metadata Load File | A structured data file (DAT, CSV, or HTML) containing all document metadata, decisions, and Bates numbers. Compatible with Relativity, Concordance, and other review platforms.
Natives | Original source files in their native format (DOCX, PDF, XLSX, etc.)
Images | Rendered document images (single-page or multi-page TIFF) with optional Opticon or IPRO load files for image cross-referencing.
Text Files | Extracted plain text content for each document, useful for downstream text analytics or cross-referencing.
PDFs | Rendered PDF versions of each document, optionally with burned-in redactions and Bates number branding.

Numbering & Branding

Production exports support Bates-style document numbering:

Setting | Description | Example
Prefix | Text prepended to every Bates number | ACME-
Suffix | Text appended to every Bates number | -PROD
Start Number | The first number in the sequence | 1
Digit Padding | Zero-padding width for the numeric portion | 7 → 0000001
Numbering Mode | Document-level (one number per document) or page-level (one number per page) | Document-level
Page Separator | Character between document number and page number in page-level mode | _ → ACME-0000001_001
Attachment Grouping | Keep parent documents and attachments numbered sequentially | Enabled
Sort Order | How documents are ordered for numbering (sequential, family group, or by field) | doc_seq
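The settings combine mechanically. A hypothetical formatter (function and parameter names are illustrative, not Dezcry internals) shows how each setting shapes the final number:

```python
def bates_number(seq, prefix="ACME-", suffix="", padding=7,
                 page=None, page_separator="_", page_padding=3):
    """Format a Bates number from prefix, zero-padded sequence, and suffix.
    `page` is only supplied in page-level numbering mode."""
    number = f"{prefix}{seq:0{padding}d}{suffix}"
    if page is not None:
        number += f"{page_separator}{page:0{page_padding}d}"
    return number

bates_number(1)          # 'ACME-0000001'  (document-level mode)
bates_number(1, page=1)  # 'ACME-0000001_001'  (page-level mode)
```

With attachment grouping enabled, the sequence numbers feeding this formatter are assigned so that a parent document and its attachments remain contiguous.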

Optional branding adds headers and footers to PDF output:

  • Header and footer with left, centre, and right sections
  • Template tokens: {BatesNumber}, {PageX}, {PageY}
  • Default footer: "CONFIDENTIAL"

Load Files & Volumes

Load file settings control the metadata output format:

Setting | Default | Description
Format | DAT | Load file format — DAT (Concordance), CSV, HTML, or custom TXT
Encoding | UTF-8 | Character encoding for the load file
Date Format | MM/dd/yyyy | Format for date fields in the load file
Time Format | HH:mm:ss | Format for time fields
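For context, Concordance-style DAT files conventionally use the þ character (0xFE) as the text qualifier and the 0x14 control character as the field delimiter. A minimal writer might look like this — an illustrative sketch, so verify the delimiters expected by the receiving review platform:

```python
# Conventional Concordance DAT delimiters; confirm against the receiving
# platform's import specification before relying on them.
QUALIFIER, DELIMITER = "\xfe", "\x14"

def dat_row(values):
    """Render one load-file row: each value wrapped in the text qualifier,
    fields joined by the field delimiter."""
    return DELIMITER.join(f"{QUALIFIER}{v}{QUALIFIER}" for v in values)

def write_dat(path, header, rows, encoding="utf-8"):
    """Write a header row plus one row per document, CRLF-terminated."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(dat_row(header) + "\r\n")
        for row in rows:
            f.write(dat_row(row) + "\r\n")
```

Using an unusual qualifier and delimiter avoids collisions with commas, quotes, and other characters that routinely appear inside document metadata.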

Volume settings control the physical organisation of the export package:

Setting | Default | Description
Volume Prefix | VOL | Prefix for volume folder names (VOL001, VOL002, etc.)
Start Number | 1 | First volume number
Digit Padding | 3 | Zero-padding for volume numbers
Max Volume Size | 4500 MB | Maximum size per volume folder before splitting
Max Files Per Folder | 5000 | Maximum files in a single subfolder
File Naming | Control Number | How files are named — by Bates/control number or original filename
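The size and file-count caps interact as a simple greedy split: a new volume starts whenever adding the next file would breach either limit. A sketch, with hypothetical names and sizes in MB:

```python
def assign_volumes(file_sizes_mb, max_volume_mb=4500, max_files=5000,
                   prefix="VOL", start=1, padding=3):
    """Assign files to volume folders, starting a new volume when either
    the size cap or the file-count cap would be exceeded."""
    volumes, current, current_size, num = [], [], 0, start
    for size in file_sizes_mb:
        if current and (current_size + size > max_volume_mb
                        or len(current) >= max_files):
            volumes.append((f"{prefix}{num:0{padding}d}", current))
            num, current, current_size = num + 1, [], 0
        current.append(size)
        current_size += size
    if current:
        volumes.append((f"{prefix}{num:0{padding}d}", current))
    return volumes
```

For example, files of 3000 MB, 2000 MB, and 1000 MB under a 4500 MB cap split into VOL001 (the first file) and VOL002 (the remaining two).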

Downloading Exports

Once an export run completes, the output package is available for download. The export page shows:

  • Run status — running, completed, failed, or cancelled
  • Progress — documents processed vs. total
  • Output size — total size of the generated package
  • Duration — time taken to generate the export
  • Error and warning counts — per-document issues encountered
  • Settings snapshot — the exact configuration used for this run

Redaction integration allows you to burn redactions into the export output. Select a completed redaction set and choose the placeholder mode:

  • None — no redaction placeholders (redacted areas are simply blacked out)
  • Brackets — redacted text replaced with category labels in brackets
  • Redaction block — solid black boxes over redacted content

All export actions — creation, run start, download — are logged in the audit trail.

Audit & Reporting

Audit Log

Every significant action in Dezcry is recorded in an immutable audit log, providing a defensible trail for regulators, legal review, and internal governance. The audit log captures:

Category | Actions Tracked
Documents | Viewed, uploaded, downloaded, deleted, summaries regenerated
Decisions | Relevance coding updates, bulk decision changes, tag modifications
Redactions (Manual) | Redaction boxes drawn, updated, or deleted on documents
Redaction Review | AI redaction entries approved, rejected, or escalated
Redaction Jobs | Sets created/deleted, runs started/completed/cancelled/failed
Classification | Sets created/deleted, runs started/completed/cancelled/failed
Export | Sets created/updated/deleted/cloned, runs started/cancelled, downloads
Markup | Preview and markup images generated or failed
Downloads | PDF downloads, batch PDF downloads, redacted spreadsheet downloads
Search | Saved searches created, updated, or deleted
Chat | Messages sent, conversations created/updated/deleted
Indexing | Documents indexed, matter re-indexed, index cleared
Auth | Login success/failure, password changes, account locks
Admin | Users created/updated, roles changed, matter access granted/revoked
Billing | Usage recalculated, invoices generated

Each audit entry includes: the action type, target (which document, set, or resource was affected), user identity (who performed it), timestamp, and details (rich context including file names, counts, and old/new values). The audit log is filterable by action type, target type, user, and date range, with pagination at 50 entries per page.

Matter-level audit is accessible from the Audit page within each matter. System-wide audit is available to administrators from the Admin section.

Reporting Dashboard

The Reporting page provides analytics dashboards with visualisations across eight tabs:

Tab | Metrics
Overview | Executive summary KPIs — document counts, completion rates, activity summary
Processing | Ingestion batch history, volume growth over time, processing throughput
Redaction | Redaction runs, entities detected by model, layer statistics, coverage rates
Classification | Classification runs, field outcomes, confidence score distributions
AI Performance | Token usage, cost attribution, model accuracy and quality metrics
Review | Review queue depth, items pending review, reviewer turnaround times
Activity | User action trends, audit log summaries, active reviewer counts
Exports | Export history, production statistics, deliverable sizes

Dashboards include KPI cards, bar charts, line charts, pie charts, and area charts. Reports can be exported as PDF with embedded charts, matter information headers, and generation timestamps.

Billing & Usage

The Billing page shows storage usage and costs for each matter. Storage is broken down into seven categories:

Category | Description
Documents | Original uploaded files in their native format
Extracted Text | Plain text extracted during processing and OCR
Markup Images | Rendered page images for the redaction workflow
Redacted PDFs | PDF versions with burned-in redactions and branding
Indices | Elasticsearch search indices for the matter
Embeddings | Vector embeddings used for AI chat and semantic search
Other | Miscellaneous processing artifacts

The billing dashboard shows current usage (total GB and projected monthly cost), storage breakdown by category, usage history over time, and invoice details. Pricing is per-GB with regional variations and volume tier discounts.

Administration

User Management

The Admin page (accessible to admin and super_admin roles) provides a central interface for managing all users in the organisation. The user list shows:

  • Email address and full name
  • Assigned role
  • Account status (active, inactive, pending, invited, locked, deactivated)
  • 2FA/MFA enablement status
  • Last login date
  • Number of matter assignments

Administrators can search by email or name, and filter by status or role. Available actions include creating users, editing details, changing roles, sending invitations, resetting passwords, and activating or deactivating accounts.

Roles & Permissions

Dezcry uses a role-based access control (RBAC) system with four hierarchical roles — each role inherits all permissions from the roles below it. Access is enforced at two levels: role-level (what actions a user can perform across the platform) and matter-level (which specific matters a user can access).

Role Hierarchy

Role | Description | Matter Access
Super Admin | Full platform control. Can manage all users (including other admins), delete matters, configure system-wide settings, and access every feature. Intended for platform owners and IT administrators. | Implicit access to all matters across the tenant — no explicit assignment required.
Admin | Organisation-level management. Can create matters, invite and manage users, assign users to matters, view audit logs, manage the password bank, and configure billing. Cannot delete matters or manage other admins. | Implicit access to all matters across the tenant — no explicit assignment required.
Reviewer | The primary working role for legal, privacy, and compliance team members. Can upload documents, review and code documents, run AI classification and redaction jobs, create and manage exports, manage saved searches, and run search term reports. | Must be explicitly assigned to each matter. Can only see and work within matters they have been granted access to.
Read Only | View-only access for stakeholders, external counsel, or auditors who need visibility but should not make changes. Can browse documents, view metadata, read reports, use chat, and download exports — but cannot upload, modify, or run any jobs. | Must be explicitly assigned to each matter. Can only see matters they have been granted access to.

Detailed Permission Matrix

The following table shows the minimum role required for each action in the platform. Higher roles automatically inherit all permissions from lower roles.

Feature Area | Action | Minimum Role
Matters | View matters | Read Only
Matters | Create new matters | Admin
Matters | Update matter settings | Admin
Matters | Delete matters | Super Admin
Documents | View and search documents | Read Only
Documents | Upload documents | Reviewer
Documents | Update decisions, tags, and coding | Reviewer
Documents | Delete documents | Admin
AI Classification | View classification results | Read Only
AI Classification | Create sets and run classification jobs | Reviewer
AI Redaction | View redaction results | Read Only
AI Redaction | Create sets, run jobs, and review entries | Reviewer
Export | View export sets and download packages | Read Only
Export | Create export sets and run exports | Reviewer
Search | View saved searches | Read Only
Search | Create and manage saved searches | Reviewer
Search Term Reports | View search term reports | Read Only
Search Term Reports | Create and run reports | Reviewer
Chat / AI Q&A | Ask questions and view chat history | Read Only
Reporting | View analytics dashboards | Read Only
Billing | View billing and usage | Read Only
Billing | Manage billing settings | Admin
Password Bank | View stored passwords | Admin
Password Bank | Add, edit, and delete passwords | Admin
Audit Log | View matter and system audit logs | Admin
User Management | View and manage users | Admin
User Management | Invite users and assign roles | Admin
System Admin | Manage other admins, delete matters, system config | Super Admin
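Because the roles are strictly ordered, the "minimum role" rule reduces to a numeric comparison. A sketch — the role names follow the table above, but the internal representation is hypothetical:

```python
# Numeric ranks make minimum-role checks a simple comparison; the
# ordering mirrors the hierarchy described above.
ROLE_RANK = {"read_only": 0, "reviewer": 1, "admin": 2, "super_admin": 3}

def has_permission(user_role, minimum_role):
    """True if the user's role meets or exceeds the required minimum."""
    return ROLE_RANK[user_role] >= ROLE_RANK[minimum_role]

has_permission("reviewer", "read_only")  # True — higher roles inherit
has_permission("reviewer", "admin")      # False
```

This is why the matrix only lists the minimum role per action: inheritance falls out of the ordering rather than needing per-role permission lists.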

Matter-Level Access Control

Access to individual matters is controlled separately from role permissions:

  • Super Admin and Admin roles have implicit access to every matter in the tenant. They do not need to be explicitly assigned — they can see and manage all matters automatically.
  • Reviewer and Read Only roles require explicit assignment to each matter. An administrator must grant access by assigning the user to the matter. Until assigned, the matter is completely invisible to the user — it does not appear in their matter list and cannot be accessed via direct URL.

This two-level model enables organisations to enforce segregation of duties and need-to-know access. For example, a reviewer handling HR DSARs can be restricted to only HR-related matters, while a different reviewer handles customer DSARs — even though both have the same role, they see entirely different matter sets.

Tenant Isolation

All access controls operate within a tenant boundary. Every database query is scoped to the authenticated user's tenant, and every matter-level operation verifies that the matter belongs to the same tenant. Cross-tenant access is architecturally impossible — there is no mechanism in the application layer to access another organisation's data, even with a Super Admin role.

Document-Level Access

Access to individual documents follows the matter access model. If a user has access to a matter, they can see all documents within that matter (subject to their role permissions for viewing vs. editing). There is no per-document access restriction — access is controlled at the matter level, which is the standard approach in eDiscovery and DSAR review workflows where reviewers need to see the full context of a matter to make defensible decisions.

Security Enforcement

Permissions are enforced server-side on every API request, not just in the UI. Even if a user manipulates the frontend or constructs API requests directly, the backend validates their role and matter access before processing any operation. Denied requests receive a structured 403 Forbidden response with a clear explanation of why access was refused.
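In outline, the server-side check combines both levels before any handler runs. This is a hypothetical sketch of the pattern — the structure and field names are illustrative, not Dezcry's API:

```python
ROLE_RANK = {"read_only": 0, "reviewer": 1, "admin": 2, "super_admin": 3}

def authorize(user, matter_id, minimum_role):
    """Validate role and matter access on a request; returns a
    (status, error) pair so a denial carries a structured 403 body."""
    if ROLE_RANK[user["role"]] < ROLE_RANK[minimum_role]:
        return 403, {"error": "forbidden",
                     "reason": f"requires {minimum_role} role or above"}
    # Admin roles have implicit access to every matter in the tenant.
    if user["role"] not in ("admin", "super_admin") \
            and matter_id not in user["matter_ids"]:
        return 403, {"error": "forbidden",
                     "reason": "no access to this matter"}
    return 200, None
```

Running this check in the API layer, rather than the UI, is what makes direct or hand-crafted requests fail the same way a blocked button would.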

Inviting Users

Administrators invite new users by providing their email address, name, and assigned role. The invitee receives an email with a single-use invitation link that guides them through:

  1. Set password — Create a strong password (minimum 12 characters, must include uppercase, lowercase, and a number).
  2. Configure 2FA — Scan a QR code with an authenticator app (Google Authenticator, Authy, etc.) and enter the verification code.
  3. Complete setup — The account is activated and the user can sign in.
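The password rule in step 1 can be expressed as a few predicates. An illustrative check, not Dezcry's actual validator:

```python
import re

def password_meets_policy(pw):
    """Invitation password policy: at least 12 characters, with at
    least one uppercase letter, one lowercase letter, and one digit."""
    return (len(pw) >= 12
            and re.search(r"[A-Z]", pw) is not None
            and re.search(r"[a-z]", pw) is not None
            and re.search(r"\d", pw) is not None)
```

Each condition maps directly to one clause of the stated policy, which keeps the rule easy to audit against the documentation.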

Invitation links are single-use and have an expiration date. The invitation tracks who created it, when it was used, and the IP address of the accepting user.

Admin Dashboard

The Admin Dashboard provides tenant-wide analytics and operational oversight:

  • Users overview — total, active, locked, invited users; 2FA adoption rate; role distribution; currently online users
  • Matters overview — total matters; status distribution (open/closed/archived); type distribution; document count and storage per matter
  • Documents overview — total document count; total storage; status distribution; encrypted, corrupt, and duplicate counts
  • Processing status — recent upload batches; active classification, redaction, and export runs
  • Storage breakdown — detailed storage usage by category across all matters
  • Recent audit activity — latest system-wide audit entries

System Audit

The System Audit page in the Admin section provides a tenant-wide view of all audit log entries across all matters. This allows administrators to monitor platform-wide activity, investigate security events, and produce compliance reports. The same filtering and search capabilities from the matter-level audit are available at the system level.

Security & Compliance

Data Security

Dezcry is hosted entirely on Microsoft Azure, using Azure Container Apps, Azure PostgreSQL, and Azure Storage. All infrastructure runs within a single resource group with network-level isolation. The GPU worker service that handles AI inference runs on internal-only ingress and is not accessible from the public internet.

The platform operates a logically isolated multi-tenant architecture. Each organisation's data — documents, metadata, reviewer decisions, and audit logs — is segregated at the application and database level. Uploaded files are stored in organisation-scoped storage paths. Cross-tenant data access is not possible through the application layer.

Encryption

All data is encrypted in transit using TLS 1.2+ for all connections between services, storage, and the database. Data is encrypted at rest using Azure-managed encryption keys via Azure Storage Service Encryption and Azure Database encryption. Uploaded files, processed outputs, and database records are all covered.

Data Residency

Dezcry supports regional data residency — each matter can be hosted in a specific Azure region to meet local data protection requirements:

  • Australia East — default region
  • Switzerland North — for Swiss data protection requirements
  • Germany — for German/EU data residency
  • United Kingdom — for UK data protection requirements

AI models are deployed regionally — Australian data uses Australian AI endpoints, Swiss data uses Swiss endpoints, and so on. Enterprise customers can discuss deployment in additional regions or dedicated/on-premises environments.

AI Data Handling

Dezcry runs its own AI models for redaction, classification, and summarisation. No document data is sent to third-party AI services. All AI inference happens within the same Azure environment as the rest of the platform:

  • Classification and redaction use large language models deployed within the Azure environment
  • Chat and summaries use a dedicated language model running on GPU infrastructure
  • Embeddings are generated on CPU within the same container environment

AI-assisted redaction is designed as a reviewer aid, not an autonomous system. The AI surfaces likely sensitive content for human review. Reviewers approve, reject, or edit every suggestion before it is applied. All AI-generated suggestions and reviewer decisions are logged in the audit trail.

Customer data is never used to train or fine-tune models shared across tenants.