Adobe - Legal
Contract Analysis in 5 Minutes Instead of 5 Hours
How we built an AI contract analysis engine for Adobe Document Cloud that processes contracts 60x faster than manual review.
Duration
10 weeks
Team
3 engineers, 1 ML engineer, 1 PM
Tech Stack
DeBERTa-v3, GPT-4 Turbo, spaCy, sentence-transformers, Pinecone, AWS (SageMaker, Textract), React
The Challenge
Adobe's Document Cloud team needed to build a core AI feature capable of analyzing complex commercial contracts - NDAs, MSAs, SaaS agreements, licensing deals, and procurement contracts - for risks, non-standard terms, and compliance issues. Their target users were mid-market legal departments reviewing 200-500 contracts per month with teams of 3-5 attorneys. Manual review of a single contract took 3-5 hours depending on complexity, creating a hard ceiling on throughput and making it economically unviable to review lower-value agreements at all.
Adobe's internal engineering team had attempted to build the first version using GPT-4 with prompt engineering. While the prototype could identify some clause types, it had three critical problems: (1) it hallucinated clause references that didn't exist in the contract roughly 8% of the time, (2) it couldn't maintain context across long contracts exceeding 40 pages, and (3) it had no mechanism for handling company-specific playbooks - the custom risk thresholds and preferred language that vary from one legal department to another. After 4 months of iteration, accuracy on clause detection was stuck at 78% and their beta users were losing trust in the output.
Our Approach
We started by building a ground truth dataset. We partnered with Adobe's legal advisory board of practicing attorneys to annotate 2,400 commercial contracts across 12 contract types, tagging 47 distinct clause categories, risk levels, and non-standard language patterns. This annotation effort took 4 weeks with a team of 6 contract attorneys and produced the highest-quality labeled legal dataset we've worked with.
For the model architecture, we evaluated three approaches: (1) fine-tuning GPT-4 via the OpenAI API, (2) fine-tuning an open-source LLM (Llama 2 70B) for full control, and (3) a hybrid pipeline combining a purpose-trained clause classifier with an LLM for natural language explanation. We chose option 3 after benchmarking showed that a fine-tuned DeBERTa-v3 classifier achieved 99.1% accuracy on clause detection - outperforming both GPT-4 fine-tuned (94.2%) and Llama 2 (91.7%) - while being 200x cheaper to run at inference time. The LLM (GPT-4 Turbo) was reserved for the explanation and risk-summary layer, where its natural language capabilities were genuinely needed, and we implemented a retrieval-augmented generation (RAG) approach to ground its output in the actual contract text, eliminating hallucinations.
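The grounding idea behind the RAG layer can be shown with a minimal sketch: before an LLM-generated summary is surfaced, every clause excerpt it quotes is checked against the source contract, so fabricated references are caught mechanically. The function names and example clauses below are illustrative, not the production implementation.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so quoting differences don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()

def grounded_citations(summary_quotes: list[str], contract_text: str) -> dict[str, bool]:
    """Map each quoted clause excerpt to whether it actually appears in the contract."""
    haystack = normalize(contract_text)
    return {q: normalize(q) in haystack for q in summary_quotes}

contract = """8.1 Limitation of Liability. In no event shall either party's
aggregate liability exceed the fees paid in the twelve (12) months preceding
the claim."""

quotes = [
    "aggregate liability exceed the fees paid",   # real excerpt -> True
    "liability is unlimited for all claims",      # hallucinated -> False
]

print(grounded_citations(quotes, contract))
```

In production the check would run on normalized spans retrieved from the vector store rather than raw substrings, but the invariant is the same: no quoted language reaches the user unless it exists in the document.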
For handling long contracts, we built a hierarchical chunking pipeline: contracts are first segmented into sections using a layout-aware parser (handling headers, numbered clauses, exhibits, and schedules), then each section is classified and analyzed independently, and finally a synthesis step assembles the full contract report. This approach handles contracts up to 200+ pages without context window limitations. The playbook system was built as a configurable rules layer - customers define their preferred positions, fallback positions, and red lines for each clause type, and the system scores contract language against those benchmarks.
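The segmentation step of the chunking pipeline can be sketched as follows, assuming contracts use numbered clause headings like "8.1 Limitation of Liability." (the production parser also handles exhibits, schedules, and scanned layouts, which this toy regex does not):

```python
import re

# Matches headings such as "1 Definitions." or "2.1 Renewal."
HEADING = re.compile(r"(?m)^(\d+(?:\.\d+)*)\s+([A-Z][^\n.]*)\.")

def segment(contract_text: str) -> list[dict]:
    """Split a contract into sections keyed by clause number and title."""
    sections = []
    matches = list(HEADING.finditer(contract_text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(contract_text)
        sections.append({
            "number": m.group(1),
            "title": m.group(2).strip(),
            "body": contract_text[m.end():end].strip(),
        })
    return sections

contract = """1 Definitions. Capitalized terms have the meanings below.
2 Term. This Agreement begins on the Effective Date.
2.1 Renewal. The Agreement renews annually unless terminated."""

for s in segment(contract):
    print(s["number"], s["title"])
```

Because each section is classified independently and only the synthesis step sees the whole document, no single model call ever needs the full contract in its context window.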
The Solution
The production system is a multi-stage pipeline deployed on AWS. Document ingestion supports PDF, DOCX, and scanned images (via AWS Textract OCR). The layout-aware parser, built with spaCy and custom heuristics, segments contracts into structural components. The DeBERTa-v3 clause classifier runs on GPU-backed SageMaker endpoints and tags each section with clause type and confidence score. Classified sections are then passed through the risk analysis layer, which scores language against the customer's playbook using a combination of semantic similarity (sentence-transformers) and rule-based checks. Finally, the GPT-4 Turbo explanation layer generates plain-language summaries and risk explanations, grounded via RAG with the source contract text stored in Pinecone. The entire pipeline processes a 30-page contract in under 5 minutes. The frontend is a React application with a split-pane interface: AI analysis on the left, source contract with highlighted clauses on the right.
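The playbook scoring layer can be illustrated with a simplified sketch. Here Jaccard token overlap stands in for the sentence-transformers embeddings used in production, and the playbook entries are hypothetical, not real customer positions:

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over word sets; a crude stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def score_clause(clause_text: str, playbook_entry: dict) -> dict:
    """Classify a clause as preferred / fallback / red_line by its closest benchmark."""
    scores = {
        position: max(token_overlap(clause_text, benchmark)
                      for benchmark in benchmarks)
        for position, benchmarks in playbook_entry.items()
    }
    best = max(scores, key=scores.get)
    return {"position": best, "scores": scores}

playbook = {  # hypothetical entry for a limitation-of-liability clause
    "preferred": ["liability capped at fees paid in the prior 12 months"],
    "fallback":  ["liability capped at 2x fees paid in the prior 12 months"],
    "red_line":  ["unlimited liability for all claims"],
}

clause = "Each party's liability is capped at fees paid in the prior 12 months."
print(score_clause(clause, playbook)["position"])  # preferred
```

Keeping the playbook as data rather than prompt text is what lets each legal department configure its own thresholds without retraining or re-prompting the models.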
Results
- 60x faster contract review (average 5 hours down to 5 minutes per contract), validated across 4,200 contracts reviewed by beta customers in the first 3 months
- 99.1% clause detection accuracy on the 47-category taxonomy, independently validated against attorney annotations on a held-out test set of 600 contracts
- Feature became Adobe Document Cloud's primary enterprise differentiator - adopted by 34 enterprise customers within 9 months of launch
- Drove a measurable increase in Document Cloud enterprise subscriptions, with customers citing the AI engine as the deciding factor in their purchase
Key Insight
Using an LLM for everything is the most expensive path to mediocre accuracy - the best results came from using a purpose-trained classifier where precision matters and reserving the LLM for tasks that genuinely require language generation.
“We spent four months trying to make GPT-4 work as our entire pipeline and couldn't get past 78% accuracy. They rebuilt the architecture in ten weeks and hit 99.1%. The key insight - using the right model for each task instead of forcing one model to do everything - seems obvious in hindsight, but it changed our entire product trajectory.”
Ely Greenfield
CTO & SVP at Adobe