NYU / Rutgers University · Finance
SECBert
A BERT model pre-trained on SEC EDGAR filings for understanding regulatory financial documents and compliance text.
Overview
SECBert is a domain-specific BERT model pre-trained on a large corpus of SEC EDGAR filings, including 10-K, 10-Q, and 8-K documents. It captures the distinctive language patterns, legal terminology, and regulatory conventions of securities filings, making it particularly effective for regulatory document analysis, compliance monitoring, and risk factor extraction from public company disclosures.
Parameters: 110M
Architecture: BERT-Base
Training Data: SEC EDGAR filings (10-K, 10-Q, 8-K)
Context Window: 512 tokens
License: Research use
Capabilities
SEC filing analysis and classification
Risk factor extraction from 10-K filings
Regulatory compliance text understanding
Financial entity recognition in SEC documents
Filing section identification and parsing
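Section identification is usually a two-step workflow: locate item boundaries in the raw filing text first, then pass each section to the model. The boundary step is typically plain pattern matching. A minimal sketch, assuming a plain-text 10-K; the regex and helper name are illustrative, not part of SECBert:

```python
import re

# Matches 10-K style item headers such as "Item 1A. Risk Factors".
# The pattern is illustrative and not exhaustive for real EDGAR filings.
ITEM_HEADER = re.compile(
    r"^\s*item\s+(\d+[a-z]?)\.?\s+(.{3,80})$",
    re.IGNORECASE | re.MULTILINE,
)

def split_filing_sections(text: str) -> dict[str, str]:
    """Split a plain-text filing into {item number: section body}."""
    matches = list(ITEM_HEADER.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).upper()] = text[start:end].strip()
    return sections

# Toy filing excerpt for demonstration.
filing = """Item 1. Business
We operate retail stores.
Item 1A. Risk Factors
Our revenue depends on consumer demand.
Item 2. Properties
We lease our headquarters."""

sections = split_filing_sections(filing)
```

Each extracted section (for example, `sections["1A"]` for risk factors) can then be tokenized and fed to the model for downstream classification or extraction.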
Use Cases
Automated analysis of 10-K risk factor disclosures
Monitoring SEC filings for material changes in company disclosures
Extracting key financial metrics from quarterly filings
Compliance document review and classification
Pros
- Specialized in regulatory financial document understanding
- Captures SEC-specific language patterns and conventions
- Effective for compliance and risk analysis workflows
- Lightweight deployment requirements
Cons
- Narrow focus on SEC filings limits broader financial use
- 512-token context insufficient for full filing sections
- Encoder-only; cannot generate regulatory text
- Limited to U.S. SEC regulatory documents
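The 512-token limit is commonly worked around by sliding an overlapping window over the tokenized section and aggregating per-window predictions. A minimal sketch over pre-tokenized input; the window and stride values are illustrative, and real usage would also reserve slots for BERT's [CLS]/[SEP] special tokens:

```python
def sliding_windows(token_ids: list[int],
                    window: int = 512,
                    stride: int = 256) -> list[list[int]]:
    """Split a long token sequence into overlapping windows.

    The overlap (window - stride) preserves context across boundaries,
    so a sentence cut off by one window is seen whole by the next.
    """
    if len(token_ids) <= window:
        return [token_ids]
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
    return chunks

# Example: a 1000-token section becomes three overlapping 512-token windows.
chunks = sliding_windows(list(range(1000)))
```

Per-window outputs are then merged downstream, e.g. by averaging logits or taking the maximum score per label across windows.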
Pricing
Free for research use. Available on Hugging Face. Lightweight enough to run on standard compute infrastructure.