This document describes the AI system that powers The Insurance Professor's consumer-facing responses, the constraints under which it operates, and the failure modes its operators acknowledge openly. It is published as a companion to the Corpus Accuracy Report.
Q2 2026 Model Card · Published May 2, 2026 · Updated quarterly
The Insurance Professor is a consumer education platform for insurance policyholders. It is not a licensed insurance producer, agent, broker, adjuster, or attorney. It does not sell, recommend, or place insurance products. It does not issue legal opinions. It explains regulatory and policy concepts in plain language, grounded in public-domain state regulatory content.
A user submits a question through the chat interface. The system retrieves relevant content from a curated corpus of public-domain regulatory materials, passes the user's question and the retrieved content to a large language model running locally, and delivers the model's response back to the user — after the response passes through a verification layer described in section 5.
The platform operates from the principle that consumers benefit from being able to ask plain-language questions about insurance and receive plain-language explanations grounded in real regulatory text — but only if those explanations cite real text, do not direct the consumer toward specific products or actions that constitute the practice of insurance or law, and openly disclose what they cannot do.
mlx-community/Llama-3.3-70B-Instruct-4bit). Open-weights model used under Meta's Llama 3.3 Community License. The 4-bit quantization reduces memory footprint at a measurable cost in raw model accuracy compared to the full-precision base; the platform's evaluation harness measures performance on the quantized variant directly so that internal benchmarks reflect what users actually receive.
Public-domain state insurance statutes, administrative rules, and consumer guides; the platform's own educational explainer content; and question-answer pairs constructed by the founder from publicly available reference materials drawn from the insurance industry.
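The memory trade-off behind the 4-bit quantization described above can be illustrated with rough arithmetic. The sketch below assumes a nominal parameter count of 70 billion (taken from the "70B" in the model name, not an exact figure) and ignores quantization metadata such as per-group scales, so real footprints will be somewhat larger. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal "70B" parameter count, used only for illustration.
NOMINAL_PARAMS = 70e9

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit baseline
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{int4_gb:.0f} GB")
```

Under these assumptions the quantized weights need roughly a quarter of the storage of a 16-bit copy, which is the footprint reduction the quantization buys at the accuracy cost noted above.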
All source materials used to construct training pairs were publicly available at the time they were accessed. The founder used them as input for writing question-answer training pairs in the founder's own words; the source materials do not appear in the fine-tune dataset in their original form. The deployed model is designed not to reproduce source materials, and the platform's voice and citation verification layers (section 5) further constrain outputs. The platform's operators intend to test empirically whether the deployed model reproduces source material and will document the results in a future quarterly cycle.
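One common way to probe for verbatim reproduction is to measure how many word n-grams in a model output also appear in a source text. The sketch below is an illustration of that general technique only; it is not the operators' stated test method, and the function names, the default `n`, and the sample sentences (invented toy text, not real regulatory language) are all assumptions.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngram_fraction(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also occur in the source."""
    out = word_ngrams(output, n)
    if not out:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out & word_ngrams(source, n)) / len(out)


# Invented toy text for illustration only.
source = "an insurer must acknowledge a claim within fifteen days of notice"
paraphrase = "insurers generally have a short deadline to confirm they received your claim"

copied_score = shared_ngram_fraction(source, source, n=3)
paraphrase_score = shared_ngram_fraction(paraphrase, source, n=3)
print(copied_score, paraphrase_score)
```

A score near 1 suggests the output tracks the source nearly word for word, while a score near 0 suggests independent wording. A real test would also need to choose `n` and a threshold, and to account for short legal phrases that any accurate explanation must repeat.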
The platform relies on the transformative-use doctrine of fair use as that doctrine has been applied to AI training in two recent federal decisions: Bartz v. Anthropic PBC (N.D. Cal. June 23, 2025, Judge Alsup) and Kadrey v. Meta Platforms, Inc. (N.D. Cal. June 25, 2025, Judge Chhabria). Both courts held, on the facts before them, that training large language models on copyrighted books was transformative and constituted fair use, with Judge Chhabria calling the use “highly transformative” and Judge Alsup describing it as “quintessentially transformative.”
The platform's use of publicly available reference materials in fine-tuning is consistent with the factual posture that produced fair-use findings in those cases:
The platform also does not maintain a permanent “shadow library” of source materials beyond what is required for the training process itself.
Conversations are stored in the platform's database for the purpose of providing chat history to authenticated users; they are not exported, transmitted to third parties, or used to update any model. If a customer requests deletion, their conversation history is deleted from the database.
No fine-tune iteration of the model is or will be trained on customer conversations without explicit, opt-in consent from the customers whose conversations would be used and a clear public disclosure of that policy change.
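The opt-in rule above amounts to a default-deny filter: a conversation is eligible for any future training set only if its customer appears on an explicit consent list. The following is a minimal sketch of that idea, assuming a hypothetical `Conversation` record and a hypothetical set of opted-in customer IDs; it does not describe the platform's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    # Hypothetical record shape, for illustration only.
    conversation_id: str
    customer_id: str
    text: str


def eligible_for_training(
    conversations: list[Conversation],
    opted_in_customers: set[str],
) -> list[Conversation]:
    """Keep only conversations whose customer has explicitly opted in."""
    return [c for c in conversations if c.customer_id in opted_in_customers]


conversations = [
    Conversation("c1", "alice", "What is a deductible?"),
    Conversation("c2", "bob", "How do I appeal a denial?"),
]

# With no recorded consent, nothing is eligible.
assert eligible_for_training(conversations, set()) == []

selected = eligible_for_training(conversations, {"alice"})
print([c.conversation_id for c in selected])
```

Treating the empty consent set as "nothing is eligible" mirrors the current policy, under which no customer conversations are used for training at all.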
A consumer submits a question. The platform's flow:
This flow is referred to internally as RAG (retrieval-augmented generation). It exists specifically to ground the model's outputs in real regulatory text, rather than relying on the model's own potentially stale or fabricated knowledge of state insurance law.
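The retrieve, generate, and verify sequence can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's implementation: retrieval here is a toy word-overlap ranking standing in for whatever retrieval method the platform uses, `generate` and `verify` are caller-supplied placeholders for the local model and the verification layer, and the corpus entries are invented.

```python
from typing import Callable

FALLBACK = "I could not find supporting regulatory text for that question."


def retrieve(question: str, corpus: list[dict], k: int = 2) -> list[dict]:
    """Rank passages by word overlap with the question (toy stand-in)."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(p["text"].lower().split())), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(question: str, passages: list[dict]) -> str:
    """Place the retrieved passages ahead of the user's question."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return f"Answer using only the passages below.\n{context}\n\nQuestion: {question}"


def answer(
    question: str,
    corpus: list[dict],
    generate: Callable[[str], str],
    verify: Callable[[str, list[dict]], bool],
) -> str:
    """Retrieve, generate a draft, and deliver it only if verification passes."""
    passages = retrieve(question, corpus)
    if not passages:
        return FALLBACK
    draft = generate(build_prompt(question, passages))
    return draft if verify(draft, passages) else FALLBACK


# Invented toy corpus; not real regulatory content.
TOY_CORPUS = [
    {"source": "toy-1", "text": "A deductible is the amount you pay before coverage applies."},
    {"source": "toy-2", "text": "A premium is the price of the policy."},
]

reply = answer(
    "what is a deductible",
    TOY_CORPUS,
    generate=lambda prompt: "stub answer [toy-1]",  # placeholder for the local model
    verify=lambda draft, passages: True,            # placeholder for verification
)
print(reply)
```

The point of the structure is that the model never sees a question without retrieved text beside it, and a draft that fails verification is replaced rather than delivered.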
The platform applies three independent constraints to model outputs before delivery. These mechanisms are described in detail in the Corpus Accuracy Report; this section summarizes them.
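The Corpus Accuracy Report remains the authoritative description of these mechanisms. Purely as an illustration of what a citation check of this general kind might look like, the sketch below rejects a response that cites nothing or cites a source that was not among the retrieved passages. The bracketed citation format (for example `[toy-1]`) and both function names are assumptions made for this example.

```python
import re

# Assumption: citations appear as bracketed source IDs such as [toy-1].
CITATION = re.compile(r"\[([^\[\]]+)\]")


def unsupported_citations(response: str, retrieved_sources: set[str]) -> set[str]:
    """Return cited source IDs that were not among the retrieved passages."""
    return set(CITATION.findall(response)) - retrieved_sources


def passes_citation_check(response: str, retrieved_sources: set[str]) -> bool:
    """Pass only if the response cites at least one source and all are retrieved."""
    cited = set(CITATION.findall(response))
    return bool(cited) and cited <= retrieved_sources


retrieved = {"toy-1", "toy-2"}
print(passes_citation_check("A deductible applies first [toy-1].", retrieved))
print(unsupported_citations("See [toy-1] and [toy-9].", retrieved))
```

A check like this confirms only that a cited source was actually retrieved; it does not confirm that the cited passage supports the sentence that cites it, which would need a separate comparison of the text.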
The platform's operators acknowledge the following failure modes openly: