NIST AI RMF Model Card Generator

Fill out the structured form and export a complete, NIST AI RMF-aligned model card as Markdown, HTML, or PDF. No AI required — just a smart form and a clear template.

Covers GOVERN, MAP, MEASURE, and MANAGE. Free, no login required. Saves to your browser.
1. Model Details
2. Intended Use
   What is this model designed to do? Be specific about tasks and domains.
   What should this model NOT be used for?
3. Factors
   Which demographic or user groups are relevant to model evaluation?
   Where and how will this model be used?
4. Metrics
   Which metrics are used to evaluate model performance?
   What thresholds are used for classification or action decisions?
5. Evaluation Data
6. Training Data
7. Quantitative Analyses
   Overall performance across the full evaluation dataset.
   Performance broken down by relevant subgroups. Include fairness/disaggregated metrics.
8. Ethical Considerations
9. Caveats and Recommendations
10. NIST AI RMF Actions Completed
    Check all NIST AI RMF actions your team has completed for this model.
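
As a rough sketch of how form sections like these could be turned into a Markdown export, the snippet below walks a short section list and emits a heading plus any filled-in fields, along with a simple completeness percentage. The `SECTIONS` array, the field keys, and the `labelFor`, `buildMarkdownSketch`, and `completeness` helpers are assumptions made for illustration, not the tool's actual code, and only three of the ten sections are included to keep it short.

```javascript
// Hypothetical section list: titles and numbers follow the form; field keys are illustrative.
const SECTIONS = [
  { num: 1, title: 'Model Details', fields: ['modelName', 'modelVersion'] },
  { num: 2, title: 'Intended Use', fields: ['primaryUses', 'outOfScope'] },
  { num: 4, title: 'Metrics', fields: ['perfMeasures', 'decisionThresholds'] },
];

// True when a field holds a non-empty string (all values are assumed to be strings).
function isFilled(data, key) {
  return (data[key] || '').trim() !== '';
}

// Turn a camelCase key such as "primaryUses" into a label such as "Primary Uses".
function labelFor(key) {
  const spaced = key.replace(/([A-Z])/g, ' $1');
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

// Emit one "## n. Title" heading per section, listing only fields that have a value.
function buildMarkdownSketch(data) {
  const lines = [`# Model Card: ${data.modelName || 'Untitled model'}`, ''];
  for (const section of SECTIONS) {
    lines.push(`## ${section.num}. ${section.title}`, '');
    const filled = section.fields.filter((k) => isFilled(data, k));
    if (filled.length === 0) {
      lines.push('_Not provided._', '');
      continue;
    }
    for (const k of filled) {
      lines.push(`**${labelFor(k)}:** ${data[k].trim()}`, '');
    }
  }
  return lines.join('\n');
}

// Share of all listed fields that are non-empty, as a whole-number percentage.
function completeness(data) {
  const keys = SECTIONS.flatMap((s) => s.fields);
  const filled = keys.filter((k) => isFilled(data, k)).length;
  return Math.round((filled / keys.length) * 100);
}

const example = { modelName: 'DemoClassifier', primaryUses: 'Classify images.' };
console.log(buildMarkdownSketch(example));
console.log(`${completeness(example)}% complete`);
```

Driving the output from a single section list keeps the export and the completeness figure in sync: adding a field to `SECTIONS` changes both without touching either function.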

Why Model Cards Matter + NIST AI RMF Overview

Model cards, introduced by Mitchell et al. (2019) and adopted by Google, Hugging Face, and federal agencies, provide a standardized way to document AI systems for transparency and accountability. The NIST AI Risk Management Framework (AI RMF 1.0) organizes AI risk management into four core functions, GOVERN, MAP, MEASURE, and MANAGE, that align directly with model card sections.

GOVERN

Establishes organizational practices, accountability structures, and risk policies. Covers roles, culture, and policies for responsible AI development and deployment.

MAP

Categorizes AI risks in context. Identifies intended use, stakeholders, environmental assumptions, and potential negative impacts before deployment.

MEASURE

Quantifies AI risks using metrics, testing, and evaluation. Includes bias/fairness measurement, uncertainty quantification, and performance benchmarking.

MANAGE

Prioritizes and mitigates identified risks. Covers incident response, monitoring, decommissioning plans, and ongoing risk treatment across the model lifecycle.
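
The text says the four functions align with model card sections but does not spell out which section belongs to which function. The sketch below shows one plausible grouping and a helper that reports per-function coverage. The `RMF_SECTION_MAP` assignments and the `coverageByFunction` helper are illustrative assumptions, not a mapping published by NIST or defined by this tool.

```javascript
// Illustrative assumption: one plausible grouping of the form's sections by RMF function.
const RMF_SECTION_MAP = {
  GOVERN: ['Model Details', 'NIST AI RMF Actions Completed'],
  MAP: ['Intended Use', 'Factors', 'Ethical Considerations'],
  MEASURE: ['Metrics', 'Evaluation Data', 'Training Data', 'Quantitative Analyses'],
  MANAGE: ['Caveats and Recommendations'],
};

// Given the titles of sections already filled in, count coverage for each function
// and list the sections still missing.
function coverageByFunction(filledSections) {
  const filled = new Set(filledSections);
  const report = {};
  for (const [fn, sections] of Object.entries(RMF_SECTION_MAP)) {
    report[fn] = {
      done: sections.filter((s) => filled.has(s)).length,
      total: sections.length,
      missing: sections.filter((s) => !filled.has(s)),
    };
  }
  return report;
}

console.log(coverageByFunction(['Model Details', 'Intended Use', 'Metrics']));
```

A team that groups sections differently only needs to edit the map; the coverage helper does not depend on any particular assignment.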

Key references: NIST AI RMF 1.0; Google Model Cards (Mitchell et al.); NIST AI RMF Playbook; Hugging Face Model Cards Guide
