🧬 DIA AI Knowledge Platform
Global AI terminology harmonised from FDA, ISO, EMA, MHRA, IMDRF and other authoritative sources — for life sciences and healthcare regulatory professionals.
Concept Grouping Methodology
🎯 Why This Domain Structure?
The platform uses 6 balanced domains (25–55 concepts each) rather than the earlier 7-domain structure, which had Machine Learning (51 concepts) and Deep Learning (8 concepts) as sibling domains — creating imbalance and overlap. The new structure follows ISO 25964 guidance that top-level categories should be mutually exclusive, collectively exhaustive, and broadly comparable in scope.
Deep learning architectures, NLP, and generative AI are now grouped under Neural Architectures & Generative AI. Trust and governance is kept as a focused cross-cutting domain of 7 core concepts rather than being diluted.
❓ Domain Assignment Protocol — 6 Questions + 1 Cross-Cutting Rule
There are 6 domains and 6 decision questions — one per domain. Answer them in order; the first "yes" assigns the concept to that domain. A seventh rule handles concepts that genuinely span multiple domains.
Why older material mentions 7 questions: the cross-cutting rule was originally listed as "Question 7", but it is not a domain selector; it is a placement rule that applies after the domain is decided. The protocol has always mapped to exactly 6 domains.
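The first-"yes" logic of the protocol can be sketched as an ordered lookup. This is a minimal illustration, not the platform's implementation: the six real decision questions are not reproduced here, so the predicates, the concept fields (`is_neural_or_generative`, `is_governance`), the ordering, and the exact domain labels below are hypothetical placeholders.

```python
from typing import Callable, Optional

# A question takes a concept record and answers yes (True) or no (False).
Question = Callable[[dict], bool]


def assign_domain(concept: dict,
                  questions: list[tuple[str, Question]]) -> Optional[str]:
    """Walk the questions in order; the first "yes" decides the domain."""
    for domain, question in questions:
        if question(concept):
            return domain
    return None  # no question matched: needs editorial review


# Hypothetical stand-ins for two of the six questions (order is assumed).
QUESTIONS: list[tuple[str, Question]] = [
    ("Neural Architectures & Generative AI",
     lambda c: c.get("is_neural_or_generative", False)),
    ("Trust and Governance",
     lambda c: c.get("is_governance", False)),
]

concept = {"term": "Transformer", "is_neural_or_generative": True}
print(assign_domain(concept, QUESTIONS))
```

Because the loop returns on the first match, question order matters: a concept that would answer "yes" to two questions lands in whichever domain is asked about first. The cross-cutting rule is deliberately left out of this sketch, since it applies only after a domain has been chosen.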
📏 Concept Creation Rules
- Scope test: A concept must be definable in 1–3 sentences without requiring another undefined concept
- Granularity: Not too broad ("AI") and not too narrow ("ReLU activation function"). Target: level 2–3 in a hierarchy
- Preferred label: Use the most widely used name from ISO or FDA. Where the same concept has variant names across sources (e.g. "Continual" vs "Continuous" Machine Learning), record the preferred term in the spreadsheet Term column and list the alternative names in Related Terms
- Minimum one source: Every concept needs at least one authoritative source (ISO, FDA, EMA, MHRA, IMDRF, ITU). DIA Consortium definitions are permitted as a supplement, not a sole source
- Domain balance target: Each domain should have 20–70 concepts. Flag for editorial review if a domain exceeds 70 or falls below 15
- Multi-source = one concept: If two sources define the same idea differently, that is one concept with multiple definitions — not two separate entries. The first definition listed is marked ★ Preferred in the UI. Record alignment degree in the Related Terms field (skos:exactMatch / skos:closeMatch notation)
- Source spreadsheet is authoritative: All edits must go through the source .xlsx file first; the glossary_data.js build output (same shape as knowledge_graph_data.js) is regenerated from it, never edited directly
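Several of the rules above are mechanical enough to check automatically before a spreadsheet edit is accepted. The sketch below is one possible pre-build check, assuming each concept is available as a dictionary with hypothetical `definition` and `sources` fields; the sentence counter is a rough heuristic (abbreviations such as "e.g." will be over-counted), and the thresholds are the ones stated in the rules.

```python
import re

# Authoritative sources named in the "Minimum one source" rule.
AUTHORITATIVE = {"ISO", "FDA", "EMA", "MHRA", "IMDRF", "ITU"}


def sentence_count(text: str) -> int:
    """Rough count: split after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def check_concept(concept: dict) -> list[str]:
    """Return a list of rule violations for one concept (empty if none)."""
    problems = []
    n = sentence_count(concept.get("definition", ""))
    if not 1 <= n <= 3:
        problems.append(f"scope test: definition has {n} sentences, expected 1-3")
    if not AUTHORITATIVE & set(concept.get("sources", [])):
        problems.append("minimum one source: no authoritative source listed")
    return problems


def domains_to_flag(counts: dict[str, int]) -> list[str]:
    """Domains to flag for editorial review: more than 70 or fewer than 15 concepts."""
    return sorted(d for d, n in counts.items() if n > 70 or n < 15)
```

A DIA Consortium definition on its own fails the source check, which matches the rule that it may supplement but not replace an authoritative source. The granularity and preferred-label rules need editorial judgement and are not covered here.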
🔗 Relationship Vocabulary (SKOS + DIA extensions)
Relationships are recorded in the Related Terms field of the source spreadsheet and displayed as clickable chips in the Concept Detail panel. The table below defines the vocabulary to use when populating that field.
Currently implemented — rendered in the UI
| Notation | Meaning | How it appears |
|---|---|---|
| skos:related | Concepts are conceptually linked (default — use for most entries) | Clickable chip in Related Terms; clicking navigates to that concept |
| skos:exactMatch | Same concept defined by two different sources with near-identical meaning | Both definitions appear side-by-side under one concept card; first is ★ Preferred |
| skos:closeMatch | Near-equivalent concepts with meaningful nuance between sources | Both definitions visible in concept detail; distinction noted in source attribution |
Roadmap — defined for future implementation
| Notation | Intended use | Example |
|---|---|---|
| skos:broader | Concept is a sub-type of another (hierarchical parent) | CNN → broader → Neural Network |
| skos:narrower | Concept has more specific sub-types (hierarchical children) | ML → narrower → Supervised ML |
| dia:usedIn | Concept is a component or technique used within another concept | Differential Privacy → usedIn → Federated Learning |
| dia:prerequisite | Concept A must be understood before concept B | Neural Network → prerequisite → CNN |
| dia:regulatoryBasis | Links a concept to a specific regulatory document or article | Validation → regulatoryBasis → FDA 21 CFR Part 11 |
The dia: namespace extensions use OWL-Lite semantics and will enable machine-readable inference once a SPARQL/RDF export layer is added. Until then, record them in the Related Terms field as plain text annotations (e.g. dia:prerequisite: Neural Network) so the intent is preserved for future tooling.
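To show how plain-text annotations could be picked up by later tooling, here is a minimal parsing sketch. It assumes one annotation per entry in the form `notation: target`, as in the example above, and treats an entry with no notation as skos:related (the stated default). How multiple entries are separated within the Related Terms field is not specified here, so the sketch handles a single entry; the function name is hypothetical.

```python
# Notations defined in the two vocabulary tables above.
KNOWN_NOTATIONS = (
    "skos:related", "skos:exactMatch", "skos:closeMatch",
    "skos:broader", "skos:narrower",
    "dia:usedIn", "dia:prerequisite", "dia:regulatoryBasis",
)


def parse_related_term(entry: str) -> tuple[str, str]:
    """Split 'dia:prerequisite: Neural Network' into (notation, target).

    Entries without a recognised notation are assumed to be skos:related.
    """
    text = entry.strip()
    for notation in KNOWN_NOTATIONS:
        prefix = notation + ":"
        if text.startswith(prefix):
            return notation, text[len(prefix):].strip()
    return "skos:related", text


print(parse_related_term("dia:prerequisite: Neural Network"))
print(parse_related_term("Federated Learning"))
```

Matching against a fixed list of notations, rather than splitting on the first colon, keeps concept names that themselves contain a colon intact.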
⚖️ Standards Basis
- W3C SKOS: Primary standard for the domain taxonomy and inter-concept relationships
- ISO 25964: Thesaurus standard; governs synonym handling (skos:altLabel) and cross-reference policies
- ANSI/NISO Z39.19: Controls vocabulary construction: term selection, scope notes, hierarchy depth (max 3 levels recommended)
- OWL-Lite: Used for the dia: relationship extensions requiring machine inference
- ISO/IEC 22989: AI terminology standard; preferred source for AI concept definitions
- ISO/IEC 42001: AI Management System — governance framework referenced throughout