There are two valid paths from the Source Layer into a Semantic Model. The correct path is determined by matching the reporting requirement against the source refresh cadence.
Import via Dataflow is correct when the reporting requirement can be satisfied by data current as of the last scheduled refresh. This covers the large majority of enterprise reporting.
DirectQuery direct to source is correct only when both conditions are true: the requirement is demonstrably near-real-time, and the source updates more than once per day. If only one condition is met, Import is the answer. Dataflows do not support DirectQuery passthrough — DirectQuery always means a Semantic Model connecting directly to the Source Layer.
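The decision rule above is mechanical enough to express in code. A minimal sketch — the function and parameter names are illustrative, not part of the framework:

```python
def choose_storage_mode(near_real_time_required: bool,
                        source_refreshes_per_day: int) -> str:
    """Return the storage mode implied by the framework's decision rule.

    DirectQuery is chosen only when BOTH conditions hold: the requirement
    is demonstrably near-real-time AND the source updates more than once
    per day. In every other case, Import via Dataflow is the answer.
    """
    if near_real_time_required and source_refreshes_per_day > 1:
        return "DirectQuery"
    return "Import via Dataflow"
```

Note that the rule is conjunctive: a near-real-time requirement against a source that refreshes once a day still resolves to Import, because DirectQuery cannot surface data the source has not yet produced.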
A dedicated master Dataflow contains all slow-changing enterprise reference tables consumed across multiple models. No workstream builds its own version of these tables.
Contents include: Dim Calendar, Dim Time, exchange rates, org hierarchy, the user/RLS security mapping, status and code lookups, Dim Geography, and project and workstream metadata.
No workstream builds its own.
A Dataflow is a shared enterprise asset. The design goal is centralised, purpose-built Dataflows that serve multiple models across multiple workstreams.
Before adding a new Dataflow entity, check whether an existing one already surfaces that data. Dataflow sprawl creates the same inconsistency problems at the ETL layer as model sprawl at the model layer: the same table in multiple Dataflows with subtly different shapes and no single point of correction.
Dataflow sprawl is model sprawl at the ETL layer.
Import mode brings data into the model's in-memory store. Its performance advantages depend entirely on what is being imported. A model that pulls full transaction history at the lowest grain — all years, all products, all geographies — is not leveraging Import mode; it is fighting it.
Scope the import deliberately: pull only the columns required for reporting, apply the tightest viable row filters, and where history is needed, aggregate at the source before loading. As an example: last 7 days at daily grain, weeks 2–13 at weekly snapshot, beyond 13 weeks at monthly snapshot. The right tiers depend on the reporting requirement. Aggregate at source — do not import raw history at full grain.
Aggregation happens at source. Not in Power BI.
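The tiering in the example above can be sketched as a simple age-to-grain assignment. The boundaries below mirror the example but are illustrative — the right tiers depend on the reporting requirement, and the aggregation itself runs in the source platform, not in Power BI:

```python
from datetime import date

def grain_tier(record_date: date, today: date) -> str:
    """Assign a history tier to a record based on its age.

    Mirrors the example tiers: last 7 days at daily grain, weeks 2-13 at
    weekly snapshot, older history at monthly snapshot. Boundaries are
    illustrative, not prescribed by the framework.
    """
    age_days = (today - record_date).days
    if age_days < 7:
        return "daily"
    if age_days < 13 * 7:
        return "weekly"
    return "monthly"
```

In practice this logic would live in the source platform's aggregation views or jobs; the sketch only makes the tier boundaries explicit.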
Incremental refresh partitions a table into a rolling window that refreshes on every cycle and a historical archive that does not. When a model reaches the point where full refresh becomes a bottleneck — consuming excessive capacity, blocking other workloads, or exceeding acceptable refresh windows — incremental refresh is the designed solution.
The trigger is operational, not size-based. A model refreshing in 8 minutes on available hardware does not need incremental refresh regardless of row count. A model taking 3 hours and blocking downstream workloads does. More hardware is sometimes the answer — but incremental refresh should be evaluated before scaling capacity, because it solves the problem at the data design level rather than the infrastructure level.
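Conceptually, incremental refresh splits a table's partitions into two groups: an archive loaded once and a rolling window refreshed on every cycle. A minimal sketch of that split — the real policy is configured declaratively in the model, not computed in code:

```python
def split_partitions(partition_keys: list, rolling_window: int):
    """Split ordered partition keys (oldest first) into the historical
    archive, which is loaded once and left alone, and the rolling window,
    which refreshes on every cycle.

    Illustrative only: a real policy reads like 'archive 5 years,
    refresh last 3 months' and is enforced by the service.
    """
    archive = partition_keys[:-rolling_window]
    rolling = partition_keys[-rolling_window:]
    return archive, rolling
```

The operational gain follows directly from the split: each cycle touches only the rolling partitions, so refresh time scales with the window, not with total history.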
Reports connect to semantic models via Live Connection only. They carry no embedded model, no transformation logic, and no direct source connections. A report is a presentation layer — it arranges certified data into a visual narrative for a specific audience.
Because reports connect via Live Connection, they never need to be refreshed. The moment the semantic model refreshes, every report built on it reflects the updated data automatically. Reports are always current.

Any exception to the thin report pattern should be reviewed and understood by the developer before being accepted as a deliberate design choice.
Exceptions to the thin pattern are valid when understood and deliberate.
Report-specific calculations live in a dedicated table — named, for example, Report Measures — and carry a distinguishing prefix — for example (R). The specific names are examples; the team should agree on a convention and apply it consistently. Model objects carry no prefix — they are canonical and certified. This convention allows any developer to immediately identify what is local to this report versus what is part of the certified model.
Report-local measures are a valid pattern for presentation-layer calculations — formatting, report-specific comparisons, visual-specific aggregations. They are not a pathway to embed business logic that belongs in the model.
Report-local measures: dedicated table + distinguishing prefix (names are examples — agree a team convention).
Before a single visual is placed on a canvas, the following decisions must be documented: the report's primary question, its intended audience, the grain of the data it presents, the key metrics it must surface, and the interaction model users will follow. Layout follows from these decisions.
Reports built without upfront design decisions are built around what the data contains rather than what the audience needs. The result is a canvas full of visuals that answers no clear question. Redesigning after layout is expensive. Designing before layout costs nothing.
All decided before the first visual is placed.
Including a dedicated page in every published report — named, for example, About This Report — is strongly recommended. It serves as the report's single source of supporting information and is worth including as a DEV → UAT checklist item; the payoff is significant when reports are handed over or revisited months later.
For operational reports where users may screenshot or export individual pages, a lightweight Data as of [timestamp] label is included on each relevant page. This handles the data currency requirement without cluttering the main canvas.
A single enterprise theme JSON file defines the visual language for all reports in the programme: colour palette, font family, font sizes, background colours, visual border styles, and default visual properties. Every report imports and applies this theme file. No developer overrides theme settings manually at the visual level.
Without a centralised theme, ten developers produce ten visual styles. Users cannot tell whether two reports belong to the same programme. Visual inconsistency destroys trust faster than data inconsistency because users see it immediately. The theme file should live in a shared location accessible to all developers — the Git repository is the natural choice when reports are already version-controlled there, but a shared drive or designated folder are valid alternatives, provided access is consistent and the file is not duplicated.
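A theme file of this kind might start from a skeleton like the following. The keys `name`, `dataColors`, `background`, `foreground`, and `textClasses` are standard entries in the Power BI report theme format; the specific values are placeholders for the programme's own palette, not a recommendation:

```json
{
  "name": "Programme Enterprise Theme",
  "dataColors": ["#1F6FB2", "#E8A33D", "#4C9F70", "#B5495B"],
  "background": "#FFFFFF",
  "foreground": "#252423",
  "textClasses": {
    "title": { "fontFace": "Segoe UI Semibold", "fontSize": 14 },
    "label": { "fontFace": "Segoe UI", "fontSize": 10 }
  }
}
```

Default visual properties are added under a `visualStyles` section as the standard matures; starting minimal keeps the file reviewable in a pull request.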
Reports are built using Microsoft's native visual library by default. Third-party visuals from AppSource are maintained by independent developers, not Microsoft. They update on their own schedule, can break after Power BI Desktop updates, run code inside the report, and introduce external dependencies into certified production content.
If a genuine business requirement cannot be met by the native library, a custom visual may be proposed to the CoE for evaluation. The CoE assesses stability, certification status, update cadence, and security posture before approving. Approved visuals are added to a programme-level approved list. Nothing outside that list enters a certified report.
Accessibility is not a post-launch checklist item — it is part of the design standard applied from the first visual placed on a canvas. Every certified report must meet four enforceable requirements verifiable in Power BI Desktop without external tooling.
Every visual carries a meaningful title and alt text — these are the primary mechanism by which screen readers communicate chart content to users who cannot see the canvas. No information is conveyed by colour alone — icons, labels, or patterns accompany colour coding so that the report is fully readable for colour-blind users. Contrast ratios meet the WCAG 4.5:1 minimum, which is enforced structurally through the enterprise theme file (R-05) rather than checked visual-by-visual. Tab order is set deliberately in the Selection Pane, defining a logical reading sequence for keyboard navigation users.
These four requirements are UAT checklist items alongside the security design (M-04) and the About page (R-04). A report that fails any of them is not eligible for promotion.
Verified at UAT. Failure blocks promotion.
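The 4.5:1 contrast requirement follows the WCAG 2.x relative-luminance formula, which can be verified in a few lines of code — useful when curating the theme palette (R-05). Function names here are illustrative:

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def contrast_ratio(rgb_fg, rgb_bg) -> float:
    """WCAG contrast ratio between two colours: (L1 + 0.05) / (L2 + 0.05),
    where L1 is the lighter relative luminance."""
    def luminance(rgb):
        r, g, b = (_linear(c) for c in rgb)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b
    l1, l2 = sorted((luminance(rgb_fg), luminance(rgb_bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def meets_aa(fg, bg) -> bool:
    """True when the pair meets the 4.5:1 minimum for normal text."""
    return contrast_ratio(fg, bg) >= 4.5
```

Running every foreground/background pairing in the theme file through a check like this turns the contrast requirement into a one-off structural verification rather than a visual-by-visual review.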
All business logic — joins, aggregations, calculations, cleansing, and rule application — belongs in the source platform. Power BI is a consumption and presentation layer. Complex M or DAX that replicates upstream logic is a defect, not a solution.
In enterprise implementations, the source must be a governed data platform — a database, lakehouse, warehouse, or equivalent system with ownership, access control, and auditability. Files are not an enterprise source. Data arriving as files must be loaded into the source platform before Power BI touches it. Dependency on files as a data source is one of the most reliable ways to destroy a data organisation.
Load them into a governed platform first.
The architecture enforces a three-layer sequence within the Power BI perimeter: ETL, Modelling, Visualisation. Each layer has a defined role. Dataflows add governed ingestion and shape. Semantic models add business logic, security, and reuse. Reports add presentation.
Skipping a layer is permitted in named, bounded circumstances — for example, a report reading directly from a Dataflow for a single purpose with a small audience. But skipping means explicitly giving up what that layer provides: reuse, RLS, certified metrics, and the finite-to-infinite principle. That trade-off must be acknowledged, justified, and logged.
Skipping a layer is a documented trade-off, not a shortcut.
Before any new semantic model is built, developers check the CoE model register via the Data Hub. Model sprawl — every workstream building its own version of the same data with subtly different metric definitions — is the most common failure in large Power BI deployments.
Semantic models are scoped to a business domain — Finance, Supply Chain, HR — not to a workstream, team, or report. A model scoped narrowly to one workstream's current needs will be duplicated the moment the next workstream needs similar data. Domain scoping prevents this by design.
Domains are defined by the CoE before the first model workspace is created. Domain definition cannot be deferred.
Every semantic model implements a strict star schema. Relationships are one-to-many from dimension to fact. Filters flow one way only: dimension to fact. Dimension tables contain unique, complete values. Fact tables are grain-consistent.
Many-to-many relationships should be avoided — they are a primary source of model performance problems and incorrect DAX results. Bidirectional filters are permitted only when the developer understands their impact on filter propagation and has validated that the behaviour is intentional. If a many-to-many relationship appears to be required, the correct first response is to review whether the data model can be redesigned to avoid it.
Row-Level Security and Object-Level Security are defined in the semantic model. Reports inherit these controls automatically through Live Connection. Security is never compensated for by hiding pages, filtering at report level, or restricting visual access.
Every model must have a documented security design before promotion to UAT. RLS is validated with test users before any PROD deployment. The User/RLS mapping table from the Master Dataflow is the authoritative source for role membership — it is never hardcoded inside a DAX rule.
Where a model contains PII fields, the security design must explicitly document which roles can see which fields, and the RLS design must enforce those boundaries at the model layer.
PII access boundaries enforced at the model layer.
A finite number of certified semantic models supports an unlimited number of reports. This is the stated architectural goal — not a consequence to be discovered later, but a design target held from the start. Every report built on a certified model inherits its security, grain, metric definitions, and refresh.
The model is the investment. Reports are the return on it. A team that builds one well-designed certified model and builds twenty reports against it has achieved something. A team that builds twenty models for twenty reports has built twenty maintenance obligations.
Certified once. Consumed without limit.
The semantic model diagram is not a byproduct of the order in which tables were added. It is a designed, maintained artefact that communicates the model's structure at a glance. Any developer joining six months in should be able to audit relationship direction, verify grain, and understand filter flow without clicking through individual objects.
An unmanaged diagram is a maintenance liability. When tables accumulate organically, relationships cross each other, filter direction becomes ambiguous, and schema reviews require significant investigation. The diagram must be updated whenever a table or relationship is added or changed.
Auditable without clicking. Updated with every structural change.
Every certified semantic model includes a mandatory companion report that documents the model for developers and analysts who consume it. The companion report is a certification requirement.
A dedicated validation workspace — [Program] | Models | Validation — contains cross-environment comparison reports, one per certified model, that surface differences between DEV, UAT, and PROD: record counts, key measure values, and row-level samples. This workspace has no environment tier because it is a cross-environment tool, not a stage in the promotion chain.
Before any PROD promotion, the validation report must confirm alignment across five standard checks: row counts reconcile to the Databricks Gold source; all RLS roles have been tested with named test accounts; no broken, inactive, or unapproved many-to-many relationships exist; the model has completed at least one successful refresh in DEV; and the companion report (M-07) renders without errors in the DEV workspace.
All five verified before PROD promotion. Unexplained divergence blocks promotion.
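The divergence check itself is simple to sketch. Assuming each environment's validation results are collected as check-name/value pairs (the shapes and names below are illustrative, not the actual report implementation):

```python
def compare_environments(dev: dict, uat: dict, prod: dict) -> list:
    """Return a list of divergences across environments for the standard
    checks (row counts, key measure values, and so on).

    Any unexplained divergence blocks promotion; an empty list means the
    environments align on every check present in DEV.
    """
    divergences = []
    for check in dev:
        values = {"DEV": dev[check], "UAT": uat.get(check), "PROD": prod.get(check)}
        if len(set(values.values())) > 1:
            divergences.append(f"{check}: {values}")
    return divergences
```

A missing check in UAT or PROD surfaces as a `None` value and therefore as a divergence, which is the desired behaviour: absence of evidence blocks promotion too.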
Every measure in a certified semantic model must have a Description property populated in plain business English. The description states what the measure calculates, what grain it operates at, what filters it assumes, and how it should be interpreted. Measures without descriptions are not eligible for certification.
A developer inheriting a model with undescribed measures cannot confidently reuse or extend them. They will either misuse them or duplicate them — both outcomes undermine the principle that a certified model is the single source of truth.
A certified semantic model is the foundation for self-service analytics — users connecting via Excel, Analyze in Excel, or Explore This Data should encounter a clean, easy-to-explore field list that makes sense to a business user. Foreign keys, internal sort columns, technical IDs, bridge table columns, and any field that exists only to support model mechanics must be hidden from the report view.
The test is simple: if a business user connecting to this model in Excel would be confused or misled by seeing a field, it should not be visible. The visible field list is the model's public API. It should be curated with the same care as the measures themselves.
The field list is the model's public API.
Everything is built in .pbip format and stored in Git. TMDL files define the semantic model — each table, role, and perspective in a separate plain-text file. PBIR defines the report layout — each page and visual in its own file. Both formats are human-readable, Git-diffable, and reviewable in a pull request. If Git and the Power BI Service disagree, Git wins.
A .pbix file is a binary — it cannot be diffed, reviewed, or meaningfully version-controlled. Its use is incompatible with this framework.
Access control enforces the promotion discipline. Developers have Admin rights in DEV workspaces, Contributor in UAT, and Viewer in PROD. Promotion is handled by Deployment Pipelines, not by manual republishing.
DEV → UAT is self-serve for developers within pre-approved conditions. UAT → PROD is admin-only, on a defined schedule, and requires documented business sign-off. Once a report is in PROD, every change to it passes through the full DEV → UAT → PROD chain. There are no hotfix exceptions that bypass the chain without admin approval and documentation.
Pipelines promote. Developers do not.
Changes to semantic models and reports should go through a feature branch and a pull request with at least one reviewer — including small fixes, measure renames, and visual adjustments. Establishing this habit from the first day of development, rather than introducing it at go-live, makes it a natural part of the workflow rather than an overhead.
In production, a direct commit that bypasses review is not just a process violation — it is a live risk. Reports in PROD are being read by real users making real decisions. The pull request is the last line of defence before a change reaches them. Feature branches are short-lived. A branch open for more than two to three days without a merge needs a conversation.
This applies from day one. Not from go-live.
A Power BI .pbip project generates files that must never enter the Git repository. These files are local, machine-specific, or contain cached data — committing them causes merge conflicts, bloats the repository, and can expose sensitive configuration or credentials to every team member.
Power BI Desktop generates a default .gitignore covering these files — do not delete or override it.
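For illustration, the generated ignore file contains patterns of this kind — exact contents depend on the Power BI Desktop version, so treat these entries as representative rather than authoritative:

```
# Machine-local settings and cached data -- never committed
**/.pbi/localSettings.json
**/.pbi/cache.abf
```

If a pattern like these is missing from a repository, the safest response is to regenerate the project files from Power BI Desktop rather than hand-edit the list.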
All workspaces follow the pattern [Program] | [Layer] | [Domain] | [Environment]. Spaces around pipe characters. Proper Case throughout. No abbreviations, no shorthand, no personal naming choices.
Consistency is functional, not aesthetic. A developer or admin joining six months in should be able to navigate the workspace list without a guide. Inconsistent naming creates shadow workspaces, orphaned content, and governance gaps that accumulate silently.
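A naming convention this strict can be machine-checked. A hypothetical validator for the pattern — the exact rules (Proper Case words, single spaces around pipes, four segments) are encoded here as assumptions:

```python
import re

# [Program] | [Layer] | [Domain] | [Environment]
# Each segment: one or more Proper Case (or all-caps, e.g. DEV) words.
WORKSPACE_PATTERN = re.compile(
    r"^[A-Z][A-Za-z0-9]*(?: [A-Z][A-Za-z0-9]*)*"            # Program
    r"(?: \| [A-Z][A-Za-z0-9]*(?: [A-Z][A-Za-z0-9]*)*){3}$"  # 3 more segments
)

def is_valid_workspace_name(name: str) -> bool:
    """True when the workspace name follows the four-segment convention."""
    return bool(WORKSPACE_PATTERN.match(name))
```

A check like this can run against the workspace list from the Admin API on a schedule, so naming drift is caught before it becomes a shadow workspace.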
Refresh schedules, orchestration chains, and failure handling are designed before the first model reaches UAT — not retrofitted after PROD deployment. The default pattern is a staggered Import refresh chain: source job completes → triggers Dataflow refresh → Dataflow completion triggers Semantic Model refresh via API or Power Automate.
Domain Dataflows run after their upstream source job completes. Semantic Models run after their Dataflow completes. Reports are always current via Live Connection and have no refresh schedule of their own.
Reports never refresh — they are always current.
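The Dataflow-to-model handoff in this chain can be fired through the Power BI REST API's dataset-refresh endpoint. A sketch that builds (but deliberately does not send) that call — workspace and dataset IDs, token acquisition, and the surrounding orchestration are left to the implementation:

```python
import json
from urllib import request

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"

def dataset_refresh_request(group_id: str, dataset_id: str, token: str):
    """Build the REST call a Dataflow-completion event would fire to start
    the downstream semantic model refresh.

    Returned unsent so the orchestration layer (Power Automate, a source
    job callback, or a scheduler) decides when it actually runs.
    """
    url = f"{POWER_BI_API}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    return request.Request(
        url,
        data=json.dumps({"notifyOption": "MailOnFailure"}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

The `notifyOption` setting ties this back to the alerting principle: even an API-triggered refresh should fail loudly, not silently.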
Capacity assignments are reviewed quarterly against usage data from the CoE monitoring report. Over-provisioned DEV capacity wastes budget. Under-provisioned PROD capacity degrades performance for users. Neither is acceptable as a steady state.
The CoE maintains a capacity monitoring report covering: refresh duration trends, query performance by workspace, peak vs off-peak usage, and capacity headroom. Decisions to scale up or reassign capacity are data-driven, not reactive.
Capacity assignment is a deliberate design decision made at workspace creation. Development and production workloads never share capacity. DEV runs on lower SKUs than PROD. The ETL workspace requires Fabric capacity — Pro is not acceptable for production ETL workloads. Deployment Pipelines require Fabric capacity on all three workspace tiers; this is a hard Microsoft constraint and must be accounted for before the first pipeline is created.
ETL: Fabric capacity required. Pipelines require Fabric capacity on all three tiers.
Every production semantic model has a configured refresh failure alert delivered to a named owner within minutes of failure. Refresh failures are silent by default in Power BI unless alerts are configured. In production, a failed refresh means users are working with stale data — and in most cases they do not know it.
A refresh failure is not a background event to be discovered during the next manual check. It is an incident. It has a named owner. It has a response expectation. The alert is the trigger that starts the clock.
Silence is not an acceptable default in production.
Reports and models outlive their creators. Without active lifecycle management, the environment accumulates artefacts that nobody uses — but that still consume capacity, still refresh on schedule, still appear in the Data Hub, and still create the impression that they are current and trusted.
The CoE conducts quarterly R.O.T.C. reviews using usage metrics from the Admin API. Content with zero active users for 90 days is flagged. Owners are notified. Content is either recommissioned with a new owner or retired. Retired content is archived, not deleted — it can be recovered if needed.
Archive, don't delete. Ownership transfer before retirement.
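The flagging step of the quarterly review reduces to a filter over usage data. The artefact shape below is an assumption about how the Admin API extract might be stored, not a description of the actual report:

```python
from datetime import date, timedelta

def flag_for_review(artefacts: list, today: date, threshold_days: int = 90):
    """Return names of artefacts with zero active users inside the window.

    Each artefact is assumed shaped like {"name": ..., "last_active":
    date | None}. Flagged content goes to its owner for a recommission-
    or-retire decision; retired content is archived, never deleted.
    """
    cutoff = today - timedelta(days=threshold_days)
    return [a["name"] for a in artefacts
            if a["last_active"] is None or a["last_active"] < cutoff]
```

Content that has never been opened (`last_active` of `None`) is flagged alongside content that has gone quiet, since both create the same false impression of being current and trusted.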
The Certified endorsement is applied by CoE admins only, after a formal review against a defined checklist: schema, naming conventions, security design, performance, documentation, metric descriptions, companion report, and ownership registration. Promoted is used for stable workstream-level content not yet ready for enterprise certification.
Certified model descriptions in the Data Hub include: domain, consuming workstreams, grain, data range, owner name, and Git repository link. The Data Hub is the discovery surface — developers check it before any new model build.
Promoted: stable but not enterprise-certified. Uncertified: workspace-only.
All source connections use shared service account tokens managed by the CoE. No personal credentials appear in any connection in any environment. Credentials are stored at the gateway or Dataflow level only — never in .pbip files, never committed to Git.
When a developer leaves the team, no connection breaks. When a service account token is rotated, it is rotated in one place and propagated everywhere. Personal credentials create invisible single points of failure.
Personally Identifiable Information — names, IDs, contact details, location data, or any field that could identify an individual — may appear in reports where there is a documented business use case and appropriate access controls. Its presence is not prohibited; its presence without documentation and without enforced access controls is.
The security design for any model containing PII must explicitly identify which fields are PII, which RLS roles can see them in full, which roles see them masked, and the legal or regulatory basis for each access grant. GDPR and equivalent regulations are constraints, not context — they shape the security design before any data is loaded.
GDPR is a constraint on design.
The CoE maintains an Ownership Register — a live document listing every certified model and report, its current designated owner, and the date ownership was last confirmed or transferred. Every certified artefact has exactly one named owner at all times.
Ownership is a role, not a permanent assignment. An owner can hold multiple models. An owner can transfer models when they change roles or leave the team. The CoE processes ownership transfers formally — no certified artefact is ever left unowned, and ownership is never assumed to be shared between multiple people. Shared ownership is unowned ownership.
Ownership transfers formally.
Any deviation from a principle in this framework — whether architectural, structural, or procedural — must be requested in writing, reviewed by the CoE, approved or rejected with a documented rationale, and logged in the exceptions register. Informal workarounds discovered after the fact are not exceptions — they are violations.
In a multi-developer team, an undocumented exception made by one developer becomes precedent for every developer who follows. The framework erodes principle by principle, each deviation enabled by the last. The exception process exists not to block legitimate needs but to ensure that every deviation is a conscious choice made by someone accountable for it.
Documentation has two distinct obligations before DEV → UAT promotion.
The repository obligation: every .pbip repository contains a CHANGELOG.md updated on every pull request. Format: date | developer | change type (New / Enhancement / Fix / Breaking) | plain business English description of what changed and why. A completed changelog entry is mandatory before DEV → UAT promotion. Git history is the technical record. The changelog is the human record. Both are required.
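The changelog line format lends itself to an automated pre-promotion check. A hypothetical validator — the ISO date format is an assumption, since the framework specifies only `date | developer | change type | description`:

```python
import re

# date | developer | change type (New / Enhancement / Fix / Breaking) | description
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \| [^|]+ \| (New|Enhancement|Fix|Breaking) \| .+$"
)

def is_valid_entry(line: str) -> bool:
    """True when a CHANGELOG.md line follows the four-field convention."""
    return bool(ENTRY.match(line))
```

Wired into the pull request pipeline, a check like this makes the "completed changelog entry before DEV → UAT promotion" rule enforceable rather than aspirational.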
The in-model obligation: the semantic model itself is a documented artefact. Every measure has a Description property populated in plain business English. Every table has a description. Display folders organise the field list into logical groupings. Foreign keys, bridge columns, and any field that exists only to support model mechanics are hidden from the report view. The visible field list is the model's public API and must be curated accordingly. A developer joining six months in should be able to navigate the model and understand every visible object without asking anyone.
Both are required.
The standards document, naming conventions, Git workflow guide, onboarding checklist, domain register, exception request process, and ownership register all live in /docs in the main repository. Git is the single source of truth for documentation as well as code.
A new developer onboarding to the project follows one step: clone the repository and read /docs/onboarding.md. If that document is not sufficient to get them productive, it is incomplete — not the developer's fault.
If it is not in Git, it does not officially exist.