Data Lineage Graph

Overview

The lineage graph is a machine- and human-readable map of how data flows through your platform — from sources, through Bronze / Silver / Gold, into the Tabular model, including notebook-driven transformations. The EasyFabric Generator emits it as a set of small *-graph.json files (one per producer) that a bundled viewer and query script union into a single view by node id. Use it to answer "what feeds this table?", "where is this column used?", and "is this picture still up to date?" — for impact analysis, onboarding, or as grounding context for an LLM.

What gets generated

Stage | Component | Output
Build | EasyFabric Generator (EFG): GenerateFabricObjectsGraph, GenerateModelGraph, GenerateFabricDatamartGraph, GenerateWiki | *-graph.json + viewer.html + explain_graph.py

The API-produced files come from calling the Generator API in graph mode (the x-output-type: graph request header) instead of normal SQL/item generation; the client wiki step writes the fourth. All four files land side by side in the wiki output folder:

File | Producer | What's in it
fabric-graph.json | API (GenerateFabricObjects in graph mode) | source / bronze / silver / history table nodes + ingest & lineage edges
model-graph.json | API (GenerateModel in graph mode) | model-object nodes + Reference (model) edges
datamart-graph.json | API (GenerateFabricDatamart in graph mode) | gold table nodes + gold → model-object edges
notebook-graph.json | Client (WikiGenerator) | notebook nodes + silver → gold lineage parsed from SQL

Every file shares the same canonical node-id convention, so an edge written by one producer resolves against a node written by another with no mapping. A partial build still works — if you only enabled the client wiki step you get notebook-graph.json alone, and the union simply covers fewer producers.

Generating the graph

Graph mode is opt-in per build. Add the graph build steps to your Generator build configuration (e.g. Generator/local.yaml); they run only when listed:

- BuildName: "DM - Fabric objects (graph)"
  BuildType: GenerateFabricObjectsGraph
  ConfigFile: "Dataplatform\\DP\\platform.local.json"
  BuildIndex: 140

- BuildName: "DM - Model (graph)"
  BuildType: GenerateModelGraph
  ConfigFile: "Dataplatform\\DP\\platform.local.json"
  BuildIndex: 150

- BuildName: "DM - Datamart (graph)"
  BuildType: GenerateFabricDatamartGraph
  ConfigFile: "Dataplatform\\DP\\platform.local.json"
  BuildIndex: 160

GenerateWiki (the client wiki step) writes notebook-graph.json and copies viewer.html + explain_graph.py next to the graph files, so the folder is self-contained and ready to open or query.

Viewing the graph (viewer.html)

viewer.html is a self-contained, dependency-free web page that renders the merged graph as an interactive left-to-right medallion diagram (source → bronze → silver → gold → model). It has a search box, a per-node detail panel (columns, primary keys, upstream / downstream edges with their evidence), pan / zoom, and a Sources button listing which *-graph.json files contributed to the current view.

Because browsers block a page opened via file:// from reading the JSON fragments, serve the folder over HTTP — the query script does this for you:

python explain_graph.py serve            # serves the folder and opens viewer.html
python explain_graph.py serve --port 9000 --no-browser

Alternatively, open viewer.html directly and use the Load *-graph.json button — it is multi-select, so you can pick all of your graph files at once and the viewer unions them.
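
If the bundled script is not at hand, the serve step above can be approximated with Python's standard library. This is a minimal sketch, not the actual implementation of explain_graph.py serve; the function name serve_folder and the default port are assumptions made for illustration.

```python
import functools
import http.server
import threading


def serve_folder(folder: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `folder` over HTTP on localhost from a background thread.

    Hypothetical helper: it only illustrates why serving over HTTP works
    where file:// does not (the page can then fetch the *-graph.json files).
    """
    # Bind the stock static-file handler to the graph folder.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=folder
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread so the server does not keep the interpreter alive.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling serve_folder on the wiki output folder, viewer.html is reachable at http://127.0.0.1:8000/viewer.html; call server.shutdown() when done.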

Querying the graph (explain_graph.py)

explain_graph.py reads every *-graph.json in its folder (override with --graph-dir), unions them, and runs the requested query against the merged view. Single-file invocation works too — pass --graph-dir path/to/one-graph.json.

python explain_graph.py sources                               # which files got merged
python explain_graph.py search Customer                       # substring search on id / name
python explain_graph.py info silver.dbo.customers             # node detail + direct dependencies
python explain_graph.py find-column customer_id               # tables/objects carrying that column
python explain_graph.py trace silver.dbo.customers --depth 5  # full upstream + downstream
python explain_graph.py warnings --query gold.D_Sales         # filter unresolved-lineage warnings
python explain_graph.py status --dp <DP-folder>               # freshness check (see below)

Every command operates on the merged view, so a trace starting at a Silver table follows the notebook → Gold edges in notebook-graph.json and lands on gold tables declared in datamart-graph.json without you having to know which file holds which node.
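
To make the idea of tracing across files concrete, here is a rough sketch of a depth-limited upstream and downstream walk over a merged edge list. It is an illustration only and may differ from what explain_graph.py trace does internally; the function name trace and the sample edges in the usage note are made up.

```python
from collections import deque


def trace(edges, start, depth=5):
    """Return (upstream, downstream) sets of node ids within `depth` hops of `start`.

    `edges` is a list of dicts with "from" and "to" keys holding canonical
    node ids, as they would appear after the per-file graphs are unioned.
    """
    forward, backward = {}, {}
    for edge in edges:
        forward.setdefault(edge["from"], set()).add(edge["to"])
        backward.setdefault(edge["to"], set()).add(edge["from"])

    def walk(adjacency):
        # Breadth-first search that stops expanding once `depth` hops are used.
        seen, queue = set(), deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == depth:
                continue
            for nxt in adjacency.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return seen

    return walk(backward), walk(forward)
```

Given edges bronze.dbo.customers → silver.dbo.customers → nb.gold.D_Klant → gold.dbo.d_klant (hypothetical sample data), tracing from silver.dbo.customers yields the bronze table upstream and both the notebook and the gold table downstream, regardless of which file contributed each edge.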

Is this graph fresh?

Each *-graph.json carries an optional top-level metadata block, stamped by the client at write time, so a consumer can tell whether the graph still reflects your Dataplatform/DP before trusting it:

Field | Notes
generatedAt | ISO-8601 UTC timestamp of when the graph was written
sourceCommit | best-effort short git SHA of HEAD at write time; omitted outside a git repo
generator | producer id: fabric-graph, model-graph, datamart-graph, or notebook-graph

Run the bundled freshness check, pointing it at your Dataplatform folder:

python explain_graph.py status --dp <path-to-Dataplatform>

It prints a FRESH / STALE / UNKNOWN verdict per file plus an overall verdict (showing each file's producer, age, and git SHA), and exits non-zero when anything is STALE. The staleness rule is intentionally simple and needs no API and no credentials — anything returned by

find <path-to-Dataplatform> -type f -newermt "<generatedAt>"

means a DP file was edited after the graph was written, so the graph is stale and should be regenerated. (When the working tree is clean, git diff --quiet <sourceCommit>..HEAD -- <DP> answers the same question against committed history.) A file without metadata reports UNKNOWN — older artifacts predate the stamp and still load.
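
The staleness rule above can be sketched in portable Python for environments without find. This is an approximation under stated assumptions, not the code of explain_graph.py status: the function names are hypothetical, and generatedAt is assumed to be in a form datetime.fromisoformat accepts once a trailing Z is normalised.

```python
import os
from datetime import datetime, timezone


def files_newer_than(dp_folder: str, generated_at: str) -> list[str]:
    """List files under `dp_folder` modified after the `generatedAt` stamp."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards.
    stamp = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    if stamp.tzinfo is None:
        # The metadata block documents generatedAt as UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    cutoff = stamp.timestamp()
    newer = []
    for root, _dirs, files in os.walk(dp_folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > cutoff:
                newer.append(path)
    return sorted(newer)


def verdict(dp_folder: str, metadata) -> str:
    """Map one graph file's metadata block to FRESH / STALE / UNKNOWN."""
    if not metadata or "generatedAt" not in metadata:
        return "UNKNOWN"  # older artifacts predate the stamp
    newer = files_newer_than(dp_folder, metadata["generatedAt"])
    return "STALE" if newer else "FRESH"
```

Like the find command, this compares modification times only, so it needs no API access and no credentials.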

Graph file structure

Each *-graph.json is a GraphResult document; all property names serialize as camelCase and null optional fields are omitted:

{
  "version": 1,
  "metadata": { "generatedAt": "...", "sourceCommit": "...", "generator": "..." },
  "nodes": [ /* GraphNode[] */ ],
  "edges": [ /* GraphEdge[] */ ],
  "warnings": [ "free-form string" ]
}

  • nodes: keyed by id. type is one of source, bronze, silver, gold, model-object, notebook. Table nodes carry columns (with sourceColumn, sourceDataType, isPrimaryKey).
  • edges: from / to (canonical node ids), kind (ingest, lineage, or model), and evidence (a free-form trace such as a SQL fragment or file path).
  • warnings: free-form strings, e.g. an unresolved lineage target.
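
As one way to put the node shape to use, a column lookup in the spirit of find-column could look like the sketch below. The column key name and the case-insensitive comparison are assumptions made for illustration; the bundled script may match differently.

```python
def find_column(nodes, column):
    """Return the ids of nodes that carry a column called `column`.

    Assumes each column is a dict with a "name" key (hypothetical) and
    compares names case-insensitively.
    """
    wanted = column.lower()
    return sorted(
        node["id"]
        for node in nodes
        if any(
            col.get("name", "").lower() == wanted
            for col in node.get("columns", [])
        )
    )
```

Nodes without a columns list, such as notebooks, are simply skipped.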

The viewer and the query script merge files the same way: nodes are unioned by id (first writer wins on scalar fields; columns unioned by name), edges are deduped on the (from, to, kind, evidence) tuple, and warnings are deduped with first-seen order preserved.
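
The merge rule just described can be sketched as follows. This is a minimal reading of that rule, not the code shipped in viewer.html or explain_graph.py: the function names, the column key name, and the choice of sorted file order to decide who the first writer is are all assumptions.

```python
import json
from pathlib import Path


def load_graphs(folder):
    """Load every *-graph.json in `folder`, in sorted filename order (assumed)."""
    paths = sorted(Path(folder).glob("*-graph.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def merge_graphs(docs):
    """Union GraphResult documents into one view."""
    nodes, edges, warnings = {}, [], []
    seen_edges, seen_warnings = set(), set()
    for doc in docs:
        for node in doc.get("nodes", []):
            merged = nodes.setdefault(node["id"], {"id": node["id"]})
            for key, value in node.items():
                if key != "columns":
                    merged.setdefault(key, value)  # first writer wins
            if "columns" in node:
                columns = merged.setdefault("columns", [])
                known = {col["name"] for col in columns}
                for col in node["columns"]:
                    if col["name"] not in known:  # union columns by name
                        columns.append(col)
                        known.add(col["name"])
        for edge in doc.get("edges", []):
            key = (edge["from"], edge["to"], edge.get("kind"), edge.get("evidence"))
            if key not in seen_edges:
                seen_edges.add(key)
                edges.append(edge)
        for warning in doc.get("warnings", []):
            if warning not in seen_warnings:  # keeps first-seen order
                seen_warnings.add(warning)
                warnings.append(warning)
    return {"nodes": list(nodes.values()), "edges": edges, "warnings": warnings}
```

Calling merge_graphs(load_graphs(folder)) on a folder with only notebook-graph.json still works; the result just covers fewer producers.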

Node-id convention

Because the union is by string equality, every producer formats ids identically:

Node category | Id format | Example
table / history table | <layer>.<schema>.<name> (all lowercased) | silver.dbo.customers, bronze.his.customers
model object | model.<name> (name as authored) | model.Verkopen
notebook | nb.<layer>.<name> (layer lowercased) | nb.gold.D_Klant

Lowercasing table ids lets the client resolve a notebook SQL reference like FROM Silver.dbo.Customers against an API-emitted silver.dbo.customers node with no case-insensitive matching. Edge kinds are ingest (source → bronze), lineage (SQL FROM/JOIN-style links between tables / notebooks), and model (Tabular relationships between model objects).
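
The id formats in the table above can be written down as three small helpers. The function names are hypothetical and shown only to illustrate the convention; the authoritative contract remains the README referenced below.

```python
def table_id(layer: str, schema: str, name: str) -> str:
    """Table and history-table ids: every part is lowercased."""
    return f"{layer}.{schema}.{name}".lower()


def model_id(name: str) -> str:
    """Model-object ids keep the name exactly as authored."""
    return f"model.{name}"


def notebook_id(layer: str, name: str) -> str:
    """Notebook ids lowercase the layer only; the name keeps its casing."""
    return f"nb.{layer.lower()}.{name}"
```

With these, a SQL reference such as Silver.dbo.Customers maps to table_id("Silver", "dbo", "Customers"), which is the same string an API-emitted node would carry, so plain string equality is enough for the union.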

The full shape and id contract is the source of truth in apps/api-generator/ApiGenerator/Model/Graph/README.md; the client-side guide lives in apps/Generator/Generator/Wiki/README.md.