Data Lineage Graph

Overview

The lineage graph is a machine- and human-readable map of how data flows through your platform — from sources, through Bronze / Silver / Gold, into the Tabular model, including notebook-driven transformations. The EasyFabric Generator emits it as a set of small *-graph.json files (one per producer) that a bundled viewer and query script union into a single view by node id. Use it to answer "what feeds this table?", "where is this column used?", and "is this picture still up to date?" — for impact analysis, onboarding, or as grounding context for an LLM.

What gets generated

Stage	Component	Output
Build	EasyFabric Generator (EFG)	`GenerateFabricObjectsGraph`, `GenerateModelGraph`, `GenerateFabricDatamartGraph`, `GenerateWiki` → `*-graph.json` + `viewer.html` + `explain_graph.py`

The graph is produced by calling the Generator API in graph mode (the x-output-type: graph request header) instead of normal SQL/item generation. Four files land side by side in the wiki output folder:

File	Producer	What's in it
`fabric-graph.json`	API (`GenerateFabricObjects` in graph mode)	source / bronze / silver / history table nodes + ingest & lineage edges
`model-graph.json`	API (`GenerateModel` in graph mode)	model-object nodes + `Reference` (model) edges
`datamart-graph.json`	API (`GenerateFabricDatamart` in graph mode)	gold table nodes + gold → model-object edges
`notebook-graph.json`	Client (`WikiGenerator`)	notebook nodes + silver → gold lineage parsed from SQL

Every file shares the same canonical node-id convention, so an edge written by one producer resolves against a node written by another without any mapping step. A partial build still works: if you enabled only the client wiki step, you get notebook-graph.json alone, and the union simply covers fewer producers.

Generating the graph

Graph mode is opt-in per build. Add the graph build steps to your Generator build configuration (e.g. Generator/local.yaml); they run only when listed:

- BuildName: "DM - Fabric objects (graph)"
  BuildType: GenerateFabricObjectsGraph
  ConfigFile: "Dataplatform\\DP\\platform.local.json"
  BuildIndex: 140

- BuildName: "DM - Model (graph)"
  BuildType: GenerateModelGraph
  ConfigFile: "Dataplatform\\DP\\platform.local.json"
  BuildIndex: 150

- BuildName: "DM - Datamart (graph)"
  BuildType: GenerateFabricDatamartGraph
  ConfigFile: "Dataplatform\\DP\\platform.local.json"
  BuildIndex: 160

GenerateWiki (the client wiki step) writes notebook-graph.json and copies viewer.html + explain_graph.py next to the graph files, so the folder is self-contained and ready to open or query.

Viewing the graph (`viewer.html`)

viewer.html is a self-contained, dependency-free web page that renders the merged graph as an interactive left-to-right medallion diagram (source → bronze → silver → gold → model). It has a search box, a per-node detail panel (columns, primary keys, upstream / downstream edges with their evidence), pan / zoom, and a Sources button listing which *-graph.json files contributed to the current view.

Because browsers block a page opened via file:// from reading the JSON fragments, serve the folder over HTTP — the query script does this for you:

python explain_graph.py serve            # serves the folder and opens viewer.html
python explain_graph.py serve --port 9000 --no-browser

Alternatively, open viewer.html directly and use the Load *-graph.json button — it is multi-select, so you can pick all of your graph files at once and the viewer unions them.
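
If the query script is not at hand, the HTTP-serving step described above can be approximated with the Python standard library. This is a minimal sketch, not the implementation behind explain_graph.py serve; the helper name make_server and the loopback-only binding are assumptions made for illustration.

```python
import functools
import http.server


def make_server(folder, port=8000):
    """Return an HTTP server rooted at `folder`; port 0 picks any free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=folder
    )
    # Bind to loopback only: the graph is meant for local inspection.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(folder=".", port=8000):
    """Serve `folder` until interrupted with Ctrl+C."""
    server = make_server(folder, port)
    print(f"Open http://127.0.0.1:{server.server_address[1]}/viewer.html")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling serve("path/to/wiki-output") from a Python prompt and opening the printed URL should let viewer.html fetch the *-graph.json files sitting next to it.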

Querying the graph (`explain_graph.py`)

explain_graph.py reads every *-graph.json in its folder (override with --graph-dir), unions them, and runs the requested query against the merged view. Single-file invocation works too — pass --graph-dir path/to/one-graph.json.

python explain_graph.py sources                       # which files got merged
python explain_graph.py search Customer               # substring search on id / name
python explain_graph.py info silver.dbo.customers     # node detail + direct dependencies
python explain_graph.py find-column customer_id       # tables/objects carrying that column
python explain_graph.py trace silver.dbo.customers --depth 5   # upstream + downstream, limited to depth 5
python explain_graph.py warnings --query gold.D_Sales # filter unresolved-lineage warnings
python explain_graph.py status --dp <DP-folder>       # freshness check — see below

Every command operates on the merged view, so a trace starting at a Silver table follows the notebook → Gold edges in notebook-graph.json and lands on gold tables declared in datamart-graph.json without you having to know which file holds which node.
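
To illustrate why a trace does not need to know which file holds which node, here is a small sketch of a depth-limited walk over merged edges. It is not the code inside explain_graph.py: the function name trace, the interpretation of depth as a hop count, and the sample edges are all assumptions for illustration.

```python
from collections import deque

# Illustrative edges, as they might look after several files are unioned.
EDGES = [
    {"from": "bronze.dbo.customers", "to": "silver.dbo.customers"},
    {"from": "silver.dbo.customers", "to": "nb.gold.D_Klant"},
    {"from": "nb.gold.D_Klant", "to": "gold.dbo.d_klant"},
    {"from": "gold.dbo.d_klant", "to": "model.Klant"},
]


def trace(edges, start, depth=5):
    """Return (upstream, downstream) node ids within `depth` hops of `start`."""
    forward, backward = {}, {}
    for edge in edges:
        forward.setdefault(edge["from"], []).append(edge["to"])
        backward.setdefault(edge["to"], []).append(edge["from"])

    def walk(neighbours):
        # Breadth-first search that stops expanding once `depth` is reached.
        seen, queue = set(), deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == depth:
                continue
            for nxt in neighbours.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return seen

    return walk(backward), walk(forward)


upstream, downstream = trace(EDGES, "silver.dbo.customers")
print(sorted(upstream), sorted(downstream))
```

Because the walk only compares id strings, it crosses from notebook edges to datamart nodes without any per-file logic.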

Is this graph fresh?

Each *-graph.json carries an optional top-level metadata block, stamped by the client at write time, so a consumer can tell whether the graph still reflects your Dataplatform/DP before trusting it:

Field	Notes
`generatedAt`	ISO-8601 UTC timestamp of when the graph was written
`sourceCommit`	best-effort short git SHA of `HEAD` at write time; omitted outside a git repo
`generator`	producer id — `fabric-graph`, `model-graph`, `datamart-graph`, or `notebook-graph`

Run the bundled freshness check, pointing it at your Dataplatform folder:

python explain_graph.py status --dp <path-to-Dataplatform>

It prints a FRESH / STALE / UNKNOWN verdict per file plus an overall verdict (showing each file's producer, age, and git SHA), and exits non-zero when anything is STALE. The staleness rule is intentionally simple and needs no API and no credentials — anything returned by

find <path-to-Dataplatform> -type f -newermt "<generatedAt>"

means a DP file was edited after the graph was written, so the graph is stale and should be regenerated. (When the working tree is clean, git diff --quiet <sourceCommit>..HEAD -- <DP> answers the same question against committed history.) A file without metadata reports UNKNOWN — older artifacts predate the stamp and still load.
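
The same staleness rule can be sketched in Python for environments without find. This is an approximation under stated assumptions, not the code behind explain_graph.py status: the function names are hypothetical, and the timestamp parser assumes a stamp that datetime.fromisoformat accepts once a trailing Z is replaced (stamps with unusual fractional-second precision may need extra handling on older Python versions).

```python
import os
from datetime import datetime, timezone


def parse_generated_at(stamp):
    """Parse an ISO-8601 stamp, treating a missing offset as UTC."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def newer_files(dp_root, generated_at):
    """List files under dp_root modified after generated_at (like find -newermt)."""
    cutoff = generated_at.timestamp()
    hits = []
    for folder, _dirs, files in os.walk(dp_root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) > cutoff:
                hits.append(path)
    return sorted(hits)


def verdict(graph_doc, dp_root):
    """UNKNOWN without a stamp, STALE if any DP file is newer, else FRESH."""
    stamp = (graph_doc.get("metadata") or {}).get("generatedAt")
    if not stamp:
        return "UNKNOWN"
    newer = newer_files(dp_root, parse_generated_at(stamp))
    return "STALE" if newer else "FRESH"
```

The check reads only file modification times, which is why it needs neither the API nor credentials.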

Graph file structure

Each *-graph.json is a GraphResult document; all property names serialize as camelCase and null optional fields are omitted:

{
  "version": 1,
  "metadata": { "generatedAt": "...", "sourceCommit": "...", "generator": "..." },
  "nodes": [ /* GraphNode[] */ ],
  "edges": [ /* GraphEdge[] */ ],
  "warnings": [ "free-form string" ]
}

nodes — keyed by id. type is one of source, bronze, silver, gold, model-object, notebook. Table nodes carry columns (with sourceColumn, sourceDataType, isPrimaryKey).
edges — from / to (canonical node ids), kind (ingest, lineage, or model), and evidence (a free-form trace such as a SQL fragment or file path).
warnings — free-form strings, e.g. an unresolved lineage target.
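
A consumer that reads these files directly may want a quick structural check before using them. The sketch below validates one document against the node types and edge kinds listed above; the function name check_graph is hypothetical, and an edge endpoint missing from the file is reported as a note rather than an error, since the node may live in another producer's file.

```python
NODE_TYPES = {"source", "bronze", "silver", "gold", "model-object", "notebook"}
EDGE_KINDS = {"ingest", "lineage", "model"}


def check_graph(doc):
    """Return a list of human-readable findings for one GraphResult dict."""
    findings = []
    ids = set()
    for node in doc.get("nodes", []):
        ids.add(node.get("id"))
        if node.get("type") not in NODE_TYPES:
            findings.append(
                f"unknown node type {node.get('type')!r} on {node.get('id')}"
            )
    for edge in doc.get("edges", []):
        if edge.get("kind") not in EDGE_KINDS:
            findings.append(f"unknown edge kind {edge.get('kind')!r}")
        for end in ("from", "to"):
            if edge.get(end) not in ids:
                # Expected in partial builds: resolved only after the union.
                findings.append(f"endpoint not in this file: {edge.get(end)}")
    return findings


sample = {
    "version": 1,
    "nodes": [{"id": "silver.dbo.customers", "type": "silver"}],
    "edges": [
        {"from": "silver.dbo.customers", "to": "nb.gold.D_Klant", "kind": "lineage"}
    ],
}
print(check_graph(sample))
```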

The viewer and the query script merge files the same way: nodes are unioned by id (first writer wins on scalar fields; columns unioned by name), edges are deduped on the (from, to, kind, evidence) tuple, and warnings are deduped with first-seen order preserved.
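
The merge rules above can be sketched as follows. This is an illustration of the stated rules rather than the code used by the viewer or the query script; it assumes each column object has a name field, and it fills in a scalar field from a later writer only when the first writer omitted it.

```python
def merge_graphs(docs):
    """Union several GraphResult dicts into one merged dict."""
    nodes, edges, warnings = {}, [], []
    seen_edges, seen_warnings = set(), set()
    for doc in docs:
        for node in doc.get("nodes", []):
            kept = nodes.get(node["id"])
            if kept is None:
                # First writer: copy scalars so the inputs are not mutated.
                kept = {k: v for k, v in node.items() if k != "columns"}
                nodes[node["id"]] = kept
            else:
                for key, value in node.items():
                    if key != "columns":
                        kept.setdefault(key, value)  # first writer wins
            for col in node.get("columns", []):
                cols = kept.setdefault("columns", [])
                if all(c["name"] != col["name"] for c in cols):
                    cols.append(col)
        for edge in doc.get("edges", []):
            key = (edge["from"], edge["to"], edge["kind"], edge.get("evidence"))
            if key not in seen_edges:
                seen_edges.add(key)
                edges.append(edge)
        for warning in doc.get("warnings", []):
            if warning not in seen_warnings:
                seen_warnings.add(warning)
                warnings.append(warning)
    return {
        "version": 1,
        "nodes": list(nodes.values()),
        "edges": edges,
        "warnings": warnings,
    }
```

Keying nodes in a dict and tracking seen edges and warnings in sets keeps the union linear in the total number of items while preserving first-seen order.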

Node-id convention

Because the union is by string equality, every producer formats ids identically:

Node category	Id format	Example
table / history table	`<layer>.<schema>.<name>` (all lowercased)	`silver.dbo.customers`, `bronze.his.customers`
model object	`model.<name>` (name as authored)	`model.Sales`
notebook	`nb.<layer>.<name>` (layer lowercased)	`nb.gold.D_Klant`

Lowercasing table ids lets the client resolve a notebook SQL reference like FROM Silver.dbo.Customers against an API-emitted silver.dbo.customers node with no case-insensitive matching. Edge kinds are ingest (source → bronze), lineage (SQL FROM/JOIN-style links between tables / notebooks), and model (Tabular relationships between model objects).
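
The id formats in the table can be expressed as a few small helpers. The function names are hypothetical, and the SQL-reference helper assumes a plain three-part layer.schema.name reference without quoting or brackets.

```python
def table_id(layer, schema, name):
    """Table and history-table ids are lowercased as a whole."""
    return f"{layer}.{schema}.{name}".lower()


def model_id(name):
    """Model-object ids keep the name as authored."""
    return f"model.{name}"


def notebook_id(layer, name):
    """Notebook ids lowercase the layer only."""
    return f"nb.{layer.lower()}.{name}"


def table_id_from_sql(reference):
    """Canonicalise a three-part reference such as 'Silver.dbo.Customers'."""
    parts = [part.strip() for part in reference.split(".")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected layer.schema.name, got {reference!r}")
    return table_id(*parts)


print(table_id_from_sql("Silver.dbo.Customers"))
print(model_id("Sales"), notebook_id("Gold", "D_Klant"))
```

Normalising once at write time is what lets every later comparison be a plain string equality check.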
