Read latest Silver snapshot

The get_silver_snapshot function retrieves the current state (snapshot) of a dataset from its Silver history table. It collapses the history records into a single row per primary key: the most recent version of that record.

Overview

In the EasyFabric architecture, Silver history tables (Silver.his.*) contain a full audit trail of changes. Each time a record is modified, a new row is appended with a SYSTEMSTATETIMESTAMP.

get_silver_snapshot automates the process of finding the "latest" version of every record, giving you a clean table of current data without having to write complex window functions manually.

How it Works

The function applies the following logic:

  1. Partitioning: Groups data by SYSTEMPRIMARYKEY.
  2. Ordering: Sorts records within each group by ABS(SYSTEMSTATETIMESTAMP) in descending order (highest absolute value first). The absolute value is used because deletions are stored with a negative timestamp.
  3. Selection: Selects the first row for each group (the most recent change).
  4. Deletion Handling: If the latest record has a negative SYSTEMSTATETIMESTAMP, the record was deleted. Depending on the include_deleted_rows parameter, these rows are either included or filtered out.
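
The four steps above can be sketched in plain Python. This is only an illustration of the logic, not the EasyFabric implementation: the helper name latest_snapshot and the sample rows are hypothetical, and the sketch assumes SYSTEMSTATETIMESTAMP is numeric (the source does not state the column type).

```python
def latest_snapshot(rows, include_deleted_rows=True):
    """Keep one row per SYSTEMPRIMARYKEY: the one with the largest |SYSTEMSTATETIMESTAMP|."""
    latest = {}
    for row in rows:
        key = row["SYSTEMPRIMARYKEY"]            # step 1: group by primary key
        current = latest.get(key)
        # steps 2 and 3: the row with the highest absolute timestamp wins
        if current is None or abs(row["SYSTEMSTATETIMESTAMP"]) > abs(current["SYSTEMSTATETIMESTAMP"]):
            latest[key] = row
    result = list(latest.values())
    if not include_deleted_rows:
        # step 4: a negative timestamp on the latest row marks a deletion
        result = [r for r in result if r["SYSTEMSTATETIMESTAMP"] >= 0]
    return result


# Hypothetical history: key A was updated once, key B was created and then deleted.
history = [
    {"SYSTEMPRIMARYKEY": "A", "SYSTEMSTATETIMESTAMP": 100, "name": "old"},
    {"SYSTEMPRIMARYKEY": "A", "SYSTEMSTATETIMESTAMP": 200, "name": "new"},
    {"SYSTEMPRIMARYKEY": "B", "SYSTEMSTATETIMESTAMP": 100, "name": "created"},
    {"SYSTEMPRIMARYKEY": "B", "SYSTEMSTATETIMESTAMP": -300, "name": "created"},
]

snapshot = latest_snapshot(history)                            # keeps A (200) and B (-300)
active = latest_snapshot(history, include_deleted_rows=False)  # keeps A only
```

In Spark, the same result is typically obtained with a row-number window partitioned by the key and ordered by the absolute timestamp, which is the kind of query this function saves you from writing.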

Usage

As a Package Function

You can call the function directly by passing the full table name.

import easyfabric as ef

# Get the latest snapshot including deletions
df = ef.get_silver_snapshot("Silver.his.afs_medewerkerverzuimverloop")

# Get latest snapshot, filtering out deleted rows
df_active = ef.get_silver_snapshot("Silver.his.afs_medewerkerverzuimverloop", include_deleted_rows=False)

As a DataFrame Extension

The function is also monkey-patched onto the PySpark DataFrame class, allowing you to use it as a method if you already have the history loaded.

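# `spark` is the active SparkSession (predefined in Fabric notebooks)
df_history = spark.table("Silver.his.my_table")

# Same logic as the package function, applied to the DataFrame already loaded
df_snapshot = df_history.get_silver_snapshot(include_deleted_rows=False)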
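
For readers unfamiliar with monkey-patching, the general pattern looks roughly like the sketch below. It does not show how EasyFabric itself registers the method: FakeDataFrame is a hypothetical stand-in for the PySpark DataFrame class so the example runs without Spark, and active_rows is a hypothetical helper.

```python
class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame, so the sketch runs without Spark."""

    def __init__(self, rows):
        self.rows = rows


def active_rows(frame):
    """Hypothetical helper: keep rows whose SYSTEMSTATETIMESTAMP is not negative."""
    kept = [r for r in frame.rows if r["SYSTEMSTATETIMESTAMP"] >= 0]
    return FakeDataFrame(kept)


# Monkey-patching: assign the plain function as a class attribute.
# Python then binds it as a method and passes the instance as the first argument.
FakeDataFrame.active_rows = active_rows

df = FakeDataFrame([
    {"SYSTEMPRIMARYKEY": "A", "SYSTEMSTATETIMESTAMP": 200},
    {"SYSTEMPRIMARYKEY": "B", "SYSTEMSTATETIMESTAMP": -300},
])

via_function = active_rows(df)   # called as a function
via_method = df.active_rows()    # called as a method; same result
```

Because the method is attached at runtime, it is only available once the code that performs the assignment has run.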

Parameters

Parameter | Type | Default | Description
table_name | str | Required | The full name of the Silver history table (e.g., "Silver.his.MyTable"). Not passed when the function is called as a DataFrame method.
include_deleted_rows | bool | True | If False, records whose latest version in history has a negative SYSTEMSTATETIMESTAMP are excluded from the resulting DataFrame.

Important Note

This function requires the SYSTEMPRIMARYKEY and SYSTEMSTATETIMESTAMP columns to exist in the source table. It is intended for Silver history tables in the EasyFabric framework.