Gold Loader

The Gold layer loader in EasyFabric loads data into dimension and fact tables. It handles surrogate key generation and validation, and supports several load types.

Basic Usage

To load data into the Gold layer, you use the modelloader function. This requires a Spark DataFrame, a LoadConfig, a Model configuration, and the ConfigManager.

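from easyfabric import load_data_gold, Model, LoadConfig

# Load the model definition that describes the Gold tables
modelfile = "Files/Model/DM/model.yaml"
mdl = Model.from_yaml_file(modelfile)

# Name of the target table as defined in model.yaml
model_object_name = "MyDimension"

loadconfig = LoadConfig.from_dict({
    "model_object_name": model_object_name,
    "dry_run": False,  # actually write data (the default, True, only logs)
    "load_type": "full",
    "log_row_count": True
})

# df_result is the Spark DataFrame prepared earlier in the notebook
result = load_data_gold.modelloader(data_frame=df_result, load_config=loadconfig, model_config=mdl)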

Dimension Tables (Dim)

For dimension tables, the loader automatically manages Surrogate Keys (SK) based on the Business Key (BK).

Automatic Key Generation

EasyFabric supports two methods for generating surrogate keys when the SK column is missing from the source DataFrame:

1. Hash-based Keys (Default)

If no KeyDataType is set for the table in the model, the loader creates the surrogate key as a SHA-256 hash of the Business Key.
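
The hashing step can be approximated in plain Python as below. This is a sketch, not EasyFabric's code: it assumes the Business Key is converted to a string and UTF-8 encoded before hashing, and hash_surrogate_key is a hypothetical helper name. On a Spark DataFrame, a comparable expression would be F.sha2(F.col("BK_Product").cast("string"), 256) from pyspark.sql.functions, which also yields a 64-character hexadecimal string.

```python
import hashlib


def hash_surrogate_key(business_key) -> str:
    """Return a SHA-256 hex digest for a Business Key (illustrative sketch).

    Assumption: the BK is converted to a string and UTF-8 encoded first;
    the real loader may normalise or combine keys differently.
    """
    return hashlib.sha256(str(business_key).encode("utf-8")).hexdigest()


# The same BK always maps to the same SK, so reloads stay stable.
sk = hash_surrogate_key("P-100")
print(len(sk))  # 64 hex characters
```

Because the key depends only on the BK value, rerunning a load reproduces the same surrogate keys without looking up the existing table.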

2. Numeric Keys (Row Number)

If you want to use incremental integer keys, you can specify this in your model.yaml:

- Table: MyDimension
  Name: MyDimension
  TableType: Dim
  KeyDataType: INT  # Supports: INT, INTEGER, BIGINT
  ...

When KeyDataType is set to a numeric type (INT, INTEGER, or BIGINT):

The loader generates IDs using row_number() ordered by the BK column.
Special Case: If the Business Key value is -1 (representing an unknown record), the Surrogate Key will also be explicitly set to -1.
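
The two rules above can be sketched in plain Python as follows. This is an illustration under assumptions, not the loader's implementation: assign_numeric_keys is a hypothetical helper, BKs are assumed to be integers, and numbering runs over all rows (as row_number() would) before the unknown record is overridden to -1, which leaves SK 1 unused in the example. Whether the real loader excludes the -1 row before numbering is not stated on this page. In Spark, the equivalent building blocks are F.row_number().over(Window.orderBy("BK_Product")) combined with F.when(F.col("BK_Product") == -1, -1).otherwise(...).

```python
from typing import List, Tuple

UNKNOWN_BK = -1  # Business Key of the unknown record, per the text above


def assign_numeric_keys(business_keys: List[int]) -> List[Tuple[int, int]]:
    """Return (BK, SK) pairs mimicking row_number() ordered by the BK.

    Sketch only: every row is numbered in BK order, then the unknown
    record's SK is overridden to -1.
    """
    pairs = []
    for row_number, bk in enumerate(sorted(business_keys), start=1):
        sk = UNKNOWN_BK if bk == UNKNOWN_BK else row_number
        pairs.append((bk, sk))
    return pairs


print(assign_numeric_keys([30, -1, 10, 20]))
# [(-1, -1), (10, 2), (20, 3), (30, 4)]
```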

Fact Tables

For fact tables, the loader ensures the data is aligned with the target schema and handles partitioning.
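
Schema alignment can be pictured as reordering each source row to the target column order and filling gaps with NULL, which is what the auto_null_column option describes. The sketch below works on plain dictionaries; align_to_schema is a hypothetical helper, and both the decision to drop extra source columns and the error raised when auto_null_column is False are assumptions. On a Spark DataFrame the same idea maps to adding missing columns with F.lit(None).cast(...) and then calling df.select(target_columns).

```python
from typing import Any, Dict, List


def align_to_schema(row: Dict[str, Any], target_columns: List[str],
                    auto_null_column: bool = True) -> Dict[str, Any]:
    """Reorder a source row to the target column order (illustrative sketch).

    Missing columns become None when auto_null_column is True; otherwise a
    ValueError is raised. Extra source columns are dropped (an assumption).
    """
    missing = [c for c in target_columns if c not in row]
    if missing and not auto_null_column:
        raise ValueError(f"Missing columns: {missing}")
    return {c: row.get(c) for c in target_columns}


target = ["BK_Product", "Quantity", "Amount"]
print(align_to_schema({"Amount": 9.5, "BK_Product": "P-1", "Extra": 1}, target))
# {'BK_Product': 'P-1', 'Quantity': None, 'Amount': 9.5}
```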

from easyfabric import LoadConfig

load_config = LoadConfig(
    model_object_name="MyFactTable",
    load_type="appendwithdelete",  # useful for partition-based overwrites
)
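
The comment above suggests that appendwithdelete replaces whole partitions. The sketch below assumes exactly that: target rows whose partition value occurs in the incoming data are deleted, and all incoming rows are then appended, while other partitions are left untouched. The helper name, the list-of-dicts representation, and the "Month" partition column are hypothetical; the effect resembles a Delta Lake overwrite with the replaceWhere option.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def append_with_delete(target: List[Row], incoming: List[Row],
                       partition_column: str) -> List[Row]:
    """Sketch of a partition-based overwrite (assumed appendwithdelete semantics).

    Target rows whose partition value occurs in the incoming data are removed,
    then all incoming rows are appended.
    """
    replaced = {row[partition_column] for row in incoming}
    kept = [row for row in target if row[partition_column] not in replaced]
    return kept + incoming


existing = [{"Month": "2024-01", "Amount": 10}, {"Month": "2024-02", "Amount": 20}]
new_rows = [{"Month": "2024-02", "Amount": 25}]
print(append_with_delete(existing, new_rows, "Month"))
# [{'Month': '2024-01', 'Amount': 10}, {'Month': '2024-02', 'Amount': 25}]
```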

Configuration Options

Property	Description	Default
`model_object_name`	The name of the table as defined in `model.yaml`	Required (no default)
`load_type`	`full`, `append`, `appendwithdelete`, or `merge`	`full`
`dry_run`	If True, only logs what would happen without writing data	`True`
`auto_null_column`	Automatically fills missing columns with NULL	`True`
`log_row_count`	Logs the number of rows before and after the load	`False`
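
The defaults in the table can be made concrete with a small resolver that merges user options over them. resolve_load_config is a hypothetical helper for illustration, not part of EasyFabric; only the default values and the list of load types come from the table above.

```python
from typing import Any, Dict

# Defaults copied from the table above; the validation logic is a sketch.
DEFAULTS = {
    "load_type": "full",
    "dry_run": True,
    "auto_null_column": True,
    "log_row_count": False,
}
LOAD_TYPES = {"full", "append", "appendwithdelete", "merge"}


def resolve_load_config(options: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user options over the documented defaults and validate them."""
    if "model_object_name" not in options:
        raise ValueError("model_object_name is required")
    config = {**DEFAULTS, **options}
    if config["load_type"] not in LOAD_TYPES:
        raise ValueError(f"Unknown load_type: {config['load_type']}")
    return config


cfg = resolve_load_config({"model_object_name": "MyFactTable"})
print(cfg["dry_run"], cfg["load_type"])  # True full
```

Because dry_run defaults to True, a configuration that omits it only logs the planned load, which is why the basic usage example sets "dry_run": False explicitly.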

Best Practices

Business Keys: Ensure your source DataFrame contains the Business Key column (prefixed with BK_ by default, e.g., BK_Product).
Schema Alignment: By default, the loader tries to align your DataFrame to the target schema. Keep auto_null_column=True (the default) so that columns missing from the source are filled with NULL, which handles schema evolution gracefully.
Dry Runs: Always run with dry_run=True first when developing new notebooks to verify the generated keys and target table paths.
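
A quick pre-flight check for the Business Key convention can catch a missing BK column before the loader runs. business_key_columns is a hypothetical helper; with Spark you would pass df_result.columns, which is a list of column names, while a plain list keeps this sketch self-contained.

```python
from typing import Iterable, List


def business_key_columns(columns: Iterable[str], prefix: str = "BK_") -> List[str]:
    """Return the columns that follow the Business Key naming convention."""
    return [c for c in columns if c.startswith(prefix)]


# With Spark you would pass df_result.columns; a plain list is used here.
source_columns = ["BK_Product", "ProductName", "Price"]
bks = business_key_columns(source_columns)
if not bks:
    raise ValueError("Source DataFrame has no BK_ column; surrogate keys cannot be derived")
print(bks)  # ['BK_Product']
```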
