Gold Loader
The Gold Layer loader in EasyFabric is designed to load data into dimension and fact tables while handling surrogate key generation, validation, and different load types.
Basic Usage
To load data into the Gold layer, call the modelloader function in the load_data_gold module. It requires a Spark DataFrame, a LoadConfig, a Model configuration, and the ConfigManager.
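from easyfabric import load_data_gold, Model, LoadConfig

# Load the model definition from its YAML file
modelfile = "Files/Model/DM/model.yaml"
mdl = Model.from_yaml_file(modelfile)

# model_object_name must hold the table name as defined in model.yaml
loadconfig = LoadConfig.from_dict({
    "model_object_name": model_object_name,
    "dry_run": False,  # write data; with True the loader only logs what would happen
    "load_type": "full",
    "log_row_count": True
})

# df_result is the Spark DataFrame to load
result = load_data_gold.modelloader(data_frame=df_result, load_config=loadconfig, model_config=mdl)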
Dimension Tables (Dim)
For dimension tables, the loader automatically manages Surrogate Keys (SK) based on the Business Key (BK).
Automatic Key Generation
EasyFabric supports two methods for generating surrogate keys when the SK column is missing from the source DataFrame:
1. Hash-based Keys (Default)
If no specific data type is specified in the model, the loader creates a SHA-256 hash of the Business Key.
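As a rough illustration of hash-based keys, the sketch below derives a SHA-256 digest from a business key value in plain Python. The function name hash_surrogate_key is hypothetical, and casting the key to a string and encoding it as UTF-8 is an assumption; the exact input EasyFabric feeds to the hash is not documented here.

```python
import hashlib


def hash_surrogate_key(business_key) -> str:
    """Return the SHA-256 hex digest of a business key value."""
    # Assumption: the key is cast to a string and UTF-8 encoded before hashing.
    return hashlib.sha256(str(business_key).encode("utf-8")).hexdigest()


# The same business key always yields the same 64-character surrogate key.
sk = hash_surrogate_key("P-1001")
```

Because the key depends only on the business key value, reloading the same record produces the same surrogate key without looking up the target table.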
2. Numeric Keys (Row Number)
If you want to use incremental integer keys, you can specify this in your model.yaml:
- Table: MyDimension
  Name: MyDimension
  TableType: Dim
  KeyDataType: INT # Supports: INT, INTEGER, BIGINT
  ...
When KeyDataType is set to a numeric type (INT, INTEGER, or BIGINT):
- The loader generates IDs using row_number() ordered by the BK column.
- Special Case: If the Business Key value is -1 (representing an unknown record), the Surrogate Key is also explicitly set to -1.
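The numbering rule can be sketched in plain Python, without Spark. The helper assign_numeric_keys is hypothetical, and two details are assumptions: that numbering starts at 1, and that the unknown record (-1) is left out of the numbering rather than consuming a row number.

```python
def assign_numeric_keys(business_keys):
    """Map each distinct business key to an integer surrogate key."""
    # Assumption: the unknown member (-1) is excluded from the numbering.
    known = sorted({bk for bk in business_keys if bk != -1})
    # Mimic row_number() ordered by the business key, starting at 1.
    mapping = {bk: n for n, bk in enumerate(known, start=1)}
    # Special case from the text: BK -1 always maps to SK -1.
    if -1 in business_keys:
        mapping[-1] = -1
    return mapping


keys = assign_numeric_keys([30, -1, 10, 20])
# keys == {10: 1, 20: 2, 30: 3, -1: -1}
```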
Fact Tables
For fact tables, the loader ensures the data is aligned with the target schema and handles partitioning.
from easyfabric import LoadConfig

load_config = LoadConfig(
    model_object_name="MyFactTable",
    load_type="appendwithdelete"  # Useful for partition-based overwrites
)
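The text does not spell out what appendwithdelete does beyond being useful for partition-based overwrites. One plausible reading, shown here as a plain-Python sketch and not as EasyFabric's actual implementation, is that existing rows in every partition present in the incoming data are deleted before the new rows are appended. The function append_with_delete and the column name Month are hypothetical.

```python
def append_with_delete(target_rows, incoming_rows, partition_column):
    """Replace the partitions touched by incoming_rows, keep all others."""
    # Partitions that the incoming batch will overwrite.
    incoming_partitions = {row[partition_column] for row in incoming_rows}
    # Keep only target rows from partitions the batch does not touch.
    kept = [row for row in target_rows
            if row[partition_column] not in incoming_partitions]
    return kept + list(incoming_rows)


target = [{"Month": "2024-01", "Amount": 10}, {"Month": "2024-02", "Amount": 20}]
incoming = [{"Month": "2024-02", "Amount": 25}]
result_rows = append_with_delete(target, incoming, "Month")
```

Under this reading, rerunning a load for the same partition replaces its rows instead of duplicating them.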
Configuration Options
| Property | Description | Default |
|---|---|---|
| model_object_name | The name of the table as defined in model.yaml | Mandatory |
| load_type | full, append, appendwithdelete, or merge | full |
| dry_run | If True, only logs what would happen without writing data | True |
| auto_null_column | Automatically fills missing columns with NULL | True |
| log_row_count | Logs the number of rows before and after the load | False |
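To make the auto_null_column option concrete, the sketch below fills columns that are missing from a row with None, Python's counterpart to NULL. It works on plain dictionaries, not Spark DataFrames, and the helper fill_missing_columns is hypothetical; how the loader treats extra columns that are not in the target schema is not covered here.

```python
def fill_missing_columns(row, target_columns):
    """Return a copy of row with every missing target column set to None."""
    aligned = dict(row)
    for column in target_columns:
        # Existing values are left untouched; only absent columns are added.
        aligned.setdefault(column, None)
    return aligned


source_row = {"BK_Product": "P-1001", "ProductName": "Widget"}
aligned_row = fill_missing_columns(
    source_row, ["BK_Product", "ProductName", "Category"]
)
# aligned_row == {"BK_Product": "P-1001", "ProductName": "Widget", "Category": None}
```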
Best Practices
- Business Keys: Ensure your source DataFrame contains the Business Key column (prefixed with BK_ by default, e.g., BK_Product).
- Schema Alignment: By default, the loader will try to align your DataFrame to the target schema. Use auto_null_column=True to handle schema evolution gracefully.
- Dry Runs: Always run with dry_run=True first when developing new notebooks to verify the generated keys and target table paths.
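A quick pre-flight check for the Business Keys practice might look like the sketch below. It inspects a list of column names, such as the one a Spark DataFrame exposes through its columns attribute. The helper business_key_columns is hypothetical and not part of EasyFabric.

```python
def business_key_columns(columns, prefix="BK_"):
    """Return the column names that carry the business key prefix."""
    return [name for name in columns if name.startswith(prefix)]


# With a Spark DataFrame this list would come from df_result.columns.
columns = ["BK_Product", "ProductName", "Category"]
found = business_key_columns(columns)
if not found:
    raise ValueError("No business key column (BK_ prefix) in the source data")
```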