Notebooks
EasyFabric relies almost entirely on notebooks to load and process data into the different stages (Bronze, Silver, Gold) of the Fabric workspace.
Default notebooks
EasyFabric ships with the following default notebooks:
- DAG_Multiloader
- DAG_Gold
- Load_Bronze
- Load_Silver
- Load_Gold
These notebooks depend on the EasyFabric wheel package.
You can open a notebook and run it in the web browser as usual. For example, suppose you want to load a table from Bronze to Silver, using the settings of the object MyTable from the source MySource.
Run a notebook: Load_Silver
Steps to follow when using the direct notebook approach:
- Open Load_Silver
- Go to the parameter cell (its right corner shows the label 'Parameters')
- Set 'tablefile' to the path of the object's YAML file in the Files section of the Meta lakehouse, e.g. Files/Objects/MySource/MyTable.yaml
- Run all
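The tablefile value follows a fixed folder layout. A minimal sketch of building it from a source and object name, assuming (from the example path Files/Objects/MySource/MyTable.yaml used in this section) that every object file lives at Files/Objects/<source>/<object>.yaml. The helper tablefile_path is hypothetical and not part of the EasyFabric package.

```python
from pathlib import PurePosixPath


def tablefile_path(source: str, table: str) -> str:
    """Hypothetical helper: relative path of an object's YAML file in the Meta lakehouse.

    Assumes the Files/Objects/<source>/<table>.yaml layout.
    """
    if not source or not table:
        raise ValueError("source and table must be non-empty")
    # PurePosixPath always joins with "/", matching lakehouse-style paths
    return str(PurePosixPath("Files", "Objects", source, f"{table}.yaml"))


# Value to paste into the parameter cell of Load_Silver
tablefile = tablefile_path("MySource", "MyTable")
print(tablefile)  # Files/Objects/MySource/MyTable.yaml
```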
What happens during the run?
- A session is started
- The EasyFabric wheel package is installed
- The parameters are set
- The configmanager is initialized from the default YAML config file in the Meta lakehouse
- The load_data_silver.run method is called with the configmanager and the tablefile
- INFO-level log messages are displayed in the notebook while it runs
- The log file is saved to the Meta lakehouse, and the log is also written to a table in the Meta lakehouse
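Showing INFO logging in the notebook while also keeping a log file can be done with Python's standard logging module. The sketch below is only an illustration of that pattern, not EasyFabric's actual implementation: a temporary directory stands in for the Meta lakehouse, make_logger is a hypothetical helper, and writing the log to a lakehouse table is not covered.

```python
import logging
import sys
import tempfile
from pathlib import Path


def make_logger(name: str, logfile: Path) -> logging.Logger:
    """Hypothetical helper: send INFO-level records to stdout and to a log file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if logger.handlers:
        return logger  # already configured, e.g. when a cell is re-run
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(logfile)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


# A temporary folder stands in for the Files section of the Meta lakehouse
logdir = Path(tempfile.mkdtemp())
logfile = logdir / "load_silver.log"

log = make_logger("load_silver_demo", logfile)
log.info("Loading %s", "Files/Objects/MySource/MyTable.yaml")  # shown and saved
log.debug("Below INFO level, so neither shown nor saved")
```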
Run a notebook from a new custom notebook (preferred)
Alternatively, create a new notebook and call an existing notebook from it with the following Python script:
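```python
# Load a Bronze table via a DAG passed to runMultiple
# notebookutils is available by default in Fabric notebooks; no import is needed
tablefile = "Files/Objects/MySource/MyTable.yaml"

DAG = {
    "activities": [
        {
            "name": "Load_bronze_1",            # unique activity name within the DAG
            "path": "Load_Bronze",              # name of the notebook to run
            "timeoutPerCellInSeconds": 900,     # max timeout for each cell, defaults to 90 seconds
            "args": {"tablefile": tablefile},   # passed to the notebook's parameter cell
        }
    ]
}

notebookutils.notebook.runMultiple(DAG)
```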
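The same DAG format can chain notebooks. A minimal sketch that loads one object into Bronze and then Silver, assuming Load_Silver should only start after Load_Bronze for the same object. The helper bronze_then_silver_dag and the activity name Load_silver_1 are hypothetical; the dependencies, timeoutInSeconds and concurrency keys follow the runMultiple DAG format in the Microsoft Fabric notebookutils documentation, so check them against the current docs.

```python
def bronze_then_silver_dag(tablefile: str, timeout_per_cell: int = 900) -> dict:
    """Hypothetical helper: DAG that runs Load_Bronze, then Load_Silver, for one object."""
    return {
        "activities": [
            {
                "name": "Load_bronze_1",
                "path": "Load_Bronze",
                "timeoutPerCellInSeconds": timeout_per_cell,
                "args": {"tablefile": tablefile},
            },
            {
                "name": "Load_silver_1",
                "path": "Load_Silver",
                "timeoutPerCellInSeconds": timeout_per_cell,
                "args": {"tablefile": tablefile},
                # Wait for the Bronze activity before starting
                "dependencies": ["Load_bronze_1"],
            },
        ],
        "timeoutInSeconds": 3600,  # timeout for the whole DAG
        "concurrency": 2,          # max number of notebooks running at the same time
    }


DAG = bronze_then_silver_dag("Files/Objects/MySource/MyTable.yaml")
# Inside a Fabric notebook you would then call:
# notebookutils.notebook.runMultiple(DAG)
```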
Run a DAG notebook: DAG_Multiloader
Loading a single item is straightforward, but in real-world scenarios you often need to load many items into your lakehouses. This is where a DAG (Directed Acyclic Graph) becomes valuable. DAG_Multiloader, for example, orchestrates multiple load operations in parallel: for each object in the folder given as its parameter, it runs both Load_Bronze and Load_Silver.
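The code of DAG_Multiloader is not shown here, but the idea can be sketched as a function that turns a list of object files into one runMultiple DAG. This sketch assumes (hypothetically) that Load_Silver for an object depends on Load_Bronze for the same object, while different objects are independent and may run in parallel. build_multiload_dag and the activity naming scheme are hypothetical, and listing the parameter folder is left out: the file paths are passed in directly.

```python
from pathlib import PurePosixPath


def build_multiload_dag(tablefiles: list[str], timeout_per_cell: int = 900,
                        concurrency: int = 10) -> dict:
    """Hypothetical sketch: Load_Bronze + Load_Silver activities for every object file."""
    activities = []
    for tablefile in sorted(tablefiles):
        p = PurePosixPath(tablefile)
        key = f"{p.parent.name}_{p.stem}"  # e.g. MySource_MyTable, keeps names unique
        bronze = f"Load_Bronze_{key}"
        activities.append({
            "name": bronze,
            "path": "Load_Bronze",
            "timeoutPerCellInSeconds": timeout_per_cell,
            "args": {"tablefile": tablefile},
        })
        activities.append({
            "name": f"Load_Silver_{key}",
            "path": "Load_Silver",
            "timeoutPerCellInSeconds": timeout_per_cell,
            "args": {"tablefile": tablefile},
            "dependencies": [bronze],  # Silver waits for Bronze of the same object only
        })
    return {"activities": activities, "concurrency": concurrency}


files = [
    "Files/Objects/MySource/MyTable.yaml",
    "Files/Objects/MySource/OtherTable.yaml",
]
DAG = build_multiload_dag(files)
```

Because only the Bronze-to-Silver edge per object is declared, the objects themselves do not wait on each other, and the concurrency value caps how many notebooks run at once.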
DAG in Microsoft Fabric
- A DAG (Directed Acyclic Graph) is a workflow structure used in Microsoft Fabric, for example in data pipelines and in notebookutils.notebook.runMultiple
- It is a collection of tasks (activities) connected so that they form a directed flow without cycles
- In Fabric, DAGs are used to orchestrate data workflows, notebooks, and pipeline activities
- Each node in a DAG represents a task, while edges show dependencies between tasks
- DAGs ensure tasks execute in the correct order while preventing circular dependencies
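The last two points (correct order, no circular dependencies) can be illustrated with Python's standard graphlib module. This is a conceptual sketch only, independent of how runMultiple schedules activities internally; execution_order is a hypothetical helper that reads the dependencies key of a runMultiple-style DAG.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dag: dict) -> list[str]:
    """Return activity names in an order that respects their 'dependencies'.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    # Map each activity to the set of activities it depends on (its predecessors)
    graph = {a["name"]: set(a.get("dependencies", [])) for a in dag["activities"]}
    return list(TopologicalSorter(graph).static_order())


dag = {"activities": [
    {"name": "Load_Silver_MyTable", "dependencies": ["Load_Bronze_MyTable"]},
    {"name": "Load_Bronze_MyTable"},
]}
print(execution_order(dag))  # ['Load_Bronze_MyTable', 'Load_Silver_MyTable']

cyclic = {"activities": [
    {"name": "A", "dependencies": ["B"]},
    {"name": "B", "dependencies": ["A"]},
]}
try:
    execution_order(cyclic)
except CycleError as err:
    print("cycle detected:", err.args[1])
```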