DataFrame Display Extension

The display extension provides a convenient way to visualize PySpark DataFrames within Microsoft Fabric notebooks. It detects whether the notebook is running interactively and renders the interactive visualization only in that case.

Overview

In a Microsoft Fabric environment, visualizing data typically requires calling specialized display functions. This extension monkey-patches the PySpark DataFrame class, allowing you to call .display() directly on any DataFrame object.

How it Works

The display method checks the current runtime context:

  1. Interactive Run: If the notebook is being run interactively (by a user), it uses notebookutils.visualization.display to render a rich, interactive table.
  2. Non-Interactive Run: If the notebook is running as part of a pipeline or job, it skips the rich visualization (to save resources and avoid errors) and prints a message instead.
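The two branches above can be sketched in plain Python. This is a minimal illustration of the monkey-patching pattern, not the easyfabric implementation: FakeDataFrame stands in for the PySpark DataFrame class so the example runs without Spark, and attach_display, fake_renderer, and the boolean return value are hypothetical names and choices made only for this sketch. In a real notebook the renderer would be notebookutils.visualization.display, and the context check would be whatever the extension uses internally.

```python
from typing import Any, Callable, List


class FakeDataFrame:
    """Stand-in for pyspark.sql.DataFrame so the sketch runs without Spark."""

    def __init__(self, rows: List[tuple]):
        self.rows = rows


def attach_display(
    df_class: type,
    is_interactive: Callable[[], bool],
    renderer: Callable[..., Any],
) -> None:
    """Monkey-patch a .display() method onto df_class (hypothetical helper)."""

    def display(self, summary: bool = False) -> bool:
        if is_interactive():
            # Interactive run: delegate to the rich renderer.
            renderer(self, summary=summary)
            return True
        # Non-interactive run: skip rendering and leave a note in the log.
        print("Non-interactive run: skipping DataFrame display.")
        return False

    # Assigning to the class makes the method available on every instance.
    df_class.display = display


rendered = []  # records renderer calls so the behaviour is observable


def fake_renderer(df, summary=False):
    """Hypothetical renderer that only records what it was asked to show."""
    rendered.append((len(df.rows), summary))


df = FakeDataFrame([(1, "a"), (2, "b")])

# Simulate an interactive session: the renderer is called.
attach_display(FakeDataFrame, lambda: True, fake_renderer)
interactive_result = df.display(summary=True)

# Simulate a pipeline or job run: the renderer is skipped.
attach_display(FakeDataFrame, lambda: False, fake_renderer)
batch_result = df.display()
```

Passing the context check and the renderer in as arguments keeps the sketch testable outside Fabric; the actual extension presumably resolves both on its own at import time.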

Usage

Once easyfabric is imported, the display method becomes available on all DataFrames.

# Importing easyfabric is what registers .display() on DataFrames
import easyfabric as ef

# Define your DataFrame
df = spark.table("Silver.dbo.MyTable")

# Use the extension to visualize the data
df.display()

# You can also include a summary (statistics)
df.display(summary=True)

Parameters

Parameter | Type | Default | Description
--------- | ---- | ------- | -----------
summary   | bool | False   | If set to True, the display will include descriptive statistics (count, mean, stddev, min, max) for the columns.
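To make the five statistics concrete, here is a small pure-Python sketch that computes them for one numeric column. It is an illustration only: the describe_column helper is hypothetical, and this page does not say how the extension computes its summary. The sketch uses the sample standard deviation, which is an assumption about what stddev means here.

```python
import statistics
from typing import Dict, List


def describe_column(values: List[float]) -> Dict[str, float]:
    """Compute count, mean, stddev, min, max for one column (hypothetical helper)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 in the denominator); assumed here.
        "stddev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


stats = describe_column([2.0, 4.0, 4.0, 6.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```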

Benefits

  • Syntactic Sugar: Cleaner code compared to calling vis_display(df).
  • Environment Aware: Prevents pipeline failures or unnecessary processing when running in non-interactive modes.
  • Consistency: Provides a uniform way to look at data throughout the project.