data.yaml_helper
read_file_as_string
def read_file_as_string(file_path: str) -> str
Reads the content of a file and returns it as a string. If the file path starts with 'Files/', the predefined base path '/lakehouse/default/' is prepended to it before the file is opened.
Arguments:
file_path
str - The relative or absolute path of the file to read. If it starts with 'Files/', the path is prefixed with the default base path '/lakehouse/default/'.
Returns:
str
- The full content of the file as a string.
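A minimal sketch of the documented path handling, assuming the file is UTF-8 text. The `resolve_path` helper and the `base` parameter are hypothetical additions for illustration and testing; the documented function takes only `file_path`.

```python
import os

# Base directory taken from the documented '/lakehouse/default/' prefix.
LAKEHOUSE_BASE = "/lakehouse/default/"


def resolve_path(file_path: str, base: str = LAKEHOUSE_BASE) -> str:
    """Hypothetical helper: prefix lakehouse-relative 'Files/...' paths."""
    if file_path.startswith("Files/"):
        # '/lakehouse/default/' + 'Files/x.yaml' -> '/lakehouse/default/Files/x.yaml'
        return os.path.join(base, file_path)
    return file_path


def read_file_as_string(file_path: str, base: str = LAKEHOUSE_BASE) -> str:
    """Return the whole file as text (UTF-8 is an assumption of this sketch)."""
    with open(resolve_path(file_path, base), "r", encoding="utf-8") as handle:
        return handle.read()
```

Other paths, relative or absolute, are passed through unchanged.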
preprocess_keys
def preprocess_keys(data)
Processes the input data structure to convert all dictionary keys to lowercase. The function recursively traverses nested dictionaries and lists, lowercasing every dictionary key while preserving the original structure.
Arguments:
data
dict | list | Any - The input data, which can be a dictionary, list, or any other data type.
Returns:
dict | list | Any: A modified version of the input data where all dictionary keys are converted to lowercase. If the input is not a dictionary or list, it is returned unchanged.
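One way to implement the recursive traversal, offered as a sketch. Leaving non-string keys (for example YAML integer keys) untouched is an assumption of this sketch, not documented behaviour.

```python
from typing import Any


def preprocess_keys(data: Any) -> Any:
    """Recursively lowercase dictionary keys, leaving values untouched."""
    if isinstance(data, dict):
        # Rebuild the dict with lowercased keys and recursively processed values.
        # Assumption: non-string keys are kept as-is instead of raising.
        return {
            (key.lower() if isinstance(key, str) else key): preprocess_keys(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        # Traverse lists so that dictionaries nested inside them are handled too.
        return [preprocess_keys(item) for item in data]
    # Scalars (str, int, None, ...) are returned unchanged.
    return data
```

Keys that differ only in case (such as "Name" and "name") collapse into one entry in this sketch, and the value seen last wins.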
yaml_to_json
def yaml_to_json(yaml_content) -> str | None
Converts a YAML string to a formatted JSON string. The function attempts to safely load the YAML content and convert it into its JSON equivalent. If an error occurs during parsing, debug information about the context of the error is logged.
Arguments:
yaml_content
str - A string containing the YAML data to be converted.
Returns:
str
- A string containing the JSON representation of the YAML input, formatted with an indentation of 4 spaces. If an error occurs, returns None.
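A sketch of the conversion using PyYAML's `safe_load` and the standard `json` module. The logging format is an assumption; PyYAML parse errors that subclass `MarkedYAMLError` expose a zero-based `problem_mark`, which is used here to report the offending line. Treating non-JSON-serializable values (such as YAML dates) as errors is also an assumption.

```python
import json
import logging
from typing import Optional

import yaml  # PyYAML

logger = logging.getLogger(__name__)


def yaml_to_json(yaml_content: str) -> Optional[str]:
    """Convert YAML text to JSON text indented by 4 spaces, or None on error."""
    try:
        data = yaml.safe_load(yaml_content)
    except yaml.YAMLError as exc:
        # Marked errors carry the (zero-based) position of the problem.
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            lines = yaml_content.splitlines()
            context = lines[mark.line] if mark.line < len(lines) else ""
            logger.debug(
                "YAML error at line %d, column %d: %r",
                mark.line + 1, mark.column + 1, context,
            )
        else:
            logger.debug("YAML error: %s", exc)
        return None
    try:
        return json.dumps(data, indent=4)
    except TypeError as exc:
        # e.g. safe_load turns '2024-01-01' into a datetime.date, which json rejects.
        logger.debug("Parsed YAML is not JSON-serializable: %s", exc)
        return None
```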
load_yaml
def load_yaml(file_path)
Loads and parses a YAML file from the given file path. If the path begins with 'Files/', the function prefixes it with the predefined base directory '/lakehouse/default/'. It detects the character encoding of the file content before decoding it and loading the YAML data.
Arguments:
file_path
str - The path to the YAML file. If the path starts with 'Files/', it will be prefixed with '/lakehouse/default/'.
Returns:
dict
- The parsed YAML data as a Python dictionary.
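A sketch of the load path under stated assumptions. The encoding detection is deliberately simplified and is a swapped-in technique, not necessarily what the module does: it tries UTF-8 (stripping a byte-order mark if present) and falls back to Latin-1. The `_decode` helper and the `base` parameter are hypothetical.

```python
import os
from typing import Any

import yaml  # PyYAML

LAKEHOUSE_BASE = "/lakehouse/default/"


def _decode(raw: bytes) -> str:
    """Hypothetical, simplified encoding detection."""
    try:
        return raw.decode("utf-8-sig")  # also strips a UTF-8 BOM if present
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # never fails: every byte maps to a code point


def load_yaml(file_path: str, base: str = LAKEHOUSE_BASE) -> Any:
    """Read raw bytes, decode them, then parse with PyYAML's safe loader."""
    if file_path.startswith("Files/"):
        file_path = os.path.join(base, file_path)
    with open(file_path, "rb") as handle:
        raw = handle.read()
    return yaml.safe_load(_decode(raw))
```

Reading bytes first keeps decoding separate from parsing, so a more thorough detector could replace `_decode` without touching the rest.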
load_yaml_from_folder_alt
def load_yaml_from_folder_alt(folder_path)
Load and parse all YAML files from a specified folder into a list of data objects.
This function scans the provided folder, selects every file with a ".yaml" or ".yml" extension, and parses each one with the load_yaml function. The parsed data from each file is appended to a list, which is then returned. Files with any other extension are ignored.
Arguments:
folder_path
str - Path to the folder containing YAML files.
Returns:
list
- A list of parsed data objects from the YAML files found in the folder.
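A sketch of the folder scan. To keep it self-contained, `_load_yaml` is a hypothetical stand-in for the module's `load_yaml` (UTF-8 only, no 'Files/' handling). Sorting the file names is also an assumption, added because `os.listdir` returns entries in arbitrary order.

```python
import os
from typing import Any, List

import yaml  # PyYAML


def _load_yaml(path: str) -> Any:
    # Hypothetical stand-in for load_yaml: assumes UTF-8 content.
    with open(path, "r", encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def load_yaml_from_folder_alt(folder_path: str) -> List[Any]:
    results = []
    # sorted() gives a deterministic order (an assumption of this sketch).
    for name in sorted(os.listdir(folder_path)):
        full_path = os.path.join(folder_path, name)
        # str.endswith accepts a tuple, so both extensions are checked at once.
        if os.path.isfile(full_path) and name.endswith((".yaml", ".yml")):
            results.append(_load_yaml(full_path))
    return results
```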
check_format
def check_format(input_string) -> str
Determines the format of an input string by attempting to parse it as JSON or YAML. If the input string successfully parses as JSON, the format is identified as 'json'. If parsing fails for JSON and succeeds for YAML, the format is identified as 'yaml'. If both parsing attempts fail, the format is returned as 'unknown'.
Arguments:
input_string
str - The input string whose format is to be identified.
Returns:
str
- A string indicating the detected format: 'json', 'yaml', or 'unknown'.
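A sketch of the documented JSON-then-YAML detection, assuming PyYAML's `safe_load` is the YAML parser.

```python
import json

import yaml  # PyYAML


def check_format(input_string: str) -> str:
    # JSON is tried first: most valid JSON is also valid YAML.
    try:
        json.loads(input_string)
        return "json"
    except json.JSONDecodeError:
        pass
    try:
        yaml.safe_load(input_string)
        return "yaml"
    except yaml.YAMLError:
        return "unknown"
```

Because YAML is permissive, plain text such as "just some text" parses as a YAML scalar, so with this approach 'unknown' is returned only for strings that are actually malformed YAML.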
load_yaml_from_folder_json
def load_yaml_from_folder_json(folder_path)
Loads all YAML files from the given folder, processes their content, and converts the combined result into JSON format.
The function iterates through all files in the specified folder path, identifying files with extensions '.yaml' or '.yml'. Each valid YAML file is loaded and appended to a list. The accumulated data from all YAML files is then converted to JSON and returned.
Arguments:
folder_path
str - The path to the folder containing YAML files.
Returns:
dict
- A JSON object containing the data parsed and combined from all YAML files in the specified folder.
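A sketch of the combine-then-convert step. The reference above lists the return type as dict; this sketch instead assumes that "converted to JSON" means serialized with `json.dumps`, so it returns a str. That may differ from the actual implementation. UTF-8 input is also assumed.

```python
import json
import os
from typing import Any, List

import yaml  # PyYAML


def load_yaml_from_folder_json(folder_path: str) -> str:
    documents: List[Any] = []
    for name in sorted(os.listdir(folder_path)):
        if name.endswith((".yaml", ".yml")):
            with open(os.path.join(folder_path, name), "r", encoding="utf-8") as handle:
                documents.append(yaml.safe_load(handle))
    # Assumption: the accumulated list is serialized as one JSON array.
    return json.dumps(documents, indent=4)
```

A caller that needs Python objects back can apply `json.loads` to the result.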
load_yaml_from_folder
def load_yaml_from_folder(folder_path, filter_string=None)
Loads YAML files from a specified folder and optionally filters them. Every YAML file in the folder is parsed; if filter_string supplies a key/value pair, only files whose content contains that key with that value are kept.
Arguments:
folder_path
str - Path to the folder containing the YAML files to load.
filter_string
Optional[str] - Optional string in 'key=value' format used to filter YAML content. Only YAML files that contain the specified key with the given value will be included in the result.
Returns:
List[dict]
- A list of dictionaries containing the parsed contents of the YAML files that match the criteria, or all files if no filter is provided.
Raises:
FileNotFoundError
- If the specified folder_path does not exist or cannot be accessed.
ValueError
- If filter_string is provided but is not in 'key=value' format.
Exception
- If any other error occurs while reading a file or parsing YAML.
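A sketch of the documented filter behaviour. The helper `_parse_filter` is hypothetical, and comparing values as strings (so that 'version=2' matches an integer 2 in the YAML) is an assumption of this sketch. Files whose top level is not a mapping never match a filter.

```python
import os
from typing import Any, List, Optional, Tuple

import yaml  # PyYAML


def _parse_filter(filter_string: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'key=value' or raise ValueError."""
    key, sep, value = filter_string.partition("=")
    if not sep or not key.strip():
        raise ValueError(f"filter_string must look like 'key=value', got {filter_string!r}")
    return key.strip(), value.strip()


def load_yaml_from_folder(folder_path: str, filter_string: Optional[str] = None) -> List[Any]:
    if not os.path.isdir(folder_path):
        raise FileNotFoundError(f"Folder not found: {folder_path}")
    criterion = _parse_filter(filter_string) if filter_string else None

    results = []
    for name in sorted(os.listdir(folder_path)):
        if not name.endswith((".yaml", ".yml")):
            continue
        with open(os.path.join(folder_path, name), "r", encoding="utf-8") as handle:
            data = yaml.safe_load(handle)
        if criterion is not None:
            key, value = criterion
            # Keep only mappings that contain the key with a matching value.
            if not isinstance(data, dict) or key not in data or str(data[key]) != value:
                continue
        results.append(data)
    return results
```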
main
def main(args)
Converts YAML files to JSON format and writes the output to specified locations. The function supports bulk conversion when provided with a directory as input, as well as single-file conversion when provided with a specific file as input.
Arguments:
args
- A namespace object containing the following attributes:
input
str - Path to the input YAML file, or to a directory containing multiple YAML files.
output
str - Path to the output file for single-file conversion, or to the output directory for bulk conversion.
filter
Optional[str] - Optional keyword used to filter YAML files when converting a directory.
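A sketch of the two conversion modes driven by an `argparse.Namespace`. The flag names in `build_parser`, the helper `_convert_file`, the one-JSON-file-per-YAML-file layout in bulk mode, and treating `filter` as a substring match on the file name are all assumptions for illustration; the actual filter semantics are not specified above.

```python
import argparse
import json
import os

import yaml  # PyYAML


def _convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: parse one YAML file and write it as indented JSON."""
    with open(src, "r", encoding="utf-8") as handle:
        data = yaml.safe_load(handle)
    with open(dst, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=4)


def main(args: argparse.Namespace) -> None:
    if os.path.isdir(args.input):
        # Bulk mode: write one .json file per YAML file into args.output.
        os.makedirs(args.output, exist_ok=True)
        for name in sorted(os.listdir(args.input)):
            if not name.endswith((".yaml", ".yml")):
                continue
            # Assumption: the filter is a plain substring match on the file name.
            if args.filter and args.filter not in name:
                continue
            stem = os.path.splitext(name)[0]
            _convert_file(os.path.join(args.input, name), os.path.join(args.output, stem + ".json"))
    else:
        # Single-file mode: args.output is the target file path.
        _convert_file(args.input, args.output)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag names matching the documented attribute names.
    parser = argparse.ArgumentParser(description="Convert YAML files to JSON.")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--filter", default=None)
    return parser
```

From a command-line entry point this would be invoked as `main(build_parser().parse_args())`.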