structured_data_generator example

structured_data_generator is a hub function that creates a structured file from a list of fields.
It takes the user-provided fields as input, generates data for each of them, and writes the result to a structured file that matches the requested fields.
It is useful for content creation, test scenarios, or any development work that needs varied sample data.
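Each entry in the fields list is a plain string: either a bare name or a name followed by a colon and a free-text hint, as in the call later in this example. The sketch below only illustrates that convention. How the function itself handles these strings is not shown in this example, and `FieldSpec` and `split_field` are hypothetical names introduced here.

```python
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    """Hypothetical container for one field entry (not part of the hub function)."""
    name: str
    hint: Optional[str]  # free-text hint after the colon, or None if absent


def split_field(spec: str) -> FieldSpec:
    """Split 'name: hint' into its two parts; a spec without ':' has no hint."""
    name, sep, hint = spec.partition(":")
    return FieldSpec(name.strip(), hint.strip() if sep else None)


# Field strings copied from the example call in this document.
fields = [
    "email",
    "phone_number: at least 9 digits long",
    "client_id: at least 8 digits long, only numbers",
]
specs = [split_field(f) for f in fields]
for spec in specs:
    print(spec.name, "|", spec.hint)
```

Splitting only on the first colon keeps any later punctuation inside the hint, so a hint can contain commas or further colons.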

import os
import mlrun
# OpenAI credentials: fill these in before running
OPENAI_API_KEY = ""
OPENAI_API_BASE = ""
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["OPENAI_API_BASE"] = OPENAI_API_BASE
# Create mlrun project
project = mlrun.get_or_create_project("structured-data-generator-test")

# Import the function from the local Python file; once it's in the hub, it can be imported from there
data_generation = project.set_function(func="./structured_data_generator.py", name="structured_data_generator")
# Run the imported function with the desired params
data_generation_run = data_generation.run(
    handler="generate_data",
    params={
        "amount": 5,
        "model_name": "gpt-4",
        "language": "en",
        "fields": [
            "first name",
            "last_name",
            "phone_number: at least 9 digits long",
            "email",
            "client_id: at least 8 digits long, only numbers",
        ],
    },
    returns=[
        "clients: file",
    ],
    local=True,
)
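Because the data is produced by a language model, it can be worth checking the result against the hints given in the fields. The following is a minimal sketch of such a check, assuming the generated file is a CSV whose column names match the field names. Both the file format and `check_clients` are assumptions made for illustration, and the sample rows are made-up placeholders, not real output.

```python
import csv
import io


def check_clients(rows):
    """Return (row_number, problem) pairs for rows that break the field hints."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # Hint: "phone_number: at least 9 digits long"
        phone_digits = [c for c in row["phone_number"] if c.isdigit()]
        if len(phone_digits) < 9:
            problems.append((i, "phone_number has fewer than 9 digits"))
        # Hint: "client_id: at least 8 digits long, only numbers"
        client_id = row["client_id"]
        if not (client_id.isdigit() and len(client_id) >= 8):
            problems.append((i, "client_id is not 8+ digits, numbers only"))
    return problems


# Made-up placeholder rows standing in for the generated file (assumed CSV).
sample = io.StringIO(
    "first name,last_name,phone_number,email,client_id\n"
    "Ada,Example,555-010-0199,ada@example.com,12345678\n"
    "Bob,Example,12345,bob@example.com,12AB5678\n"
)
problems = check_clients(csv.DictReader(sample))
print(problems)
```

To run the same check on a real file, replace `sample` with a file handle opened on the path of the generated artifact.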