Using the MLRun Hub Module for an OpenAI Proxy App#

This notebook walks through importing an OpenAI proxy application module from the MLRun Hub and deploying it as part of your MLRun project.

The module provides a flexible FastAPI endpoint that exposes the following OpenAI APIs: chat completions, responses, and embeddings, so you can generate text, query models, and work with vector representations.

Note: Before running this notebook, create an .env file with the following credentials:

OPENAI_BASE_URL=".."
OPENAI_API_KEY=".."

# optional: overrides the default model (gpt-4o-mini)
OPENAI_DEFAULT_MODEL=".."
import os

import mlrun
from dotenv import load_dotenv

# Load the OpenAI credentials from the local .env file
load_dotenv()
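
As a quick sanity check, you can verify that the required variables were actually loaded before continuing (a minimal sketch):

# Fail fast if the required OpenAI credentials are missing
for key in ("OPENAI_BASE_URL", "OPENAI_API_KEY"):
    assert os.getenv(key), f"Missing required environment variable: {key}"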

Load or create a project and store the credentials as project secrets.

project = mlrun.get_or_create_project("openai-module", user_project=True)

# Store the OpenAI credentials as project secrets so the deployed app can use them
project.set_secrets({
    "OPENAI_BASE_URL": os.environ["OPENAI_BASE_URL"],
    "OPENAI_API_KEY": os.environ["OPENAI_API_KEY"],
})

# Optionally override the app's default model (gpt-4o-mini)
default_model = os.getenv("OPENAI_DEFAULT_MODEL")
if default_model:
    project.set_secrets({"OPENAI_DEFAULT_MODEL": default_model})

Import the OpenAI proxy module from the Hub#

# Import the module object from the MLRun Hub
openai_module = mlrun.import_module("hub://openai_proxy_app")

# Instantiate the module with your MLRun project and deploy the proxy app
openai_obj = openai_module.OpenAIModule(project)
openai_obj.openai_proxy_app.deploy()

Examples of the OpenAI app APIs#

Chat completions API#

This example asks for the three largest countries in Europe and their capitals and returns a standard chat completion response.

response = openai_obj.openai_proxy_app.invoke(
    path="/v1/chat/completions",
    body={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What are the 3 largest countries in Europe and what are their capitals names"}],
    },
    method="POST",
)

Parse the OpenAI response#

data = response.json()
text = data["choices"][0]["message"]["content"]
print(text)
The three largest countries in Europe by area are:

1. **Russia** (part of it is in Europe) - Capital: Moscow
2. **Ukraine** - Capital: Kyiv
3. **France** - Capital: Paris

Note that while Russia is the largest country in the world, only a portion of its landmass is in Europe.
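
The chat completion response also carries standard OpenAI usage statistics, which are handy for tracking token consumption. The snippet below is a minimal sketch, assuming the proxy passes the standard chat completion schema through unchanged:

# Token-usage statistics from the chat completion response
usage = data.get("usage", {})
print(f"prompt: {usage.get('prompt_tokens')}, "
      f"completion: {usage.get('completion_tokens')}, "
      f"total: {usage.get('total_tokens')}")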

Embeddings with the deployed OpenAI proxy#

This example sends a short sentence to the embeddings endpoint and extracts the returned vector from the response payload.
The result is a numeric embedding you can use for similarity search, clustering, or downstream model features.

response = openai_obj.openai_proxy_app.invoke(
    path="/v1/embeddings",
    body={
        "model": "text-embedding-3-small",
        "input": "Kubernetes whispers to its pods at night"
    },
    method="POST",
)

Parse the OpenAI response#

embedding = response.json()["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")

# Uncomment to inspect the full vector:
# print(embedding)
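
To illustrate the similarity-search use case mentioned above, the sketch below embeds a second sentence (made up for this example) and compares the two vectors with cosine similarity; the helper function is defined inline rather than taken from a library:

import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length float vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Embed a second sentence and compare it to the first
response2 = openai_obj.openai_proxy_app.invoke(
    path="/v1/embeddings",
    body={
        "model": "text-embedding-3-small",
        "input": "Pods sleep soundly on their Kubernetes nodes"
    },
    method="POST",
)
embedding2 = response2.json()["data"][0]["embedding"]
print(f"Cosine similarity: {cosine_similarity(embedding, embedding2):.4f}")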

Request a text response and extract the output#

The proxy also supports OpenAI's unified responses endpoint.
This example sends a compact request for a short joke and then extracts the generated text from the structured output.

response = openai_obj.openai_proxy_app.invoke(
    path="/v1/responses",
    body={
        "model": "gpt-4o-mini",
        "input": "Give me a short joke about high tech workers",
        "max_output_tokens": 30
    },
    method="POST",
)

Parse the OpenAI response#

data = response.json()
text = data["output"][0]["content"][0]["text"]
print(text)
Why did the high-tech worker bring a ladder to work?

Because they wanted to reach new heights in their career!
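
Indexing output[0] works for this simple request, but the responses endpoint can return multiple output items (for example, tool calls ahead of the message). A more defensive extraction, assuming the standard responses schema in which text parts are typed "output_text":

# Concatenate all text parts across all output items
text_parts = [
    part["text"]
    for item in data["output"]
    for part in item.get("content", [])
    if part.get("type") == "output_text"
]
print("".join(text_parts))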