Using the MLRun Hub Module for an OpenAI Proxy App#
This notebook walks through the process of importing an OpenAI proxy application from an MLRun Hub module and deploying it as part of your MLRun project.
The module provides a flexible FastAPI endpoint that proxies the following OpenAI APIs: chat completions, responses, and embeddings, so you can generate text, query models, and work with vector representations.
Note: Before running this notebook, create an .env file with the following credentials:
OPENAI_BASE_URL=".."
OPENAI_API_KEY=".."
# optional:
OPENAI_DEFAULT_MODEL=".."  # defaults to gpt-4o-mini; override it with this key
import mlrun
import os
from dotenv import load_dotenv

load_dotenv()
Load or create a project and set credentials.
project = mlrun.get_or_create_project("openai-module", user_project=True)

default_model = os.getenv("OPENAI_DEFAULT_MODEL")
if default_model:
    project.set_secrets({"OPENAI_DEFAULT_MODEL": default_model})
project.set_secrets(
    {
        "OPENAI_BASE_URL": os.environ["OPENAI_BASE_URL"],
        "OPENAI_API_KEY": os.environ["OPENAI_API_KEY"],
    }
)
Import the OpenAI proxy module from the Hub#
openai_module = mlrun.import_module("hub://openai_proxy_app")
# Instantiate the module with your MLRun project and deploy it
openai_obj = openai_module.OpenAIModule(project)
openai_obj.openai_proxy_app.deploy()
Examples of the OpenAI app APIs#
Chat completions API#
This example asks for the three largest countries in Europe and their capitals and returns a standard chat completion response.
response = openai_obj.openai_proxy_app.invoke(
    path="/v1/chat/completions",
    body={
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": "What are the 3 largest countries in Europe and what are their capitals names",
            }
        ],
    },
    method="POST",
)
Inspect the chat completion response#
data = response.json()
text = data["choices"][0]["message"]["content"]
print(text)
The three largest countries in Europe by area are:
1. **Russia** (part of it is in Europe) - Capital: Moscow
2. **Ukraine** - Capital: Kyiv
3. **France** - Capital: Paris
Note that while Russia is the largest country in the world, only a portion of its landmass is in Europe.
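Besides the generated text, a standard OpenAI chat completion payload also reports token usage, which is handy for tracking cost. A minimal sketch, assuming the proxy passes the standard usage object through unchanged:
# Token accounting from the same chat completion response
usage = data.get("usage", {})
print(f"prompt tokens:     {usage.get('prompt_tokens')}")
print(f"completion tokens: {usage.get('completion_tokens')}")
print(f"total tokens:      {usage.get('total_tokens')}")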
Embedding with the Deployed OpenAI Proxy#
This example sends a short sentence to the embeddings endpoint and extracts the returned vector from the response payload.
The result is a numeric embedding you can use for similarity search, clustering, or downstream model features.
response = openai_obj.openai_proxy_app.invoke(
    path="/v1/embeddings",
    body={
        "model": "text-embedding-3-small",
        "input": "Kubernetes whispers to its pods at night",
    },
    method="POST",
)
Inspect the embeddings response#
embedding = response.json()["data"][0]["embedding"]
print(f"embedding dimension: {len(embedding)}")
# Uncomment to print the full vector:
# print(embedding)
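As a quick illustration of the similarity-search use case mentioned above, you can embed two sentences and compare them with cosine similarity. This is a minimal sketch built only on the proxy call shown above; the embed helper and the second sentence are illustrative, not part of the module.
import math

def embed(text):
    # Illustrative helper wrapping the embeddings call shown above
    resp = openai_obj.openai_proxy_app.invoke(
        path="/v1/embeddings",
        body={"model": "text-embedding-3-small", "input": text},
        method="POST",
    )
    return resp.json()["data"][0]["embedding"]

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = embed("Kubernetes whispers to its pods at night")
v2 = embed("A container orchestrator talks to its workloads")
print(f"cosine similarity: {cosine_similarity(v1, v2):.3f}")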
Request a Text Response and Extract the Output#
The proxy also supports the unified Responses endpoint.
Here we send a compact request for a short joke and then extract the generated text from the structured output.
response = openai_obj.openai_proxy_app.invoke(
    path="/v1/responses",
    body={
        "model": "gpt-4o-mini",
        "input": "Give me a short joke about high tech workers",
        "max_output_tokens": 30,
    },
    method="POST",
)
Inspect the Responses API output#
data = response.json()
text = data["output"][0]["content"][0]["text"]
print(text)
Why did the high-tech worker bring a ladder to work?
Because they wanted to reach new heights in their career!
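The output array in a Responses payload can contain more than one item (for example, a reasoning item before the message), so indexing output[0] directly can break. A slightly more defensive extraction, assuming the standard OpenAI Responses schema passes through the proxy unchanged:
# Collect text from every message item in the output array
texts = [
    part["text"]
    for item in data["output"]
    if item.get("type") == "message"
    for part in item.get("content", [])
    if part.get("type") == "output_text"
]
print("\n".join(texts))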