Introduction to the OpenAI API
The OpenAI API is a programming interface that allows code to communicate directly with OpenAI's AI models — including GPT-4o — to generate text, understand language, call tools, and much more. It is the gateway through which AI Agents connect to one of the most powerful LLMs available today.
This topic covers everything needed to use the OpenAI API confidently — from making the first call to understanding its structure, parameters, and the responses it returns.
How the API Works
When code calls the OpenAI API, it sends an HTTP request with:
- A list of messages (the conversation so far)
- A model name (e.g., gpt-4o)
- Optional parameters like temperature and max_tokens
- Optional tools the model can call
OpenAI's servers process this request, run the model, and send back a response object containing the model's reply.
Code → [API Request] → OpenAI Servers → [API Response] → Code
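Under the hood, the official SDK assembles these pieces into a JSON body and POSTs it to OpenAI's servers. The sketch below shows roughly what that payload looks like; the field names match the Chat Completions API, but the build_chat_payload helper itself is illustrative, not part of the openai package.

```python
import json

def build_chat_payload(messages, model="gpt-4o", **params):
    """Assemble a JSON body like the one the SDK sends to the API.

    Illustrative helper, not part of the openai package.
    """
    payload = {"model": model, "messages": messages}
    payload.update(params)  # e.g. temperature, max_tokens, tools
    return payload

payload = build_chat_payload(
    [{"role": "user", "content": "Hello"}],
    temperature=0.3,
    max_tokens=200,
)
print(json.dumps(payload, indent=2))
```

This is the same structure you see in the client.chat.completions.create() calls later in this topic; the SDK simply handles the serialisation and the HTTP round trip for you.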
Available OpenAI Models
| Model | Best For | Speed | Cost |
|---|---|---|---|
| gpt-4o | Best overall — reasoning, tools, multimodal | Fast | Medium |
| gpt-4o-mini | Lightweight tasks, high volume | Very Fast | Low |
| gpt-4-turbo | Complex reasoning, 128k context | Medium | High |
| gpt-3.5-turbo | Simple tasks, budget-sensitive apps | Fastest | Very Low |
For this course, gpt-4o is used as the primary model — it is the best balance of intelligence, speed, and cost.
The Messages Format
Every API call revolves around the messages array. This is a list of message objects, each with a role and content.
The Three Message Roles
| Role | Sent By | Purpose |
|---|---|---|
| system | Developer | Sets the agent's behaviour, persona, and rules |
| user | End user | Contains the user's input or question |
| assistant | LLM (previous turns) | Contains previous responses from the AI |
messages = [
{
"role": "system",
"content": "You are a concise Python tutor. Keep all answers under 100 words."
},
{
"role": "user",
"content": "What is a Python list?"
}
]
Making the First API Call
import os
from dotenv import load_dotenv
import openai
load_dotenv()
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is a Python list?"}
],
temperature=0.3,
max_tokens=200
)
# Extract the text response
answer = response.choices[0].message.content
print(answer)
Output:
A Python list is an ordered, mutable collection of items that can store different data types. Lists are defined using square brackets:

my_list = [1, "hello", 3.14, True]

Items can be added, removed, or changed at any time.
Understanding the Response Object
The API response contains more than just the text. Here is the full structure:
response = {
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A Python list is an ordered, mutable collection..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 35,
"completion_tokens": 62,
"total_tokens": 97
}
}
Key Fields to Know
| Field | What It Contains |
|---|---|
| choices[0].message.content | The actual text response from the model |
| choices[0].finish_reason | "stop" = completed normally; "length" = hit the max_tokens limit |
| usage.total_tokens | Total tokens used (input + output), which determines cost |
| choices[0].message.tool_calls | Present when the model wants to call a tool (None otherwise) |
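The SDK exposes these fields as Python attributes (response.choices[0].message.content), but the same paths work on the raw JSON shape shown above. A minimal sketch of reading the key fields, using a dict with the example values:

```python
# Raw JSON shape from the example above; the SDK wraps the same data
# in objects with attribute access instead of dict lookups.
response_json = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "A Python list is..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 35, "completion_tokens": 62, "total_tokens": 97},
}

choice = response_json["choices"][0]
text = choice["message"]["content"]

# A "length" finish_reason means the reply was cut off by max_tokens
if choice["finish_reason"] == "length":
    print("Warning: reply was truncated")

print(text)
print("Total tokens:", response_json["usage"]["total_tokens"])
```

Checking finish_reason after every call is a cheap way to catch truncated replies before they cause confusing downstream behaviour.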
Important API Parameters
temperature
Controls randomness. 0.0 gives near-deterministic output; higher values (the API accepts up to 2.0) give more varied, creative output.
# For agents doing factual tasks: low temperature
temperature=0.1

# For creative writing: higher temperature
temperature=0.8
max_tokens
Maximum number of tokens in the response. If the reply would be longer, it is cut off and finish_reason is set to "length".
max_tokens=500 # Allow up to 500 tokens in reply
top_p
An alternative to temperature for controlling randomness. Usually left at default (1.0) unless experimenting.
stop
A list of strings that tells the model to stop generating as soon as one of them appears in the output.
stop=["END", "\n\n"] # Stop when the model outputs "END" or a double newline
Multi-Turn Conversations
To simulate a real conversation, previous messages need to be included in each API call. The API itself is stateless — each call is independent. The conversation history must be managed manually.
messages = [
{"role": "system", "content": "You are a helpful assistant."}
]
# Turn 1
messages.append({"role": "user", "content": "My name is Kavya."})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})
print("Agent:", reply)
# Turn 2
messages.append({"role": "user", "content": "What is my name?"})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = response.choices[0].message.content
print("Agent:", reply)
# Output: "Your name is Kavya."
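Because the full history is resent on every call, long-running conversations grow in token cost each turn. Agents therefore usually trim the history. A minimal sketch that keeps the system message plus the most recent messages (the trim_history helper and the cutoff of 6 messages are illustrative choices, not part of the API; real agents often trim by token count instead):

```python
def trim_history(messages, max_messages=6):
    """Keep the system message plus the most recent messages.

    Illustrative helper; production agents often trim by token count.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

# Simulate a long conversation: 10 user/assistant turn pairs
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"Question {i}"})
    history.append({"role": "assistant", "content": f"Answer {i}"})

trimmed = trim_history(history)
print(len(trimmed))  # 7: the system message plus the last 6 messages
```

The trade-off is that anything trimmed away is forgotten; in the example above, the agent would no longer remember "Question 0".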
Understanding Tokens and Cost
OpenAI charges per token. Monitoring token usage is important to keep costs under control.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o | $5.00 | $15.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-3.5-turbo | $0.50 | $1.50 |
Practical tip: For development and testing, use gpt-4o-mini to save costs. Switch to gpt-4o for production or complex reasoning tasks.
# Print token usage after each call
usage = response.usage
print(f"Tokens used → Input: {usage.prompt_tokens} | Output: {usage.completion_tokens} | Total: {usage.total_tokens}")
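Using the per-token prices from the table above, the usage numbers can be turned into a rough dollar estimate. This is a sketch with the price table hardcoded; prices change over time, so check OpenAI's pricing page before relying on them.

```python
# USD per 1M tokens, copied from the pricing table above; these values
# drift as OpenAI updates pricing, so treat them as a snapshot.
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Rough dollar cost of one call, based on the hardcoded price table."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# The example response earlier used 35 input and 62 output tokens
cost = estimate_cost("gpt-4o", 35, 62)
print(f"Estimated cost: ${cost:.6f}")  # $0.001105
```

In a real agent you would pass response.usage.prompt_tokens and response.usage.completion_tokens straight into a helper like this after each call.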
Handling API Errors
import openai
import time
def safe_api_call(client, messages, model="gpt-4o", retries=3):
"""Call the API with basic error handling and retry logic."""
for attempt in range(retries):
try:
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=500
)
return response
except openai.RateLimitError:
print(f"Rate limit hit. Waiting 20 seconds... (Attempt {attempt + 1})")
time.sleep(20)
except openai.AuthenticationError:
print("Invalid API key. Check your .env file.")
break
except openai.APIConnectionError:
print("Connection error. Check your internet.")
break
return None
Summary
The OpenAI API is the bridge between Python code and GPT-4o — the brain of the agents built in this course. Understanding the messages format, the response structure, important parameters like temperature and max_tokens, and how to manage multi-turn conversations forms the foundation for everything that follows. With the API set up and tested, the next step is to build the very first complete AI Agent.
