Introduction to the OpenAI API
The OpenAI API is a programming interface that allows code to communicate directly with OpenAI's AI models — including GPT-4o — to generate text, understand language, call tools, and much more. It is the gateway through which AI Agents connect to one of the most powerful LLMs available today.
This topic covers everything needed to use the OpenAI API confidently — from making the first call to understanding its structure, parameters, and the responses it returns.
How the API Works
When code calls the OpenAI API, it sends an HTTP request with:
- A list of messages (the conversation so far)
- A model name (e.g., gpt-4o)
- Optional parameters like temperature and max_tokens
- Optional tools the model can call
OpenAI's servers process this request, run the model, and send back a response object containing the model's reply.
Code → [API Request] → OpenAI Servers → [API Response] → Code
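Under the hood, the official SDK assembles these pieces into a JSON body and POSTs it to OpenAI's servers. The sketch below shows roughly what that payload looks like; the field names match the Chat Completions API, but the build_chat_payload helper itself is illustrative, not part of the openai package.

```python
import json

def build_chat_payload(messages, model="gpt-4o", **params):
    """Assemble a JSON body like the one the SDK sends to the API.

    Illustrative helper, not part of the openai package.
    """
    payload = {"model": model, "messages": messages}
    payload.update(params)  # e.g. temperature, max_tokens, tools
    return payload

payload = build_chat_payload(
    [{"role": "user", "content": "Hello"}],
    temperature=0.3,
    max_tokens=200,
)
print(json.dumps(payload, indent=2))
```

This is the same structure you see in the client.chat.completions.create() calls later in this topic; the SDK simply handles the serialisation and the HTTP round trip for you.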
Available OpenAI Models
| Model | Best For | Speed | Cost |
|---|---|---|---|
| gpt-4o | Best overall — reasoning, tools, multimodal | Fast | Medium |
| gpt-4o-mini | Lightweight tasks, high volume | Very Fast | Low |
| gpt-4-turbo | Complex reasoning, 128k context | Medium | High |
| gpt-3.5-turbo | Simple tasks, budget-sensitive apps | Fastest | Very Low |
For this course, gpt-4o is used as the primary model — it is the best balance of intelligence, speed, and cost.
The Messages Format
Every API call revolves around the messages array. This is a list of message objects, each with a role and content.
The Three Message Roles
| Role | Sent By | Purpose |
|---|---|---|
| system | Developer | Sets the agent's behaviour, persona, and rules |
| user | End user | Contains the user's input or question |
| assistant | LLM (previous turns) | Contains previous responses from the AI |
messages = [
{
"role": "system",
"content": "You are a concise Python tutor. Keep all answers under 100 words."
},
{
"role": "user",
"content": "What is a Python list?"
}
]
Making the First API Call
import os
from dotenv import load_dotenv
import openai
load_dotenv()
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is a Python list?"}
],
temperature=0.3,
max_tokens=200
)
# Extract the text response
answer = response.choices[0].message.content
print(answer)
Output:
A Python list is an ordered, mutable collection of items that can store different data types. Lists are defined using square brackets:

my_list = [1, "hello", 3.14, True]

Items can be added, removed, or changed at any time.
Understanding the Response Object
The API response contains more than just the text. Here is the full structure:
response = {
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A Python list is an ordered, mutable collection..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 35,
"completion_tokens": 62,
"total_tokens": 97
}
}
Key Fields to Know
| Field | What It Contains |
|---|---|
| choices[0].message.content | The actual text response from the model |
| choices[0].finish_reason | "stop" = completed normally; "length" = hit the max_tokens limit |
| usage.total_tokens | Total tokens used (input + output), which determines cost |
| choices[0].message.tool_calls | Present when the model wants to call a tool (None otherwise) |
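The SDK exposes these fields as Python attributes (response.choices[0].message.content), but the same paths work on the raw JSON shape shown above. A minimal sketch of reading the key fields, using a dict with the example values:

```python
# Raw JSON shape from the example above; the SDK wraps the same data
# in objects with attribute access instead of dict lookups.
response_json = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "A Python list is..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 35, "completion_tokens": 62, "total_tokens": 97},
}

choice = response_json["choices"][0]
text = choice["message"]["content"]

# A "length" finish_reason means the reply was cut off by max_tokens
if choice["finish_reason"] == "length":
    print("Warning: reply was truncated")

print(text)
print("Total tokens:", response_json["usage"]["total_tokens"])
```

Checking finish_reason after every call is a cheap way to catch truncated replies before they cause confusing downstream behaviour.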
Important API Parameters
temperature
Controls randomness. 0.0 gives near-deterministic output; higher values (the API accepts up to 2.0) give more varied, creative output.
# For agents doing factual tasks: low temperature
temperature=0.1

# For creative writing: higher temperature
temperature=0.8
max_tokens
Maximum number of tokens in the response. If the reply would be longer, it is cut off and finish_reason is set to "length".
max_tokens=500 # Allow up to 500 tokens in reply
top_p
An alternative to temperature for controlling randomness. Usually left at default (1.0) unless experimenting.
stop
A list of strings that tells the model to stop generating as soon as one of them appears in the output.
stop=["END", "\n\n"] # Stop when the model outputs "END" or a double newline
Multi-Turn Conversations
To simulate a real conversation, previous messages need to be included in each API call. The API itself is stateless — each call is independent. The conversation history must be managed manually.
messages = [
{"role": "system", "content": "You are a helpful assistant."}
]
# Turn 1
messages.append({"role": "user", "content": "My name is Kavya."})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})
print("Agent:", reply)
# Turn 2
messages.append({"role": "user", "content": "What is my name?"})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = response.choices[0].message.content
print("Agent:", reply)
# Output: "Your name is Kavya."
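Because the full history is resent on every call, long-running conversations grow in token cost each turn. Agents therefore usually trim the history. A minimal sketch that keeps the system message plus the most recent messages (the trim_history helper and the cutoff of 6 messages are illustrative choices, not part of the API; real agents often trim by token count instead):

```python
def trim_history(messages, max_messages=6):
    """Keep the system message plus the most recent messages.

    Illustrative helper; production agents often trim by token count.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

# Simulate a long conversation: 10 user/assistant turn pairs
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"Question {i}"})
    history.append({"role": "assistant", "content": f"Answer {i}"})

trimmed = trim_history(history)
print(len(trimmed))  # 7: the system message plus the last 6 messages
```

The trade-off is that anything trimmed away is forgotten; in the example above, the agent would no longer remember "Question 0".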
Understanding Tokens and Cost
OpenAI charges per token. Monitoring token usage is important to keep costs under control.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o | $5.00 | $15.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-3.5-turbo | $0.50 | $1.50 |
Practical tip: For development and testing, use gpt-4o-mini to save costs. Switch to gpt-4o for production or complex reasoning tasks.
# Print token usage after each call
usage = response.usage
print(f"Tokens used → Input: {usage.prompt_tokens} | Output: {usage.completion_tokens} | Total: {usage.total_tokens}")
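Using the per-token prices from the table above, the usage numbers can be turned into a rough dollar estimate. This is a sketch with the price table hardcoded; prices change over time, so check OpenAI's pricing page before relying on them.

```python
# USD per 1M tokens, copied from the pricing table above; these values
# drift as OpenAI updates pricing, so treat them as a snapshot.
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Rough dollar cost of one call, based on the hardcoded price table."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# The example response earlier used 35 input and 62 output tokens
cost = estimate_cost("gpt-4o", 35, 62)
print(f"Estimated cost: ${cost:.6f}")  # $0.001105
```

In a real agent you would pass response.usage.prompt_tokens and response.usage.completion_tokens straight into a helper like this after each call.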
Handling API Errors
import openai
import time
def safe_api_call(client, messages, model="gpt-4o", retries=3):
"""Call the API with basic error handling and retry logic."""
for attempt in range(retries):
try:
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=500
)
return response
except openai.RateLimitError:
print(f"Rate limit hit. Waiting 20 seconds... (Attempt {attempt + 1})")
time.sleep(20)
except openai.AuthenticationError:
print("Invalid API key. Check your .env file.")
break
except openai.APIConnectionError:
print("Connection error. Check your internet.")
break
return None
Summary
The OpenAI API is the bridge between Python code and GPT-4o — the brain of the agents built in this course. Understanding the messages format, the response structure, important parameters like temperature and max_tokens, and how to manage multi-turn conversations forms the foundation for everything that follows. With the API set up and tested, the next step is to build the very first complete AI Agent.
