- https://github.com/microsoft/agent-framework
- https://learn.microsoft.com/en-us/agent-framework/get-started/your-first-agent?pivots=programming-language-python
- https://www.youtube.com/watch?v=EAeUiipzCTE
- https://www.udemy.com/course/microsoft-agent-framework-fundamentals (same as the youtube course)
Two types:
- Session level memory: memory in a certain thread or session
- Earlier, they were called "threads" in code; now they are called "sessions".
- Long term memory i.e. context: memory across sessions or threads
- In MAF, they are managed using context providers
Context providers have two sections:
- Before invocation: Inject context derived from persistent memory (e.g. - user's preferred language: Spanish)
- After invocation: Inspect the user messages + agent responses and update your memory (e.g. - the user just told you a new preference)
All models have a maximum context window. As conversations grow:
- We may hit token limits and get errors
- API costs increase
- Response times slow down
- The model may lose focus on recent messages
Truncation strategies:
-
Sliding window - Keep only the last N messages, discarding older ones
- Pros: simpler to implement, predictable behaviour
- Cons: loses older context, doesn't account for message length
- Used more for prototypes/demos
-
Token-based truncation - Remove old messages until total tokens are under a budget
- Pros: More precise, respects actual token limits
- Cons: Requires token counting, slightly more complex
- Used more for production systems
Both of the above lose information, so a smarter approach is:
- Summarization
- Log conversation exchanges - track user and assistant messages
- Summarize old exchanges - using a dedicated summarizer agent
- Inject summary as a message - send it as the first message to the new thread
- This is also a multi-agent pattern
For the youtube/udemy tutorial KodeKloud labs:
Note: this was needed for section 1 but not for 2. Also, the given URL doesn't seem to exist anymore. Maybe something changed in the lab - need to try section 1 again to verify.
- Go to https://kodekloud.com/ai-playgrounds/kodekey, launch now, copy base url and api key
- Export
OPENAI_API_KEYandOPENAI_API_BASEin the terminal
- Use
model="gpt-5.5"(instead ofmodel_id="openai/gpt-4.1-mini") - Use
agent = client.as_agent(...)instead ofagent = client.create_agent(...)
- Replace
stream = agent.run_stream(query)withstream = agent.run(query, stream=True)
- Use
model="openai/gpt-4.1-mini"(instead ofmodel_id="openai/gpt-4.1-mini") - Use
agent.create_session()instead ofagent.get_new_thread() - Use
await agent.run(..., session=...)instead ofawait agent.run(..., thread=...) - Use
OpenAIChatCompletionClientinstead ofOpenAIChatClient.OpenAIChatClientcalls the OpenAI Responses API (/v1/responses), whileOpenAIChatCompletionClientcalls the classic Chat Completions API (/v1/chat/completions).- The KodeKloud lab gateway (a proxy
OPENAI_API_BASE) only implements/chat/completions, soOpenAIChatClientfails with404 - {'detail': 'Not Found'}(thedetailbody is a giveaway that it's a proxy, not real OpenAI). - Everything downstream (
client.as_agent(...),create_session(),agent.run(...)) is identical between the two clients — only the endpoint differs.
- Use
model="openai/gpt-4.1-mini"(instead ofmodel_id="openai/gpt-4.1-mini") - Import
Agentinstead ofChatAgent(renamed inpython-1.3.0) - Use
Agent(client=...)instead ofAgent(chat_client=...)(the constructor's first parameter was renamed fromchat_clienttoclient)