In a recent exploration, Braxton Nunnally of Phase 2 Labs examined how Zep—a memory management tool—can help AI systems retain and recall important information over time. This kind of “organizational memory” allows AI to move beyond one-off interactions and instead offer consistent, informed responses that build on past context. Common Business Pain Point: "Our AI tools don’t retain context or past interactions—users repeat themselves, teams lose knowledge, and we miss opportunities to respond more intelligently." What the Team Learned: AI Needs...

In a recent analysis, Alan Ramirez of Phase 2 Labs explored how organizations can reduce the operational costs of Large Language Models (LLMs) by implementing context caching—a method that stores and reuses the static parts of AI prompts. This strategy minimizes redundant processing, leading to significant cost savings. Common Business Pain Point: “Our AI tools are powerful, but the cost of running them is escalating quickly—especially as usage grows across departments.” What the Team Learned: Understanding Context Caching: By separating static (unchanging) and dynamic...
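To make the idea concrete: if the bulky instructions and reference material are kept byte-identical across requests, a provider-side cache can reuse them, and only the short user-specific portion is processed fresh each time. The sketch below is a minimal illustration of that static/dynamic split using placeholder content, not Phase 2's implementation.

```python
# Minimal sketch: keep the static portion of a prompt identical across calls
# so provider-side caching can reuse it; only the dynamic part varies.

STATIC_CONTEXT = (
    "You are a support assistant for Acme Corp.\n"      # hypothetical persona
    "Company policy reference (large, unchanging):\n"
    "...thousands of tokens of policy text here...\n"   # placeholder for real docs
)

def build_messages(user_question: str) -> list[dict]:
    """Pair the cache-friendly static prefix with the per-request question."""
    return [
        {"role": "system", "content": STATIC_CONTEXT},  # static: cacheable
        {"role": "user", "content": user_question},     # dynamic: changes per call
    ]

if __name__ == "__main__":
    print(build_messages("How do I reset my password?"))
```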

In today's AI-driven landscape, creating systems with long-term memory capabilities has become increasingly important. Whether you're building a customer service chatbot or any other long-running assistant, remembering interactions with users and maintaining history beyond the context window are crucial parts of a successful system. While many components go into building a fully functional long-term memory solution, Zep can be a powerful tool to help developers implement it in their AI applications. What is Zep? Zep is an API-based solution that...
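For a feel of what that looks like in practice, here is a minimal sketch of storing and recalling conversation history, assuming the zep-python SDK's v1-style ZepClient, Memory, and Message names (exact names and signatures vary across SDK versions) and a hypothetical session ID.

```python
# Minimal sketch of persisting and recalling chat history with Zep, assuming
# v1-style zep-python names; method names differ in newer SDK versions.
from zep_python import ZepClient, Memory, Message

client = ZepClient(base_url="http://localhost:8000", api_key="YOUR_API_KEY")

session_id = "user-1234-support"  # hypothetical per-conversation identifier

# Persist a conversational turn so later sessions can build on it.
client.memory.add_memory(
    session_id,
    Memory(messages=[
        Message(role="user", content="My order #5521 arrived damaged."),
        Message(role="assistant", content="Sorry to hear that. I've filed a replacement claim."),
    ]),
)

# Later, even past the model's context window, retrieve the stored history.
memory = client.memory.get_memory(session_id)
for msg in memory.messages:
    print(f"{msg.role}: {msg.content}")
```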

Introduction Large Language Models (LLMs) have revolutionized how organizations process and generate natural language content, but their operational costs can become significant at scale. One of the most effective techniques for reducing these costs is context caching, which allows reuse of static prompt components across multiple requests. This article examines how the three major AI providers—Google (Gemini), Anthropic (Claude), and OpenAI—implement context caching, with detailed analysis of their technical approaches, pricing structures, and practical limitations. The Technical Fundamentals of Context Caching When interacting...
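As one concrete example of these provider approaches, Anthropic exposes caching explicitly: a cache_control marker on a large, static system block asks the API to cache that prefix so follow-up requests reuse it at a discount. The sketch below assumes the official anthropic Python SDK; the model name and document text are placeholders.

```python
# Minimal sketch of Anthropic-style explicit prompt caching: the static system
# block is marked cacheable, and only the short user turn changes per request.
import anthropic

LONG_STATIC_DOC = "...a large reference document, identical on every call..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_STATIC_DOC,
            "cache_control": {"type": "ephemeral"},  # request caching of this prefix
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key limitations."}],
)
print(response.content[0].text)
```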

Unless you've recently spent your free time nose-deep in GitHub repos or interrogating ChatGPT like it's a barista who got your coffee order wrong, you may not be familiar with MCP servers. That’s okay. Two weeks ago, I wasn’t either. But thanks to a casual, "Hey, can you connect Notion to our project via an MCP server?" from a teammate (and my relentless need to avoid looking clueless), I dove headfirst into the rabbit hole. What I discovered is something potentially...