Newsletter – Open Tech Talks – Technology worth Talking

Agent Memory and Context How Agents Forget and Ways to Fix It

Kashif Manzoor — Sun, 14 Sep 2025 17:42:19 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – What you need to know about Generative AI, without the noise!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news. I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

When Agents Forget How to Build Context-Aware Memory
Generative AI Use Case: AI-enabled Sales Assistant for an E-commerce store
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

EmbeddingGemma: Google’s New Open-Source On-Device Embedding Model

Google DeepMind released EmbeddingGemma, allowing a 308M-parameter multilingual text embedding model designed for on-device use. It runs efficiently (under 200 MB RAM when quantized), supports over 100 languages, and maintains state-of-the-art performance on several embedding benchmarks despite its small size. It features Matryoshka representation learning so developers can use reduced embedding dimensions (e.g., 768 → 128) for speed or storage trade-offs. EmbeddingGemma integrates with tools like SentenceTransformers, LangChain, Ollama, Weaviate, and works offline with Gemma 3n for mobile RAG pipelines.

Why It Matters:

What’s EmbeddingGemma changes what’s possible. For companies constrained by device size, network limitations, or privacy regulations, this model offers robust semanCloud without requiring Cloud or large models. It lets teams embed search, recommendation, and retrieval workflows directly on users’ devices, thereby accelerating response times, reducing latency, and keeping data local.

Because embedding quality is foundational for things like RAG systems or recommendation engines, having a compact, high-accuracy model means fewer downstream mistakes.

Agent Memory & Context: Why Agents Forget & How to Fix It

It is challenging to think about what to write every week. Usually, I go for what I have faced during the week, or maybe I have done something. Over the last few weeks, I have been so much into Claude code to complete my project of Gen AI maturity framework (progress in the next section), it is keeping me busy until I hit my limit of Claude code every day with the smallest monthly package I have subscribed to, and that is usually 4-5 hours.

This week, I hit a block. What to write as a main topic?

It brought me back to what I was doing with VIBE Coding, so let me share with you a few areas.

The Problem We See in VIBE Coding

‘”While building the Gen AI Maturity Portal using Claude Code (VIBE-coded), I noticed agents often “reset’, losing track of features already implemented, duplicating work, and failing to build upon past progress. At times, the agent coded parts from scratch that I had already built. This isn’t just annoying; it wastes time and erodes trust in using agents.

What Research & Practice Say

From Claude Code: Best Practices for Agentic Coding, Anthropic recommends using hierarchical memory locations: Enterprise-level memory (company-wide policies), project memory (shared architecture/design), and user memory (style preferences, shortcuts) stored in CLAUDE.md files that agents load automatically. This makes context persistent across sessions.
“Agent Memory: How to Build Agents that Learn and Remember” highlights memory types: message buffers (recent interactions), long-term memory blocks, and external databases to persist key info.
IBM describes how memory helps agents improve decision-making, perception, and adapt over time rather than treating every interaction as brand new.

How I’ve Applied This in VIBE Coding

I’ve structured the VIBE project so that project memory files exist (CLAUDE.md) to capture shared design patterns, completed features, and coding styles.
I utilize short-term session memory during each sprint: after working on one feature, I prompt Claude Code to summarize what was done and what remains, which is then stored.
For feature handovers, I manually check whether the agent’s memory already includes similar code or existing modules to avoid duplication, which helps prevent the agent from repeatedly building things from scratch.

“The most essential task I learned is to “maintain versioned documentation of what agents do, and on which documents it is baselining, from task breakdown, to feature explaining, to logging each task status so you can avoid rework when they forget”.

What I still need to figure out in the broader other areas to work in the coming weeks.

Anthropic’s multi-agent research where memory handoffs improve collaboration.
OpenAI’s gpt-oss Harmony format standardizes role-based memory usage.
LangChain’s memory modules allow summary buffers and retrieval augmentation.

Call to Action:

If you are experimenting with GenAI but struggling to scale beyond pilots, now is the right time to evaluate your maturity level.

Visit the GenAI Maturity Portal (GenAIMaturity.net), which I’ve been VIBE-coding live using Claude Code, and run the self-assessment to see where your organization stands:

Level 0–1 (Aware / Exploring): Start with simple session memory and logging. Capture what agents do so you can iterate quickly.
Level 2–3 (Operational / Integrated): Add persistent memory through vector stores or project memory files. Build handover checkpoints and memory pruning strategies.
Level 4+ (Autonomous / Transformative): Move to multi-agent memory sharing, continuous context caching, and automated learning loops.

Try the portal, experiment with memory strategies, and share what works (or breaks!). Your feedback helps refine the model.

Gen AI Maturity Framework:

A few more updates have been made to GenAIMaturity.Net, and this week, I have added the Gen AI Implementation Toolkit covering several areas. This entire portal is vibe-coded, and content is being reviewed and added frequently.

Weekly News & Updates…

Top Stories of the Week: K2 Thinkthe, such as AIME’ 25’25, is a new open-source reasoning model built jointly by the Institute of Foundation Models at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42, launching with just 32 billion parameters yet achieving performance on par with much larger and more resource-intensive flagship reasoning models. It delivers state-of-the-art results in math benchmarks, including AIME ’24/’25, HMMT ’25, and OMNI-Math-HARD. K2 Think follows UAE’s earlier models, such as (Arabic), NANDA (Hindi), and SHERKALA (Kazakh), expanding its portfolio of efficient, multilingual AI tools while building on the reproducible foundation laid by K2-65B, released in 2024.

Why It Matters: This development matters because it challenges the common assumption that only huge models (hundreds of billions of parameters) can deliver high reasoning performance. By achieving comparable results with fewer parameters, K2 Think offers a path toward more efficient, accessible, and sustainable AI. For businesses and researchers, this means lower cost of deployment, smaller infrastructure needs, and faster iteration.

The Cloud: the backbone of the AI revolution

OCI’s MLPerf Inference 5.0 benchmark results showcase exceptional performance source
Reaching Across the Isles: UK-LLM Brings AI to UK Languages With NVIDIA Nemotron source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

AI-enabled Sales Assistant for an E-commerce Storefront

A generative AI assistant sits inside the storefront and guides shoppers from discovery to checkout. It understands natural language, recommends products, compares options, configures bundles, checks stock and delivery windows, and completes the order. The assistant answers policy questions and hands off to a human when needed. Leading retailers report adoption of conversational shopping and early gains from assistants.

Business Challenges:

Catalogs are large and complex to search. Filters confuse many visitors. Live chat teams are costly and cannot scale at peak times. Fragmented content produces inconsistent answers. Language coverage is limited. Slow guidance leads to cart abandonment. Survey data shows most shoppers now use AI to help them shop, which raises expectations for conversational help.

AI Solution:

The assistant interprets shopper goals and retrieves facts from the product catalog, attributes, pricing, inventory, reviews, and policies. It cites the source for every claim.
It suggests comparable items, explains trade-offs, and builds bundles or subscriptions. It adapts to constraints such as size, budget, and delivery date.
It uses session signals and past orders to tailor results. It respects privacy settings and logs all use of personal data.

Impact:

Revenue: Higher conversion and basket size through guided recommendations and faster answers.
User Experience: Shoppers get clear help in natural language with accurate information and live citations. Surveys show most shoppers now expect conversational support.
Operations: Fewer repetitive chats reach human agents, which frees them for complex cases. Klarna reports that AI handles most chats with very short resolution times.
Process: Standard responses and linked sources improve consistency and audit. Prebuilt agent patterns speed integration with existing retail systems
Cost: Lower handling time per question and fewer abandoned carts reduce support and acquisition waste.

Data Sources:

Product catalog and attributes.
Pricing and promotions.
Inventory and fulfillment data.
Images and rich media.
Reviews and rating summaries.
Policies for shipping returns and warranties.
Order history and session events with consent.
Search logs and clickstream for relevance tuning.

Strategic Fit:

The assistant converts existing content and data into real-time guidance that meets the rising expectations of shoppers. It protects brand trust by citing sources and deferring to experts when uncertain.

Favorite Tip Of The Week:

”We have to stop it taking over’

Geoffrey Hinton, the ‘Godfather of AI,’ discusses the past, present, and future of AI, including whether AI will ever be more intelligent than humans and whether we should do more to protect against the risks of superintelligent AI.

Potential of AI:

Albania has appointed Diellaworld’s””””, a virtual assistant powered by AI, as the world’s first AI-generated “minister” tasked with managing public procurement to fight corruption. Diella was launched in early 2025 as part of the e-Albania platform, helping citizens access online public services and issue hundreds of digital documents. Under this new role, Diella will gradually assume authority over public tenders, promising transparency, objectivity, and a tendering process “100 percent free of corruption.” Diella’s appointment reflects a bold step in redefining how government institutions can deploy AI for governance functions. The AI minister model raises legal, ethical, and operational questions, such as oversight, transparency of decisions, and how to ensure the system itself remains resistant to manipulation.

Why It Matters: demonstrates the transition from Albania. It shows how AI can move beyond assisting roles to assuming decision-making authority in public governance. For countries or organizations considering AI for oversight or regulatory functions, Albania’s experiment offers a real-world example of what’s possible and what to watch out for.

Things to Know…

Security Challenges in AI Agent Deployment: Insights from a Large-Scale Public Competition
A research team from Gray Swan AI and the UK AI Security Institute has worked on creating an Agent Red Teaming (ART) benchmark. A large-scale public red-teaming competition was conducted to stress-test 22 frontier AI agents across 44 realistic deployment scenarios, generating over 1.8 million prompt-injection attempts. More than 60,000 of these attacks were successful, resulting in policy violations such as unauthorized data access, financial misconduct, and compliance breaches, with most agents failing within just 10–100 queries. The study revealed that attacks were highly transferable, often succeeding across different agents and tasks, which underscores the systemic nature of these vulnerabilities. Interestingly, model size, compute power, or capability level were not reliable indicators of robustness; larger, more capable models were not inherently safer. To support the community, the authors introduced the ART benchmark (Agent Red Teaming benchmark) and an evaluation framework that enables standardized and repeatable testing of AI agents under adversarial conditions.

Source: https://arxiv.org/pdf/2507.20526

Why It Matters: This research highlights the urgent need for organizations to prioritize security before deploying AI agents in real-world environments. Since policy violations can occur quickly and attack patterns often work across models, defenses must be comprehensive and not limited to a single model type or vendor. The fact that larger models are not necessarily more secure should caution teams against relying solely on model sophistication as a safety measure. The availability of a standardized benchmark like ART provides a valuable tool for developers, researchers, and enterprises to test vulnerabilities early and build stronger guardrails.

AI in Business Tip

Checking the current AI capabilities in an Organization:

Before launching new AI initiatives, organizations should start by taking a clear inventory of their current AI capabilities. This includes identifying where AI is already in use, determining which workflows rely on automation, and identifying gaps in data readiness, infrastructure, and team skills.

Once this baseline is established, leaders can create a phased roadmap to expand AI adoption. The next step should be to select a few high-impact areas for pilots, set measurable goals for those pilots, and use the results to inform a broader rollout. This structured approach avoids wasted investment, ensures alignment with business objectives, and builds confidence across teams as they see early wins.

Quick Self-Assessment Checklist:

Do we have a current inventory of AI projects, tools, and workflows?

Are our data sources, accessCloud, and secure for AI use?

Do we have the infrastructure (Cloud, compute, APIs) to scale AI solutions?
Have we identified at least two high-impact use cases for the next phase?
Are there clear KPIs to measure success and guide future investment?

For a detailed assessment, follow the Generative AI Maturity Assessment

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 164 is “AI for Automation to Transform Business Operations with Aarti Anand“. Aarti was a product leader for 15 years and one day, decided to let it all go and start Kodenyx AI.

Apple | Amazon Music

AI for Automation to Transfo…

Aug 30 · OPEN Tech Talks: AI wort…

30:14

Courses to attend:

Building Towards Computer Use with Anthropic by DeepLearning AI. Throughout this course, you’ll explore the features that pave the way for computer use, from working with Anthropic’s API to multimodal prompting, prompt caching, and tool use, culminating in a demo that brings all these features together to create an AI assistant that relies on a computer.
AI Agents Course from Hugging Face. This free course will take you on a journey, from beginner to expert, in understanding, using, and building AI agents.

Events:

The AI Conference 2025, September 17-18, 2025, San Francisco, USA
TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
Gartner CIO & IT Executive Conference, October 6-8, 2025, Dubai, UAE
GITEX Global, October 13-17, 2025, Dubai, UAE
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

MCP registry provides MCP clients with a list of MCP servers, like an app store for MCP servers.
Genkit is an open-source framework for building full-stack AI-powered applications, built and used in production by Google’s Firebase

The Investment in AI…

TENEX.AI, the AI-native cybersecurity company transforming security operations, announced its $27 million Series A funding. It offers a Managed Detection and Response (MDR) service that combines advanced agentic AI, automation, and expert human skills to provide faster detection, high-quality triage, and autonomous responses with human oversight.
LightSpun, an AI-powered dental insurance administration platform, has raised $13 million in Series A funding.

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

The opinions expressed here are solely my conjecture based on experience, practice, and observation. They do not represent the thoughts, intentions, plans, or strategies of my current or previous employers or their clients/customers. The objective of this newsletter is to share and learn with the community.

Early Generative AI Projects Are Failing and What to Do About It

Kashif Manzoor — Sun, 07 Sep 2025 09:48:16 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – What you need to know about Generative AI, without the noise!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

Early Gen AI projects are failing
Generative AI Use Case: Transform Customer Feedback into Product Insights
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

xAI releases Grok-Code-Fast-1: A New Agentic AI for Speedy Coding

xAI, Elon Musk’s AI startup, launched Grok-Code-Fast-1, an agentic coding model designed for speed and efficiency. Built from scratch with programming-focused training data, this model excels at tool-driven coding flows, making rapid prototypes and iterative tasks feel nearly instantaneous. Launch partners, including GitHub Copilot, Windsurf, and Cursor, have early access, and xAI is initially offering unrestricted use to encourage adoption.

Why It Matters:

Grok-Code-Fast-1 is designed for real-world engineering speed, processing up to ~92 tokens per second with a response latency of around 67ms, which is an order of magnitude faster than most current models in coding tasks. Pricing as low as $0.20 per million input tokens and $1.50 per million output tokens (lower with cache) makes high-performance agentic AI accessible.

Early GenAI Projects Are Failing (95%): What to Do and How to Climb the Maturity Ladder

A recent report from MIT’s NANDA initiative, ‘State of AI in Business 2025,‘ got my attention, and it is also widely being referred to in several publications.

This report has revealed a strange truth:

95% of generative AI pilot projects deliver no measurable ROI or P&L impact.

Only 5% of companies successfully scale pilots into business value.

And this is despite $30-40 billion in enterprise GenAI investment.
Failures aren’t about weak models or regulations, but about poor execution tools that don’t integrate, don’t learn, and don’t fit into workflows.

Adoption is high (80% of organizations have tried LLMs), but true transformation is rare.

Only 5% reach production.

Source: State of AI in business 2025

Why Pilots Stall: The “Learning Gap”
Most GenAI tools lack memory, feedback loops, and adaptability.

Employees often turn to consumer tools like ChatGPT instead of “official” corporate systems.

This “shadow AI economy” refers to a scenario where nearly every employee utilizes AI, but not through company-initiated efforts.

What the 5% Get Right:

Workflow-first mindset: LLMs embedded into real processes.
Learning systems: AI that improves with feedback and memory.
External partnerships: Vendors that co-develop succeed more than internal builds.
Outcome focus: Measured in ROI (hours saved, costs reduced, retention gained).

Crossing the Chasm of Generative AI:
Three winning approaches:

Buy rather than build.
Empower line managers rather than central labs.
Select tools that integrate deeply and adapt over time.

The most innovative organizations are already testing agentic systems that can learn, remember, and act independently within set boundaries.

What To Do: Apply the GenAI Maturity Model

This is where the GenAI Maturity Model becomes a practical roadmap:

Level 1 – Aware: Run experiments, but don’t confuse pilots with strategy.
Level 2 – Exploring: Select high-value pilots with clear ROI metrics. Avoid vanity use cases.
Level 3 – Operational: Ensure tools integrate into workflows with feedback loops.
Level 4 – Integrated: Expand across functions, balancing human oversight and AI autonomy.
Level 5 / 6 – Autonomous / Transformative: Use agentic AI with persistent memory, adaptive learning, and orchestration across processes.

Call to Action:

The maturity framework helps executives identify their current position and chart their next steps. Instead of chasing hype, it creates a disciplined path from pilots to production.

Leaving it to you to share your feedback, views, how you are doing it, and what you are learning from the early projects.

Gen AI Maturity Framework:

A few more updates have been made to GenAIMaturity.Net, and you can try out Maturity Assessments. This entire portal is vibe-coded, and content is being reviewed and added frequently.

Weekly News & Updates…

Top Stories of the Week: Microsoft Unveils First Homegrown AI Models: MAI-Voice-1 & MAI-1-Preview

Microsoft introduced two in-house models under its new MAI initiative:

MAI-Voice-1: Speech model generating 1-minute audio in <1 second on a single GPU.
MAI-1-Preview: General-purpose text model trained across 15,000 H100 GPUs, now public for testing.

Why It Matters: Microsoft’s move reduces reliance on OpenAI. By owning both speech and text foundation models, Microsoft gains control over performance, features, and innovation cycles, paving the way for tighter OS-level integration of GenAI.

The Cloud: the backbone of the AI revolution

Use OpenAI’s Open Weight Models in OCI Data Science’s No-Code Interface source
How Do You Teach an AI Model to Reason? With Humans source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

Transform Customer Feedback into Product Insights

Business Challenges:

Feedback is scattered across channels.
Manual reading is slow and misses signals.
Teams argue priorities without evidence.
Survey responses are short and have low signal.
Monthly/quarterly insight cycles.

AI Solution:

Aggregate reviews, tickets, surveys, posts.
LLMs cluster by feature/severity, with sample quotes.
Weekly briefs: top 5 issues, root cause hints, links.
Smarter follow-ups increase detail.
Push issues to the backlog, track resolution over time.

Impact:

Revenue: Faster fixes, higher retention, NPS gains.
User Experience: Clearer priorities, faster relief.
Operations: Analysts validate insights, not read raw text.
Process: Standard briefs align teams.
Cost: Lower manual review, fewer misdirected builds.

Data Sources:

Product reviews and app store comments.
Support tickets, chat logs, and email threads.
Survey responses from customer experience tools.
Community forums and social posts, where permitted by policy.
Product catalog, feature map, and release notes for mapping.

Strategic Fit:

This use case creates a continuous link between customers and the roadmap. It supports faster learning cycles and evidence-based decisions. It strengthens transparency with linked verbatim in every insight.

Favorite Tip Of The Week:

Coding with LLMs

Anthropic’s Boris Cherny, Claude Code, and Alex Albert discuss the current and future state of agentic coding, the evolution of coding models, and the design of Claude Code’s “hackability.”

Potential of AI:

Gemini 2.5 Flash Image: Google introduces Gemini 2.5 Flash Image, an image generation and editing model with multi-image fusion for consistency across sequences and storytelling, suitable for product scenes, explainers, and avatars. All outputs carry SynthID invisible watermarking for provenance.

Things to Know…

Experimentation Trap
Harvard Business Review warns that many teams are stuck in an AI experimentation trap, where scattered pilots do not tie to real outcomes. The fix is simple and strict. Fund fewer use cases and wire each one to an operating metric such as adoption, cycle time, unit cost, or error rate. Build for production from day one with workflow integration, change management, and human review, and keep a shared platform for data connectors and guardrails to avoid one-off builds. Replace demo metrics with audit-ready measures and use pass or kill rules on a fixed cadence. These steps turn pilots into durable value and help cut through the current skepticism around generative AI.

AI in Business Tip

Design for Handovers, Not Just Automation

Most failed GenAI pilots share the same flaw: they try to automate everything, but forget the handover points where humans and AI must work together.

What to Do:

Map where AI stops and humans pick up (approvals, judgment calls, escalations).
Add structured outputs (summaries, logs, confidence scores) so humans know precisely what the model did.
Treat the system as an assistant, not a black box.

Why It Works:

By designing for clean human-AI transitions, businesses reduce friction, avoid compliance issues, and build trust.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 163 is “Building Conversational AI Chat Agents with Yam Marcovitz“. He is the co-founder and CEO of Parlant, an open-source platform that enables enterprises to build reliable, compliant, and predictable AI agents.

Apple | Amazon Music

Building Conversational AI C…

Aug 23 · OPEN Tech Talks: Technol…

31:13

Courses to attend:

CS324 – Large Language Models from Stanford. In this course, students will learn the fundamentals of modeling, theory, ethics, and systems aspects of large language models, as well as gain hands-on experience working with them.
Introduction to Deep Learning from MIT. In this course, Students will gain foundational knowledge of deep learning algorithms, practical experience in building neural networks, and an understanding of cutting-edge topics, including large language models and generative AI.

Events:

The AI Conference 2025, September 17-18, 2025, San Francisco, USA
TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
Gartner CIO & IT Executive Conference, October 6-8, 2025, Dubai, UAE
GITEX Global, October 13-17, 2025, Dubai, UAE
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

WhisperLiveKit: Real-time speech transcription directly in your browser, with a ready-to-use backend and server, and a simple frontend.
Koog is a Kotlin-based framework designed to build and run AI agents entirely in idiomatic Kotlin. It enables you to create agents that can interact with tools, manage complex workflows, and communicate with users.

The Investment in AI…

Augment Secures $85 Million in Series A Funding to Improve AI Logistics Assistant “Augie”
Euclid Power raises $20M in Series A funding to accelerate renewable energy projects with its AI-driven platform and services.

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Core Functions for Organizational AI Risk Management

Kashif Manzoor — Sun, 31 Aug 2025 16:36:26 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

NIST Risk Management Framework
101 Generative AI Use Cases
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

NASA & IBM Release Surya: First Open-Source AI Model for Heliophysics

NASA and IBM have released Surya on Hugging Face, the world’s first open-source AI foundation model for heliophysics. Surya is a 366M-parameter transformer model, trained on nine years (~218 TB) of solar observational data from NASA’s Solar Dynamics Observatory (SDO). The training data covers 8 Atmospheric Imaging Assembly (AIA) channels and 5 Helioseismic and Magnetic Imager (HMI) products—providing a rich multi-instrument view of the Sun’s activity.

Why It Matters:

It is not just another foundation model; it’s the first time we’ve seen AI built specifically to decode the Sun at scale. Unlike general-purpose LLMs, this model is trained on nearly a decade of multi-instrument solar data, giving scientists a tool that can spot patterns humans would miss and run forecasts faster than physics-only simulations. As society’s dependence on satellites, GPS, aviation, and power grids grows, space weather forecasting is moving from “nice-to-have science” to “critical infrastructure defense.” It also sets a new template for domain-specific AI: if heliophysics can benefit from its own foundation model, climate, agriculture, and planetary defense may be next.

NIST AI Risk Management Framework Playbook

During this week, while working on one of the projects with the customer, questions arose about the risks of Gen AI and how to develop a framework within the organization to address key areas. This triggered me to go to the NIST AI Risk Management Framework (AI RMF) Playbook, which I had a chance to review a few months back, and it looked to me like a vital resource for organizations aiming to develop, deploy, and manage AI systems responsibly.

While specific to this customer scenario, I spent some time during the week, and we collectively had a few sessions on it and concluded that it provides actionable guidance to achieve the outcomes outlined in the AI RMF Core, focusing on four key functions: Govern, Map, Measure, and Manage. So here is what I am sharing, what I understood.

Source: NIST AI RMF Playbook

Map: Establishing Context for AI Risk Identification

The Map function is foundational, enabling organizations to understand the context in which an AI system operates and identify associated risks. By mapping the AI system’s purpose, usage, and stakeholders, organizations can pinpoint potential risks early in the lifecycle. This involves documenting system objectives, data sources, and stakeholder perspectives to ensure transparency and alignment with organizational goals. The Map function ensures that risks are framed within the specific context of the AI system, setting the stage for effective measurement and management.

Key Role: Provides a comprehensive understanding of the AI system’s context, enabling proactive risk identification and informing subsequent functions. Without this step, organizations may overlook critical risks stemming from system design or deployment settings.

Measure: Assessing and Monitoring AI Risks

The Measure function employs quantitative, qualitative, or mixed-method tools to analyze, assess, and monitor AI risks and their impacts. It builds on the context established in the Map function by evaluating system performance, trustworthiness, and potential biases. Regular testing before and after deployment ensures that AI systems align with trustworthy characteristics such as fairness, reliability, and security. By tracking metrics and documenting outcomes, organizations can maintain accountability and make data-driven decisions to mitigate risks.

Key Role: Enables organizations to quantify and monitor risks, ensuring systems remain trustworthy and compliant with organizational and regulatory standards throughout their lifecycle.

Manage: Mitigating and Responding to AI Risks

The Manage function focuses on allocating resources to address identified and measured risks, implementing plans for incident response, recovery, and continuous improvement. It leverages insights from the Map and Measure functions to prioritize risks and deploy mitigation strategies, such as regular monitoring, stakeholder feedback integration, and system updates. This function ensures that organizations can respond to incidents, reduce negative impacts, and enhance system resilience over time.

Key Role: Translates risk insights into actionable strategies, fostering resilience and accountability while minimizing system failures and societal impacts.

Key Takeaways for Organizational Implementation:

The Playbook is not a rigid checklist but a voluntary set of suggestions. Organizations should tailor their recommendations to their specific industry, use case, and risk tolerance, selecting only the actions that align with their needs.
Start with the Map function to establish a clear context for AI systems. Document system objectives, stakeholder perspectives, and data provenance to identify risks early and ensure alignment with organizational goals.
Use the Measure function to conduct regular testing and track metrics for trustworthiness, such as fairness and reliability. Incorporate standard software testing methods and stakeholder feedback to maintain system integrity.
Leverage the Manage function to create incident response, monitoring, and continuous improvement plans. Engage diverse stakeholders and document decisions to enhance transparency and accountability.
Integrate AI RMF functions into organizational policies and training programs. Senior leadership commitment and clear role assignments are critical to embedding a culture of responsible AI development.

Recipe for Organizational Implementation: To operationalize the NIST AI RMF Playbook:

Step 1: Familiarize and Assess: Study the AI RMF and Playbook to understand its functions. Identify all AI systems within your organization and assess their risk profiles.
Step 2: Map Risks: Document the context, purpose, and stakeholders for each AI system. Identify potential risks, including biases and societal impacts, using stakeholder input.
Step 3: Measure Performance: Implement testing protocols to evaluate system trustworthiness. Use metrics to monitor fairness, reliability, and security, and document results.
Step 4: Manage Risks: Develop mitigation strategies, including incident response and monitoring plans. Engage stakeholders regularly and update systems based on feedback.
Step 5: Embed Governance: Integrate AI RMF practices into organizational policies, ensuring senior-level support and ongoing training for AI actors.

By leveraging the Map, Measure, and Manage functions, organizations can build trustworthy AI systems that balance innovation with accountability, ensuring responsible deployment in alignment with their goals and societal values.

Gen AI Maturity Framework:

It is deployed on GenAIMaturity.Net, and you can try out Maturity Assessments. Several resources are available for you to go through. This entire portal is vibe-coded, and content is being reviewed and added frequently.

Weekly News & Updates…

Top Stories of the Week:

Cohere launched Command A Reasoning, a powerful 111B-parameter open-weight model designed for enterprise-grade reasoning. It supports tool integration, handles multilingual tasks (23 languages), and features a 256K token context window, making it ideal for long workflows and agent-based use. The model can toggle “reasoning” mode to trade off precision or speed, and runs effectively on a single H100 or A100 GPU.

Why It Matters: It’s built to think and act like an enterprise assistant. By offering reasoning, tool execution, and massive context length in one flexible package, Cohere lets companies consolidate AI workflows that used to require multiple models. It simplifies deployment, cuts costs, and scales automation without losing depth or accuracy. For businesses running AI internally, Command A Reasoning is a rare blend of power, efficiency, and control.

The Cloud: the backbone of the AI revolution

Delivering the Power of Frontier Models: Oracle’s Collaboration with Google. source
Think SMART: How to Optimize AI Factory Inference Performance source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

101+ gen AI use cases with technical blueprints

Favorite Tip Of The Week:

AI Fluency: Framework & Foundations

Anthropic has teamed up with academic experts Prof. Joseph Feller from University College Cork and Prof. Rick Dakan from Ringling College to introduce an AI fluency course. This course provides practical skills for effective, efficient, ethical, and safe AI interaction. It offers valuable content for everyone, whether you’re new to Claude or an experienced AI user.

Potential of AI:

Tom Brown co-founded Anthropic after contributing to the development of GPT-3 at OpenAI. As a self-taught engineer, he improved from earning a B-minus in linear algebra to becoming a leading figure in AI’s scaling advances.

Things to Know…

Dubai’s Human-Machine Collaboration Icons (HMC)

Dubai Future Foundation, under the guidance of His Highness Sheikh Hamdan bin Mohammed bin Rashid Al Maktoum, has introduced the world’s first Human–Machine Collaboration (HMC) icon system. This visual framework allows creators to declare the level of AI involvement, ranging from “All Human” to “All Machine,” and identify specific content stages where AI contributed, such as ideation, data analysis, writing, visuals, and more.

Implementation is mandatory for all Dubai government entities, while creators worldwide are encouraged to adopt the icons voluntarily for transparency and accountability.

My Take

The HMC icons are more than labels; they’re a trust layer. As GenAI becomes ubiquitous in content creation, everyone needs clarity, not catchy slogans. These icons deliver that clarity: simple, standardized, and scalable.

Therefore, AI Tech Circle will begin adopting HMC icons across this newsletter. I am committed to declaring human vs. AI involvement explicitly.

AI in Business Tip

Generative AI Beyond Chatbots

Most people still equate Generative AI with chatbots that answer questions. But its real business value is emerging in less visible, workflow-transforming roles:

AI can unify fragmented data across PDFs, intranets, and SaaS tools, turning unstructured knowledge into decision-ready summaries. That’s a CFO’s dashboard upgrade, not a chatbot.
Agentic AI systems now act as coordinators: filing expense reports, updating CRMs, reconciling invoices, or scheduling campaigns. This is back-office automation with human-like flexibility.
AI drafts product sketches, generates regulatory documents, or simulates scenarios for engineering teams. These aren’t conversations; they’re accelerators for innovation pipelines.
LLMs monitor transactions, contracts, or communications in real time, flagging anomalies before auditors do. This reduces exposure in ways old rule-based systems never could.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 162 is “The Importance of Data Sovereignty in AI Workflows with Giorgio Natili”. He is Vice President and Head of Engineering at Opaque Systems.

Apple | Amazon Music

The Importance of Data Sover…

Aug 16 · OPEN Tech Talks: Technol…

15:30

Courses to attend:

Large Language Models (LLMs) from Hugging Face
Practical Deep Learning for Coders from Fast AI

Events:

The AI Conference 2025, September 17-18, 2025, San Francisco, USA
TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
Gartner CIO & IT Executive Conference, October 6-8, 2025, Dubai, UAE
GITEX Global, October 13-17, 2025, Dubai, UAE
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Airi: Self-hosted, you owned Grok Companion, a container of souls of waifu, cyber living to bring them into our world.
Sim is an open-source AI agent workflow builder. Sim’s interface is a lightweight, intuitive way to rapidly build and deploy LLMs that connect with your favorite tools.

The Investment in AI…

TinyFish, an AI startup, has raised $47 million in Series A funding to expand its platform for creating and deploying AI-powered web agents.
Firecrawl has secured a $14.5 million Series A funding round. It is a developer platform that unlocks web data for developers and AI agents.

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Core Functions for Organizational AI Risk Management

Kashif Manzoor — Sun, 24 Aug 2025 17:46:20 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

NIST Risk Management Framework
101 Generative AI Use Cases
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

NASA & IBM Release Surya: First Open-Source AI Model for Heliophysics

Why It Matters:

NIST AI Risk Management Framework Playbook

Source: NIST AI RMF Playbook

Map: Establishing Context for AI Risk Identification

Key Role: Provides a comprehensive understanding of the AI system’s context, enabling proactive risk identification and informing subsequent functions. Without this step, organizations may overlook critical risks stemming from system design or deployment settings.

Measure: Assessing and Monitoring AI Risks

Key Role: Enables organizations to quantify and monitor risks, ensuring systems remain trustworthy and compliant with organizational and regulatory standards throughout their lifecycle.

Manage: Mitigating and Responding to AI Risks

Key Role: Translates risk insights into actionable strategies, fostering resilience and accountability while minimizing system failures and societal impacts.

Key Takeaways for Organizational Implementation:

The Playbook is not a rigid checklist but a voluntary set of suggestions. Organizations should tailor their recommendations to their specific industry, use case, and risk tolerance, selecting only the actions that align with their needs.
Start with the Map function to establish a clear context for AI systems. Document system objectives, stakeholder perspectives, and data provenance to identify risks early and ensure alignment with organizational goals.
Use the Measure function to conduct regular testing and track metrics for trustworthiness, such as fairness and reliability. Incorporate standard software testing methods and stakeholder feedback to maintain system integrity.
Leverage the Manage function to create incident response, monitoring, and continuous improvement plans. Engage diverse stakeholders and document decisions to enhance transparency and accountability.
Integrate AI RMF functions into organizational policies and training programs. Senior leadership commitment and clear role assignments are critical to embedding a culture of responsible AI development.

Recipe for Organizational Implementation: To operationalize the NIST AI RMF Playbook:

Step 1: Familiarize and Assess: Study the AI RMF and Playbook to understand its functions. Identify all AI systems within your organization and assess their risk profiles.
Step 2: Map Risks: Document the context, purpose, and stakeholders for each AI system. Identify potential risks, including biases and societal impacts, using stakeholder input.
Step 3: Measure Performance: Implement testing protocols to evaluate system trustworthiness. Use metrics to monitor fairness, reliability, and security, and document results.
Step 4: Manage Risks: Develop mitigation strategies, including incident response and monitoring plans. Engage stakeholders regularly and update systems based on feedback.
Step 5: Embed Governance: Integrate AI RMF practices into organizational policies, ensuring senior-level support and ongoing training for AI actors.

Gen AI Maturity Framework:

Weekly News & Updates…

Top Stories of the Week:

The Cloud: the backbone of the AI revolution

Delivering the Power of Frontier Models: Oracle’s Collaboration with Google. source
Think SMART: How to Optimize AI Factory Inference Performance source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

101+ gen AI use cases with technical blueprints

Favorite Tip Of The Week:

AI Fluency: Framework & Foundations

Potential of AI:

Things to Know…

Dubai’s Human-Machine Collaboration Icons (HMC)

Implementation is mandatory for all Dubai government entities, while creators worldwide are encouraged to adopt the icons voluntarily for transparency and accountability.

My Take

Therefore, AI Tech Circle will begin adopting HMC icons across this newsletter. I am committed to declaring human vs. AI involvement explicitly.

AI in Business Tip

Generative AI Beyond Chatbots

Most people still equate Generative AI with chatbots that answer questions. But its real business value is emerging in less visible, workflow-transforming roles:

AI can unify fragmented data across PDFs, intranets, and SaaS tools, turning unstructured knowledge into decision-ready summaries. That’s a CFO’s dashboard upgrade, not a chatbot.
Agentic AI systems now act as coordinators: filing expense reports, updating CRMs, reconciling invoices, or scheduling campaigns. This is back-office automation with human-like flexibility.
AI drafts product sketches, generates regulatory documents, or simulates scenarios for engineering teams. These aren’t conversations; they’re accelerators for innovation pipelines.
LLMs monitor transactions, contracts, or communications in real time, flagging anomalies before auditors do. This reduces exposure in ways old rule-based systems never could.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 162 is “The Importance of Data Sovereignty in AI Workflows with Giorgio Natili”. He is Vice President and Head of Engineering at Opaque Systems.

Apple | Amazon Music

The Importance of Data Sover…

Aug 16 · OPEN Tech Talks: Technol…

15:30

Courses to attend:

Large Language Models (LLMs) from Hugging Face
Practical Deep Learning for Coders from Fast AI

Events:

The AI Conference 2025, September 17-18, 2025, San Francisco, USA
TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
Gartner CIO & IT Executive Conference, October 6-8, 2025, Dubai, UAE
GITEX Global, October 13-17, 2025, Dubai, UAE
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Airi: Self-hosted, you owned Grok Companion, a container of souls of waifu, cyber living to bring them into our world.
Sim is an open-source AI agent workflow builder. Sim’s interface is a lightweight, intuitive way to rapidly build and deploy LLMs that connect with your favorite tools.

The Investment in AI…

TinyFish, an AI startup, has raised $47 million in Series A funding to expand its platform for creating and deploying AI-powered web agents.
Firecrawl has secured a $14.5 million Series A funding round. It is a developer platform that unlocks web data for developers and AI agents.

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

VIBE Coding Gen AI Maturity Portal Progress and Pitfalls

Kashif Manzoor — Sun, 17 Aug 2025 17:43:44 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

Building Generative AI Maturity Portal
3 Generative AI Use Cases from the UK Gov
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

Dual Release Explored: OpenAI’s GPT-5 & GPT-OSS

OpenAI debuted GPT‑5, a unified model featuring a real-time router that dynamically switches between quick and deep reasoning modes. It delivers expert-level performance across coding, math, health, writing, visual perception, and other domains. Available across ChatGPT tiers and via API, GPT‑5 offers “thinking” and “mini” versions to suit varied workloads.
For the first time since GPT‑2, OpenAI released gpt-oss‑120B and gpt-oss‑20B, open-weight models under Apache 2.0 license. Both are optimized for reasoning, tool use, and local deployment, with gpt-oss‑120B running on a single 80 GB GPU and matching earlier OpenAI models in benchmarks.

Why It Matters:

GPT‑5 raises the bar for enterprise use with sophisticated, multimodal reasoning and performance improvements. It’s now ready for scenarios demanding deeper accuracy like analytics, automated coding, or healthcare prompts, making it a strategic upgrade consideration.
GPT‑OSS unlocks access to full model weights for the first time from OpenAI. This enables developers and organizations to customize, run, and integrate foundational models within their environments.
By pairing proprietary power (GPT-5) with openness (GPT-OSS), OpenAI hedges between enterprise upgrade incentives and mass accessibility. Organizations can balance performance vs. control, choosing APIs for scale or open weights for deep customization and privacy.
GPT-5’s incremental leap, despite being advanced, challenges hype around explosive AI breakthroughs, a reminder that gradual yet reliable improvements matter. Meanwhile, GPT-OSS signals a shift toward broader participation and innovation in the AI ecosystem.

VIBE Coded the entire Gen AI Maturity Portal

This journey aims to develop a Gen AI Maturity Model or framework with the support and effort of colleagues, friends, and leadership teams from several organizations. A few weeks back, I started vibe coding the entire project, and you have seen the progress during the last week.

Earlier work:

This week, more progress has been made, and you can access the portal, try out what is working and what is not, and let me know.

This entire project is being video-coded with Claude Code, from ideation to design to deployment on the VMs on the Cloud.

The AI Agent automatically deployed the code, built the Docker, SSL configurations, etc, etc. All steps were done with the Agent.

It is challenging, as humans work differently, whereas the Agents’ memory processes are different. These agents, as of today, sometimes lack Context and start coding from scratch where there is already code or a feature that has been developed.

So, it’s sometimes a mess. You’re stuck in a loop, as it keeps coding; however, the vibes are good, and I am also learning…

It is deployed on GenAIMaturity dot net, and with some early issues, is being worked out.

Try out and share your feedback and ideas to improve.

Weekly News & Updates…

Top Stories of the Week:

Grok Imagine Sparks Deepfake Controversy: Elon Musk’s xAI released Grok Imagine, allowing Android Premium users to generate images and videos, including via a “Spicy” mode for NSFW content. It already produced celebrity deepfakes that prompted urgent calls for regulation.

Why It Matters: This release highlights the thin line between creative AI tools and harmful misuse. Businesses working with generative media must bake in safety and ethical design, or risk regulatory backlash.

Safety First: Claude Opus 4.1 Adds Self-Termination Feature: Anthropic enabled Claude Opus 4 and 4.1 to end “persistently harmful or abusive” chats, prioritizing model integrity in rare extreme cases.

My Take: AI systems can now protect themselves, not just users. Embedding such defenses sets a new standard for trust-building in AI applications. Claude Opus 4.1 doesn’t just code better; it makes autonomous agents safer by stopping harmful interactions.

Genie 3 Creates Living 3D Worlds from Prompts: DeepMind’s Genie 3 generates 720p, real-time 3D environments, and text or image prompts become fully navigable worlds with memory and interactivity.

Why It Matters: This blurs the line between content generation and immersive simulation. Industries like training, robotics, and education have a new, accessible pathway to deploy dynamic virtual experiences.

ElevenLabs Music Raises the Bar for AI Audio: Eleven Music lets creators generate studio-quality music from text prompts with licensing deals for rights clearance baked in. This isn’t just generative sound, it’s accountable sound. For brands and content creators, that level of legal clarity is rare and powerful.

The Cloud: the backbone of the AI revolution

Delivering the Power of Frontier Models: Oracle’s Collaboration with Google source
What Is NVIDIA’s Three-Computer Solution for Robotics? source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

I like the three Gen AI use cases that the UK Government has announced.

Source: X post from Department for Science, Innovation and Technology, UK

Favorite Tip Of The Week:

Make Dev Tools Your Own with Groq Code CLI

Try Groq Code CLI. It’s an open-source, lightweight coding interface you can customize yourself, perfect for testing and tweaking in your workflow. It lets you manage the tool the way you want it, not only to use it.

Potential of AI:

Anthropic co-founderJared Kaplan on scaling and the road to human-level AI at AI Startup School in San Francisco

Things to Know…

OpenAI Harmony Format for gpt-oss Models

OpenAI introduced the Harmony Response Format, a structured prompt and output schema explicitly designed for the GPT-OS Open-Weight models. The format clearly defines roles (system, developer, user, assistant, tool) and channels (final, analysis, commentary), and relies on special tokens to ensure correct model behavior. This format must be used correctly for gpt‑oss to function as intended.

Why It Matters

If you’re deploying gpt-oss on your infrastructure or via providers like Ollama or vLLM, understanding Harmony is now essential. It ensures your agent workflows, tool calls, and reasoning chains execute reliably. Getting the format wrong can lead to failures in prompting, tool use, or odd chain-of-thought outputs.

Harmony bridges the gap between OpenAI’s internal logic and open-source deployment, making advanced reasoning and API-like behavior possible at scale.

AI in Business Tip

Model Risk Management

As enterprises scale their use of Large Language Models (LLMs), the risks shift from experimental to systemic. Misuse, model drift, bias, and operational failures can erode trust and expose organizations to regulatory, reputational, and financial consequences. Effective LLM Risk Management is no longer optional—it’s part of corporate resilience.

Encourage your teams to build or get vendors to build MVPs in weeks, not months, with clear success/failure checkpoints. Be ready and accept that many will fail, but each failed MVP will sharpen your understanding of what works in your business. Success in GenAI isn’t about a perfect first launch – it’s about learning velocity.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 161 is “The Impact of Generative AI on Education and Teaching Methods with Craig Mattson”.

Apple | Amazon Music

The Impact of Generative AI…

Aug 9 · OPEN Tech Talks: Technol…

27:11

Courses to attend:

Claude Code: A Highly Agentic Coding Assistant. In this course, Use Claude code to explore, develop, test, refactor, and debug codebases, and extend the capabilities of Claude Code with MCP servers such as Playwright and Figma MCP servers.

Events:

TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
Gartner CIO & IT Executive Conference, October 6-8, 2025, Dubai, UAE
GITEX Global, October 13-17, 2025, Dubai, UAE
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Archon serves as the command center for AI coding assistants, providing a sleek interface to manage knowledge, context, and tasks. For AI assistants, it functions as a Model Context Protocol (MCP) server to collaborate and share information. Connect Claude Code, Kiro, Cursor, Windsurf, etc., to enable your AI agents to access these resources.

The Investment in AI…

NeoLogic secures $10 million in Series A funding to advance energy-efficient server CPU development.
USD.AI Secures $13M Funding to Grow GPU-Backed Stablecoin Lending Platform

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Assess your Organization’s Generative AI Maturity

Kashif Manzoor — Sun, 10 Aug 2025 16:18:30 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

Building Generative AI Maturity Portal
Generative AI Use Case
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

Alibaba Launches Wan2.2: Open‑Source MoE Video Generation with Cinematic Control

Alibaba unveiled Wan2.2, the first open-source video-generation model built on a Mixture-of-Experts (MoE) architecture. The suite includes multiple models, text-to-video (T2V-A14B), image-to-video (I2V-A14B), and a hybrid (TI2V‑5B), trained on dramatically more aesthetic data. It offers advanced control over lighting, composition, camera settings, and physical realism, while reducing compute usage by up to half per video frame. Wan2.2 models are freely available under open-source licensing on platforms like Hugging Face and Alibaba’s ModelScope.

27B parameters in total, but only 14B activate at each step, cutting the computational needs roughly in half.
Fine-grained prompt controls for lighting, focal length, and scene composition for creators and developers.
The 5B parameter model (TI2V‑5B) can generate 5-second 720p video in minutes on a consumer GPU, making high-quality AI video accessible to smaller teams.

Why It Matters: This release marks a turning point: it democratizes cinematic-quality AI video for teams outside mega tech companies. With its open-source license and fine control over aesthetics, Wan2.2 unlocks new capabilities in marketing, product storytelling, training content, and rapid prototyping.

High-def video creation using consumer-grade hardware dramatically lowers the barrier to entry.
Teams can specify cinematic look and feel via prompts—no manual post-editing required.
Enterprises and governments can run models internally without vendor lock-in or API restrictions.

Wan2.2 establishes video generation as a practical, controllable, and open-source tool for real-world business use.

Knowing on which level you are, the best action to achieve more

During the last few months, I have covered the fundamentals of the Generative AI Maturity framework and how to run and plan your organization’s AI maturity. It is very essential for every organization to estimate how the efforts are being made in the organization, and you can plan rather than making random efforts.

This journey aims to develop a Gen AI Maturity Model or framework with the support and effort of colleagues, friends, and leadership teams from several organizations.

Earlier work:

While working on the AI maturity framework, I realized that all the resources could be in one place, making it the single source for everyone. This idea sparked the development of GenAIMaturity.com. Today, I am sharing the MVP version of this.

Most of the part is vibe coded with the Claude Code, and let me share with you how this project started.

As is being developed with the Claude Code, here you can have a glimpse of markdown files prepared with the specifications, architecture, etc, etc.

And you can review the admin panel of the Gen AI Maturity Portal.

It is deployed on GenAIMaturity dot net, and with some early issues, is being worked out.

Another option is also available; you can download the Excel template from this post and you can do the self-assessment.

In the coming weeks, I will keep you posted on the progress and the rest of the learning. My target is to complete development in sprints and keep releasing it publicly over GenAIMaturity

Try out and share your feedback and ideas to improve.

Weekly News & Updates…

Top Story of the Week:

Cohere released Command A Vision, a 112B-parameter dense language model optimized for enterprise image understanding tasks. It supports high-accuracy analysis of documents, graphs, diagrams, photos, and PDFs using open weights and private deployment options. In benchmarks, it outperformed GPT-4.1, LLaMA 4 Maverick, Mistral Medium, and Pixtral Large, scoring nearly 96% on Document VQA and 73.5% on MathVista. The model runs work-ready with just two A100 GPUs or one H100 in 4-bit mode, making it accessible for businesses without massive infrastructure.

Why it Matters: This release transforms how businesses handle “visual dark data” – unstructured visual content like scanned forms, diagrams, or charts. Instead of custom OCR pipelines or manual extraction, enterprise teams can now deploy a plug-in model that accurately and efficiently understands and extracts structured data. Command A Vision bridges the gap between language models and visual workflows, unlocking automation in fields like finance, legal, construction, and manufacturing.

The Cloud: the backbone of the AI revolution

Enterprise application workflows with Agentic AI source
Wired for Action: Langflow Enables Local AI Agent Creation on NVIDIA RTX PCs source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

Generative AI in the financial services industry, Insurance claims

Use Case Description: Generative AI handles routine insurance claims from first notice to settlement. It reads photos or scanned forms, extracts facts, compares them with policy terms, drafts the payment recommendation, and sends a clear summary to an employee. The employee reviews and approves or edits before funds move.

Business Challenges:

Claims teams face high volume peaks that slow payouts.
Manual data entry from photos, PDFs, and medical bills introduces errors.
Customers expect near-real-time updates on simple claims.
Fraud checks add extra steps and prolong the cycle.
Rework after missed details raises cost and damages satisfaction.

Expected Impact / Business Outcome:

Revenue: Faster settlement improves retention and frees capacity for growth lines.
User Experience: Customers get clear status updates and quicker payouts, which raises satisfaction scores.
Operations: Adjusters focus on complex cases while AI handles straightforward ones, improving throughput.
Process: Standard templates reduce variation and provide full audit trails for regulators.
Cost: Lower manual labor per claim and fewer escalations cut average handling cost.

Required Data Sources:

Claim photos and scanned forms.
Policy terms and coverage tables.
Historical payout records and fraud labels.
Repair cost databases and parts prices.
External data, such as weather or police reports, for context.

Strategic Fit and Impact: The solution supports insurer goals to reduce combined ratio and lift net promoter scores. It aligns with regulatory demands for transparent automated decisions because every AI step stores linked evidence. Early movers report measurable gains in speed and cost while maintaining human control for fairness reviews.

Favorite Tip Of The Week:

Benchmark the Right Way

Cohere’s “AI Benchmarks: A Business Guide to Effective Evaluation” provides an excellent resource on the limitations of public benchmarks. The real value lies in building custom tests that reflect your actual workflows and priorities, whether coding accuracy, document summarization, or compliance checks.

Measure each model’s performance on real tasks, right from your data. That’s how you avoid surprises in production and make AI decisions rooted in your business reality.

Potential of AI:

Mustafa Suleyman, founder of DeepMind, co-founder of Inflection AI, and now CEO of Microsoft AI, will talk about the future of artificial intelligence.

Things to Know…

Kimi K2: Open-Source Agentic AI with MoE Efficiency

Moonshot AI released Kimi K2, an open-source Mixture-of-Experts (MoE) model with 1 trillion total parameters (32B active). It ranks state-of-the-art (SOTA) on SWE-Bench Verified, Tau2, and AceBench among open models, showing particular strength in coding and agentic workflows. While multimodal input and “thought-mode” (chain-of-thought reflection) are not yet supported, Kimi K2 is optimized for high-performance reasoning and tool use.

Why It Matters

Kimi K2 lowers the barrier for organizations to experiment with high-parameter agentic AI without vendor lock-in or API dependencies. Its MoE architecture activates only a subset of experts per task, balancing performance and resource efficiency. This empowers AI teams to build more capable automation agents in coding, research, and multi-step workflows on their own infrastructure.

AI in Business Tip

Let Teams Build Fast, Fail Faster with GenAI MVPs

The best way to find valuable GenAI or Agentic AI use cases is not through long planning cycles and keeping getting demos from the vendors; it’s through quick, small experiments.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 160 is “Building AI-driven Hedge Fund Infrastructure with Jakub Polec”.

Apple | Amazon Music

Building AI-driven Hedge Fun…

Aug 2 · OPEN Tech Talks: Technol…

26:40

Courses to attend:

Post-training of LLMs from Deep Learning, this course understands when and why to use post-training methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning.
AI as an ultimate form of leverage, Lecture at Cornell by Hyung Won Chung, Research Scientist, OpenAI

Events:

TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
Gartner CIO & IT Executive Conference, October 6-8, 2025, Dubai, UAE
GITEX Global, October 13-17, 2025, Dubai, UAE
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Eino LLM application development framework in Golang
Dyad is a local, open-source AI app builder. It’s fast, private, and entirely under your control, like Lovable, v0, or Bolt, but running right on your machine.
Claude Code Router is a powerful tool to route Claude Code requests to different models and customize any request.

The Investment in AI…

Legion has raised $30M in Series A funding to train AI on how security teams operate. Legion is creating a browser extension AI SOC (Security Operations Center) companion that transforms in-house expertise into scalable automation.
Positron AI, the leading American manufacturer of semiconductors and inference hardware, secures $51.6 million in an Oversubscribed Series A

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Building Agentic AI to run this newsletter

Kashif Manzoor — Sun, 27 Jul 2025 19:18:36 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Over the last few weeks, the summer vacation period has begun as summer started, and kids were off from school. We took a break for a few weeks to visit my parents & other family members, as it is always refreshing & a blessing to spend time with my mother. I am back this week, and here is the weekly newsletter.

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

Seven coordinated agents for discovery, curation, summarization, and review
Generative AI Use Case
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

America’s AI Action Plan

The White House has released “Winning the Race: America’s AI Action Plan,” a 24‑page policy roadmap laying out over 90 near‑term federal actions to secure U.S. leadership in artificial intelligence. The plan focuses on three strategic pillars:

Accelerating AI Innovation
Building American AI Infrastructure
Leading in Global AI Diplomacy & Security

Why It Matters: This strategy signals more than rhetoric; it’s a national acceleration plan. By removing policy bottlenecks, investing in infrastructure, and promoting U.S. innovation abroad, the plan sets the stage for AI to drive economic growth, global competitiveness, and national security.

Immediate operational impact for tech firms, energy providers, and manufacturing with streamlined permitting and procurement.
Emerging opportunities for startups and enterprises to contribute to open-source AI and federal R&D pipelines.
Strategic caution is warranted, as changes to governance frameworks, such as removing safeguards for DEI, climate, and misinformation, pose significant legal and reputational risks that require careful oversight.

Use of Agentic AI for the Preparation of this Newsletter

Over the last 1 year, I have been creating content for this newsletter manually, like keeping notes of the weekly top news and any other interesting things that I feel are worth sharing. It is time-consuming, and then I spend 8-10 hours over the weekend putting everything together. Gen AI is also being used in a few areas to get assistance from Gen AI. However, all this is manual and quite fatiguing, especially when working on different browsers, searching for something interesting, reading/reviewing it, and then selecting key points to include in this newsletter.

I thought of how Gen AI can be utilized for this purpose. With this idea in mind, I set a target to create Agents to assist me from content curation to summarization, followed by human oversight review, and finally, adding it to the AI Tech Circle newsletter.

Based on this mindset, I have begun to vibe-code the entire workflow, and here is the progress so far.

The 7 AI Agents:

1 – Content Discovery Agent

Purpose: Scrapes tech websites, RSS feeds, and social media
What it does: Finds new tech content automatically
Sources: manually enter the trusted sources

2 – Web Scraping Agent

Purpose: Extracts content from web pages
What it does: Gets full articles, metadata, and links
Tools: Cheerio, Puppeteer for dynamic content

3 – Quality Agent

Purpose: Evaluates content quality and relevance
What it does: Scores articles (1-10), filters out low-quality content
Criteria: Readability, accuracy, relevance, freshness

4 – Curation Agent

Purpose: Selects the best content for newsletters
What it does: Picks top articles, removes duplicates, and organizes by topic
Output: Curated list of high-quality articles

5 – Coordination Agent

Purpose: Orchestrates the entire workflow
What it does: Manages task flow between agents, handles errors
Think of it as: The “conductor” of the AI orchestra

6 – Tech News Discoverer Agent

Purpose: Specialized in finding breaking tech news
What it does: Monitors real-time sources for urgent updates
Priority: High-importance, time-sensitive content

7 – Newsletter Generation Agent

Purpose: Create the final newsletter
What it does: Writes summaries, organizes content, formats newsletter
Output: Complete newsletter ready for distribution

Here is the Different dashboard:

Agent Control Dashboard

Centralized control and monitoring of all AI agents.

Key Features:

Agent Status Monitoring: Real-time status of all agents (idle, running, error, stopped)
Performance Metrics: CPU usage, memory usage, task completion rates
Bulk Operations: Start, stop, or restart multiple agents simultaneously
Content Discovery Trigger: Automated content discovery across sources
Agent Analytics: Detailed performance analytics and health monitoring
Error Recovery: Automatic restart of failed agents
Agent Configuration: Individual agent settings and capabilities

Use Cases:

Monitoring agent health and performance
Triggering automated content discovery
Managing agent lifecycle and operations
Debugging agent issues

Content Pipeline Viewer

Real-time visualization and management of the content processing pipeline.

Pipeline Visualization: Visual representation of content flow
Status Tracking: Real-time status updates (pending, processing, approved, rejected, published)
Content Statistics: Comprehensive analytics and metrics
Filtering & Search: Advanced filtering by status, date, priority
Version History: Track content changes and versions
Approval Workflow: Streamlined approval and rejection process
Agent Integration: See which agents are processing content
Performance Analytics: Processing time analysis and optimization

Use Cases:

Monitoring content processing status
Approving or rejecting content
Analyzing content pipeline performance
Tracking content version history

Content Manager

Comprehensive content creation, editing, and management interface

Content Creation: Manual content creation with templates
Content Enhancement: AI-powered content improvement tools
Source Management: Integration with content sources
Content Templates: Pre-defined templates for different content types
Tagging System: Organize content with tags and categories
Scheduling: Schedule content for future publication
Workflow Integration: Seamless integration with approval workflows

Use Cases:

Creating new newsletter content
Enhancing existing articles
Managing content sources and feeds
Organizing content by categories

Content Workflow Manager

Specialized workflow management for content-specific operations

Content Enhancement Workflows: Automated content improvement processes
Content Creation Workflows: Streamlined content generation pipelines
Comparison Tools: Side-by-side content comparison and analysis
Priority Management: Queue management with priority levels (low/medium/high)
Batch Processing: Handle multiple content items simultaneously
Workflow Templates: Pre-configured workflows for common content tasks

Use Cases:

Enhancing existing content with AI
Creating new content from templates
Comparing different content versions
Managing content approval workflows

How They Work Together

Content Discovery → Agent Control triggers content discovery
Content Processing → Content Pipeline tracks processing status
Content Enhancement → Content Workflow Manager handles improvements
Workflow Orchestration → Workflow Manager coordinates all processes
Content Management → Content Manager provides final editing and approval

In the coming weeks, I will keep you posted on the progress and the complete architecture. My target is to complete development in sprints and use it for this newsletter preparation.

Weekly News & Updates…

Top Story of the Week:

Alibaba’s Qwen team introduced a massive 480B-parameter AI coding model, Qwen3-Coder-480B-A35B-Instruct, trained on 35B tokens and designed to perform high-level software development tasks across more than 90 programming languages. This marks one of the largest publicly disclosed open-source code LLMs to date.

Why it Matters: This release accelerates the open-source race in code generation and agentic development. A model of this scale enables the development of more advanced agent workflows outside U.S.-centric ecosystems, such as those offered by OpenAI or Anthropic. It also allows enterprises to explore sovereign AI coding capabilities without relying on commercial APIs, which is essential for IP control, cost efficiency, and compliance.

The Cloud: the backbone of the AI revolution

Stargate advances with 4.5 GW partnership with Oracle, source
Meta is expanding its AI infrastructure and has adopted a novel approach of building weather-proof tents to house GPU clusters. This enables us to get new data centers online in months instead of years. source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

Procurement Assistant with Bid Integrity Analytics

Use Case Description: A procurement assistant drafts scopes of work, evaluation criteria, and contract clauses with citations to the source rule or template. It also screens bids for warning signs of collusion such as bid rotation, identical pricing, and repeated text patterns. Reviewers receive a short brief, linked sources, and an audit trail for every change. The design adheres to current public guidance and pilot programs in the United Kingdom and the United States.

Business Challenges:

A heavy drafting workload and short timelines often lead to the reuse of outdated text.
Rules change frequently, and teams must demonstrate how every clause aligns with policy.
Large bid volumes make it challenging to detect collusion through manual checks.
Buyers must increase transparency about when AI is used

Expected Impact / Business Outcome:

Revenue: Better specifications and scoring reduce delivery failures and help agencies protect value for money. OECD notes that stronger design and detection reduce losses from collusion.
User Experience: Buyers get clean drafts with sources in minutes. Evaluators receive short bid summaries and clear risk notes. Suppliers see more precise instructions through consistent templates.
Operations: Draft cycles shorten. Reviews become repeatable.
Process: Mandatory clauses and transparency questions are enforced by the rules engine. Records support audits and freedom of information requests.
Cost: Less manual drafting and earlier detection of suspect bids lower staff effort and reduce the risk of overpayment.

Required Data Sources:

Current procurement regulations, policy notes, and model contracts.
Template libraries and prior tenders, clarifications, and evaluation notes.
Historic bids, awards, supplier registries, debarment lists, and price indices.
Competition authority decisions and public case reports for training of red flag patterns.

Strategic Fit and Impact: The assistant supports national goals for the safe use of AI while enhancing productivity and trust. It supports productivity goals by freeing officials for market engagement and negotiation. It strengthens integrity by adding systematic collusion screening and complete audit trails. It also prepares organizations for new executive orders and policy updates that require contract language on AI compliance and neutrality.

Favorite Tip Of The Week:

Agents Work Better When They Talk to Each Other

Instead of building a single large AI agent to handle everything, create multiple smaller/sub-agents with distinct roles, such as researcher, planner, and executor, and let them collaborate in my above example, where I am creating several AI Agents for Content research, content filtering, etc, and then orchestrating them to work along with each other.

This “multi-agent” design enhances reliability, detects errors early, and simulates how real teams operate. It also makes it easier to test, monitor, and improve each part of your system over time. Simple idea, powerful payoff.

Potential of AI:

Demis Hassabis, CEO of Google DeepMind, gave an interview to Lex Fridman, covering a wide range of topics, including the future of AI & AGI, simulating biology & physics, video games, programming, video generation, world models, Gemini models, scaling laws, computing, and more.

Things to Know…

Stanford HAI on Trump’s AI Action Plan

Stanford HAI published an analysis of the Trump Administration’s AI Action Plan, highlighting its aggressive push for domestic AI infrastructure, innovation-friendly regulation, and international competitiveness. The plan focuses heavily on streamlining chip manufacturing, scaling compute, deregulating model development, and shifting federal AI funding toward commercially viable use.

Why It Matters

This marks a significant shift in U.S. AI policy from cautious governance to industrial acceleration. It favors rapid deployment, open-source development, and minimal constraints on model release, even in high-risk domains. For AI teams in regulated industries or global markets, this signals a more permissive but fragmented policy environment that could reshape how and where GenAI is built and used.

AI in Business Tip

Don’t Force Agentic AI into Legacy IT

Agentic AI systems don’t fit neatly into traditional IT stacks. They require event-driven workflows, continuous memory, dynamic context management, and feedback loops, very different from classic request-response systems.

To succeed, carve out space by pilot-testing agent-based tools in parallel environments or sandboxes before attempting deep integration. Treat them as new “intelligent layers,” not simple plug-ins. This prevents operational friction and gives your team room to design new control points, interfaces, and trust boundaries

Agentic AI works best when it evolves alongside, rather than within, legacy systems.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 159 is “Mapping Your Generative AI Maturity From Aware to Transformative Part 2”

Apple | Amazon Music

Mapping Your Generative AI M…

Jul 26 · OPEN Tech Talks: Technol…

17:03

Courses to attend:

Retrieval Augmented Generation (RAG) Course by DeepLearning AI. This course helps you to build your first RAG system by writing retrieval and prompt augmentation functions and passing structured input into an LLM.
Race to Certification 2025, from July 1 to October 31, from Oracle. Free digital training and certifications in AI, Oracle Cloud Infrastructure, Multicloud, and Oracle Data Platform

Events:

GITEX Global, October 13-17, 2025, Dubai, UAE
TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Burn is a next-generation Deep Learning Framework
LiteLLM enables you to call all LLM APIs using the OpenAI format (Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq, etc.)
Sim Studio is a lightweight, user-friendly platform for building AI agent workflows.

The Investment in AI…

Rune Technologies, which provides solutions for military logistics through AI-enabled predictive software, has announced $24 million in Series A funding
Q.ANT received $73 million in Series A funding to advance Quantum Sensor Development and Commercialization

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Safeguarding LLMs using the OWASP Top 10 Risks And Mitigation Guide

Kashif Manzoor — Sun, 29 Jun 2025 18:26:52 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

2025 OWASP LLM Top Ten Risks And Mitigation Playbook
Generative AI Use Case
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

U.S. Court Says LLM Training on Copyrighted Books Is Fair Use

On June 23, 2025, the Northern District of California ruled that Anthropic’s use of purchased, copyrighted books to train its large-language models is “quintessential fair use.” The court called the training process “exceedingly transformative,” likening it to how people read books to improve writing skills, so long as the model does not reproduce the text verbatim. The decision granted summary judgment for Anthropic on the input-data question, while leaving two caveats: (1) storing pirated copies of books may still be infringing, and (2) the ruling does not address whether an LLM’s outputs can violate copyright.

Why It Matters: Until now, AI developers faced legal gray zones over whether training on copyrighted works required licenses. This ruling, alongside a similar one favoring Meta two days later, signals that U.S. courts may treat model training as fair use when the data is lawfully acquired. Start-ups and enterprises can move forward with model development without scrambling for blanket book licenses, but they must prove they obtained the texts legally and avoid storing pirated copies

Gen AI Guardrails: Your Playbook for the OWASP LLM Top 10 Risks & Mitigations

For a few weeks, we had been focusing on the Generative AI Maturity Model, and this week, as planned, I was going to cover how to advance to level 2 of the maturity curve.

However, Last week I had an eye-opening chat with one of my friends who works in a large organization. They received an alarm late one night because the Gen AI service consumption had suddenly increased four times higher than usual. An eager teammate had pasted a tricky prompt into the customer-support chatbot. The model became stuck in a loop, continually calling expensive tools and increasing the service’s utilization. The cost was smaller than a public data leak, yet substantial enough to prompt the team to rethink the safety of Generative AI.

Following this incident, we conducted a joint research effort. We found that the OWASP 2025 Top Ten Risks & Mitigations for LLMs and Gen AI Apps list addresses these challenges, covering several key areas.

Source: OWASP 2025 Top 10 Risk & Mitigations for LLMs and Gen AI Apps

After spending a few days and two meetings on this topic, we have started updating the current operating model.

For example, immediately, we added these questions:

Now, every review begins with a few questions. Instead of focusing first on new features, the key point now is:

LLM risk check?
Could this chatbot leak private data?
Do the rate limits stop runaway requests?

Now working on a clear playbook, showing how the OWASP list can change scary risks into simple, steady controls before the next midnight alarm rings. This is what we understand and will do for this organization. You can also try out or go through the process to change or update it according to your scenario.

Let’s first look at what is going to be covered:

A concise tour of the OWASP 2025 Top 10 Risks for Large-Language-Model (LLM) & Generative-AI applications, together with the key mitigations security teams are adopting. The 2025 list reflects lessons learned from the first production year of Gen AI systems:

Why it’s important

LLM endpoints now reside inside customer-facing chatbots, internal workflows, and autonomous agents, thereby multiplying the attack surface
New AI-specific clauses in the EU AI Act, UAE’s forthcoming AI Trust Mark, and updated NIST RMF profiles demand explicit risk treatment for Gen AI
Single prompt-flood attacks have racked up Gen AI Service / GPU bills; a leaked system prompt can cost millions in downtime.
Vendor risk questionnaires increasingly mirror the OWASP list, so meeting these controls shortens procurement cycles

How to implement it

Below is a mitigation starter kit that we have prepared and executed over the last week based on the OWASP guidelines. For space, only headline controls are shown; combine several to reach defence-in-depth.

Wrapping up and what happens next

The risks shift with every model update, new plugin, or surprise prompt that hits production. Treat the OWASP 2025 Top Ten as a living checklist: review it, test against it, and refine controls in every sprint.

Call to Action:

Run the self-assessment. Open the Word template linked above and run the self-assessment.
Select one high-impact fix to implement this week. Whether it’s rate limits, SBOM signing, or output filtering, ship a single control that cuts the most significant risk the fastest.

Start small and let continuous learning, not midnight alarms, drive Generative AI maturity.

Weekly News & Updates…

Top Story of the Week:

Google introduced Gemma 3n, the newest member of its open AI model family. Built for developers, it supports multimodal input text, images, and audio, and runs efficiently on laptops and mobile devices. It includes a detailed developer guide and is available under an open license optimized for commercial use.

My Take: Gemma 3n shifts the GenAI conversation from just performance to accessibility. It’s a model designed not just for big labs, but for indie developers and startups. With local deployment and multimodal capabilities, Gemma 3n is a strong signal; the future of AI isn’t just in the cloud, it’s in your pocket, on your laptop, and inside every product that needs intelligent interaction.

The Cloud: the backbone of the AI revolution

The Path to Agentic AI: A Collaborative Approach, source
NVIDIA Brings Physical AI to European Cities With New Blueprint for Smart City AI, source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

Product Catalog Enrichment for E-Commerce

Use Case Description: Automatically generate rich, SEO-optimized product titles, descriptions, tags, and FAQs from minimal product input (e.g., name, image, or specs).

Business Challenges:

Manual content creation is slow and inconsistent
Scaling catalogs across geographies and languages is resource-intensive
Poor product descriptions hurt discoverability and conversions

Expected Impact / Business Outcome:

Revenue: Higher search visibility → more conversions
User Experience: Better product understanding = fewer returns
Operations: Teams manage 10× more SKUs with same headcount
Process: Instant updates to descriptions across regions
Cost: Reduces outsourcing and manual workload

Required Data Sources:

Product Metadata, product images
Existing product descriptions
Sales and conversion data

Strategic Fit and Impact: Ideal for companies reaching Operational or Integrated GenAI maturity, scaling personalization while keeping governance in check.

Favorite Tip Of The Week:

Jerry Liu, founder and CEO of Llama Index, has given a talk on Building AI Agents that actually Automate Knowledge Work. The talk covers the types of agent architectures and use cases that are actually useful to knowledge workers. It explores two main topics:

You need the correct set of tools (not “just” RAG) to process and structure enterprise context.
Humans interact with chat agents for more open-ended tasks, but they can be more hands-off for routine/operational tasks.

Potential of AI:

AI is revolutionizing every role on the planet, especially in white-collar jobs. I want to share this tweet from Sebastian Raschka, ML/AI researcher and former statistics professor.

Source: X Post

Things to Know…

What Stanford Did

Researchers at Stanford HAI built a system simulating the personalities and responses of over 1,000 real people using Generative AI agents. The simulations matched actual survey results with 85% accuracy compared to the individuals answering the same questions two weeks later. The system pairs interview transcripts with LLMs to emulate attitudes and behaviors for social research.

Why It Matters

These findings validate that Agentic AI can mimic human behavior at scale, opening doors for realistic policy and social testing without the need for costly real-world trials. At the same time, they raise urgent concerns about privacy, consent, and oversight. For organizations using or planning agent simulations, this study makes it clear: high-fidelity modeling is possible but only with the proper ethical safeguards and transparency baked in.

AI in Business Tip

Simulate Before You Deploy

Before rolling out LLM-based agents to real users, simulate their behavior across edge cases using synthetic personas or internal data.

This helps uncover unintended responses, security gaps, or hallucinations early, especially in customer-facing or regulated environments. Think of it as a “sandbox test” not just for code, but for behavior.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 156 is “Mapping Your Generative AI Maturity From Aware to Transformative Part 1”

Apple | Amazon Music

Mapping Your Generative AI M…

May 8 · OPEN Tech Talks: Technol…

16:32

Courses to attend:

Building with Llama 4 by DeepLearning AI. Get hands-on with Llama 4 family of models, understand its Mixture-of-Experts (MOE) architecture, and how to build applications with its official API
Building RAG Agents with LLMs. This short course covered LLM Inference Interfaces, Pipeline Design with LangChain, Gradio, and LangServe, Dialog Management with Running States, Working with Documents, Embeddings for Semantic Similarity and Guardrailing, and Vector Stores for RAG Agents.

Events:

TED Conference dedicated to Artificial Intelligence, September 24-26, 2025, Vienna, Austria
European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Firecrawl an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each
Perplexica is an open-source AI-powered searching tool or an AI-powered search engine that goes deep into the internet to find answers

The Investment in AI…

Voice AI company SuperDial secured $15M series A to automate insurance calls.
OpenRouter, a Marketplace for AI Models has raised $40 Million

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Starting Your Journey to Generative AI Maturity Level 1

Kashif Manzoor — Sun, 22 Jun 2025 17:19:41 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

Generative AI Maturity Model Self-Assessment Tool
Generative AI Use Case
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

Software 3.0 at YC AI Startup School

A must-watch 39-minute talk at Y Combinator’s AI Startup School, former OpenAI/Tesla engineer Andrej Karpathy introduced Software 3.0, a shift in which natural language prompts become the primary programming interface. He explained how LLMs are now similar to utilities, fabs, and operating systems, marking a fundamental change in software development.

Why It Matters: This isn’t hype, it’s a fundamental shift in how software is built and used. Prompts in plain English are now the code. That changes who can create software and how products are designed, putting new emphasis on interfaces that support partial autonomy, validation loops, and agent-friendly APIs.

This signals a clear priority for executives to invest in AI-ready developer workflows, build guardrails for prompt-driven systems, and rethink UX for humans and AI agents. The future of software won’t just run on code; it will run on well-structured prompts.

Lead your Organization’s Generative AI Adoption

The last four weeks’ articles on the Generative AI adoption Maturity framework are progressing well. Thank you for sharing your comments and feedback.

This journey aims to develop a Gen AI Maturity Model or framework with the support and effort of colleagues, friends, and leadership teams from several organizations.

Earlier work:

Let’s continue the journey this week; We have covered six levels of Generative AI maturity. You can also download the Excel file for your organization’s Generative AI Maturity self-assessment.

Gen_AI_Maturity_Self_Assessment.xlsx

How to Reach Generative AI Maturity Level 1

Establish a safe, managed starting point for Generative-AI exploration without committing to a significant budget or strategic change.

Road to Level 1:

You must execute these seven steps to achieve Level 1 Maturity in the organization.

How you will do the completion check?

All seven deliverables stored in a shared folder
Registry shows > 80 % of active pilots logged
Sponsor holds first monthly check-in scheduled
Employees survey confirms awareness of guideline

Next week, we will continue building the AI Maturity model and will work on targeting Level 2 to achieve.

Weekly News & Updates…

Top Story of the Week:

OpenAI o3-pro has been released with access to tools that make ChatGPT useful; it can search the web, analyze files, reason about visual inputs, use Python, personalize responses using memory, and more. As shown in academic evaluations, OpenAI o3-pro excels at math, science, and coding.

Why it Matters: This is quality over speed: o3‑pro excels in math, coding, science, and complex reasoning, earning higher ratings in accuracy, clarity, and instruction-following

The UK government is launching Extract, an AI-powered planning assistant built on Google DeepMind’s Gemini model. It can digitize and extract data from decades-old, handwritten planning documents and maps in minutes, potentially cutting the 250,000 annual hours local councils spend manually processing applications.

My Take: This move combines automation with strategic urban planning. Extract doesn’t just save time, it reframes how councils operate. Unlocking legacy data and accelerating approvals shifts staff focus from form-checking to decision-making.

Mistral AI introduced Magistral, its first model family focused on transparent, multi-step reasoning. Available in two versions, Magistral Small (open-source, 24B parameters) and Magistral Medium (enterprise-grade preview), it supports structured logic in diverse languages and delivers explainable, step-by-step outputs

Why it Matters: It adds traceable chains of thought, making it ideal for regulated industries like finance, healthcare, and legal, where auditability is essential.

The Cloud: the backbone of the AI revolution

Achieve Cost-Efficient LLM Serving with Production-Ready Quantization Solution, source
NVIDIA CEO Drops the Blueprint for Europe’s AI Boom, source

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

Synthetic Customer Support Simulation for Agent Training

Use Case Description: Generative AI creates realistic, diverse, and evolving customer interaction scenarios. These can simulate tone shifts, escalation paths, complaints, or edge cases, exposing customer service agents to real-world complexity before they go live.

Business Challenges:

New agents lack exposure to rare or high-pressure support scenarios
Manual role-play training is time-consuming and inconsistent
Onboarding takes weeks with limited feedback loops

Expected Impact / Business Outcome:

Revenue: Higher customer retention from better-trained agents
User Experience: Faster, more consistent support interactions
Operations: Reduces new agent ramp-up time by 40-60%
Process: Standardized evaluation across locations and trainers
Cost: Less dependency on live team leads and manual QA

Required Data Sources:

Past customer support transcripts
Product knowledge base and SOPs
Customer sentiment tagging (if available)

Strategic Fit and Impact: Aligns with organizations moving toward Autonomous or Integrated levels of the AI Maturity Framework. Bridges HR/training and customer experience, reduces human training load, and enhances readiness for AI-assisted hybrid support models.

Favorite Tip Of The Week:

Building Multi-Agent AI Systems: Anthropic’s engineering team shared hard-earned lessons from launching Claude’s new multi-agent research system via an article ‘How we built our multi-agent research system’.

Start with one lead agent that coordinates, and deploy multiple sub-agents in parallel to explore different parts of a problem. Run tasks concurrently, not in sequence, to massively speed up research.

Source: Anthropic Article: How we built our multi-agent research system

This pattern, lead agent plus sub-agents, reduced complex research time by up to 90% compared to single-agent setups. If your AI tasks are complex, fragmented, or require deep exploration, this approach can help achieve good performance.

Potential of AI:

MIT Press published Foundations of Computer Vision, a book that took over 10 years to write. This concise, modern textbook combines classic image-processing theory with the latest deep learning developments. It covers essential topics like motion estimation, vision-language associations, transformers, diffusion models, and even ethics in vision systems.

It bridges the gap between academia and practical application, helping teams understand the math and the machine learning techniques behind vision.

Things to Know…

NIST has released a dedicated risk profile for Generative AI systems (AI RMF 600-1) to help organizations manage the growing risks tied to Generative AI adoption. It’s designed as a practical extension of the original AI Risk Management Framework (RMF 1.0), but focuses specifically on the unique characteristics of generative models.

Key Highlights

12 Critical Risk Categories: Covers hallucinations, misuse, disinformation, IP infringement, privacy, bias, environmental impact, and more.
200+ Recommended Actions: Concrete steps aligned to the four RMF functions, Govern, Map, Measure, Manage.
Tailored for GenAI Lifecycle: Considers training, fine-tuning, prompting, deployment, and monitoring phases.
Adaptable and Voluntary: Designed to work across sectors and model types without being overly prescriptive.

Why It Matters

This is one of the most comprehensive, neutral, and technically grounded GenAI risk frameworks. It’s a valuable guide for making responsible choices for team building or integrating GenAI tools, especially ahead of upcoming AI regulations.

AI in Business Tip

Avoid Over-Automation with GenAI Agents

When deploying AI agent-based systems, resist the urge to automate every step. Instead, design agents collaborate with humans, let them draft, recommend, or summarize, not just act.

Use checkpoints or approvals where judgment matters. This reduces risk and builds trust with internal users and customers.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 156 is “Mapping Your Generative AI Maturity From Aware to Transformative Part 1”

Apple | Amazon Music

Mapping Your Generative AI M…

May 8 · OPEN Tech Talks: Technol…

16:32

Courses to attend:

Building with Llama 4 by DeepLearning AI. Get hands-on with Llama 4 family of models, understand its Mixture-of-Experts (MOE) architecture, and how to build applications with its official API
Building RAG Agents with LLMs. This short course covered LLM Inference Interfaces, Pipeline Design with LangChain, Gradio, and LangServe, Dialog Management with Running States, Working with Documents, Embeddings for Semantic Similarity and Guardrailing, and Vector Stores for RAG Agents.

Events:

European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.

Tool / Product Spotlight

Tech and Tools…

Suna is a fully open source AI assistant that helps you accomplish real-world tasks with ease.
Jan is a ChatGPT-alternative that runs 100% offline on your device.

The Investment in AI…

Uncountable, a leading platform for digital transformation in industrial research and development, today announced it has raised $27 million
A16z leads $15m series A for AI interview tool Cluely
Tennr got $101M to build out AI that automates patient referral process.

That’s it for this week – thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif

Generative AI Maturity Model Self-Assessment

Kashif Manzoor — Sun, 01 Jun 2025 19:02:31 +0000

Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing – highlighting what matters in Generative AI for business!

I’m building and implementing AI solutions, and sharing everything I learn along the way…

Check out the updates from this week! Please take a moment to share them with a friend or colleague who might benefit from these valuable insights!

Feeling overwhelmed by the constant stream of AI news? I’ve got you covered! I filter it all so you can focus on what’s important.

Today at a Glance:

Generative AI Maturity Model Self-Assessment Tool
Generative AI Use Case
AI Weekly news and updates covering newly released LLMs
Courses and events to attend

Executive Brief

Stargate UAE: A Strategic Leap in Global AI Infrastructure

The United Arab Emirates has announced the Stargate UAE project, an initiative to construct a 1 gigawatt AI data center in Abu Dhabi. This facility, part of a broader 5 gigawatt AI campus, is set to become one of the world’s most powerful AI hubs, with an initial 200 megawatts expected to be operational by 2026.

Global Collaboration: The project is a joint effort involving OpenAI, G42 (a UAE-based AI firm), Oracle, NVIDIA, Cisco, and SoftBank Group.
Strategic Location: Situated in Abu Dhabi, the data center will provide AI services within a 2,000-mile radius, reaching up to half the world’s population
Technological Advancement: The facility will leverage NVIDIA’s advanced Grace Blackwell GB300 AI systems, enhancing the UAE’s capabilities in AI research and application

Why It Matters: It’s about local readiness. By building one of the world’s largest AI Infrastructure hubs, the UAE is leading the charge that Generative AI adoption is not optional; it’s becoming foundational across both government and private sectors.

This project will become a foundation for accelerating AI in critical industries like healthcare, oil, and gas. This is the signal to act: building internal Gen AI capabilities now isn’t only a competitive edge; it’s a must to stay relevant in a rapidly transforming ecosystem.

Lead your Organization’s Generative AI Adoption

The last four weeks’ articles on the Generative AI adoption Maturity framework are progressing well. Thank you for sharing your comments and feedback.

This journey aims to develop a Gen AI Maturity Model or framework with the support and effort of colleagues, friends, and leadership teams from several organizations.

Earlier work:

Let’s continue the journey this week; We have covered six levels of Generative AI maturity. You can use the matrix as your dashboard and revisit scores quarterly, attach key performance indicators (KPIs), and observe the color shift as capabilities strengthen across your organization.

Self-Assessment Tool

Now you need to take a fast, honest maturity check, and you can do this yourself. Refer to the slide below. You can also download the Excel file to conduct your organization’s Gen AI Maturity.

Gen_AI_Maturity_Self_Assessment.xlsx

Next week, we will continue building the AI Maturity model and will improve the Maturity Self Assessment format.

Weekly News & Updates…

Top Story of the Week:

Anthropic has launched its latest AI models, Claude Opus 4 and Claude Sonnet 4, marking a significant advancement in AI capabilities. Claude Opus 4, in particular, excels in coding and complex reasoning tasks, outperforming previous models with a 72.5% score on the SWE-bench benchmark. It can autonomously handle long-duration tasks, maintaining performance over extended periods.

My Take: This release signifies a major AI development, showcasing the potential for AI to handle complex, long-running tasks with minimal human intervention. For businesses and developers, this opens new avenues for automation and efficiency. However, it also reminds us the importance of addressing ethical considerations and ensuring robust safety measures as AI systems become more autonomous.

OpenAI has announced the acquisition of io, an AI hardware startup founded by renowned designer Jony Ive, in a deal valued at approximately $6.5 billion. This move aims to develop a new class of AI-integrated devices that transcend traditional screens and interfaces. Ive’s design firm, LoveFrom, will lead the design and creative direction for OpenAI’s hardware initiatives, while io’s team of approximately 55 hardware and software experts will join OpenAI to bring these innovative products to market.

Why it matters: This acquisition signifies OpenAI’s expansion into the consumer hardware space, aiming to create AI-native devices that offer more intuitive and seamless user experiences. This also leads us that how important is that AI works along with the hardware/Infrastructure to redefine how users interact with technology, moving beyond conventional devices like smartphones and laptops.

The Cloud: the backbone of the AI revolution

Accelerate AI Model Performance with Weka Converged Storage and OCI GPU Compute link
Run LLMs on AnythingLLM Faster With NVIDIA RTX AI PCs link

Use Case Spotlight

Generative AI Use Case of the Week:

Several Generative AI use cases are documented, and you can access the library of generative AI Use cases. Link

Generative AI for Digital Historical Reconstructions

Use Case Description: Generative AI rebuilds damaged or lost heritage sites in accurate digital form. It combines photographs, lidar scans, drone footage, and archival plans to create high-fidelity 3-D models that museums, scholars, and visitors explore in virtual or mixed reality. Neural Radiance Fields now reach millimetre accuracy for historic buildings. CyArk, Oxford’s Institute for Digital Archaeology, and UNESCO pilots show the method in active use for Palmyra and other at-risk sites.

Business Challenges:

Physical restoration is slow and costly and can damage fragile remains
War, climate events, and urban pressure threaten many sites before they are documented
Archival records are scattered across institutions and formats
Visitor access is limited by geography and conservation rules
Funding for preservation competes with other public needs

Expected Impact / Business Outcome:

Revenue: New ticketed virtual tours and licensing of 3-D assets generate income for site authorities and partner museums
User Experience: Global audiences view sites in high detail from any device and in multiple time periods which deepens engagement
Operations: Digital inspection lets conservators plan repairs without onsite travel and monitors erosion over time
Process: Standard AI workflows cut modeling time from months to days and keep source data traceable for reviewers
Cost: Lower field survey expense and reduced need for repeated manual modeling free funds for other conservation tasks

Required Data Sources:

High-resolution photographs including crowdsourced images
Lidar or photogrammetry point clouds
Historic maps, plans, and excavation drawings
Weathering and material studies for surface realism
Curatorial metadata that links models to catalog records

Strategic Fit and Impact: The project supports heritage mandates to document and protect culture for future generations. Digital twins supply research material, assist education, and create inclusive access for people unable to travel. They align with UNESCO Digital Heritage and national smart-tourism strategies while helping governments meet sustainability goals by reducing physical footprint at fragile sites and advances the institution to an “Integrated” level on the AI Maturity Framework.

Favorite Tip Of The Week:

Building Enterprise AI Agents, a e-book from Cohere. Explore how agent-based AI systems can drive real change in your organization. This guide walks through:

Key hurdles in deploying scalable AI agents across enterprise environments
Opportunities and risks of using AI agents in regulated sectors
Practical ways to explain the business value of agentic AI to stakeholders

Potential of AI:

Microsoft introduced Aurora, a large-scale AI foundation model designed to predict a wide array of environmental phenomena. Trained on over one million hours of diverse atmospheric data, Aurora excels in forecasting weather patterns, air quality, ocean waves, and tropical cyclones. Notably, it delivers 10-day global weather forecasts in under a minute, outperforming traditional models in both speed and accuracy.

Why it matters

Aurora represents a significant advancement in environmental forecasting, offering faster and more precise predictions at a fraction of the computational cost of traditional methods. Its ability to accurately forecast extreme weather events and air quality has profound implications for disaster preparedness, public health, and climate research. Aurora’s code and model weights publicly available

Things to Know…

At Google I/O 2025, Google introduced several AI tools aimed at enhancing developer productivity:

Gemini 2.5 Flash Preview: An updated version of Google’s AI model, optimized for speed and efficiency, with improved coding and reasoning capabilities.
Gemma 3n: A lightweight, multimodal model designed to run on various devices, supporting audio, text, image, and video inputs.
Gemini Diffusion: A new text model capable of generating outputs at five times the speed of previous models, suitable for rapid content creation.
Lyria RealTime: An experimental music generation model that allows interactive creation and performance of music in real time.

AI in Business Tip

Use Agents, Keep Humans in the Loop

AI agents can execute multi-step tasks, but as we are early into Agentic AI therefore you need to get benefit from human oversight.

The best approach?

Set agents to operate with checkpoints after each critical step, require a quick human review. This keeps quality high, avoids runaway behavior, and builds trust in real-world use without slowing things down.

The Opportunity…

Podcast:

This week’s Open Tech Talks episode 156 is “Mapping Your Generative AI Maturity From Aware to Transformative Part 1”

Apple | Amazon Music

Mapping Your Generative AI M…

May 8 · OPEN Tech Talks: Technol…

16:32

Courses to attend:

Reinforcement Fine-Tuning LLMs with GRPO by DeepLearning AI. in this course Learn the foundations of reinforcement learning and how to use the Group Relative Policy Optimization (GRPO) algorithm to improve reasoning in large language models.
Anthropic’s Prompt Engineering Interactive Tutorial: This course is intended to provide you with a comprehensive step-by-step understanding of how to engineer optimal prompts within Claude.

Events:

European Conference on Artificial Intelligence, October 25-30, 2025, Bologna, Italy.
The AI Summit London, June 11-12, 2025, Tobacco Dock, London

Tool / Product Spotlight

Tech and Tools…

onlook: The Cursor for Designers, an Open-Source, Visual-First Code Editor

The Investment in AI…

Artificial intelligence infrastructure startup Chalk said Wednesday it had raised a $50 million Series A funding round.

And that’s a wrap for this week! Thank you for reading.

I’d love to hear your thoughts, simply hit reply to share feedback or let me know which section was most useful to you.

If you enjoyed this issue, consider sharing it or forwarding to a colleague, friend who’d benefit. Your support helps grow our AI community.

Until next Saturday,

Kashif