OpenAI’s "ChatGPT AGENT" Is Here- But Can We TRUST It?

OpenAI has introduced the ChatGPT agent, a powerful AI assistant that can handle complex tasks like managing calendars, creating presentations, writing code etc.- But Is It Acting Responsibly?

Jul 18, 2025

Today's highlights:

You are reading the 111th edition of the The Responsible AI Digest by SoRAI (School of Responsible AI) . Subscribe today for regular updates!

At the School of Responsible AI (SoRAI), we empower individuals and organizations to become AI-literate through comprehensive, practical, and engaging programs. For individuals, we offer specialized training such as AI Governance certifications (AIGP, RAI) and an immersive AI Literacy Specialization. This specialization teaches AI using a scientific framework structured around four levels of cognitive skills. Our first course focuses on the foundational cognitive skills of Remembering and Understanding & the second course focuses on the Using & Applying. Want to learn more? Explore all courses: [Link] Write to us for customized enterprise training: [Link]

🔦 Today's Spotlight

The new ChatGPT agent is a major leap in AI capability, transforming from a chat-only tool to a task-performing assistant that can operate software, browse the internet, and complete complex workflows using a virtual computer. But with greater power comes greater responsibility. Is OpenAI’s latest release built with Responsible AI principles? This summary critically evaluates how well the ChatGPT agent aligns with five key pillars: safety, privacy and user control, transparency and oversight, bias and fairness, and misuse prevention.

Safety: Preventing Harm and Attacks

ChatGPT agent is fortified against unsafe behavior through a robust safety stack. It has undergone specialized training to resist prompt injection attacks, and tests show it ignores over 99% of malicious inputs in browsing scenarios. The system refuses harmful content like hate speech or illicit advice, maintaining nearly 100% compliance on internal safety benchmarks. A standout feature is Watch Mode, which halts high-stakes tasks like banking when the user is inactive, preventing the agent from acting unsupervised. These layered defenses collectively reflect OpenAI’s “do no harm” approach, though they acknowledge continuous refinement is needed.

Privacy and User Control

Privacy is deeply embedded in the agent’s design. The system requires users to manually log in to personal accounts, ensuring it never learns passwords or acts without explicit permission. Users must authorize service connectors like Gmail, and access can be revoked at any time. To prevent data persistence risks, long-term memory is disabled at launch, and the agent is explicitly trained to avoid seeking or exposing personal data. Crucially, it asks for user confirmation before significant actions, preserving both user control and informed consent throughout the session.

Transparency and Oversight

OpenAI has designed the agent to work in plain sight, not behind closed doors. As it operates, it narrates its steps—“Searching for articles… Reading… Extracting…”—giving users a clear view into its process. If it encounters issues, it asks for clarification, maintaining an open dialogue. Users can pause, override, or stop the agent at any time, ensuring humans stay in charge. System-level rules (like policy alignment) act as invisible supervisors, ensuring compliance even if the user tries to prompt otherwise. Overall, the agent balances autonomy with accountability.

Bias and Fairness

OpenAI evaluated the agent for fairness across gender and social context. On the Bias Benchmark Questions (BBQ), it showed improved caution—refusing to answer ambiguous or sensitive prompts rather than risk biased output. In gender bias tests, responses were nearly neutral, with a very low net bias score (≈0.003–0.004)—an improvement over previous models. While occasional over-cautiousness led to non-answers, the tradeoff likely reduces the risk of harm. These evaluations suggest a clear but still evolving commitment to equitable treatment across diverse users.

Misuse Prevention and Risk Mitigation

With advanced capabilities, the agent poses new misuse risks—but OpenAI has preemptively addressed them. It consistently refuses harmful or illegal requests, including complex adversarial “jailbreak” prompts. It won’t dig up personal data, complete financial transactions, or engage in restricted activities. In high-risk domains like biosafety, OpenAI designated it a “High Risk” system and activated advanced safeguards like threat modeling, classifier monitoring, and red teaming. The company also uses bug bounty programs and human reviewers to detect abuse. While a few gaps remain (e.g., slightly lower refusal rates in some financial tasks), the system is clearly built with proactive risk mitigation in mind.

For more details, refer to the ChatGPT Agent System Card published here.

Conclusion

The ChatGPT agent represents a significant shift in how AI can assist users- not just through conversation, but by taking meaningful, real-world actions. Its foundation seems to be built on Responsible AI principles, showing encouraging progress in safety, user control, transparency, fairness, and misuse prevention. Yet, this is still an early chapter. As users, developers, and researchers engage more deeply with the agent in varied contexts, new questions will emerge: How well will it adapt to edge cases? Can it consistently earn user trust at scale? And how should society govern increasingly autonomous digital agents? The answers will unfold over time, and it is this ongoing dialogue- between technology, ethics, and human values- that will shape what comes next.

🚀 AI Breakthroughs

Former OpenAI Employee Reveals Rapid Codex Development and Dynamic Team Practices

• A former OpenAI employee shared insights on the rapid seven-week development of Codex, describing it as the hardest and fastest-paced project in nearly a decade

• OpenAI’s collaborative engineering culture allows quick team adjustments, enabling rapid response to project needs without formal procedures or delays in resource allocation

• The employee described OpenAI as "frighteningly ambitious," highlighting its competitive efforts across multiple tech domains and reliance on Slack for team communication over email;

The Responsible AI Digest by School of Responsible AI- SoRAI

OpenAI’s "ChatGPT AGENT" Is Here- But Can We TRUST It?

OpenAI has introduced the ChatGPT agent, a powerful AI assistant that can handle complex tasks like managing calendars, creating presentations, writing code etc.- But Is It Acting Responsibly?

Today's highlights:

You are reading the 111th edition of the The Responsible AI Digest by SoRAI (School of Responsible AI) . Subscribe today for regular updates!

🔦 Today's Spotlight

Safety: Preventing Harm and Attacks

Privacy and User Control

Transparency and Oversight

Bias and Fairness

Misuse Prevention and Risk Mitigation

Conclusion

🚀 AI Breakthroughs

Former OpenAI Employee Reveals Rapid Codex Development and Dynamic Team Practices

Lovable Secures $200M Series A Funding, Valued at $1.8B Post-Launch Growth

Amazon Launches AgentCore Preview to Simplify Enterprise AI Agent Management

Claude AI Launches Financial Solution with Real-Time Data and Enterprise Integration

Anthropic Expands Claude's Capabilities with New Connectors Linking Various Tools

Mistral's Le Chat Gains New Features, Targets OpenAI and Google Rivals

Google Expands AI Business-Calling Feature, Enhances Search with Gemini 2.5 Pro Model

Gemini Embedding Model Launches, Offering Superior Multilingual Text Processing Abilities

Google Launches Gemini 2.5 Pro and Deep Search for Enhanced AI Search Capabilities

Hugging Face Achieves $1M in Sales With New Reachy Mini Robot Launch

Kiro Enhances AI Development with Specs and Hooks for Seamless Production Deployment

San Jose Leads AI Adoption in Public Sector, Integrating ChatGPT for Governance Tasks

IBM API Agent Enhances Efficiency and Governance in API Connect Platform Release

Agent Leaderboard v2 Evaluates AI Models in Real-World Multi-Domain Scenarios

⚖️ AI Ethics

AI Language Patterns Seep into Human Speech, Influencing Vocabulary and Tone: Study

Anthropic Users Face Unannounced Usage Limit Changes Sparking Confusion and Discontent

xAI's Grok 4 Sparks Outrage with Inappropriate Comments; Company Issues Corrections

AI Leaders Urge Deeper Study of Chain-of-Thought Monitoring for Enhanced Safety

Google Launches AI-Driven News Summaries in Discover on iOS and Android

Meta Addresses Security Flaw That Exposed Chatbot User Prompts and Responses

Elon Musk's xAI Faces Backlash for Controversial AI Characters on Grok App

Google's AI Agent Big Sleep Uncovers Critical SQLite Security Vulnerability

Quess Corp Report Highlights Talent Gaps in AI and Platform Engineering Roles

BrightCHAMPS Unveils World's Largest Student-Led AI Survey Across 29 Countries

🎓AI Academia

Context Engineering for Large Language Models: An In-Depth Research Overview

Study Reveals Large Language Models Can Recognize When They Are Under Evaluation

Framework Proposed to Address Manipulation by Misaligned AI in Security Systems

Global Coordination Needed for Effective Halt on Dangerous AI Development and Deployment

Discussion about this post

Ready for more?