OpenAI begins testing ads in ChatGPT for Free and Go users, keeps paid tiers ad-free
OpenAI has begun testing clearly labeled ads in ChatGPT for Free and Go-tier users, saying the ads won’t influence answers, won’t use conversation data, and won’t appear on paid plans.
Today’s highlights:
Over the past few weeks, the OpenAI vs Anthropic story has basically played out in two phases. First, it was a straight product fight in coding agents. Anthropic dropped Claude Opus 4.6 with strong agent-style coding features that can write, debug, and manage software tasks on its own. Very quickly after that, OpenAI released GPT-5.3 Codex, its own upgraded coding agent. The timing made it obvious that both were going after the same thing: developers and enterprise teams who want AI that can actually handle real coding workflows. Reports confirmed the launches happened almost back-to-back, and most people saw it as a direct competitive move rather than a coincidence.
Then the conversation shifted from models to business models and trust. OpenAI started testing ads inside ChatGPT for some free users, saying ads would be labeled and separate from answers and would help fund the product. Around the same time, Anthropic came out publicly saying Claude will stay ad-free, arguing that AI chats often include personal or sensitive questions, so ads could create conflicts of interest. They even leaned into this in marketing, which many saw as a subtle dig at OpenAI. OpenAI leadership responded, and suddenly the rivalry wasn’t just about whose model is better; it became a fight over how AI assistants should make money and how much users should trust them. So now two battles are happening at once: a product race in agentic coding tools, and a bigger philosophical fight over ads versus ad-free AI and what that means for long-term trust.
More specifically, OpenAI is beginning to test advertising in ChatGPT in the U.S. for users on the Free tier and the newer low-cost Go plan, priced at $8 a month and launched globally in mid-January. The company said ads will not appear for paid subscribers on Plus, Pro, Business, Enterprise, or Education plans, and insisted that ads will be clearly labeled, kept separate from responses, and will not influence answers.
OpenAI said advertisers will not receive user conversation data and will only get aggregated performance metrics, while users can view and clear ad-interaction history, dismiss ads, and manage personalization settings. Ads will not be shown to users under 18 and will be kept away from sensitive topics such as health, politics, and mental health, amid broader skepticism about ads inside AI chat experiences.
Emerging Risks and Open Questions
Though OpenAI says it will not sell user data, generating and targeting ads still involves processing sensitive conversation content, which poses a privacy risk if that data is ever exposed.
Bias is another risk: AI-driven ad targeting may unintentionally discriminate or reinforce stereotypes based on user data.
Users might confuse sponsored content with neutral advice, especially in a conversational format, which opens the door to manipulation.
Weak ad vetting could allow scams or disinformation, especially during elections. Scaling ads globally without adapting to local laws risks regulatory penalties.
User backlash is also possible: ads could push users to pile into paid tiers to avoid them, straining resources.
At the School of Responsible AI (SoRAI), we empower individuals and organizations to become AI-literate through comprehensive, practical, and engaging programs. For individuals, we offer specialized training, including AI Governance certifications (AIGP, RAI, AAIA) and an immersive AI Literacy Specialization. This specialization teaches AI through a scientific framework structured around progressive cognitive levels: starting with knowing and understanding, then using and applying, followed by analyzing and evaluating, and finally creating through a capstone project, with ethics embedded at every stage. Want to learn more? Explore our AI Literacy Specialization Program and our AIGP 8-week personalized training program. For customized enterprise training, write to us at [Link].
⚖️ AI Ethics
Harvard Business Review Study Finds Heavy AI Adoption Expands Workloads and Triggers Burnout Risks
A new Harvard Business Review report cites in-progress research showing that employees who most actively adopt AI tools may be among the first to feel burnout, as higher capability quickly turns into higher workload. Researchers embedded for eight months at a roughly 200-person tech company and, based on more than 40 in-depth interviews, found staff weren’t formally pushed into new targets but still expanded their to-do lists as AI made more tasks seem doable. Work increasingly spilled into lunch breaks and evenings, alongside rising expectations for speed and responsiveness, leading to fatigue and difficulty disconnecting. The findings align with other contested studies suggesting modest or uneven productivity gains, but this research argues the bigger risk is that AI-driven augmentation can make workplaces harder to step away from rather than easier.
Google Updates Family Link and YouTube Parental Controls for Safer Internet Day 2026
Google marked Safer Internet Day by highlighting recent updates aimed at helping kids and teens use its services more safely, alongside tools for parents to manage accounts. Family Link has a redesigned interface that lets parents manage multiple devices from a single page, view device-specific usage summaries, set time limits, and adjust controls through a consolidated screen-time tab. On YouTube, a revamped signup flow is designed to make it easier to create a child account and switch between accounts in the mobile app depending on who is watching. Parents can now limit time spent scrolling Shorts, with an option to set the timer to zero planned for a future update, and supervised child and teen accounts can use custom Bedtime and Take a Break reminders on top of existing default-on wellbeing protections.
New York Lawmakers Seek Three-Year Moratorium on New Data Center Permits Amid Energy Concerns
New York state lawmakers have filed a bill that would pause permits for building and operating new data centers for at least three years, reflecting growing bipartisan concern about their strain on local power grids and communities. The proposal, whose chances remain unclear, would make New York at least the sixth state to weigh a data-center construction pause, as big tech ramps up spending on AI infrastructure. Critics across the political spectrum argue data centers can drive up electricity demand and raise household power bills, and more than 230 environmental groups have urged Congress to consider a national moratorium. The push comes as the state moves to update grid-connection rules for large energy users and require them to cover more of the associated costs.
🚀 AI Breakthroughs
Crypto.com Buys AI.com for $70 Million, Targets Super Bowl Debut for AI Agent
Crypto.com has bought the AI.com domain for $70 million, the Financial Times reported, in what is described as the most expensive domain sale on record and paid entirely in cryptocurrency to an undisclosed seller. The company plans to spotlight the site in a Super Bowl ad, positioning it as a consumer-facing personal AI agent for messaging, app use, and stock trading. The price reportedly surpasses previous top domain deals such as CarInsurance.com at $49.7 million in 2010, along with VacationRentals.com and Voice.com. The purchase underscores Crypto.com’s big-ticket marketing strategy, though the long-term payoff of ultra-premium domains remains uncertain.
🎓 AI Academia
Study Details Human–AI Co-Creation Model Generating Secure, High-Quality Multiple-Choice Question Series
A University of Jyväskylä study describes a human–AI co-creation model to speed up the production of high-quality multiple-choice questions, especially for natural science education where item writing is slow and exam security requires large banks. The work separates generation into two tasks: building MCQ “prototypes” via a three-step, multi-AI prompting workflow with human review between steps, then expanding each prototype into a series using a one-step method that generates multiple distinct items targeting the same learning outcome. Human and automated checks found about half of the generated series questions were acceptable without edits. The paper says small fixes to initially rejected items moderately improved series acceptance and significantly improved the quality of prototype questions.
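As a rough illustration of that two-stage split (not the paper's actual prompts or tooling, and with `call_llm` as a hypothetical stand-in for a model client), a prototype-then-series pipeline could be sketched like this:

```python
# Illustrative sketch of a prototype-then-series MCQ workflow; prompts and the
# call_llm helper are placeholders, not the study's actual three-step procedure.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model client here")

def build_prototype(learning_outcome: str) -> str:
    """Draft an MCQ prototype in steps, with human review points between steps."""
    stem = call_llm(f"Write the stem and correct answer for an MCQ assessing: {learning_outcome}")
    # (human review/edit of `stem` would happen here)
    distractors = call_llm(f"Propose three plausible but incorrect options for:\n{stem}")
    # (human review/edit of `distractors` would happen here)
    return call_llm(f"Assemble a final, polished MCQ from:\n{stem}\n{distractors}")

def expand_to_series(prototype: str, n: int = 5) -> list[str]:
    """One-step expansion into n distinct items targeting the same learning outcome."""
    response = call_llm(
        f"Generate {n} distinct multiple-choice questions that test the same "
        f"learning outcome as this prototype, separated by blank lines:\n{prototype}"
    )
    return [item.strip() for item in response.split("\n\n") if item.strip()]
```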
Puda Browser-Based Agent Enables User-Sovereign Personalization With Three-Tier Privacy Data Sharing Controls
A new arXiv paper describes Puda, a “Private User Dataset Agent” designed to give users client-side control over personal data pulled from multiple online services for personalized AI agents. The system targets the privacy–personalization tension by letting people share data at three granularities: full browsing history, extracted keywords, or predefined category subsets. Implemented as a browser-based common platform and tested on a personalized travel-planning task, the study reports that sharing category subsets retained 97.2% of the personalization performance achieved by sharing detailed browsing history, using an LLM-as-a-judge evaluation across three criteria. The work positions Puda as an approach to reduce platform data silos while offering practical privacy choices for cross-service agent workflows.
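To make the three granularities concrete, here is a small illustrative sketch of client-side filtering before anything is shared with an agent; the class and field names are assumptions, not Puda's actual interfaces.

```python
# Illustrative sketch: only the data matching the user's chosen sharing tier
# ever leaves the client. Names and schema are hypothetical.
from dataclasses import dataclass
from enum import Enum

class SharingLevel(Enum):
    FULL_HISTORY = "full_browsing_history"
    KEYWORDS = "extracted_keywords"
    CATEGORIES = "category_subsets"

@dataclass
class BrowsingRecord:
    url: str
    keywords: list[str]
    category: str

def prepare_payload(records: list[BrowsingRecord], level: SharingLevel):
    """Return only the data matching the user's chosen granularity."""
    if level is SharingLevel.FULL_HISTORY:
        return [r.url for r in records]
    if level is SharingLevel.KEYWORDS:
        return sorted({kw for r in records for kw in r.keywords})
    return sorted({r.category for r in records})  # coarsest tier: category subsets

records = [
    BrowsingRecord("https://example.com/kyoto-hotels", ["kyoto", "ryokan"], "travel"),
    BrowsingRecord("https://example.com/jr-pass", ["rail pass"], "travel"),
]
print(prepare_payload(records, SharingLevel.CATEGORIES))  # ['travel']
```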
Paper Finds Gender and Race Bias in LLM-Based Product Recommendations
A research paper reports that large language models can produce measurably different consumer product recommendations depending on a user’s stated gender and race. Using prompt-based tests across multiple demographic groups, the study applies three analysis techniques—marked-word comparisons, support vector machine classification, and Jensen–Shannon divergence—to detect and quantify disparities in the suggested products. The findings indicate significant gaps between groups, suggesting that LLM-driven recommendation tools may inherit and amplify implicit biases from training data. The final version is available via Springer at https://doi.org/10.1007/978-3-031-87766-7_22, with a related preprint posted on arXiv (2602.08124, Feb. 8, 2026).
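For readers unfamiliar with the last of those techniques, the sketch below shows one way to quantify recommendation disparity with Jensen-Shannon divergence; the product categories and counts are invented for illustration and are not the paper's data.

```python
# Illustrative sketch: comparing the distribution of product categories an LLM
# recommends for two demographic groups via Jensen-Shannon divergence.
from collections import Counter
import numpy as np
from scipy.spatial.distance import jensenshannon

def recommendation_distribution(recommendations, vocabulary):
    """Turn a list of recommended categories into a probability vector over a shared vocabulary."""
    counts = Counter(recommendations)
    total = sum(counts.values())
    return np.array([counts.get(item, 0) / total for item in vocabulary])

# Hypothetical model outputs for prompts that differ only in the stated demographic.
group_a = ["laptop", "headphones", "smartwatch", "laptop", "camera"]
group_b = ["perfume", "handbag", "smartwatch", "perfume", "laptop"]

vocab = sorted(set(group_a) | set(group_b))
p = recommendation_distribution(group_a, vocab)
q = recommendation_distribution(group_b, vocab)

# SciPy returns the JS distance (the square root of the divergence); square it.
js_divergence = jensenshannon(p, q, base=2) ** 2
print(f"Jensen-Shannon divergence: {js_divergence:.3f}")  # 0 = identical, 1 = fully disjoint
```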
Benchmark Finds Zero-Shot AI Image Detectors Struggle as Modern Generators Evade Most Models
A new benchmark study evaluated how well open-source AI-generated image detectors work out of the box, without any fine-tuning, reflecting the most common real-world deployment setup. It tested 16 detection methods (23 pretrained variants) across 12 datasets totaling 2.6 million images from 291 generators, including modern diffusion systems. The results found no consistent top performer, with detector rankings swinging widely between datasets (Spearman ρ ranging from 0.01 to 0.87) and average accuracy ranging from 75.0% for the best model to 37.5% for the worst. Performance depended heavily on training-data alignment, creating 20–60% swings even among similar detector families, while newer commercial image generators such as Flux Dev, Adobe Firefly v4, and Midjourney v7 pushed most detectors down to just 18–30% average accuracy. Statistical tests indicated the differences between detectors were significant (Friedman χ²=121.01, p<10⁻¹⁶), underscoring that there is no one-size-fits-all detector and that model choice needs to match the specific threat landscape.
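The ranking instability the authors describe is the kind of thing that can be checked directly with SciPy's Spearman correlation, as in this sketch; the accuracy numbers are made up for illustration.

```python
# Illustrative sketch: how consistently do detectors rank across two datasets?
# Accuracies below are invented; real values come from benchmark runs.
from scipy.stats import spearmanr

acc_dataset_1 = [0.75, 0.62, 0.58, 0.40, 0.71]  # five detectors on dataset 1
acc_dataset_2 = [0.44, 0.70, 0.39, 0.66, 0.52]  # the same detectors on dataset 2

rho, p_value = spearmanr(acc_dataset_1, acc_dataset_2)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A rho near 0 means the 'best' detector on one dataset tells you little about
# another, which is the instability the benchmark reports (rho from 0.01 to 0.87).
```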
Aegis Red-Teaming Framework Tests AI Voice Agents for Privacy, Privilege Escalation, and Abuse
A new arXiv preprint describes Aegis, a red-teaming framework aimed at testing the governance, integrity, and security of AI voice agents built on Audio Large Language Models (ALLMs) now used in banking, customer service, IT support, and logistics. The framework maps real-world deployment pipelines and runs structured adversarial scenarios covering risks such as privacy leakage, privilege escalation, and resource abuse. Case studies in banking call centers, IT support, and logistics found that while access controls can reduce direct data exposure, voice agents still face “behavioral” attacks that bypass restrictions even under strict controls. The paper also reports differences across model families, with open-weight models showing higher susceptibility, pointing to the need for layered defenses combining access control, policy enforcement, and behavioral monitoring.
AIRS-Bench Benchmark Measures Frontier AI Research Agents Across 20 Scientific Tasks, Shows Gaps
A new benchmark suite called AIRS-Bench has been released to evaluate how well frontier AI “research agents” can carry out end-to-end scientific work, using 20 tasks drawn from recent machine-learning papers across areas such as language modeling, mathematics, bioinformatics, and time-series forecasting. Unlike many benchmarks, the tasks do not ship with baseline code and are designed to test the full research lifecycle, including idea generation, experiment analysis, and iterative refinement under agentic scaffolding. Baseline results using frontier models with sequential and parallel scaffolds show agents beat reported human state-of-the-art on four tasks, but fall short on the other sixteen, and still fail to reach the theoretical performance ceiling even when they win. The project has open-sourced task definitions and evaluation code on GitHub, aiming to standardize comparisons across agent frameworks and highlight how far autonomous scientific research still has to go.
Study Finds Attribution Explanations Fall Short for Agentic AI, Backing Trace-Based Diagnostics
A new research paper argues that explainability tools built for traditional, single-step AI predictions do not reliably diagnose failures in newer “agentic” AI systems that act through multi-step decision trajectories. The study compares attribution-based explanations in static classification with trace-based diagnostics on agent benchmarks including TAU-bench Airline and AssistantBench, finding attribution methods produce stable feature rankings in static tasks (Spearman ρ = 0.86) but break down for execution-level debugging in agent runs. In contrast, trace-grounded rubric evaluation consistently pinpoints where behavior goes wrong and highlights state-tracking inconsistency as a key failure pattern. The paper reports that this inconsistency is 2.7× more common in failed runs and cuts success probability by 49%, supporting a shift toward trajectory-level explainability for autonomous AI agents.
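As a loose illustration of what a trace-grounded check for state-tracking inconsistency could look like (the trace schema and the airline example below are assumptions, not the paper's rubric), consider:

```python
# Illustrative sketch: flag steps where the agent's claimed state contradicts
# what earlier observations established. Schema and example are hypothetical.
from dataclasses import dataclass

@dataclass
class TraceStep:
    action: str
    observed_state: dict  # what the environment or tool actually returned
    claimed_state: dict   # what the agent asserted before acting

def state_tracking_inconsistencies(trace: list[TraceStep]) -> list[int]:
    """Return indices of steps whose claimed state contradicts earlier observations."""
    known: dict = {}
    flagged = []
    for i, step in enumerate(trace):
        if any(key in known and known[key] != value
               for key, value in step.claimed_state.items()):
            flagged.append(i)
        known.update(step.observed_state)
    return flagged

# Hypothetical airline-support trace: the agent 'forgets' a confirmed cancellation.
trace = [
    TraceStep("cancel_booking", {"booking": "cancelled"}, {}),
    TraceStep("send_confirmation", {}, {"booking": "active"}),  # contradicts step 0
]
print(state_tracking_inconsistencies(trace))  # [1]
```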
Towards EnergyGPT: LLaMA 3.1-8B Fine-Tuned for Energy-Sector Question Answering
A recent paper describes “EnergyGPT,” a large language model tailored for energy-sector use cases by fine-tuning Meta’s open-weight LLaMA 3.1-8B on a curated corpus of energy-related texts rather than training from scratch. The work compares two adaptation methods: full-parameter supervised fine-tuning and a lower-cost LoRA approach that updates only a small subset of parameters. The authors report that both adapted versions outperform the base model on domain-specific question-answering benchmarks for energy language understanding and generation, and that the LoRA variant delivers competitive accuracy gains while significantly reducing training cost and infrastructure needs. The paper also presents an end-to-end pipeline covering data curation, evaluation design, and deployment.
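For context on the lower-cost path, a minimal LoRA setup with the Hugging Face PEFT library might look like the sketch below; the model identifier, target modules, and hyperparameters are illustrative assumptions rather than the authors' configuration.

```python
# Illustrative LoRA setup (assumed values, not the paper's): adapt a LLaMA-class
# base model by training small low-rank adapters instead of all parameters.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.1-8B"  # assumed base checkpoint (gated; requires access)
tokenizer = AutoTokenizer.from_pretrained(base_id)   # used to tokenize the domain corpus
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_config = LoraConfig(
    r=16,                                  # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 8B parameters

# Supervised fine-tuning on the curated energy corpus would then run with a
# standard training loop (e.g., transformers.Trainer or trl's SFTTrainer).
```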
About SoRAI: SoRAI is committed to advancing AI literacy through practical, accessible, and high-quality education. Our programs emphasize responsible AI use, equipping learners with the skills to anticipate and mitigate risks effectively. Our flagship AIGP certification courses, built on real-world experience, drive AI governance education with innovative, human-centric approaches, laying the foundation for quantifying AI governance literacy. Subscribe to our free newsletter to stay ahead of the AI Governance curve.