Microsoft Confirms Office Copilot Bug Summarized Confidential Emails Despite Data Loss Prevention Policies
Microsoft says it has rolled out an update to fix the issue, and that it "did not provide anyone access to information they weren't already authorised to see".
++ Google launches Gemini 3.1 Pro Preview and adds Lyria 3 music in Gemini app; YouTube expands AI “Ask” to TVs and consoles; Anthropic releases Claude Sonnet 4.6 (1M-token) as Claude Code efficiency and autonomy reports emerge, amid a major outage; OpenAI teams with Reliance for AI search and recommendations in JioHotstar and launches EVMbench with Paradigm; Cohere ships open-weight Tiny Aya multilingual models for offline devices; WordPress.com adds built-in AI assistant; EU Parliament disables AI tools over cloud risks; UN pushes technical human control; US Labor issues AI literacy framework; new studies and benchmarks cover agent skills, semantic stability, unlearning audits, frontier risk frameworks, Africa-centric safety gaps, oversight-by-design, ethics review agents, Llama-3 geographic bias, and GLM-5 agentic engineering.
Today’s highlights:
Microsoft confirmed that a bug in Microsoft 365 Copilot Chat allowed the AI to summarize customers’ confidential emails for weeks without permission, even when data loss prevention policies were meant to block such processing. First reported by BleepingComputer, the issue meant that draft and sent emails carrying a “confidential” label could be incorrectly processed by Copilot Chat, an AI feature available to paying Microsoft 365 customers across Office apps. Microsoft tracked the incident under admin advisory CW1226324 and said it began rolling out a fix earlier in February. The company did not disclose how many customers were affected; the incident adds to growing scrutiny of whether built-in AI tools can upload sensitive correspondence to the cloud.
At the School of Responsible AI (SoRAI), we empower individuals and organizations to become AI-literate through comprehensive, practical, and engaging programs. For individuals, we offer specialized training, including AI Governance certifications (AIGP, RAI, AAIA) and an immersive AI Literacy Specialization. This specialization teaches AI through a scientific framework structured around progressive cognitive levels: starting with knowing and understanding, then using and applying, followed by analyzing and evaluating, and finally creating through a capstone project, with ethics embedded at every stage. Want to learn more? Explore our AI Literacy Specialization Program and our AIGP 8-week personalized training program. For customized enterprise training, write to us at [Link].
⚖️ AI Ethics
Google Shares 2026 Responsible AI Progress Report Highlighting Testing, Governance, and Risk Mitigation
Google has published an updated 2026 Responsible AI Progress Report dated February 18, 2026, saying 2025 marked a shift as AI systems became more proactive partners, with growing use of multimodal and personalized models and “agentic” tools aimed at boosting productivity. The report says responsible AI practices are now embedded across product and research lifecycles, with expanded testing, risk mitigation, and safeguards supported by human expertise and AI-driven automation, informed by decades of user-trust work. It also outlines a multi-layer governance framework spanning research, development, launch, and post-launch monitoring to detect and adapt to emerging risks. Alongside safety, the report frames broader access as a goal, pointing to applications such as flood forecasting for 700 million people and advances tied to genome research and blindness prevention, while stressing partnerships with governments, academia, and civil society to shape standards.
European Parliament Disables AI Tools on Lawmakers’ Devices Over Cloud Data Security Risks
The European Parliament has blocked lawmakers from using built-in AI tools on their work devices, citing cybersecurity and privacy concerns about sending confidential material to cloud-based AI services. An internal IT email seen by Politico said the institution cannot guarantee the security of data uploaded to AI providers’ servers and that what information may be shared is still being assessed, so disabling these features is considered safer. The restrictions reflect worries that using chatbots such as ChatGPT, Copilot, or Claude could expose sensitive data to third-party access, including potential demands under U.S. law, and that user inputs may be used to improve models. The move comes as EU institutions and member states reassess reliance on U.S. tech firms amid broader debates over European data protection rules and cross-border access to user data.
UN Panel Seeks to Make Human Control of AI a Technical Reality, Guterres Says
UN chief Antonio Guterres has urged “less hype, less fear” around artificial intelligence, saying a newly confirmed UN expert group will work to make “human control” of AI a practical, technical reality. The UN General Assembly has approved the 40-member Independent International Scientific Panel on Artificial Intelligence, created in August and modeled on the IPCC to provide science-based input for AI governance. Guterres said AI is advancing faster than the world’s ability to understand and regulate it, heightening concerns such as job losses, misinformation and online abuse. The panel’s first report is expected ahead of the UN Global Dialogue on AI Governance in July, with a focus on meaningful human oversight in high-stakes decisions and clear accountability for outcomes.
Claude AI Outage Hits Chat and Website as Downdetector Logs Nearly 3,000 Reports
Anthropic’s AI assistant Claude saw service disruptions on Tuesday evening, with Downdetector reports in the US spiking around 6:30 PM ET and peaking at nearly 3,000 user complaints citing chat and website problems. Users also reported the interface looking broken and, in some cases, losing access to ongoing conversations. Anthropic’s status page later acknowledged “CSS errors on Claude.ai,” said it was investigating, and then marked the incident resolved shortly after 12:10 AM UTC on 18 February 2026. The status dashboard subsequently showed no major disruptions and listed roughly 99.6% uptime for claude.ai and about 99.8% for platform.claude.com over the past 90 days, with the outage coming shortly after the company said it raised $30 billion in a Series G round valuing it at $380 billion post-money.
Anthropic Report Finds Claude Code Agent Autonomy Rising, With Emerging Use in Riskier Domains
Anthropic’s report on “Measuring AI agent autonomy in practice” analyzes millions of tool-using interactions across Claude Code and its public API to gauge how much autonomy people actually give AI agents, how oversight changes with experience, and where agents are being used. In Claude Code, the longest autonomous work periods nearly doubled in about three months, with the 99.9th-percentile “turn” rising from under 25 minutes to over 45 minutes, suggesting users and product factors—not just model upgrades—are pushing autonomy upward. As users gain experience, full auto-approve rises from roughly 20% of sessions to over 40%, while interruption rates also increase, indicating a shift from step-by-step approvals to monitoring and intervening when needed; on complex tasks, the agent pauses for clarification more than twice as often as humans interrupt it. On the API side, most agent actions appear low-risk and reversible, with software engineering making up nearly half of tool calls, but early activity is also showing up in healthcare, finance, and cybersecurity, underscoring the need for stronger post-deployment monitoring and better human-agent oversight design as higher-stakes use grows.
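To make the two headline metrics concrete, here is a minimal sketch (not Anthropic’s actual pipeline) of how a 99.9th-percentile turn length and an auto-approve rate could be computed from session logs; the record format is invented for illustration.

```python
# Minimal sketch (not Anthropic's pipeline): computing the two autonomy
# metrics the report describes from hypothetical session logs.
import numpy as np

# Hypothetical records: one entry per agent "turn", with its wall-clock
# duration in minutes and whether the session had auto-approve enabled.
turns = [
    {"duration_min": 3.2, "auto_approve": False},
    {"duration_min": 41.0, "auto_approve": True},
    {"duration_min": 0.8, "auto_approve": True},
    # ... millions more in practice
]

durations = np.array([t["duration_min"] for t in turns])
p999 = np.percentile(durations, 99.9)  # longest autonomous work periods
auto_rate = np.mean([t["auto_approve"] for t in turns])

print(f"99.9th-percentile turn length: {p999:.1f} min")
print(f"share of auto-approved turns: {auto_rate:.1%}")
```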
US Labor Department Issues AI Literacy Framework With Five Content Areas and Seven Principles
The U.S. Department of Labor’s Employment and Training Administration has published an AI Literacy Framework aimed at guiding nationwide AI literacy efforts across workforce and education systems. The document lays out five foundational content areas and seven delivery principles to help shape program design and deployment, while allowing flexibility across industries, job roles, and educational settings. The framework follows recent federal guidance encouraging the use of Workforce Innovation and Opportunity Act funds and governors’ reserve money to support AI skills training. It was developed with input from employers, training providers, and state and local agencies, and is expected to evolve as AI capabilities and labor market needs change, with feedback invited via a department email address.
OpenAI and Paradigm Launch EVMbench to Benchmark AI Smart Contract Security Skills
OpenAI, working with Paradigm, has published EVMbench, a benchmark designed to measure how well AI agents can detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine smart contracts. The benchmark is built from 120 curated vulnerabilities drawn from 40 audits, largely sourced from Code4rena audit competitions, and also includes scenarios based on security work for the Tempo payments-focused blockchain. It evaluates three modes—finding known issues, fixing them while keeping functionality intact, and executing end-to-end fund-draining attacks in a sandboxed local Anvil environment using a Rust harness for deterministic grading. In exploit tasks, GPT‑5.3‑Codex via Codex CLI scored 72.2%, up from 31.9% for GPT‑5 about six months earlier, while detect and patch results still fell short of full coverage. The company said the benchmark has limits, including imperfect grading for newly found issues and reduced realism versus heavily scrutinized production contracts, but it is meant to track growing dual-use cyber risk and support more defensive AI-assisted auditing.
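The exploit mode's grading is described as deterministic checks against a sandboxed local Anvil node. EVMbench's real harness is written in Rust, but a rough Python equivalent of the pass/fail balance check, with placeholder addresses, might look like this:

```python
# Illustrative sketch only: EVMbench's actual harness is Rust; this shows
# the same deterministic "did the exploit drain funds?" check against a
# local Anvil node using web3.py. Both addresses are placeholders.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # local Anvil fork

TARGET = Web3.to_checksum_address("0x" + "11" * 20)    # vulnerable contract (placeholder)
ATTACKER = Web3.to_checksum_address("0x" + "22" * 20)  # agent-controlled account (placeholder)

balance_before = w3.eth.get_balance(TARGET)

# ... the agent submits its exploit transactions here ...

balance_after = w3.eth.get_balance(TARGET)
drained = balance_before - balance_after

# Deterministic pass/fail: the exploit counts only if it moved a
# meaningful share of the contract's funds.
passed = balance_before > 0 and drained / balance_before >= 0.9
print(f"drained {w3.from_wei(drained, 'ether')} ETH -> {'PASS' if passed else 'FAIL'}")
```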
🚀 AI Breakthroughs
Google Releases Gemini 3.1 Pro Preview After Record Benchmark Gains in Reasoning Tasks
Google has released Gemini 3.1 Pro, an upgraded version of its Gemini Pro large language model, with the company saying it is available in preview now and will reach general availability soon. The model is being rolled out across the Gemini app and NotebookLM for consumers, and through Google’s Gemini API tools, Android Studio, Vertex AI and Gemini Enterprise for developers and businesses. Google said Gemini 3.1 Pro is a step up in core reasoning over Gemini 3 Pro, citing a verified 77.1% score on the ARC-AGI-2 benchmark—more than double the prior version’s reasoning performance. The company also pointed to stronger results on other independent benchmarks, as competition among major AI labs accelerates around models built for multi-step reasoning and agent-style work.
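For developers, access runs through the Gemini API. A hedged sketch using the google-genai Python SDK follows; the model identifier "gemini-3.1-pro-preview" is an assumption based on the announcement, so check Google's model list for the exact string:

```python
# Hedged sketch: calling the preview model through the google-genai SDK.
# The model ID "gemini-3.1-pro-preview" is an assumption inferred from
# the announcement, not a confirmed identifier.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Walk through your reasoning step by step: which weighs more, "
             "a kilogram of feathers on the Moon or a pound of iron on Earth?",
)
print(response.text)
```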
Google’s Gemini App Adds Lyria 3 Beta to Generate 30-Second Music Tracks
Google has added music creation to the Gemini app, rolling out its latest generative music model, Lyria 3, in beta. The feature lets users generate 30-second tracks from text prompts or from uploaded photos and videos, with options for lyrics or instrumentals and extra controls such as style, vocals and tempo, plus AI-generated cover art. Google said every generated track will include SynthID watermarking, and Gemini’s verification tools are expanding to check audio for SynthID alongside images and video. Lyria 3 is also being used in YouTube’s Dream Track for Shorts soundtracks, starting in the U.S. and expanding to more countries. The music tool is available to users 18+ in eight languages on desktop first, with mobile rollout following, while paid Google AI subscribers get higher usage limits.
YouTube Expands Conversational AI “Ask” Feature to Smart TVs, Consoles, and Streaming Devices
YouTube is expanding its experimental conversational AI feature to TVs, bringing the “Ask” button to select smart TVs, gaming consoles, and streaming devices so viewers can query information about what they’re watching without leaving the video. Eligible users over 18 can choose suggested prompts or use a remote microphone to ask questions such as recipe details or song-lyric context, with support currently limited to English, Hindi, Spanish, Portuguese, and Korean. The move comes as YouTube viewing on televisions continues to grow; Nielsen data from April 2025 put YouTube at 12.4% of total TV audience time, ahead of Disney and Netflix. The rollout follows similar living-room AI pushes from Amazon’s Alexa+ on Fire TV, Roku’s upgraded AI voice assistant, and Netflix’s AI search testing, alongside YouTube’s other AI efforts such as comment summaries and improved TV video quality.
Anthropic Releases Claude Sonnet 4.6 With 1M-Token Context and Improved Coding
Anthropic has released Claude Sonnet 4.6, calling it the most capable Sonnet model so far, with upgrades across coding, long-context reasoning, agent planning, knowledge work, design, and “computer use.” The model adds a 1 million-token context window in beta and becomes the default on claude.ai and Claude Cowork for Free and Pro users, while keeping Sonnet pricing unchanged at $3 per million input tokens and $15 per million output tokens. The company said early testing showed users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time in Claude Code, and chose it over Claude Opus 4.5 59% of the time, citing better instruction-following and fewer hallucinations. Anthropic also reported improved performance on computer-use evaluations such as OSWorld and stronger resistance to prompt injection attacks compared to Sonnet 4.5, alongside safety testing that found no major new misalignment concerns. Sonnet 4.6 is available across Claude plans, Claude Code, the API, and major cloud platforms, with tooling updates including context compaction in beta and broader availability of code execution and other developer features.
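A hedged sketch of calling the new model with the long-context beta through the Anthropic Python SDK; the model ID "claude-sonnet-4-6" and the beta flag are assumptions inferred from the announcement and earlier 1M-context betas, so confirm both against Anthropic's docs:

```python
# Hedged sketch using the Anthropic Python SDK. The model ID and the
# 1M-context beta flag below are assumptions, not confirmed identifiers.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.beta.messages.create(
    model="claude-sonnet-4-6",            # assumed model ID
    max_tokens=1024,
    betas=["context-1m-2025-08-07"],      # long-context beta flag (assumed)
    messages=[{"role": "user",
               "content": "Summarize this ~900k-token codebase dump: ..."}],
)
print(message.content[0].text)
```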
OpenAI and Reliance to Add AI Conversational Search and Recommendations to JioHotstar
OpenAI has partnered with Reliance to add AI-powered conversational search to JioHotstar, enabling users to find movies, shows and live sports through text or voice prompts in multiple languages, with recommendations tailored to viewing history and preferences. The integration, built on OpenAI’s API, is also set to work in the other direction by surfacing JioHotstar suggestions inside ChatGPT with deep links into the streaming catalogue. The deal comes as streaming platforms increasingly test conversational discovery, following similar experiments by Netflix and Google TV. The tie-up is part of OpenAI’s broader push to expand in India, including plans to open offices in Mumbai and Bengaluru and additional partnerships across infrastructure and enterprise.
Cohere Launches Open-Weight Tiny Aya Multilingual Models Supporting 70+ Languages for Offline Devices
Cohere has released Tiny Aya, a new family of open-weight multilingual AI models rolled out alongside the India AI Summit, designed to run locally on everyday devices like laptops without an internet connection. The models support more than 70 languages, including South Asian languages such as Bengali, Hindi, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi, with a 3.35-billion-parameter base model. The lineup includes TinyAya-Global for stronger instruction-following and regional variants such as TinyAya-Earth (African languages), TinyAya-Fire (South Asian languages), and TinyAya-Water (Asia Pacific, West Asia, and Europe). Cohere said the models were trained on a single cluster of 64 Nvidia H100 GPUs and are aimed at offline use cases like translation, with releases available via Hugging Face and the Cohere Platform, alongside plans to share datasets and a technical report.
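Because the weights are open, the models can be pulled and run locally. A minimal offline-inference sketch with Hugging Face transformers follows; the repo ID "CohereLabs/tiny-aya-global" is a placeholder guess, so check Cohere's Hugging Face collection for the real names:

```python
# Minimal local-inference sketch with Hugging Face transformers. The
# repo ID below is a placeholder guess, not a confirmed model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "CohereLabs/tiny-aya-global"  # placeholder repo ID
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)  # ~3.35B params, laptop-friendly

# Offline translation-style prompt in one of the supported languages (Bengali).
prompt = "Translate to English: আজ আবহাওয়া খুব সুন্দর।"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```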
WordPress.com Adds Built-In AI Assistant for Natural-Language Editing, Layout Changes, and Image Generation
WordPress.com, Automattic’s website hosting platform, has added an opt-in AI Assistant that works inside the site editor to understand a site’s content and layout and apply natural-language changes. The tool can adjust block-theme layouts, styles, colors, fonts, and patterns in real time, and it can also add pages or sections such as contact or testimonials, but it does not appear for classic themes. It can rewrite or translate site text and provide editing help such as headline suggestions, grammar checks, and fact-checking within the block notes editor in WordPress 6.9 using an @ai command. For visuals, it can generate or edit images from the Media Library via Google Gemini’s Nano Banana models, with controls like aspect ratio and style, and it is enabled by default for sites created with the AI website builder.
Anthropic Says Claude Code Relies on High Prompt Caching Hit Rates for Efficiency
Anthropic said its Claude Code product is designed around achieving consistently high prompt-cache hit rates, arguing that long-running agent sessions otherwise become too slow and expensive. The company described prompt caching as strict prefix matching, where any change early in a request can invalidate cached computation, making prompt ordering and stability critical. To preserve cacheability, Claude Code keeps static elements like the system prompt and tool definitions at the start, pushes updates into later system messages instead of editing the system prompt, and avoids switching models mid-session because caches are model-specific. It also keeps the tool set fixed during a conversation, using deferred loading and tool-driven state transitions, and runs conversation “compaction” in a cache-safe way by reusing the same upfront context. Anthropic framed cache hit rate as an operational metric on par with uptime because small drops can significantly raise latency and costs.
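The request shape Anthropic describes maps directly onto the API's prompt-caching markers. A sketch, assuming a Sonnet-class model ID, of keeping the static prefix first and cacheable while per-turn updates arrive in later system blocks:

```python
# Sketch of the cache-friendly request shape described above, using the
# Anthropic API's prompt-caching markers. Static content (system prompt,
# tool definitions) goes first and is marked cacheable; anything that
# changes per turn comes after, so the cached prefix stays valid.
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = "You are a coding agent..."  # never edited mid-session
TOOLS = [{  # fixed tool set for the whole conversation
    "name": "read_file",
    "description": "Read a file from disk",
    "input_schema": {"type": "object",
                     "properties": {"path": {"type": "string"}},
                     "required": ["path"]},
}]

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID; caches are model-specific
    max_tokens=1024,
    tools=TOOLS,
    system=[
        {"type": "text", "text": SYSTEM_PROMPT,
         "cache_control": {"type": "ephemeral"}},  # cache breakpoint after static prefix
        # Per-turn updates go in *later* system blocks, never edits to the prefix:
        {"type": "text", "text": "Current git branch: feature/cache-demo"},
    ],
    messages=[{"role": "user", "content": "Refactor utils.py"}],
)
print(response.usage)  # cache_read_input_tokens shows prefix reuse on later turns
```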
🎓 AI Academia
SkillsBench Benchmark Finds Curated Agent Skills Boost Task Success, Self-Generated Skills Lag
A new preprint on arXiv describes SkillsBench, a benchmark designed to measure whether “agent skills” (structured procedural guides used at inference time) actually improve LLM agents on real tasks. The dataset covers 86 tasks across 11 domains and uses curated skills plus deterministic verifiers, with evaluations run in three settings: no skills, curated skills, and self-generated skills. Across 7 agent-model setups and 7,308 trajectories, curated skills increased average pass rates by 16.2 percentage points, though gains varied sharply by domain (from +4.5pp in software engineering to +51.9pp in healthcare) and 16 of 84 tasks regressed. Self-generated skills delivered no average improvement, suggesting models often cannot reliably write the procedural knowledge they benefit from using. The paper also reports that tightly focused skills (2–3 modules) can beat comprehensive documentation and help smaller models match larger ones running without skills.
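A toy sketch of the comparison the benchmark reports, per-domain pass-rate deltas between the no-skills and curated-skills settings; the trial data below is invented for illustration, not taken from the paper:

```python
# Toy sketch of SkillsBench-style analysis: per-domain pass-rate deltas
# between settings. All trial data here is invented for illustration.
from collections import defaultdict

trials = [  # (domain, setting, passed)
    ("healthcare", "none", False), ("healthcare", "curated", True),
    ("software",   "none", True),  ("software",   "curated", True),
    # ... 7,308 trajectories in the real benchmark
]

pass_counts = defaultdict(lambda: [0, 0])  # (passes, total) per (domain, setting)
for domain, setting, passed in trials:
    stats = pass_counts[(domain, setting)]
    stats[0] += int(passed)
    stats[1] += 1

def rate(domain: str, setting: str) -> float:
    passes, total = pass_counts[(domain, setting)]
    return passes / total

for domain in {d for d, _, _ in trials}:
    delta = rate(domain, "curated") - rate(domain, "none")
    print(f"{domain}: {delta:+.1%} pass-rate change with curated skills")
```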
Moltbook Study Finds AI Agent Societies Stabilize Semantics but Fail to Form Consensus
A new arXiv preprint dated February 19, 2026 examines Moltbook, described as the largest persistent and publicly accessible AI-only social platform with millions of LLM-driven agents posting, commenting, and voting, to test whether “socialization” emerges in large-scale agent societies. The paper proposes metrics to track dynamic change, including semantic stabilization, lexical turnover, individual inertia, influence persistence, and collective consensus. It reports that overall semantic content stabilizes quickly, but individual agents remain diverse with ongoing vocabulary churn rather than converging toward a shared style or norms. The analysis also finds strong agent inertia and weak adaptation to interaction partners, leading to fleeting influence, no durable “supernodes,” and a lack of stable structure or consensus, which it attributes to missing shared social memory.
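One of the paper's metrics, "lexical turnover", can plausibly be operationalized as the Jaccard distance between an agent's vocabularies in consecutive time windows; the sketch below is one illustrative reading, not the authors' code:

```python
# Illustrative reading of "lexical turnover": Jaccard distance between an
# agent's vocabulary in consecutive time windows (not the paper's code).
def lexical_turnover(posts_by_window: list[list[str]]) -> list[float]:
    """posts_by_window: per-window lists of tokens one agent produced."""
    turnover = []
    for prev, curr in zip(posts_by_window, posts_by_window[1:]):
        a, b = set(prev), set(curr)
        union = a | b
        # 0.0 = identical vocabulary, 1.0 = completely replaced
        turnover.append(1 - len(a & b) / len(union) if union else 0.0)
    return turnover

# High, non-decaying values mean vocabulary keeps churning, matching the
# paper's finding that agents never converge on a shared style.
print(lexical_turnover([["molt", "nest"], ["nest", "forum"], ["forum", "vote"]]))
```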
IEEE Study Models Auditing Rules for Machine Unlearning Compliance Under Right-to-Be-Forgotten Laws
A new IEEE Transactions on Mobile Computing paper outlines an economic auditing framework for checking whether AI systems truly comply with “right to be forgotten” deletion requests by using machine unlearning, rather than just deleting stored files. It models unlearning verification as a hypothesis-testing problem to quantify auditors’ detection power, then uses game theory to capture strategic behavior by operators facing accuracy and profit trade-offs. The analysis finds that inspection intensity can optimally fall as deletion requests rise because weaker unlearning makes cheating easier to detect, aligning with reported reductions in audits in China despite increasing requests. It also argues that while undisclosed audits can give auditors more information, disclosed auditing can be more cost-effective, and experiments on real data report large payoff gains versus a benchmark (up to 2549.30% for the auditor and 74.60% for the operator).
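The hypothesis-testing framing can be made concrete with a one-sided binomial test: the auditor probes deleted records with a membership-inference attack and asks whether the hit rate exceeds the false-positive rate expected after genuine unlearning. All numbers below are illustrative assumptions:

```python
# Sketch of the hypothesis-testing view of unlearning audits. The probe
# counts and baseline rate are illustrative assumptions, not the paper's.
from scipy.stats import binomtest

n_deleted = 200       # deletion requests the auditor probes
hits = 31             # probes that still flag the record as "remembered"
baseline_fpr = 0.05   # expected hit rate if unlearning actually happened

# H0: the operator unlearned (hit rate <= baseline); H1: it did not.
result = binomtest(hits, n_deleted, baseline_fpr, alternative="greater")
print(f"p-value = {result.pvalue:.4g}")
if result.pvalue < 0.01:
    print("Reject H0: evidence the operator skipped unlearning.")
```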
ForesightSafety Bench Sets 94-Dimension Framework to Evaluate Frontier AI Risks and Governance
A new AI safety evaluation framework called ForesightSafety Bench has been detailed in an arXiv preprint (arXiv:2602.14135v2, posted Feb. 18, 2026), aiming to address gaps in current benchmarks that struggle to detect frontier risks. The framework starts with seven “Fundamental Safety” pillars and expands into areas such as Embodied AI Safety, AI-for-Science safety, social and environmental risks, and catastrophic and existential risks, alongside eight industrial safety domains, totaling 94 risk dimensions. The authors report that it already contains tens of thousands of structured risk data points and has been used to assess more than 20 mainstream advanced large models. The evaluation flags widespread vulnerabilities, with emphasis on risky agentic autonomy, AI4Science and embodied interaction risks, social manipulation, loss of human control, and self-replication, and the project’s code and documentation are publicly available on GitHub and a dedicated website.
Africa-Centric AI Safety Evaluations Highlight Portability Gaps and Severe Risk Pathways for Frontier Systems
A new research paper argues that as frontier AI systems spread across Africa, most existing AI safety evaluations—built and tested mainly in Western settings—may miss “Africa-centric” routes to severe harm when these tools are deployed in resource-constrained and tightly interdependent infrastructures. It defines severe AI risks as outcomes causing grave injury or death of thousands of people, or economic loss and damage equivalent to 5% of a country’s GDP, and lays out a taxonomy that links hazards, vulnerabilities, and exposure, with special focus on harms that are rapidly triggered and amplified by AI. The paper also outlines practical threat-modelling approaches tailored to conditions such as poor connectivity, limited technical capacity, weak state institutions, and conflict, drawing on methods like scenario planning and structured expert elicitation. On misalignment, it says African deployments are more likely to reveal broadly shared failure modes through distributional shift than to create uniquely regional misalignment pathways, and it recommends open tools, tiered evaluation pipelines, and wider sharing of results to expand evaluation coverage under tight budgets.
Oversight-by-Design Approach Adds Mandatory Human Review to Ensure Accessible LLM-Generated Interfaces
A new research preprint accepted for the IUI Workshops 2026 proceedings argues that LLM-generated user interfaces are moving into high-stakes areas such as healthcare communication, where presentation choices can affect real-world decisions and must remain accessible to people with disabilities. It warns that risks like hallucinations, semantic distortion, bias, and accessibility failures can weaken trust and make it harder for users to understand and challenge AI-supported outputs. The paper says oversight is often treated as a late-stage check, with unclear triggers for intervention and accountability. It proposes “oversight-by-design,” embedding human judgment throughout the UI generation pipeline using escalation policies, automated risk checks (readability, factual and semantic consistency, and accessibility standards), and mandatory human review when thresholds are breached. It also describes ongoing human supervision using monitoring signals and audit logs to tune policies, detect drift, and make oversight verifiable over time.
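A hedged sketch of the escalation gate the paper proposes: automated checks score a generated interface, and any threshold breach triggers mandatory human review. The specific checks and thresholds below are illustrative stand-ins, not the authors' implementation:

```python
# Illustrative oversight-by-design gate: automated risk checks score a
# generated UI and escalate to human review on any threshold breach.
# Checks and thresholds are stand-ins, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class RiskReport:
    readability_grade: float     # e.g., Flesch-Kincaid grade level
    semantic_consistency: float  # 0..1 agreement between source and rendered text
    accessibility_score: float   # 0..1 share of WCAG checks passed

THRESHOLDS = {
    "readability_grade": 9.0,      # escalate if harder than ~9th-grade reading
    "semantic_consistency": 0.95,  # escalate if meaning drifted from the source
    "accessibility_score": 1.0,    # escalate on any WCAG check failure
}

def needs_human_review(report: RiskReport) -> list[str]:
    reasons = []
    if report.readability_grade > THRESHOLDS["readability_grade"]:
        reasons.append("readability")
    if report.semantic_consistency < THRESHOLDS["semantic_consistency"]:
        reasons.append("semantic drift")
    if report.accessibility_score < THRESHOLDS["accessibility_score"]:
        reasons.append("accessibility")
    return reasons  # non-empty => block auto-release, log, and escalate

print(needs_human_review(RiskReport(11.2, 0.97, 0.98)))  # ['readability', 'accessibility']
```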
Mirror Multi-Agent System Uses Fine-Tuned EthicsLLM to Assist Institutional Research Ethics Reviews
A new arXiv preprint (2602.13292, posted Feb. 9, 2026) describes Mirror, a multi-agent system designed to assist institutional ethics review as research volumes and cross-disciplinary risks increase. The system centers on EthicsLLM, a language model fine-tuned on a purpose-built EthicsQA dataset of about 41,000 question, chain-of-thought, and answer examples distilled from ethics and regulatory sources. Mirror operates in two modes: Mirror-ER runs expedited, rule-based compliance checks for minimal-risk studies, while Mirror-CR simulates full committee deliberation among specialized agents to produce structured assessments across ten ethics dimensions. The paper reports that Mirror delivers more consistent and professional ethics assessments than strong general-purpose LLMs, while aiming to be modular and privacy-preserving for real institutional deployment.
Global Audit Finds Llama-3 Shows Geographic Bias, Deepening Governance Gaps Between North and South
Research from the Technical AI Governance Challenge 2026 reports early results from a global bias audit that stress-tested Meta’s Llama-3 8B model for geographic and socioeconomic skew in technical AI governance knowledge. Using 1,704 queries across 213 countries and eight technical metrics, the audit found a sharp information gap between higher-income regions and lower-income countries in the Global South. The model produced number-or-fact style answers in just 11.4% of responses, and the real-world accuracy of those claims had not yet been verified. The study argues that these gaps could undermine inclusive AI governance by leaving policymakers in underserved regions without reliable data or vulnerable to hallucinated facts, and calls for more globally representative training data.
GLM-5 Foundation Model Targets Agentic Engineering With Lower Costs and Stronger Coding Benchmarks
GLM-5, a new foundation model described in an arXiv preprint dated Feb. 17, 2026, targets a shift from “vibe coding” toward more autonomous “agentic engineering,” building on earlier agentic, reasoning, and coding capabilities. The paper says the model uses a DSA approach to cut training and inference costs while preserving long-context performance, alongside an asynchronous reinforcement-learning setup that separates text generation from training to speed post-training. It also outlines new asynchronous agent RL algorithms aimed at improving long-horizon decision-making during complex interactions. On eight public agentic, reasoning, and coding benchmarks shown in the paper’s Figure 1, GLM-5 is reported to post leading results in several areas, including strong scores on BrowseComp, MCP-Atlas, and SWE-bench variants, and improved end-to-end software engineering performance. Code and model resources are listed at https://github.com/zai-org/GLM-5.
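As a generic illustration of the asynchronous split the paper describes (not GLM-5's actual algorithm), generation workers can push finished rollouts into a queue that a learner drains independently, so slow decoding never blocks training updates:

```python
# Generic illustration of decoupling rollout generation from training
# via a shared queue. This is not GLM-5's algorithm, just the pattern.
import queue
import random
import threading
import time

rollouts: queue.Queue = queue.Queue(maxsize=64)

def generator(worker_id: int) -> None:
    for step in range(5):
        time.sleep(random.uniform(0.01, 0.05))            # stands in for LLM decoding
        rollouts.put((worker_id, step, random.random()))  # (id, step, reward)

def learner(total: int) -> None:
    for _ in range(total):
        worker_id, step, reward = rollouts.get()  # consume whenever a rollout is ready
        # a gradient update would happen here, off the generation critical path

workers = [threading.Thread(target=generator, args=(i,)) for i in range(4)]
trainer = threading.Thread(target=learner, args=(20,))
for t in workers + [trainer]:
    t.start()
for t in workers + [trainer]:
    t.join()
print("all rollouts consumed asynchronously")
```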
About SoRAI: SoRAI is committed to advancing AI literacy through practical, accessible, and high-quality education. Our programs emphasize responsible AI use, equipping learners with the skills to anticipate and mitigate risks effectively. Our flagship AIGP certification courses, built on real-world experience, drive AI governance education with innovative, human-centric approaches, laying the foundation for quantifying AI governance literacy. Subscribe to our free newsletter to stay ahead of the AI Governance curve.