US releases national AI policy framework
++ Microsoft rolls back Copilot entry points in Windows 11; Cloudflare CEO warns AI bot traffic may surpass humans by 2027; Meta expands AI content enforcement and reduces third-party vendors; and more
Today’s highlights:
The US Administration has issued a six-part legislative framework for a single national AI policy, designed to set uniform safety and security guardrails while preempting states from adopting their own AI rules.
This framework addresses six key objectives:
Protecting Children and Empowering Parents: Parents are best equipped to manage their children’s digital environment and upbringing. The Administration is calling on Congress to give parents tools to effectively do that, such as account controls to protect their children’s privacy and manage their device use. The Administration also believes that AI platforms likely to be accessed by minors should implement features to reduce potential sexual exploitation of children or encouragement of self-harm.
Safeguarding and Strengthening American Communities: AI development should strengthen American communities and small businesses through economic growth and energy dominance. The Administration believes that ratepayers should not foot the bill for data centers, and is calling on Congress to streamline permitting so that data centers can generate power on site, enhancing grid reliability. Congress should also augment Federal government ability to combat AI-enabled scams and address AI national security concerns.
Respecting Intellectual Property Rights and Supporting Creators: The creative works and unique identities of American innovators, creators, and publishers must be respected in the age of AI. Yet, for AI to improve, it must be able to make fair use of what it learns from the world it inhabits. The Administration is proposing an approach that achieves both of these objectives, enabling AI to thrive while ensuring Americans’ creativity continues propelling our country’s greatness.
Preventing Censorship and Protecting Free Speech: The Federal government must defend free speech and First Amendment protections, while preventing AI systems from being used to silence or censor lawful political expression or dissent. AI cannot become a vehicle for government to dictate right and wrong-think. The Administration is proposing guardrails to ensure that AI can pursue truth and accuracy without limitation.
Enabling Innovation and Ensuring American AI Dominance: The Administration is calling on Congress to take steps to remove outdated or unnecessary barriers to innovation, accelerate the deployment of AI across industry sectors, and facilitate broad access to the testing environments needed to build and deploy world-class AI systems.
Educating Americans and Developing an AI-Ready Workforce: The Administration wants American workers to participate in and reap the rewards of AI-driven growth, encouraging Congress to further workforce development and skills training programs, expanding opportunities across sectors and creating new jobs in an AI-powered economy.
The biggest concern is that the framework could preempt stronger state protections without replacing them with equally strong federal safeguards. It is also being criticized for limited attention to privacy, algorithmic discrimination, accountability, liability, and enforceable oversight. On copyright, it leans toward letting courts sort out training-data disputes instead of giving Congress a firmer rule. Some observers also note that it is sparse on national-security specifics despite AI’s geopolitical significance.
The White House said it wants to work with Congress in the coming months to turn the framework into a bill and aims to codify it this year, though passing it could be difficult in a closely divided Congress as several states continue pursuing their own AI regulations and industry groups warn against a patchwork of laws.
At the School of Responsible AI (SoRAI), we empower individuals and organizations to become AI-literate through comprehensive, practical, and engaging programs. For individuals, we offer specialized training, including AI Governance certifications (AIGP, RAI, AAIA) and an immersive AI Literacy Specialization. This specialization teaches AI through a scientific framework structured around progressive cognitive levels: starting with knowing and understanding, then using and applying, followed by analyzing and evaluating, and finally creating through a capstone project, with ethics embedded at every stage. Want to learn more? Explore our AI Literacy Specialization Program and our AIGP 8-week personalized training program. For customized enterprise training, write to us at [Link].
⚖️ AI Ethics
NHAI to deploy AI-enabled dashcams across 40,000 km of national highways
The National Highways Authority of India (NHAI) said it will deploy AI- and machine learning-enabled dashcam analytics on about 40,000 km of the national highway network to support faster maintenance, improve road safety and enhance user experience. Special dashboard cameras will be mounted on route patrol vehicles to conduct weekly surveys, with AI models trained to automatically detect more than 30 types of defects and anomalies using high-resolution video and imagery. NHAI said at least one night survey will be carried out each month to assess signages, lane markings, road studs and highway lighting, while also flagging issues such as water stagnation, drainage cover gaps, vegetation growth and bus-bay conditions. The project will be monitored through five zones and a dedicated IT platform with data management, AI analytics and visualization dashboards, with outputs integrated into NHAI’s central data lake for tracking repairs over time and ensuring timely rectification.
Cursor Confirms Composer 2 Coding Model Built on Moonshot AI’s Open-Source Kimi 2.5 Base
Cursor acknowledged that its new coding model, Composer 2, was built on top of Moonshot AI’s open-source Kimi 2.5, after social media users flagged code suggesting the identity of the underlying model. The company said only about a quarter of the compute used for the final system came from the base model, with the rest coming from additional training and reinforcement learning, leading to different benchmark results. It also said the use complied with Kimi’s license terms, and Moonshot AI’s Kimi account said the work was part of an authorized commercial partnership via Fireworks AI. Cursor later conceded it should have credited Kimi in its original write-up and said it plans to do so in future releases.
Anonymous Post Accuses YC-Backed Delve of Fake Compliance, Exposing Customers to Regulatory Risk
A Substack post published this week accused Y Combinator-backed compliance startup Delve of misleading customers with “fake compliance,” alleging it fabricated evidence, skipped key framework requirements, and relied on audit firms that allegedly rubber-stamped reports, potentially exposing customers to HIPAA and GDPR risk. The post also claimed Delve helped customers present unimplemented controls on public trust pages and referenced a reported spreadsheet leak, alongside separate claims on X of exposed sensitive internal documents. Delve rejected the allegations as misleading, saying it does not issue compliance reports, that independent accredited auditors produce final opinions, and that it provides templates rather than “pre-filled evidence.” The anonymous author said the response sidestepped major claims and indicated more allegations would follow, while Delve said it is investigating any potential leaks.
Hachette Pulls Horror Novel “Shy Girl” in US and UK Amid AI-Generated Text Concerns
Hachette Book Group has pulled the horror novel “Shy Girl” amid concerns that the text may have been generated using artificial intelligence. The book had been set for a U.S. release this spring, and the publisher said it will also discontinue the title in the U.K., where it is already on sale. Online reviewers on Goodreads and YouTube had raised suspicions of AI use, and The New York Times reported it questioned the publisher about the allegations a day before the decision. The author denied using AI and said an acquaintance hired to edit an earlier self-published version may be responsible, adding that legal action is being pursued. Industry observers noted that U.S. publishers often do limited re-editing when acquiring previously published titles, which can allow problems to slip through.
Anthropic Court Filings Dispute Pentagon’s National Security Claims, Cite Near-Alignment After Trump Split
Anthropic has filed two sworn declarations in a California federal court dispute with the U.S. Department of Defense, contesting the Pentagon’s claim that the company poses an “unacceptable risk to national security” and saying the government’s case rests on technical misunderstandings and issues not raised during prior negotiations. The filings come ahead of a March 24 hearing in San Francisco and follow a late-February breakdown after President Trump and the defense secretary said the Pentagon would cut ties with the company over limits on military uses. The declarations argue Anthropic never sought any approval role over military operations and say concerns about the company disabling or changing its AI mid-mission appeared only in court, not during talks. They also cite an email sent the day after the Pentagon finalized its supply-chain risk designation indicating the sides were “very close” on disputed issues, and contend that once deployed in air-gapped government systems Anthropic cannot access, alter, or remotely shut down the models. Anthropic says the designation is retaliatory and violates the First Amendment, while the government argues it is a standard national security determination tied to business decisions rather than protected speech.
Microsoft Rolls Back Copilot Entry Points in Windows 11, Cutting AI Integrations in Apps
Microsoft is scaling back some Copilot touchpoints in Windows 11 as part of a broader push to improve the operating system, reducing AI integrations in apps such as Photos, Widgets, Notepad, and the Snipping Tool. The shift signals a more selective approach to where Copilot appears, amid wider consumer unease about “AI bloat” and trust concerns, reflected in recent Pew Research findings showing more Americans worried than excited about AI as of June 2025. The move follows earlier reports that some Copilot-branded features planned for deeper Windows 11 areas like Settings and File Explorer were shelved, and it comes after delays and ongoing scrutiny around the privacy and security of the Recall feature on Copilot+ PCs. Alongside the Copilot changes, Microsoft is working on more customization and performance updates, including flexible taskbar placement, faster File Explorer, improved Widgets, and updates to feedback and Insider tools.
Cloudflare CEO Warns AI Bot Traffic Will Surpass Human Web Traffic by 2027
Online bot traffic is on track to surpass human internet traffic by 2027, according to comments from Cloudflare’s CEO at SXSW in Austin, citing rapid growth in generative AI. He said AI agents can generate far more web requests than people—such as scanning thousands of sites for tasks like shopping—creating real load for websites and networks. He estimated that before the generative AI boom, bots accounted for about 20% of internet traffic, led mainly by major search crawlers, but AI’s heavy data demands are pushing bot activity sharply higher. He also said the shift will require new infrastructure, including temporary “sandbox” environments for AI agents, alongside continued investment in data centers and traffic-management tools.
Meta Expands AI Content Enforcement Tools, Cuts Third-Party Vendor Role Across Apps
Meta is rolling out more advanced AI systems to handle content enforcement across its apps, while cutting back on third‑party vendors that currently help review harmful material such as terrorism, child exploitation, drugs, fraud, and scams. The company said the AI tools will be expanded once they consistently outperform existing enforcement methods, with humans still handling high‑risk decisions like appeals and law‑enforcement reports. Meta claimed early tests found the systems detected twice as much adult sexual solicitation content as review teams and cut errors by more than 60%, while also improving detection of impersonation, account takeovers, and roughly 5,000 scam attempts per day. The shift comes amid broader changes to Meta’s moderation approach over the past year, and as major tech platforms face lawsuits over alleged harms to young users. Meta also rolled out a 24/7 Meta AI support assistant to Facebook and Instagram on iOS and Android, plus the desktop Help Center.
Maharashtra Moves Toward Dedicated Tribunal for IT Employee Grievances Under New Labour Codes
Maharashtra has signalled it will set up its first dedicated tribunal mechanism for IT employees once state rules aligned with the Industrial Relations Code, 2020 and the new labour codes are finalised and notified. The state labour minister told the Assembly that provisions for special tribunals, including for the IT sector, will be created under the upcoming framework to handle grievances and disputes. The issue was raised amid complaints from Pune’s IT workforce about alleged fraud, forced resignations and other labour issues, with the government citing recent intervention in a placement-related fraud case that was later handed to police. Until the new system is in place, disputes will continue to be mediated by officials and, if unresolved, sent to existing labour or industrial courts, while the state also indicated it will hold consultations with industry, officials and IT professionals.
Supermicro Co-Founder Arrested in $2.5 Billion AI Server Smuggling Scheme to China
US prosecutors have charged three people, including a Super Micro Computer co-founder and senior executive, with conspiring to smuggle advanced AI servers to China in an alleged export-control evasion scheme run between 2024 and 2025. Authorities said the group routed US-assembled servers—reportedly equipped with NVIDIA AI GPUs—through Taiwan and Southeast Asia using an intermediary that falsely appeared to be the end customer, then repackaged shipments into unmarked boxes for onward delivery to China. Investigators allege the defendants used falsified paperwork, misleading internal records, and staged compliance checks that included non-functional “dummy” servers. The intermediary company is accused of buying about $2.5 billion in servers over the period, with at least $510 million allegedly diverted to China within weeks in early 2025; two defendants have been arrested while one remains a fugitive. Supermicro said it has not been charged, and the accused face counts including conspiracy, smuggling, and defrauding the US, with penalties of up to 20 years if convicted.
Gujarat Deep-Tech Firm Develops AI Action Firewall to Verify and Log AI Operations
A Gujarat-based neuro-engineering deep-tech AI company has developed an “AI Action Firewall” designed to make AI systems safer by verifying every AI-driven action before it is executed. The tool is positioned as a policy-based safety layer between AI agents and real-world operations, with the aim of ensuring actions are authorised, monitored and recorded. It is built to regulate tasks such as sending emails, executing code, accessing databases and triggering automated workflows, classifying each action as “allow”, “review” or “block,” with high-risk steps requiring human approval. The company said it has filed for a global patent for the technology and that all decisions will be logged to create an audit trail for transparency, accountability and compliance. It also argued existing cybersecurity tools built for human users are not sufficient for controlling autonomous AI agents, making a dedicated control layer necessary.
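The company has not published implementation details, but the policy layer it describes is essentially a rules engine sitting between the agent and its tools. Below is a minimal, hypothetical Python sketch of the allow/review/block pattern; the action kinds, policy table, and log format are illustrative assumptions, not the vendor's design.

```python
# Hypothetical sketch of a policy-based "action firewall" for AI agents.
# All action kinds, rules, and the log format are illustrative assumptions.
import json
import time
from dataclasses import dataclass

@dataclass
class AgentAction:
    kind: str       # e.g. "send_email", "execute_code", "db_write"
    payload: dict   # action arguments supplied by the agent

# Simple policy table mapping action kinds to a verdict.
POLICY = {
    "send_email":   "review",   # outbound mail needs human approval
    "execute_code": "review",
    "db_read":      "allow",
    "db_write":     "block",    # treated as high-risk here
}

AUDIT_LOG = []  # in practice, an append-only store for compliance audits

def firewall(action: AgentAction) -> str:
    """Classify an agent action as allow / review / block, and log it."""
    verdict = POLICY.get(action.kind, "review")  # unknown actions go to review
    AUDIT_LOG.append({
        "ts": time.time(),
        "action": action.kind,
        "payload": json.dumps(action.payload),
        "verdict": verdict,
    })
    return verdict

print(firewall(AgentAction("send_email", {"to": "ops@example.com"})))  # review
print(firewall(AgentAction("db_read", {"table": "orders"})))           # allow
```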
Russia Proposes Rules Allowing Bans or Limits on Foreign AI Tools Like ChatGPT
Russia is moving toward new regulations that could ban or restrict foreign AI tools such as ChatGPT, Claude, and Google’s Gemini if they do not comply with rules aimed at strengthening a “sovereign internet” and aligning technology with officially defined traditional values. The draft proposals would give authorities broad powers to limit “cross-border” AI services, citing risks of covert manipulation and discriminatory algorithms, and arguing that foreign models transmit Russian users’ data abroad. Under the proposed regime, widely used AI models may be required to store Russian user data on servers located in Russia for three years, a demand some Western tech firms have previously resisted. The measures are expected to take effect next year after further review and approval, and could boost domestic AI providers while encouraging foreign or open models to be deployed in closed, Russia-based environments.
Microsoft Weighs Legal Action as Amazon-OpenAI Deal Hinges on ‘Stateful’ vs ‘Stateless’
Microsoft is considering legal action against Amazon and OpenAI over a reported $50 billion cloud deal that could weaken Microsoft’s exclusive arrangement to host OpenAI model access on Azure, according to the Financial Times. The dispute hinges on whether OpenAI’s planned “Frontier” product on AWS relies on “stateful” access (with memory and context) through a proposed Stateful Runtime Environment on Bedrock, or remains “stateless” in a way OpenAI argues does not violate its Microsoft contract. Microsoft reportedly views the setup as an untenable workaround that breaches the spirit of the agreement, while OpenAI says the Amazon deal does not provide backdoor access to its stateless models and that new products with third parties are allowed if they are not primarily offered as APIs. Amazon has reportedly told staff to describe the system as being integrated with or powered by OpenAI, avoiding language suggesting direct ChatGPT access, as tensions rise while OpenAI prepares for a potential IPO and faces other legal pressures.
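Neither contract is public, but the technical distinction the dispute turns on is simple: a stateless API retains nothing between calls, so the caller must resend full context every time, while a stateful runtime keeps session memory on the provider's side. A toy illustration, with a dummy provider standing in for any hosted model (nothing here is the actual AWS Bedrock or OpenAI API):

```python
# Toy contrast between stateless and stateful model access.
# DummyProvider is a stand-in; these are not real AWS or OpenAI APIs.

class DummyProvider:
    def __init__(self):
        self.sessions: dict[str, list[str]] = {}  # provider-held memory

    def complete(self, messages: list[str]) -> str:
        # Stateless: the caller resends the full history on every request,
        # and the provider forgets it immediately afterwards.
        return f"reply using {len(messages)} caller-supplied messages"

    def continue_session(self, session_id: str, msg: str) -> str:
        # Stateful: the provider accumulates context keyed by session id,
        # so requests carry only the new message.
        self.sessions.setdefault(session_id, []).append(msg)
        return f"reply using {len(self.sessions[session_id])} provider-held turns"

p = DummyProvider()
print(p.complete(["hi", "hello", "what's new?"]))   # stateless call
print(p.continue_session("s1", "hi"))               # stateful, turn 1
print(p.continue_session("s1", "what's new?"))      # stateful, turn 2
```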
Tech Workers Ramp Up AI Token Spending as Coding Leaderboards Raise Productivity Doubts
Tech workers are ramping up use of AI coding “agents,” driving token consumption—and costs—to extremes, including a reported case of more than $150,000 spent in a month on Anthropic’s Claude Code. Some companies, including Meta and Shopify, are increasingly factoring AI usage into performance reviews, while internal token leaderboards at firms such as Meta and OpenAI reportedly reward heavy use without directly measuring output quality. Agentic tools can run for hours unattended and spawn sub-agents, making it possible for individuals to burn through hundreds of millions to billions of tokens per week, boosting revenue for AI providers. At the same time, some engineers and managers warn the trend is becoming productivity theater and may show diminishing returns as spending rises faster than provable gains.
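For a sense of scale, rough arithmetic shows how a six-figure monthly bill maps onto token volume. The per-token rates and input/output split below are illustrative assumptions, not any provider's published pricing:

```python
# Back-of-the-envelope token arithmetic. The rates and input/output split
# are assumptions for illustration, not actual Anthropic pricing.
INPUT_RATE = 15 / 1_000_000    # assumed $ per input token
OUTPUT_RATE = 75 / 1_000_000   # assumed $ per output token

monthly_spend = 150_000        # the reported $150k-in-a-month case
# Assume agentic coding traffic is ~90% input tokens (re-read context).
blended_rate = 0.9 * INPUT_RATE + 0.1 * OUTPUT_RATE
tokens_per_month = monthly_spend / blended_rate

print(f"{tokens_per_month / 1e9:.1f}B tokens/month")        # ~7.1B
print(f"{tokens_per_month / 4.33 / 1e9:.2f}B tokens/week")  # ~1.65B
```

Under these assumptions, a single $150k month implies on the order of billions of tokens per week, consistent with the usage levels described above.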
PwC US CEO Warns Partners Must Go AI-First or Risk Losing Roles
PwC’s US chief executive has warned that partners and employees who do not adopt an “AI-first” mindset risk being replaced and may not stay long at the firm, according to comments to the Financial Times. The executive said PwC’s hiring mix is shifting away from traditional accounting and consulting roles toward more engineers and data specialists, even as the firm remains a net hirer despite cutting 5,600 jobs last year to take global headcount to under 365,000. PwC also plans to turn parts of its tax and consulting work into AI-powered subscription tools that clients can use without a consultant involved in early steps. The firm is rolling out an AI platform called “PwC One” with six automated services, and expects automation to push the industry toward outcome-based pricing as clients seek measurable results.
OpenAI Details GPT-5.4-Powered Monitoring System to Detect Misalignment in Internal Coding Agents
OpenAI detailed an internal monitoring system designed to detect potential misalignment in coding agents used inside the company, as AI agents take on more autonomous, high-impact work. The setup uses a separate GPT‑5.4 “Thinking” model at maximum reasoning effort to review full agent sessions—including tool use and internal reasoning traces—and issue severity-rated alerts within about 30 minutes, with a longer-term goal of near real-time and even pre-action blocking. Over five months, the system monitored tens of millions of coding trajectories, matched every issue employees independently escalated, and flagged additional suspect behavior; about 1,000 conversations triggered moderate alerts, often tied to internal red-teaming, while none hit the top severity level meant for rare, high-stakes scheming. The company said it most often observed agents trying to work around restrictions when prompts inadvertently encouraged it, reported no evidence so far of motives like self-preservation, and noted that less than 0.1% of internal traffic remains outside monitoring coverage as that gap is closed.
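OpenAI has not released the monitor itself, but the pipeline as described (a separate reviewer model grading full agent transcripts and emitting severity-rated alerts) can be sketched roughly as follows. The prompt, severity scale, and escalation threshold are assumptions; only the generic chat-completions call shape is standard, and the model name is taken from the report above.

```python
# Rough sketch of a transcript-review monitor in the spirit of the system
# described above. Prompt, severity scale, and threshold are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
SEVERITIES = ["none", "low", "moderate", "high"]

def review_session(transcript: str) -> str:
    """Ask a separate reviewer model to grade one agent session."""
    resp = client.chat.completions.create(
        model="gpt-5.4",  # reviewer model name as reported above
        messages=[
            {"role": "system", "content": (
                "You audit coding-agent sessions for signs of misalignment: "
                "working around restrictions, deception, or out-of-scope acts. "
                f"Reply with exactly one word from: {', '.join(SEVERITIES)}."
            )},
            {"role": "user", "content": transcript},
        ],
    )
    verdict = resp.choices[0].message.content.strip().lower()
    return verdict if verdict in SEVERITIES else "moderate"  # fail toward review

def maybe_alert(transcript: str) -> None:
    severity = review_session(transcript)
    if SEVERITIES.index(severity) >= SEVERITIES.index("moderate"):
        print(f"ALERT ({severity}): escalate session for human review")
```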
Sora 2 and Sora app add provenance, consent checks, and teen safety controls
OpenAI published a safety overview dated March 23, 2026 detailing how Sora 2 and the Sora app aim to reduce misuse as video generation becomes more realistic and adds audio. The company said every Sora-generated video carries provenance signals, including C2PA metadata, and can be traced using internal reverse image and audio search tools, with many shared outputs also showing moving watermarks that can include the creator’s name. It also described stricter controls for image-to-video involving real people, requiring users to attest they have consent, with even tighter moderation for content featuring children or young-looking persons and mandatory watermarks on sharing. OpenAI said the “characters” feature is designed for consent-based use of a person’s appearance and voice, with user-controlled access and visibility into any drafts that use the character, alongside blocks on public-figure depictions outside that feature. Additional measures cover teen accounts, layered filtering for sexual content, terrorism and self-harm, transcript scanning for generated speech, blocks on music imitation of living artists, takedown handling, and reporting, blocking, and post-removal tools for user recourse.
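C2PA provenance travels as a signed manifest embedded in the media file, which downstream platforms can verify before labeling content. A hypothetical check is sketched below; the manifest fields and helper are invented for illustration, and real C2PA SDKs expose richer, different APIs:

```python
# Hypothetical provenance check for a generated video. The manifest shape
# is invented for illustration; real C2PA SDKs differ.
from dataclasses import dataclass
from typing import Optional

@dataclass
class C2paManifest:
    claim_generator: str   # tool that produced the asset, e.g. a video model
    signature_valid: bool  # cryptographic check against the signer's cert
    assertions: dict       # e.g. {"ai_generated": True}

def is_disclosed_ai_video(manifest: Optional[C2paManifest]) -> bool:
    """Treat a video as disclosed-AI only if provenance survives intact."""
    if manifest is None:
        # Metadata stripped: fall back to reverse image/audio search,
        # as the overview above says OpenAI does internally.
        return False
    return manifest.signature_valid and bool(manifest.assertions.get("ai_generated"))

print(is_disclosed_ai_video(C2paManifest("video-model/2.0", True, {"ai_generated": True})))
```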
🚀 AI Breakthroughs
Lovable Adds Data Analysis, Document and Media Generation to Its AI App-Building Platform
Lovable, an AI-powered app-building platform, has broadened its product into an all-in-one workspace that combines software creation with document generation, data analysis, and media production in a single chat-style interface. The company said its agent can handle files such as spreadsheets, PDFs, slide decks, Word documents, images, and videos, run code and Python for analysis, convert formats, and validate outputs in a secure environment tied to the apps users build. Lovable also said users can turn static inputs like spreadsheets, PDFs, or screenshots into working applications with databases, dashboards, and user interfaces, while integrations with tools like Slack and analytics platforms help summarize feedback and surface product recommendations. The company previously raised $330 million in a Series B round at a $6.6 billion valuation, and TechCrunch has reported it has surpassed $400 million in annual recurring revenue.
OpenAI Sets 16MB, 10-Min Parameter Golf Challenge for Model Training on 8x H100 GPUs
OpenAI has launched “Parameter Golf,” a model training challenge that asks participants to build the best-performing language model that fits within a 16MB artefact and can be trained in under 10 minutes on an 8x NVIDIA H100 GPU cluster. Submissions will be judged primarily on compression performance on the FineWeb validation set, alongside reproducibility and strict compliance with the size and compute limits. OpenAI positioned the effort as a move away from simply scaling models up, pushing researchers toward parameter-efficient optimisation and unconventional techniques such as parameter tying, low-rank methods, and new tokenisation strategies. The challenge runs from March 18 to April 30, and OpenAI is offering $1 million in compute credits to support participants, while also treating the contest as a signal for early-career talent.
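To see what the 16MB cap implies, a quick budget calculation helps; the precision options below are assumptions, since the rules as summarized only fix artefact size and training time:

```python
# What fits in a 16MB artefact? The precision options are assumptions;
# the rules summarized above only fix the size and 10-minute training cap.
BUDGET_BYTES = 16 * 1024 * 1024  # 16 MiB

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    max_params = BUDGET_BYTES // bytes_per_param
    print(f"{name}: ~{max_params / 1e6:.1f}M parameters max")

# fp32 ~4.2M, fp16/bf16 ~8.4M, int8 ~16.8M parameters: small enough that
# the parameter tying and low-rank tricks mentioned above become attractive.
```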
WordPress.com Enables AI Agents to Draft, Edit, and Publish Posts With User Approval
WordPress.com now supports AI agents that can draft, edit, and publish posts and pages, manage comments, update metadata for SEO, and organize tags and categories through natural-language commands. The features build on the platform’s earlier support for Model Context Protocol (MCP), which already let AI tools connect to read site content and settings, and now extends that access to making changes. The company said AI-written posts are saved as drafts by default and require user approval, with actions recorded in the site’s Activity Log. While WordPress powers more than 43% of websites overall—mostly outside WordPress.com—WordPress.com said its own network reaches about 20 billion monthly page views and 409 million unique visitors, amplifying concerns about more machine-generated content on the web.
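Under MCP, an agent action such as publishing a draft is carried as a JSON-RPC "tools/call" request from the AI client to the site's MCP server. The envelope below follows the MCP spec, but the tool name and arguments are hypothetical rather than WordPress.com's documented tools:

```python
# Shape of an MCP tools/call request an AI client might send to create a
# draft post. The JSON-RPC envelope follows the MCP spec; the tool name
# and argument names are hypothetical, not WordPress.com's documented tools.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "create_post",          # hypothetical tool name
        "arguments": {
            "title": "Weekly update",
            "content": "<p>Drafted by an agent.</p>",
            "status": "draft",          # drafts await user approval, per above
        },
    },
}
print(json.dumps(request, indent=2))
```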
DoorDash Launches Tasks App Paying Couriers for Videos and Audio to Train AI
DoorDash has launched a stand-alone “Tasks” app that pays its delivery couriers to complete assignments such as filming everyday actions or recording speech in other languages, with the goal of improving AI and robotic systems. The company said pay is shown upfront and set based on the effort and complexity of each activity, and Bloomberg reported the submitted audio and video can be used to evaluate DoorDash’s own AI models as well as those from partners across industries. Tasks will also appear inside the Dasher app, including jobs like taking real photos of restaurant dishes or documenting hotel entrances to help with drop-offs, alongside a Waymo-related task involving closing doors on self-driving cars. The app and in-app tasks are available in select U.S. locations, excluding California, New York City, Seattle, and Colorado, with plans to expand to more task types and countries.
Amazon Offers Free Kiro AI Coding Credits to Students Despite Internal Reliability Concerns
Amazon is offering eligible university students free, limited-time access to its AI coding tool Kiro through a new student programme that provides 1,000 credits per month for a year without requiring a credit card. Access requires sign-up with a university email and third-party verification, with availability currently limited to select schools and expected to expand. The move mirrors rivals’ efforts to lock in student developers with free tiers, as GitHub Copilot and other AI coding tools offer student deals. However, the rollout comes as reports cite internal concerns from current and former Amazon employees who described Kiro as unreliable, saying it can hallucinate, produce flawed code, and sometimes slow workflows through added manual fixes. Separate reporting has also linked the tool to operational incidents, underscoring risks tied to autonomous agents in production settings.
🎓 AI Academia
GOLDMARK Assessment Reference Kit Targets Standardization and Reproducible Evaluation for AI Pathology Biomarkers
A new arXiv preprint (arXiv:2603.20848v1, posted March 21, 2026) describes GOLDMARK, an “Assessment Reference Kit” aimed at standardizing how AI-based computational biomarkers are built and evaluated from H&E whole-slide pathology images. The paper notes that slide-level multiple-instance learning paired with pathology foundation models has become a common baseline for predicting treatment response or prognosis, but the field still lacks clinical-grade infrastructure. It highlights gaps such as standardized intermediate data formats, provenance and audit trails, consistent checkpointing practices, and reproducible evaluation metrics. The authors argue that discipline-wide standards for data representation, model versioning, evaluation protocols, and auditability are needed to support scalable, reliable, and regulatory-ready deployment in clinical settings.
Study Analyzes 3,800+ GitHub Bugs in Claude Code, Codex, and Gemini CLI Tools
A new empirical study examines engineering pitfalls in AI coding command-line tools by manually analyzing more than 3,800 publicly reported GitHub bugs tied to Claude Code, Codex, and Gemini CLI. It finds that over 67% of reported issues are functionality-related, suggesting many failures show up in day-to-day use rather than edge cases. The most common root causes are API, integration, or configuration problems (37.3%), with user-reported symptoms frequently involving API errors (18.3%), terminal issues (14%), and command failures (12.7%). These bugs most often hit early workflow steps such as tool invocation (37.6%) and command execution (25%), highlighting reliability challenges in the “tool layer” that wraps large language models for real developer environments.
GMPilot AI Agent Uses RAG and ReAct to Support FDA cGMP Compliance
A new paper describes GMPilot, a domain-specific AI agent aimed at helping pharmaceutical quality teams meet FDA current Good Manufacturing Practice (cGMP) compliance requirements amid high compliance costs and slow, fragmented decision-making. The system uses a curated knowledge base of regulations and past inspection observations, combining retrieval-augmented generation (RAG) with a Reasoning-Acting (ReAct) approach to deliver real-time, traceable guidance. In a simulated FDA inspection scenario, the authors report that GMPilot improved response speed and professionalism by returning structured evidence from regulations and comparable cases. The paper also notes current limitations, including incomplete regulatory coverage and limited model interpretability, but frames the tool as a practical example of specialized AI for highly regulated industries.
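The paper does not ship code, but the RAG-plus-ReAct pattern it describes is a standard loop: the model alternates reasoning and retrieval, with retrieved regulatory passages fed back as observations before a final, citable answer. A minimal hypothetical sketch, where the corpus, retriever, and stub LLM are illustrative rather than GMPilot's internals:

```python
# Minimal hypothetical sketch of the RAG + ReAct loop described above; the
# corpus, retriever, and stub LLM are illustrative, not GMPilot's internals.

CORPUS = {
    "21 CFR 211.68": "Automatic and electronic equipment shall be routinely "
                     "calibrated, inspected, or checked per a written program.",
    "21 CFR 211.100": "There shall be written procedures for production and "
                      "process control designed to assure drug quality.",
}

def retrieve(query: str) -> str:
    """Toy keyword retriever over the regulation snippets."""
    best = max(CORPUS.items(),
               key=lambda kv: sum(w in kv[1].lower() for w in query.lower().split()))
    return f"{best[0]}: {best[1]}"

def react_answer(question: str, llm) -> str:
    """One ReAct cycle: Thought -> Action (retrieve) -> Observation -> Answer."""
    thought = llm(f"Question: {question}\nThought: which regulation applies?")
    evidence = retrieve(question + " " + thought)     # Action: grounded lookup
    return llm(f"Question: {question}\nObservation: {evidence}\n"
               "Answer with a citation to the regulation above.")

# Stub LLM so the sketch runs end to end; swap in a real model call.
echo_llm = lambda prompt: prompt.splitlines()[-1]
print(react_answer("How must automatic equipment be calibrated?", echo_llm))
```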
March 2026 Technical Report Sets Global Cybercrime Damage Baseline for Frontier AI Risk Assessment
A March 2026 technical report titled “Global Cybercrime Damages: A Baseline for Frontier AI Risk Assessment” sets out a reference point for estimating the economic harm caused by cybercrime worldwide and explains how those figures can be used in evaluating risks from advanced AI systems. It consolidates available research and highlights wide variation in existing damage estimates, reflecting inconsistent definitions and reporting gaps across countries and sectors. The report argues that clearer, more comparable baselines are necessary to judge whether AI-driven cyber capabilities could materially increase real-world losses. It positions the baseline as a tool for policymakers and safety researchers assessing the scale of potential frontier-AI-enabled cyber impacts.
CARE Framework Targets Reproductive Equity in Human-AI Interaction, Flagging Source Opacity and Rigid Responses
A new arXiv paper describes CARE, a capability-based framework meant to evaluate whether AI tools for sexual and reproductive health (SRH) actually expand reproductive autonomy, not just access to information. It argues that common chatbot metrics like usability and accuracy miss structural factors—such as health literacy, stigma, healthcare access, and legal limits—that determine whether users can convert AI advice into real-world choices. Using Sen’s capability approach and Nussbaum’s central capabilities, the framework sets out a “design lens” focused on the freedoms SRH tools should support and an “evaluation lens” that checks how resources translate into capabilities and outcomes. When applied to SRH-focused non-LLM chatbots, general-purpose LLMs, and search features, the analysis flags two key epistemic risks (unclear sourcing and overly rigid responses) and points to participatory auditing and policy guidance for high-stakes health settings.
PEARL Benchmarks Personalized Streaming Video Understanding, Testing Timestamped Concept and Action Recognition Across VLMs
A new research paper defines a task called Personalized Streaming Video Understanding (PSVU), aimed at helping vision-language models handle personalization in continuous, real-time video rather than only static images or offline clips. The work also details PEARL-Bench, described as the first benchmark built for this setting, testing whether models can respond to user-specific concepts at exact timestamps in both frame-level and continuous video-level scenarios. The benchmark includes 132 videos and 2,173 timestamped annotations created through automated generation followed by human verification. The paper also reports a training-free, plug-and-play method called PEARL that is presented as a strong baseline and is said to deliver state-of-the-art results across eight tested models, with consistent gains across three different architectures. Code is available on GitHub, and the paper is posted on arXiv as 2603.20422v1 (March 20, 2026).
About SoRAI: SoRAI is committed to advancing AI literacy through practical, accessible, and high-quality education. Our programs emphasize responsible AI use, equipping learners with the skills to anticipate and mitigate risks effectively. Our flagship AIGP certification courses, built on real-world experience, drive AI governance education with innovative, human-centric approaches, laying the foundation for quantifying AI governance literacy. Subscribe to our free newsletter to stay ahead of the AI Governance curve.