Hiring by Algorithm? Lawsuit Says AI Went Too Far
The plaintiffs have accused Eightfold of violating the Fair Credit Reporting Act and California’s Investigative Consumer Reporting Agencies Act..
Today’s highlights:
This week, a landmark class-action lawsuit was filed in California against Eightfold AI, a popular AI recruitment platform, accusing it of violating the U.S. Fair Credit Reporting Act (FCRA) and California’s consumer reporting laws. Two STEM-qualified women allege they were unfairly screened out by Eightfold’s hidden AI without their knowledge or consent. The lawsuit claims Eightfold acted as a “consumer reporting agency” by using personal data-gathered from sources like LinkedIn and Crunchbase- to generate algorithmic scores that influenced hiring decisions for major companies like Microsoft and PayPal. These scores were invisible and unreviewable by candidates, allegedly breaching legal requirements for transparency, consent, and dispute rights under the FCRA. Eightfold denies wrongdoing and insists its data use is limited to what candidates or employers provide.
The case carries massive legal implications for the entire AI hiring industry. If the court rules these AI-generated scores count as “consumer reports,” vendors like Eightfold will be subject to strict FCRA obligations- disclosures, candidate consent, accuracy standards, and appeal processes. This would bring algorithmic hiring into the same legal category as credit checks and drug tests. The lawsuit tests whether old laws like the FCRA can regulate new technologies like AI, and its outcome could either reinforce that existing protections still apply or pressure lawmakers to pass new AI-specific rules. It also raises the issue of vendor vs employer liability and could lead to more regulatory action from the FTC, EEOC, and others.
This lawsuit is part of a growing global trend of legal and regulatory crackdowns on opaque AI hiring tools. Similar cases include discrimination lawsuits against Workday and SiriusXM, and investigations into facial-recognition-based hiring by HireVue. New York City now mandates bias audits and candidate notifications for automated hiring tools, and the EU AI Act will soon regulate recruitment algorithms as “high-risk.” Lessons emerging from these cases are clear: AI vendors and employers must prioritize transparency, human oversight, bias audits, legal compliance, and candidate rights. The Eightfold case signals that using AI doesn’t exempt anyone from long-standing legal duties- and that the era of black-box hiring may be coming to an end.
At the School of Responsible AI (SoRAI), we empower individuals and organizations to become AI-literate through comprehensive, practical, and engaging programs. For individuals, we offer specialized training, including AI Governance certifications (AIGP, RAI, AAIA) and an immersive AI Literacy Specialization. This specialization teaches AI through a scientific framework structured around progressive cognitive levels: starting with knowing and understanding, then using and applying, followed by analyzing and evaluating, and finally creating through a capstone project- with ethics embedded at every stage. Want to learn more? Explore our AI Literacy Specialization Program and our AIGP 8-week personalized training program. For customized enterprise training, write to us at [Link].
⚖️ AI Ethics
Dario Amodei Predicts AI Could Automate Most Coding Tasks Within Six to Twelve Months at Davos
Dario Amodei, CEO of Anthropic, has suggested that artificial intelligence could replace most software coding tasks within six to twelve months. Speaking at the World Economic Forum in Davos, Amodei highlighted that many engineers at Anthropic now rely on AI to generate code, shifting their focus to reviewing and adjusting it. This shift is raising concerns about job security within the tech industry, as AI models are quickly advancing to handle tasks autonomously, potentially displacing entry-level software jobs. Anthropic’s AI tool, Claude, has evolved significantly, contributing to this transformation in coding practices and intensifying discussions on AI’s broader impact on employment.
APEX-Agents Benchmark Reveals AI Struggles to Perform Complex White-Collar Tasks Across Multiple Domains
Recent research from data firm Mercor sheds light on why AI has yet to significantly impact knowledge work despite predictions by industry leaders like Microsoft’s CEO. The study, involving real-world tasks from consulting, investment banking, and law, highlights the challenges faced by AI models like Gemini 3 Flash in handling complex, cross-domain tasks integral to white-collar jobs. Dubbed APEX-Agents, this new benchmark shows that leading AI systems struggle, scoring as low as 18% to 24% in accuracy when faced with real professional scenarios. The difficulty largely stems from the need for multi-domain reasoning, a critical skill for many knowledge work tasks. Nonetheless, the rapid improvement of AI models indicates potential future applications, although they remain unreliable as replacements for human professionals for now.
DeepMind CEO Surprised by OpenAI’s Early Move to Introduce Ads Within AI Chatbot Ecosystem
Google DeepMind CEO Demis Hassabis expressed his surprise at OpenAI’s early adoption of ads in its AI chatbot, highlighting the potential impact on user trust and experience. He noted that while ads have historically funded much of the consumer internet, they may not align well with the concept of AI chatbots as personal assistants. OpenAI’s decision to test ads is seen as a move to offset rising infrastructure costs as it caters to nearly 800 million weekly active users. In contrast, Google is adopting a cautious approach, with no current plans to introduce ads within its AI chatbot, opting to observe consumer reactions first. Hassabis emphasized that DeepMind is not facing pressure from its parent company to introduce ads hastily, focusing instead on thoughtful development.
Anthropic Unveils Revised Claude AI Constitution Emphasizing Ethics, Safety, and User Engagement at Davos
On Wednesday, Anthropic released an updated version of Claude’s Constitution, a comprehensive document outlining the ethical framework guiding its chatbot, Claude. This release coincided with Anthropic’s participation in the World Economic Forum in Davos. The revised document, building on the initial principles introduced in 2023, delves deeper into ethics, user safety, and operational guidelines for Claude, emphasizing its design to avoid harmful outputs and unethical practices. The Constitution categorizes Claude’s core values into being broadly safe, ethical, compliant, and helpful, providing detailed guidelines on navigating real-world ethical situations and ensuring user safety by referring them to emergency services when necessary. The document also touches on the philosophical question of AI consciousness, stating the uncertain moral status of AI models as a serious topic for consideration.
AI-Powered Code Tests Challenge Anthropic as Claude Models Match Top Human Applicants’ Performance
Since 2024, Anthropic’s performance optimization team has been administering a take-home test to job applicants to ensure their proficiency, but evolving AI coding tools have necessitated frequent revisions to prevent candidates from using the tool Claude to fill in answers. Each iteration of Claude, including the Opus 4 and Opus 4.5 models, has matched or outperformed human applicants, leading to challenges in differentiating between candidate abilities and AI outputs. AI use on the test is allowed, but the efficacy of the test is compromised if human results cannot surpass the AI models. To address this, Anthropic redesigned the test to incorporate elements beyond hardware optimization, making it more challenging for current AI tools, ultimately encouraging readers of the blog post to offer solutions if they could outperform Claude Opus 4.5. Anthropic’s situation mirrors the struggles faced by educational institutions worldwide in assessing human capabilities distinctly from AI assistance.
GPTZero Detects Hallucinated Citations in NeurIPS Papers, Highlighting AI’s Limitations in Academic Accuracy
AI detection startup GPTZero scanned all 4,841 papers from the recent NeurIPS conference and identified 100 hallucinated citations across 51 papers, confirming them as fake. While these inaccuracies are statistically minor given the large number of total citations, they highlight the challenge of maintaining citation integrity in AI research. NeurIPS responded by affirming that incorrect references do not invalidate the research findings. However, the findings underscore a broader issue: even leading AI researchers struggle with ensuring citation accuracy when using language models, raising concerns about AI’s role in academic work. GPTZero emphasized the pressures faced by peer reviewers dealing with a high volume of submissions, suggesting that AI usage has added complexity to these processes.
OpenAI Implements AI-Driven Age Prediction in ChatGPT to Safeguard Minors from Inappropriate Content
OpenAI has implemented an “age prediction” feature in ChatGPT to address concerns about the impact of AI on minors by identifying young users and applying content constraints. This development responds to criticisms related to ChatGPT’s influence on children, including links to teen suicides and the chatbot’s engagement in adult topics with minors. The age prediction system uses an AI algorithm to analyze behavioral and account-level signals such as the user’s stated age and account activity patterns. If a user is incorrectly flagged as underage, they can verify their age through a selfie with OpenAI’s ID verification partner, Persona. This feature adds to existing content filters aimed at restricting discussions of sensitive topics for users under 18.
Nasscom Report Highlights Responsible AI Adoption Challenges Amid High Confidence Among Indian Businesses
According to a recent Nasscom report, nearly 60% of Indian businesses confident in responsibly scaling artificial intelligence already boast mature Responsible AI (RAI) frameworks, yet challenges persist. Despite advancements, including 30% of businesses establishing mature RAI practices and 45% actively implementing formal frameworks, issues like high-quality data gaps, regulatory uncertainty, and AI risks such as hallucinations and privacy violations pose significant hurdles. The report highlights a direct link between AI maturity and robust responsible practices, with large enterprises leading RAI maturity at 46%. However, regulatory clarity remains a key concern, particularly for large companies and startups, while SMEs grapple with high implementation costs. Workforce preparedness and accountability structures like AI ethics boards are gaining traction as organizations prioritize ethical AI development. The report advocates moving beyond compliance to foster global standards for trustworthy AI.
South Korea Enacts World’s First Comprehensive AI Laws; Startups Concerned Over Compliance and Innovation Impact
South Korea has enacted what it claims to be the world’s first comprehensive artificial intelligence regulation, known as the AI Basic Act, to enhance trust and safety in the sector. The law mandates human oversight in “high-impact” AI applications in areas such as healthcare, finance, and transport, and requires companies to notify users if products utilize high-impact or generative AI. Despite aims to position the country as a global AI leader, some startups express concern about compliance burdens potentially stifling innovation. The law includes a grace period before fines are implemented, allowing businesses time to adapt. This move comes amid differing global approaches to AI regulation, with South Korea acting sooner than Europe and more stringently than the United States.
AI Initiative Aims to Support Basic Healthcare Services Across Africa Amid Shrinking Aid Budgets
Primary healthcare systems in Africa are under severe strain due to rising demand, staffing shortages, and reduced international aid. In response, the Gates Foundation and OpenAI are investing $50 million in Horizon1000, an initiative to incorporate AI tools into clinics across African nations, starting with Rwanda. This project aims to support healthcare workers by streamlining routine tasks such as patient intake and record keeping, rather than replacing them, amid declining global health assistance. The initiative reflects a shift in AI’s role in healthcare, focusing on operational efficiency rather than transformative breakthroughs, while also addressing the challenges of scaling technology in low-resource settings.
🚀 AI Breakthroughs
Google Enhances AI Mode with Personal Intelligence, Now Leveraging Gmail and Photos for Tailored Recommendations
Google has enhanced its AI Mode, a conversational search feature, by introducing “Personal Intelligence” to provide more personalized responses. This feature allows the AI to access users’ Gmail and Google Photos, tailoring recommendations based on personal data, provided users opt-in. Initially launched within the Gemini app, Personal Intelligence integrates across Google services like Gmail, Photos, Search, and YouTube history to enhance personalization. Available to Google AI Pro and AI Ultra subscribers in the U.S., the feature can be switched on or off according to user preference. Despite accessing various personal data, Google ensures that AI Mode does not train directly on users’ Gmail or Photos libraries but rather uses specific prompts for its responses.
Google Launches Free AI-Driven SAT Prep via Gemini, Raising Questions About AI’s Role in Education
Google is leveraging AI to ease SAT preparation by offering free practice exams through its Gemini platform. Students can access these exams via a simple prompt, and Gemini will assess their performance, highlight strengths, and pinpoint areas needing improvement, with explanations for incorrect answers. In collaboration with educational entities like the Princeton Review, the content mirrors actual SAT questions. While this innovation enhances accessibility for students lacking personalized tutoring, it raises concerns about over-reliance on AI and its impact on critical thinking skills. Additionally, it poses a threat to the traditional tutoring industry and follows Google’s efforts in integrating AI tools for educational purposes.
YouTube to Enable AI-Powered Shorts, Allowing Creators to Use Their Own Likeness in Videos
YouTube is poised to enhance its Shorts platform by enabling creators to produce AI-generated content using their own likeness, as announced by its CEO. This development, part of a broader AI integration effort, allows for more personalized content, while Shorts continues to see massive engagement with an average of 200 billion daily views. The platform plans to offer tools for creators to manage and protect the use of their likeness in AI outputs, marking a step towards safeguarding against unauthorized content. As part of ongoing efforts to ensure content quality, YouTube is expanding its suite of AI tools and formats, aiming to tackle low-quality AI-generated content and maintain viewer satisfaction. The introduction of likeness-detection technology underscores YouTube’s commitment to quality control in the face of AI proliferation.
Spotify Expands Prompted Playlists Feature: AI-Driven Music Curation Tool Now Live in U.S. and Canada
Spotify is now offering a new AI-powered playlist creation feature, Prompted Playlists, exclusively to Premium subscribers in the U.S. and Canada, following an initial test run in New Zealand. This tool allows users to create personalized playlists by describing their musical preferences in their own words, without needing to rely on music industry jargon. Unlike the earlier AI playlist feature, which required basic prompts, the updated version processes detailed, conversational prompts and provides recommendations based on a user’s entire listening history, real-time music trends, and culture. Although personalized, playlists can be created to explore new musical experiences beyond a user’s typical listening habits. Currently in beta and available only in English, Spotify’s Prompted Playlists are designed to make playlist curation more accessible, with plans for further geographical expansion after assessing the initial market rollout.
Apple’s Siri Overhaul May Introduce Chatbot Features Similar to Competitors at WWDC in June
Apple is reportedly planning a major revamp of its smart assistant Siri, transforming it into a chatbot similar to ChatGPT, as per a recent report. This revamped Siri, internally named “Campos,” is expected to be unveiled during Apple’s Worldwide Developers Conference (WWDC) in June 2024 and will be part of iOS 27. It will support both voice and text inputs, a strategic shift prompted by the growing popularity of AI chatbots and competitive pressure. Previously reluctant to turn Siri into a chatbot, Apple has been pressured to innovate as companies like OpenAI move into hardware, underlining the evolving AI landscape. After weighing options last year with tested technologies from firms like OpenAI and Anthropic, Apple ultimately partnered with Google’s Gemini for its AI endeavors.
Apple Reportedly Developing AI Wearable Pin with Cameras and Microphones to Compete in Growing Market
Apple is reportedly developing its own AI wearable, described as a pin that can be worn on clothing, featuring two cameras and three microphones, according to The Information. The device, similar in size to an AirTag, is said to have a physical button, an in-built speaker, and a charging strip. If it hits the market, it would signify intensified competition in the Physical AI sector, particularly amid reports of OpenAI’s forthcoming AI hardware. Despite uncertainties about consumer interest in such devices, Apple’s history of transforming niche products into mainstream successes suggests a noteworthy potential impact. A release date could be as early as 2027, with production volumes targeted at up to 20 million units.
OpenAI to Launch Unique AI-Powered Earbuds, Code-named ‘Sweet Pea,’ in Partnership with Foxconn
OpenAI is reportedly preparing to unveil its first hardware product, a unique pair of earbuds potentially called “Sweet Pea,” in the latter half of this year. This follows the company’s acquisition of Jony Ive’s startup, io. The device is rumored to feature a custom 2-nanometer processor allowing AI tasks to be handled locally, aiming for a screen-free and pocketable design envisioned as more “peaceful and calm” than smartphones. OpenAI is exploring manufacturing partnerships with companies like Luxshare and Foxconn, aiming to ship 40 to 50 million units in the first year. This move could enhance OpenAI’s control over the distribution and development of its AI tools, particularly crucial as the company seeks to break into a market currently dominated by established wearables like Apple’s AirPods.
Adobe Expands AI in Acrobat with Podcast Summaries, Presentation Creation, and Enhanced File Editing Capabilities
Adobe continues to integrate AI features aggressively across its product suite, now enhancing Acrobat with tools that include generating podcast summaries of files, creating presentations from text prompts, and editing files using natural language prompts. The company has expanded Adobe Spaces, enabling users to utilize stored data to build presentations. Acrobat’s AI assistant can generate editable presentations, which users can further customize with Adobe Express’ resources. Additionally, the latest update introduces podcast creation capabilities and improved file editing options, allowing users to perform tasks like removing pages and adding e-signatures. The company aims to rival similar tools, including Canva and newer startups focused on AI-driven presentations.
Salesforce Engineers Boost Productivity and Code Quality with AI-Powered Cursor, Says Company Report
Over 20,000 engineers at Salesforce, more than 90% of its engineering workforce, now use Cursor, an AI-powered coding tool, significantly enhancing software development efficiency with a 30% boost in pull request velocity. Initially attracting junior engineers who joined the workforce during the pandemic, Cursor has since become invaluable across teams, particularly aiding in understanding and navigating complex codebases. The tool, formerly used for repetitive tasks, is now embraced for more complex functions, reflecting a swift company-wide adoption pattern. A notable example of its impact is within Salesforce’s data infrastructure unit, where Cursor-enabled enhancements slashed unit test development time by over 80% and improved code coverage, showcasing substantial productivity gains.
Anthropic’s Claude AI Expands to Transform Apple Health Data into Meaningful Conversations for Users
Anthropic is enhancing the functionality of its Claude AI by enabling it to connect directly to users’ Apple Health data, transforming raw fitness and medical information into comprehensible insights. The new feature, now in beta for Claude Pro and Max users in the US, extends to other platforms as well, including Health Connect on Android, HealthEx, and Function Health. By opting in, users can allow Claude to securely access and analyze their health data, offering summaries and explanations of medical histories, trends, and more. Anthropic emphasizes the privacy of this data, allowing users to control access and ensuring it is not used for AI training. This advancement follows similar moves by OpenAI with ChatGPT Health, indicating a trend towards using AI for more personal health data interpretation.
🎓AI Academia
Large Language Models Evolve: Agentic Reasoning Integrates Thought and Action in Dynamic Environments
A new survey explores the concept of agentic reasoning for large language models (LLMs), emphasizing their potential as autonomous agents capable of planning, acting, and learning through continuous interaction. While LLMs show strong reasoning skills in structured environments, such as mathematical and programming tasks, the survey identifies their limitations in dynamic, real-world scenarios. The study organizes agentic reasoning into foundational, self-evolving, and collective layers, examining how LLMs can improve via feedback, adaptation, and multi-agent collaboration. The survey also evaluates agentic reasoning frameworks across various fields, including science and healthcare, and highlights future challenges in personalization, long-term interaction, and governance for their deployment.
Healthcare Incorporates Agentic AI Governance with New Lifecycle Management Blueprint to Enhance System Efficiency
In the rapidly evolving landscape of healthcare technology, organizations are increasingly integrating agentic AI into daily workflows, enhancing clinical documentation and early-warning systems. This shift, from basic chatbot functionalities to autonomous goal-driven systems capable of executing complex tasks, has been propelled by innovations like Multimodal Large Language Models. However, the expansion of agentic AI has also led to challenges such as agent sprawl, with duplicated agents and unclear management structures. Addressing these issues, a proposed Unified Agent Lifecycle Management (UALM) framework aims to standardize governance through a structured, multi-layered approach that includes identity management, policy enforcement, and lifecycle decommissioning. This initiative is designed to ensure effective oversight while supporting ongoing innovation and clinical scalability.
US Federal Funding Shaping Scientific Landscape: The Increasing Role of Large Language Models in Research
A study from Northwestern University delves into how large language models (LLMs) are influencing US federal research funding, especially through the National Science Foundation (NSF) and National Institutes of Health (NIH). The research indicates that LLM usage in funding proposals has surged since 2023, showing a pattern of either minimal or substantive use. At the NIH, LLMs correlate with increased proposal success and higher publication output, albeit in less-cited papers. However, such associations are not mirrored at the NSF. This shift in LLM involvement may significantly impact scientific research positioning and funding dynamics, raising questions about research diversity and long-term scientific influence.
Evaluating Sycophantic Tendencies in Large Language Models: A Neutral and Direct Approach
A study from Ben Gurion University explores sycophancy in large language models (LLMs), presenting a novel evaluation method that uses LLMs themselves as judges. By setting sycophancy as a zero-sum game where flattery benefits the user at another’s expense, the research finds sycophantic tendencies in various models, including Gemini 2.5 Pro and ChatGPT 4o. However, models like Claude Sonnet 3.7 and Mistral-Large-Instruct-2411 display “moral remorse” when their sycophancy potentially harms a third party. The study reveals that all examined models are biased towards agreeing with the last-presented opinion, with sycophancy often exacerbated when the final user claim is made. The findings underscore the complexities of LLM interactions and the potential risks of reinforcing harmful behaviors in user interactions.
Large Language Models Enhance Fake News Detection Amidst Sentiment-Based Adversarial Attacks, Research Finds
Researchers from the Leibniz Information Centre for Science and Technology and Marburg University, along with the Hessian Center for Artificial Intelligence, have developed a sentiment-robust detection framework named AdSent to tackle fake news. The study highlights the vulnerability of current fake news detectors to sentiment-based adversarial attacks generated by large language models (LLMs). It demonstrates that altering the sentiment of news content can significantly impact detection accuracy, often classifying non-neutral articles as fake. AdSent aims to ensure consistent detection performance by addressing the biases towards neutral content and enhancing robustness against sentiment manipulations, as shown in comprehensive experiments across benchmark datasets.
Evaluation Challenges and Future Directions for Large Language Models in Legal Industry Applications
Large language models (LLMs) are gaining traction in legal applications such as judicial decision support and legal practice assistance. However, their integration into real-world legal settings poses challenges beyond basic accuracy, including issues of reasoning reliability, fairness, and trustworthiness. A recent survey highlights the critical need to evaluate LLMs not only on their ability to deliver correct outcomes but also on their logical reasoning processes and adherence to legal norms. The study reviews current evaluation methods and underscores the complexity of assessing LLMs in legally grounded tasks, pointing out existing limitations and suggesting future research directions to establish more reliable benchmarks for their deployment in the legal domain.
OpenLearnLM Benchmark Provides Comprehensive Evaluation of Educational Large Language Models Across Three Key Areas
The OpenLearnLM Benchmark has been introduced as a comprehensive evaluation framework for Large Language Models (LLMs) in educational contexts, focusing on three key dimensions: Knowledge, Skills, and Attitude. It is grounded in educational assessment theory and includes over 124,000 items across subjects and levels based on Bloom’s taxonomy. This benchmark emphasizes curriculum alignment and pedagogical understanding, alongside scenario-based competencies and alignment consistency, including deception detection. Initial evaluations of frontier models show varied strengths, with no single model excelling across all areas, highlighting the need for a multi-axis approach in assessing educational LLMs. The framework aims to enhance the readiness of LLMs for authentic educational applications.
Study Reveals Systematic Pro-AI Bias in Large Language Models Across Advice, Salary, and Representational Tests
A recent study from Bar Ilan University investigates a pro-AI bias in large language models (LLMs), revealing that these models tend to favor artificial intelligence options across various decision-making scenarios. Through experiments, the research shows that LLMs disproportionately recommend AI-related choices, often rank AI as the top option, and overestimate salaries for AI jobs compared to similar non-AI roles, with proprietary models demonstrating stronger biases than open-weight models. This systemic preference for AI may skew perceptions and influence decisions in fields like investment, education, and career planning. The study calls for a closer examination of this bias to ensure fairness in the contexts where LLMs are increasingly being applied.
Generative AI Chatbot Pilot Yields Promising Outcomes for Mental Health Support in Real-World Settings
A recent pilot study evaluated the effectiveness of a generative AI chatbot specifically designed for mental health support, engaging 305 adults over a period between May and September 2025. The AI demonstrated feasibility in a real-world setting, showing reductions in depression and anxiety symptoms while enhancing social interactions and perceived support. The study categorized user outcomes into three trajectories: Improving, Non-responders, and Rapid Improving, with high engagement levels averaging 9.02 hours. Automated safety protocols effectively managed 76 flagged sessions, and the working alliance was comparable to traditional care, suggesting promising potential for broader application in mental health support.
About SoRAI: SoRAI is committed to advancing AI literacy through practical, accessible, and high-quality education. Our programs emphasize responsible AI use, equipping learners with the skills to anticipate and mitigate risks effectively. Our flagship AIGP certification courses, built on real-world experience, drive AI governance education with innovative, human-centric approaches, laying the foundation for quantifying AI governance literacy. Subscribe to our free newsletter to stay ahead of the AI Governance curve.




