A new AI coding challenge just published its first results – and they aren’t pretty

  • July 24, 2025
  • Roubens Andy King

A new AI coding challenge has revealed its first winner, and set a new bar for AI-powered software engineers.

On Wednesday at 5 p.m. PT, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was Eduardo Rocha de Andrade, a Brazilian prompt engineer, who will receive $50,000. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

Like the well-known SWE-Bench benchmark, the K Prize evaluates models on flagged GitHub issues to measure how well they handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a "contamination-free version of SWE-Bench," using a timed entry system to guard against benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
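The timed-entry idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `Issue` record and an assumed cutoff at the end of March 12, 2025 (UTC). The article does not describe the K Prize's actual pipeline, so the names and the exact cutoff rule here are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a GitHub issue; the fields are illustrative."""
    repo: str
    number: int
    created_at: datetime


# Assumed cutoff: the end of March 12, 2025 (UTC), i.e. the start of March 13.
# The year is inferred from the article's July 2025 date.
CUTOFF = datetime(2025, 3, 13, tzinfo=timezone.utc)


def build_test_set(issues, cutoff=CUTOFF):
    """Keep only issues opened at or after the cutoff, so no model frozen
    before the submission deadline could have seen them during training."""
    return [issue for issue in issues if issue.created_at >= cutoff]


if __name__ == "__main__":
    candidates = [
        Issue("example/repo", 101, datetime(2025, 2, 1, tzinfo=timezone.utc)),
        Issue("example/repo", 202, datetime(2025, 4, 15, tzinfo=timezone.utc)),
    ]
    for issue in build_test_set(candidates):
        print(issue.repo, issue.number)
```

Because the filter depends only on when an issue was opened, the contamination guard holds no matter which model or training data a competitor used.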

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

Given the wide range of AI coding tools already publicly available, this might seem like an odd place for AI to fall short. But with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI's growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it's not just a better benchmark, but an open challenge to the rest of the industry. "If you listen to the hype, it's like we should be seeing AI doctors and AI lawyers and AI software engineers, and that's just not true," he says. "If we can't even get more than 10% on a contamination-free SWE-Bench, that's the reality check for me."
