OpenAI has released a preprint study titled “SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?” (arXiv: 2502.12115).
SWE-Lancer is designed to evaluate AI performance on both individual coding tasks and managerial decision-making, where models must select the best solution from multiple freelancer submissions.
One of SWE-Lancer’s key strengths is that it uses end-to-end testing instead of isolated modular checks.
The benchmark includes nearly 1,500 real-world freelance tasks from Expensify, which were originally posted on Upwork and are worth $1 million in total. AI models were assigned these same tasks and scored by the payouts they would have earned for solving them. Importantly, harder tasks had higher payouts.
The tasks were divided into two main categories:
IC SWE tasks vary from quick bug fixes (which take 15 minutes or less) to complex feature additions that may require several weeks.
Unlike many existing AI benchmarks that rely solely on unit tests, SWE-Lancer uses end-to-end tests built by experienced engineers. These automated browser tests simulate real-world usage scenarios and reflect typical review processes in freelance projects. Additionally, the tests were triple-verified by professional developers to confirm accuracy.
SWE Manager tasks evaluate how well AI can assess multiple freelancer proposals and select the best one.
The AI’s decisions were compared to those made by human engineering managers in real projects. Since multiple proposals can be technically correct, these tasks required deep repository knowledge and an understanding of the project’s context to identify the optimal solution.
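The grading rule for SWE Manager tasks can be sketched in a few lines. This is an illustrative sketch, not the benchmark's own code: the `ManagerTask` class, its field names, the sample proposal IDs, and the all-or-nothing payout rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagerTask:
    """Hypothetical record for one SWE Manager task (all names are assumptions)."""
    task_id: str
    proposal_ids: tuple[str, ...]  # the freelancer proposals on offer
    manager_choice: str            # the proposal the human manager picked
    payout_usd: float              # what the task paid out


def grade_manager_task(task: ManagerTask, model_choice: str) -> float:
    """Return the payout earned: the full amount on a match, otherwise nothing."""
    if model_choice not in task.proposal_ids:
        raise ValueError(f"unknown proposal: {model_choice!r}")
    # Assumed rule: only the proposal the human manager chose counts as correct,
    # even if other proposals would also have worked technically.
    return task.payout_usd if model_choice == task.manager_choice else 0.0


task = ManagerTask("demo-1", ("A", "B", "C"), manager_choice="B", payout_usd=250.0)
print(grade_manager_task(task, "B"))  # the model agrees with the manager
print(grade_manager_task(task, "A"))  # a plausible but non-selected proposal
```

Comparing against a single reference choice keeps grading simple, but it also means a technically valid alternative scores zero, which is why these tasks reward project context.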
Researchers assessed not just task completion rates but also total earnings.
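A minimal sketch of how these two metrics can diverge, assuming each task is either solved or not and that a solved task earns its full payout. The `TaskResult` type and the sample numbers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical outcome of one task attempt."""
    payout_usd: float
    passed: bool


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks solved, ignoring how much each one paid."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.passed for r in results) / len(results)


def earnings(results: list[TaskResult]) -> float:
    """Total payout of the solved tasks."""
    return sum(r.payout_usd for r in results if r.passed)


def earn_rate(results: list[TaskResult]) -> float:
    """Share of the available money that was actually earned."""
    available = sum(r.payout_usd for r in results)
    if available == 0:
        raise ValueError("no payout available")
    return earnings(results) / available


# Made-up example: two cheap fixes solved, one expensive feature failed.
sample = [
    TaskResult(payout_usd=50.0, passed=True),
    TaskResult(payout_usd=100.0, passed=True),
    TaskResult(payout_usd=1000.0, passed=False),
]
print(f"pass rate: {pass_rate(sample):.0%}")
print(f"earned: ${earnings(sample):,.0f} ({earn_rate(sample):.0%} of available)")
```

In this made-up sample the model solves most tasks but earns only a small share of the money, because the single failed task carried most of the payout. That gap is the reason to report earnings alongside completion rates.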
The evaluation covered two main datasets: the public SWE-Lancer Diamond subset and the full SWE-Lancer set.
Three models were evaluated: Claude 3.5 Sonnet, GPT-4o, and o1.
AI models vary in effectiveness, but all are capable of solving some real freelance tasks.
SWE-Lancer categorizes freelance tasks into three real-world case types, showing where AI excels (or struggles).
Simple bug fixes typically involve minor UI tweaks or logic adjustments, fixable in minutes or hours.
In Application Logic (IC SWE) tasks:
For SWE Manager tasks (choosing the best proposal):
Conclusion: Simple bug fixes are the easiest AI tasks.
Feature work involves adding new UI/UX components, improving system logic, or refining user experience.
In UI/UX tasks (IC SWE):
In Server-Side Logic tasks:
Conclusion: AI struggles with UX-heavy tasks but performs better in backend optimizations.
Complex projects include architecture refactoring and full system upgrades.
System-Wide Quality & Reliability (IC SWE): 0% success across all models.
For managerial tasks in this category (SWE Manager):
Conclusion: AI can evaluate plans for complex projects but cannot execute them independently.
Entry-level tasks (e.g., $20–$100 bug fixes) may now be automated, reducing demand for junior developers.
Some freelancers already offer AI-assisted automation services, integrating tools like Copilot or AI-generated code pipelines.
AI is already reshaping the freelance market, but it does not eliminate human specialists.
AI complements human expertise rather than replacing it. Freelancers who adapt and learn to leverage AI-powered tools (like Copilot, DeepResearch, and AI-driven testing) will remain highly competitive.
Complex, creative, and high-level decision-making tasks still require human involvement.
Freelancers who understand how to integrate AI into their workflow will have a significant advantage in the evolving gig economy.