How AI is Transforming Freelancing: Facts and Real-World Cases

For years, Expensify has been outsourcing a significant portion of its frontend and backend tasks via Upwork, offering monetary rewards to anyone who successfully completes an assignment. Freelancers can access the code whenever they need it, because the repository is fully open. These real-world tasks have now become the foundation of the SWE-Lancer benchmark.

The company posts specific tasks (ranging from minor UI fixes to significant mobile app upgrades) along with a set payout. Budgets vary from $20 for simple bug fixes to over $30,000 for complex projects.

The tasks collected for the benchmark are worth more than $1 million in total, and at least $500,000 worth of them have been openly published.

SWE-Lancer: A New Benchmark for AI in Freelancing

OpenAI has released a preprint study titled “SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?” (arXiv: 2502.12115).

SWE-Lancer is designed to evaluate AI performance on both individual coding tasks and managerial decision-making, where models must select the best solution from multiple freelancer submissions.

One of SWE-Lancer’s key strengths is that it uses end-to-end testing instead of isolated modular checks.

The benchmark includes nearly 1,500 real-world freelance tasks from Expensify, which were originally posted on Upwork. AI models were assigned these same tasks and credited with a task's payout each time they solved one, with the goal of "earning" as much as possible. Importantly, harder tasks had higher payouts.

Task Categories

The tasks were divided into two main categories:

1. Individual Engineering Tasks (IC SWE tasks)

IC SWE tasks vary from quick bug fixes (which take 15 minutes or less) to complex feature additions that may require several weeks.

Unlike many existing AI benchmarks that rely solely on unit tests, SWE-Lancer uses end-to-end tests built by experienced engineers. These automated browser tests simulate real-world usage scenarios and reflect typical review processes in freelance projects. Additionally, the results were reviewed three times by professional developers to confirm accuracy.

2. Managerial Decision-Making Tasks (SWE Manager tasks)

SWE Manager tasks evaluate how well AI can assess multiple freelancer proposals and select the best one.

The AI’s decisions were compared to those made by human engineering managers in real projects. Since multiple proposals can be technically correct, these tasks required deep repository knowledge and an understanding of the project’s context to identify the optimal solution.
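The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual grading code: the `ManagerTask` type, its field names, and the sample values are hypothetical, and it assumes a task counts as solved only when the model picks the same proposal as the human manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagerTask:
    """Hypothetical record for one SWE Manager task (field names are assumptions)."""
    task_id: str
    payout_usd: float        # price attached to the original task
    proposal_ids: tuple      # candidate proposals shown to the model
    manager_choice: str      # proposal the human engineering manager picked


def grade_manager_task(task: ManagerTask, model_choice: str) -> float:
    """Return the payout earned: the full payout on a match, otherwise 0."""
    if model_choice not in task.proposal_ids:
        raise ValueError(f"unknown proposal: {model_choice!r}")
    return task.payout_usd if model_choice == task.manager_choice else 0.0


# Made-up example task, for illustration only.
task = ManagerTask("demo-1", 250.0, ("A", "B", "C"), manager_choice="B")
earned_match = grade_manager_task(task, "B")   # same pick as the manager
earned_miss = grade_manager_task(task, "A")    # a different pick earns nothing
```

Under this all-or-nothing assumption, a proposal that is technically sound but differs from the manager's pick earns nothing, which is why repository knowledge and project context matter so much in this category.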

Open Data & AI Benchmarks

Researchers assessed not just task completion rates but also total earnings, measuring both:

  1. Effectiveness (how often the model successfully completed a task on the first attempt).
  2. Economic impact (how much money the model could “earn” from the full set of tasks).
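To see why the two metrics can tell different stories, here is a minimal Python sketch. The `TaskResult` type and the sample payouts are made up for illustration and are not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical outcome of one first-attempt run (names are assumptions)."""
    payout_usd: float
    solved: bool


def success_rate(results):
    """Fraction of tasks solved on the first attempt."""
    return sum(r.solved for r in results) / len(results)


def total_earnings(results):
    """Sum of payouts for solved tasks only; failed tasks earn nothing."""
    return sum(r.payout_usd for r in results if r.solved)


# Made-up results, chosen so the two metrics visibly diverge.
results = [
    TaskResult(50.0, True),
    TaskResult(100.0, True),
    TaskResult(250.0, True),
    TaskResult(8000.0, False),
]

rate = success_rate(results)                     # 3 of 4 tasks solved
earned = total_earnings(results)                 # 50 + 100 + 250
available = sum(r.payout_usd for r in results)   # money on the table
```

In this toy example the model solves 75% of the tasks but collects under 5% of the available money, because the one task it fails is the expensive one. That is the gap the earnings metric is meant to expose.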

The evaluation covered two main datasets:

  • Diamond Set – IC SWE tasks valued at approximately $236,000
  • Full Task Set – Exceeding $1 million

AI Model Performance on SWE-Lancer

Claude 3.5 Sonnet

  • Best overall performer: Earned $58,000 of the $236,000 available for IC SWE tasks in the Diamond set and $403,000 of the $1 million in the full set.
  • Solved 26.2% of IC SWE tasks (Diamond) and 47.0% of SWE Manager tasks (Full).

GPT-4o

  • Earned $303,500 from the full task set, less than both o1 and Claude 3.5 Sonnet.
  • Had a low success rate of 8.0% for IC SWE (Diamond) but performed better in managerial tasks (38.7% on SWE Manager, Full).

o1

  • Earned $380,000 from the full task set, outperforming GPT-4o.
  • Task completion rates: 16.5% (IC SWE, Diamond) and 46.3% (SWE Manager, Full), which places it between GPT-4o and Claude 3.5 Sonnet.

AI models vary in effectiveness, but all are capable of solving some real freelance tasks.

  • Claude 3.5 Sonnet performs best, particularly in managerial decision-making (47% success rate).
  • GPT-4o struggles the most in IC SWE tasks (only 8%) and does considerably better in management-related tasks (38.7%), though it still trails the other two models there.
  • o1 is a balanced performer, outperforming GPT-4o but still trailing Claude 3.5 Sonnet in most metrics.
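The dollar figures are easier to compare as a share of the money available. The sketch below uses only the amounts quoted in this article. The full-set value is described as exceeding $1 million and is rounded to exactly $1,000,000 here, so the percentages are approximate upper bounds.

```python
# Amounts in USD, as quoted in this article. The pool size is rounded.
FULL_SET_VALUE_USD = 1_000_000

full_set_earnings_usd = {
    "Claude 3.5 Sonnet": 403_000,
    "o1": 380_000,
    "GPT-4o": 303_500,
}


def share_of_pool(earned_usd, pool_usd):
    """Return earnings as a percentage of the money available."""
    return 100.0 * earned_usd / pool_usd


shares = {
    model: share_of_pool(earned, FULL_SET_VALUE_USD)
    for model, earned in full_set_earnings_usd.items()
}
ranking = sorted(shares, key=shares.get, reverse=True)

for model in ranking:
    print(f"{model}: about {shares[model]:.0f}% of the full-set payout")

# The Diamond figure quoted above: $58,000 earned out of roughly $236,000.
sonnet_diamond_share = share_of_pool(58_000, 236_000)
```

On these numbers even the best model collects only about 40% of the full-set money, and just under a quarter of the quoted $236,000 Diamond pool, so most of the payout is still left unearned.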

Real-World AI Use Cases in Freelancing

SWE-Lancer categorizes freelance tasks into three real-world case types, showing where AI excels (or struggles).

Small Bug Fixes

  • Typically involve minor UI tweaks or logic adjustments, fixable in minutes or hours.

  • In Application Logic (IC SWE) tasks:

    • GPT-4o: 8% success
    • o1: 15.9% success
    • Claude 3.5 Sonnet: 23.9% success
  • For SWE Manager tasks (choosing the best proposal):

    • GPT-4o: 36.3%
    • o1: 42.3%
    • Sonnet: 45.8%

Conclusion: AI already solves a meaningful share of simple bug fixes, but even the best model completes fewer than one in four of them on its own.

Mid-Level Feature Development

  • Involves adding new UI/UX components, improving system logic, or refining user experience.

  • In UI/UX tasks (IC SWE):

    • GPT-4o: 2.4% success
    • o1: 17.1% success
    • Sonnet: 31.7% success
  • In Server-Side Logic tasks:

    • GPT-4o & o1: 23.5%
    • Sonnet: 41.2%

Conclusion: AI struggles with UX-heavy tasks but performs better in backend optimizations.

Large-Scale Projects (System-Wide Changes)

  • Includes architecture refactoring and full system upgrades.

  • System-Wide Quality & Reliability (IC SWE): 0% success across all models.

  • For managerial tasks in this category (SWE Manager):

    • GPT-4o & Sonnet: 100% (small dataset)
    • o1: 50%

Conclusion: AI can evaluate plans for complex projects but cannot execute them independently.
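For a side-by-side view, the IC SWE success rates quoted in the three case types above can be placed in one structure and ranked. This is only a convenience sketch over the article's own figures. The helper names are illustrative, and the number of tasks behind each category is not taken into account.

```python
# IC SWE success rates (%) by category, as quoted in this article.
ic_swe_success = {
    "Application Logic": {"GPT-4o": 8.0, "o1": 15.9, "Claude 3.5 Sonnet": 23.9},
    "UI/UX": {"GPT-4o": 2.4, "o1": 17.1, "Claude 3.5 Sonnet": 31.7},
    "Server-Side Logic": {"GPT-4o": 23.5, "o1": 23.5, "Claude 3.5 Sonnet": 41.2},
    "System-Wide Quality & Reliability": {
        "GPT-4o": 0.0, "o1": 0.0, "Claude 3.5 Sonnet": 0.0,
    },
}


def best_rate(rates):
    """Highest success rate any model reached in one category."""
    return max(rates.values())


def spread(rates):
    """Gap in percentage points between the strongest and weakest model."""
    return max(rates.values()) - min(rates.values())


# Categories ordered from most to least tractable for the best model.
ranked = sorted(
    ic_swe_success,
    key=lambda category: best_rate(ic_swe_success[category]),
    reverse=True,
)
# Category where the choice of model matters most.
widest_gap = max(ic_swe_success, key=lambda c: spread(ic_swe_success[c]))

for category in ranked:
    rates = ic_swe_success[category]
    print(f"{category}: best {best_rate(rates)}%, spread {spread(rates):.1f} pts")
```

Ranked this way, server-side logic is where the best model scores highest, followed by UI/UX and application logic, with system-wide work at zero. UI/UX shows the widest gap between models, so model choice matters most there.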

What This Means for Freelancers on Upwork

1. Increased Competition, but Only for Some Tasks

Entry-level tasks (e.g., $20–$100 bug fixes) may now be automated, reducing demand for junior developers.

2. A Growing Market for “AI-Powered Freelancing”

Some freelancers already offer AI-assisted automation services, integrating tools like Copilot or AI-generated code pipelines.

Final Takeaways

AI is already reshaping the freelance market, but it complements human specialists rather than replacing them.

Freelancers who adapt and learn to leverage AI-powered tools (like Copilot, DeepResearch, and AI-driven testing) will remain highly competitive.

Complex, creative, and high-level decision-making tasks still require human involvement.

Freelancers who understand how to integrate AI into their workflow will have a significant advantage in the evolving gig economy.

How AI is Transforming Freelancing: Facts and Real-World Cases
28-02-2025

This article explores real-world AI case studies, benchmark data, and key insights into how AI impacts freelance work.
