GitHub Copilot's Coding Agent: What Happens When You Assign an Issue to an AI and Walk Away

The .NET runtime team assigned 878 GitHub issues to Copilot's coding agent over ten months. 67.9% of the resulting pull requests got merged. That's not a marketing stat from GitHub. It's real data from one of the largest open-source repositories on the platform, with strict code review standards and a codebase complex enough to make most humans sweat.

Copilot's coding agent isn't a toy, but it's not replacing your senior engineers either. It's a specific kind of tool with a specific architecture, and how that architecture works explains both why it succeeds and where it breaks.

How Assign-and-Forget Actually Works Under the Hood

You open a GitHub issue, click the assignee dropdown, and pick Copilot. Optionally, you add guidance in the prompt field: constraints, requirements, context the issue description doesn't cover. Then you walk away.

What happens next is where it gets interesting. Copilot spins up an ephemeral virtual machine through GitHub Actions. It clones your repo, runs any setup steps you've defined in .github/workflows/copilot-setup-steps.yml, and starts analyzing the codebase using retrieval-augmented generation (RAG) powered by GitHub's code search index. It's not just reading the files you pointed it to; it searches across the repo for related code, type definitions, test patterns, and existing conventions.

The agent breaks the issue into a checklist, opens a draft PR tagged [WIP], and starts pushing commits. Each completed subtask gets checked off in the PR body. If tests or linters fail, it reads the output, adjusts, and tries again. You can tag @copilot in a PR comment mid-flight, and it will incorporate your feedback, revise, and push new commits. It behaves more like a junior dev working async than a code generator that dumps output and disappears.
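The checklist-and-retry loop described above can be sketched in a few lines. This is only an illustration of the control flow, not GitHub's implementation: `Subtask`, `DraftPR`, `run_agent`, and the `flaky_checks` stand-in are all hypothetical names, and the retry limit of three is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subtask:
    description: str
    done: bool = False


@dataclass
class DraftPR:
    title: str
    checklist: list[Subtask] = field(default_factory=list)
    commits: list[str] = field(default_factory=list)

    def body(self) -> str:
        # Render progress the way a PR body shows a task list.
        return "\n".join(
            f"- [{'x' if t.done else ' '}] {t.description}" for t in self.checklist
        )


def run_agent(
    pr: DraftPR,
    attempt: Callable[[Subtask, int], bool],
    max_retries: int = 3,  # assumed limit, for illustration only
) -> DraftPR:
    """Work through each subtask, pushing a commit per attempt and retrying on failure."""
    for task in pr.checklist:
        for try_number in range(1, max_retries + 1):
            pr.commits.append(f"{task.description} (attempt {try_number})")
            if attempt(task, try_number):  # True means tests and linters passed
                task.done = True
                break
    return pr


def flaky_checks(task: Subtask, try_number: int) -> bool:
    # Hypothetical stand-in for CI: the second subtask fails once, then passes.
    return not (task.description == "Update tests" and try_number == 1)


pr = DraftPR(
    "[WIP] Remove deprecated API",
    [Subtask("Delete old method"), Subtask("Update tests")],
)
run_agent(pr, flaky_checks)
print(pr.body())
print(len(pr.commits), "commits pushed")
```

The point of the sketch is that every attempt leaves a visible commit and the checklist only advances when checks pass, which is what makes the draft PR reviewable while the work is still in progress.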

The setup file is the part most people skip and shouldn't. That copilot-setup-steps.yml workflow runs before the agent touches any code. If your project needs specific Node versions, database seeds, or environment variables, you define them there. Without it, the agent is guessing at your environment, and guessing poorly on anything beyond a standard Node or Python setup.
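As a starting point, here is a small script that scaffolds that file. The path and the job name `copilot-setup-steps` follow GitHub's documented convention; everything else in the template (Node 20, `npm ci`, the action versions) is an assumed placeholder to swap for your own stack, and `write_setup_workflow` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed template: replace the Node version and install command with your own.
SETUP_WORKFLOW = """\
name: Copilot Setup Steps
on: workflow_dispatch
jobs:
  copilot-setup-steps:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm ci
"""


def write_setup_workflow(repo_root: Path) -> Path:
    """Write the setup workflow to the location the agent looks for."""
    path = repo_root / ".github" / "workflows" / "copilot-setup-steps.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(SETUP_WORKFLOW, encoding="utf-8")
    return path


# Demo against a throwaway directory; point this at your repo root instead.
demo_root = Path(tempfile.mkdtemp())
written = write_setup_workflow(demo_root)
print(written.relative_to(demo_root).as_posix())
```

Including `workflow_dispatch` as a trigger lets you run the workflow by hand from the Actions tab to confirm the steps succeed before the agent depends on them.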

878 Pull Requests Later: What the .NET Team Actually Learned

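The dotnet/runtime case study is the most honest assessment of Copilot's coding agent I've found, because it comes from a team that had no incentive to spin the results. Out of 878 PRs, 535 merged, which is about 61% of everything the agent opened. The 67.9% headline rate is consistent with counting only resolved PRs (535 of roughly 788), so the remaining 343 were not all rejections: some were closed without merging, and the rest were presumably still open when the numbers were pulled.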

Success rates varied wildly by task type. Cleanup and removal tasks (deleting dead code, removing deprecated APIs, updating annotations) hit an 84.7% merge rate. Performance optimization? Much lower. The pattern is clear: the more constrained and well-defined the task, the better the agent performs.

But here's the stat that tells the real story. 52% of merged PRs received human commits on top of the agent's work. The agent did the heavy lifting, but someone on the team still needed to tweak the result. A naming convention the agent missed. A test edge case. Often, the human fix was faster to just write than to explain to the agent in a follow-up comment.
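To put that percentage in absolute terms, here is a quick back-of-the-envelope calculation using the article's figures. The 52% is a rounded number, so the counts below are approximations rather than values reported by the .NET team.

```python
TOTAL_PRS = 878
MERGED_PRS = 535
HUMAN_TOUCHED_SHARE = 0.52  # rounded figure, so derived counts are approximate

# Merged PRs that needed at least one human commit on top of the agent's work.
human_touched = round(MERGED_PRS * HUMAN_TOUCHED_SHARE)

# Merged PRs that went in exactly as the agent wrote them.
merged_untouched = MERGED_PRS - human_touched

# Share of everything the agent opened that merged with no human commits.
untouched_share_of_all = merged_untouched / TOTAL_PRS

print(human_touched, merged_untouched, f"{untouched_share_of_all:.1%}")
# prints: 278 257 29.3%
```

Read that way, a bit under a third of all agent PRs landed with no human commits at all, which is a more conservative framing than the headline merge rate.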

The .NET team's conclusion was blunt: it's a force multiplier, not a replacement. They used it most effectively for tasks they understood well enough to review quickly but didn't want to spend the time writing themselves. Narrow band, but a valuable one.

Some of the closed PRs weren't even failures. The team deliberately used the agent for exploration, knowing they might throw away the result. That's a legitimate workflow, and one that doesn't show up in a raw merge rate.

Where the Agent Breaks Down (and Where Cursor and Windsurf Pull Ahead)

Copilot's coding agent runs in a sandboxed GitHub Actions VM. That's its biggest strength and its biggest constraint. The sandbox means it can't accidentally mess up your local environment. But it also means it has no awareness of your local setup, your running dev server, or the state of files you haven't committed.

Complex multi-file refactors involving ten or more files are where the wheels come off. The agent loses track of cross-file dependencies when the scope gets wide because it's working from code search results, not a full semantic index of your project.

This is exactly where Cursor's agent mode has an edge. Because Cursor controls the entire IDE, it builds a deep semantic index of your project, seeing type relationships, import chains, and call graphs that Copilot's RAG-based approach misses. Picture a refactor that touches your API routes, the service layer, three data models, and twenty test files. Cursor 2.5's async subagents can split that work: one agent handles the API layer while another updates the tests in parallel. On SWE-bench, Copilot actually scores higher (56% vs Cursor's 52%), but benchmarks test individual issue resolution. When you need to rename a core interface and update every file that imports it, Cursor's full-project awareness wins.
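One way to act on this before delegating is to estimate how many files a rename would touch. The sketch below is a rough heuristic under stated assumptions: it uses plain whole-word text matching rather than a semantic index, the ten-file threshold comes from the rule of thumb above, and `files_referencing`, `suggest_tool`, and the `IOrderStore` example are hypothetical.

```python
import re


def files_referencing(symbol: str, sources: dict[str, str]) -> list[str]:
    """Return paths whose text mentions `symbol` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    return sorted(path for path, text in sources.items() if pattern.search(text))


def suggest_tool(
    symbol: str, sources: dict[str, str], wide_threshold: int = 10
) -> tuple[str, list[str]]:
    """Suggest 'interactive' for wide changes and 'background' for narrow ones."""
    hits = files_referencing(symbol, sources)
    mode = "interactive" if len(hits) >= wide_threshold else "background"
    return mode, hits


# In-memory stand-in for a repo: twelve test files import the interface.
sources = {
    f"tests/test_{i}.py": "from stores import IOrderStore\n" for i in range(12)
}
sources["api/routes.py"] = "from stores import IOrderStoreFactory\n"  # different name
sources["README.md"] = "No references here.\n"

mode, hits = suggest_tool("IOrderStore", sources)
print(mode, len(hits))  # prints: interactive 12

narrow_mode, narrow_hits = suggest_tool("IOrderStoreFactory", sources)
print(narrow_mode, len(narrow_hits))  # prints: background 1
```

Text matching will miss re-exports and aliased imports, so treat the count as a lower bound on the real blast radius.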

Windsurf's Cascade approaches the problem differently. Where Copilot works asynchronously in the cloud and Cursor works interactively in the editor, Cascade is built for speed inside a persistent session. Its Fast Context system (SWE-grep with 8 parallel tool calls per turn) means you ask "where does this function get called?" and get results in under a second, not the 5-10 seconds a typical agentic search takes. SWE-1.5, Windsurf's purpose-built coding model, runs 13x faster inference than Sonnet 4.5, so the feedback loop between "try this change" and "here's the result" feels immediate. If you're prototyping a feature and want to iterate through four different approaches in an afternoon, Cascade's speed makes that practical where Copilot's async pipeline doesn't.
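The latency benefit of issuing several tool calls per turn is easy to demonstrate in isolation. This is a generic asyncio illustration, not Windsurf's implementation: `fake_search` is a hypothetical stand-in for a search tool, the 50 ms delay is arbitrary, and the batch size of eight simply mirrors the figure quoted above.

```python
import asyncio
import time


async def fake_search(query: str, delay: float = 0.05) -> str:
    # Hypothetical tool call: sleeps to simulate I/O latency.
    await asyncio.sleep(delay)
    return f"results for {query}"


async def sequential(queries: list[str]) -> list[str]:
    # One call at a time: total latency is the sum of all delays.
    return [await fake_search(q) for q in queries]


async def parallel(queries: list[str]) -> list[str]:
    # All calls in flight at once: total latency is roughly one delay.
    return list(await asyncio.gather(*(fake_search(q) for q in queries)))


def timed(coro_fn, queries: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(coro_fn(queries))
    return results, time.perf_counter() - start


queries = [f"call site {i}" for i in range(8)]
seq_results, seq_time = timed(sequential, queries)
par_results, par_time = timed(parallel, queries)
print(f"sequential {seq_time:.2f}s, parallel {par_time:.2f}s")
```

Both versions return the same results in the same order; only the wall-clock time differs, which is the property that matters for an interactive feedback loop.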

These tools aren't competing for the same moment in your workflow. Copilot's agent is for tasks you want handled in the background while you do something else. Cursor and Windsurf are for tasks you want to work through interactively. The most productive teams in 2026 are stacking them.

The Bigger Shift: Agentic Code Review, Jira, and the CLI

The coding agent is the headline feature, but three other moves GitHub made in early 2026 matter just as much for how this fits into a real team's workflow.

Agentic code review, generally available as of March 5, uses tool calling to explore the repository before commenting on your PR. Instead of just looking at the diff, it reads related files, traces cross-file dependencies, and builds context about how your changes fit the broader codebase. It has completed 60 million reviews, and one in five code reviews on GitHub now goes through Copilot. The practical difference is fewer false positives: it stops flagging a variable name as confusing when the surrounding code makes the convention obvious.

The Jira integration, in public preview since the same date, lets you assign Jira tickets directly to Copilot. The agent reads the issue description and comments, asks clarifying questions back in Jira if it needs more context, and includes the ticket number in the branch name and PR title. For teams that live in Jira but code in GitHub, this kills the manual translation step between "ticket accepted" and "PR opened."

Copilot CLI hit general availability on February 25. Autopilot mode is the standout: press Shift+Tab to cycle into it, give an instruction, and the CLI executes commands, reads output, and iterates without stopping for your approval. GitHub recommends it for "well-defined tasks like writing tests, refactoring files, or fixing CI failures," and that's accurate. Open-ended work isn't its strength. But if you're a terminal-native developer who fixes CI in a tmux pane and doesn't want to open an IDE, it fills a real gap.

All four pieces signal the same shift. Copilot isn't an autocomplete tool anymore. It's an asynchronous teammate that takes assignments, reviews your work, and operates from wherever you work: browser, IDE, or terminal.

If you're on a Copilot plan already, try this: pick a well-scoped issue with clear acceptance criteria and existing tests, assign it to the coding agent, and review what comes back. That's how the .NET team calibrated their expectations, and it's the fastest way to figure out whether this fits how you build software.
