A new preprint suggests that, whilst the use of AI assistants can improve the productivity of workers (especially novice ones), it may also lead to their human users failing to develop their own skills.

From the paper:

We conduct randomized experiments to study how developers gain mastery of a new asynchronous programming library with and without the assistance of AI.

We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library.

Our findings suggest that AI-enhanced productivity is not a shortcut to competence, and that AI assistance should be carefully adopted into workflows to preserve skill formation, particularly in safety-critical domains.

To be fair, they did find that it might be how you use AI, rather than whether you use it at all, that determines whether you fail to educate yourself.

They identify 3 “interaction patterns” that are associated with low scores on the participants’ post-treatment skill tests:

AI Delegation (n=4): Participants in this group wholly relied on AI to write code and complete the task. This group completed the task the fastest and encountered few or no errors in the process.

Progressive AI Reliance (n=4): Participants in this group started by asking 1 or 2 questions and eventually delegated all code writing to the AI assistant. This group scored poorly on the quiz largely due to not mastering any of the concepts in the second task.

Iterative AI Debugging (n=4): Participants in this group relied on AI to debug or verify their code. This group made a higher number of queries to the AI assistant, but used the assistant to solve problems rather than to clarify their own understanding. As a result, they scored poorly on the quiz and were relatively slow at completing the two tasks.

Versus 3 patterns that are associated with higher human skill test results:

Generation-Then-Comprehension (n=2): Participants in this group first had the AI generate code and then manually copied or pasted it into their work. After the code was generated, they asked the AI assistant follow-up questions to improve their understanding. These participants were not particularly fast when using AI, but demonstrated a high level of understanding on the quiz. Importantly, this approach looks nearly the same as that of the AI Delegation group, except that these participants additionally used the AI to check their own understanding.

Hybrid Code-Explanation (n=3): Participants in this group composed hybrid queries in which they asked for code generation along with explanations of the generated code. Reading and understanding the explanations they asked for took more time.

Conceptual Inquiry (n=7): Participants in this group asked only conceptual questions and relied on their improved understanding to complete the task. Although this group encountered many errors, they resolved these errors independently. On average, this mode was the fastest among high-scoring patterns and second fastest overall after the AI Delegation mode.

Clearly the sample sizes are rather small, and the measures may not be the be-all and end-all of understanding users' progress. Hopefully folk will continue doing much more research on this topic to add to the generalisability and rigour of this kind of important result. But, even so, this feels like an(other) important warning to at least be thoughtful about our professional use of these systems.