📚 Taxonomy of Quality Issues
We establish a unified taxonomy spanning two complementary axes: Generated Code Quality Issues (RQ1) and Training Data Quality Issues (RQ2).
💻 RQ1: Generated Code Quality Issues
We categorize quality issues in LLM-generated code into nine core dimensions; a brief illustrative sketch follows the table:
| Dimension | Description | Typical Manifestations |
|---|---|---|
| Correctness | Functional accuracy and executability | Syntax errors, logical flaws, API misuse |
| Security | Resilience against malicious exploitation | Inherent design flaws, external vulnerabilities |
| Compliance | Adherence to legal, ethical, and safety standards | Copyright infringement, privacy leakage, malicious code |
| Robustness | Ability to handle abnormal inputs gracefully | Inadequate error handling, boundary condition failures |
| Maintainability | Ease of long-term code modification | Disorganized structure, low reusability |
| Understandability | Human-readability and clarity | Poor naming conventions, lack of documentation |
| Efficiency | Optimal system resource utilization | Suboptimal time complexity, improper memory management |
| Parsimony | Conciseness of generated results | Redundant logic, useless loops, extreme verbosity |
| Miscellaneous | Anomalies outside core dimensions | Instruction-following failures |
Fig. 3. Taxonomy of Generated Code Quality Issues
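To ground these dimensions, the sketch below contrasts a hypothetical generated function with a revision. It is our own illustration, not drawn from any referenced paper, and every name in it is invented: the first version violates Understandability, Efficiency, and Robustness at once; the second addresses each.

```python
# --- Hypothetical LLM output exhibiting several taxonomy dimensions ---
def f(d):                                 # Understandability: opaque name, no docstring
    r = []
    for k in d:
        for k2 in d:                      # Efficiency: O(n^2) pairwise scan
            if k != k2 and d[k] == d[k2] and d[k] not in r:
                r.append(d[k])
    return r                              # Robustness: no input validation

# --- Revised version addressing those issues ---
from collections import Counter

def duplicated_values(mapping: dict) -> list:
    """Return each value that appears under more than one key."""
    if not isinstance(mapping, dict):     # Robustness: reject abnormal input
        raise TypeError("mapping must be a dict")
    counts = Counter(mapping.values())    # Efficiency: single O(n) pass
    return [value for n_keys in (counts,) for value, n in n_keys.items() if n > 1]
```

Both versions agree on behavior, e.g. `duplicated_values({'a': 1, 'b': 1, 'c': 2})` returns `[1]`, but the revision runs in linear time and fails loudly on abnormal input.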
📄 Referenced Papers
LLMs Meet Library Evolution
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
Copilot Security
Is GitHub’s Copilot as Bad as Humans at Introducing Vulnerabilities in Code?
Copilot Evaluation
An Empirical Evaluation of GitHub Copilot’s Code Suggestions
HalluCode
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
CodeHalu
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
EffiBench
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Mercury
Mercury: A Code Efficiency Benchmark for Code Large Language Models
SStuBs
Large Language Models and Simple, Stupid Bugs
Package Hallucinations
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
HallTrigger
Code Hallucination
Large Language Models for Code
Large Language Models for Code: Security Hardening and Adversarial Testing
Purple Llama CYBERSECEVAL
Purple Llama CYBERSECEVAL: A Secure Coding Benchmark for Language Models
Lost at C
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants
AI Assistants Security
Do Users Write More Insecure Code with AI Assistants?
The Counterfeit Conundrum
The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?
Bugs in LLM-generated Code
Bugs in Large Language Models Generated Code: An Empirical Study
GitHub Copilot, Amazon CodeWhisperer, ChatGPT
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT
ChatGPT Code Quality
No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT
CloudAPIBench
On Mitigating Code LLM Hallucinations with API Documentation
CodeMirage
CodeMirage: Hallucinations in Code Generated by Large Language Models
LLM-generated Code Efficiency
On Evaluating the Efficiency of Source Code Generated by LLMs
AutoAPIEval
A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
DeSec
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
When Fine-Tuning LLMs Meets Data Privacy
When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
Bias Unveiled
Bias Unveiled: Investigating Social Bias in LLM-Generated Code
FairCoder
FairCoder: Evaluating Social Bias of LLMs in Code Generation
CodeIP
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code
From Effectiveness to Efficiency
From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions
ENAMEL
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
DeVAIC
DeVAIC: A Tool for Security Assessment of AI-generated Code
PTMs
Comparing Robustness Against Adversarial Attacks in Code Generation: LLM-Generated vs. Human-Written
Software Librarian
Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations
Codequal Analyzer
Improving LLM-Generated Code Quality with GRPO
Artificial-Intelligence Generated Code Considered Harmful
Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation
Unveiling Inefficiencies in LLM-Generated Code
Unveiling Inefficiencies in LLM-Generated Code: Toward a Comprehensive Taxonomy
Python Tests Quality
Quality Assessment of Python Tests Generated by Large Language Models
CoQuIR
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
REAL
Training Language Models to Generate Quality Code with Program Analysis Feedback
CIDRe
CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement
Infinite-Instruct
Infinite-Instruct: Synthesizing Scaling Code Instruction Data with Bidirectional Synthesis and Static Verification
Quality In, Quality Out
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation
Security and Quality in LLM-Generated Code
Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis
SwallowCode
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
ROSE
ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells
Refining ChatGPT-Generated Code
Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues
ReCode
ReCode: Updating Code API Knowledge with Reinforcement Learning
Seed-Coder
Seed-Coder: Let the Code Model Curate Data for Itself
Data-efficient Fine-tuning
Data-efficient LLM Fine-tuning for Code Generation
CRPE
CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation
DeepSeek-Coder
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
CodeSmellEval
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study
RPG
Rethinking Repetition Problems of LLMs in Code Generation
Repetition In Repetition Out
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Beyond Correctness
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
Generated Code Diversity
Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes
CodeMI
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach
CodeCipher
CodeCipher: Learning to Obfuscate Source Code Against LLMs
Code Llama
Code Llama: Open Foundation Models for Code
Codex
Evaluating Large Language Models Trained on Code
Path Planning Evaluation
Assessing LLM Code Generation Quality through Path Planning Tasks
CODEJUDGE
CODEJUDGE: Evaluating Code Generation with Large Language Models
Synthetic Data Generation
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Unseen Horizons
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar
Code Generation Survey
A Survey on Large Language Models for Code Generation
DataRecipe
DataRecipe --- How to Cook the Data for CodeLLM?
aiXcoder-7B
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
Imperfect Code Generation
Imperfect Code Generation: Uncovering Weaknesses in Automatic Code Generation by Large Language Models
ClassEval
Evaluating Large Language Models in Class-Level Code Generation
UCD-Training
Unseen-Codebases-Domain Data Synthesis and Training Based on Code Graphs
DRAINCODE
DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning
RealSec-Bench
RealSec-Bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories
ShortCoder
ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation
APIKG4SYN
Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS
MultiCodeIF
A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback
Beyond Functional Correctness
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Adadec
Adadec: Uncertainty-Guided Adaptive Decoding for LLM-Based Code Generation
Code Copycat Conundrum
Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation
AllianceCoder
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond
RustEvo^2
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
RobGen
A Preliminary Study on the Robustness of Code Generation by Large Language Models
LLM Hallucinations in Practical Code Generation
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
COFFE
COFFE: A Code Efficiency Benchmark for Code Generation
AATK Benchmark
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
📊 RQ2: Training Data Quality Issues
We categorize intrinsic flaws in pre-training and fine-tuning corpora into two groups:
1. Code Attribute Quality Issues
Inherent defects within individual code samples that models learn and reproduce, mirroring the generated-code dimensions (correctness, security, etc.).
2. Non-Code Attribute Quality Issues
Non-code textual noise and macro-level dataset flaws (a minimal filtering sketch follows Fig. 4):
- Compliance & Security: Illegal/harmful, copyright-infringing, privacy-leaking text.
- Distribution Imbalance: Skewed proportions across languages, domains, or types.
- Redundancy: Excessive repetition or synthetic data degradation.
- Diversity: Insufficient coverage of real-world scenarios.
- Contamination: Leakage of evaluation data into training sets.
- Low-Value Data: Meaningless text, format noise, low-information density.
Fig. 4. Taxonomy of Training Data Quality Issues
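To illustrate how several of these non-code categories are screened in practice, here is a minimal, hypothetical pre-filtering sketch of our own, under simplifying assumptions: exact-hash deduplication stands in for Redundancy checks, a deliberately naive regex flags privacy-leaking secrets under Compliance & Security, hash matching against held-out benchmark solutions approximates a Contamination check, and a token-count floor catches Low-Value Data. Production pipelines (e.g., those behind The Stack v2 or Seed-Coder) use far richer signals, such as near-duplicate detection and learned quality scores.

```python
import hashlib
import re

# Hypothetical benchmark solutions whose presence in training data would
# constitute Contamination; in practice these come from the evaluation set.
BENCHMARK_HASHES = {
    hashlib.sha256(sol.encode()).hexdigest()
    for sol in ["def add(a, b):\n    return a + b\n"]
}

# Deliberately naive pattern for leaked secrets (Compliance & Security).
SECRET_RE = re.compile(
    r"(api[_-]?key|password)\s*[:=]\s*['\"][^'\"]+['\"]", re.IGNORECASE
)

def filter_samples(samples):
    """Drop redundant, contaminated, secret-bearing, or low-value samples."""
    seen, kept = set(), []
    for text in samples:
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:               # Redundancy: exact duplicate
            continue
        if digest in BENCHMARK_HASHES:   # Contamination: benchmark leakage
            continue
        if SECRET_RE.search(text):       # Compliance & Security: leaked secret
            continue
        if len(text.split()) < 5:        # Low-Value Data: near-empty sample
            continue
        seen.add(digest)
        kept.append(text)
    return kept
```

`filter_samples` is order-preserving, so the first occurrence of a duplicated sample survives; swapping the exact-hash set for MinHash-style near-duplicate detection is the usual next step.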
📄 Referenced Papers
LLMs Meet Library Evolution
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
Less is More
Less is More: On the Importance of Data Quality for Unit Test Generation
DataMan
DataMan: Data Manager for Pre-training Large Language Models
Phi-4
Phi-4 Technical Report
SStuBs
Large Language Models and Simple, Stupid Bugs
DeSec
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
CIDRe
CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement
Infinite-Instruct
Infinite-Instruct: Synthesizing Scaling Code Instruction Data with Bidirectional Synthesis and Static Verification
Quality In, Quality Out
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation
SwallowCode
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Seed-Coder
Seed-Coder: Let the Code Model Curate Data for Itself
CRPE
CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation
Code Pretraining
How Does Code Pretraining Affect Language Model Task Performance?
StarCoder 2 and The Stack v2
StarCoder 2 and The Stack v2: The Next Generation
Repetition In Repetition Out
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Every Sample Matters
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Code Data Training Stage
At Which Training Stage Does Code Data Help LLMs Reasoning?
WaveCoder
WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning
Brevity is the soul of wit
Brevity is the soul of wit: Pruning long files for code generation
Benchmark Builders
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
CodeMI
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach
CodeCipher
CodeCipher: Learning to Obfuscate Source Code Against LLMs
Code Pre-training Impact
To Code, or Not To Code? Exploring Impact of Code in Pre-training
DataComp-LM
DataComp-LM: In search of the next generation of training sets for language models
Logical Inference Pre-training
Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?
Code Llama
Code Llama: Open Foundation Models for Code
Codex
Evaluating Large Language Models Trained on Code
Path Planning Evaluation
Assessing LLM Code Generation Quality through Path Planning Tasks
Datasets for Large Language Models
Datasets for Large Language Models: A Comprehensive Survey
Synthetic Data Generation
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Cracks in The Stack
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
Unseen Horizons
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar
RTL-Breaker
RTL-Breaker: Assessing the Security of LLMs Against Backdoor Attacks on HDL Code Generation
MG-Verilog
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
Code Generation Survey
A Survey on Large Language Models for Code Generation
DataRecipe
DataRecipe --- How to Cook the Data for CodeLLM?
Training Data Extraction
Understanding Privacy Risks of Large Language Models in Japanese Based on Training Data Extraction Attacks
aiXcoder-7B
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
Imperfect Code Generation
Imperfect Code Generation: Uncovering Weaknesses in Automatic Code Generation by Large Language Models
LLM-ProS
LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving
Uncovering Pretraining Code in LLMs
Uncovering Pretraining Code in LLMs: A Syntax-Aware Attribution Approach
APIKG4SYN
Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS
MultiCodeIF
A hierarchical and evolvable benchmark for fine-grained code instruction following with multi-turn feedback
RustEvo^2
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
AATK Benchmark
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions