📚 Taxonomy of Quality Issues

We establish a unified taxonomy encompassing two core dimensions: Generated Code Quality Issues and Training Data Quality Issues.


💻 RQ1: Generated Code Quality Issues

We categorize quality issues in LLM-generated code into 9 core dimensions:

| Dimension | Description | Typical Manifestations |
|---|---|---|
| Correctness | Functional accuracy and executability | Syntax errors, logical flaws, API misuse |
| Security | Resilience against malicious exploitation | Inherent design flaws, external vulnerabilities |
| Compliance | Adherence to legal, ethical, and safety standards | Copyright infringement, privacy leakage, malicious code |
| Robustness | Ability to handle abnormal inputs gracefully | Inadequate error handling, boundary condition failures |
| Maintainability | Ease of long-term code modification | Disorganized structure, low reusability |
| Understandability | Human readability and clarity | Poor naming conventions, lack of documentation |
| Efficiency | Optimal use of system resources | Suboptimal time complexity, improper memory management |
| Parsimony | Conciseness of generated results | Redundant logic, useless loops, extreme verbosity |
| Miscellaneous | Anomalies outside the core dimensions | Instruction-following failures |
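As a concrete illustration of two of these dimensions, the sketch below contrasts a fragile, inefficient snippet with a corrected counterpart for each. This is a minimal, hypothetical example of ours; the function names are not drawn from any cited paper.

```python
# Robustness issue (boundary condition failure): crashes with
# ZeroDivisionError on an empty list.
def average_naive(xs):
    return sum(xs) / len(xs)

# Robust variant: handles the empty-input boundary case gracefully.
def average_robust(xs):
    return sum(xs) / len(xs) if xs else 0.0

# Efficiency issue (suboptimal time complexity): O(n^2) pairwise scan.
def has_duplicates_quadratic(xs):
    return any(xs[i] == xs[j]
               for i in range(len(xs))
               for j in range(i + 1, len(xs)))

# Efficient variant: O(n) using a set, with identical behavior.
def has_duplicates_linear(xs):
    return len(set(xs)) != len(xs)
```

Both pairs are behaviorally equivalent on well-formed inputs; the issues only surface on boundary cases or at scale, which is why execution-based and analysis-based evaluation (as in several benchmarks listed below) is needed to catch them.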



Fig. 3. Taxonomy of Generated Code Quality Issues

📄 Referenced Papers

LLMs Meet Library Evolution
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
2024-06 View Paper ↗
Copilot Security
Is GitHub’s Copilot as Bad as Humans at Introducing Vulnerabilities in Code?
2022-04 View Paper ↗
Copilot Evaluation
An Empirical Evaluation of GitHub Copilot’s Code Suggestions
2025-01 View Paper ↗
HalluCode
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
2024-04 View Paper ↗
CodeHalu
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
2024-05 View Paper ↗
EffiBench
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
2024-02 View Paper ↗
Mercury
Mercury: A Code Efficiency Benchmark for Code Large Language Models
2024-02 View Paper ↗
SStuBs
Large Language Models and Simple, Stupid Bugs
2023-03 View Paper ↗
package hallucinations
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
2024-06 View Paper ↗
HallTrigger
Code Hallucination
2024-07 View Paper ↗
Large Language Models for Code
Large Language Models for Code: Security Hardening and Adversarial Testing
2023-02 View Paper ↗
Purple Llama CYBERSECEVAL
Purple Llama CYBERSECEVAL: A Secure Coding Benchmark for Language Models
2023-12 View Paper ↗
Lost at C
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants
2022-08 View Paper ↗
AI Assistants Security
Do Users Write More Insecure Code with AI Assistants?
2022-11 View Paper ↗
The Counterfeit Conundrum
The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?
2024-02 View Paper ↗
Bugs in LLM-generated Code
Bugs in Large Language Models Generated Code: An Empirical Study
2024-03 View Paper ↗
GitHub Copilot, Amazon CodeWhisperer, ChatGPT
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT
2023-04 View Paper ↗
ChatGPT Code Quality
No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT
2023-08 View Paper ↗
CloudAPIBench
On Mitigating Code LLM Hallucinations with API Documentation
2024-07 View Paper ↗
CodeMirage
CodeMirage: Hallucinations in Code Generated by Large Language Models
2024-08 View Paper ↗
LLM-generated Code Efficiency
On Evaluating the Efficiency of Source Code Generated by LLMs
2024-04 View Paper ↗
AutoAPIEval
A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
2024-09 View Paper ↗
DeSec
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
2024-10 View Paper ↗
When Fine-Tuning LLMs Meets Data Privacy
When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
2024-12 View Paper ↗
Bias Unveiled
Bias Unveiled: Investigating Social Bias in LLM-Generated Code
2024-11 View Paper ↗
FairCoder
FairCoder: Evaluating Social Bias of LLMs in Code Generation
2025-01 View Paper ↗
CodeIP
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code
2024-04 View Paper ↗
From Effectiveness to Efficiency
From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions
2024-06 View Paper ↗
ENAMEL
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
2024-06 View Paper ↗
DeVAIC
DeVAIC: A Tool for Security Assessment of AI-generated Code
2024-04 View Paper ↗
PTMs
Comparing Robustness Against Adversarial Attacks in Code Generation: LLM-Generated vs. Human-Written
2024-11 View Paper ↗
Software Librarian
Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations
2024-08 View Paper ↗
Codequal Analyzer
Improving LLM-Generated Code Quality with GRPO
2025-06 View Paper ↗
Artificial-Intelligence Generated Code Considered Harmful
Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation
2024-09 View Paper ↗
Unveiling Inefficiencies in LLM-Generated Code
Unveiling Inefficiencies in LLM-Generated Code: Toward a Comprehensive Taxonomy
2025-03 View Paper ↗
Python Tests Quality
Quality Assessment of Python Tests Generated by Large Language Models
2025-06 View Paper ↗
CoQuIR
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
2025-06 View Paper ↗
REAL
Training Language Models to Generate Quality Code with Program Analysis Feedback
2025-05 View Paper ↗
CIDRe
CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement
2025-05 View Paper ↗
Infinite-Instruct
Infinite-Instruct: Synthesizing Scaling Code Instruction Data with Bidirectional Synthesis and Static Verification
2025-05 View Paper ↗
Quality In, Quality Out
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation
2025-03 View Paper ↗
Security and Quality in LLM-Generated Code
Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis
2025-02 View Paper ↗
SwallowCode
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
2025-05 View Paper ↗
ROSE
ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells
2025-07 View Paper ↗
Refining ChatGPT-Generated Code
Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues
2023-07 View Paper ↗
ReCode
ReCode: Updating Code API Knowledge with Reinforcement Learning
2025-06 View Paper ↗
Seed-Coder
Seed-Coder: Let the Code Model Curate Data for Itself
2025-06 View Paper ↗
Data-efficient Fine-tuning
Data-efficient LLM Fine-tuning for Code Generation
2025-04 View Paper ↗
CRPE
CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation
2025-05 View Paper ↗
DeepSeek-Coder
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
2024-01 View Paper ↗
CodeSmellEval
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study
2024-12 View Paper ↗
RPG
Rethinking Repetition Problems of LLMs in Code Generation
2025-05 View Paper ↗
Repetition In Repetition Out
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
2023-10 View Paper ↗
Beyond Correctness
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
2024-07 View Paper ↗
Generated Code Diversity
Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes
2024-08 View Paper ↗
CodeMI
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach
2024-04 View Paper ↗
CodeCipher
CodeCipher: Learning to Obfuscate Source Code Against LLMs
2024-10 View Paper ↗
Code Llama
Code Llama: Open Foundation Models for Code
2023-08 View Paper ↗
Codex
Evaluating Large Language Models Trained on Code
2021-07 View Paper ↗
Path Planning Evaluation
Assessing LLM Code Generation Quality through Path Planning Tasks
2025-04 View Paper ↗
CODEJUDGE
CODEJUDGE: Evaluating Code Generation with Large Language Models
2024-01 View Paper ↗
Synthetic Data Generation
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
2025-01 View Paper ↗
Unseen Horizons
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar
2025-04 View Paper ↗
Code Generation Survey
A Survey on Large Language Models for Code Generation
2024-08 View Paper ↗
DataRecipe
DataRecipe --- How to Cook the Data for CodeLLM?
2024-10 View Paper ↗
aiXcoder-7B
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
2025-04 View Paper ↗
Imperfect Code Generation
Imperfect Code Generation: Uncovering Weaknesses in Automatic Code Generation by Large Language Models
2024-05 View Paper ↗
ClassEval
Evaluating Large Language Models in Class-Level Code Generation
UCD-Training
Unseen-Codebases-Domain Data Synthesis and Training Based on Code Graphs
2026-02 View Paper ↗
DRAINCODE
DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning
2026-01 View Paper ↗
RealSec-Bench
RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories
2026-01 View Paper ↗
ShortCoder
ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation
2026-01 View Paper ↗
APIKG4SYN
Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS
2025-11 View Paper ↗
MultiCodeIF
A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback
2025-07 View Paper ↗
Beyond Functional Correctness
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
2024-06 View Paper ↗
Adadec
Adadec: Uncertainty-Guided Adaptive Decoding for LLM-Based Code Generation
2025-06 View Paper ↗
Code Copycat Conundrum
Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation
2025-04 View Paper ↗
AllianceCoder
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond
2025-03 View Paper ↗
RustEvo^2
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
2025-03 View Paper ↗
RobGen
A Preliminary Study on the Robustness of Code Generation by Large Language Models
2025-03 View Paper ↗
LLM Hallucinations in Practical Code Generation
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
2024-09 View Paper ↗
COFFE
COFFE: A Code Efficiency Benchmark for Code Generation
2025-02 View Paper ↗
AATK Benchmark
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
2021-08 View Paper ↗

📊 RQ2: Training Data Quality Issues

We categorize intrinsic flaws within pre-training and fine-tuning corpora into two groups:

1. Code Attribute Quality Issues

Inherent defects within individual code samples that models learn and reproduce (e.g., correctness and security flaws).

2. Non-Code Attribute Quality Issues

Non-code textual noise and macro-level dataset flaws:

  • Compliance & Security: Illegal/harmful, copyright-infringing, privacy-leaking text.
  • Distribution Imbalance: Skewed proportions across languages, domains, or types.
  • Redundancy: Excessive repetition or synthetic data degradation.
  • Diversity: Insufficient coverage of real-world scenarios.
  • Contamination: Leakage of evaluation data into training sets.
  • Low-Value Data: Meaningless text, format noise, low information density.
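Two of these issues, Redundancy and Contamination, are commonly screened with fingerprint-based filters over the corpus. The sketch below is a minimal illustration of that idea, assuming whitespace-insensitive exact matching; all names are ours, and real pipelines use stronger near-duplicate methods (e.g., MinHash) and fuzzier benchmark-overlap checks.

```python
import hashlib

def normalize(code: str) -> str:
    # Crude normalization: strip all whitespace and lowercase, so
    # trivially reformatted copies map to the same fingerprint.
    return "".join(code.split()).lower()

def fingerprint(code: str) -> str:
    return hashlib.sha256(normalize(code).encode()).hexdigest()

def dedup(samples):
    # Redundancy screen: keep the first occurrence of each fingerprint.
    seen, kept = set(), []
    for s in samples:
        h = fingerprint(s)
        if h not in seen:
            seen.add(h)
            kept.append(s)
    return kept

def contaminated(train_samples, benchmark_samples):
    # Contamination screen: flag training samples whose fingerprint
    # matches an evaluation-set item.
    bench = {fingerprint(b) for b in benchmark_samples}
    return [s for s in train_samples if fingerprint(s) in bench]
```

For example, `dedup` treats `"def f(x): return x"` and a re-indented copy as the same sample, and `contaminated` would flag that sample if either form appeared in an evaluation set.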



Fig. 4. Taxonomy of Training Data Quality Issues

📄 Referenced Papers

LLMs Meet Library Evolution
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
2024-06 View Paper ↗
Less is More
Less is More: On the Importance of Data Quality for Unit Test Generation
2025-02 View Paper ↗
DataMan
DataMan: Data Manager for Pre-training Large Language Models
2025-02 View Paper ↗
Phi-4
Phi-4 Technical Report
2024-12 View Paper ↗
SStuBs
Large Language Models and Simple, Stupid Bugs
2023-03 View Paper ↗
DeSec
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
2024-10 View Paper ↗
CIDRe
CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement
2025-05 View Paper ↗
Infinite-Instruct
Infinite-Instruct: Synthesizing Scaling Code Instruction Data with Bidirectional Synthesis and Static Verification
2025-05 View Paper ↗
Quality In, Quality Out
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation
2025-03 View Paper ↗
SwallowCode
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
2025-05 View Paper ↗
Seed-Coder
Seed-Coder: Let the Code Model Curate Data for Itself
2025-06 View Paper ↗
CRPE
CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation
2025-05 View Paper ↗
Code Pretraining
How Does Code Pretraining Affect Language Model Task Performance?
2024-09 View Paper ↗
StarCoder 2 and The Stack v2
StarCoder 2 and The Stack v2: The Next Generation
2024-02 View Paper ↗
Repetition In Repetition Out
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
2023-10 View Paper ↗
Every Sample Matters
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
2025-03 View Paper ↗
Code Data Training Stage
At Which Training Stage Does Code Data Help LLMs Reasoning?
2023-09 View Paper ↗
WaveCoder
WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning
2023-12 View Paper ↗
Brevity is the soul of wit
Brevity Is the Soul of Wit: Pruning Long Files for Code Generation
2024-07 View Paper ↗
Benchmark Builders
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
2025-04 View Paper ↗
CodeMI
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach
2024-04 View Paper ↗
CodeCipher
CodeCipher: Learning to Obfuscate Source Code Against LLMs
2024-10 View Paper ↗
Code Pre-training Impact
To Code, or Not To Code? Exploring Impact of Code in Pre-training
2024-08 View Paper ↗
DataComp-LM
DataComp-LM: In search of the next generation of training sets for language models
2024-06 View Paper ↗
Logical Inference Pre-training
Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?
2024-10 View Paper ↗
Code Llama
Code Llama: Open Foundation Models for Code
2023-08 View Paper ↗
Codex
Evaluating Large Language Models Trained on Code
2021-07 View Paper ↗
Path Planning Evaluation
Assessing LLM Code Generation Quality through Path Planning Tasks
2025-04 View Paper ↗
Datasets for Large Language Models
Datasets for Large Language Models: A Comprehensive Survey
2024-02 View Paper ↗
Synthetic Data Generation
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
2025-01 View Paper ↗
Cracks in The Stack
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
2025-05 View Paper ↗
Unseen Horizons
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar
2025-04 View Paper ↗
RTL-Breaker
RTL-Breaker: Assessing the Security of LLMs Against Backdoor Attacks on HDL Code Generation
2025-03 View Paper ↗
MG-Verilog
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
2024-06 View Paper ↗
Code Generation Survey
A Survey on Large Language Models for Code Generation
2024-08 View Paper ↗
DataRecipe
DataRecipe --- How to Cook the Data for CodeLLM?
2024-10 View Paper ↗
Training Data Extraction
Understanding Privacy Risks of Large Language Models in Japanese Based on Training Data Extraction Attacks
2025-08 View Paper ↗
aiXcoder-7B
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
2025-04 View Paper ↗
Imperfect Code Generation
Imperfect Code Generation: Uncovering Weaknesses in Automatic Code Generation by Large Language Models
2024-05 View Paper ↗
LLM-ProS
LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving
2025-05 View Paper ↗
Uncovering Pretraining Code in LLMs
Uncovering Pretraining Code in LLMs: A Syntax-Aware Attribution Approach
2025-11 View Paper ↗
APIKG4SYN
Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS
2025-11 View Paper ↗
MultiCodeIF
A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback
2025-07 View Paper ↗
RustEvo^2
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
2025-03 View Paper ↗
AATK Benchmark
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
2021-08 View Paper ↗

© 2026 SYSUSELab. Systematic Review of Quality Issues in LLMs for Code.
