🛠️ RQ5: Governance Strategies

We synthesize a Multi-layered Governance Framework spanning the data lifecycle and model inference stages.


💻 1. Code-Level Mitigation

  • Model-level: Supervised fine-tuning (SFT), preference alignment (RLHF/DPO), and reward-based optimization with rewards that combine execution correctness and static code metrics.
  • Generation-level:
    • Pre-generation: Prompt Engineering, RAG, and Agent-based workflows.
    • In-generation: Adaptive decoding constraints and Iterative Self-reflection.
    • Post-generation: Automated AST-level repairs and sandbox execution filtering.

Fig. 8. Taxonomy of Code Issue Mitigation Strategies
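As a rough illustration of the post-generation stage, the sketch below chains an AST-level syntax gate with execution-based filtering. It is a minimal example under stated assumptions, not a method from any of the surveyed papers: the helper names (`parses_cleanly`, `passes_in_subprocess`, `filter_candidates`) are hypothetical, candidates are assumed to be Python source strings, and a plain subprocess with a timeout stands in for a real sandbox.

```python
import ast
import subprocess
import sys
import tempfile
from pathlib import Path


def parses_cleanly(source: str) -> bool:
    """AST-level gate: reject candidates that are not valid Python."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def passes_in_subprocess(source: str, test_code: str, timeout_s: float = 5.0) -> bool:
    """Run the candidate plus its tests in a separate interpreter.

    Assumption: a subprocess with a timeout only approximates a sandbox;
    it does not restrict file system or network access.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False
    # A non-zero exit code means an assertion failed or the code crashed.
    return result.returncode == 0


def filter_candidates(candidates: list[str], test_code: str) -> list[str]:
    """Keep only candidates that parse and pass the supplied tests."""
    return [
        c for c in candidates
        if parses_cleanly(c) and passes_in_subprocess(c, test_code)
    ]


if __name__ == "__main__":
    candidates = [
        "def add(a, b):\n    return a + b\n",   # correct
        "def add(a, b):\n    return a - b\n",   # parses, fails the test
        "def add(a, b) return a + b",           # syntax error
    ]
    kept = filter_candidates(candidates, "assert add(2, 3) == 5")
    print(len(kept))
```

The cheap syntax check runs first so that obviously broken candidates never reach the more expensive execution step. A production pipeline would likely replace the subprocess with a container or another isolated runtime.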


📊 2. Data-Level Mitigation

  • Cleaning & Filtering: Eliminating samples based on execution feedback, plus LLM-driven semantic cleaning.
  • Data Balancing: Stratified resampling across languages and domains to mitigate bias.
  • Data Enhancement: Refactoring, adding docstrings, and standardizing low-quality code.
  • Data Augmentation: High-quality synthetic data generation and integration of curated open-source repositories.

Fig. 9. Taxonomy of Training Data Issue Mitigation Strategies
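The data balancing strategy can be sketched as a stratified resampler that draws the same number of samples from each stratum, for example each programming language. This is only an illustrative sketch: the function name `stratified_resample`, the `lang` field, and the fixed `per_stratum` target are assumptions made for the example, not details taken from the surveyed work.

```python
import random
from collections import Counter, defaultdict


def stratified_resample(samples, key, per_stratum, seed=0):
    """Return a corpus with exactly `per_stratum` samples per stratum.

    Large strata are downsampled without replacement; small strata are
    upsampled with replacement (an assumption made for simplicity).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        strata[key(sample)].append(sample)

    balanced = []
    for name in sorted(strata):  # sorted for a deterministic order
        group = strata[name]
        if len(group) >= per_stratum:
            balanced.extend(rng.sample(group, per_stratum))
        else:
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=per_stratum - len(group)))
    return balanced


if __name__ == "__main__":
    corpus = (
        [{"lang": "python", "id": i} for i in range(10)]
        + [{"lang": "rust", "id": i} for i in range(2)]
    )
    balanced = stratified_resample(corpus, key=lambda s: s["lang"], per_stratum=4)
    print(Counter(s["lang"] for s in balanced))
```

Upsampling by duplication is the simplest choice here, but it repeats data. A pipeline concerned with repetition in the training corpus might instead cap large strata or fill small ones with newly synthesized samples.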


© 2026 SYSUSELab. Systematic Review of Quality Issues in LLMs for Code.
