🔍 RQ4: Detection Methods

Detection techniques are evolving from rigid static analysis toward dynamic, model-driven, and hybrid evaluation frameworks. Together, they form the diagnostic foundation of quality governance for code LLMs.


💻 1. Code-Level Detection

Identifies defects in generated code using three main paradigms:

  • Dynamic Analysis: Test-based execution and runtime monitoring to assess functional correctness and efficiency (see the sketch below Fig. 6).
  • Static Analysis: Rule-based detection (e.g., SonarQube, Semgrep) for syntax errors and security vulnerabilities.
  • Model-based Detection: “LLM-as-a-judge” techniques and ML classifiers for semantic filtering.

Fig. 6. Taxonomy of Code Issue Detection Techniques
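
To make the dynamic-analysis branch concrete, the sketch below runs a generated snippet together with assertion-style tests in a separate process and records the two signals execution-based detectors typically rely on: pass/fail (correctness) and wall-clock runtime (efficiency). It is a minimal illustration under simplifying assumptions (self-contained Python snippets, no OS-level sandboxing); `run_dynamic_check` and the other names are illustrative, not taken from any surveyed tool.

```python
import subprocess
import sys
import tempfile
import time

def run_dynamic_check(generated_code: str, test_code: str,
                      timeout_s: float = 5.0) -> dict:
    """Execute generated code plus its tests in a separate process.

    Returns pass/fail (functional correctness) and wall-clock
    runtime (efficiency), the core signals of execution-based detection.
    """
    # Write the snippet and its tests into one temporary script.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_code)
        script = f.name

    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, script],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {
            "passed": proc.returncode == 0,
            "runtime_s": time.perf_counter() - start,
            "stderr_tail": proc.stderr[-500:],  # kept for defect triage
        }
    except subprocess.TimeoutExpired:
        # Non-terminating code is itself a quality defect worth flagging.
        return {"passed": False, "runtime_s": timeout_s, "stderr_tail": "timeout"}

# Usage: a generated solution and assertion-style tests.
snippet = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(run_dynamic_check(snippet, tests))
```

A production harness would add proper sandboxing, resource limits, and repeated timing runs; the subprocess isolation here only illustrates the control flow.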


📊 2. Data-Level Detection

Targets the integrity, provenance, and representativeness of training data:

  • Dynamic Analysis: Execution-based validation and metric-drift monitoring (e.g., detecting data leakage).
  • Static Analysis: Rule-based detection and provenance tracing using file hashes (see the sketch below Fig. 7).
  • Model-based Detection: Semantic screening with LLMs to assess readability and flag hazardous content.

Fig. 7. Taxonomy of Training Data Issue Detection Techniques
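
To make hash-based provenance tracing concrete, the sketch below fingerprints every file in a training corpus with SHA-256 and flags benchmark samples that appear verbatim in it, the simplest form of leakage detection. It assumes a directory of Python source files; `build_provenance_index` and `find_verbatim_leakage` are hypothetical helper names, not tools from the surveyed papers.

```python
import hashlib
from pathlib import Path

def sha256_of(text: str) -> str:
    """Content fingerprint used for provenance tracing."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_provenance_index(corpus_dir: str) -> dict:
    """Map content hash -> source path for every file in the training corpus."""
    index = {}
    for path in Path(corpus_dir).rglob("*.py"):
        index[sha256_of(path.read_text(errors="ignore"))] = str(path)
    return index

def find_verbatim_leakage(benchmark_samples: list, index: dict) -> list:
    """Flag benchmark samples whose exact text already exists in training data."""
    return [
        (i, index[sha256_of(sample)])
        for i, sample in enumerate(benchmark_samples)
        if sha256_of(sample) in index
    ]

# Usage: index a corpus directory, then screen an evaluation set.
# index = build_provenance_index("corpus/")
# leaks = find_verbatim_leakage(eval_samples, index)
```

Exact hashing catches only verbatim duplication; near-duplicate or paraphrased leakage calls for fuzzy matching such as MinHash or embedding similarity.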


📄 Referenced Papers

LLMs Meet Library Evolution
LLMs Meet Library Evolution: Evaluating Deprecated API Usage in LLM-based Code Completion
2024-06 View Paper ↗
Less is More
Less is More: On the Importance of Data Quality for Unit Test Generation
2025-02 View Paper ↗
Qwen
Qwen Technical Report
2023-09 View Paper ↗
Qwen2
Qwen2 Technical Report
2024-07 View Paper ↗
DataMan
DataMan: Data Manager for Pre-training Large Language Models
2025-02 View Paper ↗
Phi-4
Phi-4 Technical Report
2024-12 View Paper ↗
Copilot Security
Is GitHub’s Copilot as Bad as Humans at Introducing Vulnerabilities in Code?
2022-04 View Paper ↗
Copilot Evaluation
An Empirical Evaluation of GitHub Copilot’s Code Suggestions
2025-01 View Paper ↗
HalluCode
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
2024-04 View Paper ↗
CodeHalu
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
2024-05 View Paper ↗
EffiBench
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
2024-02 View Paper ↗
Mercury
Mercury: A Code Efficiency Benchmark for Code Large Language Models
2024-02 View Paper ↗
SStuBs
Large Language Models and Simple, Stupid Bugs
2023-03 View Paper ↗
package hallucinations
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
2024-06 View Paper ↗
HallTrigger
Code Hallucination
2024-07 View Paper ↗
Large Language Models for Code
Large Language Models for Code: Security Hardening and Adversarial Testing
2023-02 View Paper ↗
Purple Llama CYBERSECEVAL
Purple Llama CYBERSECEVAL: A Secure Coding Benchmark for Language Models
2023-12 View Paper ↗
Lost at C
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants
2022-08 View Paper ↗
AI Assistants Security
Do Users Write More Insecure Code with AI Assistants?
2022-11 View Paper ↗
The Counterfeit Conundrum
The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?
2024-02 View Paper ↗
Bugs in LLM-generated Code
Bugs in Large Language Models Generated Code: An Empirical Study
2024-03 View Paper ↗
GitHub Copilot, Amazon CodeWhisperer, ChatGPT
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT
2023-04 View Paper ↗
ChatGPT Code Quality
No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT
2023-08 View Paper ↗
CloudAPIBench
On Mitigating Code LLM Hallucinations with API Documentation
2024-07 View Paper ↗
CodeMirage
CodeMirage: Hallucinations in Code Generated by Large Language Models
2024-08 View Paper ↗
LLM-generated Code Efficiency
On Evaluating the Efficiency of Source Code Generated by LLMs
2024-04 View Paper ↗
Syntactic Robustness
Syntactic Robustness for LLM-based Code Generation
2024-04 View Paper ↗
DeSec
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
2024-10 View Paper ↗
Bias Unveiled
Bias Unveiled: Investigating Social Bias in LLM-Generated Code
2024-11 View Paper ↗
FairCoder
FairCoder: Evaluating Social Bias of LLMs in Code Generation
2025-01 View Paper ↗
From Effectiveness to Efficiency
From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions
2024-06 View Paper ↗
ENAMEL
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
2024-06 View Paper ↗
DeVAIC
DeVAIC: A Tool for Security Assessment of AI-generated Code
2024-04 View Paper ↗
PTMs
Comparing Robustness Against Adversarial Attacks in Code Generation: LLM-Generated vs. Human-Written
2024-11 View Paper ↗
Codequal Analyzer
Improving LLM-Generated Code Quality with GRPO
2025-06 View Paper ↗
Artificial-Intelligence Generated Code Considered Harmful
Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation
2024-09 View Paper ↗
Unveiling Inefficiencies in LLM-Generated Code
Unveiling Inefficiencies in LLM-Generated Code: Toward a Comprehensive Taxonomy
2025-03 View Paper ↗
Python Tests Quality
Quality Assessment of Python Tests Generated by Large Language Models
2025-06 View Paper ↗
CoQuIR
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
2025-06 View Paper ↗
REAL
Training Language Models to Generate Quality Code with Program Analysis Feedback
2025-05 View Paper ↗
CIDRe
CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement
2025-05 View Paper ↗
Infinite-Instruct
Infinite-Instruct: Synthesizing Scaling Code Instruction Data with Bidirectional Synthesis and Static Verification
2025-05 View Paper ↗
Quality In, Quality Out
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation
2025-03 View Paper ↗
Security and Quality in LLM-Generated Code
Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis
2025-02 View Paper ↗
SwallowCode
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
2025-05 View Paper ↗
ROSE
ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells
2025-07 View Paper ↗
Refining ChatGPT-Generated Code
Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues
2023-07 View Paper ↗
Qwen3
Qwen3 Technical Report
2025-05 View Paper ↗
Qwen2.5
Qwen2.5 Technical Report
2024-12 View Paper ↗
TeleChat
Technical Report of TeleChat2, TeleChat2.5 and T1
2025-07 View Paper ↗
Kimi K2
Kimi K2: Open Agentic Intelligence
2025-07 View Paper ↗
ReCode
ReCode: Updating Code API Knowledge with Reinforcement Learning
2025-06 View Paper ↗
Seed-Coder
Seed-Coder: Let the Code Model Curate Data for Itself
2025-06 View Paper ↗
Data-efficient Fine-tuning
Data-efficient LLM Fine-tuning for Code Generation
2025-04 View Paper ↗
CRPE
CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation
2025-05 View Paper ↗
DeepSeek-Coder
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
2024-01 View Paper ↗
StarCoder 2 and The Stack v2
StarCoder 2 and The Stack v2: The Next Generation
2024-02 View Paper ↗
CodeSmellEval
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study
2024-12 View Paper ↗
RPG
Rethinking Repetition Problems of LLMs in Code Generation
2025-05 View Paper ↗
Repetition In Repetition Out
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
2023-10 View Paper ↗
Every Sample Matters
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
2025-03 View Paper ↗
WaveCoder
WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning
2023-12 View Paper ↗
Brevity is the soul of wit
Brevity is the Soul of Wit: Pruning Long Files for Code Generation
2024-07 View Paper ↗
Benchmark Builders
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
2025-04 View Paper ↗
Beyond Correctness
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
2024-07 View Paper ↗
Generated Code Diversity
Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes
2024-08 View Paper ↗
CodeMI
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach
2024-04 View Paper ↗
DataComp-LM
DataComp-LM: In search of the next generation of training sets for language models
2024-06 View Paper ↗
Codex
Evaluating Large Language Models Trained on Code
2021-07 View Paper ↗
Path Planning Evaluation
Assessing LLM Code Generation Quality through Path Planning Tasks
2025-04 View Paper ↗
CODEJUDGE
CODEJUDGE: Evaluating Code Generation with Large Language Models
2024-01 View Paper ↗
Datasets for Large Language Models
Datasets for Large Language Models: A Comprehensive Survey
2024-02 View Paper ↗
Synthetic Data Generation
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
2025-01 View Paper ↗
Cracks in The Stack
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
2025-05 View Paper ↗
Unseen Horizons
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar
2025-04 View Paper ↗
MG-Verilog
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
2024-06 View Paper ↗
Code Generation Survey
A Survey on Large Language Models for Code Generation
2024-08 View Paper ↗
DataRecipe
DataRecipe --- How to Cook the Data for CodeLLM?
2024-10 View Paper ↗
Training Data Extraction
Understanding Privacy Risks of Large Language Models in Japanese Based on Training Data Extraction Attacks
2025-08 View Paper ↗
aiXcoder-7B
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
2025-04 View Paper ↗
Imperfect Code Generation
Imperfect Code Generation: Uncovering Weaknesses in Automatic Code Generation by Large Language Models
2024-05 View Paper ↗
Inter-Dataset Code Duplication
On Inter-Dataset Code Duplication and Data Leakage in Large Language Models
2025-01 View Paper ↗
LLM-ProS
LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving
2025-05 View Paper ↗
ClassEval
Evaluating Large Language Models in Class-Level Code Generation
Uncovering Pretraining Code in LLMs
Uncovering Pretraining Code in LLMs: A Syntax-Aware Attribution Approach
2025-11 View Paper ↗
RealSec-Bench
RealSec-Bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories
2026-01 View Paper ↗
ShortCoder
ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation
2026-01 View Paper ↗
APIKG4SYN
Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS
2025-11 View Paper ↗
MultiCodeIF
A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback
2025-07 View Paper ↗
Beyond Functional Correctness
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
2024-06 View Paper ↗
AdaDec
AdaDec: Uncertainty-Guided Adaptive Decoding for LLM-Based Code Generation
2025-06 View Paper ↗
Code Copycat Conundrum
Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation
2025-04 View Paper ↗
AllianceCoder
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond
2025-03 View Paper ↗
RustEvo^2
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
2025-03 View Paper ↗
RobGen
A Preliminary Study on the Robustness of Code Generation by Large Language Models
2025-03 View Paper ↗
LLM Hallucinations in Practical Code Generation
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
2024-09 View Paper ↗
COFFE
COFFE: A Code Efficiency Benchmark for Code Generation
2025-02 View Paper ↗
AATK Benchmark
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
2021-08 View Paper ↗
