Xu Zhihao
naiweizi
AI & ML interests
Trustworthy AI
Recent Activity
authored a paper about 5 hours ago
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
authored a paper about 5 hours ago
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
authored a paper about 5 hours ago
Internal Value Alignment in Large Language Models through Controlled Value Vector Activation
Organizations
None yet