LLM Benchmark for Hong Kong Fire Safety Codes
We are building the first expert-verified benchmark for evaluating Large Language Models on the Hong Kong Fire Safety Codes. More than 500 complex regulatory questions have been tested against leading AI models, including GPT-4, Gemini, DeepSeek, and Qwen, to examine their accuracy, clarity, and hallucination rates for regulatory compliance in architecture, engineering, and construction (AEC).
Preliminary Findings
- Some models excel at accuracy but fall short on clarity
- Others are fast but hallucinate critical compliance details
- Human expertise remains essential; algorithms alone are insufficient
Call for Contributors
We are inviting HKIA Registered Architects and Hong Kong Registered Fire Engineers to join our verification panel. It takes just 1.5 hours of your time to review AI-generated answers and help establish the definitive "Golden Answers."