You are shipping an LLM-powered feature, and its current end-to-end response time is about 3 seconds. That is too slow for the product experience you want, so you need a plan to bring it under 1 second without an unacceptable loss in answer quality.
How would you reduce LLM inference latency from 3 seconds to under 1 second?