
You are building retrieval for technical documentation that mixes prose, API references, and code examples. The main issue is that naive chunking splits code blocks across boundaries, which hurts retrieval and makes answers less reliable.
How would you design a chunking strategy for this documentation so code blocks stay intact and the retrieved context remains useful?
You are building retrieval for technical documentation that mixes prose, API references, and code examples. The main issue is that naive chunking splits code blocks across boundaries, which hurts retrieval and makes answers less reliable.
How would you design a chunking strategy for this documentation so code blocks stay intact and the retrieved context remains useful?