Business Context
At LexiSearch, a legal research platform, product managers want a lightweight internal demo that explains how large language models process long passages of text before the team integrates an LLM-backed summarization assistant. Your task is to build a small educational prototype that illustrates the foundations of LLMs and shows how transformer-based architectures represent and use context.
Data
- Volume: ~120,000 English text passages from product docs, legal memos, and help-center articles
- Text length: 50-1,500 words per document; most examples used in the demo should be truncated or chunked to 512 tokens
- Language: English only
- Labels/outputs: No manual labels required; the system should generate next-token probabilities and attention-based context visualizations
- Distribution: Mix of short FAQ-style text, medium technical documentation, and longer multi-paragraph legal prose
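The truncate-or-chunk step from the data notes above can be sketched as a small helper. This is a minimal illustration, not a fixed design: `chunk_tokens` is a hypothetical name, and the token IDs are placeholders for whatever a real tokenizer (e.g. a Hugging Face one) would produce.

```python
def chunk_tokens(token_ids, max_len=512, overlap=64):
    """Split a token-ID sequence into windows of at most `max_len` tokens,
    with `overlap` tokens of shared context between consecutive windows."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    chunks = []
    step = max_len - overlap  # how far each window advances
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
    return chunks
```

The overlap keeps a little shared context across window boundaries, which matters for the longer legal prose; documents at or under 512 tokens come back as a single chunk.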
Success Criteria
A strong solution should clearly demonstrate tokenization, embeddings, positional encoding, self-attention, and next-token prediction using a modern transformer implementation. The prototype should produce interpretable outputs for non-ML stakeholders and run inference on a single GPU, with a CPU-only fallback for small examples.
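The pieces listed above (embeddings, positional encoding, causal self-attention, next-token probabilities) can be wired together in a few lines of NumPy to show stakeholders the data flow. This is a toy sketch with random weights and made-up sizes, not a trained model; a real prototype would load pretrained weights instead.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, vocab = 6, 16, 50  # toy sizes for illustration

# Token embeddings plus a sinusoidal positional signal
x = rng.normal(size=(seq_len, d_model))
pos = np.arange(seq_len)[:, None] / (10000 ** (np.arange(d_model)[None, :] / d_model))
x = x + np.sin(pos)

# Single-head causal self-attention: each position may only look backwards
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d_model)
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores = np.where(causal, scores, -np.inf)  # mask out future positions
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)   # rows are attention distributions
context = attn @ v

# Softmax over a toy vocabulary gives next-token probabilities
W_out = rng.normal(size=(d_model, vocab))
logits = context[-1] @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

The `attn` matrix is exactly what the attention-based context visualizations in the Data section would render, row by row.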
Constraints
- Use open-source Python tooling only
- Keep inference under 500ms for short passages in demo mode
- Limit model size to something practical for local experimentation
- Focus on explanation and implementation clarity rather than state-of-the-art generation quality
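One lightweight way to enforce the 500 ms demo-mode budget is to time every inference call and flag overruns. `timed_inference` and `run_inference` are hypothetical names for this sketch; the real model call would be dropped in as the callable.

```python
import time

def timed_inference(run_inference, text, budget_ms=500.0):
    """Run `run_inference(text)` and report whether it met the latency budget.

    Returns (result, elapsed_ms, within_budget).
    """
    start = time.perf_counter()
    result = run_inference(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms <= budget_ms
```

Logging `elapsed_ms` per passage length also gives an easy chart for the demo showing how cost grows with context size.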
Requirements
- Build a prototype that tokenizes text and shows how context windows are formed.
- Use a transformer-based causal language model to demonstrate next-token prediction.
- Explain how embeddings, positional information, and self-attention contribute to context handling.
- Include code for preprocessing, model loading (with fine-tuning or probing as appropriate), and evaluation.
- Propose how you would validate that the model is using relevant context rather than only local word patterns.
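One concrete probe for the last requirement: if the model genuinely uses long-range context, shuffling everything except the most recent tokens should lower the probability it assigns to the true next token, while a model relying only on local patterns would barely change. In this sketch, `score_fn(tokens, next_token)` is a hypothetical hook wrapping the real model's next-token probability.

```python
import random

def context_sensitivity(score_fn, tokens, next_token,
                        keep_last=5, trials=10, seed=0):
    """Average drop in next-token probability when distant context is shuffled.

    A drop near zero suggests the model ignores distant context; a large
    positive drop suggests it relies on it.
    """
    rng = random.Random(seed)
    baseline = score_fn(tokens, next_token)
    drops = []
    for _ in range(trials):
        distant, local = tokens[:-keep_last], tokens[-keep_last:]
        shuffled = distant[:]
        rng.shuffle(shuffled)  # perturb only the distant context
        drops.append(baseline - score_fn(shuffled + local, next_token))
    return sum(drops) / len(drops)
```

Complementary checks worth proposing alongside this: inspecting attention maps for mass on distant but topically relevant tokens, and ablating individual paragraphs to see which ones move the prediction.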