You are adapting a large language model to perform well on a client-specific NLP task where generic prompting is not reliable enough. You need a practical fine-tuning approach that covers data preparation, training setup, and how you would judge whether the tuned model is actually better for the target use case.
How would you fine-tune a large language model for a client-specific task?