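You are comparing two sequence models used in NLP: an LSTM, which is based on recurrent state updates, and a transformer, which is based on self-attention. Both can be used for text classification, tagging, and language modeling, but they process context differently: the LSTM reads tokens in order and carries a hidden state forward, while the transformer lets every token attend to every other token within the same layer.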
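To make the contrast concrete, here is a minimal sketch of the two ways of handling context. It is an illustration under simplifying assumptions, not a real LSTM or transformer: the recurrence is a plain tanh update with no gates, the attention uses the raw inputs as queries, keys, and values with no learned projections, tokens are single numbers instead of embedding vectors, and the names `recurrent_pass`, `self_attention`, `w_x`, and `w_h` are made up for this example.

```python
import math


def recurrent_pass(xs, w_x=0.5, w_h=0.5):
    """Read tokens one at a time, carrying a single hidden state forward.

    Simplified tanh recurrence (no LSTM gates). w_x and w_h are
    illustrative fixed weights, not learned values.
    """
    h = 0.0
    states = []
    for x in xs:
        # The new state depends on the current token and the previous state.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def self_attention(xs):
    """Let every position weigh every position in the sequence at once.

    Uses the raw inputs as queries, keys, and values (no learned
    projections, no causal mask, no scaling by dimension).
    """
    outputs = []
    for q in xs:
        scores = [q * k for k in xs]              # similarity of q to each key
        m = max(scores)                           # subtract max for stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]       # softmax: weights sum to 1
        outputs.append(sum(w * v for w, v in zip(weights, xs)))
    return outputs


tokens = [1.0, -0.5, 2.0]
rnn_states = recurrent_pass(tokens)   # one state per step, built left to right
attn_out = self_attention(tokens)     # one output per position, each sees all tokens
```

In this sketch, changing the last token leaves the earlier recurrent states untouched, because each state only depends on what came before it, while it shifts every attention output, because each position looks at the whole sequence.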
What are the basic machine learning concepts behind transformers and LSTMs?