You are comparing sequence models used in NLP. The question is about how Transformers differ from RNNs and CNNs.