Understanding Transformers: A Comprehensive Review
DOI: https://doi.org/10.59247/jahir.v2i2.292

Keywords: Transformer, Self Attention, Deep Learning, Visual Transformer, Positional Encoding

Abstract
Transformers are widely regarded as one of the most significant innovations in deep learning, with applications spanning Natural Language Processing (NLP), Computer Vision (CV), and multimodal data analysis. The self-attention mechanism at the core of the architecture captures global relationships in sequential and spatial data in parallel, enabling more efficient and accurate processing than Recurrent Neural Network (RNN)- and Convolutional Neural Network (CNN)-based approaches. Models such as BERT, GPT, and the Vision Transformer (ViT) have been applied to a wide variety of tasks, including text classification, translation, object detection, and image segmentation. Despite these advantages, high computational requirements and reliance on large datasets remain major challenges. Efforts to overcome these limitations include lightweight variants such as MobileViT and the Swin Transformer, which are designed to improve efficiency without sacrificing accuracy. Further research is directed at applying transformers to multimodal data and to specific domains such as medical image analysis. With their flexibility and adaptability, transformers continue to be regarded as a key component in the development of more advanced and far-reaching artificial intelligence.
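The global, parallel relationship-capturing described above is the scaled dot-product self-attention at the heart of the transformer. As an illustrative sketch (the projection matrices `Wq`, `Wk`, `Wv` and the random inputs here are hypothetical, not taken from any specific model in the review), each token's output is a softmax-weighted mixture of every token in the sequence, so all pairwise interactions are computed at once:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n_tokens, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # affinity of every token with every other token
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1 over all tokens
    return weights @ V                               # each output mixes the whole sequence

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X = rng.standard_normal((n_tokens, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one output vector per token
```

Note that, unlike an RNN, no step depends on the previous one, which is what allows the whole sequence to be processed in parallel; the quadratic cost of the `n_tokens × n_tokens` score matrix is also the source of the computational burden the abstract mentions.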
License
Copyright (c) 2024 Berlina Rahmadhani, Purwono Purwono, Safar Dwi Kurniawan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All articles published in the JAHIR Journal are licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. This license carries the following permissions and obligations:
1. Permitted Uses:
- Sharing – You may copy and redistribute the material in any medium or format.
- Adaptation – You may remix, transform, and build upon the material for any purpose, including commercial use.
2. Conditions of Use:
- Attribution – You must give appropriate credit to the original author(s), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in a way that suggests the licensor endorses you or your use.
- ShareAlike – If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original (CC BY-SA 4.0).
- No Additional Restrictions – You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
3. Disclaimer:
- The JAHIR Journal and the authors are not responsible for any modifications, interpretations, or derivative works made by third parties using the published content.
- This license does not affect the ownership of copyrights, and authors retain full rights to their work.
For further details, please refer to the official Creative Commons Attribution-ShareAlike 4.0 International License.



