Understanding Transformers: A Comprehensive Review
DOI: https://doi.org/10.59247/jahir.v2i2.292

Keywords: Transformer, Self Attention, Deep Learning, Visual Transformer, Positional Encoding

Abstract
Transformers are widely regarded as one of the most significant innovations in deep learning, with applications spanning Natural Language Processing (NLP), Computer Vision (CV), and multimodal data analysis. The self-attention mechanism at the core of the architecture captures global relationships in sequential and spatial data in parallel, enabling more efficient and accurate processing than Recurrent Neural Network (RNN)- and Convolutional Neural Network (CNN)-based approaches. Models such as BERT, GPT, and the Vision Transformer (ViT) have been applied to a wide variety of tasks, including text classification, translation, object detection, and image segmentation. Despite these advantages, high computational requirements and reliance on large datasets remain major challenges. Efforts to overcome these limitations include lightweight variants such as MobileViT and the Swin Transformer, which are designed to improve efficiency without sacrificing accuracy. Further research is directed at applying transformers to multimodal data and to specific domains such as medical image analysis. With their flexibility and adaptability, transformers continue to be regarded as a key component in the development of more advanced and far-reaching artificial intelligence.
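The global, parallel relationship-capturing described above is the scaled dot-product self-attention at the heart of the transformer. As an illustrative sketch (the projection matrices `Wq`, `Wk`, `Wv` and the random inputs here are hypothetical, not taken from any specific model in the review), each token's output is a softmax-weighted mixture of every token in the sequence, so all pairwise interactions are computed at once:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n_tokens, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # affinity of every token with every other token
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1 over all tokens
    return weights @ V                               # each output mixes the whole sequence

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X = rng.standard_normal((n_tokens, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one output vector per token
```

Note that, unlike an RNN, no step depends on the previous one, which is what allows the whole sequence to be processed in parallel; the quadratic cost of the `n_tokens × n_tokens` score matrix is also the source of the computational burden the abstract mentions.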
License
Copyright (c) 2024 Berlina Rahmadhani, Purwono Purwono, Safar Dwi Kurniawan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All articles published in the JAHIR Journal are licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. This license carries the following permissions and obligations:
1. Permitted Uses:
- Sharing – You may copy and redistribute the material in any medium or format.
- Adaptation – You may remix, transform, and build upon the material for any purpose, including commercial use.
2. Conditions of Use:
- Attribution – You must give appropriate credit to the original author(s), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in a way that suggests the licensor endorses you or your use.
- ShareAlike – If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original (CC BY-SA 4.0).
- No Additional Restrictions – You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
3. Disclaimer:
- The JAHIR Journal and the authors are not responsible for any modifications, interpretations, or derivative works made by third parties using the published content.
- This license does not affect the ownership of copyrights, and authors retain full rights to their work.
For further details, please refer to the official Creative Commons Attribution-ShareAlike 4.0 International License.



