Journal of Fuzzy Systems and Control, Vol. 4, No. 2, 2026
Development of an AI and Webserver-integrated Smart Automated Storage and Retrieval System
Quang-Thien Nguyen 1, Thien-Bao Truong 2, Tan-Huy Tran 3,*, Tan-Loc Nguyen 4, Ngoc-Son Vo 5, Nguyen-Khang Bui 6, Van-Dong-Hai Nguyen 7, Thanh-An Cao 8, Thi-Ngoc-Thao Nguyen 9, Thi-Hong-Lam Le 10
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Ho Chi Minh City University of Technology and Engineering (HCM-UTE), Ho Chi Minh City (HCMC), Vietnam
Email: 1 23161196@student.hcmute.edu.vn, 2 21146428@student.hcmute.edu.vn, 3 21146469@student.hcmute.edu.vn,
4 21146125@student.hcmute.edu.vn, 5 24151256@student.hcmute.edu.vn, 6 24151170@student.hcmute.edu.vn,
7 hainvd@hcmute.edu.vn, 8 22161044@student.hcmute.edu.vn, 9 thaontn@hcmute.edu.vn, 10 lamlth@hcmute.edu.vn
*Corresponding Author
Abstract—In recent years, Automated Storage and Retrieval Systems (AS/RS) have become a notable trend in modern warehouse management, automating the sequential and precise processes of storing, sorting, and retrieving goods. With the convergence of mechatronic systems, the Industrial Internet of Things (IIoT), Artificial Intelligence (AI), cloud storage, and edge-based management systems, the practical benefits of AS/RS can be significantly amplified when these technologies are effectively combined. Although several works have been presented in this field, they often lack specialization for the Vietnamese industrial environment and attention to sustainability. Therefore, this research presents the development of an intelligent AS/RS that incorporates AI-based label processing and webserver-based control to enhance warehouse management efficiency. Experimental evaluations demonstrate that the system achieves high reliability in product classification and storage tasks, providing a scalable solution for modern smart logistics with real-time data synchronization via a Node-RED web server.
Keywords—Automated Storage and Retrieval System; YOLOv8; Vintern-1B; PLC S7-1200; IoT; Webserver
In the context of the e-commerce boom and increasingly complex supply chains, demands for shorter response times have exposed the inherent limitations of manual warehousing systems [1]. These limitations include inefficient allocation of physical space, high error rates due to human intervention, rising labor costs, and poor adaptability to market conditions. Consequently, warehouses are transitioning from traditional layouts to modern urban storage systems that minimize transportation costs and help address traffic and environmental issues [2]. Within this landscape, Automated Storage and Retrieval Systems (AS/RS) are redefining the structure of modern warehouse management [3]. By leveraging robotics and control software, AS/RS optimize storage space by utilizing the full warehouse height, eliminating unnecessary gaps between racking units, and automating the sequential and precise storage, sorting, and retrieval of goods [4]. In an AS/RS, the retrieval process is computer-managed, including decisions on storage locations, movement from input to output, retrieval sequencing, order execution, and inventory status updates, which allows the system to operate independently or be integrated into workstations [5].
Furthermore, with the convergence of Artificial Intelligence and the Industrial Internet of Things (IIoT), AS/RS design trends are shifting to meet more complex corporate demands. Notably, computer vision (especially convolutional neural networks, CNNs) and web servers are emerging as means to overcome AS/RS drawbacks in harsh environments, improving operational efficiency, management, and system utilization. Prominent studies such as [5]-[10] have confirmed the vital role of integrating AI and IIoT into AS/RS. These works investigate the use of deep learning models such as YOLOv8 for identifying and counting cargo boxes [8], monitoring systems via web servers [6], and cloud-based programmable control to enhance system flexibility [7]. Additionally, research aimed at improving industrial AI performance, such as class-specific dataset splitting for YOLOv8 [11] and enhanced Optical Character Recognition (OCR) through information localization, has laid the foundation for subsequent applications.
However, current works still exhibit weaknesses, particularly a lack of specialization for the Vietnamese industrial environment. QR code recognition systems prove inadequate in harsh conditions, where cargo collisions and friction produce torn or blurred codes that may cause significant losses for warehouses [12]. To address this, various studies have used AI to process the printed character labels of products when QR codes cannot be read. A common approach combines the YOLO series (e.g., YOLOv5, YOLOv8) for high-speed label localization with an OCR network such as PaddleOCR [13] for character recognition, as demonstrated in [12], [14]. Although this addresses the QR issue, traditional OCR networks such as PaddleOCR are not optimized for the Vietnamese language and lack semantic awareness, so their output must be parsed with rigid regular expressions (regex), which are error-prone and inflexible. Furthermore, YOLO models pre-trained on the COCO dataset [15] recognize only common object categories and are not specialized for industrial environments. On this basis, this research proposes and implements an AI-integrated AS/RS model, deployed with a web server, that is tailored to Vietnam while maintaining high adaptability to various industrial environments. The model employs YOLOv8 to detect and localize QR codes or labels captured by a camera and passes these regions to Vintern-1B [16], a 1B-parameter multimodal large language model (MLLM) with robust Vietnamese OCR capabilities that is suitable for resource-constrained devices. A web server was then developed on the Node.js platform, using Socket.IO for real-time data transmission and the Nodes7 library for direct TCP/IP communication with a Siemens S7-1200 PLC. This web server serves not only as a user-friendly Human-Machine Interface (HMI) in the browser but also as a processing and distribution center: it receives JSON results from the AI model, logs them into a database, and outputs trigger signals that activate PLC I/O pins automatically.
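The distribution-center role described above (receive a JSON result, log it, then trigger the PLC) can be illustrated with a minimal sketch. It is written in Python rather than the Node.js used by the authors, and the message schema, the function name `handle_ai_result`, and the `write_trigger` callback (a stand-in for a PLC write through a driver such as Nodes7) are all hypothetical.

```python
import json
from typing import Callable, List


def handle_ai_result(payload: str, log: List[dict],
                     write_trigger: Callable[[int], None]) -> bool:
    """Process one JSON result from the vision stage.

    Hypothetical schema: {"product_code": str, "source": "qr" | "ocr", "slot": int}.
    Returns True only if the result was logged and a PLC trigger was issued.
    """
    try:
        result = json.loads(payload)
    except json.JSONDecodeError:
        return False  # malformed message: never move the robot on bad data
    if not isinstance(result, dict):
        return False
    code = result.get("product_code")
    slot = result.get("slot")
    if not code or not isinstance(slot, int):
        return False  # incomplete result: keep the robot idle
    # Log first, so every trigger sent to the PLC has a matching record.
    log.append({"product_code": code, "source": result.get("source"), "slot": slot})
    write_trigger(slot)  # stand-in for a PLC write over TCP/IP
    return True
```

Validating before triggering keeps a garbled or partial AI result from ever reaching the PLC; rejected messages simply leave the robot where it is.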
The structure of this paper is as follows: Section II presents the hardware fabrication principles and software design; Section III describes the experimental evaluation and results; and Section IV provides the conclusion and future research directions.
The mechanical structure is developed based on a 3-axis Cartesian coordinate robot. Each axis is driven by a NEMA 17 stepper motor, ensuring high torque and precision during the transportation of goods into the Selective racking system. The control logic is executed on a Siemens S7-1200 PLC (CPU 1214C DC/DC/DC). Motion trajectories are optimized using a linear interpolation algorithm when delivering cargo into the storage cells. The general system is described in Fig. 1.
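Linear interpolation here means coordinating the axes so that the carriage follows a straight line between its start and target points, with every axis starting and finishing together. The sketch below assumes the axis order (X, Y, Z), constant-speed motion without acceleration ramps, and illustrative steps-per-millimetre values; none of these are specified in the paper, and on the real system the pulse generation is handled by the S7-1200.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AxisMove:
    steps: int           # signed number of step pulses for this axis
    frequency_hz: float  # constant pulse rate during the move


def linear_interpolate(start_mm: Sequence[float], target_mm: Sequence[float],
                       steps_per_mm: Sequence[float], feed_mm_s: float) -> List[AxisMove]:
    """Plan a straight-line move so that all axes start and finish together.

    Each axis gets a pulse rate proportional to its share of the path length,
    so the carriage travels along the straight line at `feed_mm_s`
    (acceleration ramps are ignored in this sketch).
    """
    deltas = [t - s for s, t in zip(start_mm, target_mm)]
    length = math.sqrt(sum(d * d for d in deltas))
    if length == 0:
        return [AxisMove(0, 0.0) for _ in deltas]
    duration_s = length / feed_mm_s
    moves = []
    for delta, spm in zip(deltas, steps_per_mm):
        steps = round(delta * spm)
        moves.append(AxisMove(steps, abs(steps) / duration_s))
    return moves
```

For a move from (0, 0, 0) to (300, 0, 400) mm at 50 mm/s with 80 steps/mm on every axis, the path is 500 mm long and takes 10 s, so X receives 24000 pulses at 2400 Hz and Z receives 32000 pulses at 3200 Hz.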
The physical system model (Fig. 2) consists of three independent motion axes: the X-axis moves along the aisle to cover the warehouse length (Fig. 3), the Z-axis raises and lowers the platform to access the various rack levels (Fig. 4 and Fig. 5), and the Y-axis performs the reach operation to place or retrieve goods from the storage cells. The single selective rack is divided into a defined coordinate grid, allowing the robot to access any cargo position directly without moving surrounding items. To implement intelligent functions, a high-resolution camera is installed at the infeed conveyor area to capture product images, which are processed by the AI models (YOLOv8 and Vintern-1B) for identification.
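A coordinate grid lets each slot number be converted directly into X and Z targets, while the Y-axis only performs a fixed reach stroke. The following is a minimal sketch assuming slots numbered row by row from the bottom-left cell, with hypothetical grid dimensions and cell pitches; the paper does not state its numbering scheme.

```python
from typing import NamedTuple


class CellCoord(NamedTuple):
    x_mm: float  # position along the aisle (X-axis)
    z_mm: float  # height of the rack level (Z-axis)


def slot_to_coord(slot: int, columns: int, levels: int,
                  x_pitch_mm: float, z_pitch_mm: float,
                  x0_mm: float = 0.0, z0_mm: float = 0.0) -> CellCoord:
    """Map a 1-based slot number to the X/Z position of its storage cell.

    Assumes slots are numbered row by row, starting at the bottom-left cell.
    """
    if not 1 <= slot <= columns * levels:
        raise ValueError(f"slot {slot} is outside the {columns}x{levels} grid")
    level, column = divmod(slot - 1, columns)
    return CellCoord(x0_mm + column * x_pitch_mm, z0_mm + level * z_pitch_mm)
```

On a hypothetical 3 × 3 rack with 100 mm column pitch and 120 mm level pitch, slot 5 is the middle cell at (100, 120) mm.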
The system's control functions are divided into two main operations: storage and retrieval. In storage mode, goods are scanned and must be brought to a specific point equipped with an identification sensor. If the sensor does not detect any cargo, the robot stops at its last working position and awaits further instructions. For instance, if the robot has just delivered an item to slot 5 and the identification sensor detects no product, the robot remains stationary at slot 5 until a new command is issued. The automatic storage process is described in Fig. 6 and Fig. 7.
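The hold-at-last-position rule can be expressed as a small pure function. This is only a sketch: the position labels and the two-move sequence are hypothetical, and on the actual system this logic runs in the S7-1200 PLC program rather than in Python.

```python
from typing import List, Tuple


def storage_step(position: str, cargo_detected: bool,
                 target_slot: int) -> Tuple[str, List[str]]:
    """Decide the robot's next moves for one storage request.

    Returns (final_position, moves). If the identification sensor reports no
    cargo, the robot issues no moves and keeps its last working position.
    """
    if not cargo_detected:
        return position, []
    moves = ["INFEED", f"SLOT_{target_slot}"]  # pick at infeed, place in slot
    return moves[-1], moves
```

Mirroring the example in the text, `storage_step("SLOT_5", False, 7)` leaves the robot at `"SLOT_5"` with no moves, whereas a detected item produces a pick at the infeed followed by a move to slot 7.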
In retrieval mode, the robot moves to the designated slot, retrieves the item, and transports it back to the staging area. Upon completion of the retrieval task, the robot returns to the Home position and remains on standby for the next command to resume operations. The process is demonstrated in Fig. 8.
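A retrieval request combines an inventory lookup with the fixed move sequence described above (slot, staging area, Home). The sketch below uses a plain dictionary as a hypothetical inventory; for simplicity it frees the cell at planning time, whereas a real system would wait for the PLC to confirm the pick.

```python
from typing import Dict, List, Tuple


def retrieve(inventory: Dict[int, str], slot: int) -> Tuple[str, List[str]]:
    """Plan one retrieval and update the inventory.

    Returns the product code taken from `slot` and the robot's move sequence:
    go to the slot, carry the item to staging, then return Home on standby.
    """
    if slot not in inventory:
        raise KeyError(f"slot {slot} is empty")
    # Freed at planning time in this sketch; a real system would wait for
    # the PLC to confirm the pick before updating the inventory.
    product = inventory.pop(slot)
    return product, [f"SLOT_{slot}", "STAGING", "HOME"]
```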
The vision pipeline is established to replace manual inspection processes. The workflow is executed through two primary stages (Fig. 9):
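Following the detect-then-read design described in the introduction, with QR decoding and OCR run side by side as reported in the experiments, the two stages can be sketched as follows. The detector, QR decoder, and OCR reader are passed in as plain callables (hypothetical stand-ins for a YOLOv8 model, a QR decoding library, and a Vintern-1B prompt), and the "QR first, OCR as fallback" combination rule is an assumption about how the two readings are merged.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class LabelReading:
    product_code: Optional[str]
    source: str      # "qr", "ocr", or "none"
    verified: bool   # True when QR and OCR agree on the same code


def read_label(frame: object,
               detect: Callable[[object], Sequence[Box]],
               decode_qr: Callable[[object, Box], Optional[str]],
               read_text: Callable[[object, Box], Optional[str]]) -> LabelReading:
    # Stage 1: localize the label / QR region (e.g. a YOLOv8 detector).
    boxes = detect(frame)
    if not boxes:
        return LabelReading(None, "none", False)
    box = boxes[0]  # assumption: one label per item, take the first detection
    # Stage 2: read the region both ways (QR decoder and an OCR model).
    qr_code = decode_qr(frame, box)
    ocr_code = read_text(frame, box)
    if qr_code is not None:
        return LabelReading(qr_code, "qr", qr_code == ocr_code)
    if ocr_code is not None:
        return LabelReading(ocr_code, "ocr", False)  # QR unreadable: OCR fallback
    return LabelReading(None, "none", False)
```

Injecting the models as callables keeps the decision logic testable without a camera or GPU.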
The system integrates a web server built on the Node-RED platform, enabling remote monitoring and control of the warehouse status. Data, including product classification, storage slot locations, and storage timestamps, is continuously updated in real time. Fig. 10 to Fig. 12 show the basic working interfaces of the system.
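Each storage event (product, slot, timestamp) would typically be pushed to connected browsers as a small message. The sketch below shows one possible message format in Python; the field names and the choice to normalize timestamps to UTC are assumptions, not the authors' schema.

```python
import json
from datetime import datetime, timezone


def inventory_update(slot: int, product_code: str, stored_at: datetime) -> str:
    """Serialize one storage event for the browser (hypothetical schema).

    Timestamps are normalized to UTC ISO 8601 so that clients in any time
    zone sort and display events consistently.
    """
    if stored_at.tzinfo is None:
        raise ValueError("stored_at must be timezone-aware")
    return json.dumps({
        "event": "stored",
        "slot": slot,
        "product_code": product_code,
        "stored_at": stored_at.astimezone(timezone.utc).isoformat(),
    })
```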
Interface Components in Fig. 13:
Users can interact with the web server to execute automated storage tasks and manual or scan-based retrieval operations. For automated storage, the user places the cargo at the storage position and initiates the QR code (or OCR) scanning process. The result is then displayed on the screen, as shown in Fig. 13.
For retrieval, users can either select a storage cell (shown in green), which is then marked in yellow, or retrieve by image/QR scan; the corresponding interfaces are shown in Fig. 14 to Fig. 16.
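The green/yellow selection behaviour can be modelled as a small state table per cell. In this sketch, green is assumed to mean "occupied and selectable", the ascending-order retrieval queue is a hypothetical design choice, and the function names are illustrative only.

```python
from enum import Enum
from typing import Dict, List


class Cell(Enum):
    EMPTY = "empty"
    OCCUPIED = "green"   # assumption: green marks a cell that holds an item
    SELECTED = "yellow"  # cell marked by the user for retrieval


def toggle_selection(cells: Dict[int, Cell], slot: int) -> None:
    """Mark an occupied cell for retrieval, or unmark a selected one."""
    state = cells.get(slot)
    if state is Cell.OCCUPIED:
        cells[slot] = Cell.SELECTED
    elif state is Cell.SELECTED:
        cells[slot] = Cell.OCCUPIED
    else:
        raise ValueError(f"slot {slot} has nothing to retrieve")


def retrieval_queue(cells: Dict[int, Cell]) -> List[int]:
    """Selected slots in ascending order, ready to send to the PLC one by one."""
    return sorted(slot for slot, state in cells.items() if state is Cell.SELECTED)
```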
Based on the aforementioned design, the research team conducted experiments to evaluate the effectiveness of the proposed system. The hardware configuration includes a Siemens S7-1200 CPU 1214C DC/DC/DC PLC, NEMA 17 (42 mm frame) stepper motors, and TB6600 drivers, wired as illustrated in the figure. The sensing system comprises E3F-DS30P1 obstacle detection sensors and FOTEK PL-05P PNP inductive proximity sensors. The system operates fully automatically through the integration of the web server, PLC, robot, and AI-powered camera. The AI model identifies products via QR codes or OCR, ensuring high accuracy in warehouse management.
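The TB6600 drivers convert PLC pulse trains into motor steps, so each axis needs a steps-per-millimetre constant before the PLC can command positions. The arithmetic is sketched below. The paper does not report the microstepping setting or the drive mechanism (lead screw or belt), so the defaults (a common 1.8° motor with 200 full steps per revolution, 1/8 microstepping, 8 mm of travel per revolution) are assumptions.

```python
def steps_per_mm(full_steps_per_rev: int = 200, microsteps: int = 8,
                 travel_per_rev_mm: float = 8.0) -> float:
    """Step pulses needed to move one axis by 1 mm.

    Defaults are assumptions: a 1.8-degree motor (200 full steps/rev), 1/8
    microstepping selected on the driver, and 8 mm of travel per motor turn.
    """
    return full_steps_per_rev * microsteps / travel_per_rev_mm


def pulse_frequency_hz(speed_mm_s: float, spmm: float) -> float:
    """Pulse rate the PLC must output to move at `speed_mm_s`."""
    return speed_mm_s * spmm
```

Under these assumptions an axis needs 200 pulses per millimetre, and 20 mm/s requires a 4000 Hz pulse train; a belt drive moving 40 mm per revolution would instead give 40 pulses per millimetre.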
Following the training and validation processes, the YOLOv8 model demonstrated high precision in identifying a diverse range of product classes.
The training process shown in Fig. 17 indicates efficient learning without overfitting: both training and validation losses (box, class, and DFL loss) decreased steadily and converged over 100 epochs. Key performance metrics stabilized, with precision remaining at approximately 0.8 while recall improved from 0.7 to 0.9. Notably, the model achieved a mAP50 of 0.9 and a mAP50-95 between 0.6 and 0.8, demonstrating robust detection capability while leaving room for further optimization at higher IoU thresholds.
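The gap between mAP50 and mAP50-95 comes from the IoU threshold: mAP50 accepts a detection whose box overlaps the ground truth by at least 50%, whereas mAP50-95 averages over the COCO-style thresholds 0.50, 0.55, ..., 0.95. The sketch below is not a full mAP implementation; it only computes IoU and counts how many of those thresholds a single prediction passes.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# COCO-style thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred: Box, truth: Box) -> int:
    """Number of mAP50-95 thresholds at which `pred` counts as a true positive."""
    score = iou(pred, truth)
    return sum(score >= t for t in IOU_THRESHOLDS)
```

A 100 × 100 px box shifted sideways by 10 px has an IoU of 9/11 ≈ 0.82: it passes 7 of the 10 thresholds, counting fully toward mAP50 but only partly toward mAP50-95, which is consistent with the lower mAP50-95 values reported above.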
The bar chart in Fig. 18 shows the total number of scans, successful QR code scans, and successful OCR scans for each product code from F1 to F9. To ensure experimental consistency, each product code (F1–F9) was scanned 40 times; this sample size was chosen to give an indication of reliability in long-term operation while keeping the hardware from being overloaded in the experimental environment. Most products (F2, F3, and F5–F7) achieved a 100% QR success rate (40/40). However, F1, F4, F8, and F9 exhibited lower rates (36/40, 37/40, 39/40, and 38/40, respectively), reflecting real-world scenarios of corrupted QR codes. The OCR scan count reached 40 successful attempts for nearly all products, including those whose QR codes were read successfully in all 40 scans. This confirms that OCR operates in parallel with QR scanning as a dual verification and backup mechanism. In cases of QR failure (F1, F4, and F8), OCR successfully restored the required information.
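The per-product QR percentages follow directly from the counts stated above. The snippet recomputes them; OCR counts are not listed per product in the text, so only the QR figures are included.

```python
# QR successes out of 40 scans per product, as stated in the text for Fig. 18.
QR_SUCCESSES = {"F1": 36, "F2": 40, "F3": 40, "F4": 37, "F5": 40,
                "F6": 40, "F7": 40, "F8": 39, "F9": 38}
SCANS_PER_PRODUCT = 40


def success_rates(successes, scans):
    """Per-product success rate in percent."""
    return {code: 100 * ok / scans for code, ok in successes.items()}


def overall_rate(successes, scans):
    """Pooled success rate in percent across all products."""
    return 100 * sum(successes.values()) / (scans * len(successes))


rates = success_rates(QR_SUCCESSES, SCANS_PER_PRODUCT)
```

This gives 90.0% for F1, 92.5% for F4, 97.5% for F8, and 95.0% for F9, and a pooled QR success rate of 350/360 ≈ 97.2% over all 360 scans.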
Fig. 19 illustrates the success rates (in percentage) for both QR code scanning and OCR for each product code from F1 to F9.
The research team measured the total cycle time for a single storage/retrieval operation, encompassing both AI processing and mechanical robot travel time. The results indicate an average cycle time of 17 seconds, with a 100% successful mechanical positioning rate.
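As a rough check, a 17 s average cycle bounds the throughput of a single robot operating continuously. Queueing, operator actions, and idle time are ignored, so this is an upper bound rather than a measured figure; the 20-second target stated in the conclusion is shown for comparison.

```latex
\text{throughput}_{\max} = \frac{3600\ \text{s/h}}{17\ \text{s/cycle}} \approx 211.8\ \text{cycles/h},
\qquad
\frac{3600\ \text{s/h}}{20\ \text{s/cycle}} = 180\ \text{cycles/h}.
```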
The web server is designed to ensure ease of operation, intuitive monitoring, efficient risk handling, and overall system safety. It supports user account authentication with persistent login credential storage (as illustrated) and provides a comprehensive, visually intuitive interface comprising the following components: a login page (Fig. 20); an operating mode panel supporting both automatic and manual control (Fig. 21); an AI model interface with dedicated control buttons, real-time operational status indicators, and live video streamed directly from the camera module (Fig. 22); and a data logging table that enables full traceability of operational records, with export to Excel format for systematic data management and reporting (Fig. 23).
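The export feature can be approximated with Python's standard `csv` module. This is a swapped-in technique: the paper exports to Excel format, whereas the sketch writes CSV, which Excel opens directly (a native .xlsx file would need a third-party library). The column names are hypothetical.

```python
import csv
import io
from typing import Iterable, Mapping

LOG_FIELDS = ["timestamp", "operation", "product_code", "slot", "source"]


def export_log_csv(records: Iterable[Mapping[str, object]]) -> str:
    """Render log records as CSV text that Excel can open directly.

    Unknown keys are ignored and missing keys become empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=LOG_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```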
In modern industrialization, optimizing storage space and automating warehouse management processes have become crucial to corporate competitiveness. This research addressed core limitations of traditional warehousing systems, specifically the lack of automated quality control and insufficient flexibility in handling damaged product labels. The implemented solution is an intelligent AS/RS that integrates Siemens S7-1200 PLC logic control, AI-based computer vision, and real-time monitoring via a web server. A key contribution of this study is the integration of the YOLOv8 model for object identification with the Vintern-1B multimodal large language model (MLLM) for OCR-based data recovery when QR code reading fails.
Experimental results confirmed the effectiveness and reliability of the proposed model under practical operating conditions. The system achieved an average storage/retrieval cycle time of 17 seconds, meeting the initial productivity target of under 20 seconds. Regarding AI performance, the detection model reached a mAP@0.5 of 0.992, while OCR-based data verification achieved an accuracy of 92.5% to 100%, even with corrupted QR labels. The system maintained a 100% positioning success rate, with inventory data consistently synchronized to the Node-RED monitoring interface via TCP/IP communication.
Despite these positive outcomes, certain limitations remain, such as the dependence of AI processing speed on the computer hardware configuration and network latency that occasionally delays real-time data updates on the web server. Furthermore, the current model primarily handles uniform cargo sizes and labels within a single selective racking system. Future research will focus on optimizing AI model architectures to reduce inference time, expanding recognition capabilities to more complex packaging types, and integrating advanced identification technologies such as RFID. Developing path optimization algorithms for robots in multi-aisle warehouse environments also remains a key objective, moving toward a comprehensive and flexible intelligent warehouse management system for large-scale industrial applications.
This research was funded by Ho Chi Minh City University of Technology and Engineering, Vietnam, under grant No. SV2026-267. The authors thank Dr. Quang-Huy Vu (HCM-UTE) for his supervision of this project and are grateful for this support.
A video of the system in operation is available at: https://youtu.be/5uLDbmLTdiE?si=iPUxwkEgU3hs7JnG.
Quang-Thien Nguyen, Development of an AI and Webserver-integrated Smart Automated Storage and Retrieval System