Journal of Fuzzy Systems and Control, Vol. 3, No. 2, 2025
LQR Controller Based on BAT Algorithm for Rotary Double Parallel Inverted Pendulum
Thanh-Tri-Dai Le 1, Ngoc-Kien Nguyen 2, Phuc-Truong Le 3, Minh-Nguyen-Bao Bui 4, Trong-Tin Nguyen 5, Chi-Anh Tran 6, Phuong-Tu Doan 7, Duc-Nhan Dao 8, Van-Dong-Hai Nguyen 9, Thanh-Tung Nguyen 10,*
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Ho Chi Minh City University of Technology and Education (HCMUTE),
Ho Chi Minh City (HCMC), Vietnam
10 Ho Chi Minh City Bach Nghe College, Ho Chi Minh City (HCMC), Vietnam
Email: 1 19151051@student.hcmute.edu.vn , 2 21146477@student.hcmute.edu.vn, 3 21146528@student.hcmute.edu.vn,
4 20151013@student.hcmute.edu.vn, 5 21151174@student.hcmute.edu.vn, 6 20151334@student.hcmute.edu.vn,
7 21146531@student.hcmute.edu.vn, 8 21145224@student.hcmute.edu.vn, 9 hainvd@hcmute.edu.vn,
*Corresponding Author
Abstract—This paper presents an enhanced approach to stabilizing the Rotary Double Parallel Inverted Pendulum (RDPIP) through a combination of the LQR method and the BAT algorithm. Traditionally, selecting appropriate Q and R matrices relies on designers' intuitions or trial-and-error processes, often resulting in suboptimal performance. By leveraging the BAT algorithm’s swarm intelligence, the proposed method automatically optimizes the cost function to yield improved control performance. Key improvements include shorter stabilization time, reduced overshoot, and minimized oscillations. Simulation results show that the BAT-enhanced LQR controller significantly outperforms traditional design in terms of convergence speed and system damping. These findings underscore the potential of metaheuristic algorithms in refining classical control strategies for complex, nonlinear systems.
Keywords—LQR Controller; BAT Algorithm; Rotary Double Parallel Inverted Pendulum; Swarm Intelligence
The inverted pendulum (IP), characterized by its under-actuated, nonlinear, and strongly coupled nature, is a classic benchmark in control theory. It exemplifies key attributes such as stability, robustness, and tracking performance, making it an ideal platform for comparing different control strategies and assessing their respective strengths and weaknesses. Consequently, extensive research on this system has significantly advanced the field of control theory [1]-[3].
Fig. 1 illustrates the fundamental structure of a novel benchmark model: the first-order RDPIP. With its intricate mechanical design, higher degrees of freedom, and pronounced nonlinear dynamics, the RDPIP poses a considerably greater control challenge. Studies on the RDPIP are expected to provide valuable insights that will guide the development of control methods for similar systems. LQR and nonlinear controllers have been applied to the Rotary IP (RIP) [4], [5] and other kinds of pendulums [6]. The RDPIP is a mechanical extension of the RIP obtained by adding a parallel link to the existing arm [7]. Building on that research, a fuzzy controller that imitates the traditional LQR controller was designed and shown to operate this model well [8]. This shows that intelligent control for this model can be derived from a linear controller. However, optimization of the LQR controller itself has not yet been addressed for this model. In this research, we reuse the previously tested LQR controller [7] and develop a search method to optimize it.
LQR is a classical control method that is widely applied to IP systems [8]-[12]. It is designed from a model of the system structure, and its accuracy and efficiency have been verified. However, the weight matrices Q and R are still chosen through trial and error. Methods for choosing those matrices are presented in [13]-[17], but it remains difficult to satisfy the objective function and the design criteria simultaneously. In this paper, we use the BAT algorithm for multivariate problems. This algorithm mimics how bats move to find prey and avoid obstacles in order to find solutions to an objective function [18]. The LQR controller is designed based on swarm optimization: the BAT algorithm is used to optimize Q and R. When updating the matrix parameter values, we use weighting techniques to incorporate the designer's experience into the algorithm's search. Simulation demonstrates the capability of the BAT algorithm, and the optimized control law is shown to be effective when compared with the conventional LQR control law.
The mathematical model of the system forms the foundation for designing the LQR controller, which is enhanced through swarm optimization. Specifically, the BAT algorithm is employed to optimize the Q and R matrices, leveraging its robust search capabilities. To further refine the optimization process, a weighting technique is integrated into the parameter update mechanism, allowing the designer's expertise to be incorporated into the algorithm's search. This approach aims to achieve a globally optimal solution for the matrices, enabling the LQR controller to generate an optimal state feedback control matrix. It thereby addresses the limitations of traditional methods that rely heavily on experience and trial-and-error for selecting the Q and R matrices. The effectiveness of the proposed controller is demonstrated through simulation, where its performance is compared against conventional LQR control.
The novelty of the BAT algorithm in comparison to other metaheuristic-based LQR designs, such as PSO or the genetic algorithm (GA), lies in its bio-inspired echolocation mechanism, which enables a dynamic balance between exploration and exploitation through adaptive adjustments of frequency, loudness, and pulse emission rate. Unlike PSO, which relies on position and velocity updates influenced by social learning, or GA, which depends on stochastic crossover and mutation operators, the BAT algorithm offers a more structured and directed search process. It facilitates efficient local exploitation near the best-known solutions while preserving global search capabilities, thereby improving the likelihood of escaping local optima in high-dimensional and nonlinear optimization problems. The BAT algorithm requires fewer control parameters, simplifies tuning efforts, and generally exhibits faster convergence. These features make it particularly well-suited for optimizing the Q and R weighting matrices, especially for complex and under-actuated systems such as the RDPIP.
System parameters are listed in Table 1 [7], and Table 2 presents the key parameter values of the RDPIP [7].
Parameters | Definition | Unit |
| First pendulum's mass | kg |
| First pendulum's length | m |
| Angle of i-th pendulum | rad |
| Inertia of i-th pendulum | kg·m² |
| Effective moment of inertia of pendulum | kg·m² |
| Angle of arm | rad |
| Length of arm | m |
| Arm's inertia | kg·m² |
| Torque of DC motor | Nm |
| Gravitational acceleration | m/s² |
| Coefficient of viscosity for i-th pendulum | Nms |
| Coefficient of viscosity of arm | Nms |
Parameter | Pendulum 1 | Pendulum 2 | Arm |
| 0.059 | 0.038 | Na |
| 0.0127 | 0.082 | Na |
| Na | Na | 0.51 |
| 0.0001526 | 0.082 | Na |
| Na | Na | 0.75 |
| 1.526×10⁻⁴ | 4.0693×10⁻⁴ | Na |
| Na | Na | 4.978 |
The Lagrangian is
| (1) |
The following is the Lagrange equation.
| (2) |
Kinetic energy is
| (3) |
where
,
, respectively, are the velocities of the first and second pendulums.
The kinetic energy is rewritten as
| (4) |
The system's potential energy is
| (5) |
The energy dissipation of the RDPIP is
| (6) |
The Lagrange equation is
| (7) |
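For orientation, the general pattern behind (1), (2), and (7) can be written in the following generic form. This is only a sketch of the standard Lagrangian formulation with a Rayleigh dissipation function. The symbols $T$, $V$, $D$, $q_k$, $b_k$, and $\tau_k$ are assumed notation and may differ from the symbols used in the numbered equations.

```latex
\begin{aligned}
L &= T - V, \\
D &= \tfrac{1}{2} \sum_{k=1}^{3} b_k \dot{q}_k^{2}, \\
\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{q}_k}\right)
  - \frac{\partial L}{\partial q_k}
  + \frac{\partial D}{\partial \dot{q}_k} &= \tau_k, \qquad k = 1, 2, 3.
\end{aligned}
```

Here $q_k$ stands for the three generalized coordinates (the arm angle and the two pendulum angles), $b_k$ for the viscous coefficients of Table 1, and $\tau_k$ for the generalized force, which under this assumption is the motor torque for the arm coordinate and zero for the two pendulum coordinates.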
A DC servo motor is used to drive the whole model. The relation between torque and voltage is
| (8) |
The parameters of the DC motor are listed in Table 3 [7].
Parameter | Unit | Value |
| Vs/rad | 0.064944 |
| Vs/rad | 0.064944 |
| | 6.835271 |
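A torque-voltage relation of this kind is often modeled with a static DC motor equation in which armature inductance is neglected. The sketch below follows that common model and is not necessarily the exact form of (8). The names `K_T`, `K_B`, `R_M`, and `motor_torque` are hypothetical, and treating 6.835271 as the armature resistance in ohms is an assumption, since its unit is not given in Table 3.

```python
# Hypothetical names; numeric values copied from Table 3.
K_T = 0.064944   # torque constant (numerically equal to the back-EMF constant)
K_B = 0.064944   # back-EMF constant, V*s/rad
R_M = 6.835271   # ASSUMED to be the armature resistance in ohm


def motor_torque(voltage, shaft_speed):
    """Static DC motor model with armature inductance neglected.

    voltage     : applied armature voltage in V
    shaft_speed : shaft angular velocity in rad/s
    returns     : shaft torque in N*m
    """
    current = (voltage - K_B * shaft_speed) / R_M  # Ohm's law minus back-EMF
    return K_T * current


if __name__ == "__main__":
    # Stall torque at the 12 V level seen in the control-signal plots.
    print(motor_torque(12.0, 0.0))
```

Under these assumptions the torque falls linearly with shaft speed, which is the damping-like term that a voltage-driven model adds to the arm dynamics.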
The dynamic equations of the RDPIP are
| (9) |
Where:
| (10) |
| (11) |
| (12) |
| (13) |
| (14) |
| (15) |
| (16) |
| (17) |
| (18) |
| (19) |
| (20) |
| (21) |
Equilibrium point (the upright position) is chosen as
| (22) |
| (23) |
The linear model at the equilibrium can be obtained by
| (24) |
Matrices A and B are calculated to be
| (25) |
Where:
| (26) |
| (27) |
| (28) |
Thence, it yields
| (26) |
| (27) |
The poles of the linearized system can be obtained in MATLAB using the command eig(A):
| (28) |
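The role of the pole computation can be illustrated on a much smaller system. The sketch below is not the RDPIP model: it uses a hypothetical two-state linearization of a single upright pendulum (length `LENGTH` is an invented value) and computes the eigenvalues in closed form. For the full model, a numerical routine such as `numpy.linalg.eigvals` would play the role of MATLAB's `eig`.

```python
import cmath


def eig_2x2(a11, a12, a21, a22):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


G = 9.81      # m/s^2
LENGTH = 0.2  # m, hypothetical pendulum length

# Upright single pendulum: d/dt [theta, theta_dot] = [[0, 1], [g/l, 0]] [theta, theta_dot]
poles = eig_2x2(0.0, 1.0, G / LENGTH, 0.0)

# Any pole with a positive real part means the open-loop equilibrium is unstable.
is_unstable = any(p.real > 0 for p in poles)
print(poles, is_unstable)
```

The toy system has one pole in the right half-plane, which is the same qualitative property that motivates feedback stabilization of the upright equilibrium.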
The nonlinear system is written as
| (29) |
where x is the state vector and u is the control signal.
The equilibrium point is chosen as:
| (30) |
When u = 0, the system is balanced. The system in (29) can be approximated by the linear form
| (31) |
where
|
Around the equilibrium point, the system is treated as approximately linear. Hence, the LQR control structure at the working point of the linear system in (31) can be shown as in Fig. 2.
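The linearization step can also be carried out numerically. The following is a minimal sketch, assuming a single-input system and central finite differences. The `pendulum` function is a hypothetical toy model standing in for the RDPIP dynamics in (9), which are not reproduced here, and `linearize` is an illustrative helper rather than the procedure used by the authors.

```python
import math


def pendulum(x, u, g=9.81, length=0.2):
    """Toy single inverted pendulum: x = [angle from upright, angular rate]."""
    theta, omega = x
    return [omega, (g / length) * math.sin(theta) + u]


def linearize(f, x_eq, u_eq, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x_eq, u_eq)."""
    n = len(x_eq)
    A = [[0.0] * n for _ in range(n)]
    B = [0.0] * n
    for j in range(n):
        x_plus = list(x_eq)
        x_minus = list(x_eq)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus, u_eq), f(x_minus, u_eq)
        for i in range(n):
            A[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    f_plus, f_minus = f(x_eq, u_eq + eps), f(x_eq, u_eq - eps)
    for i in range(n):
        B[i] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return A, B


A, B = linearize(pendulum, [0.0, 0.0], 0.0)
print(A, B)
```

For the toy model, the result approximates A = [[0, 1], [g/l, 0]] and B = [0, 1], which matches the analytic Jacobian at the upright position.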
The control signal is
| (32) |
The command to compute K is executed as follows:
| (33) |
where A and B are calculated in (31), and Q and R are the weight matrices selected as follows.
| (34) |
Matrices Q and R impose mutual restrictions, where Q is directly proportional to the system's anti-interference capability. Increasing Q enhances this capability and shortens the system's adjustment time. However, it also amplifies system oscillations and raises energy consumption. Conversely, increasing R reduces power consumption but extends the adjustment time. Therefore, the key to effective design lies in determining the appropriate weight matrices Q and R. Once these matrices are established, the state feedback matrix K is determined. However, selecting Q and R largely relies on experience and a trial-and-error approach in the LQR controller design process. This subjectivity can lead to an imperfect controller design, ultimately affecting control efficiency.
| (35) |
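The trade-off between the state weight and the input weight can be seen in closed form on a scalar plant. The sketch below is an illustration only, not the RDPIP design. It assumes a hypothetical first-order unstable plant x' = a·x + b·u with cost ∫(q·x² + r·u²)dt, for which the Riccati equation reduces to a quadratic in p.

```python
import math


def scalar_lqr(a, b, q, r):
    """LQR gain for x' = a*x + b*u with cost integral of q*x^2 + r*u^2."""
    root = math.sqrt(a * a + b * b * q / r)
    p = r * (a + root) / (b * b)   # positive solution of the scalar Riccati equation
    k = b * p / r                  # state feedback gain, u = -k*x
    pole = a - b * k               # closed-loop pole, equals -root
    return k, pole


# Hypothetical plant with an unstable open-loop pole at +2.
for q, r in [(1.0, 1.0), (100.0, 1.0), (1.0, 100.0)]:
    k, pole = scalar_lqr(a=2.0, b=1.0, q=q, r=r)
    print(f"q={q:6.1f} r={r:6.1f} -> gain {k:7.3f}, closed-loop pole {pole:8.3f}")
```

Raising q moves the closed-loop pole further left (faster response, larger gain and control effort), while raising r pulls the gain down and slows the response. This is the same qualitative behavior described for the matrix case.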
The BAT algorithm [19] is inspired by the echolocation behavior of microbats. A key distinguishing feature of this algorithm is its use of frequency tuning, making it the first of its kind to integrate optimization with computational intelligence. In this approach, each bat is represented by a velocity v_i and a position x_i at iteration t within a d-dimensional search space. The position serves as a solution vector associated with a specific objective function. Among the n bats in the population, the best solution x_* found during the iterative search is retained for reference. The algorithm follows these fundamental assumptions [17]-[23]:
Bats utilize sound wave echolocation to assess distances and can distinguish between food sources, prey, and obstacles.
Each bat moves randomly with velocity v_i at position x_i, adjusting the frequency f_i (or wavelength) of its emitted pulses as well as the pulse emission rate r based on the target's proximity. While loudness can vary in different ways, it is generally assumed to decrease from an initial maximum value A_0 to a minimum threshold A_min.
Many studies, for simplicity, do not incorporate ray tracing into this algorithm. Instead, they vary the frequency f or the wavelength λ to adapt to different applications, depending on factors such as ease of implementation [18]. For this problem, the LQR controller is used in conjunction with the BAT algorithm to determine the optimal Q and R parameter set.
First, we determine the parameters of algorithm and generate an initial BAT population. The following key parameters are initialized in Table 4:
Parameters | Definition |
| Number of BATs |
| Number of parameters to find |
| Maximum number of iterations |
| Initial loudness |
| Initial pulse emission rate |
| The search space limits |
| The frequency limits |
After defining the parameters of the BAT algorithm, we initialize the random initial position x_i, velocity v_i, and frequency f_i of each bat.
| (36) |
| (37) |
| (38) |
Next is to evaluate the quality of each initial solution in the overall scheme. If the solution violates constraints (i.e., any value ≤ 0), assign an infinite fitness value.
| (39) |
The weight matrices Q and R for LQR controller are constructed as follows to match the search process of BAT algorithm:
| (40) |
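The mapping from a bat position to the LQR weights, together with the positivity constraint, can be sketched as follows. This is an illustration under stated assumptions: Q is taken to be diagonal with one entry per state, R is a scalar, and the system has six states (three angles and their rates), so each position holds seven numbers. The function names `build_weights` and `penalized_fitness` are hypothetical, and the exact structure in (40) may differ.

```python
import math


def build_weights(solution, n_states=6):
    """Map a bat position to (Q, R); return None when a weight is not positive.

    Assumes Q is diagonal with one entry per state and R is a scalar,
    so `solution` holds n_states + 1 numbers.
    """
    if len(solution) != n_states + 1:
        raise ValueError("expected n_states + 1 parameters")
    if any(value <= 0 for value in solution):
        return None
    Q = [[solution[i] if i == j else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    R = [[solution[n_states]]]
    return Q, R


def penalized_fitness(solution, evaluate):
    """Infinite fitness for infeasible positions, `evaluate(Q, R)` otherwise."""
    weights = build_weights(solution)
    if weights is None:
        return math.inf
    return evaluate(*weights)


Q, R = build_weights([10.0, 5.0, 5.0, 1.0, 1.0, 1.0, 0.1])
print(Q[0][0], R)
```

Returning an infinite fitness for non-positive entries keeps infeasible positions from ever being selected as the best solution, without needing a separate repair step.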
The optimal controller K is determined by solving the Riccati equation using the matrices A, B, Q, and R.
| (41) |
Performance of controller is evaluated using the fitness function.
| (42) |
Where
is the error of 
The best-known solution x_* is updated by selecting the solution with the minimum fitness value.
| (43) |
The bat positions x_i, velocities v_i, and frequencies f_i are updated iteratively.
| (44) |
| (45) |
| (46) |
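In the commonly cited formulation of the BAT algorithm [19], these updates take the form below. This is given as a reference sketch of the standard form. The symbols $f_{\min}$, $f_{\max}$, $\beta$, and $x_*$ are assumed notation, and the numbered equations may use a different but equivalent ordering or notation.

```latex
\begin{aligned}
f_i &= f_{\min} + \left(f_{\max} - f_{\min}\right)\beta, \qquad \beta \sim \mathcal{U}(0, 1), \\
v_i^{t+1} &= v_i^{t} + \left(x_i^{t} - x_*\right) f_i, \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}.
\end{aligned}
```

The random factor $\beta$ is redrawn for every bat at every iteration, so bats far from $x_*$ take larger steps while bats close to it take smaller ones.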
To ensure adherence to the defined search space, boundary constraints are enforced on the position.
| (47) |
If a bat emits fewer pulses, it performs a local search near the best-known solution.
| (48) |
Where:
Next, the evaluation in (39) to (42) is repeated for the new position. The solution is updated if an improvement is observed.
| (49) |
The algorithm parameters are adjusted to decrease loudness and increase the pulse emission rate, thereby achieving a balance between exploration and exploitation.
Decrease loudness over iterations:
| (50) |
Increase pulse emission rate:
| (51) |
Where α and γ are tunable constants that can be adjusted to suit specific cases. The best solution found so far is retained. If a new solution outperforms the current global best, the global best is updated accordingly.
| (52) |
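The loudness and pulse-rate schedules are often written as a geometric decay and a saturating exponential. The sketch below assumes that standard form. The names `alpha` and `gamma`, their default value of 0.9, and the initial rate of 0.5 are illustrative choices rather than the settings used in this study.

```python
import math


def update_loudness(loudness, alpha=0.9):
    """Geometric decay A(t+1) = alpha * A(t), with 0 < alpha < 1."""
    return alpha * loudness


def pulse_rate(initial_rate, iteration, gamma=0.9):
    """Pulse emission rate r(t) = r0 * (1 - exp(-gamma * t)), with gamma > 0."""
    return initial_rate * (1.0 - math.exp(-gamma * iteration))


loudness = 1.0
for t in range(1, 6):
    loudness = update_loudness(loudness)
    print(t, round(loudness, 4), round(pulse_rate(0.5, t), 4))
```

As the iterations proceed, loudness shrinks toward zero and the pulse rate rises toward its initial value r0, which shifts the search from exploration toward exploitation.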
The entire BAT algorithm is repeated iteratively until the termination condition of the loop, the maximum number of iterations, is met.
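The overall loop can be summarized in a compact sketch. The code below is a generic BAT minimizer under assumed defaults (population size, frequency range, `alpha`, `gamma`), not the authors' implementation. The function name `bat_minimize` and all parameter names are hypothetical, and a simple sphere function stands in for the LQR fitness.

```python
import math
import random


def bat_minimize(fitness, lower, upper, n_bats=20, n_iter=100,
                 f_min=0.0, f_max=2.0, loudness0=1.0, rate0=0.5,
                 alpha=0.9, gamma=0.9, seed=0):
    """Minimize `fitness` over the box [lower, upper] with a basic BAT loop."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(x):
        # Enforce the search-space limits on every coordinate.
        return [min(max(x[d], lower[d]), upper[d]) for d in range(dim)]

    # Random initial positions, zero velocities, common loudness and pulse rate.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [fitness(x) for x in pos]
    loud = [loudness0] * n_bats
    rate = [rate0] * n_bats
    best_idx = min(range(n_bats), key=fit.__getitem__)
    best_x, best_f = list(pos[best_idx]), fit[best_idx]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Frequency tuning, then velocity and position updates.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [vel[i][d] + (pos[i][d] - best_x[d]) * freq
                      for d in range(dim)]
            cand = clip([pos[i][d] + vel[i][d] for d in range(dim)])

            # Local random walk around the best solution.
            if rng.random() > rate[i]:
                cand = clip([best_x[d] + rng.uniform(-1.0, 1.0) * mean_loud
                             for d in range(dim)])

            cand_f = fitness(cand)
            # Accept improvements with a probability tied to loudness.
            if cand_f <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_f
                loud[i] = alpha * loud[i]
                rate[i] = rate0 * (1.0 - math.exp(-gamma * t))

            if fit[i] < best_f:
                best_x, best_f = list(pos[i]), fit[i]
    return best_x, best_f


def sphere(x):
    """Stand-in objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = bat_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
print(best_x, best_f)
```

For controller tuning, `fitness` would instead map a position to Q and R, solve the Riccati equation for K, simulate the closed loop, and return the error-based cost. That evaluation is the expensive part of each iteration.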
Fig. 3 illustrates a block diagram of controller synthesis utilizing LQR method integrated with BAT algorithm.
Two parameter sets are considered: one obtained through the trial-and-error method and one derived from the BAT algorithm. The results from both approaches are then compared to evaluate their effectiveness. To synthesize the LQR controller, we first select the parameter determination scenario as follows: bring the pendulums from the initial position
to original equilibrium position
.
Parameter values for Q and R are specified as follows.
| (53) |
The control parameters are determined by the usual trial-and-error method, and the resulting K matrix is
| (54) |
The simulation results of the traditional LQR controller are shown in Fig. 4 to Fig. 7. Fig. 4 shows the position of the crank arm, with an initial oscillation amplitude of approximately 0.12 rad. The oscillations are damped and decrease gradually over time; however, complete stabilization is achieved only after approximately 30 seconds, indicating relatively slow convergence. In contrast, Fig. 5 shows the position of Pendulum 1, which has an initial oscillation amplitude of around ±0.01 rad and stabilizes fully within approximately 7 seconds, notably faster than the crank arm. Similarly, Fig. 6 presents the position of Pendulum 2, with an initial oscillation amplitude of approximately ±0.01 rad, comparable to Pendulum 1. However, Pendulum 2 requires slightly more time to stabilize, reaching full stability after approximately 8 seconds. These results highlight the differences in stabilization dynamics between the crank arm and the pendulums, with the latter exhibiting significantly faster convergence and more efficient damping.
Fig. 7 shows control signal U, which initially exhibits significant oscillations, reaching amplitudes of approximately ±12V. Over time, the signal demonstrates a gradual stabilization, with oscillations diminishing significantly after approximately 10 seconds, ultimately maintaining a stable and reduced level. This behavior indicates the effective convergence of the control system toward a steady-state condition.
The Q and R parameters of the LQR controller determined using the BAT algorithm are listed in Table 5.
| (54) |
Parameters | Values |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Control parameters are determined using BAT algorithm and the resulting K matrix is calculated as
| (54) |
Simulation results of LQR controller combined with BAT algorithm are shown in Fig. 8 to Fig. 11. Fig. 8 shows the position of the crank arm, showing initial oscillation amplitude of approximately 0.04 rad. The system exhibits a gradual reduction in oscillations, stabilizing after approximately 10 sec, with a smooth decrease in amplitude and no excessive overshoot, indicating an effective damping effect. Similarly, Fig. 9 shows the position of Pendulum 1, which demonstrates initial oscillation amplitude of around ±0.005 rad. The system converges rapidly, achieving full stabilization within 3 sec, reflecting highly effective control with minimal residual fluctuations. Fig. 10 presents the position of Pendulum 2, with an initial oscillation amplitude comparable to Pendulum 1 (~±0.005 rad). However, Pendulum 2 stabilizes fully after approximately 4 sec, slightly later than Pendulum 1, yet still demonstrating quick damping and a well-balanced control strategy. These results collectively highlight the efficiency and robustness of the control system in achieving rapid stabilization across all components.
Fig. 11 shows the control signal U, which initially fluctuates with an amplitude of approximately ±12 V. The signal gradually decreases and stabilizes within 6 seconds, demonstrating a smooth reduction in control effort. This behavior suggests efficient energy utilization while ensuring system stability. A comparison of convergence times across system components further highlights the effectiveness of the control strategy, with the crank arm stabilizing in approximately 10 sec, Pendulum 1 in 3 sec, and Pendulum 2 in 4 sec.
The simulation comparison of state variables of the system under traditional LQR and optimized LQR controllers is shown in Fig. 12 to Fig. 15. The details of these figures are listed in Table 6 to Table 8 for discussion.
Table 6 compares the settling times of the variables for the traditional LQR and the BAT-optimized LQR controllers. The LQR + BAT algorithm significantly improves stabilization time across all components: the crank arm stabilizes three times faster, while the pendulums converge in roughly half the time compared to the traditional LQR.
Component | LQR Traditional | LQR + BAT algorithm |
Arm | ~30 sec | ~10 sec |
Pendulum 1 | ~7 sec | ~3 sec |
Pendulum 2 | ~8 sec | ~4 sec |
Control Signal U | ~10 sec | ~6 sec |
Table 7 compares the oscillation amplitudes of the variables under the traditional LQR and BAT-optimized LQR controllers. The BAT algorithm significantly reduces the initial peak oscillations, resulting in a smoother and more controlled system response. Specifically, the crank arm's initial oscillation amplitude is reduced by 67%, demonstrating improved damping performance. Similarly, the pendulums exhibit a 50% reduction in oscillation amplitude, contributing to enhanced stability. While the control signal maintains the same peak amplitude, it stabilizes more rapidly, indicating both efficient energy utilization and faster convergence. These improvements collectively highlight the effectiveness of the BAT algorithm in optimizing system performance.
Component | LQR Traditional | LQR + BAT algorithm |
Arm | ~0.12 rad | ~0.04 rad |
Pendulum 1 | ±0.01 rad | ±0.005 rad |
Pendulum 2 | ±0.01 rad | ±0.005 rad |
Control Signal U | ±12V | ±12V |
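The percentages quoted for the amplitudes, and the corresponding reductions in settling time, follow from simple arithmetic on the tabulated values. The short sketch below recomputes them from the approximate figures in Table 6 and Table 7. The helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# (traditional LQR, LQR + BAT) pairs copied from Table 6 (seconds).
settling = {"Arm": (30, 10), "Pendulum 1": (7, 3),
            "Pendulum 2": (8, 4), "Control signal U": (10, 6)}
# (traditional LQR, LQR + BAT) pairs copied from Table 7 (radians).
amplitude = {"Arm": (0.12, 0.04), "Pendulum 1": (0.01, 0.005),
             "Pendulum 2": (0.01, 0.005)}

for name, (before, after) in settling.items():
    print(f"{name}: settling time reduced by {reduction_percent(before, after):.1f} %")
for name, (before, after) in amplitude.items():
    print(f"{name}: amplitude reduced by {reduction_percent(before, after):.1f} %")
```

Because the tabulated values are themselves approximate readings from the plots, the resulting percentages should be read as rounded indications rather than exact figures.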
To further validate the performance improvement achieved by the BAT-optimized LQR controller, a statistical analysis was conducted based on multiple simulation runs. Specifically, we computed the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of the angular positions of the crank arm and both pendulums. Table 8 presents the comparative results between the conventional LQR and the BAT-based LQR controllers.
In Table 8, statistical data further corroborate the visual observations. The BAT-based LQR consistently reduces both the MAE and RMSE values across all components, indicating more precise tracking and better damping characteristics.
Component | Controller Type | MAE (Rad) | RMSE (Rad) |
Arm | LQR | 0.0385 | 0.0497 |
Arm | LQR + BAT | 0.0127 | 0.0181 |
Pendulum 1 | LQR | 0.0096 | 0.0124 |
Pendulum 1 | LQR + BAT | 0.0042 | 0.0055 |
Pendulum 2 | LQR | 0.0103 | 0.0130 |
Pendulum 2 | LQR + BAT | 0.0049 | 0.0061 |
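For reference, the two metrics can be computed from a sampled error signal as sketched below. The signal used here is a hypothetical decaying oscillation, not data from the simulations, so the printed numbers are not comparable with Table 8.

```python
import math


def mae(errors):
    """Mean Absolute Error of a sequence of error samples."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root Mean Square Error of a sequence of error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical decaying oscillation standing in for a simulated angle error (rad).
dt = 0.01
samples = [0.04 * math.exp(-0.5 * k * dt) * math.cos(3.0 * k * dt)
           for k in range(1000)]

print(mae(samples), rmse(samples))
```

RMSE is never smaller than MAE for the same signal and weights large excursions more heavily, which is consistent with every RMSE entry in Table 8 exceeding the corresponding MAE entry.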
However, despite the improved performance, the proposed approach also introduces certain limitations that should be acknowledged. One primary concern is the computational cost associated with the BAT algorithm. Since the optimization requires multiple iterations of Riccati equation solving and dynamic simulations per candidate solution, the total computation time may become significant, especially for high-dimensional systems or real-time applications.
Furthermore, the current implementation assumes an offline optimization setting, where the optimal gain matrix is computed prior to deployment. Applying this approach to real-time systems would require additional considerations, such as real-time feasibility of matrix updates, computational load on embedded hardware, and stability under model uncertainties.
These aspects highlight the need for future work focused on hardware-in-the-loop testing, algorithmic simplification, or hybrid methods that combine fast-converging techniques with the global search capacity of BAT to balance performance and real-time capability.
This study proposed a hybrid control design that combines LQR control with the BAT algorithm to stabilize RDPIP. By leveraging the global search capability of the BAT algorithm, the proposed method effectively optimized Q and R weighting matrices, overcoming limitations of conventional trial-and-error approaches. The simulation results demonstrated significant improvements in stabilization time, oscillation suppression, and control efficiency compared to the traditional LQR controller.
In addition to the simulation-based validation, statistical metrics such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) confirmed the superior tracking accuracy of the BAT-optimized controller. These findings highlight the practical potential of integrating metaheuristic optimization with classical control strategies for under-actuated and nonlinear systems.
For future work, several concrete directions are recommended. First, real-time implementation of the proposed method on embedded hardware or robotic platforms should be investigated to evaluate its practical feasibility and robustness under physical uncertainties and disturbances. Second, comparative studies with other metaheuristic algorithms such as PSO, GA, or newer variants like Grey Wolf Optimizer (GWO) would further validate the effectiveness of BAT in this context. Additionally, developing a simplified or adaptive version of the BAT algorithm with lower computational cost could enable online or real-time tuning of controller parameters.
Finally, the scalability and generalizability of this approach should be explored by applying it to other complex, high-order, or multi-input-multi-output (MIMO) control systems beyond the RDPIP, such as aerial vehicles, robotic manipulators, or balancing robots. Such extensions would demonstrate the broader applicability of the proposed control framework in diverse domains.
This paper is part of project SV2025-157, funded by Ho Chi Minh City University of Technology and Education (HCMUTE). The authors gratefully acknowledge this support.