Public Library of Science

Architecture of the GLU.

figure
posted on 2025-04-24, 17:26 authored by Qingduan Meng, Jiadong Guo, Hui Zhang, Yaoqi Zhou, Xiaoling Zhang

Computer vision holds considerable potential for crop disease classification, but the complex texture and shape characteristics of crop diseases make the task challenging. To address this, the paper proposes a dual-branch model for crop disease classification that combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The convolutional branch captures local features, while the Transformer branch captures global features. A learnable parameter performs a linear weighted fusion of the two feature types. An Aggregated Local Perceptive Feed Forward Layer (ALP-FFN) introduces locality into the Transformer encoder to strengthen the model's representation capability. The paper also builds a lightweight Transformer block from ALP-FFN and a linear self-attention mechanism to reduce the model's parameter count and computational cost. The proposed model achieves a classification accuracy of 99.71% on the PlantVillage dataset with only 4.9M parameters and 0.62G FLOPs, surpassing the state-of-the-art TNT-S model (accuracy: 99.11%, parameters: 23.31M, FLOPs: 4.85G) by 0.60 percentage points. On the Potato Leaf dataset, the model attains 98.78% classification accuracy, outperforming the ResNet-18 model (accuracy: 98.05%, parameters: 11.18M, FLOPs: 1.82G) by 0.73 percentage points. The proposed model combines the advantages of CNN and ViT while remaining lightweight, providing an effective method for the precise identification of crop diseases.
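
The linear weighted fusion of the two branches can be illustrated with a minimal sketch. The abstract states only that a learnable parameter weights the local and global features; the exact form is not given here. The sketch below assumes one common choice, a single scalar `w` passed through a sigmoid so that the mixing weight `alpha` stays in (0, 1). The names `fuse`, `local_feats`, `global_feats`, and `w` are hypothetical, and in a real model `w` would be updated by backpropagation rather than fixed by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(local_feats, global_feats, w):
    """Linearly blend two equal-length feature vectors.

    Assumption (not from the source): the blend is
    alpha * local + (1 - alpha) * global, with alpha = sigmoid(w).
    """
    if len(local_feats) != len(global_feats):
        raise ValueError("feature vectors must have the same length")
    alpha = sigmoid(w)
    return [alpha * l + (1.0 - alpha) * g
            for l, g in zip(local_feats, global_feats)]


# Toy features standing in for the CNN (local) and Transformer (global) outputs.
local_feats = [1.0, 2.0, 3.0]
global_feats = [3.0, 2.0, 1.0]

# With w = 0 the sigmoid gives alpha = 0.5, so both branches contribute equally.
fused = fuse(local_feats, global_feats, w=0.0)
print(fused)
```

A large positive `w` pushes the output toward the convolutional features and a large negative `w` toward the Transformer features, which is what lets training decide how much each branch contributes.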
