By Nidhi Dhull | Reviewed by Susha Cheriyedath, M.Sc. | Jun 25, 2024
A recent article published in Sustainability presented a single-shot deep neural network (DNN) model that performs proximity and relationship detection simultaneously, aiming to address the risk of unwanted forcible contact between construction robots and nearby workers.
Background
The construction industry is increasingly employing robotic automation and digitization (or digital twinning) to improve capital productivity and to address growing labor shortages. However, a critical safety issue remains when deploying robots alongside field workers due to the risk of forcible collisions.
The industry’s response to the issue of unwanted collisions is mostly limited to proximity monitoring. However, identifying a hazard solely based on proximity is sub-optimal. Accurately identifying a potential hazard also requires considering the relationship of the associated entities to determine whether or not they are co-working, and therefore permitted to be close.
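To make the distinction concrete, a hazard decision should combine both signals rather than rely on distance alone. The sketch below is purely illustrative and is not the paper's implementation; the relationship labels and the 2 m threshold are assumptions.

```python
# Illustrative hazard rule (assumed labels and threshold, not from the paper):
# flag a hazard only when entities are close AND not permitted to co-work.

CO_WORKING_RELATIONS = {"working_with", "handing_over", "guiding"}  # hypothetical labels
PROXIMITY_THRESHOLD_M = 2.0  # hypothetical safety distance in meters

def is_hazard(distance_m: float, relationship: str) -> bool:
    """Return True when entities are too close and not co-working."""
    too_close = distance_m < PROXIMITY_THRESHOLD_M
    permitted = relationship in CO_WORKING_RELATIONS
    return too_close and not permitted

# A worker 1.5 m from a robot that is merely passing by triggers an alert,
# while the same distance during collaboration does not.
print(is_hazard(1.5, "passing_by"))    # True
print(is_hazard(1.5, "working_with"))  # False
```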
Thus, this study explored the potential of DNN-based single-shot visual relationship detection for inferring relationships among construction objects directly from a site image. When equipped with a well-matched architecture, a DNN can integrate local and global features into a single composite feature map and use it to detect intuitive relationships, much like the human vision system.
Methods
The research employed three DNN models tailored to different complexity levels in visual relationship detection, exploring their effectiveness in identifying relationships between construction objects in images. The Pixel2Graph DNN architecture, known for its capability in multi-scale feature abstraction and relationship detection, was utilized. Initial training used the Visual Genome dataset, followed by fine-tuning with construction-specific data. The models were developed and tested using Python 3, with Recall@X serving as the evaluation metric.
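Recall@X measures the fraction of ground-truth relationships recovered among the top-X ranked predictions. A minimal sketch of the metric, with assumed triplet and score formats, might look like the following:

```python
# Minimal Recall@K sketch. Predictions are ((subject, predicate, object), score)
# pairs; the exact formats used in the paper are assumptions here.

def recall_at_k(predictions, ground_truth, k=5):
    """Fraction of ground-truth triplets found among the top-k predictions."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)[:k]
    top_k = {triplet for triplet, _score in ranked}
    if not ground_truth:
        return 0.0
    return sum(1 for gt in ground_truth if gt in top_k) / len(ground_truth)

preds = [(("worker", "operates", "excavator"), 0.92),
         (("worker", "near", "rebar"), 0.75),
         (("robot", "carries", "panel"), 0.60)]
gt = [("worker", "operates", "excavator"), ("robot", "carries", "panel")]
print(recall_at_k(preds, gt, k=5))  # 1.0
```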
Three models were tested, each covering a different subset of the detection pipeline (a schematic of their inputs and outputs follows this list):
- Model #1 (Only-Rel): This basic model detected relationships using provided object bounding boxes and classes.
- Model #2 (Cla-Rel): This model handled both object classification and relationship detection.
- Model #3 (Loc-Cla-Rel): The most complex model, performing object localization, classification, and relationship detection.
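The three variants differ only in which inputs are given and which outputs must be inferred. The stubs below capture those contracts; the function names and type aliases are assumptions for illustration, not the paper's API.

```python
# Input/output contracts of the three models (illustrative stubs only).
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height) bounding box
Triplet = Tuple[str, str, str]    # (subject, predicate, object) relationship

def only_rel(image, boxes: List[Box], classes: List[str]) -> List[Triplet]:
    """Model #1: boxes and classes are given; predict relationships only."""
    ...

def cla_rel(image, boxes: List[Box]) -> Tuple[List[str], List[Triplet]]:
    """Model #2: boxes are given; predict classes and relationships."""
    ...

def loc_cla_rel(image) -> Tuple[List[Box], List[str], List[Triplet]]:
    """Model #3: predict boxes, classes, and relationships from the image alone."""
    ...
```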
Testing involved new construction images sourced from ongoing site videos and YouTube, with image annotation facilitated through Amazon Mechanical Turk.
Results and Discussion
The three DNN models were evaluated with the aim of improving safety in the construction sector by enabling robots and humans to coexist safely. As their performance metrics show, the models demonstrated varying levels of efficacy.
- Only-Rel Model: This basic model recognized relationships between predefined entities in an image, where both the bounding boxes and classes of these entities were given. It showed remarkable consistency between the fine-tuning stage, with a Recall@5 of 90.89 %, and the test performance, achieving 90.63 %. The negligible discrepancy between these figures indicates a robust model that generalizes well, with no sign of overfitting, underscoring its reliability for deployment in real-world scenarios.
- Cla-Rel Model: This intermediate model was responsible for deducing both the classification and relationships of entities based on provided bounding boxes. Although it performed well during the fine-tuning phase with a Recall@5 of 90.54 %, its performance dropped significantly to 72.02 % on the test dataset. This considerable decline highlights potential challenges in accurately classifying objects under variable real-world conditions, which in turn affects the accuracy of relationship detection. The disparity suggests that the model may benefit from enhanced training techniques or data to better handle the complexities introduced by varied construction environments.
- Loc-Cla-Rel Model: The most complex model aimed to perform object detection (both localization and classification) and relationship detection within a single integrated framework. Despite achieving a high Recall@5 of 92.96 % during fine-tuning, its performance markedly decreased to 66.28 % in the testing phase. This drop is significant and indicates difficulties in simultaneously managing multiple tasks, particularly the integration of object detection with relationship inference. The model's intricate architecture, although capable of learning detailed contextual nuances of construction sites, appears to struggle when applied to the unpredictability of new, unstructured test data.
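The gap between fine-tuning and test performance grows with task complexity. The snippet below simply tabulates the Recall@5 figures reported above and computes each model's drop in percentage points:

```python
# Recall@5 scores reported in the article (fine-tuning vs. new test images).
results = {
    "Only-Rel":    {"fine_tuning": 90.89, "test": 90.63},
    "Cla-Rel":     {"fine_tuning": 90.54, "test": 72.02},
    "Loc-Cla-Rel": {"fine_tuning": 92.96, "test": 66.28},
}

for model, r in results.items():
    drop = r["fine_tuning"] - r["test"]
    print(f"{model}: {r['fine_tuning']} % -> {r['test']} % (drop: {drop:.2f} points)")

# Only-Rel:    90.89 % -> 90.63 % (drop: 0.26 points)
# Cla-Rel:     90.54 % -> 72.02 % (drop: 18.52 points)
# Loc-Cla-Rel: 92.96 % -> 66.28 % (drop: 26.68 points)
```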
The differential performance of these models on test data, particularly the decrease in accuracy with increased task complexity, underscores the critical need for further optimization. Possible directions include refining the network architecture or employing more sophisticated training regimes that better simulate the diverse conditions of construction sites. Additionally, increasing the variety and volume of training data might enhance the models' ability to generalize across different scenarios. Such improvements are essential for the practical deployment of these technologies to ensure the safety of construction workers alongside robotic systems.
Conclusion
While construction workers continue to play a vital role in automated environments, ensuring their safety in the presence of robots is paramount. The study introduces a novel approach using DNNs for single-shot visual relationship detection, which mimics human-like perception in identifying potential hazards from a single image. This technology could complement existing proximity monitoring systems without the need for additional hardware.
Despite the promising initial results, the test performances indicate a need for further refinement. Future improvements could involve more extensive training data and enhancements in DNN architectures and training methodologies to better prepare the models for real-world applications.
Journal Reference
Kim, D., Goyal, A., Lee, S., Kamat, V. R., & Liu, M. (2024). Single-Shot Visual Relationship Detection for the Accurate Identification of Contact-Driven Hazards in Sustainable Digitized Construction. Sustainability, 16(12), 5058. https://doi.org/10.3390/su16125058