By Nidhi Dhull
Reviewed by Susha Cheriyedath, M.Sc.
Oct 2 2024
A recent article published in Buildings used natural language processing (NLP) and knowledge graph (KG) modeling techniques to build a recommendation framework for construction safety guidelines. Unstructured safety texts were transformed into a structured, interconnected KG, and the guideline items in it were then ranked by relevance with the PageRank algorithm and grouped with the Louvain clustering algorithm.
Background
The construction industry is large and labor-intensive, contributing significantly to the economy but also facing high risks due to frequent accidents.
Countries like the United States, Australia, and the United Kingdom enforce advanced safety practices to reduce these risks. Government agencies in these nations regularly publish standardized safety guidelines to improve safety across the industry. However, these guidelines are often fragmented and presented as lengthy PDF documents, making them difficult for professionals to apply effectively in their daily work.
To address this challenge, there is a need for more streamlined and accessible ways to distribute and use these guidelines. This study proposes integrating the fragmented safety management guidelines into a unified, structured knowledge graph (KG) using natural language processing (NLP) and KG modeling techniques, ensuring more practical and effective use in enhancing construction safety.
Methods
The proposed KG modeling and recommendation framework involved three primary steps: preprocessing construction safety guidelines, creating KG models, and applying ranking and clustering algorithms to identify the most critical and relevant safety items.
Text data was extracted from 86 construction safety guideline documents provided by the Korea Occupational Safety and Health Agency (KOSHA). These PDFs, classified by work type (e.g., building demolition, bridge construction), were converted into a CSV file of 5,988 rows with category, title, and content columns, with the content split into individual statements.
Using the Soynlp library, the content was tokenized into L-tokens (nouns) and R-tokens (particles, conjunctions, etc.). The L-tokens, representing core concepts and technical terms, were used to build the knowledge graph (KG). To quantify their relevance, the TF-IDF weight function was applied to rank tokens within the sentences.
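The TF-IDF weighting step can be illustrated with a minimal Python sketch. The article does not state which TF-IDF variant the researchers used, so this assumes the plain form (term frequency multiplied by the log of inverse document frequency). The function name tf_idf and the sample statements are hypothetical, with English words standing in for the Korean L-tokens.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per statement.

    docs is a list of token lists. Assumed variant (not confirmed by the
    article): weight = (count / statement length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of statements that contain each token.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


# Hypothetical, already-tokenized statements (stand-ins for L-tokens).
statements = [
    ["scaffold", "guardrail", "fall"],
    ["scaffold", "wind", "pressure"],
    ["demolition", "dust", "fall"],
]
weights = tf_idf(statements)
# Tokens of the first statement, most distinctive first.
ranked = sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)
```

In this toy data, a token that appears in only one statement receives a higher weight than one shared by several statements, which is the property that makes TF-IDF useful for picking out distinctive keywords.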
The KG was constructed using Neo4j, leveraging its Cypher language to connect guideline items based on document hierarchy and shared keywords. To enhance insights, PageRank and Louvain Clustering algorithms were applied to the KG, identifying the most relevant guideline items for practical use.
Results and Discussion
The graph generation procedure produced 669 category nodes, 5,988 content nodes, and 102,923 index nodes. Preprocessing established connections (termed Relate edges) between content nodes, forming a web of relationships among the information entities. The weight of each Relate edge was determined by the number of shared indexes.
After preprocessing and constructing the graph, the network included 217,220 Include edges, representing associations or inclusions between different elements, and 13,301,903 Relate edges, reflecting interconnectedness between content nodes based on shared indexes.
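The rule that a Relate edge is weighted by the number of shared indexes can be sketched in plain Python. This is an in-memory stand-in for illustration only and is not the Cypher the researchers ran in Neo4j. The function name build_relate_edges, the content ids, and the keywords are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_relate_edges(content_to_indexes):
    """Map each unordered pair of content ids to their shared-index count."""
    # Invert the mapping: index keyword -> content ids that include it.
    index_to_contents = defaultdict(set)
    for content_id, indexes in content_to_indexes.items():
        for index in set(indexes):
            index_to_contents[index].add(content_id)

    # Every pair of content items under the same index gains one unit of weight.
    relate = defaultdict(int)
    for contents in index_to_contents.values():
        for a, b in combinations(sorted(contents), 2):
            relate[(a, b)] += 1
    return dict(relate)


# Hypothetical content nodes and the index keywords they include.
content_to_indexes = {
    "c1": ["scaffold", "fall", "guardrail"],
    "c2": ["scaffold", "fall", "wind"],
    "c3": ["demolition", "fall"],
}
edges = build_relate_edges(content_to_indexes)
```

Because a keyword shared by k content items contributes k(k-1)/2 pairs, common keywords generate many edges, which is consistent with the Relate edge count far exceeding the number of content nodes.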
The PageRank algorithm was applied to the projected graph extracted from the KOSHA guidelines using the keyword "scaffolding." This algorithm assessed the importance and connectivity of each content item through shared keyword counts. In parallel, the Louvain algorithm identified clusters with high modularity, representing groups of highly related content.
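To show how PageRank can score content items from shared-keyword counts, here is a minimal power-iteration sketch on a weighted, undirected graph. It is not the study's implementation: the damping factor of 0.85 is a common default rather than a value reported in the article, and the function name pagerank and the toy edges are hypothetical.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank on an undirected graph given as {(u, v): weight}.

    Assumes positive weights, so every node has at least one outgoing link.
    """
    # Build a symmetric adjacency map from the undirected edge list.
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    nodes = sorted(neighbors)
    n = len(nodes)
    # Weighted degree of each node, used to normalize outgoing rank.
    strength = {u: sum(neighbors[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            for v, w in neighbors[u].items():
                # u passes rank to v in proportion to the edge weight.
                new[v] += damping * rank[u] * w / strength[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical content items; weights stand for shared-keyword counts.
toy_edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 1}
scores = pagerank(toy_edges)
top_item = max(scores, key=scores.get)
```

In the toy graph, item "b" is ranked highest because it carries the most shared-keyword weight, mirroring how heavily connected guideline items rise to the top of the recommendations.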
Out of the 26 content items extracted from the graph, 13 were selected based on their PageRank values and the clustering results from the Louvain algorithm. The highest PageRank values corresponded to critical safety topics, such as falls, drops, and scaffolding—topics essential in multiple safety contexts. Furthermore, content within the same clusters generally shared themes like fall prevention, scaffolding, and structural integrity, demonstrating thematic cohesion.
Notably, the content items with high PageRank values focused on significant worksite safety aspects, including preventing tripping hazards around scaffolding, minimizing wind pressure risks, and ensuring adherence to safety protocols during scaffold ascents. Users could therefore efficiently identify the safety measures that are most interconnected with other guidelines and focus on the most critical safety topics.
The Louvain algorithm facilitated the organization of safety content by grouping related guidelines, such as fall protection and scaffolding safety. This clustering allows users to navigate through related safety topics easily, enhancing comprehensive safety management by addressing related safety measures within their respective clusters.
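Louvain clustering searches for a partition that maximizes modularity, a score comparing the edge weight inside each cluster with what would be expected by chance. The sketch below only evaluates that score for a given partition and does not perform the Louvain search itself. The function name modularity and the toy graph are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition on a weighted undirected graph.

    edges: {(u, v): weight}, each undirected edge listed once, no self-loops.
    community: {node: community label}.
    """
    total = sum(edges.values())  # m, the total edge weight
    internal = {}  # edge weight falling inside each community
    degree = {}    # summed weighted degree of each community
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    # Q = sum over communities of (internal / m) - (degree / 2m)^2
    return sum(
        internal.get(c, 0.0) / total - (degree[c] / (2.0 * total)) ** 2
        for c in degree
    )


# Toy graph: two triangles of related items joined by one bridge edge.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(toy_edges, split)
q_merged = modularity(toy_edges, merged)
```

Splitting the toy graph into its two triangles scores higher than lumping all items together, which is the kind of grouping a modularity-maximizing method such as Louvain is designed to prefer.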
Conclusion
Overall, the researchers successfully developed a KG-based method utilizing NLP to organize and systematize construction safety guidelines. The application of the PageRank and Louvain clustering algorithms enabled the efficient extraction of key information from the graph database, facilitating the retrieval of safety-related data pertinent to construction trades.
While the proposed recommendation system offers valuable insights, further rigorous field testing is necessary to evaluate its practical applicability. The researchers recommend enhancing the system to account for contextual factors, such as the specific types of construction tasks and their conditions.
Journal Reference
Lee, J., & Ahn, S. (2024). PageRank Algorithm-Based Recommendation System for Construction Safety Guidelines. Buildings, 14(10), 3041. DOI: 10.3390/buildings14103041, https://www.mdpi.com/2075-5309/14/10/3041
Article Revisions
- Oct 3 2024 - Revised sentence structure, word choice, punctuation, and clarity to improve readability and coherence.