Land use and land cover (LULC) change information is often embedded in unstructured scientific texts, which impedes large-scale analysis of land-use trends. This paper presents a pipeline to automatically extract LULC entities and their relationships from such texts, enabling the construction of a structured knowledge graph of land-use change events. The proposed pipeline integrates three key components: sentence classification, named entity recognition (NER), and rule-based relation extraction.
First, a transformer-based sentence classifier is fine-tuned to identify sentences that describe LULC changes, achieving an accuracy of 0.7531, an F1-score of 0.6552, and a recall of 0.791 on a domain-specific corpus. Next, a domain-adapted NER model recognizes key LULC concepts (e.g., land-cover types, change processes, quantitative changes, and locations) within those sentences. Finally, a rule-based relation extraction module, augmented by a large language model (Meta-Llama-3-70B), links the extracted entities into meaningful relations.
We compare zero-shot and few-shot prompting strategies for the LLM: few-shot prompting yields more valid relations, and relations of higher quality, than zero-shot prompting. The results demonstrate the effectiveness of our pipeline in transforming textual descriptions into a coherent LULC knowledge base, which can support automated trend analysis across regions and time. We also discuss the challenges encountered, including non-standardized LULC terminology, limited labeled data, and underrepresented relation types, and outline future directions to address them.
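
To make the difference between the two prompting strategies concrete, the following is a minimal sketch of how a relation-extraction prompt could be assembled with and without worked examples. It is an illustration under stated assumptions, not the prompt used in this work: the instruction text, the `converted_to` relation label, the example sentences, and the helper names `format_case` and `build_prompt` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]                   # (head, relation, tail)
Example = Tuple[str, List[str], List[Triple]]   # (sentence, entities, gold triples)

# Hypothetical instruction; the wording of the actual prompt is an assumption.
INSTRUCTION = (
    "Extract relations between the listed entities in the sentence. "
    "Answer with one (head, relation, tail) triple per line."
)


def format_case(sentence: str, entities: Sequence[str],
                triples: Sequence[Triple] = None) -> str:
    """Render one case; gold triples are included only for worked examples."""
    lines = [
        f"Sentence: {sentence}",
        f"Entities: {'; '.join(entities)}",
        "Relations:",
    ]
    if triples is not None:
        lines.extend(f"({h}, {r}, {t})" for h, r, t in triples)
    return "\n".join(lines)


def build_prompt(sentence: str, entities: Sequence[str],
                 examples: Sequence[Example] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    blocks = [INSTRUCTION]
    blocks += [format_case(s, e, t) for s, e, t in examples]
    blocks.append(format_case(sentence, entities))  # the query, left open
    return "\n\n".join(blocks)


# Hypothetical worked example used only for the few-shot variant.
demo_examples = [
    (
        "Forest was converted to cropland in the basin.",
        ["forest", "cropland", "basin"],
        [("forest", "converted_to", "cropland")],
    )
]

query = ("Grassland declined as urban areas expanded.",
         ["grassland", "urban areas"])

zero_shot_prompt = build_prompt(*query)
few_shot_prompt = build_prompt(*query, examples=demo_examples)
```

The only difference between the two prompts is the block of solved cases placed before the query, so the same function can serve both conditions in a comparison. The resulting string would then be passed to whichever LLM interface is in use.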


