To run this code, download the file “AdaptationElements” available at: https://doi.org/10.7910/DVN/VK3CP9. Then adjust the file path below to point to the location where you saved the “AdaptationElements” file.
This section classifies climate-related hazards mentioned in the ElementText column of the adaptdata dataset using regex-based keyword matching, aligned with the protocol’s definitions.
Column Creation
A new column HazardType is added to store the hazard category for each entry. This column is positioned after ElementText.
Keyword Mapping
Hazard categories are mapped to regex patterns that represent keywords found in the protocol definitions. These keywords are matched case-insensitively within ElementText.
Tagging
The script loops over all hazard patterns and tags any matching rows
accordingly.
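The three steps above can be sketched as follows. This is a minimal illustration in base R, not the script used for the reported results: `adaptdata_demo` and `hazard_patterns_demo` are hypothetical stand-ins, the patterns are a small subset of the keyword table, and the "first matching category wins" rule is an assumption, since the text does not say how rows matching several categories are handled.

```r
# Toy data standing in for adaptdata (hypothetical rows).
adaptdata_demo <- data.frame(
  Element     = c("Hazard", "Hazard", "Hazard"),
  ElementText = c("Prolonged DROUGHTS and dry spells",
                  "Flash floods along the river",
                  "Loss of traditional knowledge"),
  Country     = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical subset of the hazard dictionary: category name -> regex.
hazard_patterns_demo <- c(
  "Drought" = "drought|dry spell|aridity",
  "Flood"   = "flood|inundation"
)

# Column creation: add HazardType and move it right after ElementText.
adaptdata_demo$HazardType <- NA_character_
cols <- names(adaptdata_demo)
pos  <- match("ElementText", cols)
rest <- setdiff(cols[-seq_len(pos)], "HazardType")
adaptdata_demo <- adaptdata_demo[, c(cols[seq_len(pos)], "HazardType", rest)]

# Tagging: loop over the patterns, matching case-insensitively.
# Assumption: a row keeps the first category that matches it.
for (cat_name in names(hazard_patterns_demo)) {
  hit <- grepl(hazard_patterns_demo[[cat_name]], adaptdata_demo$ElementText,
               ignore.case = TRUE)
  untagged <- is.na(adaptdata_demo$HazardType)
  adaptdata_demo$HazardType[hit & untagged] <- cat_name
}

print(adaptdata_demo)
```

Rows that match no pattern keep NA in HazardType, which is what the coverage figures further down count as unassigned.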
Hazard Category | Keywords Used |
---|---|
Extreme temperature | heat wave, heatwave, excessive heat, high temperature, extreme temperature, extreme heat, extremely low, cold waves, cold wave, snow, ice, frost, freeze, severe winter, maximum and minimum, warmer, higher occurrence of hot, glacier, snowfall, evapo, extreme weather, heat stress, hot days, hot nights |
Storm | storm, tropical storm, cyclone, cyclones, Cyclonic activity, typhoon, typhoons, hail, lightning, thunderstorm, heavy rain, windstorm, sand storm, dust storm, tornado, violent rains, torrential, strong winds |
Drought | drought, drought cycle, prolonged droughts, drougths, dry spell, dry days, aridity |
Wildfire | wildfire, wild fire, forest fire, land fire, bush fire, pasture fire, fire |
Landslide | landslide, land slide |
Flood | flood, coastal flood, riverine flood, flash flood, ice jam flood, inundation |
Change in temperature | change in temperature, alteration in average temperature, temperature change, temperature rise, temperature drop, consistent change in temperature, rise in temperature, increase in temperature, increases in temperature, temperature increase, increasing temperature, increased temperature, increase temperatures, higher temperatures, average temperature, annual temperature, annual mean temperature, annual air temperature, minimum temperatures, maximum temperatures, number of warm days, rising temperatures, warming temperatures |
Change in precipitation | change in precipitation, alteration in precipitation, precipitation patterns, shift in timing, shift in amount, shift in intensity, shift in frequency, rainfall, precipitation, distribution of rains, disruption of rains, fluctuation of rains, shorter rain, earlier and ending later |
Salinization | salinization, salt content, saltwater, salt water, salinity, increase in salt content |
Land degradation | land degradation, decline in land quality, decline in land health, pasture degradation, desertification, loss of organic matter, degradation of land, erosion, soil erosion |
Sea level rise | sea level rise, sea level, increase in sea level, coastal flooding, coastal erosion, beach loss, coastline retreat, submersion, water mass |
Sea temperature | sea temperature, sea surface temperature, change in sea temperature, ocean temperature, water temperature, surface temperature, seawater surface |
Ocean acidification | ocean acidification, reduction in pH, acidity, acidification, acidic, coral |
Pest and disease | pest, disease, epidemic, infestation, invasion, insect infestation, vector-borne, biological event, invasive species |
## Hazard rows assigned a category: 1478 out of 1781
## **Coverage Rate:** 83 %
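For reference, a coverage figure of this kind can be computed as the share of rows with a non-missing label. The helper below is a hypothetical sketch (the name `coverage_summary` and the demo vector are not from the original script); the last line only checks that the counts reported above round to the stated rate.

```r
# Summarise how many entries received a label (NA or "" counts as untagged).
coverage_summary <- function(labels) {
  tagged <- sum(!is.na(labels) & nzchar(labels))
  total  <- length(labels)
  list(tagged = tagged, total = total, pct = round(100 * tagged / total))
}

# Hypothetical label vector: 3 of 4 rows tagged.
demo_labels <- c("Flood", NA, "Drought", "Storm")
res <- coverage_summary(demo_labels)
cat("Tagged", res$tagged, "out of", res$total, "\n")
cat("Coverage Rate:", res$pct, "%\n")

# The reported counts (1478 of 1781) give the same rounded rate of 83.
reported_pct <- round(100 * 1478 / 1781)
```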
by regions
We assign a SystemType to each row of adaptdata where the element is “System at risk”. This classification is based on the official reporting protocol definitions, which describe different climate-sensitive systems affected by hazards (e.g., agriculture, biodiversity, infrastructure).
To do this, we use a dictionary of keyword patterns that match text found in the ElementText column. These keywords are derived from the protocol and expanded where needed to capture variations in language.
Only rows where the element is “System at risk” are considered.
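A minimal sketch of this restriction, in base R with hypothetical data and a three-entry subset of the dictionary: the row filter is combined with the regex match so that text in other elements is never tagged, even when it contains a system keyword. As with hazards, keeping the first matching category is an assumption of the sketch.

```r
# Hypothetical rows; only the first and third are "System at risk".
systems_demo <- data.frame(
  Element     = c("System at risk", "Hazard", "System at risk"),
  ElementText = c("Livestock and pasture losses",
                  "Crop failure after drought",
                  "Damage to roads and bridges"),
  stringsAsFactors = FALSE
)

# Hypothetical subset of the system dictionary: category name -> regex.
system_patterns_demo <- c(
  "Crop"                        = "crop|yield|farming",
  "Livestock"                   = "livestock|pasture|grazing",
  "Infrastructure and services" = "infrastructure|road|bridge"
)

systems_demo$SystemType <- NA_character_
is_system <- systems_demo$Element == "System at risk"

for (type_name in names(system_patterns_demo)) {
  hit <- is_system &
    grepl(system_patterns_demo[[type_name]], systems_demo$ElementText,
          ignore.case = TRUE)
  # Assumption: the first matching category is kept.
  systems_demo$SystemType[hit & is.na(systems_demo$SystemType)] <- type_name
}

print(systems_demo)
```

The second row mentions crops but stays NA because its element is "Hazard".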
The following table summarizes the system categories used for classification and the corresponding keyword patterns:
System Type | Keywords Used |
---|---|
Crop | crop, cropping systems, crop production, yield, cultivated area, agriculture, agricultural, agricultural production, agricultural pests, pest, irrigated, rain-fed, farming, land use planning, crop loss, agroecological zone, production |
Livestock | livestock, pasture, pastoralist, pastoral area, grazing, animal health, productivity losses, herder, livestock loss |
Fisheries and aquaculture | fish, fisheries, aquaculture, fishing, marine harvest |
Forest | forest, forestry, forest product, tree, non-timber |
Terrestrial | terrestrial ecosystem, terrestrial, drylands, land resource, natural resource, ecosystem structure, ecosystem services, ecological system, desertification, land degradation, soil degradation, soil erosion, environment |
Freshwater | freshwater, wetlands, inland wetlands, water resource, drinking water, potable water, water quality, water availability, water supply, river, water scarcity, water shortage, hydrological cycle, water stress, water table, water source, groundwater, water supplied, dams, eutrophication, algal bloom |
Biodiversity | biodiversity, flora, fauna, species, species extinction, extinction, range of species, ecosystem change |
Coastal | coast, coastal, marine, mangrove, ocean, coral, beach, coastal erosion, sea level rise, blue carbon, coastal ecosystem, coastal zone |
Food and nutrition | food security, food insecure, food insecurity, nutrition, malnutrition, hunger, food safety, food availability, famine, undernutrition, overnutrition, obesity |
Gender and inclusion | gender, women, youth, children, elderly, inclusion, social exclusion, vulnerable group, vulnerable population, minority group, indigenous, small-scale producer, pastoralist, fishing communities, forest-based communities, high-risk regions |
Livelihoods and poverty | livelihood, poverty, income, employment, labor, economic activity, loss of income, loss of livelihood, safety net, insurance, socio-economic development, tourism, subsistence, workforce, economic, tourists, outdoor activities, ski, vacation |
Health | health, mental health, morbidity, mortality, vector-borne disease, water-borne disease, infectious disease, respiratory disease, malaria, epidemic, climate-sensitive disease, heat morbidity, disease, deaths, human lives, pollution, life expectancy |
Infrastructure and services | infrastructure, critical infrastructure, services, critical services, road, bridge, electricity, power supply, energy, water supply, sanitation, hygiene, education, school, building, housing, settlement, evacuation, telecom, transport, waste management, power, hydropower, railways, port, industry, material, industries |
Human security and Peace | migration, displacement, conflict, armed conflict, national security, human security, organized conflict, climate-induced migration, refugee, peace |
## **System at Risk Coverage:** 1428 of 1605 rows were successfully assigned a category.
## **Coverage Rate:** 89 %
by regions
We classify sectors in two ways (by IPCC sector category and by GGA sector theme), drawing on the Sector and SystemType columns.
This follows the sector taxonomy defined in the MPGs (Table 1). The tagging logic first applies regular expressions to the text in Sector, with some fallback rules based on SystemType when relevant. The classification is additive and accounts for overlapping sector concepts.
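One way to read this logic is sketched below. It is an interpretation rather than the original code: "additive" is taken to mean that a row keeps every category whose pattern matches, the SystemType fallback is applied only when Sector yields no match, and the "; " separator, the helper `tag_sector`, and the three-entry dictionary are all hypothetical.

```r
# Hypothetical subset of the IPCC sector dictionary: category -> regex.
sector_patterns_demo <- c(
  "Food, fiber and other ecosystem products" = "agri|crop|livestock",
  "Water, sanitation and hygiene"            = "sanitation|hygiene",
  "Crosscutting"                             = "gender|climate"
)

tag_sector <- function(sector, system_type) {
  # Return every category whose pattern matches the text (additive tagging).
  match_all <- function(txt) {
    if (is.na(txt) || !nzchar(trimws(txt))) return(character(0))
    hits <- vapply(sector_patterns_demo,
                   function(p) grepl(p, txt, ignore.case = TRUE),
                   logical(1))
    names(sector_patterns_demo)[hits]
  }
  labels <- match_all(sector)
  # Fallback (assumed rule): use SystemType only if Sector gave nothing.
  if (!length(labels)) labels <- match_all(system_type)
  if (!length(labels)) NA_character_ else paste(labels, collapse = "; ")
}

tag_sector("Agriculture and gender", NA)  # two overlapping categories
tag_sector(NA, "Crop")                    # fallback on SystemType
tag_sector("Finance", NA)                 # no match, stays NA
```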
Below is a summary table of sector categories and the associated keywords used.
IPCC Sector Category | Keywords / Match Terms |
---|---|
Food, fiber and other ecosystem products | agri, agro, agriculture, food, crop, livestock, animal, fish, fisheries, aquacultur, aquaculture, seed, irrigation, value chain, land use, land tenure, land and forestry, agroforest, Agriculture and food security, Agriculture, Climate services, Others, forest and other land uses, Agriculture, forest and other land uses, land affairs, land reforms, Food and nutrition, Crop, Livestock, Fisheries and aquaculture, Sustainable development, Agriculture |
Terrestrial and freshwater ecosystems | forest, environ, environment, enviornment, ecosystem, biodiversity, natural, ecology, ecolog, wildlife, REDD, peatland, protected area, Terrestrial, Freshwater, land |
Ocean and coastal ecosystems | ocean, marine, coast, coastal, coastal land use, coastal zone, blue carbon, mangrove, Tourism and Coastal Zone Management, Coastal zone management |
Water, sanitation and hygiene | sanitation, water and sanitation, sewerage, hygiene, water use, water security, water and energy, water, sanitation and waste |
Cities, settlements and key infrastructure | city, cities, urban, settlement, infrastructure, housing, habitat, industr, waste, transport, energy, landfills, sanitary landfills, mining, mineral resources, mineral products, telecommunications, Infrastructure, transport and building, Infrastructure and services, Cities and Built Environment, Land Use and Human Settlements Development, Habitat, urban planning and development of the territory, Housing, Territorial Development and Urban Planning, Renewable Energy, Urban planning and infrastructure, Urban Development & Tourism, Private sector/trade; Manufacturing; Business process |
Health, wellbeing and communities | health, well-being, wellbeing, nutrition, culture, territorial communit, territorial communities, local knowledge, Human security, Vulnerable communities, Territorial development |
Livelihoods, poverty and sustainable development | social, poverty, people, econom, capacity, education, employment, tourism, rural development, sustainable development, economic and social infrastructure, Livelihoods and poverty, protection, social protection, transfer, income, revenue, social infrastructures, Education, research, Education, training, research, Multiple: Social Economy, Tourism, Multiple: Social Affairs, Women and Family, Multiple: Planning, Rural development, Multiple: Sustainable development, Planning |
Crosscutting | cross-cutting, cross cutting, cross-sectoral, innovation, research, R&D, integration, empower, gender, women, youth, disaster, risk, climate, climate service, meteo, warning, governance, legislation, policy, policies, institution, M&E, devolution, private sector, public sector, territory, spatial planning, planning, weather |
Not specified | Not specified |
GGA Sector Theme | Keywords / Match Terms |
---|---|
Water and sanitation | sanitation, water and sanitation, sewerage, hygiene, water security, water use, water and energy, water, sanitation and waste |
Food and agriculture | agri, agro, agriculture, food, crop, livestock, animal, fish, fisheries, aquacultur, aquaculture, seed, irrigation, value chain, land use, land tenure, land and forestry, agroforest, nutrition, food secur, Agriculture and food security, Food and nutrition, Agriculture, Climate services, Others, Sustainable development, Agriculture, land affairs, land reforms, Agriculture, forest and other land uses |
Health | health, well-being, wellbeing, territorial communit, territorial communities, local knowledge, Human security, Vulnerable communities, Territorial development |
Biodiversity and ecosystems | ocean, marine, coast, coastal land use, coastal zone, blue carbon, mangrove, forest, environ, environment, enviornment, ecosystem, biodiversity, natural, ecology, ecolog, wildlife, REDD, peatland, protected area, freshwater, Terrestrial, Tourism and Coastal Zone Management, Coastal zone management |
Infrastructure and human settlements | city, cities, urban, settlement, infrastructure, housing, habitat, industr, waste, transport, energy, landfills, sanitary landfills, mining, mineral resources, mineral products, telecommunications, Infrastructure, transport and building, Infrastructure and services, Cities and Built Environment, Land Use and Human Settlements Development, Habitat, urban planning and development of the territory, Housing, Territorial Development and Urban Planning, Renewable Energy, Urban planning and infrastructure, Urban Development & Tourism, spatial planning, planning, territory, Private sector/trade; Manufacturing; Business process, private sector, public sector,buildings |
Poverty eradication and livelihoods | social, poverty, people, econom, capacity, education, employment, tourism, rural development, sustainable development, economic and social infrastructure, livelihood, income, revenue, social protection, protection, transfer, social infrastructures, Education, research, Education, training, research, Multiple: Social Economy, Tourism, Multiple: Social Affairs, Women and Family, Multiple: Planning, Rural development, Multiple: Sustainable development, Planning |
Cultural heritage | cultur, cultural heritage, heritage site, traditional knowledge, local knowledge, indigenous |
# Ensure the output column exists
if (!"SectorType_GGA" %in% names(adaptdata)) adaptdata$SectorType_GGA <- NA_character_

adaptdata <- adaptdata %>%
  mutate(
    # Match against Sector plus ElementText when Sector is non-empty;
    # otherwise fall back to ElementText alone
    .gga_text = if_else(
      !is.na(Sector) & str_trim(Sector) != "",
      paste0(as.character(Sector), " ", coalesce(as.character(ElementText), "")),
      coalesce(as.character(ElementText), "")
    )
  ) %>%
  rowwise() %>%
  mutate(
    # Number of keyword matches per GGA theme for this row
    .counts_gga = list(vapply(gga_patterns, function(pat) {
      str_count(.gga_text, regex(pat, ignore_case = TRUE))
    }, integer(1))),
    SectorType_GGA = {
      # Special case: Sector exactly "water"
      if (!is.na(Sector) && str_to_lower(str_trim(Sector)) == "water") {
        "Water and sanitation"
      } else {
        cnt <- unlist(.counts_gga)
        if (all(cnt == 0L)) NA_character_ else {
          # Highest count wins; on ties, take the first winning index
          winners <- which(cnt == max(cnt))
          gga_priority[winners[1]]
        }
      }
    }
  ) %>%
  ungroup() %>%
  select(-.gga_text, -.counts_gga)
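The chunk above relies on `gga_patterns` and `gga_priority`, which are defined elsewhere. The sketch below reproduces the same decision rule on a single row in base R, using hypothetical three-entry versions of both objects (`gga_patterns_demo`, `gga_priority_demo`) and a hypothetical helper `pick_gga`. Because the winner is looked up by position, the sketch assumes the priority vector is in the same order as the patterns.

```r
# Hypothetical stand-ins for gga_patterns / gga_priority (same order).
gga_patterns_demo <- c(
  "Water and sanitation" = "sanitation|sewerage|hygiene",
  "Food and agriculture" = "agri|crop|livestock",
  "Health"               = "health|well-?being"
)
gga_priority_demo <- names(gga_patterns_demo)

# Count case-insensitive matches of one pattern in one string.
count_matches <- function(pattern, text) {
  m <- gregexpr(pattern, text, ignore.case = TRUE)[[1]]
  if (m[1] == -1L) 0L else length(m)
}

pick_gga <- function(sector, element_text) {
  # Special case: Sector exactly "water".
  if (!is.na(sector) && tolower(trimws(sector)) == "water") {
    return("Water and sanitation")
  }
  text <- trimws(paste(if (is.na(sector)) "" else sector,
                       if (is.na(element_text)) "" else element_text))
  cnt <- vapply(gga_patterns_demo, count_matches, integer(1), text = text)
  if (all(cnt == 0L)) return(NA_character_)
  # Highest count wins; ties resolve to the first index.
  gga_priority_demo[which(cnt == max(cnt))[1]]
}

pick_gga("Water", "anything")                        # special case
pick_gga("Agriculture", "Crop and livestock health") # 3 food hits vs 1 health
pick_gga(NA, "Health and hygiene")                   # tie, first theme wins
pick_gga(NA, "Tourism")                              # no match, NA
```

The tie case shows why the ordering matters: with one match each for "Water and sanitation" and "Health", the theme listed first is returned.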
## **SectorType (IPCC):** 6851 of 7295 rows tagged
## **SectorType_GGA (GGA):** 10352 of 7295 rows tagged
## **Coverage Rate:** 93.9 %
# adaptdata_previous<-read.csv("adaptdata_results_nano6.csv")
#
# adaptdata <- adaptdata_previous %>%
# select(-SectorType_GGA) %>% # drop old classification
# left_join(
# adaptdata %>% select(...1, SectorType_GGA),
# by = "...1" # replace with the correct unique key
# )
#
# write.csv(adaptdata,"data/adaptdata_results_nano6.csv")
# # Option 2: If you ran the improved regex classification *directly* on adaptdata
# # then just overwrite:
# adaptdata <- adaptdata %>%
# mutate(SectorType_GGA = adaptdata_new$SectorType_GGA)
## Sector consistency per country
# Elements of interest
elements_focus <- c("Hazard", "System at risk", "Action")

# Count rows per Country, Element, and SectorType_GGA
theme_counts_country <- adaptdata %>%
  filter(Element %in% elements_focus,
         !is.na(SectorType_GGA),
         SectorType_GGA != "",
         SectorType_GGA != "NA") %>%
  group_by(Country, Element, SectorType_GGA) %>%
  summarise(n = n(), .groups = "drop") %>%
  arrange(Country, Element, desc(n))

# Interactive table
datatable(theme_counts_country,
          options = list(pageLength = 15, autoWidth = TRUE),
          rownames = FALSE)

# Same counts across all elements (no Element filter)
theme_counts_country_all <- adaptdata %>%
  filter(!is.na(SectorType_GGA),
         SectorType_GGA != "",
         SectorType_GGA != "NA") %>%
  group_by(Country, Element, SectorType_GGA) %>%
  summarise(n = n(), .groups = "drop") %>%
  arrange(Country, Element, desc(n))

# Interactive table
datatable(theme_counts_country_all,
          options = list(pageLength = 15, autoWidth = TRUE),
          rownames = FALSE)
ImpactCategories
.
by regions
This shows 10 random rows for each climate impact category. The raw text is included, along with the keywords that justify the category.
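A preview like this can be drawn by sampling up to 10 rows within each category. The following base R sketch uses hypothetical data and a hypothetical helper, `sample_per_category`; only the column name ImpactCategories comes from the document, and the category values are invented for illustration.

```r
set.seed(1)  # make the random preview reproducible

# Hypothetical data: 25 rows in one category, 4 in another.
impacts_demo <- data.frame(
  ImpactCategories = rep(c("Water", "Health"), times = c(25, 4)),
  ElementText      = paste("passage", 1:29),
  stringsAsFactors = FALSE
)

# Draw up to n random rows from each group of group_col.
sample_per_category <- function(df, group_col, n = 10) {
  groups <- split(df, df[[group_col]])
  picked <- lapply(groups, function(g) {
    g[sample.int(nrow(g), min(n, nrow(g))), , drop = FALSE]
  })
  do.call(rbind, picked)
}

preview <- sample_per_category(impacts_demo, "ImpactCategories")
table(preview$ImpactCategories)
```

Categories with fewer than 10 rows are returned in full, so small categories are not dropped from the preview.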
Goal. Tag each Action with a single type to describe the primary adaptation approach.
This dataset classifies climate adaptation measures into broad
intervention categories.
Each intervention contains types, which group related
actions, and each type includes specific actions (and
sometimes subactions).
Focus on maintaining and restoring natural systems.
Includes intervention types such as:
- Biodiversity development – actions to enhance
biodiversity (e.g., ecological buffer zones, pollination support).
- Ecological restoration – restoring degraded
ecosystems (e.g., afforestation, climate-resilient trees).
- Green infrastructure – using vegetation and permeable
systems for adaptation (e.g., rain gardens, green roofs).
- Agroforestry – integrating trees into agricultural
systems.
- Animals – livestock and aquaculture resilience.
- Crop diversification – multiple crops to reduce
risk.
- Energy – biogas, improved cookstoves, etc.
- Migration – relocating crops/livestock in response to
climate.
- Nutrient management – compost, manure, fertility
practices.
- Pest and disease management – integrated pest
management, biological control.
- Postharvest – storage, drying, reducing food
loss.
- Schedule modification – shifting planting/harvest
calendars.
- Soil management – mulching, conservation tillage,
improved fallows.
- Water management – irrigation efficiency, water
harvesting.
Policies, governance, and frameworks enabling adaptation.
Includes:
- Economic (policy level) – finance and incentives from
institutions (grants, subsidies, PES).
- Government policies and programs – national/local
adaptation plans, sectoral programs.
- Laws and regulations – binding rules such as zoning,
building codes, protected areas.
- Research development – climate modeling, monitoring,
R&D, policy analysis.
Hard measures, services, and technology.
Includes:
- Engineered and built environment – seawalls, dams,
drainage, resilient transport.
- Services – service delivery platforms like social
protection, healthcare.
- Supply-chain improvement – logistics, storage, market
access.
- Technological – precision agriculture, IoT, improved
breeds/varieties, renewable energy.
Method
- Each Action is assigned a single type (or NA if unclear).
- The label is stored in ActionType_GPT and previewed in a datatable.
by regions
Action Level classification
Goal. Distinguish whether each Action reported is preparatory groundwork or a substantive adaptation measure.
Definitions
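Method
- Each Action is labeled as preparatory groundwork or a substantive adaptation measure; unclear cases are NA.
- The label is stored in ActionLevel_GPT and previewed in a datatable.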
by regions
Goal. Label each Result as one of four types so we can compare what’s being reported across countries and sectors.
Method
- GPT reads the Result text and returns exactly one
label from the list above (or NA
if
unclear).
- We store the model’s reply verbatim in
ResultType_GPT
(no normalization here); any cleanup or
remapping happens later.
by regions
List of adaptation priorities:
- Water use efficiency &
demand management — Reduce water losses and improve irrigation
efficiency/productivity.
- Alternative & non-conventional water resources —
Options like rainwater harvesting, greywater reuse, desalination, and
storage.
- Agribusiness enhancement & private sector
development — Strengthening agribusinesses, value chains, and
SME/market linkages.
- Legal, policy & institutional frameworks — Laws,
policies, governance, and institutional capacity for adaptation.
- Climate-smart agriculture (CSA) & resilient farming
systems — Practices like crop diversification, agroforestry,
and soil management.
- Sustainable land & farm management — Sustainable
land practices, organic/landscape approaches.
- Climate-resilient crops & seeds — Breeding and
access to drought/heat/pest-resistant varieties and seeds.
- Farmer capacity, extension & knowledge services —
Training, extension services, and farmer knowledge exchange.
- Agricultural water management — On-farm water
efficiency, watershed management, and irrigation.
- Rural development & livelihood diversification —
Enhancing rural infrastructure, markets, and income sources.
- Ecosystem protection, restoration & protected
areas — Conserving and restoring ecosystems and
biodiversity.
- Combat land degradation & desertification —
Tackling soil erosion, fertility loss, and desertification.
- Afforestation, reforestation & carbon sinks —
Tree planting and forest cover expansion for carbon sequestration.
- Invasive species management — Control and management
of invasive alien species.
- Land-use & spatial planning integration —
Adaptation in zoning, territorial, and land-use planning.
- Monitoring, data & hydrological observation —
Climate monitoring, data systems, and hydrological observation.
- National adaptation frameworks & strategies —
National-level plans/frameworks (e.g., NAP, NCCAS).
- Local adaptation capacity & community empowerment
— Community-based adaptation and local empowerment.
- Public awareness, education & engagement —
Campaigns, curricula, and awareness programs.
- Institutional & technical capacity building —
Training, institutional strengthening, inter-agency coordination.
- Policy mainstreaming & regulatory integration —
Integrating adaptation into policies and regulations.
- Cross-sectoral collaboration & partnerships —
Partnerships and coordination across sectors.
- Data management, knowledge & information services
— Knowledge management, sharing, and tools.
- Evidence base, risk & impact assessments —
Vulnerability, risk, and impact assessments.
- Subnational adaptation priorities & sector
packages — Region- or sector-specific strategies and action
packages.
- Agro-meteorological & climate information
services — Forecasts, advisories, and information services for
agriculture.
- Post-harvest, food loss reduction & risk transfer
— Reducing losses, value addition, insurance, and cold chains.
- Climate-resilient livestock systems — Livestock
management, resilient breeds, and animal health.
- Research, innovation & investment (agri/health) —
Research, technology, and innovation in agriculture and health.
- Fire management & response capability — Fire
prevention, management, and emergency response.
- Community-based forest management &
re/afforestation — Local forest management and
reforestation.
- Biodiversity assessment & monitoring —
Biodiversity inventories, monitoring, and assessments.
- Public health: surveillance, systems & workforce
— Disease surveillance, early warning, and health workforce.
- Public health infrastructure & services —
Facilities, WASH, energy, and resilient health services.
- Urban climate resilience & planning —
Climate-proofed urban planning and infrastructure.
- Urban green/blue infrastructure & heat mitigation
— Greening, cooling, water-sensitive infrastructure.
- Disaster risk reduction, EWS & emergency
management — DRR, early warning systems, and emergency
response.
- Energy & grid resilience — Strengthening energy
infrastructure and supply chains.
- Coastal & marine ecosystem protection/restoration
— Mangroves, reefs, wetlands, and coastal restoration.
- Agricultural resource efficiency & high-standard
farmland — Efficient farmland, fertilizer, and input use.
- Water allocation, security & quantified targets —
Ensuring water security with defined consumption/coverage targets.
- Transport infrastructure & services adaptation —
Climate-proofing roads, ports, rail, and mobility.
- Industry & mining adaptation — Industrial
resilience and climate-compatible mining.
- Sectoral adaptation plans & guidelines —
Guidelines and action plans for specific sectors (e.g., tourism).
- Finance, insurance & social protection — Climate
finance, insurance, and social safety nets.
- Tourism sector adaptation — Adaptation strategies for
tourism and ecotourism.
- Gender & social inclusion — Gender-sensitive and
socially inclusive adaptation approaches.
by regions
library(dplyr)
library(stringr)
library(tidyr)
library(ggplot2)
library(DT)
region_key <- overview %>%
  select(Country, Region) %>%
  distinct()

# add manual fixes for missing countries
manual_add <- tibble::tribble(
  ~Country,      ~Region,
  "Moldova",     "Eastern Europe",
  "Phillipines", "Asia-Pacific",
  "Russia",      "Eastern Europe",
  "Türkiye",     "Asia-Pacific",
  "USA",         "Western Europe and Other states"
)

# combine with existing region_key
region_key <- region_key %>%
  bind_rows(manual_add) %>%
  distinct()

adaptdata_with_region <- adaptdata %>%
  left_join(region_key, by = "Country")
# 1) Counts by Region x Priority
priority_counts_region <- adaptdata_with_region %>%
  filter(tolower(Element) == "adaptation priorities",
         !is.na(PriorityCategories),
         str_detect(PriorityCategories, "\\S")) %>%
  separate_rows(PriorityCategories, sep = ";") %>%
  mutate(
    PriorityCategories = str_squish(PriorityCategories),
    Region = if_else(is.na(Region) | !nzchar(Region), "Unknown", Region)
  ) %>%
  filter(PriorityCategories != "") %>%
  count(Region, PriorityCategories, sort = TRUE) %>%
  rename(`Priority Category` = PriorityCategories,
         Occurrences = n)

# 2) Pick TOP 20 categories by total across regions
top20 <- priority_counts_region %>%
  group_by(`Priority Category`) %>%
  summarise(Total = sum(Occurrences), .groups = "drop") %>%
  slice_max(Total, n = 20, with_ties = FALSE)

priority_counts_region_top20 <- priority_counts_region %>%
  semi_join(top20, by = "Priority Category") %>%
  left_join(top20, by = "Priority Category") %>%
  mutate(`Priority Category` = reorder(`Priority Category`, Total)) %>%
  arrange(Total)

# 3) Stacked plot by Region (top 20 only)
priority_plot_region <- ggplot(priority_counts_region_top20,
                               aes(x = `Priority Category`, y = Occurrences, fill = Region)) +
  geom_col(position = "stack") +
  coord_flip() +
  scale_fill_viridis_d(option = "plasma", name = "Region") +
  labs(title = "Top 20 adaptation priorities by UNFCCC region (stacked)",
       x = NULL, y = "Occurrences") +
  theme_minimal(base_size = 10)

priority_plot_region
# ---------------- Setup ----------------
# suppressPackageStartupMessages({
# library(dplyr); library(stringr); library(glue); library(DT); library(ellmer); library(tidyr)
# })
#
# # Model + batching
# BARRIER_MODEL <- "gpt-5-nano"
# CHUNK_SIZE <- 30
# MAX_LABELS_PER_PASSAGE <- 3
#
# # ---- API key ----
# btr_key <- Sys.getenv("BTR_KEY")
# stopifnot("BTR_KEY is not set in your environment." = nzchar(btr_key))
# Sys.setenv(OPENAI_API_KEY = btr_key)
#
# # ---------------- Canonical labels ----------------
# BARRIER_LABELS <- c(
# "Financial",
# "Economic",
# "Human capacities",
# "Informational",
# "Institutional",
# "Organizational",
# "Technological",
# "Physical",
# "Social/ cultural",
# "Biological",
# "Other emerging issues"
# )
#
# # Map small variants to the exact allowed strings (extra safety)
# canonicalize_barrier_label <- function(x) {
# if (is.null(x) || is.na(x)) return("Other emerging issues")
# raw <- tolower(trimws(x))
# raw <- gsub("[[:punct:]]+$", "", raw)
# raw <- gsub("\\s+", " ", raw)
#
# map <- list(
# "financial" = c("financial","financing","funding","budget","costs"),
# "economic" = c("economic","macro-economic","macroeconomic","market","livelihood"),
# "human capacities" = c("human capacity","human capacities","capacity","capacities","skills","staffing"),
# "informational" = c("informational","information","knowledge","data gaps","monitoring and data","evidence","m&e","monitoring"),
# "institutional" = c("institutional","governance","policy","regulatory","mainstreaming"),
# "organizational" = c("organizational","organisational","coordination","mandates","roles and responsibilities","role clarity"),
# "technological" = c("technological","technology","tech","innovation","digital"),
# "physical" = c("physical","biophysical","infrastructure","geographical","terrain","remoteness"),
# "social/ cultural" = c("social/ cultural","social/cultural","sociocultural","social and cultural","social-cultural","social","cultural"),
# "biological" = c("biological","bio-physical limits","biophysical limits","climatic limits","ecophysiological"),
# "other emerging issues"= c("other emerging issues","other","emerging issues","pandemics","covid","invasions","invasion")
# )
# for (k in names(map)) if (raw %in% map[[k]]) {
# return(BARRIER_LABELS[match(tolower(k), tolower(BARRIER_LABELS))])
# }
# hit <- BARRIER_LABELS[tolower(BARRIER_LABELS) == raw]
# if (length(hit)) return(hit)
# "Other emerging issues"
# }
#
# canonicalize_barrier_vector <- function(v) {
# v <- v[ nzchar(trimws(v)) ]
# if (!length(v)) return("Other emerging issues")
# v <- unique(vapply(v, canonicalize_barrier_label, character(1)))
# # keep at most MAX_LABELS_PER_PASSAGE
# v <- v[seq_len(min(length(v), MAX_LABELS_PER_PASSAGE))]
# paste(v, collapse = "; ")
# }
#
# # ---------------- Multi-label batch prompt ----------------
# make_barrier_batch_prompt <- function(txts) {
# glue::glue(
# "Task: For each passage, select all applicable *Adaptation barriers* (1–{MAX_LABELS_PER_PASSAGE}) using ONLY the labels below.
# If uncertain, choose the closest categories (do NOT output NA). Return N lines, same order; separate multiple with '; '.
#
# Allowed labels (exact strings):
# - Financial — inadequate/lack of funds or budgets; affordability/cost constraints.
# - Economic — constraints from current livelihoods, market structure, macro-economy, and the development level of key sectors.
# - Human capacities — gaps in skills/training/education and adequate staffing at individual/organizational/societal levels.
# - Informational — gaps in information/awareness/knowledge/data/monitoring needed to guide or assess adaptation; ALSO gaps in data infrastructure and knowledge-management systems (platforms, registries, databases, MIS, M&E systems).
# - Institutional — weaknesses in policies/regulations/plans or inadequate mainstreaming of adaptation into other policies.
# - Organizational — weak organizations/mandates/coordination/designated entities; poor stakeholder inclusion/participation.
# - Technological — limited access to technologies/innovation/equipment, INCLUDING limitations of manmade/built infrastructure.
# - Physical — barriers from the natural physical environment (terrain, remoteness, topography, soils, floodplains, coastlines).
# - Social/ cultural — norms, values, identity, beliefs, place attachment, justice/equity, social support, security issues.
# - Biological — biophysical/climatic/physiological limits (extreme temperature/precipitation/salinity/acidity/extreme-event frequency).
# - Other emerging issues — contextual shocks (e.g., pandemics, invasions).
#
# Disambiguation (key cues):
# - Data/knowledge/M&E platforms/registries/databases → Informational (not Institutional).
# - Coordination/mandates/role clarity/designated entities → Organizational.
# - Manmade/built infrastructure constraints → Technological (Physical is ONLY natural environment).
# - Budget/funding/affordability → Financial; broader livelihoods/markets/macroeconomy → Economic.
#
# Output format: EXACT label strings separated by '; ' (max {MAX_LABELS_PER_PASSAGE}). No extra words.
#
# INPUTS (numbered):
# {paste0(sprintf('%d) %s', seq_along(txts), txts), collapse = '\n')}"
# )
# }
#
# # ---------------- Batch classifier (multi-label) ----------------
# classify_barrier_type_batch <- function(txts, model = BARRIER_MODEL, chunk_size = CHUNK_SIZE) {
# if (!length(txts)) return(character(0))
# chat <- ellmer::chat_openai(model = model)
# out <- vector("character", length(txts))
#
# idx <- seq_along(txts)
# chunks <- split(idx, ceiling(idx / chunk_size))
#
# pb <- txtProgressBar(min = 0, max = length(chunks), style = 3)
# on.exit(close(pb), add = TRUE)
#
# for (i in seq_along(chunks)) {
# ids <- chunks[[i]]
# prompt <- make_barrier_batch_prompt(txts[ids])
#
# ans <- tryCatch(chat$chat(prompt), error = function(e) e)
# if (inherits(ans, "error")) {
# warning("GPT batch failed: ", conditionMessage(ans), " — defaulting to 'Other emerging issues' for this chunk.")
# out[ids] <- "Other emerging issues"
# setTxtProgressBar(pb, i); next
# }
#
# lines <- strsplit(ans, "\\r?\\n", perl = TRUE)[[1]]
# lines <- trimws(lines)
# lines <- lines[nzchar(lines)]
# lines <- sub("^[0-9]+[.)\\-:]\\s*", "", lines, perl = TRUE)
#
# # align to input size
# if (length(lines) < length(ids)) lines <- c(lines, rep("Other emerging issues", length(ids) - length(lines)))
# if (length(lines) > length(ids)) lines <- lines[seq_along(ids)]
#
# # split by ';', canonicalize each, cap to MAX_LABELS_PER_PASSAGE, then collapse back
# out[ids] <- vapply(lines, function(s) {
# labs <- unlist(strsplit(s, "\\s*;\\s*", perl = TRUE))
# canonicalize_barrier_vector(labs)
# }, character(1))
#
# setTxtProgressBar(pb, i)
# }
# out
# }
#
# # ---------------- Run on your data ----------------
# if (!"BarrierType_GPT" %in% names(adaptdata)) adaptdata$BarrierType_GPT <- NA_character_
#
# bar_idx <- which(
# grepl("barrier", adaptdata$Element, ignore.case = TRUE) |
# grepl("barrier", adaptdata$ElementLabel, ignore.case = TRUE)
# )
#
# if (length(bar_idx)) {
# cat("Classifying adaptation barriers for", length(bar_idx), "rows (multi-label, batched)...\n")
# adaptdata$BarrierType_GPT[bar_idx] <-
# classify_barrier_type_batch(adaptdata$ElementText[bar_idx],
# model = BARRIER_MODEL,
# chunk_size = CHUNK_SIZE)
# } else {
# cat("No rows with Element/ElementLabel containing 'barrier' found.\n")
# }
#write.csv(adaptdata,"data/adaptdata_barriers.csv")
# ---------------- Explode to row-per-barrier (to mirror manual duplication) ----------------
# adaptdata_barriers_long <- adaptdata %>%
# filter(row_number() %in% bar_idx) %>%
# mutate(BarrierType_GPT = ifelse(is.na(BarrierType_GPT) | !nzchar(BarrierType_GPT),
# "Other emerging issues", BarrierType_GPT)) %>%
# separate_rows(BarrierType_GPT, sep = "\\s*;\\s*")
#
# # Quick preview
# DT::datatable(
# adaptdata_barriers_long %>%
# select(Country, Document, Element, ElementLabel, ElementText, BarrierType_GPT),
# escape = FALSE,
# caption = "🚧 Adaptation barriers — GPT (exploded to one row per barrier)",
# options = list(pageLength = 10, autoWidth = TRUE, dom = "tip")
# )
#
# # Optional: counts
# barrier_counts <- adaptdata_barriers_long %>%
# count(BarrierType_GPT, sort = TRUE)
# print(barrier_counts)
#
# # Optional save
# # write.csv(adaptdata_barriers_long, "data/adaptdata_barriers_long_nano6.csv", row.names = FALSE)
#
# # ---------------- Helper: agreement vs manual after dedup ----------------
# # Expects a data.frame `a` with columns: Element, ElementText, BarrierType (manual), BarrierType_GPT (semicolon string or exploded)
#
# a<-adaptdata_barriers_long%>%select(Element,ElementText,BarrierType,BarrierType_GPT)
#
# compare_barrier_agreement <- function(a) {
# # Explode GPT side if needed
# a_exp <- a %>%
# mutate(BarrierType_GPT = ifelse(is.na(BarrierType_GPT), "", BarrierType_GPT)) %>%
# separate_rows(BarrierType_GPT, sep = "\\s*;\\s*")
#
# # Deduplicate manual rows: one row per (Element, ElementText, BarrierType)
# a_dedup <- a_exp %>%
# group_by(Element, ElementText, BarrierType) %>%
# summarise(BarrierType_GPT = first(BarrierType_GPT), .groups = "drop") %>%
# mutate(
# BarrierType_norm = tolower(trimws(BarrierType)),
# BarrierType_GPT_norm= tolower(trimws(BarrierType_GPT)),
# match = BarrierType_norm == BarrierType_GPT_norm
# )
#
# agreement <- mean(a_dedup$match, na.rm = TRUE)
#
# confusion <- a_dedup %>%
# count(Manual = BarrierType, GPT = BarrierType_GPT) %>%
# arrange(desc(n))
#
# list(agreement = agreement, confusion = confusion, n = nrow(a_dedup))
# }
#
# ```
#
# ```{r}
# a_dedup <- a %>%
# group_by(Element, ElementText, BarrierType) %>%
# summarise(BarrierType_GPT = first(BarrierType_GPT), .groups = "drop")
#
# # 2. Compare manual vs GPT
# a_dedup <- a_dedup %>%
# mutate(match = tolower(trimws(BarrierType)) == tolower(trimws(BarrierType_GPT)))
#
# # 3. Agreement rate
# agreement <- mean(a_dedup$match, na.rm = TRUE)
adaptdata <- read.csv("data/adaptdata_barriers.csv")
# a<-adaptdata_barriers_long%>%filter(Element=="Barriers")%>%select(RowID,Country,Element,ElementText,BarrierType,BarrierType_GPT)%>%distinct
# Build region key (+ manual fixes)
region_key <- overview %>%
  dplyr::select(Country, Region) %>%
  dplyr::distinct()

manual_add <- tibble::tribble(
  ~Country,      ~Region,
  "Moldova",     "Eastern Europe",
  "Phillipines", "Asia-Pacific",
  "Russia",      "Eastern Europe",
  "Türkiye",     "Asia-Pacific",
  "USA",         "Western Europe and Other states"
)

region_key <- region_key %>%
  dplyr::bind_rows(manual_add) %>%
  dplyr::distinct()

# Join regions
adaptdata_with_region <- adaptdata %>%
  dplyr::left_join(region_key, by = "Country")
# 1) Counts by Region x Barrier (using BarrierType_GPT; split on ';')
barrier_counts_region <- adaptdata_with_region %>%
  dplyr::filter(stringr::str_to_lower(Element) == "barriers",
                !is.na(BarrierType_GPT),
                stringr::str_detect(BarrierType_GPT, "\\S")) %>%
  tidyr::separate_rows(BarrierType_GPT, sep = "\\s*;\\s*") %>% # split multi-label cells
  dplyr::mutate(
    BarrierType_GPT = stringr::str_squish(BarrierType_GPT),
    Region = dplyr::if_else(is.na(Region) | !nzchar(Region), "Unknown", Region)
  ) %>%
  dplyr::filter(BarrierType_GPT != "") %>%
  # avoid counting duplicate barrier labels for identical passages
  dplyr::distinct(Country, Document, Element, ElementLabel, ElementText, Region, BarrierType_GPT) %>%
  dplyr::count(Region, BarrierType_GPT, sort = TRUE) %>%
  dplyr::rename(`Barrier Type` = BarrierType_GPT,
                Occurrences = n)
# 2) Pick TOP 20 categories by total across regions
top20 <- barrier_counts_region %>%
  dplyr::group_by(`Barrier Type`) %>%
  dplyr::summarise(Total = sum(Occurrences), .groups = "drop") %>%
  dplyr::slice_max(Total, n = 20, with_ties = FALSE)

barrier_counts_region_top20 <- barrier_counts_region %>%
  dplyr::semi_join(top20, by = "Barrier Type") %>%
  dplyr::left_join(top20, by = "Barrier Type") %>%
  dplyr::mutate(`Barrier Type` = reorder(`Barrier Type`, Total)) %>%
  dplyr::arrange(Total)
# 3) Stacked plot by Region (top 20 only)
barrier_plot_region <- ggplot2::ggplot(
  barrier_counts_region_top20,
  ggplot2::aes(x = `Barrier Type`, y = Occurrences, fill = Region)
) +
  ggplot2::geom_col(position = "stack") +
  ggplot2::coord_flip() +
  ggplot2::scale_fill_viridis_d(option = "plasma", name = "Region") +
  ggplot2::labs(title = "Adaptation barriers by UNFCCC region",
                x = NULL, y = "Occurrences") +
  ggplot2::theme_minimal(base_size = 10)

barrier_plot_region
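The region join above keeps row counts intact only if each country appears once in the key, and any country missing from the key ends up in the "Unknown" region. The following is a minimal sketch of a pre-join check, using base R only and toy stand-ins (`region_key_toy`, `adaptdata_toy`, and all values in them are hypothetical and not taken from the dataset):

```r
# Toy stand-ins for region_key and adaptdata; all values are illustrative.
region_key_toy <- data.frame(
  Country = c("Kenya", "Nepal", "Nepal"),
  Region  = c("Africa", "Asia-Pacific", "Asia"),  # Nepal has two conflicting regions
  stringsAsFactors = FALSE
)
adaptdata_toy <- data.frame(
  Country     = c("Kenya", "Nepal", "Chile"),
  ElementText = c("text a", "text b", "text c"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate (Country, Region) pairs, as distinct() does.
key <- unique(region_key_toy)

# Countries that still appear more than once would duplicate rows in the join.
dup_countries <- unique(key$Country[duplicated(key$Country)])

# Countries in the data with no region in the key would be labelled "Unknown".
unmatched <- setdiff(unique(adaptdata_toy$Country), key$Country)

# Left join (base R equivalent) to show the row inflation caused by duplicates.
joined <- merge(adaptdata_toy, key, by = "Country", all.x = TRUE)
extra_rows <- nrow(joined) - nrow(adaptdata_toy)

print(dup_countries)
print(unmatched)
print(extra_rows)
```

Running the same three checks on the real `region_key` and `adaptdata` before plotting would show whether any manual additions conflict with entries already in `overview`.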
# Step 9: Summary table for Sankey plot
# Load the new dataset
# adaptdata_new <- read.csv("adaptdata_action_level_nano6.csv")
#
# # Make sure both have RowID
# if(!"RowID" %in% names(adaptdata_new)) stop("RowID missing in new data")
# if(!"...1" %in% names(adaptdata)) stop("RowID missing in current adaptdata")
#
# # Keep only the mapping from old data
# gga_map <- adaptdata %>%
# rename(RowID="...1")%>%
# select(RowID, SectorType_GGA)
#
# # Replace SectorType_GGA in the new dataset
# adaptdata <- adaptdata_new %>%
# select(-SectorType_GGA) %>% # drop old col
# left_join(gga_map, by = "RowID") # bring in correct values
#
#
# # Optionally save
# # write.csv(adaptdata, "adaptdata_with_correct_GGA.csv", row.names = FALSE)
#
#
# # ----------------------------
# # 1) Pick your source data
# # ----------------------------
# # Replace `df` with your actual data frame (e.g., df <- check or df <- adaptdata)
# df <- adaptdata
#
# # ----------------------------
# # 2) Normalize key columns
# # GGA theme, Action level, Action type
# # ----------------------------
# actions <- df %>%
# filter(tolower(Element) == "action") %>%
# mutate(
# GGA = str_squish(SectorType_GGA),
# ActionLevel = str_squish(coalesce(ActionLevel_GPT, Action.level)),
# # Choose which "type" you want to show in the Sankey’s last column:
# # - ActionType_GPT (your action-type buckets), or
# # - InterventionType_GPT (your intervention-type buckets)
# ActionType = str_squish(coalesce(ActionType_GPT, InterventionType_GPT))
# ) %>%
# # keep only rows that have at least GGA and ActionLevel
# filter(!is.na(GGA), GGA != "",
# !is.na(ActionLevel), ActionLevel != "")
#
# # split multiple ActionTypes (e.g., "economic; institutional")
# actions_long <- actions %>%
# separate_rows(ActionType, sep = ";|,") %>%
# mutate(ActionType = str_squish(ActionType)) %>%
# filter(!is.na(ActionType), ActionType != "")
#
# # ----------------------------
# # 3) Count linkages for the Sankey
# # ----------------------------
# # (a) GGA -> ActionLevel
# edges_gga_level <- actions %>%
# count(source = GGA, target = ActionLevel, name = "value", sort = TRUE) %>%
# mutate(stage = "GGA→ActionLevel")
#
# # (b) ActionLevel -> ActionType
# edges_level_type <- actions_long %>%
# count(source = ActionLevel, target = ActionType, name = "value", sort = TRUE) %>%
# mutate(stage = "ActionLevel→ActionType")
#
# # Combine into one edge list
# sankey_edges <- bind_rows(edges_gga_level, edges_level_type)
#
# # ----------------------------
# # 4) (Optional) Triple counts
# # GGA x ActionLevel x ActionType — useful for QA or other visuals
# # ----------------------------
# triples <- actions_long %>%
# count(GGA, ActionLevel, ActionType, name = "value", sort = TRUE)
#
# # ----------------------------
# # 5) (Optional) Nodes table (for tools that want nodes + edges)
# # ----------------------------
# nodes <- sankey_edges %>%
# select(source) %>% rename(node = source) %>%
# bind_rows(sankey_edges %>% select(target) %>% rename(node = target)) %>%
# distinct() %>%
# arrange(node) %>%
# mutate(id = row_number() - 1L) # 0-based IDs
#
# # ----------------------------
# # 6) Export for Sankey tools / SankeyMATIC
# # ----------------------------
# write_csv(sankey_edges, "sankey_edges_gga_level_type.csv")
# write_csv(triples, "sankey_triples_gga_level_type.csv")
Method overview
Goal. Check how well countries’ actions/results target the same GGA themes where risks/impacts are most prominent—and visualize any gaps.
Inputs
- Core fields: Country, Element (System at risk, Climate impact, Action, Result), SectorType_GGA (GGA theme).
- Optional: ResultType_GPT (Output / Outcome / Impact) for weighting Result rows.
Cleaning & mapping
- Clean: trim text, standardize blanks, drop placeholder “NA” themes.
- Map to two sides:
  - Risk/Impact: Element ∈ {System at risk, Climate impact}
  - Action/Result: Element ∈ {Action, Result}
- Themes: fix a consistent theme order; include only themes present in the data.
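The cleaning and mapping steps can be sketched as follows. This is a minimal illustration in base R on made-up rows, not the code used to produce the outputs; `clean_text`, `raw`, `cleaned`, and the `theme_order` vector are hypothetical names and values introduced only for the example.

```r
# Toy input; values are made up to exercise each cleaning rule.
raw <- data.frame(
  Element        = c(" System at risk", "Action ", "Result", "Climate impact", "Barriers"),
  SectorType_GGA = c("Water ", "NA", "", "Health", "Water"),
  stringsAsFactors = FALSE
)

# Trim whitespace and treat blanks and the literal string "NA" as missing.
clean_text <- function(x) {
  x <- trimws(x)
  x[x %in% c("", "NA")] <- NA_character_
  x
}
raw$Element        <- clean_text(raw$Element)
raw$SectorType_GGA <- clean_text(raw$SectorType_GGA)

# Map each Element to one of the two sides; other Elements get no side.
risk_elements   <- c("System at risk", "Climate impact")
action_elements <- c("Action", "Result")
raw$Side <- ifelse(raw$Element %in% risk_elements, "Risk",
                   ifelse(raw$Element %in% action_elements, "Action", NA_character_))

# Keep only rows that have both a side and a theme.
cleaned <- raw[!is.na(raw$Side) & !is.na(raw$SectorType_GGA), ]

# Fix a theme order (hypothetical) and keep only themes present in the data.
theme_order <- c("Water", "Food", "Health")
cleaned$SectorType_GGA <- factor(
  cleaned$SectorType_GGA,
  levels = intersect(theme_order, unique(cleaned$SectorType_GGA))
)

print(cleaned)
```

In this toy input only the two Risk/Impact rows survive: the Action and Result rows carry placeholder or blank themes, and the Barriers row belongs to neither side.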
Counting → shares
For each country × theme:
1. Risk count: Risk_Count = rows on the Risk/Impact side.
2. Action count: Action_Count = rows on the Action/Result side (optionally weight Result rows by ResultType_GPT; Actions = 1).
3. Shares (within-side):
   Risk_Share = Risk_Count / Σ Risk_Count
   Action_Share = Action_Count / Σ Action_Count
Theme gap (delta)
Delta = Action_Share − Risk_Share per theme.
Positive = actions over-represented vs risk; negative = under-represented.
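As a worked illustration of the counts, shares, and delta, here is a small base R sketch on made-up rows. It assumes the sums in the share formulas run over themes within one country and one side, and it uses unweighted counts (every row counts as 1). The `toy`, `counts`, and `gap` objects are hypothetical.

```r
# Toy rows for one country; values are made up for illustration only.
toy <- data.frame(
  Country = "A",
  Element = c("System at risk", "Climate impact", "Climate impact",
              "Action", "Action", "Action", "Result"),
  SectorType_GGA = c("Water", "Water", "Health",
                     "Water", "Health", "Health", "Health"),
  stringsAsFactors = FALSE
)

# Map each Element to a side.
toy$Side <- ifelse(toy$Element %in% c("System at risk", "Climate impact"),
                   "Risk", "Action")

# Count rows per country x theme x side.
counts <- as.data.frame(
  table(Country = toy$Country, Theme = toy$SectorType_GGA, Side = toy$Side),
  responseName = "Count", stringsAsFactors = FALSE
)

# Within-side shares: divide by the side total for each country.
# (A country with no rows on one side would give NaN here.)
counts$Share <- counts$Count / ave(counts$Count, counts$Country, counts$Side, FUN = sum)

# Put the two sides next to each other and compute the gap per theme.
risk <- counts[counts$Side == "Risk", c("Country", "Theme", "Count", "Share")]
names(risk)[3:4] <- c("Risk_Count", "Risk_Share")
action <- counts[counts$Side == "Action", c("Country", "Theme", "Count", "Share")]
names(action)[3:4] <- c("Action_Count", "Action_Share")

gap <- merge(risk, action, by = c("Country", "Theme"))
gap$Delta <- gap$Action_Share - gap$Risk_Share

print(gap)
```

In this toy case Health holds 1 of 3 risk rows but 3 of 4 action rows, so its delta is positive, and Water is negative by the same amount. Because both share vectors sum to 1, the deltas within a country always sum to 0.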
Coherence score (0–1)
Compare the two share vectors (Risk vs Action) using:
- Cosine similarity (pattern overlap), and
- Jensen–Shannon similarity (distributional closeness).
Final score = average of the two, clipped to [0, 1].
Higher = better alignment between where risks appear and where actions focus.
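One way to compute such a score is sketched below in base R. The text does not define "Jensen–Shannon similarity" precisely, so the sketch assumes it means 1 minus the Jensen–Shannon divergence with base-2 logarithms (which lies in [0, 1]). The actual pipeline may use a different variant, for example one based on the square-root distance. The function names are hypothetical.

```r
# Cosine similarity between two non-negative share vectors.
cosine_sim <- function(p, q) {
  denom <- sqrt(sum(p^2)) * sqrt(sum(q^2))
  if (denom == 0) return(NA_real_)  # undefined if one side has no rows
  sum(p * q) / denom
}

# Kullback-Leibler divergence in bits; terms with a == 0 contribute 0.
kl_div <- function(a, b) {
  keep <- a > 0
  sum(a[keep] * log2(a[keep] / b[keep]))
}

# ASSUMPTION: Jensen-Shannon similarity = 1 - JS divergence (log base 2).
js_sim <- function(p, q) {
  m <- (p + q) / 2
  1 - (0.5 * kl_div(p, m) + 0.5 * kl_div(q, m))
}

# Final score: average of the two similarities, clipped to [0, 1].
coherence_score <- function(risk_share, action_share) {
  score <- mean(c(cosine_sim(risk_share, action_share),
                  js_sim(risk_share, action_share)))
  min(max(score, 0), 1)
}

# Usage on toy share vectors (same theme order on both sides).
same     <- coherence_score(c(0.5, 0.5, 0), c(0.5, 0.5, 0))
disjoint <- coherence_score(c(1, 0), c(0, 1))
partial  <- coherence_score(c(2/3, 1/3), c(0.25, 0.75))
print(c(same = same, disjoint = disjoint, partial = partial))
```

Identical share vectors score 1 and vectors with no theme in common score 0 under this definition. Both inputs must list themes in the same order, which is one reason the method fixes a consistent theme order during cleaning.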
Figures
- Country paired bars: Risk vs Action shares by theme (within a country).
- Stacked bars (counts): Absolute composition by theme for each side (no forced 100%).
- Coherence summary: One bar per country (higher = more aligned).
- Delta heatmap: Action_Share − Risk_Share by country × theme (red = over-actioned; blue = under-actioned).
Interpretation
- A theme appears in the Risk share if tagged under System at risk or Climate impact.
- A theme appears in the Action share if tagged under Action or Result.
- Alignment: largest risk themes ≈ largest action themes; misalignment highlights priority gaps.
##
## Saved coherence outputs to:
## outputs/coherence/coherence_summary_ALL.csv
## outputs/coherence/gga_risk_vs_action_tables_ALL.csv
## outputs/coherence/figs/coherence_summary_top20.png
## outputs/coherence/figs/country_stacked_bars_counts_top10_twoBars.png
## outputs/coherence/figs/delta_heatmap_top20.png
## outputs/coherence/figs/South_Africa_gga_balance_top20.png
## outputs/coherence/figs/Egypt_gga_balance_top20.png
## outputs/coherence/figs/Maldives_gga_balance_top20.png
## outputs/coherence/figs/Seychelles_gga_balance_top20.png
## outputs/coherence/figs/Denmark_gga_balance_top20.png
## outputs/coherence/figs/Bulgaria_gga_balance_top20.png
## outputs/coherence/figs/Gabon_gga_balance_top20.png
## outputs/coherence/figs/Kenya_gga_balance_top20.png
## outputs/coherence/figs/Moldova_gga_balance_top20.png
## outputs/coherence/figs/Nepal_gga_balance_top20.png
## outputs/coherence/figs/Latvia_gga_balance_top20.png
## outputs/coherence/figs/New_Zealand_gga_balance_top20.png
## outputs/coherence/figs/Canada_gga_balance_top20.png
## outputs/coherence/figs/Estonia_gga_balance_top20.png
## outputs/coherence/figs/Azerbaijan_gga_balance_top20.png
## outputs/coherence/figs/Lebanon_gga_balance_top20.png
## outputs/coherence/figs/Portugal_gga_balance_top20.png
## outputs/coherence/figs/Chile_gga_balance_top20.png
## outputs/coherence/figs/France_gga_balance_top20.png
## outputs/coherence/figs/Indonesia_gga_balance_top20.png
Hazard Type:
A few additional climate-related keywords were added to improve
coverage. For example, terms like “dry days”, “surface air
temperature”, and “heat stress” appeared frequently and
were added under appropriate hazard categories.
However, we also encountered terms such as “runoff”, “solar
radiation”, and “winds” that were not clearly assignable
to a single hazard category and remain unclassified for now.
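To judge whether the unassigned terms matter, one could count how often they occur in rows that received no hazard tag. The following base R sketch does this on made-up text. The `texts` and `hazard_type` vectors stand in for the ElementText and HazardType columns, and their contents are hypothetical.

```r
# Toy stand-ins for ElementText and HazardType; values are illustrative only.
texts <- c(
  "Increase in dry days and heat stress",
  "Changes in runoff affect reservoirs",
  "Stronger winds and higher solar radiation",
  "More frequent floods"
)
hazard_type <- c("Drought; Extreme temperature", NA, NA, "Flood")

# Terms that were not assigned to any hazard category.
candidates <- c("runoff", "solar radiation", "winds")

# Restrict to rows without a hazard tag.
untagged <- texts[is.na(hazard_type)]

# Case-insensitive count of untagged rows mentioning each term.
hits <- vapply(
  candidates,
  function(k) sum(grepl(k, untagged, ignore.case = TRUE)),
  integer(1)
)

print(hits)
```

A term that shows up in many untagged rows is a stronger candidate for a protocol decision than one that appears only once or twice.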
System at Risk:
We expanded the keyword list to better capture common themes in the
data:
⚠️ Note: Several rows tagged as System at risk do not mention any system and instead describe a climate hazard (e.g., drought or storms). These rows may warrant a second look or reclassification. Some examples:
Sector Tagging (IPCC & GGA):
Sector tagging shows strong overall coverage, largely because sectors are mapped from both direct sector mentions and SystemType.
That said, a number of rows remain unclassified, either because the original source did not mention a sector explicitly or because the mention was too ambiguous to map reliably. It is unclear whether these untagged rows are a concern, but they could be flagged for manual review depending on the use case.
Social
Adaptation driven by people, communities, and knowledge.
Includes:
- Behavioural – household/community practices (diet change, water storage, advocacy).
- Educational – awareness, training, gender & inclusion, extension services.
- Informational – climate/market/health information, early warning systems.
- Economic (household level) – income diversification, livelihood strategies.