Load packages

Preparation

To run this code, download the file “AdaptationElements” available at: https://doi.org/10.7910/DVN/VK3CP9. Then adjust the file path below to point to the location where you saved the “AdaptationElements” file.

Step 1: Hazard type

This section classifies climate-related hazards mentioned in the ElementText column of the adaptdata dataset using regex-based keyword matching, aligned with the protocol’s definitions.

Column Creation
A new column HazardType is added to store the hazard category for each entry. This column is positioned after ElementText.
Keyword Mapping
Hazard categories are mapped to regex patterns that represent keywords found in the protocol definitions. These keywords are matched case-insensitively within the ElementText.
Tagging
The script loops over all hazard patterns and tags any matching rows accordingly.
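
The three steps above can be sketched on toy data as follows. This is a minimal illustration, not the script itself: the `adaptdata` rows and the four entries in `hazard_patterns` are made up, and joining multiple matches with "; " is an assumption about how overlapping hazards are stored. Base R is used so the sketch has no package dependencies.

```r
# Toy stand-in for adaptdata (assumption: the real table comes from AdaptationElements).
adaptdata <- data.frame(
  Element = rep("Hazard", 4),
  ElementText = c(
    "Prolonged droughts and dry spells",
    "Flash FLOOD events after heavy rain",
    "Increase in sea level along the coast",
    "Unspecified climate variability"
  ),
  stringsAsFactors = FALSE
)

# Keyword mapping: hypothetical subset of the keyword table, one regex per category.
hazard_patterns <- c(
  "Drought"        = "drought|dry spell|dry days|aridity",
  "Flood"          = "flood|inundation",
  "Storm"          = "storm|cyclone|typhoon|heavy rain",
  "Sea level rise" = "sea level|coastal erosion"
)

# Column creation: HazardType lands right after ElementText, the last column here.
adaptdata$HazardType <- NA_character_

# Tagging: loop over the patterns and match case-insensitively.
# Assumption: a row that matches several categories keeps all of them.
for (category in names(hazard_patterns)) {
  hit <- grepl(hazard_patterns[[category]], adaptdata$ElementText, ignore.case = TRUE)
  adaptdata$HazardType[hit] <- ifelse(
    is.na(adaptdata$HazardType[hit]),
    category,
    paste(adaptdata$HazardType[hit], category, sep = "; ")
  )
}

print(adaptdata[, c("ElementText", "HazardType")])
```

Rows that match no pattern keep `NA` in `HazardType`, so they are the ones that lower the coverage rate.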

Hazard Categories and Associated Keywords

Hazard Category	Keywords Used
Extreme temperature	heat wave, heatwave, excessive heat, high temperature, extreme temperature, extreme heat, extremely low, cold waves, cold wave, snow, ice, frost, freeze, severe winter, maximum and minimum, warmer, higher occurrence of hot, glacier, snowfall, evapo, extreme weather, heat stress, hot days, hot nights
Storm	storm, tropical storm, cyclone, cyclones, Cyclonic activity, typhoon, typhoons, hail, lightning, thunderstorm, heavy rain, windstorm, sand storm, dust storm, tornado, violent rains, torrential, strong winds
Drought	drought, drought cycle, prolonged droughts, drougths, dry spell, dry days, aridity
Wildfire	wildfire, wild fire, forest fire, land fire, bush fire, pasture fire, fire
Landslide	landslide, land slide
Flood	flood, coastal flood, riverine flood, flash flood, ice jam flood, inundation
Change in temperature	change in temperature, alteration in average temperature, temperature change, temperature rise, temperature drop, consistent change in temperature, rise in temperature, increase in temperature, increases in temperature, temperature increase, increasing temperature, increased temperature, increase temperatures, higher temperatures, average temperature, annual temperature, annual mean temperature, annual air temperature, minimum temperatures, maximum temperatures, number of warm days, rising temperatures, warming temperatures
Change in precipitation	change in precipitation, alteration in precipitation, precipitation patterns, shift in timing, shift in amount, shift in intensity, shift in frequency, rainfall, precipitation, distribution of rains, disruption of rains, fluctuation of rains, shorter rain, earlier and ending later
Salinization	salinization, salt content, saltwater, salt water, salinity, increase in salt content
Land degradation	land degradation, decline in land quality, decline in land health, pasture degradation, desertification, loss of organic matter, degradation of land, erosion, soil erosion
Sea level rise	sea level rise, sea level, increase in sea level, coastal flooding, coastal erosion, beach loss, coastline retreat, submersion, water mass
Sea temperature	sea temperature, sea surface temperature, change in sea temperature, ocean temperature, water temperature, surface temperature, seawater surface
Ocean acidification	ocean acidification, reduction in pH, acidity, acidification, acidic, coral
Pest and disease	pest, disease, epidemic, infestation, invasion, insect infestation, vector-borne, biological event, invasive species

Coverage of hazard type

## Hazard rows assigned a category: 1478 out of 1781

## **Coverage Rate:** 83 %
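
As a quick check, the coverage rate can be reproduced from the two counts printed above. The sketch assumes that coverage is defined as the share of rows with a non-missing, non-empty category; the `coverage()` helper is hypothetical and not part of the original script.

```r
# Counts reported above for the hazard step.
n_tagged <- 1478
n_rows   <- 1781
coverage_rate <- round(100 * n_tagged / n_rows)
cat("Coverage Rate:", coverage_rate, "%\n")

# Hypothetical helper: coverage of any category column,
# counting NA and empty strings as "not tagged".
coverage <- function(x) {
  tagged <- !is.na(x) & nzchar(trimws(x))
  c(tagged = sum(tagged), total = length(x), rate = round(100 * mean(tagged), 1))
}

coverage(c("Flood", NA, "Drought", ""))
```

Because the numerator counts rows rather than labels, a rate computed this way can never exceed 100 %, even when a row carries several categories.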

Hazard type summary data

by regions

Hazard type raw data

Step 2: Systems at risk

We assign a SystemType to each row of adaptdata where the element is “System at risk”. This classification is based on the official reporting protocol definitions, which describe different climate-sensitive systems affected by hazards (e.g., agriculture, biodiversity, infrastructure).

To do this, we use a dictionary of keyword patterns that match text found in the ElementText column. These keywords are derived from the protocol and expanded where needed to capture variations in language.

Only rows where the element is “System at risk” are considered.
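
The restriction to “System at risk” rows can be sketched as below. The toy rows and the two-entry `system_patterns` dictionary are illustrative, and keeping only the first matching category is an assumption made for brevity; the actual script may resolve overlaps differently.

```r
# Toy stand-in for adaptdata (assumption).
adaptdata <- data.frame(
  Element = c("System at risk", "System at risk", "Hazard", "System at risk"),
  ElementText = c(
    "Reduced crop yield in rain-fed farming",
    "Groundwater and drinking water supply",
    "Drought affecting crop production",
    "Other"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical two-entry excerpt of the keyword dictionary.
system_patterns <- list(
  Crop       = "crop|yield|rain-fed|farming",
  Freshwater = "freshwater|drinking water|groundwater|water supply"
)

is_system <- adaptdata$Element == "System at risk"
adaptdata$SystemType <- NA_character_

for (sys_type in names(system_patterns)) {
  # Only untagged "System at risk" rows are eligible (first match wins).
  hit <- is_system &
    is.na(adaptdata$SystemType) &
    grepl(system_patterns[[sys_type]], adaptdata$ElementText, ignore.case = TRUE)
  adaptdata$SystemType[hit] <- sys_type
}

print(adaptdata)
```

The third row mentions crops but stays untagged because its element is “Hazard”.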

The following table summarizes the system categories used for classification and the corresponding keyword patterns:

System Type	Keywords Used
Crop	crop, cropping systems, crop production, yield, cultivated area, agriculture, agricultural, agricultural production, agricultural pests, pest, irrigated, rain-fed, farming, land use planning, crop loss, agroecological zone, production
Livestock	livestock, pasture, pastoralist, pastoral area, grazing, animal health, productivity losses, herder, livestock loss
Fisheries and aquaculture	fish, fisheries, aquaculture, fishing, marine harvest
Forest	forest, forestry, forest product, tree, non-timber
Terrestrial	terrestrial ecosystem, terrestrial, drylands, land resource, natural resource, ecosystem structure, ecosystem services, ecological system, desertification, land degradation, soil degradation, soil erosion, environment
Freshwater	freshwater, wetlands, inland wetlands, water resource, drinking water, potable water, water quality, water availability, water supply, river, water scarcity, water shortage, hydrological cycle, water stress, water table, water source, groundwater, water supplied, dams, eutrophication, algal bloom
Biodiversity	biodiversity, flora, fauna, species, species extinction, extinction, range of species, ecosystem change
Coastal	coast, coastal, marine, mangrove, ocean, coral, beach, coastal erosion, sea level rise, blue carbon, coastal ecosystem, coastal zone
Food and nutrition	food security, food insecure, food insecurity, nutrition, malnutrition, hunger, food safety, food availability, famine, undernutrition, overnutrition, obesity
Gender and inclusion	gender, women, youth, children, elderly, inclusion, social exclusion, vulnerable group, vulnerable population, minority group, indigenous, small-scale producer, pastoralist, fishing communities, forest-based communities, high-risk regions
Livelihoods and poverty	livelihood, poverty, income, employment, labor, economic activity, loss of income, loss of livelihood, safety net, insurance, socio-economic development, tourism, subsistence, workforce, economic, tourists, outdoor activities, ski, vacation
Health	health, mental health, morbidity, mortality, vector-borne disease, water-borne disease, infectious disease, respiratory disease, malaria, epidemic, climate-sensitive disease, heat morbidity, disease, deaths, human lives, pollution, life expectancy
Infrastructure and services	infrastructure, critical infrastructure, services, critical services, road, bridge, electricity, power supply, energy, water supply, sanitation, hygiene, education, school, building, housing, settlement, evacuation, telecom, transport, waste management, power, hydropower, railways, port, industry, material, industries
Human security and Peace	migration, displacement, conflict, armed conflict, national security, human security, organized conflict, climate-induced migration, refugee, peace

Coverage of system at risk type

## **System at Risk Coverage:** 1428 of 1605 rows were successfully assigned a category.

## **Coverage Rate:** 89 %

System at risk type summary data

by regions

System at risk type raw data

Step 3: Sectors

We classify sectors in two ways:

SectorType (IPCC categories) — based on keywords found in the Sector and SystemType columns. This follows the sector taxonomy defined in the MPGs (Table 1).
SectorType_GGA (GGA themes) — which maps the sectors into broader Global Goal on Adaptation (GGA) thematic groups as per the protocol.

The tagging logic first applies regular expressions to the text in Sector, with some fallback rules based on the SystemType when relevant. The classification is additive and accounts for overlapping sector concepts.
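
The Sector-first, SystemType-fallback idea can be illustrated with a small sketch. It is deliberately simpler than the additive logic described above: it keeps only the first matching category, and the three patterns are a hypothetical excerpt of the keyword table that follows. `first_match()` is an illustrative helper, not a function from the script.

```r
# Toy rows (assumption): Sector may be missing or empty.
adaptdata <- data.frame(
  Sector     = c("Agriculture and food security", NA, "", "Unclear"),
  SystemType = c("Crop", "Freshwater", "Coastal", NA),
  stringsAsFactors = FALSE
)

sector_patterns <- c(
  "Food, fiber and other ecosystem products" = "agri|food|crop|livestock|fish",
  "Terrestrial and freshwater ecosystems"    = "forest|ecosystem|biodiversity|terrestrial|freshwater",
  "Ocean and coastal ecosystems"             = "ocean|marine|coast|mangrove"
)

# Return the first category whose regex matches, or NA when nothing matches.
first_match <- function(text, patterns) {
  if (is.na(text) || !nzchar(trimws(text))) return(NA_character_)
  hits <- vapply(patterns, function(p) grepl(p, text, ignore.case = TRUE), logical(1))
  if (any(hits)) names(patterns)[which(hits)[1]] else NA_character_
}

from_sector <- vapply(adaptdata$Sector, first_match, character(1),
                      patterns = sector_patterns, USE.NAMES = FALSE)
from_system <- vapply(adaptdata$SystemType, first_match, character(1),
                      patterns = sector_patterns, USE.NAMES = FALSE)

# Sector text takes precedence; SystemType is used only when Sector gives nothing.
adaptdata$SectorType <- ifelse(is.na(from_sector), from_system, from_sector)
print(adaptdata)
```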

Below is a summary table of sector categories and the associated keywords used.

SectorType (IPCC categories) - Keywords used

IPCC Sector Category	Keywords / Match Terms
Food, fiber and other ecosystem products	agri, agro, agriculture, food, crop, livestock, animal, fish, fisheries, aquacultur, aquaculture, seed, irrigation, value chain, land use, land tenure, land and forestry, agroforest, Agriculture and food security, Agriculture, Climate services, Others, forest and other land uses, Agriculture, forest and other land uses, land affairs, land reforms, Food and nutrition, Crop, Livestock, Fisheries and aquaculture, Sustainable development, Agriculture
Terrestrial and freshwater ecosystems	forest, environ, environment, enviornment, ecosystem, biodiversity, natural, ecology, ecolog, wildlife, REDD, peatland, protected area, Terrestrial, Freshwater, land
Ocean and coastal ecosystems	ocean, marine, coast, coastal, coastal land use, coastal zone, blue carbon, mangrove, Tourism and Coastal Zone Management, Coastal zone management
Water, sanitation and hygiene	sanitation, water and sanitation, sewerage, hygiene, water use, water security, water and energy, water, sanitation and waste
Cities, settlements and key infrastructure	city, cities, urban, settlement, infrastructure, housing, habitat, industr, waste, transport, energy, landfills, sanitary landfills, mining, mineral resources, mineral products, telecommunications, Infrastructure, transport and building, Infrastructure and services, Cities and Built Environment, Land Use and Human Settlements Development, Habitat, urban planning and development of the territory, Housing, Territorial Development and Urban Planning, Renewable Energy, Urban planning and infrastructure, Urban Development & Tourism, Private sector/trade; Manufacturing; Business process
Health, wellbeing and communities	health, well-being, wellbeing, nutrition, culture, territorial communit, territorial communities, local knowledge, Human security, Vulnerable communities, Territorial development
Livelihoods, poverty and sustainable development	social, poverty, people, econom, capacity, education, employment, tourism, rural development, sustainable development, economic and social infrastructure, Livelihoods and poverty, protection, social protection, transfer, income, revenue, social infrastructures, Education, research, Education, training, research, Multiple: Social Economy, Tourism, Multiple: Social Affairs, Women and Family, Multiple: Planning, Rural development, Multiple: Sustainable development, Planning
Crosscutting	cross-cutting, cross cutting, cross-sectoral, innovation, research, R&D, integration, empower, gender, women, youth, disaster, risk, climate, climate service, meteo, warning, governance, legislation, policy, policies, institution, M&E, devolution, private sector, public sector, territory, spatial planning, planning, weather
Not specified	Not specified

Sector GGA

GGA Sector Theme	Keywords / Match Terms
Water and sanitation	sanitation, water and sanitation, sewerage, hygiene, water security, water use, water and energy, water, sanitation and waste
Food and agriculture	agri, agro, agriculture, food, crop, livestock, animal, fish, fisheries, aquacultur, aquaculture, seed, irrigation, value chain, land use, land tenure, land and forestry, agroforest, nutrition, food secur, Agriculture and food security, Food and nutrition, Agriculture, Climate services, Others, Sustainable development, Agriculture, land affairs, land reforms, Agriculture, forest and other land uses
Health	health, well-being, wellbeing, territorial communit, territorial communities, local knowledge, Human security, Vulnerable communities, Territorial development
Biodiversity and ecosystems	ocean, marine, coast, coastal land use, coastal zone, blue carbon, mangrove, forest, environ, environment, enviornment, ecosystem, biodiversity, natural, ecology, ecolog, wildlife, REDD, peatland, protected area, freshwater, Terrestrial, Tourism and Coastal Zone Management, Coastal zone management
Infrastructure and human settlements	city, cities, urban, settlement, infrastructure, housing, habitat, industr, waste, transport, energy, landfills, sanitary landfills, mining, mineral resources, mineral products, telecommunications, Infrastructure, transport and building, Infrastructure and services, Cities and Built Environment, Land Use and Human Settlements Development, Habitat, urban planning and development of the territory, Housing, Territorial Development and Urban Planning, Renewable Energy, Urban planning and infrastructure, Urban Development & Tourism, spatial planning, planning, territory, Private sector/trade; Manufacturing; Business process, private sector, public sector,buildings
Poverty eradication and livelihoods	social, poverty, people, econom, capacity, education, employment, tourism, rural development, sustainable development, economic and social infrastructure, livelihood, income, revenue, social protection, protection, transfer, social infrastructures, Education, research, Education, training, research, Multiple: Social Economy, Tourism, Multiple: Social Affairs, Women and Family, Multiple: Planning, Rural development, Multiple: Sustainable development, Planning
Cultural heritage	cultur, cultural heritage, heritage site, traditional knowledge, local knowledge, indigenous

sector for all rows

# Ensure column exists
if (!"SectorType_GGA" %in% names(adaptdata)) adaptdata$SectorType_GGA <- NA_character_

adaptdata <- adaptdata %>%
  mutate(
    # Combine Sector with ElementText when Sector is present; otherwise use ElementText alone
    .gga_text = if_else(
      !is.na(Sector) & str_trim(Sector) != "",
      paste0(as.character(Sector), " ", coalesce(as.character(ElementText), "")),
      coalesce(as.character(ElementText), "")
    )
  ) %>%
  rowwise() %>%
  mutate(
    .counts_gga = list(vapply(gga_patterns, function(pat) {
      str_count(.gga_text, regex(pat, ignore_case = TRUE))
    }, integer(1))),
    SectorType_GGA = {
      # Special case: Sector exactly "water"
      if (!is.na(Sector) && str_to_lower(str_trim(Sector)) == "water") {
        "Water and sanitation"
      } else {
        cnt <- unlist(.counts_gga)
        if (all(cnt == 0L)) NA_character_ else {
          winners <- which(cnt == max(cnt))
          gga_priority[winners[1]]
        }
      }
    }
  ) %>%
  ungroup() %>%
  select(-.gga_text, -.counts_gga)
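
The count-and-pick-winner rule in the chunk above can be tried in isolation. The sketch below is a base R illustration under two assumptions: `gga_patterns` and `gga_priority` are parallel vectors in the same order (so position doubles as the tie-breaker), and the three themes and regexes shown are a hypothetical excerpt. `count_matches()` and `pick_theme()` are illustrative names.

```r
# Hypothetical excerpt; both vectors are assumed to share the same order.
gga_priority <- c("Water and sanitation", "Food and agriculture", "Health")
gga_patterns <- c("water|sanitation|hygiene", "agri|food|crop", "health|well-?being")

# Number of non-overlapping, case-insensitive matches of one pattern in one string.
count_matches <- function(text, pattern) {
  m <- gregexpr(pattern, text, ignore.case = TRUE)[[1]]
  if (m[1] == -1L) 0L else length(m)
}

# Theme with the most keyword hits; ties go to the earliest theme in gga_priority.
pick_theme <- function(text) {
  cnt <- vapply(gga_patterns, count_matches, integer(1), text = text, USE.NAMES = FALSE)
  if (all(cnt == 0L)) return(NA_character_)
  gga_priority[which(cnt == max(cnt))[1]]
}

pick_theme("Food security, agriculture and crop water use")
pick_theme("Water and health")
pick_theme("Tourism")
```

In the second call both “Water and sanitation” and “Health” score one hit, so the result depends entirely on the order of `gga_priority`.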

Sector coverage

## **SectorType (IPCC):** 6851 of 7295 rows tagged

## **SectorType_GGA (GGA):** 10352 of 7295 rows tagged

## **Coverage Rate:** 93.9 %

# adaptdata_previous<-read.csv("adaptdata_results_nano6.csv")
# 
# adaptdata <- adaptdata_previous %>%
#   select(-SectorType_GGA) %>%                # drop old classification
#   left_join(
#     adaptdata %>% select(...1, SectorType_GGA), 
#     by = "...1"                             # replace with the correct unique key
#   )
# 
# write.csv(adaptdata,"data/adaptdata_results_nano6.csv")
# # Option 2: If you ran the improved regex classification *directly* on adaptdata
# # then just overwrite:
# adaptdata <- adaptdata %>%
#   mutate(SectorType_GGA = adaptdata_new$SectorType_GGA)

Sector type summary

Sector type raw data

Sector consistency analysis

## Sector consistency per country

# Elements of interest
elements_focus <- c("Hazard", "System at risk", "Action")

# Count rows per Country, Element, and SectorType_GGA
theme_counts_country <- adaptdata %>%
  filter(Element %in% elements_focus,
         !is.na(SectorType_GGA),
         SectorType_GGA != "",
         SectorType_GGA != "NA") %>%
  group_by(Country, Element, SectorType_GGA) %>%
  summarise(n = n(), .groups = "drop") %>%
  arrange(Country, Element, desc(n))

# Interactive table
datatable(theme_counts_country,
          options = list(pageLength = 15, autoWidth = TRUE),
          rownames = FALSE)

theme_counts_country_all <- adaptdata %>%
  filter(!is.na(SectorType_GGA),
         SectorType_GGA != "",
         SectorType_GGA != "NA") %>%
  group_by(Country, Element, SectorType_GGA) %>%
  summarise(n = n(), .groups = "drop") %>%
  arrange(Country, Element, desc(n))

# Interactive table
datatable(theme_counts_country_all,
          options = list(pageLength = 15, autoWidth = TRUE),
          rownames = FALSE)

Step 4: Climate Impact

4.1 GPT Extraction — impact keywords

Each Climate impact passage is reduced by GPT to short exact phrases of consequences
(e.g., “water scarcity”; “crop yield loss”; “heat-related illness”).
Hazards (“storm”) and actions (“irrigation”) are excluded unless phrased as impacts
(e.g., “increased flooding”).

4.2 Categorization — from phrases to classes

Phrases are mapped to a fixed set of impact categories:
- Water availability shortfall
- Water quality deterioration
- Crop production loss
- Livestock impacts
- Soil degradation
- Fisheries & aquaculture decline
- Food security & nutrition decline
- Human health burden
- Mental health impacts
- Healthcare system strain
- Mortality & injury
- Displacement & migration
- Damage to buildings & infrastructure
- Energy system impacts
- Transport disruption
- Supply chain & logistics disruptions
- Economic & livelihood losses
- Tourism & recreation impacts
- Ecosystem degradation & biodiversity loss
- Soil erosion & land degradation
- Marine ecosystem stressors
- Cultural heritage impacts
- Service delivery disruptions
- Carbon storage & productivity decline
- Coastal inundation & shoreline erosion
- Cryosphere degradation & ground instability
- Air quality deterioration
- Agricultural pests & diseases
- Agricultural system shifts
- Potential gains
Each category has sample keywords + regex rules to ensure consistent matches.
Rows with only hazards (no consequences) are dropped.
Multiple categories can apply; results stored in ImpactCategories.
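
A minimal sketch of the phrase-to-category mapping is shown below. The three regex rules are hypothetical examples written for this illustration (the actual rules are not reproduced in this report), and the "; " separator for `ImpactCategories` is an assumption.

```r
# Hypothetical regex rules for three of the categories listed above.
impact_rules <- c(
  "Water availability shortfall" = "water (scarcity|shortage|stress)|reduced water availability",
  "Crop production loss"         = "crop (yield )?loss|yield (loss|decline|reduction)",
  "Human health burden"          = "heat-related illness|disease outbreak"
)

# Map a vector of extracted phrases to every category with at least one matching rule.
categorize_impacts <- function(phrases) {
  hits <- vapply(impact_rules, function(rule) {
    any(grepl(rule, phrases, ignore.case = TRUE))
  }, logical(1))
  if (!any(hits)) return(NA_character_)
  paste(names(impact_rules)[hits], collapse = "; ")
}

rows <- list(
  c("water scarcity", "crop yield loss"),
  "heat-related illness",
  "storm"   # hazard only, no consequence
)
ImpactCategories <- vapply(rows, categorize_impacts, character(1))

# Rows with no impact category (for example hazard-only passages) are dropped.
kept <- ImpactCategories[!is.na(ImpactCategories)]
print(kept)
```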

4.3 Include the relevant system at risk rows

4.4 Climate impact summary data

load manual data after check

by regions

4.5 Raw data

This shows 10 random rows for each climate impact category. The raw text is included, along with the keywords that justify the category.
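
Drawing up to 10 random rows per category can be sketched as follows. The `impacts` table and the `sample_per_category()` helper are hypothetical; categories with fewer than 10 rows are returned in full.

```r
set.seed(1)  # only to make the toy preview reproducible

# Toy table (assumption): 25 rows in one category, 4 in another.
impacts <- data.frame(
  ImpactCategory = rep(c("Crop production loss", "Human health burden"), times = c(25, 4)),
  ElementText    = paste("passage", 1:29),
  stringsAsFactors = FALSE
)

# Sample at most n rows from each group without replacement.
sample_per_category <- function(df, group_col, n = 10) {
  groups <- split(seq_len(nrow(df)), df[[group_col]])
  keep <- unlist(lapply(groups, function(idx) {
    idx[sample.int(length(idx), min(n, length(idx)))]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

preview <- sample_per_category(impacts, "ImpactCategory")
table(preview$ImpactCategory)
```

`sample.int()` is used on the group size rather than `sample()` on the indices, which avoids the surprise of `sample()` treating a single number as a range.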

Step 5: Action level and type

5.1 Action type

Goal. Tag each Action with a single type to describe the primary adaptation approach.

Action Type & Intervention Categories

This dataset classifies climate adaptation measures into broad intervention categories.
Each intervention contains types, which group related actions, and each type includes specific actions (and sometimes subactions).

Intervention Categories

Ecosystem based

Focus on maintaining and restoring natural systems.
Includes intervention types such as:
- Biodiversity development – actions to enhance biodiversity (e.g., ecological buffer zones, pollination support).
- Ecological restoration – restoring degraded ecosystems (e.g., afforestation, climate-resilient trees).
- Green infrastructure – using vegetation and permeable systems for adaptation (e.g., rain gardens, green roofs).
- Agroforestry – integrating trees into agricultural systems.
- Animals – livestock and aquaculture resilience.
- Crop diversification – multiple crops to reduce risk.
- Energy – biogas, improved cookstoves, etc.
- Migration – relocating crops/livestock in response to climate.
- Nutrient management – compost, manure, fertility practices.
- Pest and disease management – integrated pest management, biological control.
- Postharvest – storage, drying, reducing food loss.
- Schedule modification – shifting planting/harvest calendars.
- Soil management – mulching, conservation tillage, improved fallows.
- Water management – irrigation efficiency, water harvesting.

Institutional

Policies, governance, and frameworks enabling adaptation.
Includes:
- Economic (policy level) – finance and incentives from institutions (grants, subsidies, PES).
- Government policies and programs – national/local adaptation plans, sectoral programs.
- Laws and regulations – binding rules such as zoning, building codes, protected areas.
- Research development – climate modeling, monitoring, R&D, policy analysis.

Infrastructure / Structural / Physical

Hard measures, services, and technology.
Includes:
- Engineered and built environment – seawalls, dams, drainage, resilient transport.
- Services – service delivery platforms like social protection, healthcare.
- Supply-chain improvement – logistics, storage, market access.
- Technological – precision agriculture, IoT, improved breeds/varieties, renewable energy.

Method

GPT reads the Action text and must return exactly one of the five labels (or NA if unclear).
The output is saved in ActionType_GPT and previewed in a datatable.

by regions

Action type data

5.2 Action level

Action Level classification

Goal. Distinguish whether each Action reported is preparatory groundwork or a substantive adaptation measure.

Definitions

Groundwork — upstream or enabling work such as:
- impact/vulnerability assessments, scenarios, conceptual tools
- policy recommendations, planning frameworks
Substantive action — concrete measures such as:
- organizational development, regulations, awareness/outreach, education/training
- monitoring/MEL, infrastructure, technology/innovation
- financial mechanisms, resource transfer, funding

Method

GPT reads each Action text and must return exactly one label: Groundwork or Substantive action.
If no clear match, the model returns NA.
The result is stored in ActionLevel_GPT and previewed in a datatable.
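
Since the model is expected to return exactly one of two labels, a small validation step can guard against stray casing, whitespace, or punctuation in the reply. The helper below is a hypothetical sketch of such a check, not the code used to produce ActionLevel_GPT; anything that is not an exact label after cleanup becomes NA.

```r
ACTION_LEVELS <- c("Groundwork", "Substantive action")

# Map a raw model reply to one of the allowed labels, or NA if it does not match.
normalize_action_level <- function(reply) {
  if (is.null(reply) || length(reply) != 1L || is.na(reply)) return(NA_character_)
  cleaned <- trimws(reply)
  cleaned <- gsub("[[:punct:]]+$", "", cleaned)   # drop trailing punctuation
  cleaned <- tolower(trimws(cleaned))
  hit <- ACTION_LEVELS[tolower(ACTION_LEVELS) == cleaned]
  if (length(hit) == 1L) hit else NA_character_
}

replies <- c(" groundwork. ", "Substantive Action", "Both", NA)
ActionLevel_GPT <- vapply(replies, normalize_action_level, character(1), USE.NAMES = FALSE)
print(ActionLevel_GPT)
```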

by regions

Step 6: Results

Goal. Label each Result as one of four types so we can compare what’s being reported across countries and sectors.

Method
- GPT reads the Result text and returns exactly one label from the list above (or NA if unclear).
- We store the model’s reply verbatim in ResultType_GPT (no normalization here); any cleanup or remapping happens later.

by regions

Step 7: Adaptation priorities

List of adaptation priorities:
- Water use efficiency & demand management — Reduce water losses and improve irrigation efficiency/productivity.
- Alternative & non-conventional water resources — Options like rainwater harvesting, greywater reuse, desalination, and storage.
- Agribusiness enhancement & private sector development — Strengthening agribusinesses, value chains, and SME/market linkages.
- Legal, policy & institutional frameworks — Laws, policies, governance, and institutional capacity for adaptation.
- Climate-smart agriculture (CSA) & resilient farming systems — Practices like crop diversification, agroforestry, and soil management.
- Sustainable land & farm management — Sustainable land practices, organic/landscape approaches.
- Climate-resilient crops & seeds — Breeding and access to drought/heat/pest-resistant varieties and seeds.
- Farmer capacity, extension & knowledge services — Training, extension services, and farmer knowledge exchange.
- Agricultural water management — On-farm water efficiency, watershed management, and irrigation.
- Rural development & livelihood diversification — Enhancing rural infrastructure, markets, and income sources.
- Ecosystem protection, restoration & protected areas — Conserving and restoring ecosystems and biodiversity.
- Combat land degradation & desertification — Tackling soil erosion, fertility loss, and desertification.
- Afforestation, reforestation & carbon sinks — Tree planting and forest cover expansion for carbon sequestration.
- Invasive species management — Control and management of invasive alien species.
- Land-use & spatial planning integration — Adaptation in zoning, territorial, and land-use planning.
- Monitoring, data & hydrological observation — Climate monitoring, data systems, and hydrological observation.
- National adaptation frameworks & strategies — National-level plans/frameworks (e.g., NAP, NCCAS).
- Local adaptation capacity & community empowerment — Community-based adaptation and local empowerment.
- Public awareness, education & engagement — Campaigns, curricula, and awareness programs.
- Institutional & technical capacity building — Training, institutional strengthening, inter-agency coordination.
- Policy mainstreaming & regulatory integration — Integrating adaptation into policies and regulations.
- Cross-sectoral collaboration & partnerships — Partnerships and coordination across sectors.
- Data management, knowledge & information services — Knowledge management, sharing, and tools.
- Evidence base, risk & impact assessments — Vulnerability, risk, and impact assessments.
- Subnational adaptation priorities & sector packages — Region- or sector-specific strategies and action packages.
- Agro-meteorological & climate information services — Forecasts, advisories, and information services for agriculture.
- Post-harvest, food loss reduction & risk transfer — Reducing losses, value addition, insurance, and cold chains.
- Climate-resilient livestock systems — Livestock management, resilient breeds, and animal health.
- Research, innovation & investment (agri/health) — Research, technology, and innovation in agriculture and health.
- Fire management & response capability — Fire prevention, management, and emergency response.
- Community-based forest management & re/afforestation — Local forest management and reforestation.
- Biodiversity assessment & monitoring — Biodiversity inventories, monitoring, and assessments.
- Public health: surveillance, systems & workforce — Disease surveillance, early warning, and health workforce.
- Public health infrastructure & services — Facilities, WASH, energy, and resilient health services.
- Urban climate resilience & planning — Climate-proofed urban planning and infrastructure.
- Urban green/blue infrastructure & heat mitigation — Greening, cooling, water-sensitive infrastructure.
- Disaster risk reduction, EWS & emergency management — DRR, early warning systems, and emergency response.
- Energy & grid resilience — Strengthening energy infrastructure and supply chains.
- Coastal & marine ecosystem protection/restoration — Mangroves, reefs, wetlands, and coastal restoration.
- Agricultural resource efficiency & high-standard farmland — Efficient farmland, fertilizer, and input use.
- Water allocation, security & quantified targets — Ensuring water security with defined consumption/coverage targets.
- Transport infrastructure & services adaptation — Climate-proofing roads, ports, rail, and mobility.
- Industry & mining adaptation — Industrial resilience and climate-compatible mining.
- Sectoral adaptation plans & guidelines — Guidelines and action plans for specific sectors (e.g., tourism).
- Finance, insurance & social protection — Climate finance, insurance, and social safety nets.
- Tourism sector adaptation — Adaptation strategies for tourism and ecotourism.
- Gender & social inclusion — Gender-sensitive and socially inclusive adaptation approaches.

by regions

library(dplyr)
library(stringr)
library(tidyr)
library(ggplot2)
library(DT)


region_key <- overview %>%
  select(Country, Region) %>%
  distinct()

# add manual fixes for missing countries
manual_add <- tibble::tribble(
  ~Country,        ~Region,
  "Moldova",       "Eastern Europe",
  "Phillipines",   "Asia-Pacific",
  "Russia",        "Eastern Europe",
  "Türkiye",       "Asia-Pacific",
  "USA",           "Western Europe and Other states"
)

# combine with existing region_key
region_key <- region_key %>%
  bind_rows(manual_add) %>%
  distinct()


adaptdata_with_region <- adaptdata %>%
  left_join(region_key, by = "Country")

# 1) Counts by Region x Priority
priority_counts_region <- adaptdata_with_region %>%
  filter(tolower(Element) == "adaptation priorities",
         !is.na(PriorityCategories),
         str_detect(PriorityCategories, "\\S")) %>%
  separate_rows(PriorityCategories, sep = ";") %>%
  mutate(
    PriorityCategories = str_squish(PriorityCategories),
    Region = if_else(is.na(Region) | !nzchar(Region), "Unknown", Region)
  ) %>%
  filter(PriorityCategories != "") %>%
  count(Region, PriorityCategories, sort = TRUE) %>%
  rename(`Priority Category` = PriorityCategories,
         Occurrences = n)

# 2) Pick TOP 20 categories by total across regions
top20 <- priority_counts_region %>%
  group_by(`Priority Category`) %>%
  summarise(Total = sum(Occurrences), .groups = "drop") %>%
  slice_max(Total, n = 20, with_ties = FALSE)

priority_counts_region_top20 <- priority_counts_region %>%
  semi_join(top20, by = "Priority Category") %>%
  left_join(top20, by = "Priority Category") %>%
  mutate(`Priority Category` = reorder(`Priority Category`, Total)) %>%
  arrange(Total)

# 3) Stacked plot by Region (top 20 only)
priority_plot_region <- ggplot(priority_counts_region_top20,
  aes(x = `Priority Category`, y = Occurrences, fill = Region)) +
  geom_col(position = "stack") +
  coord_flip() +
  scale_fill_viridis_d(option = "plasma", name = "Region") +
  labs(title = "Top 20 adaptation priorities by UNFCCC region (stacked)",
       x = NULL, y = "Occurrences") +
  theme_minimal(base_size = 10)

priority_plot_region

Step 8: Adaptation barriers

# ---------------- Setup ----------------
# suppressPackageStartupMessages({
#   library(dplyr); library(stringr); library(glue); library(DT); library(ellmer); library(tidyr)
# })
# 
# # Model + batching
# BARRIER_MODEL <- "gpt-5-nano"
# CHUNK_SIZE    <- 30
# MAX_LABELS_PER_PASSAGE <- 3
# 
# # ---- API key ----
# btr_key <- Sys.getenv("BTR_KEY")
# stopifnot("BTR_KEY is not set in your environment." = nzchar(btr_key))
# Sys.setenv(OPENAI_API_KEY = btr_key)
# 
# # ---------------- Canonical labels ----------------
# BARRIER_LABELS <- c(
#   "Financial",
#   "Economic",
#   "Human capacities",
#   "Informational",
#   "Institutional",
#   "Organizational",
#   "Technological",
#   "Physical",
#   "Social/ cultural",
#   "Biological",
#   "Other emerging issues"
# )
# 
# # Map small variants to the exact allowed strings (extra safety)
# canonicalize_barrier_label <- function(x) {
#   if (is.null(x) || is.na(x)) return("Other emerging issues")
#   raw <- tolower(trimws(x))
#   raw <- gsub("[[:punct:]]+$", "", raw)
#   raw <- gsub("\\s+", " ", raw)
# 
#   map <- list(
#     "financial"            = c("financial","financing","funding","budget","costs"),
#     "economic"             = c("economic","macro-economic","macroeconomic","market","livelihood"),
#     "human capacities"     = c("human capacity","human capacities","capacity","capacities","skills","staffing"),
#     "informational"        = c("informational","information","knowledge","data gaps","monitoring and data","evidence","m&e","monitoring"),
#     "institutional"        = c("institutional","governance","policy","regulatory","mainstreaming"),
#     "organizational"       = c("organizational","organisational","coordination","mandates","roles and responsibilities","role clarity"),
#     "technological"        = c("technological","technology","tech","innovation","digital"),
#     "physical"             = c("physical","biophysical","infrastructure","geographical","terrain","remoteness"),
#     "social/ cultural"     = c("social/ cultural","social/cultural","sociocultural","social and cultural","social-cultural","social","cultural"),
#     "biological"           = c("biological","bio-physical limits","biophysical limits","climatic limits","ecophysiological"),
#     "other emerging issues"= c("other emerging issues","other","emerging issues","pandemics","covid","invasions","invasion")
#   )
#   for (k in names(map)) if (raw %in% map[[k]]) {
#     return(BARRIER_LABELS[match(tolower(k), tolower(BARRIER_LABELS))])
#   }
#   hit <- BARRIER_LABELS[tolower(BARRIER_LABELS) == raw]
#   if (length(hit)) return(hit)
#   "Other emerging issues"
# }
# 
# canonicalize_barrier_vector <- function(v) {
#   v <- v[ nzchar(trimws(v)) ]
#   if (!length(v)) return("Other emerging issues")
#   v <- unique(vapply(v, canonicalize_barrier_label, character(1)))
#   # keep at most MAX_LABELS_PER_PASSAGE
#   v <- v[seq_len(min(length(v), MAX_LABELS_PER_PASSAGE))]
#   paste(v, collapse = "; ")
# }
# 
# # ---------------- Multi-label batch prompt ----------------
# make_barrier_batch_prompt <- function(txts) {
#   glue::glue(
# "Task: For each passage, select all applicable *Adaptation barriers* (1–{MAX_LABELS_PER_PASSAGE}) using ONLY the labels below.
# If uncertain, choose the closest categories (do NOT output NA). Return N lines, same order; separate multiple with '; '.
# 
# Allowed labels (exact strings):
# - Financial — inadequate/lack of funds or budgets; affordability/cost constraints.
# - Economic — constraints from current livelihoods, market structure, macro-economy, and the development level of key sectors.
# - Human capacities — gaps in skills/training/education and adequate staffing at individual/organizational/societal levels.
# - Informational — gaps in information/awareness/knowledge/data/monitoring needed to guide or assess adaptation; ALSO gaps in data infrastructure and knowledge-management systems (platforms, registries, databases, MIS, M&E systems).
# - Institutional — weaknesses in policies/regulations/plans or inadequate mainstreaming of adaptation into other policies.
# - Organizational — weak organizations/mandates/coordination/designated entities; poor stakeholder inclusion/participation.
# - Technological — limited access to technologies/innovation/equipment, INCLUDING limitations of manmade/built infrastructure.
# - Physical — barriers from the natural physical environment (terrain, remoteness, topography, soils, floodplains, coastlines).
# - Social/ cultural — norms, values, identity, beliefs, place attachment, justice/equity, social support, security issues.
# - Biological — biophysical/climatic/physiological limits (extreme temperature/precipitation/salinity/acidity/extreme-event frequency).
# - Other emerging issues — contextual shocks (e.g., pandemics, invasions).
# 
# Disambiguation (key cues):
# - Data/knowledge/M&E platforms/registries/databases → Informational (not Institutional).
# - Coordination/mandates/role clarity/designated entities → Organizational.
# - Manmade/built infrastructure constraints → Technological (Physical is ONLY natural environment).
# - Budget/funding/affordability → Financial; broader livelihoods/markets/macroeconomy → Economic.
# 
# Output format: EXACT label strings separated by '; ' (max {MAX_LABELS_PER_PASSAGE}). No extra words.
# 
# INPUTS (numbered):
# {paste0(sprintf('%d) %s', seq_along(txts), txts), collapse = '\n')}"
#   )
# }
# 
# # ---------------- Batch classifier (multi-label) ----------------
# classify_barrier_type_batch <- function(txts, model = BARRIER_MODEL, chunk_size = CHUNK_SIZE) {
#   if (!length(txts)) return(character(0))
#   chat <- ellmer::chat_openai(model = model)
#   out  <- vector("character", length(txts))
# 
#   idx    <- seq_along(txts)
#   chunks <- split(idx, ceiling(idx / chunk_size))
# 
#   pb <- txtProgressBar(min = 0, max = length(chunks), style = 3)
#   on.exit(close(pb), add = TRUE)
# 
#   for (i in seq_along(chunks)) {
#     ids    <- chunks[[i]]
#     prompt <- make_barrier_batch_prompt(txts[ids])
# 
#     ans <- tryCatch(chat$chat(prompt), error = function(e) e)
#     if (inherits(ans, "error")) {
#       warning("GPT batch failed: ", conditionMessage(ans), " — defaulting to 'Other emerging issues' for this chunk.")
#       out[ids] <- "Other emerging issues"
#       setTxtProgressBar(pb, i); next
#     }
# 
#     lines <- strsplit(ans, "\\r?\\n", perl = TRUE)[[1]]
#     lines <- trimws(lines)
#     lines <- lines[nzchar(lines)]
#     lines <- sub("^[0-9]+[.)\\-:]\\s*", "", lines, perl = TRUE)
# 
#     # align to input size
#     if (length(lines) < length(ids)) lines <- c(lines, rep("Other emerging issues", length(ids) - length(lines)))
#     if (length(lines) > length(ids)) lines <- lines[seq_along(ids)]
# 
#     # split by ';', canonicalize each, cap to MAX_LABELS_PER_PASSAGE, then collapse back
#     out[ids] <- vapply(lines, function(s) {
#       labs <- unlist(strsplit(s, "\\s*;\\s*", perl = TRUE))
#       canonicalize_barrier_vector(labs)
#     }, character(1))
# 
#     setTxtProgressBar(pb, i)
#   }
#   out
# }
# 
# # ---------------- Run on your data ----------------
# if (!"BarrierType_GPT" %in% names(adaptdata)) adaptdata$BarrierType_GPT <- NA_character_
# 
# bar_idx <- which(
#   grepl("barrier", adaptdata$Element, ignore.case = TRUE) |
#   grepl("barrier", adaptdata$ElementLabel, ignore.case = TRUE)
# )
# 
# if (length(bar_idx)) {
#   cat("Classifying adaptation barriers for", length(bar_idx), "rows (multi-label, batched)...\n")
#   adaptdata$BarrierType_GPT[bar_idx] <-
#     classify_barrier_type_batch(adaptdata$ElementText[bar_idx],
#                                 model = BARRIER_MODEL,
#                                 chunk_size = CHUNK_SIZE)
# } else {
#   cat("No rows with Element/ElementLabel containing 'barrier' found.\n")
# }

#write.csv(adaptdata,"data/adaptdata_barriers.csv")
 # ---------------- Explode to row-per-barrier (to mirror manual duplication) ----------------
# adaptdata_barriers_long <- adaptdata %>%
#   filter(row_number() %in% bar_idx) %>%
#   mutate(BarrierType_GPT = ifelse(is.na(BarrierType_GPT) | !nzchar(BarrierType_GPT),
#                                   "Other emerging issues", BarrierType_GPT)) %>%
#   separate_rows(BarrierType_GPT, sep = "\\s*;\\s*")
# 
# # Quick preview
# DT::datatable(
#   adaptdata_barriers_long %>%
#     select(Country, Document, Element, ElementLabel, ElementText, BarrierType_GPT),
#   escape = FALSE,
#   caption = "🚧 Adaptation barriers — GPT (exploded to one row per barrier)",
#   options = list(pageLength = 10, autoWidth = TRUE, dom = "tip")
# )
# 
# # Optional: counts
# barrier_counts <- adaptdata_barriers_long %>%
#   count(BarrierType_GPT, sort = TRUE)
# print(barrier_counts)
# 
# # Optional save
# # write.csv(adaptdata_barriers_long, "data/adaptdata_barriers_long_nano6.csv", row.names = FALSE)
# 
# # ---------------- Helper: agreement vs manual after dedup ----------------
# # Expects a data.frame `a` with columns: Element, ElementText, BarrierType (manual), BarrierType_GPT (semicolon string or exploded)
# 
# a<-adaptdata_barriers_long%>%select(Element,ElementText,BarrierType,BarrierType_GPT)
# 
# compare_barrier_agreement <- function(a) {
#   # Explode GPT side if needed
#   a_exp <- a %>%
#     mutate(BarrierType_GPT = ifelse(is.na(BarrierType_GPT), "", BarrierType_GPT)) %>%
#     separate_rows(BarrierType_GPT, sep = "\\s*;\\s*")
# 
#   # Deduplicate manual rows: one row per (Element, ElementText, BarrierType)
#   a_dedup <- a_exp %>%
#     group_by(Element, ElementText, BarrierType) %>%
#     summarise(BarrierType_GPT = first(BarrierType_GPT), .groups = "drop") %>%
#     mutate(
#       BarrierType_norm    = tolower(trimws(BarrierType)),
#       BarrierType_GPT_norm= tolower(trimws(BarrierType_GPT)),
#       match = BarrierType_norm == BarrierType_GPT_norm
#     )
# 
#   agreement <- mean(a_dedup$match, na.rm = TRUE)
# 
#   confusion <- a_dedup %>%
#     count(Manual = BarrierType, GPT = BarrierType_GPT) %>%
#     arrange(desc(n))
# 
#   list(agreement = agreement, confusion = confusion, n = nrow(a_dedup))
# }
# 
# ```
# 
# ```{r}
# a_dedup <- a %>%
#   group_by(Element, ElementText, BarrierType) %>%
#   summarise(BarrierType_GPT = first(BarrierType_GPT), .groups = "drop")
# 
# # 2. Compare manual vs GPT
# a_dedup <- a_dedup %>%
#   mutate(match = tolower(trimws(BarrierType)) == tolower(trimws(BarrierType_GPT)))
# 
# # 3. Agreement rate
# agreement <- mean(a_dedup$match, na.rm = TRUE)

adaptdata<-read.csv("data/adaptdata_barriers.csv")

# a<-adaptdata_barriers_long%>%filter(Element=="Barriers")%>%select(RowID,Country,Element,ElementText,BarrierType,BarrierType_GPT)%>%distinct
# Build region key (+ manual fixes)
region_key <- overview %>%
  dplyr::select(Country, Region) %>%
  dplyr::distinct()

manual_add <- tibble::tribble(
  ~Country,      ~Region,
  "Moldova",     "Eastern Europe",
  "Phillipines", "Asia-Pacific",
  "Russia",      "Eastern Europe",
  "Türkiye",     "Asia-Pacific",
  "USA",         "Western Europe and Other states"
)

region_key <- region_key %>%
  dplyr::bind_rows(manual_add) %>%
  dplyr::distinct()

# Join regions
adaptdata_with_region <- adaptdata %>%
  dplyr::left_join(region_key, by = "Country")

# 1) Counts by Region x Barrier (using BarrierType_GPT; split on ';')
barrier_counts_region <- adaptdata_with_region %>%
  dplyr::filter(stringr::str_to_lower(Element) == "barriers",
                !is.na(BarrierType_GPT),
                stringr::str_detect(BarrierType_GPT, "\\S")) %>%
  tidyr::separate_rows(BarrierType_GPT, sep = "\\s*;\\s*") %>%   # split multi-label cells
  dplyr::mutate(
    BarrierType_GPT = stringr::str_squish(BarrierType_GPT),
    Region = dplyr::if_else(is.na(Region) | !nzchar(Region), "Unknown", Region)
  ) %>%
  dplyr::filter(BarrierType_GPT != "") %>%
  # avoid counting duplicate barrier labels for identical passages
  dplyr::distinct(Country, Document, Element, ElementLabel, ElementText, Region, BarrierType_GPT) %>%
  dplyr::count(Region, BarrierType_GPT, sort = TRUE) %>%
  dplyr::rename(`Barrier Type` = BarrierType_GPT,
                Occurrences = n)

# 2) Pick TOP 20 categories by total across regions
top20 <- barrier_counts_region %>%
  dplyr::group_by(`Barrier Type`) %>%
  dplyr::summarise(Total = sum(Occurrences), .groups = "drop") %>%
  dplyr::slice_max(Total, n = 20, with_ties = FALSE)

barrier_counts_region_top20 <- barrier_counts_region %>%
  dplyr::semi_join(top20, by = "Barrier Type") %>%
  dplyr::left_join(top20, by = "Barrier Type") %>%
  dplyr::mutate(`Barrier Type` = reorder(`Barrier Type`, Total)) %>%
  dplyr::arrange(Total)

# 3) Stacked plot by Region (top 20 only)
barrier_plot_region <- ggplot2::ggplot(barrier_counts_region_top20,
  ggplot2::aes(x = `Barrier Type`, y = Occurrences, fill = Region)) +
  ggplot2::geom_col(position = "stack") +
  ggplot2::coord_flip() +
  ggplot2::scale_fill_viridis_d(option = "plasma", name = "Region") +
  ggplot2::labs(title = "Adaptation barriers by UNFCCC region",
       x = NULL, y = "Occurrences") +
  ggplot2::theme_minimal(base_size = 10)

barrier_plot_region

#Step 9 : Summary table for Sankey plot

# Load the new dataset
# adaptdata_new <- read.csv("adaptdata_action_level_nano6.csv")
# 
# # Make sure both have RowID
# if(!"RowID" %in% names(adaptdata_new)) stop("RowID missing in new data")
# if(!"...1" %in% names(adaptdata)) stop("RowID missing in current adaptdata")
# 
# # Keep only the mapping from old data
# gga_map <- adaptdata %>%
#   rename(RowID="...1")%>%
#   select(RowID, SectorType_GGA)
# 
# # Replace SectorType_GGA in the new dataset
# adaptdata <- adaptdata_new %>%
#   select(-SectorType_GGA) %>%        # drop old col
#   left_join(gga_map, by = "RowID")   # bring in correct values
# 
# 
# # Optionally save
# # write.csv(adaptdata, "adaptdata_with_correct_GGA.csv", row.names = FALSE)
# 
# 
# # ----------------------------
# # 1) Pick your source data
# # ----------------------------
# # Replace `df` with your actual data frame (e.g., df <- check or df <- adaptdata)
# df <- adaptdata   # or: df <- check
# 
# # ----------------------------
# # 2) Normalize key columns
# #    GGA theme, Action level, Action type
# # ----------------------------
# actions <- df %>%
#   filter(tolower(Element) == "action") %>%
#   mutate(
#     GGA          = str_squish(SectorType_GGA),
#     ActionLevel  = str_squish(coalesce(ActionLevel_GPT, Action.level)),
#     # Choose which "type" you want to show in the Sankey’s last column:
#     #   - ActionType_GPT (your action-type buckets), or
#     #   - InterventionType_GPT (your intervention-type buckets)
#     ActionType   = str_squish(coalesce(ActionType_GPT, InterventionType_GPT))
#   ) %>%
#   # keep only rows that have at least GGA and ActionLevel
#   filter(!is.na(GGA), GGA != "",
#          !is.na(ActionLevel), ActionLevel != "")
# 
# # split multiple ActionTypes (e.g., "economic; institutional")
# actions_long <- actions %>%
#   separate_rows(ActionType, sep = ";|,") %>%
#   mutate(ActionType = str_squish(ActionType)) %>%
#   filter(!is.na(ActionType), ActionType != "")
# 
# # ----------------------------
# # 3) Count linkages for the Sankey
# # ----------------------------
# # (a) GGA -> ActionLevel
# edges_gga_level <- actions %>%
#   count(source = GGA, target = ActionLevel, name = "value", sort = TRUE) %>%
#   mutate(stage = "GGA→ActionLevel")
# 
# # (b) ActionLevel -> ActionType
# edges_level_type <- actions_long %>%
#   count(source = ActionLevel, target = ActionType, name = "value", sort = TRUE) %>%
#   mutate(stage = "ActionLevel→ActionType")
# 
# # Combine into one edge list
# sankey_edges <- bind_rows(edges_gga_level, edges_level_type)
# 
# # ----------------------------
# # 4) (Optional) Triple counts
# #     GGA x ActionLevel x ActionType — useful for QA or other visuals
# # ----------------------------
# triples <- actions_long %>%
#   count(GGA, ActionLevel, ActionType, name = "value", sort = TRUE)
# 
# # ----------------------------
# # 5) (Optional) Nodes table (for tools that want nodes + edges)
# # ----------------------------
# nodes <- sankey_edges %>%
#   select(source) %>% rename(node = source) %>%
#   bind_rows(sankey_edges %>% select(target) %>% rename(node = target)) %>%
#   distinct() %>%
#   arrange(node) %>%
#   mutate(id = row_number() - 1L)  # 0-based IDs
# 
# # ----------------------------
# # 6) Export for Sankey tools / SankeyMATIC
# # ----------------------------
# write_csv(sankey_edges, "sankey_edges_gga_level_type.csv")
# write_csv(triples,     "sankey_triples_gga_level_type.csv")

Step 10 : analyze linkages at the impact and action level

Method overview

Goal. Check how well countries’ actions/results target the same GGA themes where risks/impacts are most prominent—and visualize any gaps.

Inputs
- Core fields: Country, Element (System at risk, Climate impact, Action, Result), SectorType_GGA (GGA theme).
- Optional: ResultType_GPT (Output / Outcome / Impact) for weighting Result rows.

Cleaning & mapping
- Clean: trim text, standardize blanks, drop placeholder “NA” themes.
- Map to two sides:
  - Risk/Impact: Element ∈ {System at risk, Climate impact}
  - Action/Result: Element ∈ {Action, Result}
- Themes: fix a consistent theme order; include only themes present in the data.

Counting → shares
For each country × theme:
1. Risk count: Risk_Count = rows on the Risk/Impact side.
2. Action count: Action_Count = rows on the Action/Result side (optionally weight Result rows by ResultType_GPT; Actions = 1).
3. Shares (within-side; each sum runs over all themes within the country, so each side's shares sum to 1):
Risk_Share = Risk_Count / Σ Risk_Count
Action_Share = Action_Count / Σ Action_Count

Theme gap (delta)
Delta = Action_Share − Risk_Share per theme.
Positive = actions over-represented vs risk; negative = under-represented.
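
The counting, share, and delta definitions above can be sketched on a toy table. This is a minimal illustration, not the report's actual Step 10 code: the rows in `toy` are invented, Result rows are left unweighted, and only the column names (Country, Element, SectorType_GGA, Risk_Count, Action_Count, Risk_Share, Action_Share, Delta_Share_Action_minus_Risk) follow the names used elsewhere in this document.

```r
library(dplyr)

# Toy rows, invented for illustration only (not taken from the real dataset)
toy <- tibble::tribble(
  ~Country, ~Element,         ~SectorType_GGA,
  "A",      "System at risk", "Water and sanitation",
  "A",      "Climate impact", "Water and sanitation",
  "A",      "Climate impact", "Food and agriculture",
  "A",      "Action",         "Food and agriculture",
  "A",      "Action",         "Food and agriculture",
  "A",      "Result",         "Health",
  "A",      "Action",         "Water and sanitation"
)

shares <- toy %>%
  # Count rows on each side for every country x theme
  group_by(Country, SectorType_GGA) %>%
  summarise(
    Risk_Count   = sum(Element %in% c("System at risk", "Climate impact")),
    Action_Count = sum(Element %in% c("Action", "Result")),
    .groups = "drop"
  ) %>%
  # Within-side shares: each side sums to 1 within a country
  group_by(Country) %>%
  mutate(
    Risk_Share   = Risk_Count / sum(Risk_Count),
    Action_Share = Action_Count / sum(Action_Count),
    Delta_Share_Action_minus_Risk = Action_Share - Risk_Share
  ) %>%
  ungroup()

print(shares)
```

Because both share vectors sum to 1 within a country, the deltas for a country sum to 0. A country with no rows on one side would produce NaN shares under this sketch and would need to be handled separately.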

Coherence score (0–1)
Compare the two share vectors (Risk vs Action) using:
- Cosine similarity (pattern overlap), and
- Jensen–Shannon similarity (distributional closeness).

Final score = average of the two, clipped to [0, 1].
Higher = better alignment between where risks appear and where actions focus.
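
One way to compute such a score is sketched below. The text does not specify how the Jensen–Shannon similarity is defined, so this sketch assumes it is 1 minus the Jensen–Shannon divergence with base-2 logarithms (which lies in [0, 1]); the function names `cosine_similarity`, `js_similarity`, and `coherence_score` are hypothetical and the example vectors are invented.

```r
# Cosine similarity between two share vectors (NA if either vector is all zero)
cosine_similarity <- function(p, q) {
  denom <- sqrt(sum(p^2)) * sqrt(sum(q^2))
  if (denom == 0) return(NA_real_)
  sum(p * q) / denom
}

# ASSUMPTION: similarity = 1 - Jensen-Shannon divergence (log base 2)
js_similarity <- function(p, q) {
  # KL divergence in bits; terms with a == 0 contribute 0 by convention
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  m <- (p + q) / 2
  1 - (0.5 * kl(p, m) + 0.5 * kl(q, m))
}

# Final score: average of the two similarities, clipped to [0, 1]
coherence_score <- function(risk_share, action_share) {
  score <- mean(c(cosine_similarity(risk_share, action_share),
                  js_similarity(risk_share, action_share)))
  min(max(score, 0), 1)
}

# Invented share vectors over three themes (each sums to 1)
risk_share   <- c(2/3, 1/3, 0)
action_share <- c(0.25, 0.50, 0.25)
coherence_score(risk_share, action_share)
```

Under these assumptions, identical share vectors score 1 and vectors with no themes in common score 0. A different definition of the Jensen–Shannon term (for example, one based on the Jensen–Shannon distance) would give different intermediate values.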

Figures
- Country paired bars: Risk vs Action shares by theme (within a country).
- Stacked bars (counts): Absolute composition by theme for each side (no forced 100%).
- Coherence summary: One bar per country (higher = more aligned).
- Delta heatmap: Action_Share − Risk_Share by country × theme (red = over-actioned; blue = under-actioned).

Interpretation
- A theme appears in the Risk share if tagged under System at risk or Climate impact.
- A theme appears in the Action share if tagged under Action or Result.
- Alignment: largest risk themes ≈ largest action themes; misalignment highlights priority gaps.

## 
## Saved coherence outputs to:
## outputs/coherence/coherence_summary_ALL.csv
## outputs/coherence/gga_risk_vs_action_tables_ALL.csv
## outputs/coherence/figs/coherence_summary_top20.png
## outputs/coherence/figs/country_stacked_bars_counts_top10_twoBars.png
## outputs/coherence/figs/delta_heatmap_top20.png

## outputs/coherence/figs/South_Africa_gga_balance_top20.png 
## outputs/coherence/figs/Egypt_gga_balance_top20.png 
## outputs/coherence/figs/Maldives_gga_balance_top20.png 
## outputs/coherence/figs/Seychelles_gga_balance_top20.png 
## outputs/coherence/figs/Denmark_gga_balance_top20.png 
## outputs/coherence/figs/Bulgaria_gga_balance_top20.png 
## outputs/coherence/figs/Gabon_gga_balance_top20.png 
## outputs/coherence/figs/Kenya_gga_balance_top20.png 
## outputs/coherence/figs/Moldova_gga_balance_top20.png 
## outputs/coherence/figs/Nepal_gga_balance_top20.png 
## outputs/coherence/figs/Latvia_gga_balance_top20.png 
## outputs/coherence/figs/New_Zealand_gga_balance_top20.png 
## outputs/coherence/figs/Canada_gga_balance_top20.png 
## outputs/coherence/figs/Estonia_gga_balance_top20.png 
## outputs/coherence/figs/Azerbaijan_gga_balance_top20.png 
## outputs/coherence/figs/Lebanon_gga_balance_top20.png 
## outputs/coherence/figs/Portugal_gga_balance_top20.png 
## outputs/coherence/figs/Chile_gga_balance_top20.png 
## outputs/coherence/figs/France_gga_balance_top20.png 
## outputs/coherence/figs/Indonesia_gga_balance_top20.png

Glossary — Global theme ranking (Δ = Action share − Risk share)

Rank — Order by mean Δ (highest = most over-represented).
Theme (GGA) — GGA theme name.
Direction — Over / Under / Near-balanced (by sign of mean Δ).
Mean Δ (Action−Risk) — Average across countries of (Action share − Risk share) for the theme.
Median Δ — Median of Δ across countries (robust to outliers).
Avg Risk share — Country-average Risk/Impact share for the theme
(per-country, Risk shares sum to 1).
Avg Action share — Country-average Action/Result share for the theme
(per-country, Action shares sum to 1).
# Over (≥ +5pp) — Count of countries with Δ ≥ 0.05.
# Under (≤ −5pp) — Count of countries with Δ ≤ −0.05.
# Near — Count with −0.05 < Δ < +0.05.
Share Over / Share Under — Fractions of countries in over/under buckets.
Countries counted — Number of country × theme observations included.

## 
## === GLOBAL THEME RANKING (Action − Risk) ===
## Averaging across countries: EQUAL-COUNTRY
## Threshold for over/under counts: ±5 pp
## Saved table: outputs/coherence/insights_global/global_theme_ranking_ActionMinusRisk.csv

# =========================
# Step 10b (GLOBAL): Which themes are most under/over-represented?
# =========================
# Goal: Aggregate ACROSS COUNTRIES to find the globally under/over-represented themes.
# Uses coh_all$combined from Step 10 (one row per country × theme with shares & delta).
# Falls back to the CSVs saved in Step 10 if 'coh_all' isn't in memory.

suppressPackageStartupMessages({
  library(dplyr); library(tidyr); library(readr); library(DT)
})

# -------- Inputs & paths --------
ins_dir <- file.path(out_dir, "insights_global")
dir.create(ins_dir, recursive = TRUE, showWarnings = FALSE)

combined_tbl <- tryCatch(
  {
    stopifnot(exists("coh_all"))
    coh_all$combined %>% filter(!is.na(SectorType_GGA))
  },
  error = function(e) {
    readr::read_csv(
      file.path(out_dir, "gga_risk_vs_action_tables_ALL.csv"),
      show_col_types = FALSE
    ) %>% filter(!is.na(SectorType_GGA))
  }
)

# -------- Parameters --------
# Threshold for counting "over/under" occurrences (in share points)
THR_GAP <- 0.05  # = 5 percentage points of within-country share

# Optional: equal-weight countries (default) vs. weight by counts.
# If you prefer record-weighted averaging across countries, set WEIGHTED = TRUE
WEIGHTED <- FALSE

# -------- Prep: per-country theme rows --------
# Ensure we have exactly one row per Country × Theme with shares and delta
per_cty_theme <- combined_tbl %>%
  select(
    Country, SectorType_GGA,
    Risk_Share, Action_Share,
    Delta_Share_Action_minus_Risk,
    Risk_Count, Action_Count
  ) %>%
  distinct()

# -------- GLOBAL aggregation across countries --------
if (!WEIGHTED) {
  # Equal-country averaging (default)
  theme_global <- per_cty_theme %>%
    group_by(SectorType_GGA) %>%
    summarise(
      mean_delta   = mean(Delta_Share_Action_minus_Risk, na.rm = TRUE),
      median_delta = median(Delta_Share_Action_minus_Risk, na.rm = TRUE),
      mean_risk    = mean(Risk_Share,   na.rm = TRUE),
      mean_action  = mean(Action_Share, na.rm = TRUE),
      over_n       = sum(Delta_Share_Action_minus_Risk >=  THR_GAP, na.rm = TRUE),
      under_n      = sum(Delta_Share_Action_minus_Risk <= -THR_GAP, na.rm = TRUE),
      countries_n  = n(),
      .groups = "drop"
    ) %>%
    mutate(
      near_n      = countries_n - over_n - under_n,
      rank_over   = rank(-mean_delta, ties.method = "min"),
      rank_under  = rank( mean_delta, ties.method = "min")
    ) %>%
    arrange(desc(mean_delta))
} else {
  # Record-weighted averaging across countries
  theme_global <- per_cty_theme %>%
    mutate(weight = pmax(Risk_Count + Action_Count, 1)) %>%
    group_by(SectorType_GGA) %>%
    summarise(
      mean_delta   = weighted.mean(Delta_Share_Action_minus_Risk, w = weight, na.rm = TRUE),
      median_delta = median(Delta_Share_Action_minus_Risk, na.rm = TRUE),
      mean_risk    = weighted.mean(Risk_Share,   w = weight, na.rm = TRUE),
      mean_action  = weighted.mean(Action_Share, w = weight, na.rm = TRUE),
      over_n       = sum(Delta_Share_Action_minus_Risk >=  THR_GAP, na.rm = TRUE),
      under_n      = sum(Delta_Share_Action_minus_Risk <= -THR_GAP, na.rm = TRUE),
      countries_n  = n(),
      .groups = "drop"
    ) %>%
    mutate(
      near_n      = countries_n - over_n - under_n,
      rank_over   = rank(-mean_delta, ties.method = "min"),
      rank_under  = rank( mean_delta, ties.method = "min")
    ) %>%
    arrange(desc(mean_delta))
}

# -------- Key outputs for the report --------
# 1) Global ranking: most over- vs under-represented themes (by average Δ)
most_over  <- theme_global %>% arrange(desc(mean_delta)) %>% slice_head(n = 3)
most_under <- theme_global %>% arrange(mean_delta)        %>% slice_head(n = 3)

# 2) Clean, rounded overview table
theme_global_out <- theme_global %>%
  mutate(
    across(c(mean_delta, median_delta, mean_risk, mean_action), ~round(., 3))
  ) %>%
  arrange(desc(mean_delta))

# -------- Save & display --------
readr::write_csv(theme_global_out, file.path(ins_dir, "theme_global_overview.csv"))
readr::write_csv(most_over,        file.path(ins_dir, "themes_most_overrepresented.csv"))
readr::write_csv(most_under,       file.path(ins_dir, "themes_most_underrepresented.csv"))

cat("\n=== GLOBAL THEME BALANCE (Action − Risk) ===\n",
    "Averaging across countries (", ifelse(WEIGHTED, "RECORD-WEIGHTED", "EQUAL-COUNTRY"), "):\n",
    "Top over-represented themes (by mean Δ):\n", sep = "")

## 
## === GLOBAL THEME BALANCE (Action − Risk) ===
## Averaging across countries (EQUAL-COUNTRY):
## Top over-represented themes (by mean Δ):

print(most_over %>% select(SectorType_GGA, mean_delta, median_delta, over_n, under_n, countries_n))

## # A tibble: 3 × 6
##   SectorType_GGA              mean_delta median_delta over_n under_n countries_n
##   <chr>                            <dbl>        <dbl>  <int>   <int>       <int>
## 1 Infrastructure and human s…    0.0195             0     32      39          95
## 2 Cultural heritage              0.00269            0      3       0          95
## 3 Poverty eradication and li…   -0.0182             0     18      25          95

cat("\nTop under-represented themes (by mean Δ):\n")

## 
## Top under-represented themes (by mean Δ):

print(most_under %>% select(SectorType_GGA, mean_delta, median_delta, over_n, under_n, countries_n))

## # A tibble: 3 × 6
##   SectorType_GGA              mean_delta median_delta over_n under_n countries_n
##   <chr>                            <dbl>        <dbl>  <int>   <int>       <int>
## 1 Biodiversity and ecosystems    -0.104       -0.118      24      60          95
## 2 Water and sanitation           -0.0600      -0.0661     23      52          95
## 3 Food and agriculture           -0.0419      -0.0518     24      48          95

cat("\nFull table written to: ", file.path(ins_dir, "theme_global_overview.csv"), "\n", sep = "")

## 
## Full table written to: outputs/coherence/insights_global/theme_global_overview.csv

DT::datatable(
  theme_global_out,
  options = list(pageLength = 10, dom = "tip"),
  caption = htmltools::HTML(
    paste0("🌍 <b>Global theme balance</b> — Δ = Action share − Risk share (THR = ",
           THR_GAP * 100, " pp; averaging = ",
           ifelse(WEIGHTED, "record-weighted", "equal-country"), ")")
  )
)

link to overview

Save data to a local file

System at risk → IPCC sector → GGA

Remarks

Hazard Type:
A few additional climate-related keywords were added to improve coverage. For example, terms like “dry days”, “surface air temperature”, and “heat stress” appeared frequently and were added under appropriate hazard categories.
However, we also encountered terms such as “runoff”, “solar radiation”, and “winds” that were not clearly assignable to a single hazard category and remain unclassified for now.
System at Risk:
We expanded the keyword list to better capture common themes in the data:
- Coastal: Added terms like “eutrophication” and “algal bloom”, which frequently occur in the context of coastal or marine environmental issues.
- Livelihoods and Poverty: Included tourism-related terms such as “tourism”, “tourists”, “ski”, “vacation”, and “outdoor activities”, to capture socio-economic impacts on income and livelihoods.
- Infrastructure and Services: Extended to include infrastructure and utility-related keywords such as “power”, “hydropower”, “railways”, “port”, “industry”, “industries”, and “material”.
- Health: Added terms like “pollution”, “deaths”, “human lives”, and “life expectancy” to better reflect public health consequences mentioned in the data.
⚠️ Note: Several rows tagged as System at risk appear to describe hazards rather than systems: they mention a climate hazard (e.g., drought or storms) but no affected system. These rows may warrant a second look or reclassification. Some examples:
- Climate change is projected to cause longer and dryer dry seasons and shorter rainy seasons
- Increased frequency and intensity of flooding in vulnerable areas
- Another precursor to higher drought incidence will be the expected change in rainfall variability, with the number of rainfall days decreasing, especially in spring and summer, while the intensity of individual rainfall events increases.
- Seychelles is projected to experience increased average temperatures and changes in rainfall patterns, leading to more frequent and intense rainfall events, including flooding and landslides, and potentially longer periods of drought
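
A possible way to shortlist such rows for manual review is sketched below. It assumes the data carries a HazardType column (from Step 1) and a SystemType column (from Step 2) alongside Element and ElementText; the toy rows are illustrative, with one passage borrowed from the examples above and the others invented.

```r
library(dplyr)

# Toy rows for illustration; column names are assumed to match adaptdata
toy <- tibble::tribble(
  ~Element,         ~ElementText,                                                        ~HazardType, ~SystemType,
  "System at risk", "Increased frequency and intensity of flooding in vulnerable areas", "Flood",     NA,
  "System at risk", "Coastal fisheries threatened by storms",                            "Storm",     "Coastal",
  "Hazard",         "Prolonged droughts",                                                "Drought",   NA
)

# Rows tagged "System at risk" that matched a hazard keyword but no system keyword
review_candidates <- toy %>%
  filter(
    Element == "System at risk",
    !is.na(HazardType),
    is.na(SystemType) | !nzchar(trimws(SystemType))
  )

print(review_candidates)
```

This only flags candidates: a row can legitimately mention a hazard and still describe a system that the keyword list missed, so the shortlist is a starting point for review rather than an automatic reclassification.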
Sector Tagging (IPCC & GGA):
Sector tagging shows strong overall coverage, largely because sectors are mapped from both direct sector mentions and SystemType.
That said, a number of rows remain unclassified, either because the original source did not mention a sector explicitly or because the text was too ambiguous to map reliably. It is unclear whether these untagged rows are a concern, but they could be flagged for manual review depending on the use case.

0_Processing_updated

Lolita Muller & Namita Joshi (Mainly Lolita <3 )

2025-09-25
