Causal Inference for Computational Social Science

V-DEM EDA for PGM

Author: Troy Cheng, Hongzhe Wang
Affiliation: Georgetown University
Published: July 23, 2025
Modified: August 1, 2025

1 Introduction

This project is an exercise in causal inference: it aims to complete the journey from correlation to causation in empirical social science research. The study investigates macro-political transformations, particularly regime changes, through a causal lens grounded in computational methods. The analysis relies on the Varieties of Democracy (V-Dem) dataset, one of the most comprehensive global sources on political regimes and democratic institutions. Covering nearly every country from the late 18th century to the present, V-Dem provides detailed yearly indicators across political, economic, and social dimensions, which makes it possible to track institutional change over time.

The project was motivated by the cultural and political shocks that a Chinese international student experienced upon arriving in the United States. Speaking broken English and surrounded by sarcastic remarks stemming from “banana discourse”, the student began to reflect on the root causes of the present situation and, more broadly, on the structural contest between the social systems of China and the United States. The project seeks conclusions that are supported by data and empirical research. Such clarity is difficult to achieve in the social sciences, where mischievous atoms always lurk, but the hope is to apply causal inference techniques to reduce bias and pursue a theoretically grounded and relatively objective exploration.

The analysis is guided by four core research questions:

  1. RQ1: What kinds of factors cause a country to undergo a regime change?
  2. RQ2: What factors influence the establishment of communist regimes?
  3. RQ3: What factors contribute to the collapse of communist regimes?
  4. RQ4: What factors explain the long-term survival of the remaining communist regimes (China, Cuba, Laos, North Korea, and Vietnam)?

These questions are explored through a combination of longitudinal data analysis and logistic modeling, with particular attention to causality and temporal ordering. Special emphasis is placed on the interplay between regime types and structural factors such as inequality, civil conflict, elite coordination, and education levels, in order to identify plausible causal pathways.

2 RQ1: What kinds of factors cause a country to undergo a regime change?

This section explores the historical predictors of regime change using V-Dem’s country-year panel data, covering over 200 years and nearly all sovereign states. We operationalize regime change based on transitions in the histname field and use logistic regression to assess whether various political, structural, and conflict-related variables can significantly predict the likelihood of a regime shift.

2.1 Import Data and Load Packages

# Install V-Dem data package:
# install.packages("devtools")
# devtools::install_github("vdeminstitute/vdemdata")

library(vdemdata)
library(tidyverse)
library(effects)
library(stringr)

2.2 Create Regime Change Variable

We define regime change as a change in the histname field between consecutive observations of the same country. The first observation of each country has no predecessor and is coded 0.

vdem_1 <- vdem |> 
  arrange(country_id, year) |> 
  group_by(country_id) |> 
  mutate(regime_change = if_else(histname != lag(histname), 1, 0, missing = 0)) |> 
  ungroup()

regime_transitions <- vdem_1 |> 
  arrange(country_id, year) |> 
  group_by(country_id) |> 
  mutate(
    previous_histname = lag(histname),
    current_histname = histname
  ) |> 
  filter(regime_change == 1) |> 
  select(country_id, country_name, year, previous_histname, current_histname) |> 
  ungroup()
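
Because lag() compares adjacent rows rather than adjacent calendar years, a country that leaves and re-enters the panel can be flagged as changing regime across the gap. The following is a minimal sketch on hypothetical toy data, not part of the project code: the names toy, name_changed, consecutive, and regime_change_strict are illustrative assumptions. It flags a change only when the previous row belongs to the immediately preceding year.

```r
library(dplyr)

# Hypothetical toy panel: country 1 is absent between 1940 and 1990.
toy <- data.frame(
  country_id = c(1, 1, 1, 1, 2, 2, 2),
  year       = c(1939, 1940, 1990, 1991, 1948, 1949, 1950),
  histname   = c("Republic A", "Republic A", "Soviet Republic A", "Republic A",
                 "Republic B", "People's Republic B", "People's Republic B")
)

toy_flagged <- toy |>
  arrange(country_id, year) |>
  group_by(country_id) |>
  mutate(
    # Same rule as in the text: compare with the previous row.
    name_changed = if_else(histname != lag(histname), 1, 0, missing = 0),
    # TRUE only when the previous row is the preceding calendar year.
    consecutive = coalesce(year - lag(year) == 1, FALSE),
    # Stricter flag: ignore name changes that span a gap in the panel.
    regime_change_strict = if_else(name_changed == 1 & consecutive, 1, 0)
  ) |>
  ungroup()

print(toy_flagged)
```

In the toy data the 1990 row is flagged by the row-based rule but not by the stricter one, because 50 years separate it from the previous observation. Whether such rows should count as regime changes is a substantive decision; the sketch only makes the choice explicit.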

2.3 Naive Logit Model with Political Indicators

We first test a simple model using three high-level indices, each lagged by one year:

  • v2x_corr: Political corruption index (higher values indicate more corruption)
  • v2x_polyarchy: Electoral democracy index
  • v2x_freexp_altinf: Freedom of expression and alternative sources of information index

vdem_model <- vdem_1 |> 
  arrange(country_id, year) |> 
  group_by(country_id) |> 
  mutate(
    v2x_corr_lag1 = lag(v2x_corr, 1),
    v2x_polyarchy_lag1 = lag(v2x_polyarchy, 1),
    v2x_freexp_altinf_lag1 = lag(v2x_freexp_altinf, 1)
  ) |> 
  ungroup()

model1 <- glm(regime_change ~ v2x_corr_lag1 + v2x_polyarchy_lag1 + v2x_freexp_altinf_lag1,
              data = vdem_model,
              family = binomial(link = "logit"))
summary(model1)

Call:
glm(formula = regime_change ~ v2x_corr_lag1 + v2x_polyarchy_lag1 + 
    v2x_freexp_altinf_lag1, family = binomial(link = "logit"), 
    data = vdem_model)

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -3.1161     0.1056 -29.498  < 2e-16 ***
v2x_corr_lag1           -0.3008     0.1549  -1.943   0.0521 .  
v2x_polyarchy_lag1      -2.2017     0.3724  -5.913 3.36e-09 ***
v2x_freexp_altinf_lag1   0.1186     0.2679   0.443   0.6580    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6007.1  on 25879  degrees of freedom
Residual deviance: 5891.9  on 25876  degrees of freedom
  (2033 observations deleted due to missingness)
AIC: 5899.9

Number of Fisher Scoring iterations: 7
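
Logit coefficients are on the log-odds scale, which is hard to read directly. As a small worked example, the sketch below exponentiates the coefficients reported in the summary(model1) output to obtain odds ratios, and converts the intercept to a probability. The object names coefs, odds_ratios, and baseline_prob are illustrative; with the fitted model in memory, exp(coef(model1)) would give the same odds ratios.

```r
# Coefficients copied from the summary(model1) output in this section.
coefs <- c(
  "(Intercept)"            = -3.1161,
  "v2x_corr_lag1"          = -0.3008,
  "v2x_polyarchy_lag1"     = -2.2017,
  "v2x_freexp_altinf_lag1" =  0.1186
)

# Odds ratio for a one-unit increase in each predictor.
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))

# Predicted probability of regime change when all predictors equal 0.
baseline_prob <- plogis(coefs[["(Intercept)"]])
print(round(baseline_prob, 3))
```

The odds ratio for v2x_polyarchy_lag1 is roughly 0.11. Since the index runs from 0 to 1, a one-unit increase spans its full range, so moving from the least to the most democratic score multiplies the odds of a regime change by about 0.11 in this specification.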

2.4 Structural and Background Factors Model

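We expand the model to include structural and conflict variables, again lagged by one year:

  • e_gdppc: GDP per capita
  • e_peaveduc: Education level (average years of education among citizens aged 15 and older)
  • e_civil_war: Civil war dummy
  • e_pt_coup: Number of coup attempts

Oil income per capita (e_total_oil_income_pc) is lagged in the same step but is not yet entered in the model.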

vdem_model1 <- vdem_model |>
  arrange(country_id, year) |>
  group_by(country_id) |>
  mutate(
    e_gdppc_lag1 = lag(e_gdppc),
    e_peaveduc_lag1 = lag(e_peaveduc),
    e_civil_war_lag1 = lag(e_civil_war),
    e_pt_coup_lag1 = lag(e_pt_coup),
    e_total_oil_income_pc_lag1 = lag(e_total_oil_income_pc)
  ) |>
  ungroup()

model_bg <- glm(regime_change ~ 
                  v2x_polyarchy_lag1 + 
                  e_gdppc_lag1 + 
                  e_civil_war_lag1 + 
                  e_pt_coup_lag1 + 
                  e_peaveduc_lag1,
                data = vdem_model1,
                family = binomial(link = "logit"))
summary(model_bg)

Call:
glm(formula = regime_change ~ v2x_polyarchy_lag1 + e_gdppc_lag1 + 
    e_civil_war_lag1 + e_pt_coup_lag1 + e_peaveduc_lag1, family = binomial(link = "logit"), 
    data = vdem_model1)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -3.718333   0.242908 -15.308  < 2e-16 ***
v2x_polyarchy_lag1 -2.304493   0.665439  -3.463 0.000534 ***
e_gdppc_lag1       -0.005258   0.011770  -0.447 0.655078    
e_civil_war_lag1    0.843364   0.299023   2.820 0.004796 ** 
e_pt_coup_lag1      0.152310   0.483337   0.315 0.752670    
e_peaveduc_lag1     0.013442   0.056318   0.239 0.811347    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 817.32  on 6087  degrees of freedom
Residual deviance: 781.71  on 6082  degrees of freedom
  (21825 observations deleted due to missingness)
AIC: 793.71

Number of Fisher Scoring iterations: 8
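
The two models are estimated on very different samples, because glm() silently drops rows with missing values, so their AIC values are not directly comparable. The sketch below uses simulated data (the data frame toy and the variables y, x1, and x2 are hypothetical) to show one way to restrict both models to the same complete cases before comparing them. It is an illustration of the idea, not the procedure used in this project.

```r
set.seed(1)
n <- 500

# Simulated data: x2 is mostly missing, like a sparsely observed covariate.
toy <- data.frame(
  y  = rbinom(n, 1, 0.2),
  x1 = rnorm(n),
  x2 = rnorm(n)
)
toy$x2[sample(n, 300)] <- NA

# Keep only rows that are complete for every variable used in either model.
vars   <- c("y", "x1", "x2")
common <- toy[complete.cases(toy[, vars]), ]

m_small <- glm(y ~ x1,      data = common, family = binomial(link = "logit"))
m_large <- glm(y ~ x1 + x2, data = common, family = binomial(link = "logit"))

# Both models now use the same observations, so AIC is comparable.
print(c(small = nobs(m_small), large = nobs(m_large)))
print(AIC(m_small, m_large))
```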

2.5 Visualizing Marginal Effects

plot(allEffects(model1))

plot(allEffects(model_bg))
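
As a rough illustration of what an effect plot displays, predicted probabilities can also be computed by hand with predict(). The sketch below uses simulated data (x, z, y, fit, and grid are hypothetical names) and assumes the other numeric predictor is held at its sample mean while the focal predictor varies over a grid.

```r
set.seed(42)
n <- 1000

# Simulated predictors and a binary outcome with a negative effect of x.
x <- runif(n)
z <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 - 2 * x + 0.5 * z))

fit <- glm(y ~ x + z, family = binomial(link = "logit"))

# Vary x over a grid while holding z at its sample mean.
grid <- data.frame(x = seq(0, 1, by = 0.25), z = mean(z))

# type = "response" returns probabilities instead of log-odds.
grid$p_hat <- predict(fit, newdata = grid, type = "response")
print(grid)
```

Each row of grid is one point on the curve that an effect plot for x would draw; confidence bands would additionally require the standard errors from predict(..., se.fit = TRUE) on the link scale.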

2.6 Interpretation

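Initial results suggest that lower levels of electoral democracy and an active civil war in the previous year are associated with a higher likelihood of regime change. The coefficient on lagged v2x_polyarchy is negative and significant in both models (about -2.2 and -2.3), and the coefficient on lagged e_civil_war is positive (0.84, p ≈ 0.005). GDP per capita, education, and coups do not show significant predictive power in this specification. Two caveats apply. First, the structural model is estimated on 6,088 country-years compared with 25,880 in the naive model, because the background variables are missing for most of the panel, so the two models describe different samples. Second, these are conditional associations from pooled logit models, not causal estimates.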


3 RQ2: What factors influence the establishment of communist regimes?

Candidate communist regimes are identified by searching the transition table from Section 2.2 for histname values that contain keywords such as “People's Republic”, “Socialist”, and “Soviet”. Keyword matching is only a first pass: it also returns regimes that are not communist (for example the Democratic Socialist Republic of Sri Lanka) and name changes within an existing regime (for example Romania in 1965).

regime_transitions |>
  filter(str_detect(current_histname, "People's Republic") |
         str_detect(current_histname, "Socialist Republic") |
         str_detect(current_histname, "Socialist") |
         str_detect(current_histname, "Soviet") |
         str_detect(current_histname, "Democratic People's Republic")) |>
  arrange(year)

country_id | country_name | year | previous_histname | current_histname
11 | Russia | 1918 | Russia | Russian Socialist Federative Republic (USSR)
210 | Hungary | 1918 | Hungarian half of the Habsburg Empire [Transleithania] | Hungarian Soviet Republic
210 | Hungary | 1919 | Hungarian Soviet Republic | Hungarian Soviet Republic under Romanian occupation
140 | Uzbekistan | 1920 | Emirate of Bukhara as a Russian protectorate | Bukharan People’s Soviet Republic [Soviet state]
89 | Mongolia | 1924 | Bogd Khaan State | Mongolian People’s Republic
12 | Albania | 1944 | Albanian Kingdom under German occupation | People’s Republic of Albania
190 | Romania | 1944 | Kingdom of Romania | Kingdom of Romania under Soviet occupation
198 | Serbia | 1945 | Kingdom of Yugoslavia under German occupation | Socialist Federal Republic of Yugoslavia
152 | Bulgaria | 1946 | Kingdom of Bulgaria | Bulgarian People’s Republic
210 | Hungary | 1946 | Kingdom of Hungary | Second Hungarian People Republic under Soviet guidance
190 | Romania | 1947 | Kingdom of Romania under Soviet occupation | People’s Republic of Romania
41 | North Korea | 1948 | North Korea under Soviet occupation | Democratic People’s Republic of Korea
157 | Czechia | 1948 | Czechoslovakia | Czechoslovakia Socialist Republic as part of Soviet bloc
110 | China | 1949 | Republic of China [Nationalist China] | People’s Republic of China
17 | Poland | 1952 | Republic of Poland | People’s Republic of Poland
236 | Zanzibar | 1964 | Sultanate of Zanzibar [independent] | People’s Republic of Zanzibar and Pemba
190 | Romania | 1965 | People’s Republic of Romania | Socialist Republic of Romania
23 | South Yemen | 1967 | Federation of South Arabia | People’s Republic of South Yemen
112 | Republic of the Congo | 1970 | Congo Republic | People’s Republic of the Congo
124 | Libya | 1973 | Libyan Arab Republic | Socialist People’s Libyan Arab Jamahiriya
38 | Ethiopia | 1974 | Empire of Ethiopia | Socialist Ethiopia
12 | Albania | 1976 | People’s Republic of Albania | People’s Socialist Republic of Albania
34 | Vietnam | 1976 | Democratic Republic of Vietnam | Socialist Republic of Vietnam
131 | Sri Lanka | 1978 | Republic of Sri Lanka | Democratic Socialist Republic of Sri Lanka
55 | Cambodia | 1979 | Democratic Kampuchea under Vietnamese occupation | People’s Republic of Kampuchea under Vietnamese occupation
124 | Libya | 1986 | Socialist People’s Libyan Arab Jamahiriya | Great Socialist People’s Libyan Arab Jamahiriya
84 | Latvia | 1990 | Republic of Latvia [independent state] | Latvian Soviet Socialist Republic
161 | Estonia | 1990 | Republic of Estonia [independent state] | Estonian Soviet Socialist Republic
173 | Lithuania | 1990 | Republic of Lithuania [independent state] | Lithuanian Soviet Socialist Republic
38 | Ethiopia | 1991 | Socialist Ethiopia | Socialist Ethiopia [transition phase]

communist_keywords <- c("People's Republic", "Socialist Republic", "Socialist", "Soviet", "Democratic People's Republic", "People's Democratic Republic")

regime_transitions |>
  filter(str_detect(current_histname, str_c(communist_keywords, collapse = "|"))) |>
  arrange(country_id, year) |>
  group_by(country_id) |>
  slice_min(year, n = 1) |> 
  ungroup() |>
  arrange(year)

country_id | country_name | year | previous_histname | current_histname
11 | Russia | 1918 | Russia | Russian Socialist Federative Republic (USSR)
210 | Hungary | 1918 | Hungarian half of the Habsburg Empire [Transleithania] | Hungarian Soviet Republic
140 | Uzbekistan | 1920 | Emirate of Bukhara as a Russian protectorate | Bukharan People’s Soviet Republic [Soviet state]
89 | Mongolia | 1924 | Bogd Khaan State | Mongolian People’s Republic
12 | Albania | 1944 | Albanian Kingdom under German occupation | People’s Republic of Albania
190 | Romania | 1944 | Kingdom of Romania | Kingdom of Romania under Soviet occupation
198 | Serbia | 1945 | Kingdom of Yugoslavia under German occupation | Socialist Federal Republic of Yugoslavia
152 | Bulgaria | 1946 | Kingdom of Bulgaria | Bulgarian People’s Republic
41 | North Korea | 1948 | North Korea under Soviet occupation | Democratic People’s Republic of Korea
157 | Czechia | 1948 | Czechoslovakia | Czechoslovakia Socialist Republic as part of Soviet bloc
110 | China | 1949 | Republic of China [Nationalist China] | People’s Republic of China
17 | Poland | 1952 | Republic of Poland | People’s Republic of Poland
103 | Algeria | 1962 | French colony | People’s Democratic Republic of Algeria [independent state]
236 | Zanzibar | 1964 | Sultanate of Zanzibar [independent] | People’s Republic of Zanzibar and Pemba
23 | South Yemen | 1967 | Federation of South Arabia | People’s Republic of South Yemen
112 | Republic of the Congo | 1970 | Congo Republic | People’s Republic of the Congo
124 | Libya | 1973 | Libyan Arab Republic | Socialist People’s Libyan Arab Jamahiriya
38 | Ethiopia | 1974 | Empire of Ethiopia | Socialist Ethiopia
123 | Laos | 1975 | Kingdom of Laos [independent state] | Lao People’s Democratic Republic
34 | Vietnam | 1976 | Democratic Republic of Vietnam | Socialist Republic of Vietnam
131 | Sri Lanka | 1978 | Republic of Sri Lanka | Democratic Socialist Republic of Sri Lanka
55 | Cambodia | 1979 | Democratic Kampuchea under Vietnamese occupation | People’s Republic of Kampuchea under Vietnamese occupation
84 | Latvia | 1990 | Republic of Latvia [independent state] | Latvian Soviet Socialist Republic
161 | Estonia | 1990 | Republic of Estonia [independent state] | Estonian Soviet Socialist Republic
173 | Lithuania | 1990 | Republic of Lithuania [independent state] | Lithuanian Soviet Socialist Republic
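
The keyword search behind these results can be sketched in base R. The example below is a minimal illustration on a handful of names, not the project code: the vectors histnames, keywords, and not_communist are assumptions, and "Republic of Cuba" is a hypothetical value used only to show a regime that the keywords would miss. It also normalizes typographic apostrophes before matching, in case a name contains ’ rather than '.

```r
# Illustrative names; "Republic of Cuba" is hypothetical, the rest appear above.
histnames <- c(
  "People\u2019s Republic of China",
  "Democratic Socialist Republic of Sri Lanka",
  "Lao People's Democratic Republic",
  "Republic of Cuba",
  "Kingdom of Laos [independent state]"
)

# "Socialist" already covers "Socialist Republic", and "People's Republic"
# already covers "Democratic People's Republic", so four keywords suffice.
keywords <- c("People's Republic", "Socialist", "Soviet",
              "People's Democratic Republic")
pattern  <- paste(keywords, collapse = "|")

# Replace typographic apostrophes so that both spellings match.
normalized <- gsub("\u2019", "'", histnames, fixed = TRUE)
matched    <- grepl(pattern, normalized)

# Hand-curated exclusions for keyword matches that are not communist regimes.
not_communist <- c("Democratic Socialist Republic of Sri Lanka")
candidate     <- matched & !(normalized %in% not_communist)

print(data.frame(histname = normalized, matched, candidate))
```

The sketch shows both failure modes of name-based coding: Sri Lanka matches a keyword without being a communist regime, while a regime whose official name contains none of the keywords is missed entirely. A hand-checked list of country-years is likely to be more reliable for the later models.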

Note: data for the Georgia Soviet Socialist Republic are missing for the periods before and after the regime's establishment.

Next steps: we turn to more specific questions:

  • What factors explain the formation of communist regimes?
  • What causes communist regimes to collapse?
  • What drives capitalist regimes to transition into communism?

We will construct new binary regime-type indicators to address these questions in subsequent models.
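
One possible shape for such indicators is sketched below on hypothetical toy data. The column names communist, onset, and collapse, and the keyword pattern are illustrative assumptions rather than the final coding: the sketch derives a yearly regime-type dummy from histname and then marks the first year of a communist spell and the first year after it ends.

```r
library(dplyr)

# Hypothetical single-country panel.
toy <- data.frame(
  country_id = rep(1, 5),
  year       = 1947:1951,
  histname   = c("Kingdom A", "People's Republic A", "People's Republic A",
                 "Republic A", "Republic A")
)

toy_ind <- toy |>
  arrange(country_id, year) |>
  group_by(country_id) |>
  mutate(
    # 1 if the name matches a communist keyword, 0 otherwise.
    communist = as.integer(grepl("People's Republic|Socialist|Soviet", histname)),
    # Onset: communist this year but not in the previous row.
    # default = 0L treats the first row as preceded by a non-communist year,
    # so a country that enters the panel as communist is counted as an onset.
    onset = as.integer(communist == 1 & lag(communist, default = 0L) == 0),
    # Collapse: not communist this year but communist in the previous row.
    collapse = as.integer(communist == 0 & lag(communist, default = 0L) == 1)
  ) |>
  ungroup()

print(toy_ind)
```

The onset and collapse dummies could then serve as outcomes in models for RQ2 and RQ3, with the sample restricted to country-years that are at risk of the respective event.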

4 RQ3: What factors contribute to the collapse of communist regimes?

5 RQ4: What factors explain the long-term survival of the remaining communist regimes (China, Cuba, Laos, North Korea, and Vietnam)?

6 EDA

6.1 Test function in vdemdata package

vdemdata::plot_indicator("v2msuffrage")

# Plot V-Dem indicators liberal democracy and egalitarian democracy
# for Sweden and Germany between 1912 and 2000.

plot_indicator(indicator = c("v2x_egaldem", "v2x_libdem"),
               countries = c("Germany", "Sweden"),
               min_year = 1912, max_year = 2000)

6.2 Which countries are covered?

unique(vdem$country_name)

  [1] "Mexico"                           "Suriname"                        
  [3] "Sweden"                           "Switzerland"                     
  [5] "Ghana"                            "South Africa"                    
  [7] "Japan"                            "Burma/Myanmar"                   
  [9] "Russia"                           "Albania"                         
 [11] "Egypt"                            "Yemen"                           
 [13] "Colombia"                         "Poland"                          
 [15] "Brazil"                           "United States of America"        
 [17] "Portugal"                         "El Salvador"                     
 [19] "South Yemen"                      "Bangladesh"                      
 [21] "Bolivia"                          "Haiti"                           
 [23] "Honduras"                         "Mali"                            
 [25] "Pakistan"                         "Peru"                            
 [27] "Senegal"                          "South Sudan"                     
 [29] "Sudan"                            "Vietnam"                         
 [31] "Republic of Vietnam"              "Afghanistan"                     
 [33] "Argentina"                        "Ethiopia"                        
 [35] "India"                            "Kenya"                           
 [37] "North Korea"                      "South Korea"                     
 [39] "Kosovo"                           "Lebanon"                         
 [41] "Nigeria"                          "Philippines"                     
 [43] "Tanzania"                         "Taiwan"                          
 [45] "Thailand"                         "Uganda"                          
 [47] "Venezuela"                        "Benin"                           
 [49] "Bhutan"                           "Burkina Faso"                    
 [51] "Cambodia"                         "Indonesia"                       
 [53] "Mozambique"                       "Nepal"                           
 [55] "Nicaragua"                        "Niger"                           
 [57] "Zambia"                           "Zimbabwe"                        
 [59] "Guinea"                           "Ivory Coast"                     
 [61] "Mauritania"                       "Canada"                          
 [63] "Australia"                        "Botswana"                        
 [65] "Burundi"                          "Cape Verde"                      
 [67] "Central African Republic"         "Chile"                           
 [69] "Costa Rica"                       "Timor-Leste"                     
 [71] "Ecuador"                          "France"                          
 [73] "Germany"                          "Guatemala"                       
 [75] "Iran"                             "Iraq"                            
 [77] "Ireland"                          "Italy"                           
 [79] "Jordan"                           "Latvia"                          
 [81] "Lesotho"                          "Liberia"                         
 [83] "Malawi"                           "Maldives"                        
 [85] "Mongolia"                         "Morocco"                         
 [87] "Netherlands"                      "Panama"                          
 [89] "Papua New Guinea"                 "Qatar"                           
 [91] "Sierra Leone"                     "Spain"                           
 [93] "Syria"                            "Tunisia"                         
 [95] "Türkiye"                          "Ukraine"                         
 [97] "United Kingdom"                   "Uruguay"                         
 [99] "Algeria"                          "Angola"                          
[101] "Armenia"                          "Azerbaijan"                      
[103] "Belarus"                          "Cameroon"                        
[105] "Chad"                             "China"                           
[107] "Democratic Republic of the Congo" "Republic of the Congo"           
[109] "Djibouti"                         "Dominican Republic"              
[111] "Eritrea"                          "Gabon"                           
[113] "The Gambia"                       "Georgia"                         
[115] "Guinea-Bissau"                    "Jamaica"                         
[117] "Kazakhstan"                       "Kyrgyzstan"                      
[119] "Laos"                             "Libya"                           
[121] "Madagascar"                       "Moldova"                         
[123] "Namibia"                          "Palestine/West Bank"             
[125] "Rwanda"                           "Somalia"                         
[127] "Sri Lanka"                        "Eswatini"                        
[129] "Tajikistan"                       "Togo"                            
[131] "Trinidad and Tobago"              "Turkmenistan"                    
[133] "German Democratic Republic"       "Palestine/Gaza"                  
[135] "Somaliland"                       "Uzbekistan"                      
[137] "Austria"                          "Bahrain"                         
[139] "Barbados"                         "Belgium"                         
[141] "Bosnia and Herzegovina"           "Bulgaria"                        
[143] "Comoros"                          "Croatia"                         
[145] "Cuba"                             "Cyprus"                          
[147] "Czechia"                          "Denmark"                         
[149] "Equatorial Guinea"                "Estonia"                         
[151] "Fiji"                             "Finland"                         
[153] "Greece"                           "Guyana"                          
[155] "Hong Kong"                        "Iceland"                         
[157] "Israel"                           "Kuwait"                          
[159] "Lithuania"                        "Luxembourg"                      
[161] "North Macedonia"                  "Malaysia"                        
[163] "Malta"                            "Mauritius"                       
[165] "Montenegro"                       "New Zealand"                     
[167] "Norway"                           "Oman"                            
[169] "Paraguay"                         "Romania"                         
[171] "Sao Tome and Principe"            "Saudi Arabia"                    
[173] "Serbia"                           "Seychelles"                      
[175] "Singapore"                        "Slovakia"                        
[177] "Slovenia"                         "Solomon Islands"                 
[179] "Vanuatu"                          "United Arab Emirates"            
[181] "Palestine/British Mandate"        "Hungary"                         
[183] "Zanzibar"                         "Baden"                           
[185] "Bavaria"                          "Modena"                          
[187] "Parma"                            "Saxony"                          
[189] "Tuscany"                          "Würtemberg"                      
[191] "Two Sicilies"                     "Hanover"                         
[193] "Hesse-Kassel"                     "Hesse-Darmstadt"                 
[195] "Mecklenburg Schwerin"             "Papal States"                    
[197] "Hamburg"                          "Brunswick"                       
[199] "Oldenburg"                        "Saxe-Weimar-Eisenach"            
[201] "Nassau"                           "Piedmont-Sardinia"               

Select Germany, Russia, and China

# Keep the three countries and drop every column that has any missing value.
selected_data <- vdem |>
  filter(country_text_id %in% c("DEU", "RUS", "CHN")) |>
  select(where(~ !any(is.na(.))))

# Coverage of the CSO repression indicator by country.
selected_data |>
  group_by(country_text_id) |>
  summarize(
    years_available = sum(!is.na(v2csreprss)),
    year_min = min(year[!is.na(v2csreprss)]),
    year_max = max(year[!is.na(v2csreprss)])
  )

country_text_id | years_available | year_min | year_max
CHN | 236 | 1789 | 2024
DEU | 232 | 1789 | 2024
RUS | 236 | 1789 | 2024

ggplot(selected_data, aes(x = year, y = v2csreprss, color = country_text_id)) +
  geom_point() +
  labs(
    title = "CSO Repression Over Time (V-Dem v2csreprss)",
    x = "Year",
    y = "Civil Society Repression",
    color = "Country"
  ) +
  scale_color_manual(
    # Set breaks explicitly so each label is paired with the right country code.
    # Without breaks, unnamed labels follow the alphabetical order CHN, DEU, RUS
    # and the legend would be mislabeled.
    breaks = c("DEU", "RUS", "CHN"),
    values = c("DEU" = "black", "RUS" = "red", "CHN" = "blue"),
    labels = c("Germany", "Russia", "China")
  ) +
  theme_minimal()