I am a Ph.D. candidate in Political Science at the University of California, San Diego, specializing in Computational Social Science. I am advised by Molly Roberts and affiliated with the Halıcıoğlu Data Science Institute.
My research lies at the intersection of quantitative methods, political communication, and law. I combine large-scale data analysis with methodological innovation to study how digital platforms shape public access to legal and governmental information and to advance empirical legal research.
Prior to UCSD, I earned an M.S. in Electrical Engineering from Columbia University, with a focus on data analysis and machine learning. I worked as a Data Science Research Associate at Columbia Law School, where I collaborated with Benjamin Liebman at the Hong Yen Chang Center for Chinese Legal Studies.
My work has been published in journals such as Sociological Methods & Research, Comparative Political Studies, Asian Journal of Law and Society, and the Columbia Law Review. I am also a co-developer of ccmEstimator, an R package for comparative causal mediation analysis.
Outside of research, I enjoy climbing, playing the flute, and training my dog.
Substantively, my research agenda centers on how emerging technologies and media platforms influence political communication and legal governance. Methodologically, I apply and develop advanced computational and statistical techniques to support data-rich social science. I integrate tools from statistical modeling, natural language processing (including large language models), causal inference, and machine learning to study complex political and legal processes at scale.
Governments worldwide have increasingly embraced transparency by making vast amounts of information available online. This openness provides the media with new resources to investigate and publicize government activities. However, does the media’s use of government-provided information help sustain transparency over time? This study theorizes that media exposure may undermine the availability of government data by triggering the removal of information. Using China’s open court initiative as an example, I examine how media engagement shapes transparency practices. Drawing on original datasets of 120 million court decisions and 1.8 million digital references to court decisions, I developed a multi-stage computational pipeline combining pattern matching, semantic search, and large language models to systematically link news reports to individual court cases. To mitigate confounding from case-specific characteristics, I constructed matched control groups to estimate how media coverage of a particular case influences the likelihood that it stays online. I find that court decisions covered by the media are significantly more likely to be removed from official websites. Moreover, the effect is strongest for cases tried in provincial-level courts, which are in charge of uploading and removing cases on the official court website.
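The matched-comparison logic described above can be illustrated with a minimal sketch. This is not the study's actual estimator: the function name `matched_removal_gap`, the matching variables (`court_level`, `year`, `case_type`), and the toy records are all hypothetical, and exact matching is used here only as one simple way to form control groups.

```python
from collections import defaultdict


def matched_removal_gap(cases, match_keys=("court_level", "year", "case_type")):
    """Exact-match media-covered cases to uncovered ones and compare removal rates.

    Returns the difference in removal rates (covered minus uncovered),
    averaged over strata and weighted by the number of covered cases.
    All field names are hypothetical placeholders.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for case in cases:
        key = tuple(case[k] for k in match_keys)
        group = "treated" if case["covered"] else "control"
        strata[key][group].append(case["removed"])

    weighted_sum, total_weight = 0.0, 0
    for groups in strata.values():
        treated, control = groups["treated"], groups["control"]
        if not treated or not control:
            continue  # skip strata that lack a matched comparison group
        gap = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += gap * len(treated)
        total_weight += len(treated)

    return weighted_sum / total_weight if total_weight else None


# Toy records, invented purely for illustration.
cases = [
    {"court_level": "provincial", "year": 2018, "case_type": "civil", "covered": True, "removed": True},
    {"court_level": "provincial", "year": 2018, "case_type": "civil", "covered": False, "removed": False},
    {"court_level": "provincial", "year": 2018, "case_type": "civil", "covered": False, "removed": True},
    {"court_level": "basic", "year": 2019, "case_type": "criminal", "covered": True, "removed": False},
    {"court_level": "basic", "year": 2019, "case_type": "criminal", "covered": False, "removed": False},
    {"court_level": "basic", "year": 2020, "case_type": "civil", "covered": True, "removed": True},
]

gap = matched_removal_gap(cases)
print(gap)
```

A positive value means covered cases are removed more often than their matched counterparts. The last toy record has no uncovered match, so it is dropped from the comparison rather than compared against dissimilar cases.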
I have taught and assisted in a variety of courses in political science, data science, and law. I also lead annual math bootcamps for incoming Ph.D. students in Political Science and Management at UCSD.
DSC 599: Teaching Methods in Data Science
CSS 500: Teaching Apprentice (Computational Social Science)
Xiaohan Wu © 2025 V8.5