Open to Data Science Roles

PhD Candidate · Computational Humanities

Paschalis Agapitos

NLP & Network Data Scientist

I build NLP and network science pipelines for large-scale cultural and textual data at the Donostia International Physics Center (DIPC) and the University of the Basque Country. With a background in linguistics and expertise in machine learning, deep learning, and NLP, I have published peer-reviewed research and developed open-source tools, including WikiTextGraph for parsing multilingual Wikipedia text and extracting graphs from it. Beyond academia, I have delivered end-to-end data science projects in overstock risk prediction and medical image classification. I am seeking data science roles in NLP and network analytics where research-grade models translate into production-level impact.

3 Publications
4+ Years Python
OSS Open Source

Research Focus

Developing and using computational methods to understand human culture, language, and communication through interdisciplinary research.

Computational Humanities

Developing and applying computational approaches to cultural phenomena, bridging quantitative methods and humanistic inquiry to uncover new patterns in human expression and cultural transmission.

Cultural Analytics Computational Humanities Wikipedia Network Analysis

Computational Stylometry

Developing and applying algorithms for authorship attribution, text analysis, and literary pattern recognition with machine learning and statistical techniques.

Authorship Attribution Literary Analysis Forensic Linguistics

Network Data Science

Exploring complex network structures in cultural and linguistic contexts, revealing hidden patterns in social dynamics, cultural evolution, and information networks.

Complex Networks Social Dynamics Graph Theory

Publications

Peer Reviewer: Digital Scholarship in the Humanities (Oxford Academic)

JORS · 2025

WikiTextGraph: A Python Tool for Parsing Multilingual Wikipedia Text and Graph Extraction

Journal of Open Research Software (JORS), 2025

Authors: Paschalis Agapitos, Juan Luis Suárez, Gustavo Ariel Schwartz

Open Source Zenodo Archive GitHub Repository
Wikipedia Software Engineering Data Mining Graph Processing
Nature · HSScomms · 2025

Wikipedia as a cultural lens: a quantitative approach for exploring cultural networks

Humanities & Social Sciences Communications (Nature Portfolio), 2025

Authors: Luis A Miccio, Paschalis Agapitos, Carlos Gamez-Perez, Francisco González, Juan Luis Suárez, Gustavo Ariel Schwartz

Nature · HSScomms High Impact 6 Authors
Cultural Networks Wikipedia Analysis Complex Networks Cultural Analytics
JCLS · 2024

A Stylometric Analysis of Seneca's Disputed Plays

Journal of Computational Literary Studies (JCLS), 2024

Authors: Paschalis Agapitos, Andreas van Cranenburgh

Peer Reviewed Reproducible Classical Studies
Stylometry Forensic Linguistics Classical Literature Authorship Attribution

Featured Projects

Applications of computational methods in healthcare, education, and cultural analysis.

CNN for Pneumonia Detection

2024 • 3 months

A convolutional neural network (CNN) application designed to help healthcare professionals detect pneumonia, including COVID-19 pneumonia, from patients' chest X-rays. Built with TensorFlow.

Python TensorFlow Computer Vision Deep Learning Medical AI

Cultural Analytics Guide

2022–Present • Ongoing

An open-access textbook and educational resource on the computational analysis of cultural phenomena, developed according to Open Science principles: transparent, reproducible, and collaborative.

Python Data Science Educational Technology Open Science Cultural Studies

British Political Speeches Analysis

2021 • 4 months

A computational stylometric analysis of British political speeches, focusing on Winston Churchill's rhetorical patterns. The project combines NLP techniques with historical analysis of political discourse.

Python Natural Language Processing Stylometric Analysis Political Science Historical Data

Blog

Long-form project writing focused on technical decisions, experimental design, and the lessons hidden behind the headline metric.

Medical AI · Computer Vision · Model Auditing

When High Accuracy Lies: Building and Auditing a CNN for Pneumonia Detection

Case study article • baseline modeling, Grad-CAM inspection, and shortcut-learning diagnosis

This article follows the project from the first baseline CNN to the moment explainability analysis exposed a result more important than accuracy itself: the model was relying on image borders rather than lung pathology.

TensorFlow PyTorch Explainable AI Grad-CAM Medical Imaging

Technical Skills & Expertise

Skills across programming languages, machine learning, and computational research methods.

Python
★★★ Expert
Machine Learning
★★☆ Advanced
NLP
★★★ Expert
Network Data Science
★★☆ Advanced
Data Science & Visualization
★★★ Expert
Statistical Analysis
★★★ Expert
Web Development
★☆☆ Intermediate
C++ Programming
★☆☆ Intermediate