Zeyu Zhang
Ph.D. Candidate at UvA

Hi, greetings from Zeyu. I am midway through my Ph.D. studies, supervised by Prof. Sebastian Schelter, Dr. Iacer Calixto and Prof. Paul Groth. My research sits at the intersection of machine learning and relational data management. Right now, I am especially interested in building foundation models that can make better sense of structured data. I am also exploring how to make these models efficient and usable, so that they scale to complex real-world systems.

I am open to any kind of connection and collaboration.


Education
  • University of Amsterdam
    INDELab@Informatics Institute
    Ph.D. Candidate
    Nov. 2022 - present
  • Eindhoven University of Technology
    MSc. in Computer Science
    Aug. 2020 - Oct. 2022
  • Harbin Institute of Technology
    BSc. in Computer Science & Engineering
    Aug. 2016 - Jun. 2020
Honors & Awards
  • Amandus H. Lundqvist (ALSP) Full Scholarship
    2020 - 2022
  • Swiss-European Mobility Programme Grant
    2021
  • International Experience Fund FIE Scholarship
    2021
  • National 2nd Prize in the Chinese Undergraduate Computer Design Contest
    2018
  • Model Student of Academic Record of Harbin Institute of Technology
    2018
News
2025
I will give a talk titled `Practices of Using Small and Large Language Models for Entity Resolution` at the Edinburgh DB Lab.
Apr 26
I attended the industrial event on Next-Generation Data Management Systems organised by Huawei, co-located with EDBT 2025 in Barcelona. There, I gave a short talk titled "Towards The Efficient Utilization of Language Models for Table Data Preparation".
Mar 28
I presented our paper `A Deep Dive into Cross-Dataset Entity Matching with Large and Small Language Models` at EDBT 2025 in Barcelona.
Mar 28
The AnyMatch work appeared in the poster session of the GOOD-DATA workshop at AAAI 2025.
Mar 03
2024
Gave a talk at the WAI meeting held at Vrije Universiteit Amsterdam.
Oct 08
Presented the work `Directions Towards Efficient and Automated Data Wrangling with Large Language Models` at the DBML Workshop of the IEEE International Conference on Data Engineering (ICDE) 2024.
May 13
Selected Publications (view all)
A Deep Dive Into Cross-Dataset Entity Matching with Large and Small Language Models

Zeyu Zhang, Paul Groth, Iacer Calixto, Sebastian Schelter

International Conference on Extending Database Technology (EDBT) 2025

We propose a new challenge that integrates two practical constraints into conventional entity matching (EM) tasks to better align with real-world deployment scenarios. A comprehensive evaluation of eight matching methods across 11 datasets provides key insights into model selection and data profiling.

AnyMatch--Efficient Zero-Shot Entity Matching with a Small Language Model

Zeyu Zhang, Paul Groth, Iacer Calixto, Sebastian Schelter

GoodData Workshop, AAAI 2025

We introduce AnyMatch, a novel framework for building effective and efficient entity matching systems. AnyMatch leverages small language models and borrows ideas from instruction tuning to diversify the training corpus, refining the model through multiple data selection strategies. The GPT-2 variant of AnyMatch ranks second among baseline models, achieving an F1 score only 4.4% lower than GPT-4 in a zero-shot setting, while reducing costs by a factor of 3,899.

All publications