Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching
Juan Wisznia
University of Buenos Aires
The NLP Reading Group is delighted to host Juan Wisznia, who will give a remote talk on “Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching”.
Recording
The recording of the talk can be found here on our YouTube channel.
Talk Description
In this talk, I will present “Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching,” accepted at ACL 2025 (Main Conference).
This work revisits classical sorting algorithms under the cost structure of LLM-based pairwise ranking, where inference calls dominate computational cost. By applying LLM inference optimizations such as batching and caching, we show that algorithms traditionally considered suboptimal can outperform Heapsort, requiring far fewer inference calls without any loss in ranking quality.
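To make the cost model concrete, here is a minimal sketch (not the paper's implementation) of the idea: comparison results are cached so no pair is sent to the model twice, and the independent comparisons within each round of an odd-even transposition sort are batched into a single inference call. The function `llm_compare_batch` is a hypothetical stand-in for a real batched LLM call.

```python
# Minimal sketch: when inference calls dominate cost, an O(n^2)-comparison
# sort whose rounds batch cleanly can need far fewer *calls* than Heapsort's
# O(n log n) sequential comparisons.

def llm_compare_batch(pairs):
    """Hypothetical batched LLM call: resolves many pairs in ONE inference
    call. Returns, for each (a, b), True if a should rank above b."""
    llm_compare_batch.calls += 1           # count inference calls, not comparisons
    return [a <= b for a, b in pairs]      # placeholder verdict for the sketch

llm_compare_batch.calls = 0
cache = {}  # (a, b) -> verdict; a repeated comparison costs no extra call

def compare_round(pairs):
    """Resolve one round of independent comparisons with caching + batching."""
    missing = [p for p in pairs if p not in cache]
    if missing:
        for pair, verdict in zip(missing, llm_compare_batch(missing)):
            cache[pair] = verdict
    return [cache[p] for p in pairs]

def odd_even_sort(items):
    """Odd-even transposition sort: O(n^2) comparisons overall, but each
    round's comparisons are independent, so n rounds -> at most n calls."""
    items, n = list(items), len(items)
    for r in range(n):
        idx = list(range(r % 2, n - 1, 2))
        verdicts = compare_round([(items[i], items[i + 1]) for i in idx])
        for i, left_first in zip(idx, verdicts):
            if not left_first:             # right item ranks above left: swap
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(odd_even_sort([3, 1, 4, 1, 5, 9, 2, 6]))      # [1, 1, 2, 3, 4, 5, 6, 9]
print("inference calls:", llm_compare_batch.calls)  # <= n, vs O(n log n) for Heapsort
```

Heapsort's comparisons form a sequential dependency chain and cannot be batched, whereas a "suboptimal" sort with parallelizable rounds trades extra comparisons for far fewer batched calls; this is the trade-off the talk examines.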
These findings pave the way for using pre-trained LLMs as re-rankers at realistic latency, without fine-tuning and without sacrificing performance, enabling efficient and robust zero-shot retrieval and re-ranking components within modern LLM systems.
Speaker Bio
Juan Wisznia is a young researcher and machine learning engineer from the University of Buenos Aires, where he completed his MSc thesis in 2023 on pre-trained transformers for conversational AI in healthcare. His work focuses on the efficiency and robustness of large language model (LLM) systems, particularly in retrieval-augmented generation (RAG) and agentic architectures.
He is the first author of a paper accepted at ACL 2025 (Main Conference) and a co-author of another accepted at EMNLP 2024 (Main Conference), and currently teaches NLP engineering and multi-agent LLM systems at Argentine universities. Juan is pursuing research opportunities and a potential PhD focused on scalable, reliable, and efficient LLM systems.
Logistics
Date: October 24th
Time: 2:00 PM
Location: H04 or via Google Meet (See email)