Computer Science > Machine Learning

arXiv:2410.20745 (cs)

[Submitted on 28 Oct 2024 (v1), last revised 31 Oct 2024 (this version, v2)]

Title: Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Abstract: Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at this https URL. In addition, with Shopping MMLU, we host a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website this https URL.

Comments:	Accepted to the NeurIPS 2024 Datasets and Benchmarks Track. Fixed typos in Figure 9
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.20745 [cs.LG]
	(or arXiv:2410.20745v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.20745

Submission history

From: Yilun Jin
[v1] Mon, 28 Oct 2024 05:25:47 UTC (695 KB)
[v2] Thu, 31 Oct 2024 12:54:46 UTC (696 KB)
