- Abu Dhabi, UAE
Stars
Code repository for AIED25 paper: Can Large Language Models Match Tutoring System Adaptivity? A Benchmarking Study
The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
Official code for InSTA: Towards Internet-Scale Training For Agents
Switchboard Dialog Act Corpus with Penn Treebank links
An Open-Source Library to Measure Pedagogical Ability of AI Tutors in Educational Dialogues
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023
Agent driven automation starting with the web. Try it: https://www.emergence.ai/web-automation-api
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes"
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
2D Positional Embeddings for Webpage Structural Understanding
A framework to enable multimodal models to operate a computer.
Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Python coherence evaluation tool using Stanford's CoreNLP.
This repo contains implementation of different architectures for emotion recognition in conversations.
SemEval2024-task8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
A library for advanced large language model reasoning
The prime repository for state-of-the-art Multilingual Question Answering research and development.
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents"