This repository contains the scripts for creating the dataset and code for the paper "Chatting with Logs: An exploratory study on Finetuning LLMs for LogQL".
The dataset and the fine-tuned models are available on HuggingFace under the following links:
The demo for the fine-tuned models is available at this link.
logs/: This directory contains the system logs and their respective transformation and ingestion scripts for each of the three systems: OpenSSH, OpenStack, and HDFS.
Respective logs are stored in the following directories:
- logs/OpenSSH: Contains the OpenSSH logs and scripts.
- logs/OpenStack: Contains the OpenStack logs and scripts.
- logs/HDFS: Contains the HDFS logs and scripts.
Order of running the scripts:
1. `python filter.py`: Filters the log headers and messages.
2. `python generate_labels.py`: Generates the labels and structured metadata into `parsed_<app>_logs.json`.
3. `python update_timestamps.py`: Updates the timestamps in the parsed logs based on `datetime.now()` and the relative time differences between entries. This is needed because Grafana Loki does not support ingesting out-of-order logs.
4. `python upload_to_loki.py`: Uploads the logs to Grafana Loki.
dataset-curation/: This directory contains the scripts for creating the natural language to LogQL dataset. The dataset is created by pairing the natural language queries with the LogQL queries.
finetuning/: This directory contains the scripts for fine-tuning the LLMs on the dataset.