Crystalize is an open-source 3D visualization tool for embeddings and clustering. Its GUI makes it easy to parse and clean up jsonlines-formatted logs for visual exploration, so users can judge the effectiveness of their embeddings and clustering.
- Import and process jsonlines formatted logs
- Support for both single file and folder import
- Selective field choosing for embedding and clustering
- Pre-processing of data
- Customizable embedding models (default: AllMiniLM-v2)
- Automatic embedding generation
- Interactive 3D visualization
- Rotation and zoom functionality
- Cluster highlighting
- Tree widget for cluster and log exploration
- Hierarchical view of clusters and their logs
- Ability to toggle cluster visibility

- Clone the repository:
    git clone https://github.com/zerocase/Crystalize.git
    cd Crystalize
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows use venv\Scripts\activate
- Install requirements:
    pip install -r requirements.txt
- Run the main.py file:
    python main.py

- Click "Open File" or "Open Folder" to import your jsonlines formatted log file(s).
- Select relevant fields for embedding and clustering in the "Common Fields" section.
- Click "Pre-process Data" to prepare the data.
- Choose an embedding model from the dropdown (default or custom from Hugging Face).
- Click "Generate Embeddings" to create embeddings and an initial visualization.
- Click "Perform Clustering" to group logs and enhance the visualization.
- Explore the data:
  - Use the mouse to rotate and zoom in the 3D visualization.
  - Select clusters or individual logs in the tree widget to highlight them in the 3D view.
  - Toggle cluster visibility using checkboxes in the tree widget.
  - Expand cluster nodes to see individual logs and their details.
Only jsonlines format is currently supported for log files.
-
For importing new data, it's recommended to clear the database first using the "Clear Database" button.
-
There is currently no project system, but backing up and switching databases is possible.
-
CUDA-compatible GPUs significantly speed up the embedding process.
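
To make the CUDA note and the embedding and clustering steps concrete, here is a hedged sketch built on sentence-transformers and scikit-learn. It is not taken from the Crystalize source: the model name `sentence-transformers/all-MiniLM-L6-v2` is an assumption about what the default refers to, and KMeans and PCA are stand-ins for whatever clustering and 3D projection the application actually uses. The function names are hypothetical.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


def embed(texts, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Encode a list of strings into an array of shape (len(texts), dim)."""
    # Imported lazily so the module loads even without the heavy dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name, device=pick_device())
    return model.encode(texts, show_progress_bar=False)


def cluster_and_project(embeddings, n_clusters=5):
    """Assign a cluster label to each embedding and reduce it to 3D coordinates.

    KMeans and PCA are illustrative choices only. PCA with three components
    needs at least three samples, and KMeans needs at least n_clusters samples.
    """
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    coords = PCA(n_components=3).fit_transform(embeddings)
    return labels, coords
```

Calling `labels, coords = cluster_and_project(embed(texts))` would give one cluster label and one 3D point per log. Passing the device explicitly makes the GPU path visible; on a machine without CUDA the same code runs on the CPU, only more slowly.
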
Crystalize requires Python 3.12.6 and the following main dependencies:
- PyQt6
- NumPy
- scikit-learn
- sentence-transformers
- PyOpenGL
- torch
For a full list of dependencies, see requirements.txt.
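
A quick way to check an environment against the requirements above is to query the installed package metadata. The sketch below uses only the standard library; the helper names are hypothetical, the distribution names are the PyPI names of the main dependencies, and it compares only the major and minor Python version, which is an assumption about how strictly 3.12.6 must be matched.

```python
import sys
from importlib import metadata

# PyPI distribution names for the main dependencies listed above.
MAIN_DEPENDENCIES = (
    "PyQt6",
    "numpy",
    "scikit-learn",
    "sentence-transformers",
    "PyOpenGL",
    "torch",
)


def installed_versions(names=MAIN_DEPENDENCIES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_matches(major=3, minor=12):
    """Return True when the running interpreter is the expected major.minor."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    print("Python 3.12:", python_matches())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run inside the activated virtual environment, this lists each main dependency with its version, or "not installed" when `pip install -r requirements.txt` has not been run yet.
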
ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection
This project is open-source and available under the GNU General Public License v2.0.