Closed
Recently, I began working on a demo for our log analysis tool, LogDelta, using your Hadoop dataset. While building the demo, I grew increasingly suspicious of certain labels in the Hadoop data. What started as a simple demo turned into a label investigation that required far more effort than I had anticipated.
I focused solely on the PageRank application, meaning that the WordCount application might still contain additional incorrect labels. Below are the identified incorrect labels along with their corresponding fixes:
ID | Original Label | Fixed Label |
---|---|---|
1445144423722_0024 | Normal | Disk Full |
1445182159119_0017 | Machine Down | Normal |
1445062781478_0020 | Machine Down | Normal |
1445182151478_0015 | Machine Down | Disk Full |
1445182159119_0013 | Disk Full | Machine Down |
1445182159119_0011 | Disk Full | Machine Down |
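
For anyone who wants to apply these fixes programmatically, here is a minimal sketch. It assumes the labels have already been loaded into a plain Python dict mapping ID to label; that layout and the function name `apply_corrections` are my own assumptions for illustration, not the actual format of the dataset's label file.

```python
# Corrections from the table above: ID -> (original label, fixed label).
CORRECTIONS = {
    "1445144423722_0024": ("Normal", "Disk Full"),
    "1445182159119_0017": ("Machine Down", "Normal"),
    "1445062781478_0020": ("Machine Down", "Normal"),
    "1445182151478_0015": ("Machine Down", "Disk Full"),
    "1445182159119_0013": ("Disk Full", "Machine Down"),
    "1445182159119_0011": ("Disk Full", "Machine Down"),
}


def apply_corrections(labels):
    """Return a copy of `labels` with the table's fixes applied.

    `labels` is a hypothetical dict of ID -> label (an assumed layout).
    IDs missing from `labels` are skipped, and entries that already
    carry the fixed label are left alone. Any other mismatch with the
    expected original label raises, so a changed label file is noticed
    instead of being silently overwritten.
    """
    fixed = dict(labels)  # do not mutate the caller's dict
    for app_id, (original, corrected) in CORRECTIONS.items():
        if app_id not in fixed:
            continue
        current = fixed[app_id]
        if current == corrected:
            continue
        if current != original:
            raise ValueError(
                f"{app_id}: expected {original!r}, found {current!r}"
            )
        fixed[app_id] = corrected
    return fixed
```

Checking the current label against the original one before overwriting is a deliberate choice: it keeps the sketch safe to rerun and flags any ID whose label differs from what the table expects.
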
If you're curious about how I reached these conclusions, the process is documented in a YouTube playlist.
- The key part of the label correction is covered in the final video.
- The earlier videos provide details on how the suspicions began to arise.
- I have also shared the text script of the video, which includes some visuals.