Incorrect Labels in Hadoop Log Data · Issue #56 · logpai/loghub · GitHub
Incorrect Labels in Hadoop Log Data #56
Closed
@mmantyla

Description

Recently, I began working on a demo for our log analysis tool, LogDelta, using your Hadoop log data. While building the demo, I grew increasingly suspicious of certain labels in the Hadoop data. As a result, what started as a simple demo turned into a label investigation that required far more effort than I had anticipated.

I focused solely on the PageRank application, meaning that the WordCount application might still contain additional incorrect labels. Below are the identified incorrect labels along with their corresponding fixes:

| ID | Original Label | Fixed Label |
| --- | --- | --- |
| 1445144423722_0024 | Normal | Disk Full |
| 1445182159119_0017 | Machine Down | Normal |
| 1445062781478_0020 | Machine Down | Normal |
| 1445182151478_0015 | Machine Down | Disk Full |
| 1445182159119_0013 | Disk Full | Machine Down |
| 1445182159119_0011 | Disk Full | Machine Down |
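
For anyone who wants to apply these fixes in their own scripts, here is a minimal Python sketch. It assumes the labels have already been loaded into a plain dict keyed by the IDs exactly as written in the table; the `apply_corrections` helper and that dict layout are hypothetical and not part of loghub or LogDelta. Each entry is checked against the expected original label before it is overwritten, so a mismatch is reported rather than silently changed.

```python
# Corrections from the table above: ID -> (original label, fixed label).
CORRECTIONS = {
    "1445144423722_0024": ("Normal", "Disk Full"),
    "1445182159119_0017": ("Machine Down", "Normal"),
    "1445062781478_0020": ("Machine Down", "Normal"),
    "1445182151478_0015": ("Machine Down", "Disk Full"),
    "1445182159119_0013": ("Disk Full", "Machine Down"),
    "1445182159119_0011": ("Disk Full", "Machine Down"),
}


def apply_corrections(labels, corrections=CORRECTIONS):
    """Return a copy of `labels` with the corrections applied.

    `labels` is a hypothetical dict mapping ID -> label string.
    IDs absent from `labels` are skipped. An ID that already carries
    the fixed label is left alone, so the function can be re-run safely.
    Any other unexpected label raises ValueError.
    """
    fixed = dict(labels)  # do not mutate the caller's dict
    for app_id, (orig_label, new_label) in corrections.items():
        if app_id not in fixed:
            continue
        current = fixed[app_id]
        if current == new_label:
            continue
        if current != orig_label:
            raise ValueError(
                f"{app_id}: expected {orig_label!r}, found {current!r}"
            )
        fixed[app_id] = new_label
    return fixed
```

Calling `apply_corrections(labels)` returns a new dict and leaves the input untouched, so the original and corrected labels can be compared side by side.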

If you're curious about how I reached these conclusions, the process is documented in a YouTube playlist.

  • The key part of the label correction is covered in the final video.
  • The earlier videos provide details on how the suspicions began to arise.
  • I have also shared the text script of the video, which includes some visuals.
