The C4.5 decision tree algorithm is a popular method used in machine learning for classification tasks. It is Ross Quinlan's extension of his earlier ID3 algorithm. C4.5 constructs a decision tree from a set of training data and is known for its ability to handle both categorical and continuous attributes, manage missing values, and prune the tree to avoid overfitting.
Splitting Criterion: C4.5 uses the gain ratio (information gain normalized by the split information, i.e., the entropy of the split itself) to determine the best attribute to split on at each node of the tree.
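As a concrete illustration, here is a minimal sketch of the gain ratio computation for a categorical feature, assuming rows are represented as plain dictionaries; the helper names (`entropy`, `gain_ratio`) are illustrative, not from any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, feature):
    """Information gain of splitting on `feature`, normalized by split info."""
    total = len(labels)
    # Group the labels by the feature's value.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted entropy after the split.
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - after
    # Split information penalizes features with many distinct values.
    split_info = -sum((len(g) / total) * math.log2(len(g) / total)
                      for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

Normalizing by the split information is what keeps C4.5 from favoring attributes with many distinct values, a known bias of plain information gain in ID3.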
Handling Continuous Data: For a continuous attribute, C4.5 selects a threshold and splits the data into two subsets: instances whose values fall at or below the threshold and instances whose values fall above it.
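A sketch of the threshold search, reusing the `entropy` helper from above: candidate cut points are taken midway between consecutive distinct values, and the one with the highest score is kept (plain information gain is used here for brevity; the names are again illustrative).

```python
def best_threshold(values, labels):
    """Find the cut point on a numeric attribute that maximizes information gain."""
    pairs = sorted(zip(values, labels))
    total_entropy = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weighted entropy of the two-way split at this candidate cut.
        after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = total_entropy - after
        if gain > best_gain:
            best_gain, best_cut = gain, (lo + hi) / 2
    return best_cut, best_gain
```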
Handling Missing Values: C4.5 can handle missing attribute values by evaluating candidate splits on the subset of data where the attribute is not missing, weighting the resulting score by the proportion of instances with known values.
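A simplified sketch of that idea, reusing `gain_ratio` from the first snippet (full C4.5 also passes instances with unknown values down every branch with fractional weights, which is omitted here):

```python
def gain_ratio_with_missing(rows, labels, feature):
    """Score a split when some rows lack `feature`: evaluate on the known
    subset, then down-weight by the fraction of rows where it is known."""
    known = [(row, label) for row, label in zip(rows, labels)
             if row.get(feature) is not None]
    if not known:
        return 0.0
    known_rows = [row for row, _ in known]
    known_labels = [label for _, label in known]
    fraction_known = len(known) / len(rows)
    # Frequently missing attributes are penalized by the smaller weight.
    return fraction_known * gain_ratio(known_rows, known_labels, feature)
```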
Pruning: After the tree is grown, C4.5 performs a pruning pass to remove branches that may reflect noise or outliers, improving the generalization ability of the model.
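The pruning decision rests on a pessimistic estimate of a node's true error rate. The sketch below uses the normal-approximation upper confidence bound often given in textbook accounts of C4.5; the constant z = 0.674 corresponds roughly to the default one-sided 25% confidence level, and the function name is an illustrative assumption.

```python
import math

def pessimistic_error_rate(errors, n, z=0.674):
    """Upper confidence bound on a node's true error rate, given that it
    misclassifies `errors` of the `n` training instances reaching it."""
    f = errors / n
    numerator = (f + z * z / (2 * n)
                 + z * math.sqrt(f * (1 - f) / n + z * z / (4 * n * n)))
    return numerator / (1 + z * z / n)
```

A subtree is then replaced by a leaf whenever the leaf's estimated error count, n times its pessimistic error rate, is no larger than the sum of the same estimate over the subtree's leaves.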
The tree-building procedure can be summarized as the following steps.

Calculate Gain Ratio (GR) for each feature and select the best feature:
- For each feature:
  - Calculate the entropy before the split.
  - Calculate the entropy after the split for each possible value.
  - Calculate the Information Gain.
  - Normalize the Information Gain by the split information to get the Gain Ratio.
- Select the feature with the highest Gain Ratio.
Create a node for the best feature:
- This node will represent the decision based on the best feature selected.

Split the dataset based on the unique values of the best feature:
- Partition the dataset into subsets, where each subset contains all the instances with a specific value of the chosen feature.
For each unique value of the chosen feature:
- Create a subset of the dataset based on the unique value: extract all instances that have this value for the feature.
- If the labels of all samples in the subset are the same, create a leaf node: its class label is the shared label of those samples.
- Otherwise, return to the first step: recursively apply the C4.5 algorithm to the subset to build the tree further.
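Putting the steps together, here is a compact sketch of the recursive procedure for purely categorical features, reusing `gain_ratio` and `Counter` from the first snippet. Thresholding, missing-value handling, and pruning would be layered on top, and the dictionary-based tree representation is just one convenient choice.

```python
def build_tree(rows, labels, features):
    """Recursively grow a decision tree following the steps above."""
    # Leaf: all samples share one label.
    if len(set(labels)) == 1:
        return labels[0]
    # Leaf: no features left to split on, so take the majority class.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: select the feature with the highest Gain Ratio.
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    # Steps 2-3: create a decision node and partition on the feature's values.
    node = {best: {}}
    remaining = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        sub_rows = [row for row in rows if row[best] == value]
        sub_labels = [label for row, label in zip(rows, labels)
                      if row[best] == value]
        # Step 4: a pure subset becomes a leaf inside the recursive call;
        # otherwise the call returns to step 1 on the subset.
        node[best][value] = build_tree(sub_rows, sub_labels, remaining)
    return node
```

For example, on a toy weather dataset:

```python
data = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "rain", "windy": "false"},
]
target = ["no", "no", "yes"]
print(build_tree(data, target, ["outlook", "windy"]))
# e.g. {'outlook': {'sunny': 'no', 'rain': 'yes'}}
```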