Analysis of Building Electricity Use Pattern Using K-Means Clustering Algorithm by Determination of Better Initial Centroids and Number of Clusters
Figure 1. Aerial photograph of Chubu University, Japan.
Figure 2. Boxplot with outliers.
Figure 3. Boxplot after outliers were removed.
Figure 4. Flow chart of the K-means algorithm.
Figure 5. Electricity use at Chubu University for different electricity use periods: (a) Sundays and holidays, (b) Saturdays, (c) lecture days without air conditioning, (d) lecture days with air conditioning, (e) lecture days with air-conditioning peak, and (f) non-lecture days with air conditioning.
Figure 6. Selection of the initial centroid using the percentile method.
Figure 7. Model of 366 × 24 order matrix.
Figure 8. Part of the data frame of the 366 × 24 order matrix.
Figure 9. Clustering result for Chubu University. The cluster centers for (a) k = 3 and (b) k = 4. The calendar plots for (c) k = 3 and (d) k = 4.
Figure 10. Clustering result for Chubu University. The cluster centers for (a) k = 5 and (b) k = 6. The calendar plots for (c) k = 5 and (d) k = 6.
Figure 11. Actual calendar plot of Chubu University.
Figure 12. The number of days per cluster by day of the week.
Abstract
1. Introduction
Related Works
2. Outlier Detection and Imputation of Missing Values
3. Methodology
3.1. Overview of K-Means Clustering Algorithm
- (1) Determine the desired value of k, the number of clusters.
- (2) Determine the initial centroids. The initial centroids are chosen randomly from the existing data, and the number of initial centroids equals the number of clusters.
- (3) Compute the distance from each data point to each centroid using the Euclidean distance formula and find the nearest centroid.
- (4) Group the data by minimum distance. A data point belongs to the cluster whose center is closest to it.
- (5) Compute the new centroids as the average of the data points in each cluster.
- (6) Return to step 3.
- (7) Stop when the cluster assignments no longer change.
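The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names and the `seed` and `max_iter` parameters are assumptions introduced here, and an empty cluster simply keeps its previous centroid.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to point (steps 3 and 4)."""
    return min(range(len(centroids)), key=lambda j: euclidean(point, centroids[j]))


def mean_point(points):
    """Component-wise average of a non-empty list of points (step 5)."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data, k, seed=0, max_iter=100):
    """Plain K-means with randomly chosen initial centroids."""
    rng = random.Random(seed)
    # Step 2: draw k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 3-4: assign every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in data]
        # Step 7: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return labels, centroids
```

Calling `kmeans(data, k)` with `data` as a list of equal-length numeric lists returns one cluster label per point and the final centroids. Because the initial centroids are drawn at random, different values of `seed` can lead to different final clusterings on less clearly separated data.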
3.2. Proposed Method
- Dat = {d1, d2, d3, …, di, …, dn}, a set of n data points.
- di = {x1, x2, x3, …, xi, …, xm}, a set of attributes of one data point.
- F = {f1, f2, …, fl, …, fk}, a set of k centroids.
- Step 1: For k clusters, divide the cumulative probability equally into (k + 2) parts using the percentile method.
- Step 2: Find the (k + 2) values corresponding to these percentiles in the empirical cumulative distribution.
- Step 3: Compute the distance between each data point di (1 ≤ i ≤ n) and all the initial centroids.
- Step 4: For each data point di, find the closest centroid fl and assign di to cluster l.
- Step 5: For each cluster l, recalculate the centroid.
- Step 6: Compute the distance between the new centroids and each data point di.
- Step 7: Repeat this process until no data point changes its cluster assignment.
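Steps 1 and 2 can be read in more than one way, so the following is only one possible sketch of a percentile-based initialization, under assumptions that do not come from the text above: each data point is summarized by the mean of its attributes, the (k + 2) cumulative probabilities are equally spaced from 0 to 1, the two extreme values (0th and 100th percentiles) are discarded, and the data point closest to each remaining percentile value is taken as an initial centroid. The function names are hypothetical.

```python
def percentile_value(sorted_vals, q):
    """Value at cumulative probability q (0 to 1), by linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def percentile_initial_centroids(data, k):
    """Pick k initial centroids from data using equally spaced percentiles.

    Assumptions (not from the source): points are ranked by the mean of
    their attributes, and the 0th and 100th percentiles are dropped.
    """
    scores = [sum(p) / len(p) for p in data]
    sorted_scores = sorted(scores)
    # k + 2 equally spaced cumulative probabilities: 0, 1/(k+1), ..., 1.
    probs = [i / (k + 1) for i in range(k + 2)]
    # Drop the two extremes, leaving k interior percentile values.
    targets = [percentile_value(sorted_scores, q) for q in probs][1:-1]
    centroids = []
    for t in targets:
        # Use the actual data point whose score is closest to the target.
        idx = min(range(len(data)), key=lambda i: abs(scores[i] - t))
        centroids.append(list(data[idx]))
    return centroids
```

Unlike random selection, this procedure is deterministic: the same data and the same k always give the same initial centroids, so Steps 3 to 7 produce the same clustering on every run. On very small data sets two percentile targets may map to the same data point, which this sketch does not guard against.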
4. Experiments and Results
4.1. Accuracy Verification
4.1.1. Accuracy Verification Using Real World Datasets
4.1.2. Cluster Quality Comparison Using University Data
Description of University Data
Cluster Quality Comparison Using Proposed Method and K-Means
4.2. Electricity Use Pattern Analysis by the Determination of Number of Clusters
5. Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Name of Data | No. of Clusters | K-Means (100 Trials) Max (%) | K-Means (100 Trials) Min (%) | K-Means (100 Trials) Average (%) | Proposed Method Accuracy (%) |
---|---|---|---|---|---|
Iris | 3 | 89.3 | 58 | 80.9 | 89.3 |
Wine | 3 | 70.39 | 56 | 65.8 | 70.39 |
Ruspini | 4 | 94 | 79 | 89.2 | 100 |
New-Thyroid | 3 | 86.04 | 79.1 | 82.13 | 86 |

Data Set | No. of Clusters | K-Means (100 Trials) Average: Between-Cluster Sum of Squared Distances | K-Means (100 Trials) Average: Within-Cluster Sum of Squared Distances | Proposed Method: Between-Cluster Sum of Squared Distances | Proposed Method: Within-Cluster Sum of Squared Distances |
---|---|---|---|---|---|
Iris | 3 | 585.8 | 95.46 | 602.5 | 78.85 |
Wine | 3 | 15,146,222 | 2,445,691 | 15,221,607 | 2,370,690 |
Ruspini | 4 | 218,108 | 26,226 | 231,493 | 12,881 |
New-Thyroid | 3 | 34,679.2 | 29,009.8 | 35,204.8 | 28,876.7 |

No. of Clusters (K) | Within-Cluster Sum of Squared Distances: K-Means with Random Initialization | Within-Cluster Sum of Squared Distances: Proposed Method | Between-Cluster Sum of Squared Distances: K-Means with Random Initialization | Between-Cluster Sum of Squared Distances: Proposed Method |
---|---|---|---|---|
4 | 2.28 × 10^8 | 2.08 × 10^8 | 1.66 × 10^9 | 1.68 × 10^9 |
5 | 1.65 × 10^8 | 1.65 × 10^8 | 1.72 × 10^9 | 1.72 × 10^9 |
6 | 1.36 × 10^8 | 1.30 × 10^8 | 1.75 × 10^9 | 1.76 × 10^9 |
7 | 1.16 × 10^8 | 1.09 × 10^8 | 1.59 × 10^9 | 1.82 × 10^9 |
Total | 6.45 × 10^8 | 6.14 × 10^8 | 6.74 × 10^9 | 6.95 × 10^9 |

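
The within-cluster and between-cluster sums of squared distances reported in the tables above can be computed as in the following sketch. It assumes the usual definitions: the within-cluster value sums the squared Euclidean distance from each point to its own cluster mean, and the between-cluster value sums, over clusters, the cluster size times the squared distance from the cluster mean to the overall mean. The function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise average of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def cluster_sums_of_squares(data, labels):
    """Return (within, between) sums of squared distances for a clustering."""
    overall = mean_point(data)
    within = 0.0
    between = 0.0
    for lab in sorted(set(labels)):
        members = [p for p, l in zip(data, labels) if l == lab]
        center = mean_point(members)
        # Compactness: spread of the members around their own center.
        within += sum(sq_dist(p, center) for p in members)
        # Separation: weighted distance of the center from the overall mean.
        between += len(members) * sq_dist(center, overall)
    return within, between
```

Under these definitions the two quantities add up to the total sum of squares of the data, so for a fixed data set a lower within-cluster value implies a higher between-cluster value, indicating more compact and better separated clusters.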
Cluster No. | Lecture | Air-Conditioning | University | General Occurrences |
---|---|---|---|---|
1 | X | X | closed | Sundays, holidays |
2 | X | X | partially open | Saturdays, holidays with events |
3 | X | ✓ | open | vacations |
4 | ✓ | X | open | Spring and autumn |
5 | ✓ | ✓ | open | Summer and winter |
6 | ✓ | ✓ | open | Summer and winter peak period |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Nepal, B.; Yamaha, M.; Sahashi, H.; Yokoe, A. Analysis of Building Electricity Use Pattern Using K-Means Clustering Algorithm by Determination of Better Initial Centroids and Number of Clusters. Energies 2019, 12, 2451. https://doi.org/10.3390/en12122451