Algorithms to Reduce the Data File Size and Improve the Write Rate for Storing Sensor Reading Values in Hard Disk Drives for Measurements with Exceptionally High Sampling Rates
Figure 1. Schematic diagram of the measurement system utilizing the algorithms developed in this study.
Figure 2. Composition of an example scaled value of the 3-byte number format.
Figure 3. Conversion algorithm of 2-byte and 3-byte number formats.
Figure 4. Data write performance in the tests using the default buffer size. (a) File size = 128 MB and (b) file size = 256 MB.
Figure 5. Data write performance in the tests using the configured buffer size. (a) File size = 128 MB and (b) file size = 256 MB.
Figure 6. Data write performance in the tests using a self-created buffer. (a) Buffer size = 8 MB, (b) buffer size = 32 MB, (c) buffer size = 64 MB, (d) buffer size = 128 MB, (e) buffer size = 256 MB, and (f) buffer size = 512 MB.
Figure 7. Data write performance of the multi-threaded write. (a) Buffer size = 128 MB and (b) buffer size = 256 MB.
Figure 8. Data distribution across the three drives, based on the principle of RAID 0.
Figure 9. Configuration to store data on multiple drives.
Figure 10. Operating window of the program in a high-performance measurement test.
Figure 11. Configuration of the hydraulic accumulator.
Figure 12. Installation of acoustic emission sensors.
Figure 13. AE event waves measured by the Kistler sensors at (a) position P1, (b) position P2, and (c) position P3. (d) Comparison of the arrival times of the AE event wave at positions P1, P2, and P3.
Figure 14. Failure of the hydraulic accumulator after 29 h of the load test.
Abstract
Featured Application
1. Introduction
2. Algorithms for Data Encoding to Minimize the File Size
3. Algorithms to Improve the Data Write Rate
3.1. Data Write Performance Tests Using Text Formats
- Step 1: A total of 10,000,000 values were created as the Double type.
- Step 2: A new blank file was created.
- Step 3: The values were written into the file using one of two options:
  - Each value is converted into a string and then written into the file individually.
  - All values are converted into strings and joined into one large string, which is then written into the file.
- Step 4: The file was closed.
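The two Step 3 options can be sketched in Python as follows. This is an illustrative sketch rather than the authors' program (the tests cite .NET's `StreamWriter`); the function names, the newline separator, the use of `repr` for float-to-text conversion, and the reduced value count are assumptions.

```python
import os
import random
import tempfile
import time


def write_individually(path, values):
    """Option 1: convert each value to a string and write it on its own."""
    with open(path, "w", encoding="ascii") as f:   # Step 2: new blank file
        for v in values:
            f.write(repr(v) + "\n")                 # Step 3: one write per value
    # Step 4: the file is closed when the with-block exits


def write_as_one_string(path, values):
    """Option 2: convert all values, join them into one large string, write once."""
    text = "".join(repr(v) + "\n" for v in values)
    with open(path, "w", encoding="ascii") as f:
        f.write(text)                               # Step 3: a single large write


if __name__ == "__main__":
    # Step 1, reduced from the 10,000,000 Double values of the original test.
    values = [random.uniform(-10.0, 10.0) for _ in range(500_000)]
    out_dir = tempfile.mkdtemp()
    for name, func in (("individual", write_individually),
                       ("one_string", write_as_one_string)):
        path = os.path.join(out_dir, name + ".txt")
        start = time.perf_counter()
        func(path, values)
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.3f} s, {os.path.getsize(path) / 1e6:.1f} MB")
```

Python's text files are themselves buffered, so the gap between the two options may be smaller here than in the original tests; the sketch shows the shape of the comparison, not its numbers.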
3.2. Data Write Performance Tests Using Binary Formats
3.2.1. Approach 1. Writing the Values into Files Immediately after They Are Acquired Using Internal Buffers
- Step 1: A new blank file was created.
- Step 2: A 1 MB chunk of data was read from the data pack and written into the data file. This step simulates the process of acquiring the sensor reading values and then writing them into a file.
- Step 3: The current file was closed, and the process was started again from Step 1. A new file was created to save the data.
- The process was repeated from Step 2 until the entire 5 GB of data had been written.
- Increasing the size of each file to 512 MB.
- Increasing the read data chunk size to 5, 10, and 20 MB.
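Approach 1 can be sketched as below, assuming an in-memory `bytes` object stands in for the 5 GB data pack and that Python's `buffering` argument to `open()` plays the role of the internal buffer setting (default when `-1`, configured when set to a byte count such as the file size). The function name `approach1` and the file naming scheme are hypothetical.

```python
import os
import tempfile

MB = 1024 * 1024


def approach1(data, out_dir, file_size=128 * MB, chunk_size=1 * MB, buffer_size=-1):
    """Write `data` chunk by chunk, immediately, into a sequence of files.

    buffer_size=-1 keeps Python's default internal buffer; a positive value
    configures it (e.g. buffer size = file size). Returns the written paths.
    """
    paths = []
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        path = os.path.join(out_dir, f"data_{len(paths):05d}.bin")  # Step 1: new file
        paths.append(path)
        written = 0
        with open(path, "wb", buffering=buffer_size) as f:
            # Step 2: read one chunk from the data pack and write it right away
            while written < file_size and offset < len(view):
                n = min(chunk_size, file_size - written, len(view) - offset)
                f.write(view[offset:offset + n])
                offset += n
                written += n
        # Step 3: the with-block closes the file; the loop starts a new one
    return paths


if __name__ == "__main__":
    pack = os.urandom(20 * MB)  # stand-in for the 5 GB data pack
    with tempfile.TemporaryDirectory() as out_dir:
        files = approach1(pack, out_dir, file_size=4 * MB, buffer_size=4 * MB)
        print(len(files), "files written")
```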
3.2.2. Approach 2. Storing the Values in a Self-Created Buffer before Writing
- Step 1: An array was created to serve as a buffer memory.
- Step 2: A new blank file was created.
- Step 3: The values were read from the data pack and copied into the buffer. When the buffer became full, all the values were copied into a data block.
- Step 4: The data block was written into the file, which was then closed.
- Step 5: The buffer was reused as a new buffer for new values. A new file was created to store the data.
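The self-created buffer of Approach 2 might look like the following sketch. It assumes the data pack arrives as byte chunks; the class name `SelfBufferedWriter`, the file naming scheme, and the final `flush()` for a partly filled buffer are illustrative assumptions, not details from the study.

```python
import os
import tempfile

MB = 1024 * 1024


class SelfBufferedWriter:
    """Collect values in a preallocated buffer; when it is full, copy it into
    a data block and write that block into a new file (Steps 1-5)."""

    def __init__(self, out_dir, buffer_size=128 * MB):
        self.out_dir = out_dir
        self.buffer = bytearray(buffer_size)  # Step 1: array used as buffer memory
        self.fill = 0
        self.paths = []

    def add(self, chunk):
        """Step 3: copy values from the data pack into the buffer."""
        view = memoryview(chunk)
        while len(view):
            n = min(len(view), len(self.buffer) - self.fill)
            self.buffer[self.fill:self.fill + n] = view[:n]
            self.fill += n
            view = view[n:]
            if self.fill == len(self.buffer):
                self.flush()

    def flush(self):
        """Steps 2, 4, 5: copy the filled part into a data block, write it into
        a new file, close the file, and reuse the buffer."""
        if self.fill == 0:
            return
        block = bytes(memoryview(self.buffer)[:self.fill])  # the data block
        path = os.path.join(self.out_dir, f"block_{len(self.paths):05d}.bin")
        with open(path, "wb") as f:
            f.write(block)
        self.paths.append(path)
        self.fill = 0  # the buffer is reused for new values


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        writer = SelfBufferedWriter(out_dir, buffer_size=8 * MB)
        pack = os.urandom(20 * MB)
        for i in range(0, len(pack), MB):  # 1 MB reads from the data pack
            writer.add(pack[i:i + MB])
        writer.flush()                     # write the last, partly filled block
        print(len(writer.paths), "files written")
```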
3.2.3. Approach 3. Parallel Writing
3.2.4. Discussion
3.2.5. Approach 4. Multidrive Write
- All existing data on the hard drives are erased during the RAID setup.
- The configuration lacks flexibility: adding or removing a drive requires starting anew and erasing all existing data.
- Data cannot be read when one of the drives is damaged.
- To maximize the write performance, all hard drives must have the same capacity and write rate. For example, combining a drive with a 512 GB capacity and a 120 MB/s write rate with a drive with a 128 GB capacity and a 150 MB/s write rate in a RAID 0 system results in a system with a capacity of 128 × 2 = 256 GB and a write rate of 120 × 2 = 240 MB/s. The achievable performance is limited by the lowest capacity and the lowest write rate.
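The limiting rule in the last item amounts to simple arithmetic: an idealized RAID 0 array uses each member only up to the smallest capacity and the slowest write rate. The helper name below is hypothetical, and the model ignores controller and file-system overhead.

```python
def raid0_capacity_and_rate(drives):
    """Idealized RAID 0 limits for `drives`, a list of
    (capacity_GB, write_rate_MBps) tuples: each member contributes only the
    smallest capacity and the slowest write rate."""
    n = len(drives)
    capacity_gb = n * min(cap for cap, _ in drives)
    rate_mbps = n * min(rate for _, rate in drives)
    return capacity_gb, rate_mbps


# Example from the text: 512 GB @ 120 MB/s combined with 128 GB @ 150 MB/s
print(raid0_capacity_and_rate([(512, 120), (128, 150)]))  # (256, 240)
```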
- Step 1: The configurations were set, as described in Figure 9:
  - Three distinct folders, each located on a different drive, were selected to store the data.
  - The buffer size of the self-created buffer was set.
- Step 2: A new data file located in folder #1 was created.
- Step 3: The values were read from the data pack and copied into the buffer. When the buffer became full, all the values were copied to a data block.
- Step 4: A new thread was created to write the data block into the file and then close the file.
- Step 5: The buffer was reused as a new buffer for the new values. A new file located in folder #2 was created to store the data.
- All existing data on the hard drives are maintained.
- The configuration is flexible, allowing the addition or removal of a hard drive without affecting the existing data.
- Data on the remaining drives are still accessible if one of the drives is damaged.
- The use of hard drives with similar write rates is recommended, although the same capacity is not necessary.
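The Approach 4 steps can be sketched by extending a self-created buffer with one writer thread per block and round-robin folder selection. The sketch also records the block order in a configuration file, as described in the Conclusions, so that the pieces can later be merged. The class and function names, the JSON layout, and the unbounded number of writer threads are assumptions (a practical program would likely cap outstanding writes). In CPython, file writes release the global interpreter lock, so the writer threads can overlap disk I/O with further buffering.

```python
import json
import os
import shutil
import threading

MB = 1024 * 1024


class MultiDriveWriter:
    """Fill a self-created buffer, hand each full block to a new thread,
    and place the resulting files on the folders in turn."""

    def __init__(self, folders, buffer_size=128 * MB):
        self.folders = list(folders)           # Step 1: one folder per drive
        self.buffer = bytearray(buffer_size)   # Step 1: self-created buffer
        self.fill = 0
        self.paths = []                        # block order, needed for merging
        self.threads = []

    def add(self, chunk):
        """Step 3: copy newly acquired values into the buffer."""
        view = memoryview(chunk)
        while len(view):
            n = min(len(view), len(self.buffer) - self.fill)
            self.buffer[self.fill:self.fill + n] = view[:n]
            self.fill += n
            view = view[n:]
            if self.fill == len(self.buffer):
                self._dispatch()

    def _dispatch(self):
        if self.fill == 0:
            return
        block = bytes(memoryview(self.buffer)[:self.fill])          # data block copy
        folder = self.folders[len(self.paths) % len(self.folders)]  # folder #1, #2, ...
        path = os.path.join(folder, f"block_{len(self.paths):05d}.bin")
        self.paths.append(path)
        # Step 4: a new thread creates the file, writes the block, and closes it.
        t = threading.Thread(target=self._write, args=(path, block))
        t.start()
        self.threads.append(t)
        self.fill = 0                          # Step 5: reuse the buffer

    @staticmethod
    def _write(path, block):
        with open(path, "wb") as f:
            f.write(block)

    def close(self, manifest_path):
        """Write the partial buffer, wait for all writers, and record the block
        order in a (hypothetical) JSON configuration file."""
        self._dispatch()
        for t in self.threads:
            t.join()
        with open(manifest_path, "w", encoding="utf-8") as f:
            json.dump({"blocks": self.paths}, f, indent=2)


def merge(manifest_path, merged_path):
    """Concatenate the blocks listed in the configuration file into one file."""
    with open(manifest_path, encoding="utf-8") as f:
        blocks = json.load(f)["blocks"]
    with open(merged_path, "wb") as out:
        for path in blocks:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

In use, `folders` would point at directories on different drives (for example, one on HDD #1 and one on HDD #3, as in Section 4.1); temporary directories can stand in for them when trying the sketch.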
4. Verification of High-Performance Measurements
- Acquiring sensor reading values from the DAQ device.
- Analyzing the data in real time to detect any anomalies or trends.
- Visualizing the data to provide meaningful insights.
- Encoding the data and writing them into storage.
4.1. Virtual Measurement Using Simulated DAQ Devices
- (1) Store data. The NI-9223 module has a 16-bit resolution, so the 2-byte number format is recommended for storing its measured data. However, the 3-byte number format was selected to verify the capabilities of the program. This format provides a 24-bit resolution, which exceeds that of the DAQ devices; the purpose was not to improve the accuracy of the stored values but to increase the amount of data generated for testing. The actual input range of ±10.7 V was used. As configured, the system acquired 56 mega-samples per second across all channels, producing 168 MB of data every second (56 MS/s × 3 bytes per sample) and therefore requiring a write rate of at least 168 MB/s.
  The average data write rates of the three HDDs were reported in Section 3 as 162, 172, and 138 MB/s, respectively. Therefore, only HDD #2 supported the required write rate, and it was selected to store the data. The target buffer size was set to 512 MB to maximize the write performance; the program automatically calculated and applied an appropriate buffer size of 508 MB, the value closest to the target. With this configuration, the buffer was filled approximately every 3 s, and its entire contents were then written to the hard drive. This configuration performed well.
  Another setting was also tested, in which a combination of HDDs #1 and #3 was selected to store the data instead of HDD #2, using the multidrive write. The other parameters were the same as in the previous test. This test also succeeded, writing the data into multiple files located across HDDs #1 and #3, even though neither drive could individually support the required write rate.
- (2) Skipping samples for visualization. The number of samples acquired every second was very large, so visualizing the signals consumed a large amount of computing resources. Plotting all samples can be time-consuming and can degrade the computer's performance on other tasks; notably, all tasks on the current data must be completed before new data become available. The program therefore provides an option for skipping samples during signal plotting. For instance, if this number is set to 10, only the first sample of each group of 10 samples is plotted, and the others are skipped. This reduces the number of samples to be plotted tenfold and accelerates the plotting process, while the plot still provides a reasonable representation of the signal. This feature conserves computing resources and improves the performance of other tasks.
- (3) Alarm. In actual measurements using sensors, abnormal signals are not always easy to recognize by simply observing the signal plots, and this becomes even more difficult when samples are skipped for visualization. Abnormal signals, such as AEs, can be detected automatically by activating the alarm feature of the program, which offers various supervision parameters, such as peak, mean, and root mean square (RMS) values. However, because the measurements in this study were conducted using simulated devices, the signal waveforms were almost sinusoidal; the signals included noise, but it was insignificant, as shown in Figure 10. No trends or abnormalities were observed in these signals. Nevertheless, the alarm feature was deliberately activated, solely to add workload to the program under test. Despite this added workload, the program completed the tests successfully.
- (4) Scheduled measurement. For long-term condition-monitoring applications, storing sensor data over an extended period can require a significant amount of storage space, particularly in AE monitoring. To address this issue, the sensor reading values can be stored at specific times and for specific durations. For instance, data were stored for 1 min out of every 5 min, which reduces the stored data volume to one-fifth. This approach saves storage space while still capturing data throughout the monitoring period, so trends, patterns, and anomalies in the signals can still be identified.
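The per-block bookkeeping behind items (1) to (4) reduces to a few small functions, sketched below. The names, the 1 MB = 10^6 bytes convention, the definition of peak as the maximum absolute value, and the threshold dictionary used by `check_alarm` are assumptions; the program's actual alarm configuration is not described in detail here.

```python
import math


def required_write_rate_mb(samples_per_second, bytes_per_sample=3):
    """Item (1): write rate the storage must sustain, in MB/s (1 MB = 10**6 bytes).
    The default of 3 bytes corresponds to the 3-byte number format."""
    return samples_per_second * bytes_per_sample / 1e6


def decimate_for_plot(samples, skip=10):
    """Item (2): plot only the first sample of every group of `skip` samples."""
    return samples[::skip]


def alarm_metrics(samples):
    """Item (3): peak (maximum absolute value, an assumption), mean, and RMS."""
    peak = max(abs(x) for x in samples)
    mean = sum(samples) / len(samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return peak, mean, rms


def check_alarm(samples, limits):
    """Return the names of metrics that exceed hypothetical user-set limits,
    e.g. limits = {"peak": 5.0, "rms": 2.0}."""
    peak, mean, rms = alarm_metrics(samples)
    values = {"peak": peak, "mean": abs(mean), "rms": rms}
    return [name for name, limit in limits.items() if values[name] > limit]


def should_store(t_seconds, store_s=60, period_s=300):
    """Item (4): store data only during the first `store_s` seconds of every
    `period_s`-second window (1 min out of every 5 min by default)."""
    return (t_seconds % period_s) < store_s
```

For example, `required_write_rate_mb(56e6)` gives 168.0, matching the 168 MB/s requirement in item (1).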
4.2. Actual Measurement with the Application of Acoustic Emission Monitoring
5. Conclusions
- To reduce the file size, values can be encoded in a specialized binary format tailored to the measurement range and resolution of the employed DAQ device. The 3-byte number format uses 3 bytes per value to provide a 24-bit resolution and reduces file sizes by 3.3 times compared with the text format at the same precision. For AE measurements, most DAQ devices have a 16-bit resolution, making the 2-byte format ideal: it provides a 16-bit resolution and reduces file sizes by 4 times compared with the text format at the same precision. Additionally, the 4-byte number format can satisfy the data storage requirements of special cases that require a 32-bit resolution.
- Regarding the write rate, storing sensor reading values in a file immediately after they become available results in poor write performance. To overcome this issue, a larger self-created buffer is recommended. The optimal buffer size ranged from 128 MB to 512 MB; beyond this range, the improvement became less significant. Furthermore, each write should be performed by a new thread so that newly acquired data can be processed while the previous data are being written. A new file can also be created for each write to avoid errors. Implementing these strategies resulted in a 10× faster write rate, which, combined with a 4× smaller file size, achieved a 40× reduction in write time (i.e., 40× better write performance). HDD #2 exhibited a write rate of 172 MB/s, which supports measurements at 86 MS/s when the 2-byte format is used (172 MB/s ÷ 2 bytes per sample).
- Combining multiple hard drives so that their write rates add up can multiply the write performance compared with a single drive. This approach is similar to the RAID 0 technique but addresses its drawbacks. The measurement data were written into multiple individual files distributed in turn across the selected drives, and a configuration file was created to provide sufficient information to collect the individual files and subsequently merge them into a single large file. Alternatively, each data file can be read independently. This provides a simple yet effective way to enhance the write performance of data storage, allowing faster data acquisition and processing.
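The fixed-width encoding summarized in the first conclusion can be illustrated with a sketch that scales each reading linearly onto a signed 2- or 3-byte integer. The study's own bit layout and conversion algorithm are not reproduced here: symmetric scaling, little-endian byte order, clipping of out-of-range readings, and the names `encode`/`decode` are all assumptions.

```python
def _max_code(nbytes):
    """Largest positive signed integer that fits in `nbytes` bytes."""
    return 2 ** (8 * nbytes - 1) - 1  # 32,767 for 2 bytes, 8,388,607 for 3 bytes


def encode(values, full_scale, nbytes=3):
    """Scale readings in [-full_scale, +full_scale] to signed integers and pack
    each into `nbytes` little-endian bytes (an assumed layout)."""
    max_code = _max_code(nbytes)
    out = bytearray()
    for v in values:
        code = round(v / full_scale * max_code)
        code = max(-max_code, min(max_code, code))  # clip out-of-range readings
        out += code.to_bytes(nbytes, "little", signed=True)
    return bytes(out)


def decode(data, full_scale, nbytes=3):
    """Inverse of encode: unpack each group of `nbytes` bytes and rescale."""
    max_code = _max_code(nbytes)
    return [int.from_bytes(data[i:i + nbytes], "little", signed=True) * full_scale / max_code
            for i in range(0, len(data), nbytes)]


if __name__ == "__main__":
    readings = [0.0, 1.234567, -10.7, 10.7]
    packed = encode(readings, full_scale=10.7)  # ±10.7 V range from Section 4.1
    print(len(packed), "bytes")                 # 3 bytes per reading
    print(decode(packed, full_scale=10.7))
```

With `full_scale = 10.7`, one 3-byte step corresponds to 10.7 / (2^23 - 1) ≈ 1.28 µV, and every reading occupies exactly 3 bytes regardless of its value.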
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Item | Specification |
---|---|
Manufacturer | Western Digital |
Model | WD10EZEX |
Storage capacity | 1000 GB |
Free capacity | 750 GB |
Connectivity technology | SATA 6 Gb/s |
Form factor | 3.5 inches |
Rotational speed | 7200 rotations per minute (rpm) |
Power-on hours | 31,280 h |
Year | 2012 |
Configuration | Average Data Write Rate (MB/s) |
---|---|
Approach No. 1: default buffer size | |
File size = 128 MB | 124 |
File size = 256 MB | 125 |
Approach No. 1: configured buffer size | |
Buffer size = file size = 128 MB | 150 |
Buffer size = file size = 256 MB | 154 |
Approach No. 2: self-created buffer | |
Buffer size = file size = 8 MB | 136 |
Buffer size = file size = 32 MB | 145 |
Buffer size = file size = 64 MB | 151 |
Buffer size = file size = 128 MB | 157 |
Buffer size = file size = 256 MB | 160 |
Buffer size = file size = 512 MB | 162 |
Approach No. 3: parallel writing | |
Buffer size = file size = 128 MB | 110 |
Buffer size = file size = 256 MB | 120 |
Item | HDD #1 | HDD #2 | HDD #3 |
---|---|---|---|
Manufacturer | Western Digital | Western Digital | Toshiba |
Model | WD10EZEX | WD10EZEX | DT01ACA200 |
Storage capacity | 1000 GB | 1000 GB | 2000 GB |
Free capacity | 750 GB | 750 GB | 1200 GB |
Connectivity technology | SATA 6 Gb/s | SATA 6 Gb/s | SATA 6 Gb/s |
Form factor | 3.5 inches | 3.5 inches | 3.5 inches |
Rotational speed | 7200 rpm | 7200 rpm | 7200 rpm |
Power-on hours | 31,280 h | 8180 h | 8170 h |
Year | 2012 | 2012 | 2013 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Vuong, Q.D.; Seo, K.; Choi, H.; Kim, Y.; Lee, J.-w.; Lee, J.-u. Algorithms to Reduce the Data File Size and Improve the Write Rate for Storing Sensor Reading Values in Hard Disk Drives for Measurements with Exceptionally High Sampling Rates. Appl. Sci. 2024, 14, 7410. https://doi.org/10.3390/app14167410