8000 Netstat by sreedharamzn · Pull Request #70 · aws/aperf · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Netstat #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 26, 2023
Merged

Netstat #70

merged 6 commits into from
Jul 26, 2023

Conversation

sreedharamzn
Copy link
Collaborator
@sreedharamzn sreedharamzn commented Jul 24, 2023

Collect and visualize netstat data. aperf was used to collect data during a network load simulation exercise.

aperf_report_netstat_simulation.tar.gz

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

fn collect_data(&mut self) -> Result<()> {
self.time = TimeEnum::DateTime(Utc::now());
self.data = String::new();
self.data = std::fs::read_to_string("/proc/net/netstat")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the average increase in time taken for aperf before and after this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 samples collected on Ubuntu 22.04 (m6a.4xlarge) ec2 instance:
Average Time (with Netstat) = 4.316205832ms
Average Time (without Netstat) = 4.198065059ms
Average increase in Time = 0.118140773ms

100 samples collected on AL Cloud Desktop machine:
Average Time (with Netstat) = 10.06244311ms
Average Time (without Netstat) = 9.958092307ms
Average increase in Time = 0.104350803ms

Copy link
Contributor
@wash-amzn wash-amzn Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a big difference in times between the two environments (I mean regardless of your change). The cloud dev desktop is sporting a higher CPU count, right?

Copy link
Collaborator Author
@sreedharamzn sreedharamzn Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My cloud dev desktop is having same CPU count (16) as that of m6a.4xlarge ec2 instance.

However, sizes of CPU cache & main memory of my cloud dev desktop, is significantly lesser than that of m6a.4xlarge ec2 instance. (documented it below for your reference)

Cloud dev desktop
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 16
Model name: Intel(R) Xeon(R) CPU E5@ 2.90GHz
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
MemTotal: 30779248 kB

m6a.4xlarge ec2 instance
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 16
Model name: AMD EPYC 7R13 Processor
L1d cache: 256 KiB (8 instances
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 32 MiB (1 instance)
MemTotal: 64191680 kB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely aperf isn't (for one or two reasons) collecting PMU stats on your AMD instance, resulting in lower collection times.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. PMU stats collection is getting skipped because of unsupported CPU.

"[2023-07-26T20:15:47Z ERROR aperf_lib] Excluding perf_stat from collection. Error msg: Unsupported CPU"

let mut lines = reader.lines();

while let (
Some(line1), Some(line2)) = (lines.next(), lines.next()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if line1 has a value and line2 is None. Will it continue? It might cause issues later on since we blindly use .unwrap().

Copy link
Collaborator Author
@sreedharamzn sreedharamzn Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per '/proc/net/netstat' data contract, header row will always succeeded by a data row.
So, line2 is always expected to be present when line1 is available.

I also wanted the tool to fail, if the data contract is violated so we get noticed and can take appropriate action.
But let me know, if this needs to be absorbed and no exception should be raised in the unlikely scenario when line2 is None.
I'll take care of the necessary changes.


for name in names {
let val = values.next().ok_or(PDError::ProcessorOptionExtractError)?;
map.insert(tag.to_owned() + " " + &name.to_owned(), val.parse::<i64>()?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever expect the value to be signed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Have changed this to u64

}

fn get_entries(value: Netstat) -> Result<String> {
let mut keys: Vec<String> = value.netstat_data.into_keys().collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A HashMap does not maintain the order of the keys. Every hashmap will produce a different output in the report.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently i have a sorting logic in place for ordering the plots lexicographically. In my next CR, i'll change this to a Vector and get rid of the sorting logic.


fn get_entries(value: Netstat) -> Result<String> {
let mut keys: Vec<String> = value.netstat_data.into_keys().collect();
keys.sort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we sort the keys?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ordering the plots lexicographically based on the parameter names.

ProcessedData::Netstat(ref value) => values.push(value.clone()),
_ => unreachable!(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a check for the number of params here and in the "values" call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the check and panicking if params & values count differ.

assert!(netstat.collect_data().unwrap() == ());
buffer.push(Data::NetstatRaw(netstat));
processed_buffer.push(Netstat::new().process_raw_data(buffer[0].clone()).unwrap());
let json = Netstat::new().get_data(processed_buffer, "run=test&get=values&key=IpExt: InNoECTPkts".to_string()).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to be there in the build system where the unit test will be run?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested this on AL 2, Ubuntu 22.04 & RHEL 9 ec2 instances and the unit test always succeeded.

title: 'Time (s)',
},
yaxis: {
title: 'Packets',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for all the keys?

Copy link
Collaborator Author
@sreedharamzn sreedharamzn Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have changed this to 'Count', to keep it generic. Will work on a separate CR to add custom Units.

F438
.ok_or(PDError::VisualizerNetstatValueGetError(key.to_string()))?;

let mut v = *curr_value;
if !key.contains("nr_") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

fn test_collect_data() {
let mut netstat = NetstatRaw::new();

assert!(netstat.collect_data().unwrap() == ());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change it if it's what all the other tests in the project are doing, but assert!(result.is_ok()) would be nicer than assert!(result.unwrap() == ())

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since all other tests are using the same, left it unchanged.

@@ -19,6 +19,7 @@ <h2>APerf</h2>
<button class="tablinks" name="interrupts">Interrupt Data</button>
<button class="tablinks" name="disk_stats">Disk Stats</button>
<button class="tablinks" name="perfstat">PMU Stats</button>
<button class="tablinks" name="netstat">NET Stat</button>
Copy link
Contributor

Choose a reason for hiding this comment

10000

The reason will be displayed to describe this comment to others. Learn more.

NET isn't an acronym ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Changed the name to 'Netstat'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the others, I'd say it should be "Net Stats"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor
@wash-amzn wash-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Janak also has to approve since he had review feedback

@janaknat janaknat merged commit 9905a3e into main Jul 26, 2023
@janaknat janaknat deleted the netstat branch August 15, 2023 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0