DRAM simulation #8
Hello,

Thank you for your interest in uPIMulator. You are correct that uPIMulator currently does not simulate DRAM refreshes. To integrate Ramulator for DRAM simulation, you will need to modify two key functions within the …

Crucially, you will also need to integrate the Ramulator cycle function into your simulation loop to advance the Ramulator simulation time. Please do not hesitate to contact us if you encounter any issues during this integration. We are happy to provide further assistance.

Thank you,
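The need to advance the DRAM model's time explicitly can be illustrated with a minimal sketch. `HypotheticalDramModel`, `send_read()`, `tick()`, and `run_simulation()` are illustrative names only. They are not Ramulator's or uPIMulator's actual API. The sketch assumes a fixed read latency and shows that an external model keeps its own clock, so requests sent to it never complete unless the simulation loop ticks it.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for an external DRAM model (e.g., Ramulator).
// This is NOT Ramulator's API: it only models the fact that the DRAM model
// keeps its own clock, which advances only when tick() is called.
class HypotheticalDramModel {
 public:
  explicit HypotheticalDramModel(uint64_t read_latency) : read_latency_(read_latency) {
    assert(read_latency > 0);
  }

  // Enqueue a read that completes read_latency_ model cycles from now.
  void send_read() { pending_.push_back(now_ + read_latency_); }

  // Advance the model by one of its own cycles and retire finished reads.
  void tick() {
    ++now_;
    while (!pending_.empty() && pending_.front() <= now_) {
      pending_.pop_front();
      ++completed_;
    }
  }

  uint64_t now() const { return now_; }
  uint64_t completed() const { return completed_; }

 private:
  uint64_t read_latency_;
  uint64_t now_ = 0;
  uint64_t completed_ = 0;
  std::deque<uint64_t> pending_;  // completion times, in issue order
};

// Simulation loop: each simulator cycle may issue a request, then advances
// the DRAM model's time. Without dram.tick(), no read would ever complete.
uint64_t run_simulation(HypotheticalDramModel& dram, uint64_t num_cycles,
                        uint64_t issue_interval) {
  assert(issue_interval > 0);
  for (uint64_t cycle = 0; cycle < num_cycles; ++cycle) {
    if (cycle % issue_interval == 0) {
      dram.send_read();  // front end issues a read request
    }
    dram.tick();  // advance the DRAM simulation time by one cycle
  }
  return dram.completed();
}
```

If the host simulator and the DRAM model run at different clock rates, the loop would call `tick()` at the corresponding ratio rather than on every simulator cycle.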
Hello,

Thank you very much for your response! I've started implementing my integration with Ramulator, and so far it's going smoothly. However, while doing the integration I stumbled upon the default DDR4 timing parameters set through the argument parser:

```cpp
argument_parser->add_option("t_rcd", util::ArgumentParser::INT,
                            "32");  // based on DDR4-2400
argument_parser->add_option("t_ras", util::ArgumentParser::INT,
                            "78");  // based on DDR4-2400
argument_parser->add_option("t_rp", util::ArgumentParser::INT,
                            "32");  // based on DDR4-2400
argument_parser->add_option("t_cl", util::ArgumentParser::INT,
                            "32");  // based on DDR4-2400
argument_parser->add_option("t_bl", util::ArgumentParser::INT,
                            "8");   // based on DDR4-2400
```

These are consistently double the timing parameters described in the paper, and double the DDR4 timing parameters of Ramulator 2:

```cpp
inline static const std::map<std::string, std::vector<int>> timing_presets = {
  // name rate nBL nCL nRCD nRP nRAS nRC nWR nRTP nCWL nCCDS nCCDL nRRDS nRRDL nWTRS nWTRL nFAW nRFC nREFI nCS, tCK_ps
  // ...
  {"DDR4_2400P", {2400, 4, 15, 15, 15, 39, 54, 18, 9, 12, 4, 6, -1, -1, 3, 9, -1, -1, -1, 2, 833} },
  {"DDR4_2400R", {2400, 4, 16, 16, 16, 39, 55, 18, 9, 12, 4, 6, -1, -1, 3, 9, -1, -1, -1, 2, 833} },
  {"DDR4_2400U", {2400, 4, 17, 17, 17, 39, 56, 18, 9, 12, 4, 6, -1, -1, 3, 9, -1, -1, -1, 2, 833} },
  {"DDR4_2400T", {2400, 4, 18, 18, 18, 39, 57, 18, 9, 12, 4, 6, -1, -1, 3, 9, -1, -1, -1, 2, 833} },
  // ...
};
```

I also traced the DRAM commands issued by uPIMulator's current memory model, along with their timestamps, and indeed, the timings such as …

Is there any reason why these are the defaults instead of the timing parameters described in the paper? Am I missing anything here?

Thank you very much in advance.

Best regards,
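The factor of two can be checked by converting both sets of values to nanoseconds. The sketch below pairs each uPIMulator default with the corresponding `DDR4_2400R` value quoted above. It assumes that uPIMulator counts DRAM cycles at 2400 MHz and that Ramulator counts them at the 1200 MHz implied by `tCK_ps = 833`. The pairing, the clock rates, and the helper names are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <string>

struct TimingPair {
  std::string name;
  int upimulator_cycles;  // default from uPIMulator's argument parser
  int ramulator_nck;      // DDR4_2400R preset in Ramulator 2
};

// Values copied from the snippets quoted above.
const std::array<TimingPair, 5> kTimings = {{
    {"t_rcd / nRCD", 32, 16},
    {"t_ras / nRAS", 78, 39},
    {"t_rp / nRP", 32, 16},
    {"t_cl / nCL", 32, 16},
    {"t_bl / nBL", 8, 4},
}};

constexpr double kUpimulatorClockMhz = 2400.0;  // assumed uPIMulator DRAM clock
constexpr double kRamulatorClockMhz = 1200.0;   // tCK_ps = 833 -> ~1200 MHz

// Convert a latency in clock cycles to nanoseconds at the given frequency.
double cycles_to_ns(int cycles, double clock_mhz) {
  assert(clock_mhz > 0);
  return cycles * 1000.0 / clock_mhz;
}

// True if every uPIMulator default is exactly twice the Ramulator value
// and both describe the same latency in nanoseconds.
bool timings_match_in_ns(double tolerance_ns = 1e-9) {
  for (const auto& t : kTimings) {
    if (t.upimulator_cycles != 2 * t.ramulator_nck) return false;
    double a = cycles_to_ns(t.upimulator_cycles, kUpimulatorClockMhz);
    double b = cycles_to_ns(t.ramulator_nck, kRamulatorClockMhz);
    if (std::fabs(a - b) > tolerance_ns) return false;
  }
  return true;
}
```

For example, 32 cycles at 2400 MHz and 16 cycles at 1200 MHz both come to about 13.33 ns, and 78 cycles versus 39 cycles both come to 32.5 ns.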
Additionally, I have another question about how you model bursts (…

Below is a log of the memory accesses generated into the reorder buffer from a DMA command when running the VA benchmark on 1 DPU and 16 tasklets. Inside the memory scheduler, the DMA command is divided into 8-byte memory accesses:
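The splitting described above can be sketched as follows, assuming that each access is aligned to the access granularity. `MemoryAccess` and `split_dma` are hypothetical names. They are not uPIMulator's scheduler code. The granularity is a parameter, which makes it easy to compare the 8-byte split with a coarser one such as 64 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical representation of one memory access derived from a DMA command.
struct MemoryAccess {
  uint64_t address;
  uint64_t size;
};

// Split a DMA transfer of `size` bytes starting at `address` into accesses of
// at most `granularity` bytes, none of which crosses a granularity boundary.
std::vector<MemoryAccess> split_dma(uint64_t address, uint64_t size,
                                    uint64_t granularity) {
  assert(granularity > 0);
  std::vector<MemoryAccess> accesses;
  const uint64_t end = address + size;
  while (address < end) {
    // Next aligned boundary strictly after the current address.
    const uint64_t boundary = (address / granularity + 1) * granularity;
    const uint64_t chunk_end = boundary < end ? boundary : end;
    accesses.push_back({address, chunk_end - address});
    address = chunk_end;
  }
  return accesses;
}
```

With this sketch, a 2,048-byte transfer starting at address 0 becomes 256 accesses at 8-byte granularity, but only 32 at 64-byte granularity.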
Below is a timing extract of the MRAM DRAM commands scheduled when executing the VA benchmark on 1 DPU and 16 tasklets (the first number is the cycle in which the DRAM command is popped from …
If I'm not mistaken, the MRAM has a 64-bit-wide access port, as detailed in slide 14 of UPMEM's Hot Chips presentation. Therefore, 8 bytes (64 bits) could be transferred each cycle. However, DDR4 has a burst length of 8 words (…

Nevertheless, in the command trace generated by uPIMulator's DRAM model, fetching 2 consecutive words in a burst (each word being 8 bytes) takes 33 cycles (…

Am I missing anything here? Where's my mistake? Shouldn't the scheduler generate only 64-byte DRAM accesses, each of which would then have the read access latency of …

Again, thank you very much, and sorry for bothering you.

Best regards,
The DRAM timing parameters are doubled because the DRAM clock frequency is also doubled. Our default DDR4-2400 clock frequency is 2400 MHz, whereas the DRAM timing parameters used in our paper and in Ramulator are halved because their clock frequency is 1200 MHz. Despite this difference, the overall timing behavior (the latencies in nanoseconds) remains the same between the two approaches.

Regarding the burst length, we are not entirely certain how UPMEM has implemented MRAM. Our assumption is based on standard DDR memory, where each of the eight DRAM chips provides 8 bytes per burst, for a total of 8 × 8 = 64 bytes when the burst length is eight. Since MRAM consists of only a single DRAM bank, we assume its bandwidth is one-eighth of that, meaning it provides 8 bytes rather than 64 bytes. If a single MRAM could provide 64 bytes, the aggregate bandwidth would be eight times that of standard DDR4-2400. That said, we acknowledge that this is an assumption, and we welcome any insights or corrections. If you have well-founded reasoning or data, we encourage you to modify uPIMulator accordingly.

If you have any further questions, please feel free to ask.

Best regards,
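The burst-size arithmetic behind this assumption can be written out explicitly. The sketch assumes the standard DDR4 configuration of eight x8 chips on a 64-bit bus with a burst length of 8. It contrasts uPIMulator's 8-byte MRAM assumption with the alternative of a 64-bit MRAM port sustaining a full burst of 8. The constants are illustrative and are not taken from uPIMulator's source.

```cpp
#include <cassert>
#include <cstdint>

// Bytes transferred by one burst: bus width in bytes times burst length.
constexpr uint64_t bytes_per_burst(uint64_t bus_width_bits, uint64_t burst_length) {
  return (bus_width_bits / 8) * burst_length;
}

// Standard DDR4 rank: eight x8 chips sharing a 64-bit bus, burst length 8.
constexpr uint64_t kChipsPerRank = 8;
constexpr uint64_t kBurstLength = 8;
constexpr uint64_t kBytesPerChipBurst = bytes_per_burst(8, kBurstLength);     // 8 B
constexpr uint64_t kBytesPerRankBurst = kChipsPerRank * kBytesPerChipBurst;  // 64 B

// uPIMulator's assumption: a single MRAM bank supplies one-eighth of a rank burst.
constexpr uint64_t kMramBytesAssumed = kBytesPerRankBurst / kChipsPerRank;   // 8 B

// Alternative raised in the thread: a 64-bit MRAM port sustaining a full burst of 8.
constexpr uint64_t kMramBytesAlternative = bytes_per_burst(64, kBurstLength);  // 64 B

// Per-MRAM bandwidth ratio between the two assumptions.
constexpr uint64_t kBandwidthRatio = kMramBytesAlternative / kMramBytesAssumed;  // 8
```

Under the alternative, each MRAM would deliver as much data per burst as an entire standard rank, which is where the factor of eight comes from.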
Thank you very much for your detailed explanation! It's very clear and makes complete sense. I'll continue fiddling with uPIMulator and will reach out if I have any additional questions.

Best regards,
Hello,
First of all, thank you for open-sourcing the simulator; it's very useful!
I'm trying to experiment with the DRAM simulation part, and I'm having some trouble:
Thank you very much!