DRAM simulation · Issue #8 · VIA-Research/uPIMulator · GitHub

DRAM simulation #8

Open
palmenros opened this issue Jan 30, 2025 · 5 comments

Comments

@palmenros

Hello,

First of all, thank you for open sourcing the simulator, it's very useful!

I'm trying to play with the DRAM simulator part, and I'm having some trouble:

  1. I cannot find any parameter in the source code modelling the DRAM refresh interval. How can I modify it? Is DRAM refresh modelled in uPIMulator?
  2. In your paper, you mention that your simulator runs 2.5x slower when interfaced with Ramulator. Do you have the source code for interfacing uPIMulator with Ramulator (or other more accurate DRAM simulator)? If not, could you give me some guidance on where in the source code it would be best to interface it?

Thank you very much!

@bongjoonhyun
Collaborator

Hello,

Thank you for your interest in uPIMulator.

You are correct in observing that uPIMulator currently does not simulate DRAM refreshes.

To integrate Ramulator for DRAM simulation, you will need to modify two key functions within the MemoryController class: service_memory_command_q() and service_row_buffer().

service_memory_command_q(): This function is responsible for converting uPIMulator's MemoryCommand objects into the equivalent Ramulator memory command format. After the conversion, the Ramulator command should be issued to the Ramulator instance.

service_row_buffer(): This function handles the retrieval of completed memory operations from Ramulator. The returned Ramulator commands must be converted back into uPIMulator's MemoryCommand format.

Crucially, you will also need to integrate the Ramulator cycle function into your simulation loop to advance the Ramulator simulation time.
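
The three integration points described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ExternalDram` is a toy fixed-latency model and does not reproduce Ramulator's actual API, and `MemoryCommand` and `MemoryControllerSketch` are hypothetical stand-ins that only mimic the shape of uPIMulator's classes. The function names `service_memory_command_q()` and `service_row_buffer()` come from the reply above; everything else is invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for uPIMulator's MemoryCommand (the real class differs).
struct MemoryCommand {
  std::uint64_t address;
  bool is_write;
};

// Toy external DRAM model with a fixed latency. NOT Ramulator's API: it only
// mimics the "send request / tick / collect completions" pattern.
class ExternalDram {
 public:
  explicit ExternalDram(std::uint64_t latency) : latency_(latency) {}

  // Accept a request; it completes `latency_` cycles from now.
  void send(std::uint64_t address, bool is_write) {
    in_flight_.push_back({now_ + latency_, address, is_write});
  }

  // Advance the external simulator by one cycle.
  void cycle() { ++now_; }

  // Pop one finished request, if any; returns false when none is ready.
  bool pop_completed(std::uint64_t* address, bool* is_write) {
    if (in_flight_.empty() || in_flight_.front().done_at > now_) return false;
    *address = in_flight_.front().address;
    *is_write = in_flight_.front().is_write;
    in_flight_.pop_front();
    return true;
  }

 private:
  struct Request {
    std::uint64_t done_at;
    std::uint64_t address;
    bool is_write;
  };
  std::uint64_t latency_;
  std::uint64_t now_ = 0;
  std::deque<Request> in_flight_;
};

// Hypothetical controller showing where the two functions would hook in.
class MemoryControllerSketch {
 public:
  explicit MemoryControllerSketch(ExternalDram* dram) : dram_(dram) {
    assert(dram_ != nullptr);
  }

  void push(const MemoryCommand& cmd) { memory_command_q_.push_back(cmd); }

  // Convert queued commands into external requests and issue them.
  void service_memory_command_q() {
    while (!memory_command_q_.empty()) {
      const MemoryCommand& cmd = memory_command_q_.front();
      dram_->send(cmd.address, cmd.is_write);
      memory_command_q_.pop_front();
    }
  }

  // Collect completed requests and convert them back to MemoryCommand.
  void service_row_buffer() {
    std::uint64_t address = 0;
    bool is_write = false;
    while (dram_->pop_completed(&address, &is_write)) {
      completed_.push_back(MemoryCommand{address, is_write});
    }
  }

  // One simulation step: issue, tick the external model, then drain.
  void cycle() {
    service_memory_command_q();
    dram_->cycle();
    service_row_buffer();
  }

  const std::vector<MemoryCommand>& completed() const { return completed_; }

 private:
  ExternalDram* dram_;
  std::deque<MemoryCommand> memory_command_q_;
  std::vector<MemoryCommand> completed_;
};
```

In a real integration the toy model would be replaced by the actual Ramulator frontend, whose request and callback types would need to be looked up in its source.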

Please do not hesitate to contact us if you encounter any issues during this integration process. We are happy to provide further assistance.

Thank you,
Bongjoon Hyun

@palmenros
Author

Hello,

Thank you very much for your response! I've started implementing my integration with Ramulator, and so far it's going smoothly.

Nevertheless, while doing the integration I stumbled upon the default DDR4 timing parameters in python_cpp/uPIMulator_backend/src/main.cc:

  argument_parser->add_option("t_rcd", util::ArgumentParser::INT,
                              "32");  // based on DDR4-2400
  argument_parser->add_option("t_ras", util::ArgumentParser::INT,
                              "78");  // based on DDR4-2400
  argument_parser->add_option("t_rp", util::ArgumentParser::INT,
                              "32");  // based on DDR4-2400
  argument_parser->add_option("t_cl", util::ArgumentParser::INT,
                              "32");  // based on DDR4-2400
  argument_parser->add_option("t_bl", util::ArgumentParser::INT,
                              "8");  // based on DDR4-2400

These are consistently double the timing parameters that are described in the paper, and double the DDR4 timing parameters of Ramulator 2:

  inline static const std::map<std::string, std::vector<int>> timing_presets = {
    //   name       rate   nBL  nCL  nRCD  nRP   nRAS  nRC   nWR  nRTP nCWL nCCDS nCCDL nRRDS nRRDL nWTRS nWTRL nFAW  nRFC nREFI nCS,  tCK_ps
    // ...
    {"DDR4_2400P",  {2400,   4,  15,  15,   15,   39,   54,   18,   9,   12,   4,    6,   -1,   -1,    3,    9,   -1,  -1,  -1,   2,    833} },
    {"DDR4_2400R",  {2400,   4,  16,  16,   16,   39,   55,   18,   9,   12,   4,    6,   -1,   -1,    3,    9,   -1,  -1,  -1,   2,    833} },
    {"DDR4_2400U",  {2400,   4,  17,  17,   17,   39,   56,   18,   9,   12,   4,    6,   -1,   -1,    3,    9,   -1,  -1,  -1,   2,    833} },
    {"DDR4_2400T",  {2400,   4,  18,  18,   18,   39,   57,   18,   9,   12,   4,    6,   -1,   -1,    3,    9,   -1,  -1,  -1,   2,    833} },
    // ...
  };

I also traced the DRAM memory commands issued by uPIMulator's current memory model, along with their issue times, and indeed timings such as t_cl are double what I expected.

Is there any reason why this is the default instead of the timing parameters described in the paper? Am I missing anything here?

Thank you very much in advance.

Best regards,
Pedro

@palmenros
Author

Additionally, I have another question about how you model burst (n_bl):

Below there's a log of the memory accesses that are generated into the reorder buffer from a DMA command when running the VA benchmark on 1 DPU and 16 tasklets. Inside the memory scheduler, the DMA command is divided into 8-byte memory accesses:

[2025-02-19 17:16:20.995] [info] =DMACommand(READ, mram_start=0x183808, size=1024, mram_end=0x183c08); MRAM_BASE=0x80000
[2025-02-19 17:16:20.995] [info]  |-> MemAccess(addr=0x183808, size=8)
[2025-02-19 17:16:20.995] [info]  |-> MemAccess(addr=0x183810, size=8)
[2025-02-19 17:16:20.995] [info]  |-> ...
[2025-02-19 17:16:20.995] [info]  |-> MemAccess(addr=0x183c00, size=8)
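
As a quick sanity check of the log above, the splitting can be reproduced with a few lines of arithmetic. This is a minimal sketch, not uPIMulator's scheduler code: the helper names `num_accesses` and `access_address` are hypothetical, and the 8-byte granularity is simply read off the `size=8` fields in the log.

```cpp
#include <cstdint>

// Assumed access granularity, taken from the log above (size=8).
constexpr std::uint64_t kAccessBytes = 8;

// Number of fixed-size accesses needed to cover a DMA transfer (rounds up).
constexpr std::uint64_t num_accesses(std::uint64_t dma_size) {
  return (dma_size + kAccessBytes - 1) / kAccessBytes;
}

// Address of the i-th access of a DMA transfer, counting from 0.
constexpr std::uint64_t access_address(std::uint64_t mram_start,
                                       std::uint64_t i) {
  return mram_start + i * kAccessBytes;
}
```

For the 1024-byte DMA command in the log this gives 128 accesses, the last of which starts at 0x183c00, matching the final `MemAccess` line.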

Below there's a timing extract of the MRAM DRAM commands that are scheduled when executing the VA benchmark on 1 DPU and 16 tasklets (the first number is the cycle in which the DRAM command is popped from input_q_ into its respective queue, and for the DRAM addressing vector, only row and column are relevant):

cycle, cmd, Ch, Ra, Bg, Ba, Ro, Co
229407, ACT, 0, 0, 0, 0, 1038, 0
229440, RD, 0, 0, 0, 0, 1038, 8
229473, RD, 0, 0, 0, 0, 1038, 16
229506, RD, 0, 0, 0, 0, 1038, 24
229539, RD, 0, 0, 0, 0, 1038, 32
229572, RD, 0, 0, 0, 0, 1038, 40
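
The spacing between consecutive commands in this trace can be checked mechanically. The snippet below is only an illustrative sketch: the cycle numbers are copied from the trace above, and `uniform_gap` is a hypothetical helper, not part of uPIMulator.

```cpp
#include <cstddef>
#include <cstdint>

// Issue cycles copied from the trace above (one ACT followed by five RDs).
constexpr std::uint64_t kIssueCycles[] = {229407, 229440, 229473,
                                          229506, 229539, 229572};
constexpr std::size_t kNumCommands =
    sizeof(kIssueCycles) / sizeof(kIssueCycles[0]);

// Returns the common gap between consecutive commands, or 0 if the gaps
// differ or fewer than two commands are given.
constexpr std::uint64_t uniform_gap(const std::uint64_t* cycles,
                                    std::size_t n) {
  if (n < 2) return 0;
  const std::uint64_t gap = cycles[1] - cycles[0];
  for (std::size_t i = 2; i < n; ++i) {
    if (cycles[i] - cycles[i - 1] != gap) return 0;
  }
  return gap;
}
```

Applied to the trace, every pair of consecutive commands is 33 cycles apart, including the ACT-to-RD step.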

If I'm not wrong, the MRAM has a 64-bit wide access port, as detailed in UPMEM's HotChips presentation, slide 14. Therefore, each cycle, 8 bytes (64-bit) could be transferred. However, DDR4 has a burst length of 8 words (BL8, in Ramulator 2 it's m_internal_prefetch_size), which means that actually 8*8=64 bytes are transferred in t_bl cycles (8/2=4 cycles).

Nevertheless, in the command trace generated by uPIMulator's DRAM model, fetching two consecutive 8-byte words takes 33 cycles (roughly t_cl, which defaults to 32) instead of just half a cycle.

Am I missing anything here? Where's my mistake? Shouldn't the scheduler only generate DRAM accesses of size 64 bytes each, which will then have the read access latency of t_cl + t_bl and will actually transfer 8 words of 8 bytes (therefore 64 bytes)?

Again, thank you very much, and sorry for bothering you.

Best regards,
Pedro

@bongjoonhyun
Collaborator

The DRAM timing parameters are doubled because the DRAM clock frequency is also doubled. For example, our default DDR4-2400 clock frequency is 2400 MHz. In contrast, the DRAM timing parameters used in our paper and in Ramulator are halved because their clock frequency is 1200 MHz. Despite this difference, the timing in absolute terms is the same under both conventions.
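
One way to check this equivalence is to convert both sets of cycle counts to absolute time. The sketch below uses a hypothetical helper, `cycles_to_ps`, with the values quoted earlier in the thread: the uPIMulator defaults (t_rcd = 32, t_ras = 78) at 2400 MHz, and the Ramulator 2 `DDR4_2400R` preset (nRCD = 16, nRAS = 39) at 1200 MHz. Integer division truncates, so results are rounded down to whole picoseconds.

```cpp
#include <cstdint>

// Convert a latency in clock cycles to picoseconds for a clock given in MHz.
// A 1 MHz clock has a period of 1,000,000 ps, so the latency is
// cycles * 1,000,000 / clock_mhz (truncated to whole picoseconds).
constexpr std::uint64_t cycles_to_ps(std::uint64_t cycles,
                                     std::uint64_t clock_mhz) {
  return cycles * 1000000ULL / clock_mhz;
}
```

With these inputs, `cycles_to_ps(32, 2400)` and `cycles_to_ps(16, 1200)` both come to about 13.3 ns, and `cycles_to_ps(78, 2400)` and `cycles_to_ps(39, 1200)` both come to 32.5 ns.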

Regarding the burst length, we are not entirely certain how UPMEM has implemented MRAM. However, our assumption is based on standard DDR memory, where each of the eight DRAM chips provides 8 bytes, resulting in a total of 8 × 8 = 64 bytes when the burst length is eight. Since MRAM consists of only a single DRAM bank, we assume its bandwidth is one-eighth of that, meaning it provides 8 bytes rather than 64 bytes. If a single MRAM were capable of providing 64 bytes, the aggregate bandwidth would be eight times greater than that of standard DDR-2400.
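
The arithmetic behind this assumption can be written out as a small sketch. The constants below describe a conventional 64-bit DDR4 rank built from eight x8 chips, which is an assumption made for illustration; as noted above, how UPMEM actually implements MRAM is not known, and the names are hypothetical.

```cpp
#include <cstdint>

// Assumed organization of a conventional DDR4 rank (not UPMEM-specific).
constexpr std::uint64_t kChipsPerRank = 8;        // eight x8 chips, 64-bit bus
constexpr std::uint64_t kBitsPerChipPerBeat = 8;  // each x8 chip drives 8 bits
constexpr std::uint64_t kBurstLength = 8;         // BL8: eight beats per burst

// Bytes one chip contributes over a full burst.
constexpr std::uint64_t bytes_per_burst_per_chip() {
  return kBitsPerChipPerBeat * kBurstLength / 8;
}

// Bytes the whole rank delivers over a full burst.
constexpr std::uint64_t bytes_per_burst_per_rank() {
  return bytes_per_burst_per_chip() * kChipsPerRank;
}
```

Under these assumptions a single chip supplies 8 bytes per burst and the full rank supplies 64 bytes, which is the one-eighth ratio the reply relies on.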

That said, we acknowledge that this is an assumption, and we welcome any insights or corrections. If you have well-founded reasoning or data, we encourage you to modify uPIMulator accordingly.

If you have any further questions, please feel free to ask.

Best regards,
Bongjoon

@palmenros
Author

Thank you very much for your detailed explanation!

It's very clear and makes complete sense.

I'll continue fiddling with uPIMulator and will reach out if I have any additional questions.

Best regards,
Pedro
