8000 Decouple AXI `base_addr` and calyx memories' `curr_addr` in hardcoded AXI implementation by nathanielnrn · Pull Request #1854 · calyxir/calyx · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Decouple AXI base_addr and calyx memories' curr_addr in hardcoded AXI implementation #1854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 97 commits into from
Jan 21, 2024

Conversation

nathanielnrn
Copy link
Contributor
@nathanielnrn nathanielnrn commented Jan 18, 2024

This PR updates the testbench to have a non-zero base address assumed for AXI accesses. It also separates an incorrect dependency that existed between base addresses from AXI's perspective and the current addresses from Calyx memories' perspective.

This addresses the immediate issue in #1853, so will close that for now, perhaps it/a similar issue will be opened again once we begin work on dynamic addressing.

cc @sampsyo

nathanielnrn and others added 30 commits December 22, 2023 11:56
TBD if this actually implements AXI correctly.

There are currently some hacks in place (marked with todos)
to get this to compile, namely some splicing
that doesn't consider what we actually want to splice
(it just takes [31:0]) as opposed to dynamically considering actual
bits we want.

A few other things that should be cleaned up eventually

Need to create a cocotb testbench to test correctness
Maybe this shouldn't be here, but for now (having deleted my
working directory earlier) putting it here
Simply run make from the cocotb directory and axi-read-tests
will be executed
We tie ARID low in our manager
Prefixes should not contain trailing "_"
Got rid of "assert_val" and "block_transfer" groups
and instead perform these things inside "do_ar_transfer", this is
required because we cant assert valid before we drive the data
correctly, so needs to happen in parallel.

Currently: This seems to write 16 times to same place, this is due to
hardcoding of 16 in ar transfer, not sure why address doesn't
increment this is tbd (and next TODO)
This is part of read channel control sequence
Also reduces data bus width to 32
Base automatically changed from axi-wrapper-opts to main January 18, 2024 19:06
@nathanielnrn nathanielnrn changed the title Address fixes axi Decouple AXI base_addr and calyx memories' curr_addr in hardcoded AXI implementation Jan 18, 2024
@nathanielnrn nathanielnrn merged commit 4a206e9 into main Jan 21, 2024
@nathanielnrn nathanielnrn deleted the address-fixes-axi branch January 21, 2024 03:25
Copy link
Contributor
@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful!!! I have a few suggestions: one about terminology, and the other about an eventual way to deal with base addresses and kernel.xml.

Comment on lines +178 to +179
//need to increment this
ref base_addr = std_reg(64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just providing a "dumb" reading of this line: why would one ever need to increment the base address? Going by its name, a base address is always fixed: it's baked into the "communication protocol" for an accelerator, and we never need to change it. We're adding other stuff to the base address to get a given access address, but the base address never changes.

I think I understand, though, the base_addr register here actually just starts at the "base address" for the AXI interface. In fact, it holds the current "AXI-side" address we need to access.

That is, we imagine that there are two "sides" to every location in the abstract memory: the AXI-side address, which is what we will send out over the AXI bus, and the memory-side address, which we send into the std_mem_d1 or whatever. These are related by a linear equation, like axi_addr = mem_addr * WIDTH / 8 + BASE_ADDR or something. But importantly, when mem_addr == 0, axi_addr = BASE_ADDR. Does this terminology make sense?

(If so, maybe it would be sensible to change the names of these registers to reflect this notion.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This is a question I asked myself recently and I think a name change is warranted. The terminology makes sense and this is correct. The incrementing ties into part of what you spoke on here. Incrementing allows us to avoid doing expensive conversions by having the AXI-facing address "follow" along as transfers (making up a larger transaction) occur. This allows us to more easily transfer all of our data into our internal memories during the load part of our load -> compute -> store scheme.

invoke base_addr_A0(in = 64'b0)();
invoke base_addr_B0(in = 64'b0)();
invoke base_addr_Sum0(in = 64'b0)();
//TODO: get this from kernel.xml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a high-level notion: in the limit, let's actually generate kernel.xml!

That is, what currently happens is: we run the Calyx compiler once to produce kernel.xml, and then another time to generate the corresponding Verilog code for the AXI interface.

What we could instead do is: we run the Calyx compiler once to produce the YXI file. Then, our new tooling runs (consuming only the YXI file) and produces two things: (1) the Calyx code for the AXI interface, and (2) kernel.xml.

(So basically, I'm suggesting that we (YXI) be in charge, rather than the existing kernel.xml generator, which we can eventually remove.)

@nathanielnrn nathanielnrn mentioned this pull request Jan 23, 2024
4 tasks
rachitnigam added a commit that referenced this pull request Feb 16, 2024
… AXI implementation (#1854)

* init commit of hardcoded axi wrapper for a 'main' kernel

* add axi-reads-calix

* hook up inputs to channels in the wrapper. tbd if this works

* Working calyx verison of AR and R

TBD if this actually implements AXI correctly.

There are currently some hacks in place (marked with todos)
to get this to compile, namely some splicing
that doesn't consider what we actually want to splice
(it just takes [31:0]) as opposed to dynamically considering actual
bits we want.

A few other things that should be cleaned up eventually

Need to create a cocotb testbench to test correctness

* Track output of compiled calyx read channel

Maybe this shouldn't be here, but for now (having deleted my
working directory earlier) putting it here

* update gitignore to get rid of sim_build and other cocotb artifacts

* Working make files  for running cocotb tests

Simply run make from the cocotb directory and axi-read-tests
will be executed

* Add xID signals for cocotb compatability

We tie ARID low in our manager

* Fix prefix issue on cocotb axi test bench

Prefixes should not contain trailing "_"

* commit to repro 'make WAVES=1' cocotb error from axi-reads-calyx.futil

* axi-reads patch

* sync debug

* Add txn_len initialization to 16 in calyx program

* AXI Read fixed to get to read channel start

Got rid of "assert_val" and "block_transfer" groups
and instead perform these things inside "do_ar_transfer", this is
required because we cant assert valid before we drive the data
correctly, so needs to happen in parallel.

Currently: This seems to write 16 times to same place, this is due to
hardcoding of 16 in ar transfer, not sure why address doesn't
increment this is tbd (and next TODO)

* Add integer byte conversion for tests on Calyx AXI testharness

* WIP get reads to work. Add incr_curr_addr group

This is part of read channel control sequence

* remove .fst from tracking

* Add more data to testbench to make waveform viewing easier

* Reads seem to be terminating correctly at RLAST

* AR transfers seem to work, valid is high for 1 cycle

* Unreduced axi-reads-calyx.futil

Also reduces data bus width to 32

* Cocotb testbench now passes

* Formatted and passing axi-read-tests

* Reduce and comment axi-reads-calyx.futil

* remove axi-reads.v from being tracked

* add a todo

* add required ARPROT signal. This is hardcoded to be priviliged

* rename directories to yxi/axi-calyx

* initial commit of axi-writes-calyx, a copy of axi-reads-calyx

* WIP axi writes

* rename directories

* WIP imlpementing writes

* add testing for writes, note makefile is overwritten so now tests writes, not reads

* passing axi writes and testing

* init commit of hardcoded axi wrapper for a 'main' kernel

* add axi-reads-calix

* hook up inputs to channels in the wrapper. tbd if this works

* Working calyx verison of AR and R

TBD if this actually implements AXI correctly.

There are currently some hacks in place (marked with todos)
to get this to compile, namely some splicing
that doesn't consider what we actually want to splice
(it just takes [31:0]) as opposed to dynamically considering actual
bits we want.

A few other things that should be cleaned up eventually

Need to create a cocotb testbench to test correctness

* Track output of compiled calyx read channel

Maybe this shouldn't be here, but for now (having deleted my
working directory earlier) putting it here

* Working make files  for running cocotb tests

Simply run make from the cocotb directory and axi-read-tests
will be executed

* Add xID signals for cocotb compatability

We tie ARID low in our manager

* Fix prefix issue on cocotb axi test bench

Prefixes should not contain trailing "_"

* commit to repro 'make WAVES=1' cocotb error from axi-reads-calyx.futil

* axi-reads patch

* sync debug

* Add txn_len initialization to 16 in calyx program

* AXI Read fixed to get to read channel start

Got rid of "assert_val" and "block_transfer" groups
and instead perform these things inside "do_ar_transfer", this is
required because we cant assert valid before we drive the data
correctly, so needs to happen in parallel.

Currently: This seems to write 16 times to same place, this is due to
hardcoding of 16 in ar transfer, not sure why address doesn't
increment this is tbd (and next TODO)

* Add integer byte conversion for tests on Calyx AXI testharness

* WIP get reads to work. Add incr_curr_addr group

This is part of read channel control sequence

* remove .fst from tracking

* Add more data to testbench to make waveform viewing easier

* Reads seem to be terminating correctly at RLAST

* AR transfers seem to work, valid is high for 1 cycle

* Unreduced axi-reads-calyx.futil

Also reduces data bus width to 32

* Cocotb testbench now passes

* Formatted and passing axi-read-tests

* Reduce and comment axi-reads-calyx.futil

* remove axi-reads.v from being tracked

* add a todo

* add required ARPROT signal. This is hardcoded to be priviliged

* rename directories to yxi/axi-calyx

* initial commit of axi-writes-calyx, a copy of axi-reads-calyx

* WIP axi writes

* rename directories

* WIP imlpementing writes

* add testing for writes, note makefile is overwritten so now tests writes, not reads

* passing axi writes and testing

* Work on full AXI wrapper, reads and compute works

* cleaned up combined futil and tests

* delete axi-reads* which is subsumed by axi-combined

* add axi-combined-tests.py

* remove axi-writes as it is subsumed by axi-combined

* formatting

* Update yxi/axi-calyx/axi-combined-calyx.futil

Co-authored-by: Adrian Sampson <asampson@cs.cornell.edu>

* formatting

* add sim.sh which goes from calyx to running tests

* simplify valid.in signals

* WIP: replace groups with reg invokes

* add python file that enables waveform (vcd/fst) generation

* formatting

* simplify valid.in signals

* WIP: replace groups with reg invokes

* Replaces register-init groups with invokes

* Formatting of invokes

* Replace reg groups with invokes in main

* Modify tests to account for base address != 0

* Separate base-address calyx-mem-address dependency

This solution, made for our load->compute->store scheme, simply
increments the base_addr and curr_addr differently.
This should make it easy to have multiple transactions, which this
hardcoded does not support

* move incrs into par block

* fix  was_valid signal in write channel

---------

Co-authored-by: Rachit Nigam <rachit.nigam12@gmail.com>
Co-authored-by: Adrian Sampson <asampson@cs.cornell.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0