[Final Project] Performance Competition (substituting final exam, due: 8th of July) · Issue #168 · kaist-cp/cs420 · GitHub

[Final Project] Performance Competition (substituting final exam, due: 8th of July) #168


Closed
jeehoonkang opened this issue Jun 1, 2020 · 22 comments

Comments

@jeehoonkang
Member
jeehoonkang commented Jun 1, 2020

It turned out that we cannot physically gather for the final exam, so I decided not to hold one. Instead, as a substitute, we'll have a performance competition as the final project.

For the competition, you’ll submit your entire compiler. Predefined benchmark programs will be compiled and then executed on a HiFive Unleashed (the first Linux-bootable RISC-V development board), sponsored by SemiFive. If any of the results are wrong, you’ll be disqualified. The geometric average of the number of CPU cycles will be compared among students’ compilers and clang -O1.

Please do whatever you can to reduce the number of cycles, e.g., by implementing more optimizations or by improving your asmgen with a better register allocation algorithm.

If your compiler is better than clang -O1, you’ll get A#. If your compiler is better than those of most students, you’ll get A+. Depending on the performance of your compiler, you'll get some bonus.
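The geometric average mentioned above weights every benchmark by its relative change rather than by its absolute cycle count, so a long-running benchmark cannot dominate the score. A minimal sketch of such a comparison follows; the function names and the cycle counts are hypothetical and are not taken from the actual grader.

```python
import math


def geometric_mean(cycles):
    """Geometric mean of positive cycle counts, computed via logarithms."""
    if not cycles:
        raise ValueError("need at least one cycle count")
    if any(c <= 0 for c in cycles):
        raise ValueError("cycle counts must be positive")
    return math.exp(sum(math.log(c) for c in cycles) / len(cycles))


def beats_baseline(mine, baseline):
    """True if `mine` has a lower geometric-mean cycle count than `baseline`."""
    return geometric_mean(mine) < geometric_mean(baseline)


# Hypothetical cycle counts for two benchmarks (illustration only).
mine = [100, 10_000]      # 2x faster on the short one, 1.25x slower on the long one
baseline = [200, 8_000]

print(geometric_mean(mine), geometric_mean(baseline))
print(beats_baseline(mine, baseline))
```

With these made-up numbers the arithmetic mean favors the baseline (4100 vs. 5050), while the geometric mean favors the first compiler, because halving the short benchmark outweighs a 25% slowdown on the long one.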

@cyron1259

Would there be some kind of a leaderboard so that we can compare against others' performance?

@jeehoonkang
Member Author

@cyron1259 Good idea! We will prepare a leaderboard soon.

@hestati63

Since the whole compiler will be run in the final competition, I want to fuzz each optimization pass.
But the fuzzer does not support this. Can you add options for fuzzing the optimization passes?

Also, can you provide the command line argument for the compiler that will be used in the competition?

@jeehoonkang
Member Author
jeehoonkang commented Jun 4, 2020

@hestati63

  • on fuzzing optimizations, let's discuss here: [HW 3~6] Testing Optimization Passes #178
  • on the competition's specification, I will soon prepare a detailed description of the submission procedure, competition rules, leaderboard, etc. For now, let's say (1) you can implement custom optimizations on IR; and (2) you can optimize the naive asmgen introduced in the lecture videos.

@jeehoonkang self-assigned this Jun 10, 2020
@jeehoonkang
Member Author
jeehoonkang commented Jun 15, 2020

Clarification: you need to observe the LP64D calling convention: #209 (comment)
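As a rough illustration of what observing LP64D means for scalar arguments, the sketch below assigns argument locations under a deliberately simplified model: integer and pointer arguments use a0 to a7, floating-point arguments use fa0 to fa7, a floating-point argument falls back to the integer convention once the fa registers are used up, and everything else goes to 8-byte stack slots. The function name is hypothetical, and struct arguments and variadic functions (which follow more involved rules) are not modeled.

```python
INT_ARG_REGS = [f"a{i}" for i in range(8)]    # a0..a7
FP_ARG_REGS = [f"fa{i}" for i in range(8)]    # fa0..fa7


def assign_scalar_args(kinds):
    """Map a list of 'int' / 'fp' scalar arguments to registers or stack slots.

    Simplified model (assumption): scalars only, non-variadic callee,
    each stack slot is 8 bytes, offsets are relative to sp at the call.
    """
    next_int = 0
    next_fp = 0
    stack_offset = 0
    locations = []
    for kind in kinds:
        if kind not in ("int", "fp"):
            raise ValueError(f"unsupported argument kind: {kind}")
        if kind == "fp" and next_fp < len(FP_ARG_REGS):
            locations.append(FP_ARG_REGS[next_fp])
            next_fp += 1
        elif next_int < len(INT_ARG_REGS):
            # Integers, and floats once fa0..fa7 are exhausted.
            locations.append(INT_ARG_REGS[next_int])
            next_int += 1
        else:
            locations.append(f"{stack_offset}(sp)")
            stack_offset += 8
    return locations


print(assign_scalar_args(["int", "fp", "int", "fp"]))
```

The benchmark names later in this thread (exotic_arguments_struct_*) suggest that struct passing is exercised as well, so the psABI rules for aggregates need separate handling.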

@jeehoonkang
Member Author

IMPORTANT UPDATE on FINAL PROJECT

  • Benchmark code is uploaded: kaist-cp/kecc-public@114f38c
  • In the bench directory, execute make run. Then it will build your compiler, build benchmark codes, run them, and measure the elapsed CPU cycles. The average is your score (lower is better).
  • For the time being, it's running on QEMU and the measurement is not accurate. I will soon provide a gg.kaist.ac.kr submission link so that you can run the benchmark codes on the SiFive HiFive Unleashed RISC-V machine.
  • Benchmark codes will be added in the near future.

@hestati63
hestati63 commented Jul 1, 2020

Can you announce a specific deadline by which you will finalize the benchmark code?

@cmpark0126
Collaborator
cmpark0126 commented Jul 2, 2020

IMPORTANT: you should use the la pseudo-instruction instead of a HI20/LO12 pair when obtaining the address of a global variable.
For the final project, we create a shared object from your assembly code to measure the compiler's performance.
However, the HI20 and LO12 relocations cannot be used when building a shared object.
With the la pseudo-instruction, the shared object can be generated normally.

So, please use the la pseudo-instruction instead of a HI20/LO12 pair, as below:

# before
lui     a5,%hi(nonce)
lw      a5,%lo(nonce)(a5)
# after
la      a5,nonce
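To see why the HI20/LO12 pair is a problem here, it helps to look at what %hi and %lo encode: they split an absolute address into a 20-bit upper part for lui and a sign-extended 12-bit lower part, so the final address is baked into the instruction immediates. A shared object can be loaded at any base address, which is why an absolute split like this does not fit, whereas la is expanded by the assembler into a PC-relative sequence. The sketch below models only the 32-bit arithmetic of the split; the helper names are hypothetical.

```python
def split_hi_lo(addr):
    """Split a 32-bit absolute address into (%hi, %lo) immediates.

    %lo is a signed 12-bit value, so %hi is rounded up by 0x800 to compensate.
    """
    hi = ((addr + 0x800) >> 12) & 0xFFFFF
    lo = addr & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000          # sign-extend the low 12 bits
    return hi, lo


def rebuild(hi, lo):
    """What `lui rd, hi` followed by an offset of `lo` computes (32-bit model)."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


addr = 0x12345FFC             # hypothetical link-time address of a global
hi, lo = split_hi_lo(addr)
print(hex(hi), lo, hex(rebuild(hi, lo)))
```

Note that the "before" sequence in the comment above also loads the value (via lw), while la alone yields only the address; the follow-up replies in this thread cover the extra load.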

@jeehoonkang
Member Author

IMPORTANT:

  • I just uploaded the final project grader: kaist-cp/kecc-public@542535f Please do whatever you want to improve cd bench; make run's "[AVERAGE]" score (lower is better). We recommend reading driver.cpp.

    @hestati63 Sorry for uploading the grader late. It's now finalized.

  • You'll upload your entire src directory. Please run ./scripts/make-submissions.sh; the resulting final.zip is the file you'll upload to gg (TBA).

@Medowhill

Hi. Could you let us know the scores of some reference compilers (for example, gcc -O0 and gcc -O1)? Currently, it is hard to know whether my implementation performs well or not by only seeing the score. Also, if one tries to challenge gcc / clang -O1, those scores can be good targets.

@jeehoonkang
Member Author
jeehoonkang commented Jul 3, 2020

@Medowhill make run-gcc will evaluate GCC with the optimization flag -O for the same benchmark. You can easily change Makefile to evaluate gcc -O0 and gcc -O1 as well.

@Medowhill

Thank you! I didn't notice that.

@hestati63

When will the gg grader be ready?
Since QEMU uses binary translation, the cycle count appears to depend only on the number of instructions.

@jeehoonkang
Member Author

@hestati63 I'm trying to provide the grader by tomorrow. Sorry for the delay.

@jeehoonkang
Member Author

FYI, gcc -O's result is as follows:

[exotic_arguments_struct_small] 52
[exotic_arguments_struct_large] 77
[exotic_arguments_struct_small_ugly] 34
[exotic_arguments_struct_large_ugly] 138
[exotic_arguments_float] 18
[exotic_arguments_double] 19
[fibonacci_recursive] 52089252
[fibonacci_loop] 1640
[two_dimension_array] 72229
[matrix_mul] 373849
[matrix_add] 53248
[graph_dijkstra] 78160627
[graph_floyd_warshall] 151599746
[fibonacci_recursive] 52089692
[fibonacci_loop] 1787
[two_dimension_array] 74329
[matrix_mul] 372201
[matrix_add] 57362
[graph_dijkstra] 79328213
[graph_floyd_warshall] 151565586
[fibonacci_recursive] 52048084
[fibonacci_loop] 1754
[two_dimension_array] 72840
[matrix_mul] 377440
[matrix_add] 55333
[graph_dijkstra] 78468837
[graph_floyd_warshall] 151555926
[fibonacci_recursive] 52089934
[fibonacci_loop] 1759
[two_dimension_array] 72798
[matrix_mul] 372444
[matrix_add] 52443
[graph_dijkstra] 75623586
[graph_floyd_warshall] 151648428
[fibonacci_recursive] 52082904
[fibonacci_loop] 1755
[two_dimension_array] 72791
[matrix_mul] 373361
[matrix_add] 54438
[graph_dijkstra] 76326790
[graph_floyd_warshall] 151566425
[fibonacci_recursive] 52048784
[fibonacci_loop] 1782
[two_dimension_array] 72896
[matrix_mul] 379304
[matrix_add] 52175
[graph_dijkstra] 76327618
[graph_floyd_warshall] 151525424
[fibonacci_recursive] 52046427
[fibonacci_loop] 1758
[two_dimension_array] 72529
[matrix_mul] 370371
[matrix_add] 53737
[graph_dijkstra] 77295262
[graph_floyd_warshall] 151636428
[fibonacci_recursive] 52042896
[fibonacci_loop] 1775
[two_dimension_array] 72726
[matrix_mul] 376777
[matrix_add] 55529
[graph_dijkstra] 75582433
[graph_floyd_warshall] 151547499
[fibonacci_recursive] 52043659
[fibonacci_loop] 1896
[two_dimension_array] 72959
[matrix_mul] 370099
[matrix_add] 53497
[graph_dijkstra] 75628374
[graph_floyd_warshall] 151557803
[fibonacci_recursive] 52043419
[fibonacci_loop] 1780
[two_dimension_array] 72684
[matrix_mul] 373757
[matrix_add] 57870
[graph_dijkstra] 79321067
[graph_floyd_warshall] 151603631
[AVERAGE] 1.06947e+06
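Because the log above repeats each benchmark over several runs, it can be handy to group the lines by benchmark name before comparing against your own compiler. The following is a minimal sketch that assumes the "[name] value" line format shown above; the function name is hypothetical, and it does not try to reproduce how the grader computes its [AVERAGE] line.

```python
import re
from collections import defaultdict

# Matches lines such as "[matrix_mul] 373849".
LINE_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\s+(?P<value>\S+)$")


def per_benchmark_mean(log_text):
    """Return {benchmark name: arithmetic mean of its cycle counts}."""
    samples = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None or match.group("name") == "AVERAGE":
            continue  # skip blank lines, unrelated output, and the summary line
        samples[match.group("name")].append(float(match.group("value")))
    return {name: sum(vals) / len(vals) for name, vals in samples.items()}


# A few lines copied from the log above.
sample = """[fibonacci_loop] 1640
[matrix_add] 53248
[fibonacci_loop] 1787
[matrix_add] 57362
[AVERAGE] 1.06947e+06"""

for name, mean in per_benchmark_mean(sample).items():
    print(f"{name}: {mean:.1f}")
```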

@lomotos10
Member
lomotos10 commented Jul 7, 2020

@cmpark0126 I am currently having trouble understanding the la instruction.
Does la return the address of the label, or the data inside that address?

@cmpark0126
Collaborator

@cmpark0126 I am currently having trouble understanding the la instruction.
Does la return the address of the label, or the data inside that address?

The address of the label

@jesper-amilon

Should we use the la-instruction only for the Nonce-object or for all global variables?

@cmpark0126
Collaborator

@christofides You need to use the la instruction for all global variables.

@jesper-amilon
jesper-amilon commented Jul 7, 2020

@cmpark0126 I am currently having trouble understanding the la instruction.
Does la return the address of the label, or the data inside that address?

The address of the label

So if it loads the address, do we also need to add lw to actually load the value of the variable? I.e.:

la     a5, nonce
lw     a5,  a5

Edit: Another question: can la also be used to get the address of floating-point variables? (I assume this is the case but want to make sure.)

@cmpark0126
Collaborator

@christofides

  1. Yes, you need to add a load instruction to get the value of the global variable, like below:
    la     a5, nonce
    lw     a5,  0(a5)
    
  2. Yes, you can use the la instruction for a global variable whose type is floating-point.
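
The two answers above can be combined into one small asmgen helper: materialize the address with la, then pick the load instruction from the variable's type. This is only a sketch; the function name, the type tags, and the register choices are hypothetical and do not come from the course skeleton.

```python
# Load mnemonic per (hypothetical) type tag.
LOAD_OPS = {
    "i32": "lw",
    "i64": "ld",
    "f32": "flw",
    "f64": "fld",
}


def emit_global_load(symbol, ty, addr_reg, dest_reg):
    """Return assembly lines that load the value of global `symbol`.

    `addr_reg` must be an integer register; `dest_reg` is an integer
    register for i32/i64 and a floating-point register for f32/f64.
    """
    if ty not in LOAD_OPS:
        raise ValueError(f"unsupported type: {ty}")
    return [
        f"la {addr_reg}, {symbol}",                 # address of the label
        f"{LOAD_OPS[ty]} {dest_reg}, 0({addr_reg})",  # value stored at that address
    ]


print("\n".join(emit_global_load("nonce", "i32", "a5", "a5")))
print("\n".join(emit_global_load("ratio", "f64", "a5", "fa5")))
```

For floating-point globals the address still lives in an integer register; only the destination of the load is a floating-point register.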
