Programming Parallel Computers

Aalto 2024

LLM9a: CPU optimization ★★

What you will need to do in this task

Please read the general instructions for this exercise first. Here are the additional instructions specific to this task:

Implement efficient processing of an LLM prompt using all available single-precision processing power of the CPU.
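To make the goal concrete, here is a minimal sketch of the kind of kernel this could involve. It is not the reference solution or the interface of the course template. It assumes that most of the work in prompt processing is multiplying token activations by weight matrices, which is typical for transformer models. The function name `batched_matvec`, its parameters, and the lane count of 8 floats are hypothetical choices for illustration. The sketch combines two ideas: tokens of the prompt are independent in this step, so they can be spread across threads with OpenMP, and several independent accumulators let the compiler use vector instructions for the inner product.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the course template.
// Computes out[t][r] = sum over k of x[t][k] * w[r][k].
// x: num_tokens x dim, w: rows x dim, out: num_tokens x rows (all row-major).
void batched_matvec(const float* x, const float* w, float* out,
                    int num_tokens, int rows, int dim) {
    assert(num_tokens >= 0 && rows >= 0 && dim >= 0);
    constexpr int LANES = 8;  // assumed vector width: 8 floats (256 bits)

    // Each token is handled independently, so tokens can run on different threads.
    #pragma omp parallel for
    for (int t = 0; t < num_tokens; ++t) {
        const float* xt = x + static_cast<std::size_t>(t) * dim;
        for (int r = 0; r < rows; ++r) {
            const float* wr = w + static_cast<std::size_t>(r) * dim;

            // Independent partial sums: no dependency chain between lanes,
            // which gives the compiler room to vectorize this loop.
            float acc[LANES] = {};
            int k = 0;
            for (; k + LANES <= dim; k += LANES) {
                for (int j = 0; j < LANES; ++j) {
                    acc[j] += xt[k + j] * wr[k + j];
                }
            }

            // Combine the partial sums, then handle the leftover elements.
            float sum = 0.0f;
            for (int j = 0; j < LANES; ++j) sum += acc[j];
            for (; k < dim; ++k) sum += xt[k] * wr[k];

            out[static_cast<std::size_t>(t) * rows + r] = sum;
        }
    }
}
```

With GCC or Clang, flags such as `-O3 -march=native -fopenmp` would typically be needed for the threading and vectorization to take effect. Without `-fopenmp` the pragma is ignored and the code still runs correctly on a single thread. A competitive solution would likely also need to reuse data in registers and caches across several tokens and rows, which this sketch does not attempt.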

What I will try to do with your code

I will first run all kinds of tests to see that your code works correctly. You can try it out locally by running ./grading test, but please note that your code has to compile and work correctly not only on your own computer but also on our machines.

If all is fine, I will run the benchmarks. You can try it out on your own computer by running ./grading benchmark, but of course the precise running time on your own computer might be different from the performance on our grading hardware.

Benchmarks

Name           Operations         Parameters
benchmarks/1   3,472,443,648      dim = 288, n_heads = 6, n_layers = 6, num_tokens = 127, vocab_size = 2048
               Process a sequence of 127 tokens with a 6-layer model of width 288
benchmarks/2   58,321,158,400     dim = 512, n_heads = 8, n_layers = 8, num_tokens = 765, vocab_size = 2048
               Process a sequence of 765 tokens with an 8-layer model of width 512
benchmarks/3   142,139,200,512    dim = 768, n_heads = 12, n_layers = 6, num_tokens = 510, vocab_size = 32000
               Process a sequence of 510 tokens with a 6-layer model of width 768
benchmarks/4   34,244,798,976     dim = 288, n_heads = 6, n_layers = 6, num_tokens = 2047, vocab_size = 2048
               Process a sequence of 2047 tokens with a 6-layer model of width 288
benchmarks/5   233,995,901,952    dim = 768, n_heads = 12, n_layers = 12, num_tokens = 513, vocab_size = 32000
               Process a sequence of 513 tokens with a 12-layer model of width 768

Here “operations” is our rough estimate of the minimum number of useful arithmetic operations you will need to perform in this benchmark. The exact count will of course depend on the algorithm you use.

Grading

In this task your submission will be graded using benchmarks/5: Process a sequence of 513 tokens with a 12-layer model of width 768.

The point thresholds are as follows. If you submit your solution no later than Sunday, 02 June 2024, at 23:59:59 (Helsinki), your score will be:

Running time   Points
≤ 8.000 sec 1
≤ 6.000 sec 2
≤ 4.000 sec 3
≤ 2.000 sec 4
≤ 1.000 sec 5

For late submissions you will not get any points.
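To get a feeling for what these thresholds mean, you can divide the operation estimate of benchmarks/5 by each time limit. The sketch below does this arithmetic. The names `kOperations`, `required_gops`, and `print_requirements` are hypothetical, and the result is only as accurate as the rough operation estimate itself.

```cpp
#include <cassert>
#include <cstdio>

// Operation estimate for benchmarks/5, copied from the benchmark table.
const double kOperations = 233995901952.0;

// Billions of useful operations per second needed to finish within `seconds`.
double required_gops(double seconds) {
    assert(seconds > 0.0);
    return kOperations / seconds / 1e9;
}

// Prints the required throughput for each point threshold.
void print_requirements() {
    const double limits[] = {8.0, 6.0, 4.0, 2.0, 1.0};
    int points = 1;
    for (double s : limits) {
        std::printf("%d point(s): <= %.3f sec needs about %.1f billion operations/sec\n",
                    points, s, required_gops(s));
        ++points;
    }
}
```

By this estimate, one point needs roughly 29 billion useful operations per second, and the full five points need roughly 234 billion. Comparing these numbers with the theoretical single-precision peak of the CPU you are targeting shows how much headroom a given implementation strategy leaves.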

Contest

Your submissions to this task will also automatically take part in the contest, and you can receive up to 2 additional points if your code is among the fastest solutions this year!

Running time   Extra points
≤ 1.20 × fastest 1
≤ 1.05 × fastest 2