Please read the general instructions on this page first, and then check the individual tasks for more details. For each task you can download a zip file that contains the code templates you can use for development.

Task | Attempts | Expected | Points | Max | Rating | Rec. | Deadline for full points |
---|---|---|---|---|---|---|---|

CP1: CPU baseline | |||||||

Implement a simple For this initial exercise, we have disabled auto-vectorization. |
|||||||

– | – |
– |
5 + 0 | ★ | R | 2023-04-30 at 23:59:59 | |

CP2a: instruction-level parallelism | |||||||

Parallelize your solution to CP1 by exploiting For this technical exercise, we have disabled auto-vectorization. |
|||||||

– | – |
– |
3 + 0 | ★ | R | 2023-05-07 at 23:59:59 | |

CP2b: multicore parallelism | |||||||

Parallelize your solution to CP1 with the help of For this technical exercise, we have disabled auto-vectorization. |
|||||||

– | – |
– |
3 + 0 | ★ | R | 2023-05-07 at 23:59:59 | |

CP2c: vectorization | |||||||

Parallelize your solution to CP1 with the help of For this technical exercise, we have disabled auto-vectorization. |
|||||||

– | – |
– |
3 + 0 | ★ | R | 2023-05-07 at 23:59:59 | |

CP3a: fast solution with doubles | |||||||

Using all resources that you have in the CPU, solve the task |
|||||||

– | – |
– |
5 + 2 | ★★ | R | 2023-05-14 at 23:59:59 | |

CP3b: fast solution with floats | |||||||

Using all resources that you have in the CPU, solve the task |
|||||||

– | – |
– |
5 + 2 | ★★ | R | 2023-05-14 at 23:59:59 | |

CP4: GPU baseline | |||||||

Implement a simple baseline solution for the |
|||||||

– | – |
– |
5 + 0 | ★ | R | 2023-05-21 at 23:59:59 | |

CP5: fast GPU solution | |||||||

Using all resources that you have in the |
|||||||

– | – |
– |
10 + 2 | ★★ | R | 2023-05-28 at 23:59:59 | |

CP9a: better algorithm | |||||||

Try to use |
|||||||

– | – |
– |
5 + 0 | ★★★ | R | 2023-06-04 at 23:59:59 | |

CP9c: fast solution with doubles | |||||||

This is a version of CP3a which has extended, |
|||||||

– | – |
– |
0 + 0 | ★★ | R | 2023-06-04 at 23:59:59 |

You are given m input vectors, each with n numbers. Your task is to calculate the correlation between *every pair of input vectors*.

You need to implement the following function:

void correlate(int ny, int nx, const float* data, float* result)

Here `data`

is a pointer to the input matrix, with `ny`

rows and `nx`

columns. For all `0 <= y < ny`

and `0 <= x < nx`

, the element at row `y`

and column `x`

is stored in `data[x + y*nx]`

.

The function has to solve the following task: for all `i`

and `j`

with `0 <= j <= i < ny`

, calculate the correlation coefficient between row `i`

of the input matrix and row `j`

of the input matrix, and store the result in `result[i + j*ny]`

.

Note that the correlations are symmetric, so we will only compute the upper triangle of the result matrix. You can leave the lower triangle `i < j`

undefined.

The arrays `data`

and `result`

are already allocated by whoever calls this function; you do not need to do any memory management related to these arrays. You should not assume that `result`

contains any valid values at the point of call. In particular, it is not guaranteed to be initialized with zeros.

The input and output are always given as single-precision floating point numbers (type `float`

). However, depending on the task, we will do arithmetic with either single or double precision numbers:

- If the task specifies that you must use double-precision floating point numbers, then
*all*arithmetic operations must be done with type`double`

, all intermediate results must be stored in variables of type`double`

, and you will only round the final result to single precision. - If the task specifies that you can use single-precision floating point numbers, then you are encouraged to use the type
`float`

whenever possible.

However, in each case you will have to make sure the numerical precision of the results is sufficiently high. The grading tool makes sure the error is sufficiently small. The error thresholds are chosen so that a straightforward and efficient solution is typically good enough, but please feel free to ask the course staff for hints if you are struggling with the rounding errors.

These examples show what a correct implementation will do if you apply it to a bitmap image:

- Input (first image): vector i = row i of the image.
- Output (second image): red pixel at (i, j) = positive correlation between rows i and j, blue pixel = negative correlation.

A reasonable way to calculate all pairwise correlations is the following:

- First normalize the input rows so that each row has the arithmetic mean of 0 — be careful to do the normalization so that you do not change pairwise correlations.
- Then normalize the input rows so that for each row the sum of the squares of the elements is 1 — again, be careful to do the normalization so that you do not change pairwise correlations.
- Let X be the normalized input matrix.
- Calculate the (upper triangle of the) matrix product Y = XX
^{T}.

Now matrix Y contains all pairwise correlations. The only computationally-intensive part is the computation of the matrix product; the normalizations can be done in linear time in the input size.