# HY 2022–2023

## MF: median filter

MF1: CPU baseline

Implement a simple sequential baseline solution. Make sure it works correctly. Do not try to use any form of parallelism yet. You are expected to use a naive algorithm that computes the median separately for each pixel, with a linear-time median-finding algorithm.

5 2023-08-31 at 23:59:59
MF2: multicore parallelism

Parallelize your solution to MF1 with the help of OpenMP so that you are exploiting multiple CPU cores in parallel.

3 M 2023-08-31 at 23:59:59
MF9a: better algorithm

Design a better algorithm that does not recalculate the median separately for each pixel. Make it as efficient as possible, also for very large window sizes. You are encouraged to use all resources that you have in the CPU.

5 ★★★ 2023-08-31 at 23:59:59

### General instructions for this exercise

In this task, we will implement a program for doing 2-dimensional median filtering with a rectangular window.

### Interface

You need to implement the following function:

```void mf(int ny, int nx, int hy, int hx, const float* in, float* out)
```

Here `in` and `out` are pointers to `float` arrays, each of them contains `ny*nx` elements. The arrays are already allocated by whoever calls this function; you do not need to do any memory management.

The original value of pixel (x,y) is given in `in[x + nx*y]` and its new value will be stored in `out[x + nx*y]`.

### Correct output

In the output, pixel (x,y) will contain the median of all pixels with coordinates (i,j) where

• `0 <= i < nx`,
• `0 <= j < ny`,
• `x - hx <= i <= x + hx`,
• `y - hy <= j <= y + hy`.

This is the sliding window. Note that the window will contain at most `(2*hx+1) * (2*hy+1)` pixels, but near the boundaries there will be fewer pixels.

### Details

For our purposes, the median of the vector a = (a1, a2, …, an) is defined as follows:

• Let x1, x2, …, xn be the values of a sorted in a non-decreasing order.
• If n = 2k + 1, then the median of a is xk+1.
• If n = 2k, then the median of a is (xk + xk+1)/2.

### Examples

These examples show what a correct implementation will do if you apply it to each color component of a bitmap image. In these examples, for each color component, the value of each pixel is the median of all pixel values within a sliding window of dimensions (2k+1) × (2k+1). Hover the mouse on the output images to see the differences between input and output.

k = 1

k = 2

k = 5

k = 10

Noise reduction, k = 1

Noise reduction, k = 2

### Hints for MF1–MF2

Make sure that you use a fast selection algorithm to find the median. Do not sort; quickselect and other good selection algorithms are much faster than sorting, and there is already a good implementation in the C++ standard library, so you do not need to reinvent it.

### Hints for MF9a

Here are some examples of possible techniques that you can use to solve task MF9a:

• Partition the image in smaller, partially overlapping blocks. For example, if your windows size is 21 × 21, a reasonable block size might be 50 × 50. Solve the median filter problem separately for each block; place the blocks so that each output pixel comes from exactly one block. Be careful with the boundaries.
• Within each block, sort the input data and replace the original values by their ordinals, breaking ties arbitrarily — for example, if the input data was (1.0, 1.0, 0.1, 0.9, 0.5, 0.2, 1.0), replace it with (4, 5, 0, 3, 2, 1, 6). Now you have a much simpler task: instead of finding a median of floating point numbers, you only need to find the median of small, distinct integers.
• You can now represent the sliding window as a bit vector: for example, if your block size is 50 × 50, then your sliding window can be represented as a bit vector with 2500 bits. Do not recompute the bit vector from scratch for every window position — if you move the window from (x,y) to (x+1,y), you only need to reset 2k+1 bits and set 2k+1 bits.
• Exploit bit-parallelism and the special CPU instructions that help with bit manipulation. To find the median from the bit vector efficiently, you may want to have a look at e.g. compiler intrinsics __builtin_popcountll (POPCNT instruction), __builtin_ctzll (TZCNT instruction), and _pdep_u64 (PDEP instruction). Some caching of pre-computed values may be helpful, too.
• Pay attention to the locality of memory references. For example, does it make more sense to slide the window from left to right or from top to bottom?
• You can use OpenMP to process multiple blocks in parallel.