代写SESI M2、代做C++编程设计
Sorbonne Université – SESI M2
——–
MU5IN160 – Parallel Programming
Hands-on Session 6 – Dataflow for Motion Application
Very important, about the submission of your work At the end of this session you will have to
upload the following files on Moodle: 1) a zip of the src folder and 2) a zip of the include folder. After
that you will have 2 weeks to complete your work and to update your first submission. You have to work
in group of two people but each of you will have to upload the file on Moodle. Finally, please write your
name plus the name of your pair at the top of all these files.
Short introduction In this session, we will work on a streaming application that detects and tracks
moving objects from a video sequence. Contrary to the previous sessions, we will not use EasyPAP this
time. The later is not adapted for streaming applications. A working streaming application will be given
to you and you will have to use StreamPU to implement the Motion application through an explicit
dataflow representation.
1 Appetizer
First you need to clone the repository of the Motion project:
git clone --recursive https://gitlab.lip6.fr/parallel-programming/motion-sesi.git
The Motion project uses CMake in order to generate a Makefile: follow the README instructions to
compile the code.
grayscale image
(t − 1) motion detection Σ∆ (per pixel) mathematical morphology
opening-closing
connected components
labeling (CCL)
connected components
analysis (CCA)
surface filtering
grayscale image
(t) motion detection Σ∆ (per pixel) mathematical morphology
opening-closing
connected components
labeling (CCL)
connected components
analysis (CCA)
surface filtering
k-nearest neighboor
matching (k-NN)
temporal
tracking
grayscale pixels
p ∈ [0; 255]
blob of binary pixels
p ∈ {0, 1}, 0 → stationary, 1 → moving
image of labels
l ∈ [1; 232 − 1]
CCs = list of regions,
surface S & centroid (xG, yG)
sub-list of CCs with S ∈ [Smin, Smax]
list of (t − 1, t)
associations
final list of
moving objects
Figure 1: Motion detection and tracking processing graph. In gray and italic: the output of each
processing.
Fig. 1 presents the different algorithms used to detect moving objects and to track them over time. To
make it work, two strong assumptions are made: 1) the camera is fixed, 2) the light intensity is constant
over time. First, an image is read from a camera (or a video sequence) and then it is converted in a
grayscale image. Then, the Σ∆ algorithm is triggered. This algorithm is able to detect if a pixel is
moving over time. It returns a binary image, if a pixel value is 0, then it means that it is not moving.
Otherwise, if a pixel value is 1, then it means that it is moving. After that, morphology algorithms are
applied1
. This is a pre-processing to regroup moving pixels together and eliminate isolated pixels. Then,
from a binary image, a connected components labeling (CCL) algorithm is performed. The later, gives
the same label to a group of pixel that are connected to each other. CCL returns an image of labels where
l = 0 means no object and l > 0 means a moving object. From this image of label, some features are
extracted (CCA): for each object the center of mass (xG, yG), the bounding box ([xmin, xmax, ymin, ymax])
and the surface S are extracted. Depending on their surface, the objects are filtered (Smin < S < Smax).
From two images at t − 1 and t, a matching algorithm determines which objects are the same in the two
different images (mainly according to their distance). At the end, the identified objects are tracked to
have a constant identifier over time.
This graph of tasks is then repeated until the video sequence is over. It is not mandatory to understand
perfectly each algorithm. The purpose of this session is to work on a streaming application, representative
of a real application, and to perform optimizations at the task graph level.
1Mathematical morphology: https://en.wikipedia.org/wiki/Mathematical_morphology
In this graph, two tasks cannot be replicated. The per pixel motion algorithm requires its previous
output to compute the current binary image. It detects intensity variations over time. It is almost the
same for the tracking algorithm that maintains a list of tracks that are updated according to the last
frame.
If you’d like to better understand the algorithms used in this project, some of them are described in more
detail in the document’s appendix. In any case, it’s worth noting that you don’t need to understand
exactly what these algorithms do to complete this lab.
1.1 Run Motion
To run the code you will need some input videos. You can download a videos collection on Moodle (see the
“Artifacts” section) or from this web link: http://www.potionmagic.eu/~adrien/data/traffic.zip.
First, unzip the traffic.zip and from the build directory run the code with the following command:
./bin/motion2 --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \
--flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --vid-out-play --vid-out-id
You should see a window with a top view of a highway and some moving cars (see Fig. 2) and you should
see green bounding boxes around the cars.
Figure 2: Motion screenshot (with –-vid-out-play –-vid-out-id parameters).
1.2 Architecture of the Project
Motion is mainly a C-style project but it is compiled in C++ to use StreamPU. The sources are
located in the src folder, and there are 3 sub-folders:
• common: contains implementations of the processing tasks,
• main: contains source files that correspond to a final binary executable,
• wrapper: contains C++ files to wrap the C-style processing functions into StreamPU modules
and tasks.
The headers are located in the include folder. Inside there are two sub-folders: c/motion for the C-style
headers and cpp/motion for the C++ headers.
Page 2
2 From Imperative to Dataflow Programming
We will convert the motion2 main into a dataflow description (= StreamPU modules and tasks). The
motion2 is located here src/main/motion2.c. This implementation is very close to the task graph
presented in Fig. 1.
Task #1 Understand the code, run the motion2 executable and play with the parameters (-h shows
and describes the available parameters).
To help you in the task, we created an other main based on motion2.c and we converted some C functions
into StreamPU modules for you. See the motion2_spu.cpp file.
Task #2 Understand the code, run the motion2-spu executable and play with the parameters (-h
shows and describes the available parameters). Understand the code of motion2_spu.cpp by comparing
it with the C-style motion2.c code.
Task #3 Create new StreamPU stateful modules, each time you will create new .cpp and .hpp
files in the wrapper folders. You will only declare input and output sockets (DON’T use forward
sockets at this time):
1. Sigma_delta: Add a StreamPU compute task that will call the sigma_delta_compute function,
2. Morpho: Add a StreamPU compute task that will call the C morpho_compute_opening3 and
morpho_compute_closing3 functions,
3. CCL: Add a StreamPU apply task that will call the C CCL_LSL_apply function,
4. Features_CCA: Add a StreamPU extract task that will call the C features_extract function,
5. Features_filter: Add a StreamPU filter task that will call the C features_filter_surface
and features_shrink_basic functions (note that the maximum input size of the features differs
from the maximum size of the output features: indeed, the main purpose of the shrink function is
to reduce the maximum number of features and to save memory space),
6. KNN: Add a StreamPU match task that will call the C kNN_match function,
7. Tracking: Add a StreamPU perform task that will call the C tracking_perform function.
Add the StreamPU modules and tasks incrementally in the motion2_spu.cpp file and you will test
if their integration is working (you can compare the logs with a diff, see Note #2 below). Have a look
on how we did this for the other StreamPU tasks that are given to you. You will follow the same
philosophy: 1) bind the sockets to the buffers allocated in the main file and 2) call the exec() method
explicitly.
Note #1 It is NOT possible to create sockets of RoI_t structure. Only the basic C types are supported.
To get around this limitation you can count the number of bytes in the structure. For instance, you can
do something like:
auto si_RoIs = this->template create_socket_in(t, "in_RoIs", max_size * sizeof(RoI_t));
Note #2 motion2 is our golden model. To compare the results of motion2 and motion2-spu you need
to generate the logs of motion2 executable first (we do it for only 20 frames to execute faster):
./bin/motion2 --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \
--vid-in-stop 20 --flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --log-path logs_refs
Secondly, you need to generate the logs of the motion2-spu executable:
Page 3
./bin/motion2-spu --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \
--vid-in-stop 20 --flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --log-path logs_spu
Finally you need to compare the logs together:
diff logs_refs logs_spu
If the later command returns nothing, it means that motion2 and motion2-spu are equivalent (in term
of features). This is good, your new implementation is correct! If not... it is time to debug :’-(.
Task #4 At this point, you should only have StreamPU tasks that call their exec() method explicitly
(no more C style function calls). However, the code is still using the data allocated in the main function.
This can be improved because StreamPU performs the data allocation and deallocation automatically
for you. In order to remove most of these allocations you have to perform partial “output to input socket”
bindings. For instance, if we only consider to eliminate the IB0 buffer, it is possible to remove “pointer
to output socket” bindings and to add “output to input socket” bindings instead, as shown in Code 1.
Do it for all the buffers, EXCEPT for IG0 and IG1. It is strongly advised to do it step by step and to
check if the code is giving exactly the same results after each modification (please refer to Note #2).
// [...]
// step 1: motion detection (per pixel) with Sigma-Delta algorithm
sd0["compute::in_img"].bind(IG0[0]);
// sd0["compute::out_img"].bind(IB0[0]); // this line can be removed
sd0("compute").exec();
// step 2: mathematical morphology
// mrp0["compute::in_img"].bind(IB0[0]); // this line can be removed
mrp0["compute::in_img"] = sd0["compute::out_img"]; // <-- [NEW] output to input socket binding
// mrp0["compute::out_img"].bind(IB0[0]); // this line can be removed
mrp0("compute").exec();
// step 3: connected components labeling (CCL)
uint32_t n_RoIs_tmp0;
// ccl0["apply::in_img"].bind(IB0[0]); // this line can be removed
ccl0["apply::in_img"] = mrp0["compute::out_img"]; // <-- [NEW] output to input socket binding
ccl0["apply::out_labels"].bind(L10[0]);
ccl0["apply::out_n_RoIs"].bind(&n_RoIs_tmp0);
ccl0("apply").exec();
// [...]
Source code 1: Example of partial socket binding to eliminate IB0 buffer allocation/deallocation in the
main function.
Task #5 Now, replace IG0 and IG1 buffers by the binding of the video["generate::out_img_gray8"]
socket. For this, you will need to use a Delayer module in order to keep the t − 1 image in memory
(previously kept in the IG0 buffer). If you don’t use it, the t − 1 image will always be overwritten when
executing the video("generate") task.
Note #3 In the motion2 executable, some tasks are not executed in the first stream (see the following
condition in the motion2.c file: “if (n_processed_frames > 0)”). To manage it you have two possible
options:
• Always execute the tasks (no control flow) but in this case you need to carefully initialize the
Delayer module to the first frame with the Delayer::set_data() method (this solution is
simpler to implement),
Page 4
• Use a Switcher and a Controller_limit module to implement the control flow (= if condition).
To simplify, you will only put the Sigma_delta.compute() task in the condition. In other terms,
the CCL, the CCA and the filtering will be executed anyway.
Task #6 At this point you should not have memory allocations and deallocations anymore in the main
function. Next objective is to get rid of the multiple exec() calls over the tasks. You will separate the
binding from the execution. To do this, the socket bindings need to be moved outside of the while(1)
loop and the while(1) loop needs to be replaced by a StreamPU Sequence. Once it is done, only one
exec() call should remain: the one over the newly created Sequence object. Of course, you will check
if it works correctly (please refer to Note #2).
Sub-sequence 0
Delayer
exec order: [13]
addr: 0x16b1660a8
memorize (id = 13)
Tracking
exec order: [14]
addr: 0x16b166468
perform (id = 14)
Sigma_delta
Sigma_delta1
exec order: [7]
addr: 0x16b166cc8
compute (id = 7)
Morpho
Morpho1
exec order: [8]
addr: 0x16b166b38
compute (id = 8)
CCL
CCL1
exec order: [9]
addr: 0x16b1669a8
apply (id = 9)
Features_CCA
CCA1
exec order: [10]
addr: 0x16b166808
extract (id = 10)
Features_filter
Ftr_filter1
exec order: [11]
addr: 0x16b166618
filter (id = 11)
KNN
exec order: [12]
addr: 0x16b166548
match (id = 12)
Delayer
exec order: [0]
addr: 0x16b1660a8
produce (id = 0)
Sigma_delta
Sigma_delta0
exec order: [1]
addr: 0x16b166d98
compute (id = 1)
Morpho
Morpho0
exec order: [2]
addr: 0x16b166c00
compute (id = 2)
CCL
CCL0
exec order: [3]
addr: 0x16b166a70
apply (id = 3)
Features_CCA
CCA0
exec order: [4]
addr: 0x16b1668d8
extract (id = 4)
Features_filter
Ftr_filter0
exec order: [5]
addr: 0x16b166710
filter (id = 5)
Video
exec order: [6]
addr: 0x16b166e98
generate (id = 6)
out[0]:out
in[0]:in_img
out[1]:status
out[1]:out_img
in[0]:in_img
out[2]:status
out[1]:out_img
in[0]:in_img
out[2]:status
out[1]:out_labels
in[0]:in_labels
0
in[0]:in_labels
1
out[2]:out_n_RoIs
in[1]:in_n_RoIs
0
in[1]:in_n_RoIs
1
out[3]:status
out[2]:out_RoIs
in[2]:in_RoIs
out[3]:status
out[3]:out_labels out[4]:out_n_RoIs
in[1]:in_n_RoIs0
out[5]:out_RoIs
in[0]:in_RoIs0
out[6]:status
out[0]:out_img
in[0]:in_img
0
in[0]:in
1
out[1]:out_frame
in[0]:in_frame
out[2]:status
out[1]:out_img
in[0]:in_img
out[2]:status
out[1]:out_img
in[0]:in_img
out[2]:status
out[1]:out_labels
in[0]:in_labels
0
in[0]:in_labels
1
out[2]:out_n_RoIs
in[1]:in_n_RoIs
0
in[1]:in_n_RoIs
1
out[3]:status
out[2]:out_RoIs
in[2]:in_RoIs
out[3]:status
out[3]:out_labels out[4]:out_n_RoIs
in[3]:in_n_RoIs1
0
in[2]:in_n_RoIs
1
out[5]:out_RoIs
in[2]:in_RoIs1
out[6]:status
out[4]:out_RoIs0 out[5]:out_RoIs1
in[1]:in_RoIs
out[6]:out_nearest out[7]:out_distances out[8]:status
out[1]:status
out[3]:status
Figure 3: Expected StreamPU task graph without logs, without visualization and without control flow.
Note #4 To help you in the debugging, you can print the sequence graph with the export_dot method.
Enable/disable the logs, enable/disable the visualization and observe the impact on the task graph. If
you chose to do not implement control flow, the output graph should looks like in Fig. 3. Note that you
can personalize the name of a module with the set_custom_name(std::string custom_name) method.
Task #7 Before the sequence execution, you will enable the statistics of the task (call the get_modules
method on a sequence object). And after the sequence execution you will print them at the end
(tools::Stats::show function). The application will display the statistics only if there is the --stats
parameter. What do you see? Is it different than from the motion2 executable? Explain.
[Bonus] Task #8 When you think it’s necessary, create new tasks, postfixed with a f, that use forward
socket instead of input/output sockets combination. For instance, if we consider a task named compute
without forward socket, the task that uses forward socket will be named computef. You will NOT
replace the former compute task. Using forward sockets should help you to remove useless copies.
Do it incrementally to validate that the application is still working (see Note #2). Can you see an
improvement in the statistics of the tasks?
Page 5
Appendix
2.1 Sigma-Delta Algorithm (Σ∆)
The motion detection problem consists in separating moving and static areas in each frame. At each
instant, each pixel must be tagged with a fixed/moving binary identifier. When the camera is fixed, such
detection can be performed using the time differences computed for each pixel.
The following notations apply:
• t : current instant of time, used to identify the frames,
• It: grayscale source image at time t,
• It−1: grayscale source image at time t − 1,
• Mt: background image (mean image),
• Ot: grayscale difference image,
• Vt: image of variance (standard deviation) computed for each pixel,
• Lt: binary label image (motion/background), Lt(x) = {0, 1} or Lt(x) = {0, 255} to encode
{background, movement},
• x: the current pixel with (i, j) coordinates.
Most of motion detection techniques in an image sequence It(x) are based on an estimate of the modulus
of the temporal gradient |
∂I
∂t |. If the light intensity of the scene vary slowly (= is constant between two
consecutive images), then a significant variation in the pixel grayscale (above a threshold) between two
images will imply that there is movement at that point.
The Σ∆ algorithm assumes that the noise level can vary at any point. To achieve this, the pixel grayscale
is modeled by a mean Mt(x) and a variance (standard deviation) Vt(x). If the difference between the
current image and the background image is greater than N times the standard deviation, then movement
occurs. The value of N is a parameter. In this project, N is always set to 2.
This is a motion detection system based on the estimation of static background statistics using Σ∆
modulation: an iterative analog/digital conversion method that increments or decrements the digitized
value by one unit according to the result of the comparison between the analog value and the current
digitized value.
Algorithm 1: Sigma-Delta (Σ∆).
1 [Part #1: mean computation]
2 foreach pixel x do // Step #1: Mt estimation
3 if Mt−1(x) < It(x) then Mt(x) ← Mt−1(x) + 1
4 if Mt−1(x) > It(x) then Mt(x) ← Mt−1(x) − 1
5 otherwise do Mt(x) ← Mt−1(x)
6 [Part #2: difference computation]
7 foreach pixel x do // Step #2: Ot computation
8 Ot(x) = |Mt(x) − It(x)|
9 foreach pixel x do // Step #3: Vt update and clamping
10 if Vt−1(x) < N × Ot(x) then Vt(x) ← Vt−1(x) + 1
11 if Vt−1(x) > N × Ot(x) then Vt(x) ← Vt−1(x) − 1
12 otherwise do Vt(x) ← Vt−1(x)
13 Vt(x) ← max(min(Vt(x), Vmax), Vmin)
14 foreach pixel x do // Step #4: Lˆt estimation
15 if Ot(x) < Vt(x) then Lˆt(x) ← 0
16 else Lˆt(x) ← 1
The algorithm initialization for t = 0 is the following: M0(x) ← I0(x) and V0(x) ← Vmin. Then, the
algorithm is applied to the images from t = 1. The Vmin and Vmax constants are used to restrict the
possible values of Vt. Typically, Vmin = 1 and Vmax = 254. The complete algorithm after initialization is
shown in Alg. 1.
In the Motion project, a naive Σ∆ implementation is given to you:
Page 6
• Header: in the include/c/motion/sigma_delta/sigma_delta_compute.h file,
• Source: in the src/common/sigma_delta/sigma_delta_compute.c file.
See the sigma_delta_compute function.
2.2 Mathematical Morphology
In this project, we consider squared elements B of size 3 × 3. Let X be the set of pixels associated with
the B element. There are two basic operations: the dilation of X noted δB(X) and the erosion of X
noted ϵB(X). The application of mathematical morphology operators is similar to filtering operators
(stencils or convolutions), but with non-linear operations.
For binary images, dilation consists in computing a OR on the B neighborhood in the source image and
writing it to the destination image. Conversely, erosion consists in computing a AND on the neighborhood.
So, if a point in the neighborhood is 1, the dilation produces a 1 (since x OR 1 == 1), thus dilating the
binary connected component. Conversely, if only one pixel is 0 in the B neighborhood, the erosion will
produce a 0 (since x AND 0 == 0), thus eroding the connected component.
Erosion is used to reduce noise in images: if we consider that a small group of pixels is the noise that
we’re trying to remove, then applying erosion with a B element of size 3 × 3 will make any group of
pixels with a radius smaller than its size disappear.
Figure 4: Left: the initial binary image. Center: eroded image with a 3 × 3 squared element: the gray pixels are
removed. Right: dilated image with a 3 × 3 squared element: the gray pixels are added. Source: Wikipedia.
Let r be the radius and d = 2r + 1 the diameter of a squared element B, then an erosion of radius r
removes, to any connected component, a thickness of r pixels of contour while a dilation of radius r adds
a thickness of r pixels to the contour (see Fig. 4, note that in the figure the logic is reversed: pixels at 1
are black while pixels at 0 are white).
Figure 5: Left: the initial binary image. Center: opened image with a 3 × 3 squared element: the gray pixels
are removed. Right: closed image with a 3 × 3 squared element: the gray pixels are added. Source: Wikipedia.
From these two operators, two others can be defined: the closing ϕB(X) = ϵB(δB(X)) and the opening
γB(X) = δB(ϵB(X)). Closing reduces (or even completely close) holes in connected components, while
opening does the opposite, enlarging these same holes (see Fig. 5, note that in the figure the logic is
reversed: pixels at 1 are black, while pixels at 0 are white).
One of the advantages of opening and closing is that they preserve the (discrete) size of the regions,
unlike erosion, which reduces it, or dilation, which increases it. Depending on requirements, either a
closing or an opening can be chosen. As these operators are idempotent, applying them several times
does not change the result (which will be identical to that obtained after a single application). On the
other hand, they can be chained (opening and then closing or closing and then opening) to improve the
result image (noise reduction, filling holes, ...). By gradually increasing their radius, we obtain sequential
alternating filters, which are particularly effective for removing noise.
In the Motion project, naive 3 × 3 mathematical morphology implementations are given to you:
• Header: in the include/c/motion/morpho/morpho_compute.h file,
Page 7
• Source: in the src/common/morpho/morpho_compute.c file.
See the morpho_compute_opening3 and morpho_compute_closing3 functions.
Page 8