
COURSEWORK ASSIGNMENT

MODULE:                                             CMP-6026A/CMP-7016A – Audio-visual processing

ASSIGNMENT TITLE:                          Design, implementation and evaluation of a speech recognition system

DATE SET:                                           Week 1

PRACTICAL DEMONSTRATION:        Week 7 Wednesday (slot to be advised) – 5th Nov 2025

RETURN DATE:                                   Friday of Week 8

ASSIGNMENT VALUE:                        50%

LEARNING OUTCOMES

•     Explain how humans produce speech from audio and visual perspectives, how these differ across different speech sounds, and give examples of how these are subject to noise and distortion

•    Apply a range of tools to display and process audio and visual signals and be able to analyse these to find structure and identify sound or visual events

•     Transfer knowledge learnt into code that extracts useful features from audio and visual data to provide robust and discriminative information in a compact format and apply this to machine learning methods

•     Design and construct audio and visual speech recognisers and evaluate their performance under varying adverse operating conditions

•     Work in a small team and organise work appropriately using simple project management techniques before demonstrating accomplishments within a professional setting

SPECIFICATION

Overview

This assignment involves the design, implementation and evaluation of a speaker-dependent speech recognition system to recognise the names of 20 students taken from the CMP-6026A/CMP-7016A modules in clean and noisy conditions.

Description

The task of building and testing a speech recogniser can be broken down into five stages:

i)    Speech data collection and labelling

ii)   Feature extraction

iii)  Acoustic modelling

iv)   Noise compensation

v)    Evaluation

The speech recogniser is to be speaker-dependent, which means that it will be trained on speech from just a single speaker and should also be tested on speech from only that speaker. The vocabulary is a set of 20 names taken from the students studying CMP-6026A and CMP-7016A, which will be provided separately.

The twenty names have been selected so that some are distinctive, some are confusable with others and some are short. Your recogniser will perform isolated word recognition. This means that, during testing, you will provide it with the audio of a single name, and the recogniser will output a single label providing its classification of that speech.

The assignment is to be carried out in pairs, with marks awarded according to the mark scheme provided. The assignment will use Python and a variety of Python libraries such as TensorFlow, numpy, matplotlib and scikit-learn. These are standard libraries and give a good introduction as to how such a task may be carried out in industry.

The second assignment (CW2) will be based closely on this assignment. This means that this assignment will form an important underpinning for the next coursework. Feedback and feedforward from this assignment should be useful when undertaking the second assignment.

1.         Speech data collection and labelling

A speech recogniser must be trained on examples of the speech sounds that it is expected to recognise. For this assignment, the vocabulary of the speech recogniser comprises 20 names taken from students on CMP-6026A and CMP-7016A. Therefore, the first part of the assignment involves recording examples of each name in the vocabulary. Theoretically, the more examples of each name, the higher the accuracy of the speech recogniser. The minimum number of samples you should collect is 20 of each name. Each speech file can be stored as a separate WAV file (e.g. dave001.wav).

Next, each audio file requires an associated label. This could be stored in the filename (as above), or via the directory structure, or in a separate reference text file including the label for each file. You should choose a logical approach that you can easily interface with from your Python code.
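For illustration, a minimal Python sketch of this file/label handling is given below, assuming that labels are encoded in the filenames (as in dave001.wav) and that the recordings sit in a directory called data; both of these are assumptions, not requirements.

import re
from pathlib import Path

def load_labelled_files(data_dir="data"):
    """Return (path, label) pairs parsed from filenames such as dave001.wav."""
    pairs = []
    for wav_path in sorted(Path(data_dir).glob("*.wav")):
        # Strip the trailing take number, e.g. "dave001" -> "dave".
        match = re.match(r"([A-Za-z]+)\d+", wav_path.stem)
        if match:
            pairs.append((wav_path, match.group(1)))
    return pairs

# Example output: [(PosixPath('data/dave001.wav'), 'dave'), ...]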

You only need to collect audio for this first coursework. However, bear in mind that you will have to collect a dataset of audio-visual speech for CW2, and so it might be more efficient to collect audio-visual data from the outset and to put the video data to one side until required. We strongly recommend finding video recording software that can record video data at a fixed framerate (not a variable framerate). Also, consider that you will have to give a live demo of your work for CW1 and, ideally, for CW2.

2.         Feature extraction

The task of feature extraction is to extract from each speech utterance the set of feature vectors that forms the input to the speech recogniser. This will involve designing and implementing in Python an algorithm to extract feature vectors from each speech signal. Many different feature extraction methods exist, but for this assignment you should consider only filterbank-derived cepstral features. You may first want to use a linear frequency filterbank with rectangular channels as a simple starting point. This can be extended to a mel-scaled filterbank and then to incorporate triangular-shaped channels to ultimately produce mel-frequency cepstral coefficients (MFCCs). You should also consider augmenting the feature vector with energy and then its temporal derivatives, as this should increase recognition accuracy. These different configurations should provide you with some interesting designs that you can test within your speech recogniser.

The feature extraction code should take as input a speech file (for example dave001.wav) and output a variable containing MFCC vectors.
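A rough sketch of such a pipeline, using numpy and scipy, is shown below. The frame length, hop size, FFT size, number of filterbank channels and number of cepstral coefficients are illustrative assumptions to be tuned, and the energy and temporal-derivative augmentation described above is omitted.

import numpy as np
from scipy.io import wavfile
from scipy.fftpack import dct

def mfcc(path, n_fft=512, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Return an (n_frames x n_ceps) array of MFCC vectors for one WAV file."""
    rate, signal = wavfile.read(path)   # assumes a mono recording
    signal = signal.astype(np.float64)
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Mel-spaced triangular filterbank.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_points = np.linspace(0, mel(rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_points) / rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        fbank[i - 1, bins[i - 1]:bins[i]] = np.linspace(0, 1, bins[i] - bins[i - 1], endpoint=False)
        fbank[i - 1, bins[i]:bins[i + 1]] = np.linspace(1, 0, bins[i + 1] - bins[i], endpoint=False)
    # Log filterbank energies, then a DCT to decorrelate -> cepstral coefficients.
    fb_energies = np.log(power @ fbank.T + 1e-10)
    return dct(fb_energies, type=2, axis=1, norm="ortho")[:, :n_ceps]

Replacing the mel spacing with linear spacing, and the triangular channels with rectangular ones, gives the simpler starting-point configurations mentioned above.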

3.         Acoustic modelling

Acoustic modelling is where the acoustic properties (as represented by the feature vectors) are modelled. For this assignment, Deep Neural Networks (DNNs) will be used as the acoustic model. You will implement DNNs using TensorFlow in Python, and specifically using the higher-level Keras functions. The scripts from Lab 4 show how to partition a dataset into training/validation sets, and how to use those sets to train, evaluate and optimise a simple DNN. You should attempt to optimise the validation accuracy of your network by varying its hyperparameters (including but not limited to changing the number of hidden layers and filters). You should use 2D convolutional layers in your network, although you are free to explore different layer types if you have time.
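As an illustration only, a small Keras model of the kind described above might be sketched as follows; the input shape, layer sizes, number of classes and training settings are assumptions that you will need to tune on your validation set.

import tensorflow as tf
from tensorflow import keras

def build_model(n_frames, n_ceps, n_classes=20):
    """Small 2D-convolutional classifier over (time x cepstral) feature 'images'."""
    model = keras.Sequential([
        keras.Input(shape=(n_frames, n_ceps, 1)),
        keras.layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    # Integer class labels (0..n_classes-1) are assumed for the loss below.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_model(n_frames=100, n_ceps=13)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30)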

4.         Noise compensation

Noise can be added to the clean speech samples to create noisy speech which is more representative of real-world use of speech recognition systems. Adding noise to the speech will reduce the recognition accuracy and increase confusions. To mitigate this, some form of noise compensation may be needed. Different methods can be tested, such as applying spectral subtraction within the feature extraction process or training the speech models on noisy speech (matched models). The effect of different types of noise and of different signal-to-noise ratios (SNRs) can also be investigated.
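One simple way to create the noisy material is to mix a noise recording into the clean speech at a chosen SNR. The sketch below illustrates this; the function name, variable names and the 10 dB example are assumptions.

import numpy as np

def add_noise(clean, noise, snr_db):
    """Scale `noise` so that the mixture has the requested SNR in dB."""
    # Tile or trim the noise to match the length of the clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)].astype(np.float64)
    clean = clean.astype(np.float64)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose a gain g so that clean_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

# noisy = add_noise(clean_signal, factory_noise, snr_db=10)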

5.         Testing and evaluation

Once you have evaluated your network using your validation data, training of the speech recogniser is complete and it can now be tested. Testing involves passing a new speech file (in the same feature format as the training data) to the speech recogniser and letting it recognise the speech. You should be able to pass up to 10 separate files, each containing an isolated name, and your network should produce a corresponding list of names recognised in those files.

Therefore, a new set of speech files should be collected (for example a further 10 or 20 examples of each word in the vocabulary) and input into the speech recogniser. You should use Python to compare the recogniser’s classifications to the true labels of the test files. You should be able to report the classification accuracy and present a confusion matrix that shows which word confusions took place.
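A possible sketch of this scoring step, using scikit-learn and matplotlib, is given below; the variables y_true and y_pred are assumed to hold the reference labels and the recogniser's outputs for the test files.

from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

accuracy = accuracy_score(y_true, y_pred)
print(f"Word accuracy: {100 * accuracy:.1f}%")

# Confusion matrix: rows are the true names, columns are the recognised names.
names = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=names)
ConfusionMatrixDisplay(cm, display_labels=names).plot(xticks_rotation=90)
plt.show()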

Within the evaluation you can examine the effects of different configurations of the feature extraction. This may include different numbers of filterbank channels, different spacing of the channels, etc. You should be able to explain the effects on training/validation data loss and accuracy of changing the neural network architecture and hyperparameters, as discussed in the Acoustic modelling section above. You may also want to test your speech recogniser in noisy conditions (for example, factory noise, babble noise, etc) and under different signal-to-noise ratios (SNRs) to examine how the noise affects recognition accuracy. For all tests, be prepared to explain what is happening and why you think this is the case.

Group work

This work is to be undertaken in pairs. You must find your own partner and you should do this as soon as possible. In the coming weeks you will be asked to provide the names of both people in your pair.

In order to be successful in your pair, proper project planning and progress monitoring are vital. Good practice for undertaking a project such as this will be discussed in the lectures.

Relationship to formative assessment

Formative assessment takes place during all lab classes through discussion of your analysis, designs and implementations. These labs underpin the coursework and relate directly to the different parts.

Deliverables

The assessment covers one part and represents one of the assessed components of CMP-6026A:

Practical demonstration of the recogniser, and discussion of your design decisions and results (CW1)

The practical demonstration will take place in a lab. In the practical demonstration you will be asked to say a sequence of names that you will then decode using your speech recogniser. You will also be expected to discuss your system and justify design decisions related to data collection, design, implementation, and evaluation of your speech recogniser. You will present, by way of a slideshow of no more than 10 minutes, an evaluation of the speech recogniser in terms of its performance with different configurations and test conditions (see point 5 above). One member of your group will submit your slideshow on Blackboard as a group submission. Each group will have up to 25 minutes for the demonstration, in total.

Both group members will also submit a document providing your opinions of how the marks should be shared between your group members. This should be expressed as a percentage share for each group member (e.g. 50% Dave, 50% Sarah). We will use this to determine the distribution of marks in your group, following CMP’s Policy on Group Work. You should ensure that both people in the pair make a roughly equal contribution to the work, so that marks are shared fairly.

Resources

You will need to use audio/visual recording equipment/software and Python, as used in the lab classes. These resources have been introduced in the lectures and lab classes.

There will be a briefing session for this coursework in Week 2.

Marking scheme

Marks will be allocated as follows for the assessed component:

CW1: Demonstration and discussion (100%)

•    Speech collection and annotation methodology (10%)

•    Design and justification of feature extraction (20%)

•    Acoustic modelling and noise compensation (10%)

•    Short presentation evaluating the performance of the speech recogniser under different conditions (30%)

•    Discussion/question answering (30%)

Note – We will follow CMP’s Policy on Group Work (Made available to you on Blackboard) to allocate individual marks for this assignment. In summary, each group member’s estimation of individual contribution will be used to determine individual marks. It is expected that both people in the pairing will make an equal contribution to the work and the demonstration, so that marks can be awarded fairly.


