
COMP3221

Assignment 2: Federated Learning

Due: April 19th, 2024 (Friday, Week 8), by 11:59 PM

The main goal of this assignment is to implement a simple Federated Learning (FL) system. This project can be done in a group of two students, with only one team member submitting the work on behalf of the group. You need to register a group on the CANVAS page: COMP3221 → People → Group - A2.

1 Learning Objectives

Figure 1: An example of a Federated Learning system with 1 server and 5 clients.

Your assignment involves developing a Federated Learning (FL) system that consists of one server and five clients as depicted in Fig. 1. Each client possesses its own private dataset used for training a local model, and then contributes its local model to the server in order to build a global model.

On completing this assignment you will gain practical knowledge in:

• Federated Learning principles: Understand how to scale machine learning across multiple devices while ensuring data privacy and minimizing central data storage needs.

• Client-Server programming: Master the basics of network communications using sockets, including setting up connections, designing protocols, and handling network issues.

• Machine learning programming: Learn to implement, train, and evaluate machine learning models, focusing on practical aspects such as data handling, model and performance optimization.

2 Assignment Guidelines

2.1 Simulation Environment

Due to the unavailability of a physical network for deployment, you will simulate the FL system on a single computer for both implementation and evaluation purposes. This simulation requires running separate instances of your program for each entity in the client-server architecture, using 'localhost' for communication. Specifically, each entity, including every client and the server, will be run in a different terminal window on your machine.

2.2 Federated Learning Algorithm

Algorithm 1 Federated Averaging (FedAvg)

 1: parameters: K is the number of clients, E is the number of local epochs,
    n_k is the local data size of client k, n is the total data size of the K clients.
 2: procedure SERVERUPDATE                                    ▷ Run on server
 3:     Generate w_0 randomly
 4:     for t from 0 to T do
 5:         Server broadcasts global model w_t to the K clients
 6:         for each client k ∈ K in parallel do
 7:             w_{t+1}^(k) ← CLIENTUPDATE(k, w_t)
 8:         end for
 9:         Server receives the new local models from the K clients and randomly
10:         selects a subset of M clients in K, M ≤ K, to aggregate the new global model:
11:             w_{t+1} = Σ_{k ∈ M} (n_k / n) · w_{t+1}^(k)
12:     end for
13: end procedure
14: procedure CLIENTUPDATE(k, w_t)                            ▷ Run on client k
15:     for e from 1 to E do
16:         Client k updates its local model w_{t+1}^(k) starting from the global
17:         model w_t, using GD or Mini-Batch GD
18:     end for
19:     Client k sends the new local model w_{t+1}^(k) to the server
20: end procedure

In this assignment, we will use the Federated Averaging (FedAvg) algorithm, a key approach in Federated Learning where client devices collaboratively train a model by computing updates locally and averaging these updates on a central server to improve the global model. The workings of FedAvg are elaborated in Algorithm 1. Here, K represents the total number of clients participating in the training process. T is the total number of global communication rounds between the clients and the server. w_t refers to the global model's parameters at iteration t, while w_{t+1}^(k) denotes the local model's parameters of client k at iteration t + 1. E is the number of local epochs, i.e., the number of times each client goes through its entire dataset to train the model locally before sending updates to the global model. For local model training, clients can use either Gradient Descent (GD) or Mini-Batch GD as the optimization method.
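The aggregation step on line 11 of Algorithm 1 is a data-size-weighted average of the selected local models. A minimal sketch in Python (using NumPy; the function name and the flat-vector model representation are assumptions made for illustration):

```python
import numpy as np

def fedavg_aggregate(local_models, data_sizes, total_size):
    """w_{t+1} = sum over selected clients k of (n_k / n) * w_{t+1}^(k),
    where n is the total data size of all K clients (Algorithm 1, line 11)."""
    w_new = np.zeros_like(local_models[0], dtype=float)
    for w_k, n_k in zip(local_models, data_sizes):
        w_new += (n_k / total_size) * w_k
    return w_new

# Two selected clients out of a system whose total data size is 4.
w = fedavg_aggregate([np.array([1.0, 2.0]), np.array([3.0, 4.0])],
                     data_sizes=[1, 3], total_size=4)
print(w)  # [2.5 3.5]
```

Note that when subsampling is active (M < K), the weights n_k / n are taken over the selected subset exactly as the formula states.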

2.3 Dataset and Model

Figure 2: Samples of the California Housing Dataset.

For this assignment, we work with the California Housing Dataset, a widely recognized dataset used in machine learning for predicting house prices based on various features. This dataset contains 20640 data samples, each of which includes 8 features (median income, housing median age, average rooms, average bedrooms, population, average occupancy, latitude, and longitude) and 1 target variable (median house value) for different blocks in California. The dataset is insightful for understanding how house values vary by location and other factors.

To simulate an FL environment that reflects the heterogeneous nature of real-world data, we have distributed the dataset across K = 5 clients. Each client receives a portion of the dataset, varying in size, to mimic the diversity in data distribution one might encounter in practical FL scenarios. The federated dataset is prepared and accessible in FLData.zip, available for download on the page CANVAS → Assignment 2. For every client, we provide two CSV files: one for the training set and one for the testing set. For instance, the training and testing data for Client 1 are named "calhousing_train_client1.csv" and "calhousing_test_client1.csv", respectively.
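Each client's first step is to read its two CSV files into a feature matrix and a target vector. A hedged sketch using pandas (the column names and the helper name are assumptions; check the actual headers in the files from FLData.zip):

```python
import io
import pandas as pd

def load_client_split(csv_source, target_col="MedHouseVal"):
    """Split one client CSV into a feature matrix X and target vector y.
    The target column name is an assumption about the provided files."""
    df = pd.read_csv(csv_source)
    X = df.drop(columns=[target_col]).to_numpy()
    y = df[target_col].to_numpy()
    return X, y

# Tiny in-memory stand-in for a file such as "calhousing_train_client1.csv".
sample = io.StringIO("MedInc,HouseAge,MedHouseVal\n8.3,41,4.5\n7.2,21,3.5\n")
X_train, y_train = load_client_split(sample)
print(X_train.shape, y_train.shape)  # (2, 2) (2,)
```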

Considering the objective is a regression problem focused on predicting house values, a Linear Regression model is apt for this task.  It efficiently models the correlation between house features and their prices.  The ultimate goal is to train a Linear Regression model optimized across the distributed datasets.

2.4 Program Structure

This assignment involves developing two main programs: one for the server and another for the clients.  It’s essential to start the server program before running any client programs to ensure proper communication and data exchange.

2.4.1 Server

The server program, named COMP3221_FLServer.py, requires two command-line arguments for execution as follows.

python COMP3221_FLServer.py <Port-Server> <Sub-Client>

• <Port-Server>: The port number on which the server listens for incoming model updates from the clients. For this assignment, it is set to 6000.

• <Sub-Client>: An integer value that determines whether client subsampling is enabled. A value of 0 means no subsampling, so the server aggregates models from all clients. A value of M (0 < M < K) activates subsampling, where the server randomly aggregates models from only M out of the K clients.

Example usage:

python COMP3221_FLServer.py 6000 2
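Validating these two arguments up front keeps the rest of the server simple. A minimal sketch (the helper name and error messages are assumptions; K = 5 is fixed by the assignment):

```python
import sys

def parse_server_args(argv, num_clients=5):
    """Validate <Port-Server> <Sub-Client> from the command line."""
    if len(argv) != 3:
        raise SystemExit("Usage: python COMP3221_FLServer.py <Port-Server> <Sub-Client>")
    port, sub = int(argv[1]), int(argv[2])
    # 0 disables subsampling; otherwise M must satisfy 0 < M < K.
    if sub != 0 and not (0 < sub < num_clients):
        raise SystemExit("Sub-Client must be 0 or M with 0 < M < K")
    return port, sub

port, sub_clients = parse_server_args(["COMP3221_FLServer.py", "6000", "2"])
print(port, sub_clients)  # 6000 2
```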

2.4.2 Client

The client program, named COMP3221_FLClient.py, accepts the following command-line arguments:

python COMP3221_FLClient.py <Client-id> <Port-Client> <Opt-Method>

• Client-id: The identifier for a client in the FL network, indexed sequentially as client1, client2, client3, client4, and client5.

• Port-Client:  The port number used by the client to receive model updates from the server. Port numbers are assigned starting at 6001 for client1 and increment by one for each subsequent client, up to 6005 for client5.

• Opt-Method:  The optimization method used for local model training.  A value of 0 selects Gradient Descent (GD), and a value of 1 selects Mini-Batch GD.

Example Usage:

python COMP3221_FLClient.py client1 6001 1

2.5 Assignment Tasks

2.5.1 Server

Following the FedAvg algorithm (Alg. 1), the server initially generates a global Linear Regression model with random parameters, denoted as w_0. It then starts listening for initial connection requests ("hand-shaking messages") from clients wanting to join the Federated Learning (FL) system. These messages should include information about their data size and ID, giving the server insight into the participating clients.

Once the server receives a handshake message from one client, it will continue to wait for 30 seconds to allow more client registrations. This waiting period occurs only once at the server's startup, ensuring a sufficient number of clients are involved in the training process. Following this, the server broadcasts the global model to all registered clients and waits for the return of their new local models for aggregation.
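The 30-second registration window can be kept separate from the socket code as a small piece of timing logic: the countdown starts at the first handshake and registration closes once it elapses. A sketch under that assumption (poll_handshake is a hypothetical non-blocking callable returning a (client_id, data_size) pair or None; the window is shortened here only for the demo call):

```python
import time

def run_registration_window(poll_handshake, window=30.0):
    """Collect handshakes; the one-off countdown starts when the FIRST
    client registers, and registration closes `window` seconds later."""
    registered = {}
    deadline = None
    while deadline is None or time.monotonic() < deadline:
        msg = poll_handshake()
        if msg is not None:
            client_id, data_size = msg
            registered[client_id] = data_size
            if deadline is None:  # timer starts at the first handshake
                deadline = time.monotonic() + window
    return registered

# Simulated handshake queue standing in for real socket receives.
queue = [("client1", 3000), ("client2", 5000)]
clients = run_registration_window(lambda: queue.pop(0) if queue else None,
                                  window=0.05)
print(sorted(clients))  # ['client1', 'client2']
```

In the real server, poll_handshake would wrap a non-blocking socket receive, and clients arriving after the window closes are handled by the late-registration rule described below.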

After receiving local models from every registered client, the server aggregates these models to update a new global model. Depending on the specific configuration, this aggregation may involve models from all clients or just a selected subset of M < K clients. Once updated, the server broadcasts this new global model to all registered clients, marking the completion of one global communication round.

In this assignment, the FL system will run for T global communication rounds.  Upon com- pleting these rounds, the server will broadcast a "finish message" to all clients, signaling them to stop the training process.

For each global round, the server will print out the following output to the terminal:

Global Iteration 10:
Total Number of clients: 5
Getting local model from client 1
Getting local model from client 3
Getting local model from client 5
Getting local model from client 4
Getting local model from client 2
Aggregating new global model
Broadcasting new global model

The server is responsible for managing clients and should keep a list that contains information about registered clients. If a new client attempts to register after the server has completed its initialization phase, the server will add this client’s information to the current client list and share the global model with them in the next global communication round.

2.5.2 Client

Upon starting up, each client loads its own dataset and registers with the server by sending a hand-shaking message. Once the global model is received, the client first evaluates this model using its local test data. After that, it utilizes this global model as an initial point to train an updated local model. The local training runs for E local epochs using an optimization method such as GD or Mini-Batch GD. Subsequently, the client sends the newly trained local model to the server and waits to receive the next iteration of the global model.
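The ClientUpdate step can be sketched as E epochs of gradient steps on the MSE loss of a linear model, where batch_size=None gives full-batch GD and a positive value gives Mini-Batch GD (the function signature and the bias-free model y_hat = X @ w are illustrative assumptions, not the required implementation):

```python
import numpy as np

def client_update(w_global, X, y, epochs=2, lr=0.05, batch_size=None, seed=0):
    """E epochs of GD (batch_size=None) or Mini-Batch GD, starting from
    the received global model, on the MSE loss of y_hat = X @ w."""
    rng = np.random.default_rng(seed)
    w = w_global.copy()
    n = len(y)
    step = n if batch_size is None else batch_size
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each local epoch
        for start in range(0, n, step):
            batch = idx[start:start + step]
            grad = 2.0 / len(batch) * X[batch].T @ (X[batch] @ w - y[batch])
            w -= lr * grad
    return w

# Noiseless synthetic data whose true weights are [2, -1].
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])
w_local = client_update(np.zeros(2), X, y, epochs=50, batch_size=32)
print(np.round(w_local, 2))  # ≈ [ 2. -1.]
```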

During each global communication round, the client outputs the following to the terminal:

I am client 1
Received new global model
Testing MSE: 0.0052
Local training...
Training MSE: 0.0012
Sending new local model

Additionally, it logs the training and testing Mean Square Error (MSE) results for each round in a file named _log.txt, serving as a means for later evaluation.
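Appending the two MSE values each round can be handled by a small helper (the line format and the client1_log.txt name pattern are assumptions; the spec only fixes the _log.txt suffix and the per-round train/test MSE content):

```python
import os
import tempfile

def log_round(log_path, train_mse, test_mse):
    """Append one global round's training and testing MSE to the log file."""
    with open(log_path, "a") as f:
        f.write(f"{train_mse:.4f},{test_mse:.4f}\n")

# Demo against a temporary file standing in for e.g. client1_log.txt.
path = os.path.join(tempfile.mkdtemp(), "client1_log.txt")
log_round(path, 0.0012, 0.0052)
log_round(path, 0.0010, 0.0049)
print(open(path).read().splitlines())  # ['0.0012,0.0052', '0.0010,0.0049']
```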

Important Notes:

• You have the flexibility to define the format of hand-shaking messages and data packets used for model exchange between the server and the clients.

• You are allowed to use Machine Learning libraries (e.g., Scikit-learn, PyTorch) that were introduced during the tutorials for your implementation.

• You have the flexibility to select the values for input parameters, including the number of training rounds (T), the number of epochs (E), the learning rate, and the batch size. By adjusting these parameters, you can optimize the training process to ensure that your model achieves high performance.

3 Report and Submission

3.1 Report

Your submission must include a report document that concisely describes your work within a strict limit of no more than 3 pages, with the exception of the references section, which may extend beyond this page limit.

Your report should present your understanding of the algorithm, model, and dataset, alongside your approach to implementation. It should also include an insightful discussion on the experiment outcomes and a comparative analysis evaluating the performance of the global model across various scenarios.

Here are some example scenarios you can explore in your experiments:

1. Evaluate the performance of the global Linear Regression model across each client’s test dataset.

2. Examine the differences in utilizing Gradient Descent (GD) versus Mini-Batch GD, considering various batch sizes and learning rates.

3. Analyze the impact of subsampling a subset of clients (M < K) compared to involving all clients (M = K) in the training process.

We recommend using figures and tables to visualize the experimental results. For example, you can demonstrate the convergence of training and testing MSE over iterations to provide a clear representation of the model’s performance improvement over time.

3.2 Submission Files

You are required to submit your source code and a short report to CANVAS.

• Code (a zipped archive containing all your code files; no need to submit the data files): SSID_COMP3221_Code.zip.

• Code Text (a single .txt file that includes all implementation code, for plagiarism checking): SSID_COMP3221_Code.txt.

• Readme (A detailed .txt file that outlines the coding environment, version of packages used, instructions to run your program, and commands to reproduce the experimental results.)

SSID_COMP3221_Readme.txt.

• Report (A .pdf file that includes all content required in the report section) SSID_COMP3221_Report.pdf.

Note that you must upload your submission BEFORE the deadline. CANVAS will continue accepting submissions after the due date; however, late submissions incur a penalty per day, with a maximum of 5 days of late submission allowed.

4 Academic Honesty / Plagiarism

By uploading your submission to CANVAS  you implicitly agree to abide by the University policies regarding academic honesty, and in particular that all the work is original and not plagiarised from the work of others.  If you believe that part of your submission is not your work you must bring this to the attention of your tutor or lecturer immediately. See the policy slides released in Week 1 for further details.

In assessing a piece of submitted work, the School of Computer Science may reproduce it entirely, may provide a copy to another member of faculty, and/or communicate a copy of this assignment to a plagiarism checking service or in-house computer program.  A copy of the assignment may be maintained by the service or the School of Computer Science for the purpose of future plagiarism checking.

5 Marking

This assignment contributes 15% to your final grade for this unit of study. The distribution of marks between the assignment components is as follows.

• Code: 70%.

• Report: 30%.

Please refer to the rubric in Canvas (COMP3221 → Assignment → Assignment 2 → Assignment 2 - Rubric) for the detailed marking scheme.



