
COMP3221

Assignment 2: Federated Learning

Due: April 19th, 2024 (Friday, Week 8), by 11:59 PM

The main goal of this assignment is to implement a simple Federated Learning (FL) system. This project can be done in a group of two students, with only one team member submitting the work on behalf of the group. You need to register a group on the CANVAS page: COMP3221 → People → Group - A2.

1 Learning Objectives

Figure 1: An example of a Federated Learning system with 1 server and 5 clients.

Your assignment involves developing a Federated Learning (FL) system that consists of one server and five clients as depicted in Fig. 1. Each client possesses its own private dataset used for training a local model, and then contributes its local model to the server in order to build a global model.

On completing this assignment you will gain practical knowledge in:

• Federated Learning principles: Understand how to scale machine learning across multiple devices while ensuring data privacy and minimizing central data storage needs.

• Client-Server programming: Master the basics of network communications using sockets, including setting up connections, designing protocols, and handling network issues.

• Machine learning programming: Learn to implement, train, and evaluate machine learning models, focusing on practical aspects such as data handling, model and performance optimization.

2 Assignment Guidelines

2.1 Simulation Environment

Due to the unavailability of a physical network for deployment, you will simulate the FL system on a single computer for both implementation and evaluation purposes. This simulation requires running separate instances of your program for each entity in the client-server architecture, using 'localhost' for communication. Specifically, each entity, including every client and the server, will be run in a different terminal window on your machine.

2.2 Federated Learning Algorithm

Algorithm 1 Federated Averaging (FedAvg)

 1: parameters: K is the number of clients, E is the number of local epochs,
    n_k is the local data size of client k, n is the total data size of the K clients.
 2: procedure SERVERUPDATE                                    ▷ Run on server
 3:     Generate w_0 randomly
 4:     for t from 0 to T do
 5:         Server broadcasts global model w_t to the K clients
 6:         for each client k ∈ K in parallel do
 7:             w_{t+1}^(k) ← CLIENTUPDATE(k, w_t)
 8:         end for
 9:         Server receives the new local models from the K clients and randomly
10:         selects a subset of M clients in K, M ≤ K, to aggregate the new global model:
11:             w_{t+1} = Σ_{k ∈ M} (n_k / n) · w_{t+1}^(k)
12:     end for
13: end procedure
14: procedure CLIENTUPDATE(k, w_t)                            ▷ Run on client k
15:     for e from 1 to E do
16:         Client k updates its local model w_{t+1}^(k) starting from the global
17:         model w_t, using GD or Mini-Batch GD
18:     end for
19:     Client k sends the new local model w_{t+1}^(k) to the server
20: end procedure

In this assignment, we will use the Federated Averaging (FedAvg) algorithm, a key approach in Federated Learning where client devices collaboratively train a model by computing updates locally and averaging these updates on a central server to improve the global model. The workings of FedAvg are elaborated in Algorithm 1. Here, K represents the total number of clients participating in the training process. T is the total number of global communication rounds between the clients and the server. w_t refers to the global model's parameters at iteration t, while w_{t+1}^(k) denotes the local model's parameters of client k at iteration t + 1. E is the number of local epochs, i.e., the number of times each client goes through its entire dataset to train the model locally before sending updates to the global model. For local model training, clients can use either Gradient Descent (GD) or Mini-Batch GD as the optimization method.
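The aggregation step on line 11 of Algorithm 1 is a data-size-weighted average of the selected local models. A minimal sketch in Python (using NumPy; the function name and the flat-vector model representation are assumptions made for illustration):

```python
import numpy as np

def fedavg_aggregate(local_models, data_sizes, total_size):
    """w_{t+1} = sum over selected clients k of (n_k / n) * w_{t+1}^(k),
    where n is the total data size of all K clients (Algorithm 1, line 11)."""
    w_new = np.zeros_like(local_models[0], dtype=float)
    for w_k, n_k in zip(local_models, data_sizes):
        w_new += (n_k / total_size) * w_k
    return w_new

# Two selected clients out of a system whose total data size is 4.
w = fedavg_aggregate([np.array([1.0, 2.0]), np.array([3.0, 4.0])],
                     data_sizes=[1, 3], total_size=4)
print(w)  # [2.5 3.5]
```

Note that when subsampling is active (M < K), the weights n_k / n are taken over the selected subset exactly as the formula states.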

2.3 Dataset and Model

Figure 2: Samples of the California Housing Dataset.

For this assignment, we work with the California Housing Dataset, a widely recognized dataset used in machine learning for predicting house prices based on various features. This dataset contains 20640 data samples, each of which includes 8 features (median income, housing median age, average rooms, average bedrooms, population, average occupancy, latitude, and longitude) and 1 target variable (median house value) for different blocks in California. The dataset is insightful for understanding how house values vary by location and other factors.

To simulate an FL environment that reflects the heterogeneous nature of real-world data, we have distributed the dataset across K = 5 clients. Each client receives a portion of the dataset, varying in size, to mimic the diversity in data distribution one might encounter in practical FL scenarios. The federated dataset is prepared and accessible in FLData.zip, available for download on the page CANVAS → Assignment 2. For every client, we provide two CSV files: one for the training set and one for the testing set. For instance, the training and testing data for Client 1 are named "calhousing_train_client1.csv" and "calhousing_test_client1.csv", respectively.
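Each client's first step is to read its two CSV files into a feature matrix and a target vector. A hedged sketch using pandas (the column names and the helper name are assumptions; check the actual headers in the files from FLData.zip):

```python
import io
import pandas as pd

def load_client_split(csv_source, target_col="MedHouseVal"):
    """Split one client CSV into a feature matrix X and target vector y.
    The target column name is an assumption about the provided files."""
    df = pd.read_csv(csv_source)
    X = df.drop(columns=[target_col]).to_numpy()
    y = df[target_col].to_numpy()
    return X, y

# Tiny in-memory stand-in for a file such as "calhousing_train_client1.csv".
sample = io.StringIO("MedInc,HouseAge,MedHouseVal\n8.3,41,4.5\n7.2,21,3.5\n")
X_train, y_train = load_client_split(sample)
print(X_train.shape, y_train.shape)  # (2, 2) (2,)
```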

Considering the objective is a regression problem focused on predicting house values, a Linear Regression model is apt for this task.  It efficiently models the correlation between house features and their prices.  The ultimate goal is to train a Linear Regression model optimized across the distributed datasets.

2.4 Program Structure

This assignment involves developing two main programs: one for the server and another for the clients.  It’s essential to start the server program before running any client programs to ensure proper communication and data exchange.

2.4.1 Server

The server program, named COMP3221_FLServer.py, requires two command-line arguments for execution as follows.

python COMP3221_FLServer.py <Port-Server> <Sub-Client>

• <Port-Server>: The port number on which the server listens for incoming model updates from the clients. For this assignment, it is set to 6000.

• <Sub-Client>: An integer value that determines whether client subsampling is enabled. A value of 0 means no subsampling, so the server aggregates models from all clients. A value of M (0 < M < K) activates subsampling, where the server randomly aggregates models from only M out of the K clients.

Example usage:

python COMP3221_FLServer.py 6000 2
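Validating these two arguments up front keeps the rest of the server simple. A minimal sketch (the helper name and error messages are assumptions; K = 5 is fixed by the assignment):

```python
import sys

def parse_server_args(argv, num_clients=5):
    """Validate <Port-Server> <Sub-Client> from the command line."""
    if len(argv) != 3:
        raise SystemExit("Usage: python COMP3221_FLServer.py <Port-Server> <Sub-Client>")
    port, sub = int(argv[1]), int(argv[2])
    # 0 disables subsampling; otherwise M must satisfy 0 < M < K.
    if sub != 0 and not (0 < sub < num_clients):
        raise SystemExit("Sub-Client must be 0 or M with 0 < M < K")
    return port, sub

port, sub_clients = parse_server_args(["COMP3221_FLServer.py", "6000", "2"])
print(port, sub_clients)  # 6000 2
```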

2.4.2 Client

The client program, named COMP3221_FLClient.py, accepts the following command-line arguments:

python COMP3221_FLClient.py <Client-id> <Port-Client> <Opt-Method>

• Client-id: The identifier for a client in the FL network, indexed sequentially as client1, client2, client3, client4, and client5.

• Port-Client:  The port number used by the client to receive model updates from the server. Port numbers are assigned starting at 6001 for client1 and increment by one for each subsequent client, up to 6005 for client5.

• Opt-Method:  The optimization method used for local model training.  A value of 0 selects Gradient Descent (GD), and a value of 1 selects Mini-Batch GD.

Example Usage:

python COMP3221_FLClient.py client1 6001 1

2.5 Assignment Tasks

2.5.1 Server

Following the FedAvg algorithm (Alg. 1), the server initially generates a global Linear Regression model with random parameters, denoted as w_0. It then starts listening for initial connection requests ("hand-shaking messages") from clients wanting to join the Federated Learning (FL) system. These messages should include information about their data size and ID, giving the server insight into the participating clients.

Once the server receives a handshake message from one client, it will continue to wait for 30 seconds to allow more client registrations. This waiting period occurs only once at the server's startup, ensuring a sufficient number of clients are involved in the training process. Following this, the server broadcasts the global model to all registered clients and waits for the return of their new local models for aggregation.
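The 30-second registration window can be kept separate from the socket code as a small piece of timing logic: the countdown starts at the first handshake and registration closes once it elapses. A sketch under that assumption (poll_handshake is a hypothetical non-blocking callable returning a (client_id, data_size) pair or None; the window is shortened here only for the demo call):

```python
import time

def run_registration_window(poll_handshake, window=30.0):
    """Collect handshakes; the one-off countdown starts when the FIRST
    client registers, and registration closes `window` seconds later."""
    registered = {}
    deadline = None
    while deadline is None or time.monotonic() < deadline:
        msg = poll_handshake()
        if msg is not None:
            client_id, data_size = msg
            registered[client_id] = data_size
            if deadline is None:  # timer starts at the first handshake
                deadline = time.monotonic() + window
    return registered

# Simulated handshake queue standing in for real socket receives.
queue = [("client1", 3000), ("client2", 5000)]
clients = run_registration_window(lambda: queue.pop(0) if queue else None,
                                  window=0.05)
print(sorted(clients))  # ['client1', 'client2']
```

In the real server, poll_handshake would wrap a non-blocking socket receive, and clients arriving after the window closes are handled by the late-registration rule described below.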

After receiving local models from every registered client, the server aggregates these models to update a new global model. Depending on the specific configuration, this aggregation may involve models from all clients or just a selected subset of M < K clients. Once updated, the server broadcasts this new global model to all registered clients, marking the completion of one global communication round.

In this assignment, the FL system will run for T global communication rounds.  Upon com- pleting these rounds, the server will broadcast a "finish message" to all clients, signaling them to stop the training process.

For each global round, the server will print out the following output to the terminal:

Global Iteration 10:
Total Number of clients: 5
Getting local model from client 1
Getting local model from client 3
Getting local model from client 5
Getting local model from client 4
Getting local model from client 2
Aggregating new global model
Broadcasting new global model

The server is responsible for managing clients and should keep a list that contains information about registered clients. If a new client attempts to register after the server has completed its initialization phase, the server will add this client’s information to the current client list and share the global model with them in the next global communication round.

2.5.2 Client

Upon starting up, each client loads its own dataset and registers with the server by sending a hand-shaking message. Once the global model is received, the client first evaluates this model using its local test data. After that, it utilizes this global model as an initial point to train an updated local model. The local training runs for E local epochs using an optimization method such as GD or Mini-Batch GD. Subsequently, the client sends the newly trained local model to the server and waits to receive the next iteration of the global model.
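The ClientUpdate step can be sketched as E epochs of gradient steps on the MSE loss of a linear model, where batch_size=None gives full-batch GD and a positive value gives Mini-Batch GD (the function signature and the bias-free model y_hat = X @ w are illustrative assumptions, not the required implementation):

```python
import numpy as np

def client_update(w_global, X, y, epochs=2, lr=0.05, batch_size=None, seed=0):
    """E epochs of GD (batch_size=None) or Mini-Batch GD, starting from
    the received global model, on the MSE loss of y_hat = X @ w."""
    rng = np.random.default_rng(seed)
    w = w_global.copy()
    n = len(y)
    step = n if batch_size is None else batch_size
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each local epoch
        for start in range(0, n, step):
            batch = idx[start:start + step]
            grad = 2.0 / len(batch) * X[batch].T @ (X[batch] @ w - y[batch])
            w -= lr * grad
    return w

# Noiseless synthetic data whose true weights are [2, -1].
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])
w_local = client_update(np.zeros(2), X, y, epochs=50, batch_size=32)
print(np.round(w_local, 2))  # ≈ [ 2. -1.]
```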

During each global communication round, the client outputs the following to the terminal:

I am client 1
Received new global model
Testing MSE: 0.0052
Local training...
Training MSE: 0.0012
Sending new local model

Additionally, it logs the training and testing Mean Square Error (MSE) results for each round in a file named _log.txt, serving as a means for later evaluation.
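Appending the two MSE values each round can be handled by a small helper (the line format and the client1_log.txt name pattern are assumptions; the spec only fixes the _log.txt suffix and the per-round train/test MSE content):

```python
import os
import tempfile

def log_round(log_path, train_mse, test_mse):
    """Append one global round's training and testing MSE to the log file."""
    with open(log_path, "a") as f:
        f.write(f"{train_mse:.4f},{test_mse:.4f}\n")

# Demo against a temporary file standing in for e.g. client1_log.txt.
path = os.path.join(tempfile.mkdtemp(), "client1_log.txt")
log_round(path, 0.0012, 0.0052)
log_round(path, 0.0010, 0.0049)
print(open(path).read().splitlines())  # ['0.0012,0.0052', '0.0010,0.0049']
```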

Important Notes:

• You have the flexibility to define the format of hand-shaking messages and data packets used for model exchange between the server and the clients.

• You are allowed to use Machine Learning libraries (e.g., Scikit-learn, PyTorch) that were introduced during the tutorials for your implementation.

• You have the flexibility to select the values for input parameters, including the number of training rounds (T), the number of epochs (E), the learning rate, and the batch size. By adjusting these parameters, you can optimize the training process to ensure that your model achieves high performance.

3 Report and Submission

3.1 Report

Your submission must include a report document that concisely describes your work within a strict limit of no more than 3 pages, with the exception of the references section, which may extend beyond this page limit.

Your report should present your understanding of the algorithm, model, and dataset, alongside your approach to implementation. It should also include an insightful discussion on the experiment outcomes and a comparative analysis evaluating the performance of the global model across various scenarios.

Here are some example scenarios you can explore in your experiments:

1. Evaluate the performance of the global Linear Regression model across each client’s test dataset.

2. Examine the differences in utilizing Gradient Descent (GD) versus Mini-Batch GD, considering various batch sizes and learning rates.

3. Analyze the impact of subsampling a subset of clients (M < K) compared to involving all clients (M = K) in the training process.

We recommend using figures and tables to visualize the experimental results. For example, you can demonstrate the convergence of training and testing MSE over iterations to provide a clear representation of the model’s performance improvement over time.

3.2 Submission Files

You are required to submit your source code and a short report to CANVAS.

• Code (a zipped archive containing all your code files; no need to submit the data files): SSID_COMP3221_Code.zip.

• Code Text (a single .txt file that includes all implementation code, for plagiarism checking): SSID_COMP3221_Code.txt.

• Readme (A detailed .txt file that outlines the coding environment, version of packages used, instructions to run your program, and commands to reproduce the experimental results.)

SSID_COMP3221_Readme.txt.

• Report (A .pdf file that includes all content required in the report section) SSID_COMP3221_Report.pdf.

Note that you must upload your submission BEFORE the deadline. CANVAS will continue accepting submissions after the due date; however, late submissions incur a penalty per day, with a maximum of 5 days of late submission allowed.

4 Academic Honesty / Plagiarism

By uploading your submission to CANVAS  you implicitly agree to abide by the University policies regarding academic honesty, and in particular that all the work is original and not plagiarised from the work of others.  If you believe that part of your submission is not your work you must bring this to the attention of your tutor or lecturer immediately. See the policy slides released in Week 1 for further details.

In assessing a piece of submitted work, the School of Computer Science may reproduce it entirely, may provide a copy to another member of faculty, and/or communicate a copy of this assignment to a plagiarism checking service or in-house computer program.  A copy of the assignment may be maintained by the service or the School of Computer Science for the purpose of future plagiarism checking.

5 Marking

This assignment contributes 15% to your final grade for this unit of study. The distribution of marks between the assignment components is as follows.

• Code: 70%.

• Report: 30%.

Please refer to the rubric in Canvas (COMP3221 → Assignment → Assignment 2 → Assignment 2 - Rubric) for the detailed marking scheme.



