
INT303 Big Data Analysis - Coding Project 2: Loan Approval Prediction

Weightage: 100 points (30% of total course grade)

Due Date: 12 DEC

Submission: Submit your Jupyter Notebook (.ipynb) and a concise 1-2 page executive summary report via [Learning Mall/Submission Portal].

1. Introduction

In the world of finance, accurate and efficient loan approval decisions are paramount. Banks and financial institutions rely on robust data analysis and predictive models to assess applicant creditworthiness, mitigate risks, and optimize their lending portfolios. This project challenges you to step into the role of a Data Scientist at a burgeoning financial technology (FinTech) firm. Your task is to develop a machine learning model that predicts whether a loan application will be approved or rejected based on a comprehensive set of applicant data.

This project aims to solidify your understanding of the entire machine learning pipeline, from exploratory data analysis and preprocessing to model building, evaluation, and interpretation. You will be provided with a dataset containing various applicant attributes and their corresponding loan approval status.

2. Project Objectives

Upon completion of this project, you should be able to:

· Perform comprehensive Exploratory Data Analysis (EDA) to understand data distributions, identify potential issues, and derive insights.

· Implement effective data preprocessing techniques, including handling missing values, encoding categorical features, and scaling numerical features.

· Engineer new, meaningful features from existing ones to enhance model performance.

· Select and implement appropriate machine learning models for classification tasks.

· Evaluate model performance using various metrics and techniques.

· Interpret model results and explain the factors influencing loan approval decisions.

· Present your findings clearly and professionally in a technical report.

· Demonstrate proficiency in Python programming for data analysis and machine learning.

3. Dataset

You will be working with a dataset named loan_approval_dataset_copy.csv (a sample of the "architsharma01/loan-approval-prediction-dataset"). This dataset contains the following columns:

· loan_id: Unique identifier for each loan application.

· no_of_dependents: Number of dependents the applicant has.

· education: Applicant's education level (Graduate/Not Graduate).

· self_employed: Whether the applicant is self-employed (Yes/No).

· income_annum: Applicant's annual income.

· loan_amount: The requested loan amount.

· loan_term: The duration of the loan in years.

· cibil_score: Applicant's CIBIL credit score (a creditworthiness indicator).

· residential_assets_value: Value of residential assets.

· commercial_assets_value: Value of commercial assets.

· luxury_assets_value: Value of luxury assets.

· bank_asset_value: Value of bank assets.

· loan_status: The target variable, indicating whether the loan was 'Approved' or 'Rejected'.

Note: The provided CSV is a small sample. Assume you are working with a larger, more realistic version of this dataset where you may encounter missing values, outliers, and varying data distributions. Your solution should be scalable and robust enough to handle such real-world scenarios.
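As a starting point for the robustness the note asks for, here is a minimal sketch of a defensive loading step. The helper name `load_loans` and the inline sample rows are made up for illustration, and the stray whitespace in the headers and text values is an assumption about how a messier export might look, not a statement about the provided file.

```python
import io

import pandas as pd

# Made-up rows standing in for loan_approval_dataset_copy.csv (illustrative values only).
# Headers and text fields carry leading spaces, and one income value is missing.
SAMPLE_CSV = """loan_id, no_of_dependents, education, self_employed, income_annum, loan_amount, loan_term, cibil_score, residential_assets_value, commercial_assets_value, luxury_assets_value, bank_asset_value, loan_status
1,2, Graduate, No,9600000,29900000,12,778,2400000,17600000,22700000,8000000, Approved
2,0, Not Graduate, Yes,4100000,12200000,8,417,2700000,2200000,8800000,3300000, Rejected
3,3, Graduate, No,,29700000,20,506,7100000,4500000,33300000,12800000, Rejected
"""


def load_loans(source):
    """Hypothetical helper: read the CSV and normalize whitespace."""
    df = pd.read_csv(source)
    # Strip whitespace from column names so lookups like df["loan_status"] work.
    df.columns = df.columns.str.strip()
    # Strip whitespace from every non-numeric (text) column.
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = df[col].str.strip()
    return df


df = load_loans(io.StringIO(SAMPLE_CSV))

print(df.head())        # first few rows
print(df.dtypes)        # data types per column
missing = df.isna().sum()
print(missing)          # missing values per column
print(df.describe())    # summary statistics for numeric columns
```

In the notebook, `io.StringIO(SAMPLE_CSV)` would be replaced by the path to the actual CSV file.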

4. Project Tasks

Your submission should include a well-commented Jupyter Notebook and a separate executive summary report (PDF) summarizing your approach, findings, and recommendations.

Task 1: Exploratory Data Analysis (EDA) and Data Preprocessing (30 points)

1. Load and Initial Inspection: Load the dataset into a Pandas DataFrame. Display the first few rows, check data types, and identify missing values. Summarize key statistics.

2. Univariate Analysis: Analyze the distribution of each feature. For numerical features, create histograms and box plots. For categorical features, create bar plots. Describe your observations.

3. Bivariate Analysis: Explore the relationships between features, particularly their relationship with the loan_status target variable. Use appropriate visualizations (e.g., scatter plots, stacked bar plots, heatmaps).

4. Data Cleaning: Handle any identified missing values, outliers (if present), or inconsistencies. Justify your chosen methods.

5. Feature Engineering: Create at least two new, meaningful features that you believe could improve model performance. Explain your rationale.

6. Categorical Encoding: Convert all categorical features into numerical representations suitable for machine learning models (e.g., One-Hot Encoding, Label Encoding).

7. Feature Scaling: Apply appropriate scaling techniques (e.g., StandardScaler, MinMaxScaler) to numerical features.
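The cleaning, feature engineering, encoding, and scaling steps above can be sketched as a single scikit-learn pipeline. This is one possible arrangement rather than the required solution: the engineered features `total_assets` and `loan_to_income`, the helper `add_features`, the median and most-frequent imputation strategies, and the demo rows are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

ASSET_COLS = [
    "residential_assets_value",
    "commercial_assets_value",
    "luxury_assets_value",
    "bank_asset_value",
]


def add_features(df):
    """Hypothetical feature engineering: two illustrative derived columns."""
    out = df.copy()
    # Sum of all asset columns (NaN only if every asset value is missing).
    out["total_assets"] = out[ASSET_COLS].sum(axis=1, min_count=1)
    # Requested amount relative to income; zero income is treated as missing.
    income = out["income_annum"].replace(0, np.nan)
    out["loan_to_income"] = out["loan_amount"] / income
    return out


NUMERIC = [
    "no_of_dependents", "income_annum", "loan_amount", "loan_term",
    "cibil_score", *ASSET_COLS, "total_assets", "loan_to_income",
]
CATEGORICAL = ["education", "self_employed"]

preprocessor = ColumnTransformer(
    [
        # Numeric columns: fill gaps with the median, then standardize.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), NUMERIC),
        # Categorical columns: fill gaps with the mode, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), CATEGORICAL),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Made-up demo rows (illustrative values only), with one missing income.
demo = pd.DataFrame({
    "no_of_dependents": [2, 0, 3, 1],
    "education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "self_employed": ["No", "Yes", "No", "Yes"],
    "income_annum": [9.6e6, 4.1e6, np.nan, 8.2e6],
    "loan_amount": [2.99e7, 1.22e7, 2.97e7, 3.07e7],
    "loan_term": [12, 8, 20, 8],
    "cibil_score": [778, 417, 506, 467],
    "residential_assets_value": [2.4e6, 2.7e6, 7.1e6, 1.82e7],
    "commercial_assets_value": [1.76e7, 2.2e6, 4.5e6, 3.3e6],
    "luxury_assets_value": [2.27e7, 8.8e6, 3.33e7, 2.33e7],
    "bank_asset_value": [8.0e6, 3.3e6, 1.28e7, 7.9e6],
})

X = preprocessor.fit_transform(add_features(demo))
print(X.shape)
```

In a real workflow the preprocessor would be fitted on the training split only and then applied to the test split, so that imputation and scaling statistics do not leak information from the test data.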

 

Task 2: Model Development and Evaluation (40 points)

1. Data Splitting: Split your processed data into training and testing sets (e.g., 70% training, 30% testing).

2. Model Selection: Choose at least three different classification algorithms. Good candidates might include:

· Logistic Regression

· Decision Tree Classifier

· Random Forest Classifier

· Gradient Boosting Classifier (e.g., XGBoost, LightGBM)

· Support Vector Machine (SVM)

· K-Nearest Neighbors (KNN)

3. Model Training: Train your chosen models on the training data.

4. Hyperparameter Tuning: Implement a strategy to tune hyperparameters for each selected model (e.g., GridSearchCV, RandomizedSearchCV). Explain why hyperparameter tuning is important.

5. Model Evaluation: Evaluate the performance of each tuned model on the test set using various metrics. At a minimum, include:

· Accuracy

· Precision, Recall, F1-score (for both 'Approved' and 'Rejected' classes)

· ROC AUC Score

· Confusion Matrix

Provide a comparative analysis of the models based on these metrics, considering the business context (e.g., what kind of errors are more costly for a bank?).

6. Feature Importance (if applicable): For tree-based models, analyze and visualize feature importance. Discuss which features your model deems most crucial for loan approval prediction.
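The splitting, tuning, and evaluation steps above can be sketched as follows. This is a sketch under stated assumptions: the data comes from `make_classification` as a synthetic stand-in for the processed loan features, label 1 is assumed to mean 'Approved', and the three models, their small parameter grids, and the F1 scoring choice are illustrative rather than prescribed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 42

# Synthetic stand-in for the processed feature matrix and encoded target
# (assumption: 0 = 'Rejected', 1 = 'Approved').
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=SEED
)

# 70/30 split, stratified so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=SEED
)

# Each candidate is paired with a deliberately small, illustrative grid.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeClassifier(random_state=SEED), {"max_depth": [3, 5, None]}),
    "forest": (
        RandomForestClassifier(random_state=SEED),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training data only.
    search = GridSearchCV(model, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    best = search.best_estimator_

    pred = best.predict(X_test)
    proba = best.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, pred),
        "report": classification_report(
            y_test, pred, target_names=["Rejected", "Approved"]
        ),
        "model": best,
    }

for name, res in results.items():
    print(name, res["best_params"])
    print(res["report"])

# Impurity-based importances from the tuned random forest, one per feature.
importances = results["forest"]["model"].feature_importances_
print(importances)
```

Rows of each confusion matrix are true classes and columns are predicted classes, which makes it straightforward to read off the two error types when discussing which mistakes cost a bank more.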

Task 3: Executive Summary Report (20 points)

Write a 1-2 page executive summary report (in PDF format) that addresses the following:

1. Introduction: Briefly state the problem and the objective of your project.

2. Methodology: Summarize your data preprocessing steps, feature engineering choices, and the models you experimented with.

3. Key Findings: Present the performance of your best models using relevant metrics. Discuss the most important features.

4. Recommendations & Insights: Based on your analysis, what insights can you provide to the FinTech firm regarding loan approval? Which model would you recommend and why? Suggest potential improvements or next steps for future work.

5. Ethical Considerations (Bonus, 5 points): Briefly discuss any ethical considerations related to building and deploying such a loan approval model (e.g., bias, fairness, transparency).

Task 4: Code Quality and Documentation (10 points)

1. Code Readability: Your Jupyter Notebook should be well-structured, logical, and easy to follow.

2. Comments: Include appropriate comments to explain complex logic, choices, and reasoning.

3. Reproducibility: Ensure your notebook can be run from top to bottom without errors and produces consistent results.
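One common way to get consistent results across runs is to fix every source of randomness with a single seed. The sketch below illustrates the idea on synthetic data. The helper `run_once`, the seed value, and the model choice are assumptions for demonstration only.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # single seed reused everywhere randomness appears


def run_once(seed):
    """Hypothetical helper: one full split-train-score cycle with fixed seeds."""
    random.seed(seed)
    np.random.seed(seed)
    # Synthetic stand-in data; the real notebook would use the loan dataset.
    X, y = make_classification(n_samples=200, n_features=6, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)


first = run_once(SEED)
second = run_once(SEED)
print(first, second)  # the two accuracies should match
```

Passing `random_state` explicitly to every splitter and estimator matters more than the global seeds, because it keeps results stable even if cells are re-run out of order.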

 
