ITEC 320

FINAL EXAM (PRACTICE)

Part 1: Multiple Choice Questions

1. When applied to new data points, logistic regression provides a column in the RapidMiner output called “Confidence(1).”  What does the number in that column tell us?

A) The probability that the new data point is similar to what we’ve observed in the original dataset
B) The probability that the outcome for the new data point will be 1
C) The accuracy of the logistic regression model
D) The probability that logistic regression was the correct model

2. When comparing different predictive methods for numeric outcomes, how do we determine which is the most accurate?

A) Select the method with the highest root mean squared error
B) Select the method with the lowest root mean squared error
C) Select the method with the highest classification accuracy
D) Select the method with the lowest classification accuracy
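
As a rough illustration of the metric named in question 2, the sketch below computes root mean squared error for two sets of predictions. The data and the names `method_a` and `method_b` are hypothetical and do not come from the exam.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical numeric outcomes and two hypothetical sets of predictions.
actual = [10.0, 12.0, 15.0, 11.0]
method_a = [11.0, 12.0, 14.0, 13.0]
method_b = [14.0, 9.0, 18.0, 8.0]

rmse_a = rmse(actual, method_a)
rmse_b = rmse(actual, method_b)
print(f"Method A RMSE: {rmse_a:.3f}")
print(f"Method B RMSE: {rmse_b:.3f}")
```

RMSE is expressed in the same units as the outcome being predicted, so the two printed values can be compared directly.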

3. A binary independent variable called SpecialOrder in a linear regression model for predicting ProcessingTime of orders (measured in days) has a coefficient of 3.36.  What does that number mean?

A) Each additional special order leads to an average increase in processing time of 3.36 days.
B) The level of significance of SpecialOrder is 3.36.
C) Special orders require an average of 3.36 days to process.
D) On average, special orders have processing times that are 3.36 days longer than regular orders.

4. When trying to figure out what predictive method will work best, all of the following are benefits of using cross validation EXCEPT:

A) Cross validation is often the best predictive method.
B) Cross validation enables each method to produce the same accuracy or error metric.
C) Cross validation provides measures of predictive accuracy rather than measures of fit.
D) Cross validation helps prevent overfitting.

5. The table below shows the performance of a classification model on our dataset.  What percentage of the model’s “1” predictions turned out to be correct?

A) 74.88%
B) 38.53%
C) 23.86%
D) 5.25%
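
The quantity asked about in question 5 (the share of "1" predictions that were correct) can be sketched as follows. The counts are hypothetical placeholders, not the values from the exam's table, and the function name is an assumption made for illustration.

```python
def share_of_correct_positive_predictions(true_positives, false_positives):
    """Fraction of all '1' predictions whose actual outcome was also 1."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("the model made no '1' predictions")
    return true_positives / predicted_positive


# Hypothetical counts: 30 predicted-1 cases were truly 1, 10 were truly 0.
share = share_of_correct_positive_predictions(true_positives=30, false_positives=10)
print(f"{share:.2%} of the '1' predictions were correct")
```

Only the cases the model predicted as 1 enter the denominator; cases predicted as 0 play no part in this measure.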

6.  Which operator in RapidMiner should be used to create a forecasting model for the time series shown in this line chart?

A) Exponential Smoothing
B) Apply Forecast
C) Holt-Winters
D) Decision Tree

Part 2: Problems

1.  (10 pts.) Why is it better to use a 5-period moving average to make predictions than it would be to either A) use the most recent value as your prediction, or B) use the average value for the whole time series as your prediction?
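
To make the three approaches in this problem concrete, here is a minimal sketch that produces each forecast for the next period. The series is hypothetical, and the function names are assumptions made for illustration.

```python
def most_recent_value_forecast(series):
    """Option A: use the latest observation as the prediction."""
    return series[-1]


def overall_average_forecast(series):
    """Option B: use the mean of the entire series as the prediction."""
    return sum(series) / len(series)


def moving_average_forecast(series, window=5):
    """Use the mean of the last `window` observations as the prediction."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Hypothetical series that drifts upward and ends on an unusually high value.
series = [20, 22, 21, 23, 30, 31, 29, 32, 30, 38]

latest = most_recent_value_forecast(series)
overall = overall_average_forecast(series)
moving = moving_average_forecast(series, window=5)
print(latest, overall, moving)
```

On this made-up series the three forecasts differ noticeably, which is a useful starting point for thinking about how each one responds to noise and to changes in level.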

2. (10 pts.) The classification tree below is used to predict whether or not a charity’s request for donations by mail will be successful (indicated by a 1).  The following independent variables are used:

previous_donor: a binary variable equal to 1 if the person has given to this charity before, and 0 if not
months_since_last_donation: for previous donors, the number of months since their last donation
income: the average household income of the person’s neighborhood

a) (5 pts.)  Does the classification tree predict that the following person will donate?
previous_donor = 1
months_since_last_donation = 6
income = $127,500

b) (6 pts.)  Briefly (1-2 sentences) explain the logic that this tree is using to make predictions.

3. (15 pts.) A publishing company is analyzing a dataset of its published books to try to figure out characteristics of a book that make it more or less likely to become a bestseller.  They have run a logistic regression model using four of these attributes as independent variables, and obtained the following results (the dependent variable is 1 if the book was a bestseller, and 0 if it was not):

a) (5 pts.) Which two of these attributes were significant?

b) (5 pts.) If a book has lots of action verbs, what effect does that have on the estimated probability that the book will be a bestseller?

c) (5 pts.) What does this logistic regression output tell us about the effect of the length of the book (in pages) on the probability that the book will be a bestseller?

4. (25 pts.) This problem is based on analysis of a dataset from a non-profit called Connect the Planet, which aims to develop infrastructure and help individual countries plan to improve their citizens’ internet access.  They believe that the two primary factors associated with a country’s internet usage are its economic productivity (GDP per capita) and its adult literacy rate, and are trying to develop a predictive model to capture these relationships.  The attribute being predicted is the country’s number of frequent internet users per 100 people.

a) (5 pts.) The screenshot below shows the subprocess within the Cross Validation operator.  Why are we getting an error?  What needs to be done to fix it?

b) (5 pts.) After fixing the issue from part a, we ran the process and got this result:

What does that 13.626 number mean (conceptually, not mathematically), and what should we do with it?

We have created the following linear regression model using this dataset, used in the next two questions:

c) (5 pts.) What is the relationship between a country’s adult literacy rate and its number of frequent internet users per 100 people?

d) (5 pts.) This regression model would predict that a country with a per capita GDP of $0 and an adult literacy rate of 0% would have -24.331 frequent internet users per 100 people.  Why does it give us an obviously incorrect prediction?

e) (5 pts.) RapidMiner’s linear regression output omits several pieces of information that we get when using Excel.  Identify one such number, and explain what it means.
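
For part d, the mechanics of the prediction can be sketched as below. Only the intercept of -24.331 comes from the question text; the two slope values are hypothetical placeholders, since the regression output is not reproduced here.

```python
INTERCEPT = -24.331      # stated in part d
GDP_COEF = 0.001         # hypothetical placeholder, not the exam's value
LITERACY_COEF = 0.4      # hypothetical placeholder, not the exam's value


def predict_users_per_100(gdp_per_capita, literacy_rate):
    """Linear regression prediction: intercept plus each coefficient times its input."""
    return INTERCEPT + GDP_COEF * gdp_per_capita + LITERACY_COEF * literacy_rate


# When both inputs are zero, every slope term vanishes and only the intercept is left.
at_zero = predict_users_per_100(gdp_per_capita=0, literacy_rate=0)
print(at_zero)
```

Whatever the actual coefficients are, an input of zero for both variables always returns the intercept, so it is worth asking whether any country in the dataset is close to that point.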

5. (15 pts.) This problem is based on a telecom company’s dataset containing all of its mobile plan customers from last month whose plans were due to expire at the end of the month.  The dataset includes, for each customer, the monthly cost of the customer’s plan (in $), the total quantity of data the customer used last month (in GB), and a binary variable indicating whether or not the customer still has a mobile plan with the company (1=Yes, 0=No).  If a customer still has a mobile plan with the company, it means that either they renewed their previous plan, or they changed to a different plan.  The company would like to be able to predict more accurately which customers are likely to remain and which are likely to leave.

a) (5 pts.) We ran cross validation using k-nearest neighbors with k=5, k=10, and k=30.  The overall accuracies of the models were:

k = 5: 68.04%
k = 10: 72.33%
k = 30: 73.46%

Of these three models, which is best at predicting whether customers will stay?

The company applied one of the models from the previous question to five customers whose plans are due to expire soon, and obtained the results shown below, used in parts b & c:

b) (5 pts.) How many of the five customers does the model predict will stay?

c) (5 pts.) A manager at the company believes that customers are likely to leave if they have low-cost plans and high data usage, because the company slows down these customers’ download speeds once their data usage exceeds a given threshold.  Do the results from applying k-nearest neighbors to these five customers support the manager’s claim?  Why or why not?
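
As background for this problem, the sketch below shows one simple way k-nearest neighbors can classify a customer from plan cost and data usage. It is an illustration under stated assumptions: the training rows are hypothetical, the function `knn_predict` is not a RapidMiner operator, and plain Euclidean distance is used without the attribute scaling that would usually be applied in practice.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training rows closest to `query`.

    `train` is a list of ((monthly_cost, data_gb), label) pairs.
    """
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical customers: ((monthly cost in $, data used in GB), stayed? 1/0).
train = [
    ((30, 2), 1),
    ((35, 3), 1),
    ((40, 4), 1),
    ((30, 20), 0),
    ((25, 18), 0),
]

low_usage_prediction = knn_predict(train, query=(32, 3), k=3)
high_usage_prediction = knn_predict(train, query=(28, 19), k=3)
print(low_usage_prediction, high_usage_prediction)
```

The prediction depends entirely on which existing customers sit closest to the new one, so changing k changes how many neighbors get a vote.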


