#MBA_Basics.R
#Market Basket Analysis:
#Basics:
#install.packages("arules")
#install.packages("arulesViz")
library(arules)
library(arulesViz)
#Reading transaction data----
groceries <- read.transactions("F:/rWork/rProjects/AbhiAnalytics/groceries.csv", sep = ",")
#Exploring data----
#Summary of the imported transaction data
summary(groceries)
#summary() reports details such as: 2513 of the 9835 transactions contain whole milk,
#while 1809 of the 9835 transactions contain rolls/buns.
#There are 2159 transactions that contain only 1 item,
#and only 1 transaction that contains 32 unique items.
#Using inspect() on the sparse matrix,
#the first 5 transactions can be listed as shown below.
inspect(groceries[1:5])
#Using itemFrequency() on the sparse matrix, the first three items (columns are ordered
#alphabetically by item name) are shown with their respective occurrence frequencies.
itemFrequency(groceries[, 1:3])
#Using itemFrequencyPlot() on the sparse matrix, the relative frequency of
#each item can be seen as a bar chart. Setting support = 0.1 means that only
#items appearing in at least 10% of transactions are shown in the bar chart.
itemFrequencyPlot(groceries, support = 0.1)
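#A minimal sketch of the same 10% cut-off in numeric form (not part of the original
#analysis; item_support and frequent_items are illustrative names). itemFrequency()
#returns the relative frequency of every item, so filtering it at 0.1 should list
#the items that the plot above draws.
item_support <- itemFrequency(groceries)
frequent_items <- sort(item_support[item_support >= 0.1], decreasing = TRUE)
frequent_items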
#itemFrequencyPlot() also accepts a "topN" argument,
#which limits the plot to the N most frequent items in the groceries data.
#The items are arranged in descending order of frequency by default.
itemFrequencyPlot(groceries, topN = 20)
#Data Model creation----
#Calling apriori() on the groceries data with minimum support and
#confidence thresholds and a minimum number of items per rule (minlen)
#creates a rules object, groceryrules.
groceryrules <- apriori(groceries,
                        parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
#The rule set contains 463 rules, each with at least 2 items.
groceryrules
#Model Performance Evaluation----
#summary() gives an overview of the rules object.
summary(groceryrules)
#It reports a total of 463 rules generated under the thresholds passed
#to apriori() above, broken down by rule length.
#There are 150 rules with 2 items, 297 rules with 3 items and 16 rules with 4 items.
#Use inspect() to examine the first three rules:
inspect(groceryrules[1:3])
#In the first rule, the lift shows that customers who purchased potted plants are 1.56 times
#more likely to purchase whole milk than a typical customer. A large lift value is a strong
#indicator that a rule is important and reflects a true connection between the items in the rule.
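#A minimal sketch of how lift is calculated, using a small made-up basket list so the
#arithmetic can be checked by hand (toy_baskets and the toy_* names are hypothetical
#and are not part of the groceries data).
toy_baskets <- list(c("herbs", "root vegetables"),
                    c("herbs", "root vegetables"),
                    c("herbs"),
                    c("root vegetables"),
                    c("whole milk"))
#Flag which baskets contain each side of the rule {herbs} => {root vegetables}
toy_has_lhs <- sapply(toy_baskets, function(basket) "herbs" %in% basket)
toy_has_rhs <- sapply(toy_baskets, function(basket) "root vegetables" %in% basket)
#Support: share of baskets containing both sides (2 of 5)
toy_support <- mean(toy_has_lhs & toy_has_rhs)
#Confidence: share of herb baskets that also contain root vegetables (2 of 3)
toy_confidence <- toy_support / mean(toy_has_lhs)
#Lift: confidence divided by the overall share of root vegetable baskets (3 of 5),
#that is (2/3) / (3/5) = 10/9
toy_lift <- toy_confidence / mean(toy_has_rhs)
toy_lift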
#Model Performance Improvement----
#1.One way to examine the rule set is to look at the rules with the highest lift, because
#a large lift value indicates a strong connection between items that may not have been obvious beforehand.
#Example: the rule with the highest lift in this rule set associates herbs with root vegetables;
#customers who bought herbs are almost 4 times more likely to purchase root vegetables than a typical customer.
inspect(sort(groceryrules, by = "lift")[1:5])
#2.Another way to examine the rules is to look at the specific subset of rules that
#contain {berries}. This can be useful if one were asked to create an advertisement to promote
#berries in a particular season.
berryrules <- subset(groceryrules, items %in% "berries")
inspect(berryrules)
#Looking at the rules related to {berries} with the highest lift values, one can see
#that customers who bought berries are 3.8 times more likely to buy whipped/sour cream and
#2.3 times more likely to buy yogurt than a typical customer. This suggests
#that berries with cream or yogurt may be a popular dessert combination.
#Saving rules----
#The rules object can be saved as a CSV file or converted to a data frame for further analysis.
write(groceryrules, file = "F:/rWork/rProjects/AbhiAnalytics/groceryrules.csv",
sep = ",", quote = TRUE, row.names = FALSE)
groceryrules_df <- as(groceryrules, "data.frame")
str(groceryrules_df)
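#A possible follow-up, assuming the data frame keeps the lift column that
#as(groceryrules, "data.frame") normally produces: rank the rules with base R.
#top_lift_df is an illustrative name, not part of the original analysis.
top_lift_df <- groceryrules_df[order(groceryrules_df$lift, decreasing = TRUE), ]
head(top_lift_df, 5)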
#Scatter Plot----
#The default plot() method creates a scatter plot.
#It plots support on the x axis and confidence on the y axis, and
#represents lift by the shading (color intensity) of the points.
library(arulesViz)
plot(groceryrules)
#Network Plot----
#We can create a network plot of the first 10 rules by setting the method
#argument to 'graph'.
plot(groceryrules[1:10], method = 'graph')