This project applies linear regression in Python.
 | Email | Address | Avatar | Avg. Session Length | Time on App | Time on Website | Length of Membership | Yearly Amount Spent |
---|---|---|---|---|---|---|---|---|
0 | [email protected] | 835 Frank Tunnel\nWrightmouth, MI 82180-9605 | Violet | 34.497268 | 12.655651 | 39.577668 | 4.082621 | 587.951054 |
1 | [email protected] | 4547 Archer Common\nDiazchester, CA 06566-8576 | DarkGreen | 31.926272 | 11.109461 | 37.268959 | 2.664034 | 392.204933 |
2 | [email protected] | 24645 Valerie Unions Suite 582\nCobbborough, D... | Bisque | 33.000915 | 11.330278 | 37.110597 | 4.104543 | 487.547505 |
3 | [email protected] | 1414 David Throughway\nPort Jason, OH 22070-1220 | SaddleBrown | 34.305557 | 13.717514 | 36.721283 | 3.120179 | 581.852344 |
4 | [email protected] | 14023 Rodriguez Passage\nPort Jacobville, PR 3... | MediumAquaMarine | 33.330673 | 12.795189 | 37.536653 | 4.446308 | 599.406092 |
 | Avg. Session Length | Time on App | Time on Website | Length of Membership | Yearly Amount Spent |
---|---|---|---|---|---|
count | 500.000000 | 500.000000 | 500.000000 | 500.000000 | 500.000000 |
mean | 33.053194 | 12.052488 | 37.060445 | 3.533462 | 499.314038 |
std | 0.992563 | 0.994216 | 1.010489 | 0.999278 | 79.314782 |
min | 29.532429 | 8.508152 | 33.913847 | 0.269901 | 256.670582 |
25% | 32.341822 | 11.388153 | 36.349257 | 2.930450 | 445.038277 |
50% | 33.082008 | 11.983231 | 37.069367 | 3.533975 | 498.887875 |
75% | 33.711985 | 12.753850 | 37.716432 | 4.126502 | 549.313828 |
max | 36.139662 | 15.126994 | 40.005182 | 6.922689 | 765.518462 |
RangeIndex: 500 entries, 0 to 499
Data columns (total 8 columns):
Email 500 non-null object
Address 500 non-null object
Avatar 500 non-null object
Avg. Session Length 500 non-null float64
Time on App 500 non-null float64
Time on Website 500 non-null float64
Length of Membership 500 non-null float64
Yearly Amount Spent 500 non-null float64
dtypes: float64(5), object(3)
memory usage: 31.3+ KB
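The three outputs above have the shape of pandas' `head()`, `describe()`, and `info()`. A minimal sketch of those calls follows. The original data file is not named in this write-up, so the `read_csv` line uses a hypothetical filename and is commented out. A synthetic stand-in frame is used instead (an assumption, with means and spreads loosely taken from the summary table, numeric columns only).

```python
import numpy as np
import pandas as pd

# In the real project the frame would be loaded from disk, for example:
# customers = pd.read_csv("customers.csv")  # hypothetical filename

# Synthetic stand-in (assumption): 500 rows, numeric columns only.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "Avg. Session Length": rng.normal(33.05, 0.99, 500),
    "Time on App": rng.normal(12.05, 0.99, 500),
    "Time on Website": rng.normal(37.06, 1.01, 500),
    "Length of Membership": rng.normal(3.53, 1.00, 500),
    "Yearly Amount Spent": rng.normal(499.31, 79.31, 500),
})

print(customers.head())          # first five rows
summary = customers.describe()   # count, mean, std, min, quartiles, max
print(summary)
customers.info()                 # column dtypes and non-null counts
```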
Data Exploration
**A seaborn jointplot, followed by the same plot with the Time on App column instead.**
**A jointplot rendered as a 2D hex bin plot comparing Time on App and Length of Membership.**
**A pairplot to explore the types of relationships across the entire data set.**
**A linear model plot (using seaborn's lmplot) of Yearly Amount Spent vs. Length of Membership.**
Coefficients of the model
 | Coeff |
---|---|
Time on App | 37.892600 |
Time on Website | 0.560581 |
Avg. Session Length | 25.691540 |
Length of Membership | 61.648594 |
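A coefficient table like the one above is what scikit-learn's `LinearRegression` exposes through its `coef_` attribute. The sketch below is illustrative only: it fits the model on synthetic data generated from hypothetical coefficients, and the 70/30 split and `random_state` are assumptions, not settings confirmed by this write-up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic features (assumption), scaled like the summary statistics.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "Avg. Session Length": rng.normal(33.05, 0.99, n),
    "Time on App": rng.normal(12.05, 0.99, n),
    "Time on Website": rng.normal(37.06, 1.01, n),
    "Length of Membership": rng.normal(3.53, 1.00, n),
})

# Hypothetical "true" coefficients and intercept used to generate the target.
true_coefs = np.array([26.0, 38.0, 0.5, 61.0])
y = X.to_numpy() @ true_coefs - 1050.0 + rng.normal(0.0, 10.0, n)

# Hold out 30% of the rows for testing (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

lm = LinearRegression()
lm.fit(X_train, y_train)

# One fitted coefficient per feature, labelled by column name.
coeffs = pd.DataFrame(lm.coef_, index=X.columns, columns=["Coeff"])
print(coeffs)

# Predictions on the held-out rows, used later for evaluation.
predictions = lm.predict(X_test)
```

Because the noise here is small relative to the signal, the fitted values land close to the hypothetical coefficients used to generate the data.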
Predicting Test Data
**Scatterplot of the real test values versus the predicted values.**
Model Evaluation
Model performance is measured by the residual sum of squares and the explained variance score (R^2).
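As a reminder of what these two scores compute, here is a small pure-Python sketch on made-up numbers (the four values are illustrative, not taken from the project). R^2 is one minus the residual sum of squares over the total sum of squares. The explained variance score uses the variance of the residuals instead, so the two agree only when the residuals average to zero.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - rss / tss


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true), using population variances."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_r = sum(residuals) / n
    mean_y = sum(y_true) / n
    var_r = sum((r - mean_r) ** 2 for r in residuals) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    return 1.0 - var_r / var_y


# Illustrative values only.
y_true = [500.0, 450.0, 600.0, 550.0]
y_pred = [510.0, 445.0, 590.0, 565.0]

r2 = r_squared(y_true, y_pred)
ev = explained_variance(y_true, y_pred)
print(r2, ev)
```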
**Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error.**
mae : 7.74267128583874 mse : 93.83297800820083 rmse : 9.686742383701594
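For reference, the three error metrics can be written out directly. The sketch below uses a hypothetical helper name, `regression_errors`, and made-up values. It is meant to show the definitions, not to reproduce the project's numbers.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e ** 2 for e in errors) / n   # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    return mae, mse, rmse


# Illustrative values only.
y_true = [500.0, 450.0, 600.0, 550.0]
y_pred = [510.0, 445.0, 590.0, 565.0]

mae, mse, rmse = regression_errors(y_true, y_pred)
print("mae :", mae, "mse :", mse, "rmse :", rmse)
```

RMSE is the square root of MSE, which is consistent with the values reported above.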
Residuals
Conclusion
We still want to answer the original question: should we focus our efforts on mobile app or website development? Or perhaps that choice matters little, and Length of Membership is what really counts.
 | Coefficient |
---|---|
Avg. Session Length | 25.981550 |
Time on App | 38.590159 |
Time on Website | 0.190405 |
Length of Membership | 61.279097 |
**How can we interpret these coefficients?**
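Should the company focus more on its mobile app or on its website?
The data favors the mobile app: holding the other features fixed, a one-unit increase in Time on App is associated with an increase of about 38.59 in Yearly Amount Spent, compared with about 0.19 for Time on Website. Length of Membership has the largest coefficient of all (61.28).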
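To make the comparison concrete, the coefficients from the table above can be read as "change in Yearly Amount Spent per one-unit increase in a feature, with the other features held fixed". The short sketch below hard-codes the table's values and ranks them. The variable names are illustrative.

```python
# Coefficients copied from the table above.
coefficients = {
    "Avg. Session Length": 25.981550,
    "Time on App": 38.590159,
    "Time on Website": 0.190405,
    "Length of Membership": 61.279097,
}

# Rank features by the size of their effect per unit increase.
ranked = sorted(coefficients.items(), key=lambda item: item[1], reverse=True)
for feature, coef in ranked:
    print(f"+1 unit of {feature}: {coef:+.2f} in Yearly Amount Spent")

# How many times larger the app effect is than the website effect.
app_vs_web = coefficients["Time on App"] / coefficients["Time on Website"]
print(f"Time on App effect is roughly {app_vs_web:.0f}x the Time on Website effect")
```

This ranking compares raw coefficients. That is reasonable here because the four features have similar spreads (standard deviations near 1 in the summary table), but it would need rescaling if they did not.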