Logistic Regression

This project is a logistic regression exercise in Python.

Data Head

| | Daily Time Spent on Site | Age | Area Income | Daily Internet Usage | Ad Topic Line | City | Male | Country | Timestamp | Clicked on Ad |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68.95 | 35 | 61833.90 | 256.09 | Cloned 5thgeneration orchestration | Wrightburgh | 0 | Tunisia | 2016-03-27 00:53:11 | 0 |
| 1 | 80.23 | 31 | 68441.85 | 193.77 | Monitored national standardization | West Jodi | 1 | Nauru | 2016-04-04 01:39:02 | 0 |
| 2 | 69.47 | 26 | 59785.94 | 236.50 | Organic bottom-line service-desk | Davidton | 0 | San Marino | 2016-03-13 20:35:42 | 0 |
| 3 | 74.15 | 29 | 54806.18 | 245.89 | Triple-buffered reciprocal time-frame | West Terrifurt | 1 | Italy | 2016-01-10 02:31:19 | 0 |
| 4 | 68.37 | 35 | 73889.99 | 225.58 | Robust logistical utilization | South Manuel | 0 | Iceland | 2016-06-03 03:36:18 | 0 |
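A minimal pandas sketch of how a table like this is produced. The project presumably reads the full 1,000-row dataset from a file; the CSV name in the comment (`advertising.csv`) is a hypothetical placeholder. So that the sketch runs on its own, it rebuilds the DataFrame from the five rows shown above, keeping `Timestamp` as plain strings to match the `object` dtype reported below.

```python
import pandas as pd

# In the project the data would be loaded from a file, e.g.
# ad_data = pd.read_csv("advertising.csv")  # hypothetical file name
# Here the five rows displayed above are rebuilt so the sketch is standalone.
ad_data = pd.DataFrame(
    {
        "Daily Time Spent on Site": [68.95, 80.23, 69.47, 74.15, 68.37],
        "Age": [35, 31, 26, 29, 35],
        "Area Income": [61833.90, 68441.85, 59785.94, 54806.18, 73889.99],
        "Daily Internet Usage": [256.09, 193.77, 236.50, 245.89, 225.58],
        "Ad Topic Line": [
            "Cloned 5thgeneration orchestration",
            "Monitored national standardization",
            "Organic bottom-line service-desk",
            "Triple-buffered reciprocal time-frame",
            "Robust logistical utilization",
        ],
        "City": ["Wrightburgh", "West Jodi", "Davidton", "West Terrifurt", "South Manuel"],
        "Male": [0, 1, 0, 1, 0],
        "Country": ["Tunisia", "Nauru", "San Marino", "Italy", "Iceland"],
        # Kept as strings (object dtype), as in the info() output
        "Timestamp": [
            "2016-03-27 00:53:11",
            "2016-04-04 01:39:02",
            "2016-03-13 20:35:42",
            "2016-01-10 02:31:19",
            "2016-06-03 03:36:18",
        ],
        "Clicked on Ad": [0, 0, 0, 0, 0],
    }
)

print(ad_data.head())  # head() shows the first five rows by default
```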

**Raw data metrics**

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
Daily Time Spent on Site    1000 non-null float64
Age                         1000 non-null int64
Area Income                 1000 non-null float64
Daily Internet Usage        1000 non-null float64
Ad Topic Line               1000 non-null object
City                        1000 non-null object
Male                        1000 non-null int64
Country                     1000 non-null object
Timestamp                   1000 non-null object
Clicked on Ad               1000 non-null int64
dtypes: float64(3), int64(3), object(4)
memory usage: 78.2+ KB

Summary statistics of the numeric columns:

| | Daily Time Spent on Site | Age | Area Income | Daily Internet Usage | Male | Clicked on Ad |
|---|---|---|---|---|---|---|
| count | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 | 1000.00000 |
| mean | 65.000200 | 36.009000 | 55000.000080 | 180.000100 | 0.481000 | 0.50000 |
| std | 15.853615 | 8.785562 | 13414.634022 | 43.902339 | 0.499889 | 0.50025 |
| min | 32.600000 | 19.000000 | 13996.500000 | 104.780000 | 0.000000 | 0.00000 |
| 25% | 51.360000 | 29.000000 | 47031.802500 | 138.830000 | 0.000000 | 0.00000 |
| 50% | 68.215000 | 35.000000 | 57012.300000 | 183.130000 | 0.000000 | 0.50000 |
| 75% | 78.547500 | 42.000000 | 65470.635000 | 218.792500 | 1.000000 | 1.00000 |
| max | 91.430000 | 61.000000 | 79484.800000 | 269.960000 | 1.000000 | 1.00000 |
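These two outputs match what pandas' `DataFrame.info()` and `DataFrame.describe()` print. A sketch on a small stand-in frame (the numeric columns of the five head rows, not the full data) follows. It also spells out one reading of the table: for a 0/1 column the mean is the fraction of 1s, so count × mean gives the number of 1s. With 1,000 rows, `Clicked on Ad` has 1000 × 0.5 = 500 clicks (balanced classes) and `Male` has 1000 × 0.481 = 481 male users. The helper `positives_from_mean` is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Stand-in frame: numeric columns of the five rows shown in "Data Head".
ad_data = pd.DataFrame({
    "Daily Time Spent on Site": [68.95, 80.23, 69.47, 74.15, 68.37],
    "Age": [35, 31, 26, 29, 35],
    "Male": [0, 1, 0, 1, 0],
    "Clicked on Ad": [0, 0, 0, 0, 0],
})

ad_data.info()              # column names, non-null counts, dtypes, memory usage
stats = ad_data.describe()  # count, mean, std (sample, ddof=1), min, quartiles, max
print(stats)


def positives_from_mean(count: int, mean: float) -> int:
    """Hypothetical helper: number of 1s in a 0/1 column, given its count and mean."""
    return round(count * mean)


# Values taken from the full-data summary table above.
clicks = positives_from_mean(1000, 0.50000)  # users who clicked the ad
males = positives_from_mean(1000, 0.481)     # users with Male == 1
print(clicks, males)
```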

Data Exploration

**Histogram of Age**

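A sketch of the histogram using pandas' Matplotlib-backed plotting. The project's data is not reproduced here, so the `Age` values are synthetic: normal draws using the mean (36) and standard deviation (8.8) from the summary table, clipped to its min and max (19 and 61). The bin count of 30 is an assumption, not taken from the project.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs as a script; skip in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Age column, NOT the project's data.
rng = np.random.default_rng(seed=0)
ages = pd.Series(rng.normal(36.0, 8.8, size=1000).clip(19, 61).round(), name="Age")

# Series.plot.hist draws a Matplotlib histogram and returns the Axes.
ax = ages.plot.hist(bins=30)
ax.set_xlabel("Age")
plt.close(ax.figure)  # free the figure; in a notebook it would render inline instead
```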

**Jointplot of 'Area Income' vs. 'Age'**


**Jointplot of the KDE distributions of 'Daily Time Spent on Site' vs. 'Age'**


**Jointplot of 'Daily Time Spent on Site' vs. 'Daily Internet Usage'**


**Pairplot with hue defined by the 'Clicked on Ad' column**


Model Evaluation

**Classification report for the model**

             precision    recall  f1-score   support

          0       0.91      0.95      0.93       157
          1       0.94      0.90      0.92       143

avg / total       0.92      0.92      0.92       300
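The training step that precedes this report is not shown. A minimal sketch of the usual scikit-learn workflow that yields such a report follows. The 30% test split is inferred from the support total of 300 out of 1,000 rows; everything else is an assumption: the data is a synthetic stand-in from `make_classification` (1,000 rows, 5 numeric features, roughly the shape of the numeric columns), and `random_state=101` is arbitrary. Its scores will not match the report above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in, NOT the project's data: 1000 rows, 5 numeric features.
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, random_state=101)

# test_size=0.3 leaves 300 test rows, matching the report's support total.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)

print(classification_report(y_test, predictions))
```

Recent scikit-learn versions print `accuracy`, `macro avg` and `weighted avg` rows instead of the single `avg / total` row; the old `avg / total` row corresponds to the support-weighted average.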

**Confusion matrix**

[[149   8]
 [ 15 128]]
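Every number in the classification report can be recomputed from this matrix. The NumPy check below assumes scikit-learn's convention that rows are true labels and columns are predicted labels, which agrees with the row sums (157 and 143) equalling the supports in the report.

```python
import numpy as np

# Rows: true class 0 / 1; columns: predicted class 0 / 1.
cm = np.array([[149, 8],
               [15, 128]])

tp = np.diag(cm)                    # correct predictions per class
support = cm.sum(axis=1)            # row totals = actual examples per class
precision = tp / cm.sum(axis=0)     # divide by column totals (predicted counts)
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

# Support-weighted average, as in the "avg / total" row.
weighted_precision = (precision * support).sum() / support.sum()

print("precision", precision.round(2))  # matches 0.91 / 0.94 in the report
print("recall   ", recall.round(2))     # matches 0.95 / 0.90
print("f1       ", f1.round(2))         # matches 0.93 / 0.92
print("accuracy ", round(accuracy, 4))  # 277 correct out of 300
```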
