Suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD's website and World Bank stats about gross domestic product (GDP) per capita. Then you join the tables and sort by GDP per capita.
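The join-and-sort step can be sketched with pandas. The two small tables below are illustrative placeholders, not the actual OECD or World Bank figures, and the column names are assumptions chosen to match the example later in this post.

```python
import pandas as pd

# Illustrative values only; NOT the real OECD / World Bank data.
life_sat = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Life satisfaction": [7.0, 5.5, 6.2],
})
gdp = pd.DataFrame({
    "Country": ["B", "C", "A", "D"],
    "GDP per capita (USD)": [25_000.0, 40_000.0, 55_000.0, 30_000.0],
})

# An inner join keeps only countries present in both tables (D is dropped).
merged = life_sat.merge(gdp, on="Country", how="inner")

# Sort by GDP per capita, lowest first, and renumber the rows.
merged = merged.sort_values("GDP per capita (USD)").reset_index(drop=True)
print(merged)
```

An inner join is used here so that every remaining row has both a GDP value and a life satisfaction value.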

Let's plot the data for these countries.

There does seem to be a trend here! Although the data is *noisy* (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country's GDP per capita increases. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called *model selection*: you selected a *linear model* of life satisfaction with just one attribute, GDP per capita.

life_satisfaction = *θ*1 × gdp_per_capita + *θ*0 (just like y = mx + c)

Before you can use your model, you need to define the parameter values *θ*0 and *θ*1. How can you know which values will make your model perform best? To answer this question, you need to specify a performance measure. You can either define a *utility function* (or *fitness function*) that measures how *good* your model is, or you can define a *cost function* that measures how *bad* it is. For linear regression problems, people typically use a cost function that measures the distance between the linear model's predictions and the training examples; the objective is to minimize this distance.
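As a rough illustration, one common cost function for linear regression is the mean squared error (MSE). The text above does not say which distance measure is used, so MSE is an assumption here. The helper name `mse_cost` and the data are made up for this sketch.

```python
import numpy as np

def mse_cost(theta0, theta1, x, y):
    """Mean squared error of the line theta0 + theta1 * x against targets y."""
    predictions = theta0 + theta1 * x
    return float(np.mean((predictions - y) ** 2))

# Illustrative data lying exactly on the line y = 4 + 0.00005 * x
x = np.array([20_000.0, 40_000.0, 60_000.0])
y = 4.0 + 5e-5 * x

cost_exact = mse_cost(4.0, 5e-5, x, y)  # parameters that match the data
cost_off = mse_cost(3.0, 5e-5, x, y)    # intercept is off by 1

print(cost_exact, cost_off)
```

The parameters that generated the data give a cost of zero, while a shifted intercept gives a larger cost. That is the sense in which a cost function measures how bad a model is.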

This is where the linear regression algorithm comes in: you feed it your training examples, and it finds the parameters that make the linear model fit best to your data. This is called *training* the model. In our case, the algorithm finds that the optimal parameter values are *θ*0 = 3.75 and *θ*1 = 6.78 × 10⁻⁵.
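To make "finding the parameters" concrete, here is a minimal sketch that fits a line by ordinary least squares using NumPy's `np.linalg.lstsq`. The data points are invented for illustration, so the fitted values will not match the 3.75 and 6.78 × 10⁻⁵ quoted above.

```python
import numpy as np

# Illustrative training data: a line plus a little hand-picked noise.
x = np.array([20_000.0, 30_000.0, 40_000.0, 50_000.0, 60_000.0])
noise = np.array([0.1, -0.1, 0.0, 0.1, -0.1])
y = 4.0 + 5e-5 * x + noise

# Design matrix with a column of ones (for the intercept) and a column of x.
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution: minimizes the sum of squared prediction errors.
(theta0, theta1), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"theta0 = {theta0:.4f}, theta1 = {theta1:.3e}")
```

Because of the noise, the recovered parameters are close to, but not exactly, the 4.0 and 5 × 10⁻⁵ used to generate the data.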

Now the model fits the training data as closely as possible (for a linear model).

You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus's GDP per capita, find $37,655, and then apply your model and find that life satisfaction is likely to be somewhere around 3.75 + 37,655 × 6.78 × 10⁻⁵ = 6.30.
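The arithmetic above can be checked in a few lines of Python. The parameter values are the rounded ones quoted in the text; the function name `predict_life_satisfaction` is just a label chosen for this sketch.

```python
theta0 = 3.75      # intercept quoted in the text
theta1 = 6.78e-5   # slope quoted in the text

def predict_life_satisfaction(gdp_per_capita):
    """Apply the linear model: theta0 + theta1 * gdp_per_capita."""
    return theta0 + theta1 * gdp_per_capita

cyprus = predict_life_satisfaction(37_655)
print(f"{cyprus:.2f}")  # prints 6.30
```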

Example 1-1. Training and running a linear model using Scikit-Learn

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Download and prepare the data
    data_root = "https://github.com/ageron/data/raw/main/"
    lifesat = pd.read_csv(data_root + "lifesat/lifesat.csv")
    X = lifesat[["GDP per capita (USD)"]].values
    y = lifesat[["Life satisfaction"]].values

    # Visualize the data
    lifesat.plot(kind='scatter', grid=True,
                 x="GDP per capita (USD)", y="Life satisfaction")
    plt.axis([23_500, 62_500, 4, 9])
    plt.show()

    # Select a linear model
    model = LinearRegression()

    # Train the model
    model.fit(X, y)

    # Make a prediction for Cyprus
    X_new = [[37_655.2]]  # Cyprus' GDP per capita in 2020
    print(model.predict(X_new))  # output: [[6.30165767]]

Ref: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition