Want to get started with machine learning?
I have put together a simple, easy-to-follow tutorial for complete beginners. Together, we will walk through the basic steps of training a machine learning model.
While explaining the steps of training a model one by one, I will also work through a very simple example of a machine learning problem. So, if you want to follow along, you can download the sample dataset from this link.
This is just a sample dataset to help you get started with machine learning.
We have 18 records of people of different ages and genders, along with their favorite music. Using the features "age" and "gender", we will try to predict which music genre each person prefers.
Note: In this dataset, the values 1 and 0 are assigned to gender as female and male.
However, if you do not want to follow along with the example, that is perfectly fine too. I will explain every one of these steps. So, let's dive in!
First Things to Know
Before getting into the steps of training a model, let's clarify a few concepts. Machine learning is a discipline of artificial intelligence that focuses on building algorithms that can learn from data.
To do this, machine learning models are trained on a dataset that teaches the model how to make accurate predictions or classifications on new, previously unseen data.
So, what are these models? A machine learning model is like a recipe that the computer uses to produce predictions or decisions from data.
A model, like a recipe, follows a set of instructions to evaluate data and produce predictions or decisions based on the patterns found in that data. In general, the more data a model is trained on, the more accurate its predictions tend to be.
What kinds of models can we train?
Let's look at some of the basic machine learning models.
- Linear Regression: a model that predicts a continuous target variable from one or more input variables.
- Neural Networks: a network of interconnected nodes that can learn complex patterns in data.
- Decision Trees: a decision-making approach built from a chain of branching if-else statements.
- Clustering: a family of models that group similar data points together based on their similarity.
- Logistic Regression: a model for binary classification problems, where the target variable has two possible values.
- Random Forest: an ensemble model built from many decision trees. It is commonly used for both classification and regression.
- K-Nearest Neighbors: a model that predicts the target variable using the k nearest data points in the training set.
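For readers who plan to use scikit-learn (the library used later in this tutorial), the model families listed above map roughly onto existing classes. The sketch below is one possible mapping, not the only one: the choice of the classifier variants (for example `MLPClassifier` for neural networks and `KMeans` for clustering) and the parameter values are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# One possible scikit-learn class for each model family in the list above.
# Parameter values are illustrative assumptions, not recommendations.
models = {
    "Linear Regression": LinearRegression(),
    "Neural Network": MLPClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Clustering": KMeans(n_clusters=3, n_init=10),
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
}

for name, model in models.items():
    # Every scikit-learn estimator exposes a fit() method for training.
    print(f"{name}: {type(model).__name__}, has fit: {hasattr(model, 'fit')}")
```

Because these classes share the same `fit` interface, swapping one model for another later usually means changing a single line.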
Depending on our problem and dataset, we decide which machine learning model best fits our situation. We will come back to this later. For now, let's start training our model. I hope you have already downloaded the dataset if you want to follow along with our example.
Also, I recommend installing Jupyter Notebook on your local machine and using it for your machine learning work.
1. Define the problem
The first stage in training a machine learning model is defining the problem to be solved. This includes choosing the variable you want to predict (known as the target variable) and the variables that will be used to make the predictions (known as features or predictors).
You also need to decide what type of machine learning problem you are trying to solve (classification, regression, clustering, and so on) and what kind of data you will need to collect or obtain to train your model.
The model you use will be determined by the type of machine learning problem you want to solve. Classification, regression, and clustering are the three main categories of machine learning problems. When you want to predict a categorical variable, such as whether an email is spam or not, you use classification.
When you want to predict a continuous variable, such as the price of a house, you use regression. Clustering is used to group similar data items based on their similarity.
Looking at our example, the challenge is to determine a person's preferred music from their gender and age. We will use a dataset of 18 people for this example, containing their age, gender, and favorite music.
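To make the idea of features and a target concrete, here is a minimal sketch. The four rows are made up for illustration and are not taken from the real dataset, and the column names `age`, `gender`, and `genre` are assumptions about how the file is laid out.

```python
import pandas as pd

# Made-up rows for illustration only; the real dataset has 18 rows.
# The column names 'age', 'gender', and 'genre' are assumed.
sample = pd.DataFrame({
    "age": [20, 23, 31, 26],
    "gender": [1, 1, 0, 0],
    "genre": ["HipHop", "HipHop", "Classical", "Dance"],
})

feature_columns = ["age", "gender"]  # inputs used to make the prediction
target_column = "genre"              # the value we want to predict

X = sample[feature_columns]
y = sample[target_column]

# A non-numeric target made of a few distinct labels suggests classification.
is_classification = not pd.api.types.is_numeric_dtype(y)

print("Features:", list(X.columns))
print("Target:", target_column, "| classes:", sorted(y.unique()))
print("Classification problem:", is_classification)
```

Since the target is a category (a music genre) rather than a number, this is a classification problem.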
2. Prepare the data
After you have defined the problem, you need to prepare the data for training the model. This involves cleaning and processing the data so that it is in a format the machine learning algorithm can use.
This may include tasks such as removing missing values, converting categorical data to numerical data, and scaling or normalizing the data so that all features are on the same scale.
For example, here is how you drop missing values:
import pandas as pd
# Load the data into a pandas DataFrame
data = pd.read_csv('data.csv')
# Check for missing values
print(data.isnull().sum())
# Drop rows with missing values
data.dropna(inplace=True)
# Check that all missing values have been removed
print(data.isnull().sum())
A small note: in the line "import pandas as pd", we import the Pandas library and give it the alias "pd" to make it easier to refer to its functions and objects later in the code.
Pandas is a well-known Python module for data manipulation and analysis, especially when working with structured or tabular data.
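The other preparation tasks mentioned above, converting categorical data to numbers and putting features on the same scale, can be sketched in a similar way. This is a minimal illustration on made-up rows: the `city` column is hypothetical, and one-hot encoding with `pd.get_dummies` plus `StandardScaler` is just one common choice among several.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration; 'city' is a hypothetical categorical column.
df = pd.DataFrame({
    "age": [20, 35, 50, 65],
    "city": ["Paris", "Lima", "Paris", "Oslo"],
})

# Convert the categorical column into numeric indicator (one-hot) columns.
encoded = pd.get_dummies(df, columns=["city"], dtype=int)

# Rescale 'age' to zero mean and unit variance.
scaler = StandardScaler()
encoded["age"] = scaler.fit_transform(encoded[["age"]]).ravel()

print(encoded)
```

Our music dataset is already numeric apart from the target, so these steps are not needed for the example that follows.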
In our music genre prediction example, we first import the data. I have named the file music.csv, but you can call it whatever you like.
To prepare the data for training a machine learning model, we split it into features (age and gender) and a target (music genre).
We will also split the data into training and testing sets in an 80:20 ratio, so that we can evaluate our model's performance on data it has not seen and detect overfitting.
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
# Load data from CSV file
music_data = pd.read_csv('music.csv')
# Split data into features and target
X = music_data.drop(columns=['genre'])
y = music_data['genre']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
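Note that `train_test_split` shuffles the rows randomly, so the split above changes on every run. If you want repeatable results, one option is to pass `random_state`. The sketch below uses 18 made-up rows with placeholder labels in place of the real file, and the seed value 42 is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 18 made-up rows standing in for the real file; labels are placeholders.
X = pd.DataFrame({"age": list(range(20, 38)), "gender": [0, 1] * 9})
y = pd.Series(["A", "B", "C"] * 6, name="genre")

# random_state fixes the shuffle so the same rows land in each set on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), "training rows,", len(X_test), "test rows")
```

With 18 rows and `test_size=0.2`, this leaves 14 rows for training and 4 for testing.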
3. Choose a machine learning model
After you have prepared the data, you need to choose a machine learning model that suits your task.
There are several algorithms to choose from, such as decision trees, logistic regression, support vector machines, neural networks, and others. The algorithm you choose will be determined by the problem you are trying to solve, the type of data you have, and the performance you need.
We will use a decision tree for this example because we are working with a classification problem (that is, the target is categorical).
# Import necessary libraries
from sklearn.tree import DecisionTreeClassifier
Figure: a visualization of how the Decision Tree Classifier works.
4. Train the model
You can start training the model once you have chosen a suitable machine learning algorithm. This involves using the previously prepared data to teach the algorithm how to make predictions on new, previously unseen data.
The algorithm adjusts its parameters during training to minimize the difference between its predicted values and the actual values in the training data. The amount of data used for training, as well as the specific algorithm settings, can all affect the accuracy of the resulting model.
In our specific example, now that we have decided on an approach, we can train our model with the training data.
# Train the decision tree classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
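To get a feel for what a trained decision tree has actually learned, one option is to print its rules as text with scikit-learn's `export_text`. The sketch below trains on six made-up rows rather than the tutorial's dataset, so the rules it prints are illustrative only; the column names and genre labels are assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up training rows for illustration; not the tutorial's dataset.
X = pd.DataFrame({
    "age": [20, 22, 25, 30, 33, 36],
    "gender": [1, 1, 1, 0, 0, 0],
})
y = ["HipHop", "HipHop", "HipHop", "Dance", "Dance", "Dance"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the learned if-else rules as indented text.
rules = export_text(model, feature_names=list(X.columns))
print(rules)
```

Each indented line is one branch of the if-else chain described earlier, ending in the class the tree predicts for that branch.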
5. Evaluate the model
After the model has been trained, it should be evaluated on new data to make sure it is accurate and reliable. This involves testing the model with data that was not used during training and comparing its predictions with the actual values in the test data.
This evaluation can help identify weaknesses in the model, such as overfitting or underfitting, and can point to adjustments that need to be made.
Using the test data, we will measure the accuracy of our model.
# Import necessary libraries
from sklearn.metrics import accuracy_score
# Predict the music genre for the test data
predictions = model.predict(X_test)
# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: ", accuracy)
The accuracy score may not be great right now. 🙂 To improve it, you can clean the data further or try different machine learning models to see which one gives the highest score.
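With only a handful of test rows, a single accuracy number can swing a lot from one split to the next. One way to compare models more fairly is cross-validation, which is a technique swapped in here and not part of the steps above. The sketch below runs on made-up data with placeholder genre labels, and the three candidate models and their settings are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up data: 18 rows, three placeholder genres grouped by age band.
X = pd.DataFrame({"age": list(range(20, 38)), "gender": [0, 1] * 9})
y = ["A"] * 6 + ["B"] * 6 + ["C"] * 6

candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=3),
}

scores = {}
for name, model in candidates.items():
    # 3-fold cross-validation: train on two thirds, score on the rest, three times.
    fold_scores = cross_val_score(model, X, y, cv=3)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {scores[name]:.2f}")
```

The mean over several folds is usually a steadier basis for choosing between models than one small test set.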
6. Fine-tune the model
If the model's performance is not good enough, you can tune it by changing various algorithm hyperparameters or by experimenting with entirely different algorithms.
This process may involve trying other learning rates, adjusting regularization settings, or changing the number or size of the hidden layers in a neural network.
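For a decision tree like the one in our example, the relevant hyperparameters are things such as the maximum depth of the tree, not learning rates or hidden layers. A minimal sketch of an automated search with scikit-learn's `GridSearchCV` follows. The data is made up, and the grid values are hypothetical choices for illustration, not recommended settings.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Made-up data: 18 rows, three placeholder genres grouped by age band.
X = pd.DataFrame({"age": list(range(20, 38)), "gender": [0, 1] * 9})
y = ["A"] * 6 + ["B"] * 6 + ["C"] * 6

# Hypothetical grid: the values below are illustrative, not recommendations.
param_grid = {"max_depth": [1, 2, 3, None], "min_samples_leaf": [1, 2]}

# Try every combination in the grid, scoring each with 3-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 2))
```

After fitting, `search.best_estimator_` holds a model retrained on all the data with the best combination found.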
7. Use the model
Once you are satisfied with the model's performance, you can start using it to make predictions on new data.
This may involve feeding new data into the model and using the model's learned parameters to make predictions on that data, or integrating the model into a larger application or system.
We can use our model to make predictions on new data once we are satisfied with its accuracy. You can try different values for gender and age.
# Test the model with new data, reusing the column names of the training features
new_data = pd.DataFrame([[25, 1], [30, 0]], columns=X.columns)
predictions = model.predict(new_data)
print("Predictions: ", predictions)
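If you want to integrate the model into a larger application, as mentioned above, you will usually need to save the trained model and load it again elsewhere. One common way is `joblib`, which is installed alongside scikit-learn. The sketch below is self-contained and trains on six made-up rows; the file name `music_model.joblib` is hypothetical, and the temporary folder is used only so that the example leaves nothing behind in your project.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows for illustration; not the tutorial's dataset.
X = pd.DataFrame({
    "age": [20, 22, 25, 30, 33, 36],
    "gender": [1, 1, 1, 0, 0, 0],
})
y = ["HipHop", "HipHop", "HipHop", "Dance", "Dance", "Dance"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Save the trained model to disk (hypothetical file name, temporary folder).
model_path = os.path.join(tempfile.mkdtemp(), "music_model.joblib")
joblib.dump(model, model_path)

# Later, for example inside another application, load it back and predict.
loaded_model = joblib.load(model_path)
new_data = pd.DataFrame([[21, 1], [34, 0]], columns=X.columns)
print(loaded_model.predict(new_data))
```

Only load model files that come from a source you trust, since loading them can execute arbitrary code.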
Wrapping Up
We have finished training our first machine learning model.
I hope you found it useful. You can now try using different machine learning models, such as Linear Regression or Random Forest.
There are many datasets and competitions on Kaggle if you want to improve your coding skills and your understanding of machine learning.