This repository contains the course notes, implementations of various ML models, and Hackathon submission notebooks of the Summer Analytics program.
Some tools and technologies for ML models:
1. For creating the UI of the model - Gradio
2. For hosting the ML model - Hugging Face (mainly used for NLP, but it can also be used here... read more on its website)
and many more... still discovering :)
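For reference, a minimal Gradio sketch; the `predict` function here is a hypothetical stand-in, not an actual model from this repository:

```python
# Minimal Gradio UI sketch (the predict function is a hypothetical placeholder).
import gradio as gr

def predict(text: str) -> str:
    # a real app would load the trained model and run inference here
    return f"Model output for: {text}"

demo = gr.Interface(fn=predict, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()  # serves a local web UI; such apps can also be hosted on Hugging Face Spaces
```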
Why do we write `df[exp1 | exp2]` / `df[exp1 & exp2]` instead of `df[exp1 or exp2]` / `df[exp1 and exp2]`?
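The reason is that pandas evaluates these conditions element-wise over whole Series, so the bitwise operators `&` and `|` are required; Python's `and`/`or` try to reduce a Series to a single boolean and raise an error. A minimal sketch (the DataFrame is illustrative):

```python
# Element-wise boolean filtering in pandas (toy DataFrame for illustration).
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 58], "city": ["Delhi", "Mumbai", "Delhi"]})

subset = df[(df["age"] > 30) & (df["city"] == "Delhi")]   # correct: bitwise & (and | for OR)
# df[(df["age"] > 30) and (df["city"] == "Delhi")]        # wrong: raises "truth value of a Series is ambiguous"
print(subset)
```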
To see the documentation of a function, place the cursor inside its parentheses () and press SHIFT + TAB; on pressing it 4 times, the full documentation pane opens. Alternatively, type `?<function name without parentheses>` and run the cell.
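For example, in a notebook cell (the pandas function is just an illustration):

```python
# Run inside a Jupyter/IPython cell; the '?' prefix is IPython syntax, not plain Python.
import pandas as pd

# With the cursor between the parentheses of pd.read_csv( ), press SHIFT + TAB;
# pressing it repeatedly expands the documentation tooltip.

# Alternative: prefix the bare function name with '?' and run the cell to open its docs.
?pd.read_csv
```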
If we want the effect of the parameters $\theta_1$ and $\theta_2$ not to be neglected, how can we handle that? Since in the original formula we multiply $\lambda$ with all the weights from $\theta_1$ to $\theta_n$ (where $n$ is the number of features), how do we handle that?
Ans:
To increase the complexity of the model, or to increase the effect of particular parameters, we apply higher degrees to them, like $x^2$, $x^3$, ...
Those are inserted into the model as:
$X_0$,
$X_1$,
$X_2 = X_0^2$,
$X_3 = X_1^4$
etc.
So, the new hypothesis may actually represent something like:
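$h_\theta(X) = \theta_0 X_0 + \theta_1 X_1 + \theta_2 X_0^2 + \theta_3 X_1^4$ (using the features listed above).

Such polynomial features can also be generated automatically; a minimal sketch, assuming scikit-learn's `PolynomialFeatures` (the degree and data are illustrative):

```python
# Generating polynomial features with scikit-learn (illustrative data).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])                      # two original features per row

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)                  # adds x1^2, x1*x2, x2^2 to the original columns
print(X_poly)
```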
What is the difference between `.predict()` and `.predict_proba()` in Scikit-learn? Note: We cannot use the evaluation metrics of classification-based algorithms on regression-based algorithms.
READ MORE HERE
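In short, `.predict()` returns the predicted class labels, while `.predict_proba()` returns the estimated probability of each class. A minimal sketch (the classifier and dataset are illustrative):

```python
# .predict() vs .predict_proba() on a toy classifier (illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict(X[:2]))        # hard class labels, e.g. [0 0]
print(clf.predict_proba(X[:2]))  # one probability per class; each row sums to 1
```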
If model evaluation is done on the Test Dataset, then why is cross-validation applied on the Train Dataset?
Ans: Cross-validation is used to verify the best parameters on which the model is trained. The diagram shows the position where cross-validation is used, while evaluation metrics are used in the final evaluation, and hence are applied on the TEST DATASET.
Cross-validation is a technique for validating model efficiency by training it on a subset of the input data and testing it on a previously unseen subset of the input data. We can also say that it is a technique to check how a statistical model generalizes to an independent dataset.
Here, the input data is the Training Data, not the complete dataset. (The logic is that we use the train-test split to get our test data, but in a real-world situation, only the train data is given to the model; hence, we consider the input data to be the train data.)
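A minimal sketch of where cross-validation sits in this workflow (the model and dataset are illustrative):

```python
# Cross-validation runs on the training split; the test split is kept for the final evaluation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)   # CV on the train data only
print(cv_scores.mean())

model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # final evaluation on the held-out test set
```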
Why do we apply the `fit_transform()` method on the Training Dataset but only `transform()` on the Test Dataset?
Ans: To avoid DATA LEAKAGE (it means the model learns something new from the test dataset, which is not allowed).
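A minimal sketch, using `StandardScaler` as an example transformer (any scikit-learn transformer follows the same pattern):

```python
# Fit the scaler on the training data only, then reuse its learned statistics on the test data,
# so nothing about the test set leaks into the preprocessing.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from the train data + transform it
X_test_scaled = scaler.transform(X_test)         # apply the *training* mean/std to the test data
```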
Reference LINK to watch in complete detail.
What is the difference between Categorical Data and Continuous Data? Categorical data takes a limited set of discrete values (floats and ints are not frequently used here), while continuous data can take any value in a range (the Float datatype is used here).
What is the difference between `pd.get_dummies()` and Scikit-learn's `OneHotEncoder()`?
ANS:
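In short, `pd.get_dummies()` is a one-off pandas function that encodes whatever categories appear in the given DataFrame, while `OneHotEncoder` is a scikit-learn transformer that remembers the categories seen during fitting, so the same encoding can be reapplied to test data or used inside a Pipeline. A minimal sketch (the toy data is illustrative):

```python
# pd.get_dummies vs OneHotEncoder on the same toy column.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi"]})

# pandas: quick one-off encoding of this DataFrame
print(pd.get_dummies(df["city"]))

# scikit-learn: a fitted transformer that stores the categories for later reuse
enc = OneHotEncoder(handle_unknown="ignore")
encoded = enc.fit_transform(df[["city"]]).toarray()
print(enc.categories_)
print(encoded)
```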
What is a Meta-Estimator?
What is the difference between Feature Importances and Permutation Feature Importances?
Feature Importance: Feature Importance refers to techniques that calculate a score for all the input features for a given model; the scores simply represent the "importance" of each feature. A higher score means that the specific feature will have a larger effect on the model that is being used to predict a certain variable.
Permutation Feature Importance: Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled. This procedure breaks the relationship between the feature and the target, so the drop in the model score indicates how much the model depends on the feature. This technique benefits from being model-agnostic and can be calculated many times with different permutations of the feature.
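A minimal sketch showing both kinds of importances for a fitted model (the dataset and model are illustrative):

```python
# Built-in feature importances vs permutation importances (illustrative data/model).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

print(model.feature_importances_)        # impurity-based importances computed during training

# permutation importance: drop in score when each feature is shuffled on held-out data
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
```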
Why do we use `fit_transform()` with `SimpleImputer` rather than `fit()` alone?
ANS:
`fit_transform()` applies all the transformations and returns a 2D array, which can then be converted to a DataFrame.
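A minimal sketch (the toy DataFrame is illustrative):

```python
# SimpleImputer: fit_transform learns the column means AND returns the filled 2D array;
# fit() alone only learns the statistics and returns the imputer itself.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "salary": [50000.0, 60000.0, np.nan]})

imputer = SimpleImputer(strategy="mean")
filled = imputer.fit_transform(df)                      # 2D NumPy array with NaNs replaced
filled_df = pd.DataFrame(filled, columns=df.columns)    # convert back to a DataFrame
print(filled_df)
```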
What is the difference between SVM and SVC?
What is Cardinality?
What is an Ensemble (for e.g., as we do `from sklearn.ensemble import RandomForestRegressor`)?
Certainly they can overfit the data, as other models do. No model is perfectly immune to overfitting: the reasons can be excessive complexity of its architecture (if we add many layers), or the model having too many parameters.
This overfitting problem is common to all algorithms; in fact, it is more common in the case of Deep Learning algorithms, as they are designed to understand (or memorize) more complex patterns.
This article tells about the techniques by which we can prevent overfitting... and the fun thing is that they are the same as those of classical Machine Learning. READ THE ARTICLE HERE