### Math foundation for AI/ML development

Linear algebra and calculus help with the data science part of AI/MLOps implementations. An understanding of probability and statistics is useful for model implementation. For example, to predict the probability of rain, the likelihood of reaching a place through dense traffic, or the chances of a batsman scoring a century or being bowled out, you need to understand probability and statistics.

Basic math concepts and terms worth understanding before jumping into AI:

- Mean (average), median, mode, quartiles.
- Data classification into deterministic variable, random variable, discrete values, discrete random variables and continuous random variables.
- Histograms, probability mass functions
- Variance, standard deviation
- Covariance and correlation
- Unions and intersections (of events)
- Sample space
- Probability Density Function (PDF), Probability Mass Function (PMF)
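Several of the terms in the list above can be computed directly with Python's standard `statistics` module. The following is a minimal sketch; the `rainfall` and `temperature` values are made-up illustration data, and the `covariance` and `correlation` helpers are hypothetical names written out by hand to show the formulas.

```python
import statistics

# Made-up paired observations, for illustration only
rainfall = [0, 2, 2, 3, 5, 8, 13]            # mm per day
temperature = [30, 28, 29, 27, 25, 22, 20]   # degrees Celsius

mean = statistics.mean(rainfall)              # arithmetic mean (average)
median = statistics.median(rainfall)          # middle value of the sorted data
mode = statistics.mode(rainfall)              # most frequent value
q1, q2, q3 = statistics.quantiles(rainfall, n=4)  # quartiles (Python 3.8+)
variance = statistics.pvariance(rainfall)     # population variance
std_dev = statistics.pstdev(rainfall)         # population standard deviation


def covariance(xs, ys):
    """Population covariance: mean product of paired deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled into the range [-1, 1]."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))


print(mean, median, mode, (q1, q2, q3))
print(variance, std_dev)
print(covariance(rainfall, temperature), correlation(rainfall, temperature))
```

In this sample the correlation comes out negative, because the higher rainfall values are paired with the lower temperatures.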

Feature engineering. Example: converting a 1-5 rating scale into Negative, Neutral and Positive ratings. Grouping values like this can help make the distribution of the dataset more uniform.
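A minimal sketch of the rating example follows. The cut points (1-2 Negative, 3 Neutral, 4-5 Positive) are an assumption for illustration, and `rating_to_sentiment` is a hypothetical helper name.

```python
def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 rating onto three coarse sentiment buckets.

    The bucket boundaries are an assumption chosen for illustration.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be between 1 and 5, got {rating!r}")
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"


ratings = [1, 5, 3, 4, 2, 5]
sentiments = [rating_to_sentiment(r) for r in ratings]
print(sentiments)
```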

One hot encoding.
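As a rough illustration, one hot encoding turns each category into its own 0/1 column, with exactly one column set per record. The sketch below uses plain Python; `one_hot_encode` is a hypothetical helper and the colour values are made up.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1   # exactly one "hot" position per row
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)   # column order
print(encoded)      # one row per input value
```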

Label encoding. Converting categories into numeric representations.
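A minimal sketch of label encoding, assuming categories are numbered in alphabetical order (`label_encode` is a hypothetical helper). Note that the integer codes carry an ordering that may not mean anything for the original categories.

```python
def label_encode(values):
    """Assign each distinct category an integer code (alphabetical order)."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return mapping, [mapping[value] for value in values]


sizes = ["small", "large", "medium", "small"]
mapping, codes = label_encode(sizes)
print(mapping)
print(codes)
```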

Statistical estimation.

Normal (Gaussian) distribution.

A histogram scaled so that its total area equals 1 approximates the PDF of that variable.
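One way to read the statement above: divide each bin count by (number of samples × bin width), so the bar areas sum to 1 like a PDF. The sketch below assumes fixed bin edges; the data values and the `density_histogram` helper are made up for illustration.

```python
def density_histogram(data, bin_edges):
    """Return bar heights scaled so that the total bar area equals 1."""
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for x in data:
        for i in range(n_bins):
            low, high = bin_edges[i], bin_edges[i + 1]
            is_last_bin = i == n_bins - 1
            # Bins are [low, high), except the last one, which includes its upper edge
            if low <= x < high or (is_last_bin and x == high):
                counts[i] += 1
                break
    total = sum(counts)
    return [
        count / (total * (bin_edges[i + 1] - bin_edges[i]))
        for i, count in enumerate(counts)
    ]


data = [0.5, 1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.7]
edges = [0, 1, 2, 3, 4]
density = density_histogram(data, edges)
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(density))
print(density)  # [0.125, 0.25, 0.375, 0.25]
print(area)     # 1.0
```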

How many records should be selected for sampling? A rule of thumb: the sample is large enough when adding another data point no longer changes the distribution much.

Mean absolute deviation (M.A.D.)

Mean squared deviation (M.S.D., or variance)
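The two deviation measures differ only in how each distance from the mean is treated: absolute value versus square. A minimal sketch with made-up scores and hypothetical helper names:

```python
def mean_absolute_deviation(xs):
    """Average absolute distance of each value from the mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


def mean_squared_deviation(xs):
    """Average squared distance from the mean (population variance)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5
mad = mean_absolute_deviation(scores)
msd = mean_squared_deviation(scores)
print(mad)         # 1.5
print(msd)         # 4.0
print(msd ** 0.5)  # 2.0, the standard deviation
```

Squaring weights large deviations more heavily, which is why the standard deviation (2.0) is larger than the M.A.D. (1.5) here.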

Standard normal table
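The entries of a standard normal table are values of the standard normal cumulative distribution function, which can be computed from the error function in Python's `math` module. A minimal sketch; `z_score` and `standard_normal_cdf` are hypothetical helper names, and the example mean and standard deviation are made up.

```python
import math


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z (mean 0, std dev 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Reproduce a few rows of a standard normal table
for z in (0.0, 1.0, 1.96, 2.0):
    print(f"z = {z:4.2f}  P(Z <= z) = {standard_normal_cdf(z):.4f}")

# Example: a value of 130 when the mean is 100 and the std dev is 15
z = z_score(130, mean=100, std_dev=15)
print(z, standard_normal_cdf(z))
```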

### ML Models

Good reference:

- Best for most: linear/logistic regression, decision tree, neural networks, XGBoost, Naive Bayes, PCA, KNN, SVM, t-SNE.
- ML models for tabular datasets: a paper published by Intel suggests XGBoost is best for tabular data as of 2021.
- XGBoost alternatives: CatBoost, HistGradientBoosting, LightGBM.

Decision Tree

Data-driven models for classification and regression. A fully grown (unpruned) decision tree can reach zero error on the training data set, provided no two identical inputs carry different labels, so it is a powerful model for fitting a dataset but prone to overfitting. Trees are trained by a greedy optimization algorithm called the Classification and Regression Tree (CART) algorithm. Decision trees work like recursive if-else conditions, eliminating branches based on the criteria separating one decision from another. For example, to decide the species of a flower, the data points considered may be petal length, petal width and color. A decision tree will check the petal length condition, go down the branch that the petal length matches, then check the petal width and go down the sub-branch. Eventually the decision branches are exhausted and a final prediction is available.
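The "recursive if-else" view can be sketched with a hand-written tree. This is an illustration of prediction only, not of CART training; the thresholds, the species names and the `predict` helper are all hypothetical.

```python
# A tiny hand-built tree: internal nodes test one feature against a threshold,
# leaf nodes hold the final prediction. All numbers here are made up.
tree = {
    "feature": "petal_length",
    "threshold": 2.5,
    "left": {"label": "species_A"},
    "right": {
        "feature": "petal_width",
        "threshold": 1.7,
        "left": {"label": "species_B"},
        "right": {"label": "species_C"},
    },
}


def predict(node, sample):
    """Walk down the tree until a leaf is reached, then return its label."""
    if "label" in node:   # leaf node: branches are exhausted
        return node["label"]
    branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
    return predict(node[branch], sample)


print(predict(tree, {"petal_length": 1.4, "petal_width": 0.2}))  # species_A
print(predict(tree, {"petal_length": 4.5, "petal_width": 1.3}))  # species_B
print(predict(tree, {"petal_length": 5.8, "petal_width": 2.2}))  # species_C
```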

Terms used in DT

- Gini, class, sample attribute, value attribute. Gini is the impurity at a node.
- Decision boundaries, leaf node, feature space.
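Gini impurity at a node is 1 minus the sum of squared class proportions: 0 for a pure node, and larger when classes are mixed. A minimal sketch, with hypothetical helper names and made-up labels:

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_k ** 2) over the class proportions p_k at a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left_labels, right_labels):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left_labels) + len(right_labels)
    return (
        len(left_labels) / n * gini_impurity(left_labels)
        + len(right_labels) / n * gini_impurity(right_labels)
    )


print(gini_impurity(["a", "a", "a", "a"]))      # 0.0, pure node
print(gini_impurity(["a", "a", "b", "b"]))      # 0.5, evenly mixed two classes
print(weighted_gini(["a", "a"], ["b", "b"]))    # 0.0, a perfect split
```

A greedy tree-growing step would compare `weighted_gini` across candidate splits and keep the lowest one.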

Confusion matrix. Precision. Recall. F1 score. F2 score.
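These classification metrics all derive from the four cells of a binary confusion matrix. A minimal sketch with made-up labels; the helper names are hypothetical, and F1 and F2 are computed as the general F-beta score with beta set to 1 and 2.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    return (1 + b2) * precision * recall / denominator if denominator else 0.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = f_beta(precision, recall, beta=1)
f2 = f_beta(precision, recall, beta=2)
print((tp, fp, fn, tn), precision, recall, f1, f2)
```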

Root mean square error. Mean absolute error. Relative error. R² = 1 - MSE / Variance, where Variance is the variance of the true target values.
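The regression metrics can be sketched in a few lines. The target and prediction values are made up, the helper names are hypothetical, and the variance used for R² is assumed to be the population variance of the true values.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: MSE brought back to the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R2 = 1 - MSE / Variance, using the population variance of y_true."""
    mean = sum(y_true) / len(y_true)
    variance = sum((t - mean) ** 2 for t in y_true) / len(y_true)
    return 1 - mse(y_true, y_pred) / variance


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse(y_true, y_pred))        # 0.125
print(rmse(y_true, y_pred))
print(mae(y_true, y_pred))        # 0.25
print(r_squared(y_true, y_pred))  # 0.975
```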