.. ot-topic:: python.misc.ai.machine_learning_intro

Machine Learning: Concepts and Terminology
==========================================

.. contents::
   :local:

.. ot-graph::
   :entries: python.misc.ai.machine_learning_intro

This topic is an introduction. We will only scratch the surface:
there is a lot of terminology to learn, and that is what this topic is
about. Data science, machine learning, and artificial intelligence
together form a very broad field, and there is much more to it than is
covered here.

See this video for a really good and thorough introduction. Beware
though, it is a whopping 4 hours long.

.. raw:: html

Data scientists are not always known to be good (let alone diligent)
programmers. I have seen people use Jupyter Notebooks as both editor
and runtime environment for their programs. While I like notebooks for
playing around, and for creating nice-looking web pages with plots and
charts *as a side effect* (really cool), there is a point where one
has to bite the bullet and start to *program*. The latter is the focus
of these AI topics.

See Also
--------

Concrete applications of the concepts and terminology introduced in
this topic can be found here:

* :doc:`linear-regression/topic`
* :doc:`k-means`

How Far Is Mankind from Creating God
------------------------------------

* **Artificial Narrow Intelligence** (ANI, Weak AI)

  * The stage we are at now: can solve specialized problems

    * Weather forecasting
    * Image recognition
    * Autonomous driving

* **Artificial General Intelligence** (AGI, Strong AI)

  * Not there by a long way: can do everything a human can

* **Artificial Super Intelligence** (ASI)

  * Terminator and such

Basic Terminology: Algorithm and Model
--------------------------------------

* **Algorithm**. For example ...

  * Linear regression
  * Decision tree
  * Random forest
  * (many many more)

* **Model**. Trained by using an *algorithm*.

  * Uses the algorithm
  * Takes the input and maps it to output
  * Built through *training*.

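To make the distinction between *algorithm* and *model* a bit more tangible, here is a minimal sketch in plain Python. It assumes the simplest possible case (linear regression with one input feature, fitted by ordinary least squares); the names ``LinearModel`` and ``train_linear_regression`` and the data are made up for illustration and are not taken from any library.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearModel:
    """The *model*: parameters learned in training, mapping input to output."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def train_linear_regression(xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """The *algorithm*: ordinary least squares for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return LinearModel(slope, intercept)


# Made-up data (hypothetical) that follows y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Training: algorithm + data -> model
model = train_linear_regression(xs, ys)

# Using the model: map a new input to an output
print(model.predict(4.0))
```

The algorithm is the recipe (the function), and it is only run during training. The model is what remains afterwards: two numbers and a rule for applying them to new input.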

Basic Terminology: Features and Data
------------------------------------

* **Input features** or *predictor variables*

  * Set of variables used as input to the model

* **Output features** or *response/target variables*

  * Set of variables calculated by the model, based on *input
    features*

* **Training Data**.

  * Used to create the model (the more the better)
  * Divided (*split*) into two parts

    * *Training data*; used to actually create/train the model
    * *Testing data*; used to test the accuracy of the trained model

Types of Machine Learning
-------------------------

* **Supervised Learning**.

  * Each input training datum has its known/desired output attached as
    a *label*.
  * Used for regression and classification

* **Unsupervised Learning**.

  * Works on *unlabeled* data.
  * Creates clusters on its own, identifying features.
  * Used for association and clustering

* **Reinforcement Learning**.

  * Agent learns from *actions* by measuring *rewards*. Rather
    advanced. No labeled training data; learning happens by trial and
    error.

Problems Solved
---------------

* **Regression**

  * *Output*: continuous quantity (usually a forecast of something)
  * Solved by *supervised learning* algorithms like *Linear
    Regression*.
  * See topic: :doc:`linear-regression/topic`

* **Classification**

  * *Output*: categorical quantity ("spam or not")
  * Solved by *supervised learning* algorithms like

    * Support Vector Machines
    * Naive Bayes
    * Logistic Regression
    * K-Nearest Neighbors

* **Clustering**

  * *Output*: clusters of input data
  * Solved by *unsupervised learning* algorithms like K-means
  * See topic: :doc:`k-means`
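
The terms from the sections above (input features, labels, the split into training and testing data, classification) can be tied together in one small sketch. It uses only the Python standard library and a hand-written nearest-neighbor vote; the data set is randomly generated and entirely made up, and the function name ``classify`` and the 80/20 split ratio are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

# Made-up labeled data set: each datum is (input features, label).
# Two input features per datum; the label is the known/desired output.
random.seed(0)
data = []
for _ in range(50):
    data.append(((random.gauss(0.0, 0.5), random.gauss(0.0, 0.5)), "A"))
    data.append(((random.gauss(5.0, 0.5), random.gauss(5.0, 0.5)), "B"))

# Split into training data (80%) and testing data (20%)
random.shuffle(data)
split_at = int(len(data) * 0.8)
training_data = data[:split_at]
testing_data = data[split_at:]


def classify(features, training_data, k=3):
    """Predict a label by majority vote among the k nearest training data."""
    by_distance = sorted(
        training_data,
        key=lambda datum: math.dist(features, datum[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Accuracy: fraction of testing data whose predicted label matches the known one
correct = sum(
    1 for features, label in testing_data
    if classify(features, training_data) == label
)
accuracy = correct / len(testing_data)
print(f"accuracy on {len(testing_data)} testing data: {accuracy:.2f}")
```

This is supervised learning because every datum carries a label, and it is a classification problem because the output is a category (``"A"`` or ``"B"``) rather than a continuous quantity. The testing data are never shown to the classifier as training input, which is what makes the measured accuracy meaningful.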