Machine Learning: Concepts and Terminology#

This is an introductory topic. We will only scratch the surface: there is a lot of terminology to learn, and that is what this topic is about. Data science, machine learning, and artificial intelligence are very broad fields, and there is much more to them than is covered here.

See this video for a really good and thorough introduction. Beware though, it is a whopping 4 hours long.

Data scientists are not always known to be good (let alone diligent) programmers. I have seen people use Jupyter Notebooks as both the editor and the runtime environment for their programs. While I like notebooks for playing around, creating nice-looking web pages with plots and charts as a side effect (really cool), there is a point where one has to bite the bullet and start to program. The latter is the focus of these AI topics.

See Also#

Concrete applications of the concepts and terminology introduced in this topic can be found here:

How Far Is Mankind from Creating God#

  • Artificial Narrow Intelligence (ANI, Weak AI)

    • The stage we are at now: can solve specific, narrowly defined problems

    • Weather forecast

    • Image recognition

    • Autonomous driving

  • Artificial General Intelligence (AGI, Strong AI)

    • Still a long way off: can do everything a human can

  • Artificial Super Intelligence (ASI)

Basic Terminology: Algorithm and Model#

  • Algorithm. For example …

    • Linear regression

    • Decision tree

    • Random forest

    • (many many more)

  • Model. The result of training.

    • Built by running an algorithm on training data

    • Takes the input and maps it to output
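To make the distinction between algorithm and model concrete, here is a minimal sketch in plain Python. It assumes the simplest possible case, linear regression with a single input variable, fitted by ordinary least squares. The function name `fit_linear_regression` and the data are made up for illustration; a real project would most likely use a library instead.

```python
from statistics import mean


def fit_linear_regression(xs, ys):
    """The *algorithm*: ordinary least squares for one input variable."""
    x_mean, y_mean = mean(xs), mean(ys)
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean

    def model(x):
        """The *model*: maps an input to an output using the learned parameters."""
        return slope * x + intercept

    return model


# Made-up training data that follows y = 2x + 1 exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_linear_regression(xs, ys)   # training builds the model
prediction = model(5.0)                 # the model maps input to output
print(prediction)
```

The algorithm is the recipe (`fit_linear_regression`); the model is what comes out of it after it has seen data (the returned `model` function with its learned slope and intercept).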

Basic Terminology: Features and Data#

  • Input features or predictor variables

    • Set of variables used as input to the model

  • Output features or response/target variables

    • Set of variables calculated by the model, based on input features

  • Data.

    • Used to create the model (the more the better)

    • Divided (split) into two parts

      • Training data; used to actually create/train the model

      • Testing data; used to test the accuracy of the model
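The split into training and testing data can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the housing data is invented, and the helper `train_test_split` with its 80/20 ratio and fixed seed is hypothetical, not the API of any particular library.

```python
import random

# Hypothetical data set: each row is (input features, output feature).
# Input features: (floor area in square meters, number of rooms)
# Output feature: price in thousands
dataset = [
    ((50, 2), 150), ((65, 3), 200), ((80, 3), 240), ((95, 4), 290),
    ((110, 4), 330), ((120, 5), 370), ((140, 5), 420), ((160, 6), 480),
    ((45, 1), 130), ((70, 3), 210),
]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (training, testing) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed: reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


training_data, testing_data = train_test_split(dataset)
print(len(training_data), "rows for training,", len(testing_data), "rows for testing")
```

Shuffling before splitting matters when the data is ordered in some way; otherwise the testing part might only contain, say, the most expensive houses.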

Types of Machine Learning#

  • Supervised Learning.

    • Each input training datum has its known/desired output attached as a label.

    • Used for regression and classification

  • Unsupervised Learning.

    • Works on unlabeled data.

    • Finds structure (such as clusters) on its own, by identifying features that the data have in common.

    • Used for association and clustering

  • Reinforcement Learning.

    • Agent learns from its actions by measuring the rewards they yield. Rather advanced. No training data set is prepared up front; the agent learns by trial and error.
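Of the three types, reinforcement learning is probably the hardest to picture, so here is a small sketch of the trial-and-error idea. It assumes a deliberately simple setting, a so-called multi-armed bandit with an epsilon-greedy agent, which is only one of many reinforcement learning approaches. The function `run_bandit`, the reward values, and all parameters are made up for illustration.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit; returns its reward estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)   # agent's current guess per action
    counts = [0] * len(true_means)        # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # explore: try a random action
            action = rng.randrange(len(true_means))
        else:
            # exploit: take the action that currently looks best
            action = max(range(len(true_means)), key=lambda a: estimates[a])

        # the environment answers with a noisy reward
        reward = rng.gauss(true_means[action], 0.5)

        # update the running average of rewards for this action
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates


# The agent does not know these average rewards; it has to discover them.
estimates = run_bandit(true_means=[1.0, 2.0, 3.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("best action:", best_action)
```

There are no labels anywhere in this code. The agent only ever sees the reward for the action it just took, and gradually shifts towards the action that pays best.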

Problems Solved#

  • Regression

    • Output: continuous quantity (usually a forecast of something)

    • Solved by supervised learning algorithms like Linear Regression.

    • See topic: Linear Regression

  • Classification

    • Output: a category (for example “spam” or “not spam”)

    • Solved by supervised learning algorithms like

      • Support Vector Machines

      • Naive Bayes

      • Logistic Regression

      • K-Nearest Neighbors

  • Clustering

    • Output: clusters of input data

    • Solved by unsupervised learning algorithms like K-means

    • See topic: K-Means
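To illustrate how classification and clustering differ in their input and output, here is a minimal sketch in plain Python. It uses hand-written, one-dimensional versions of K-Nearest Neighbors and K-means. The function names `knn_classify` and `kmeans_1d`, the data, and the starting centers are assumptions made for this example; they are not taken from any library.

```python
from collections import Counter
from statistics import mean


def knn_classify(training_data, x, k=3):
    """Classification: majority label among the k training points nearest to x."""
    nearest = sorted(training_data, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kmeans_1d(values, centers, iterations=10):
    """Clustering: plain k-means on one-dimensional data."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # assign each value to its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


# Labeled data (supervised): number of suspicious words in a mail, and its label
mails = [(0, "ham"), (1, "ham"), (2, "ham"), (7, "spam"), (8, "spam"), (9, "spam")]
label = knn_classify(mails, 6)

# Unlabeled data (unsupervised): the algorithm finds the groups on its own
values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
clusters = kmeans_1d(values, centers=[0.0, 10.0])

print(label)
print(clusters)
```

The classifier needs labels to learn from and returns one of those labels. The clustering function never sees a label; it only returns groups of values that lie close together.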