Artificial Intelligence (AI), Machine Learning (ML), and Data Knowledge Assets

Collected from LinkedIn


Putting Data, Artificial Intelligence, Machine Learning, Generative AI, etc. in Context

The Artificial Intelligence, Machine Learning, LLM, etc. terminology can be a little confusing. Below is a very brief overview of how these technologies/terminologies fit together. This page is geared to someone new to Data, Analytics, and the domain of Artificial Intelligence, or someone who wants to leverage the collateral collected. You can use Google to dive into more details and academic definitions; many details are left out here.

Where it all begins… Data Analyst

  • Understands both the business and the data; you need to know both
  • It all starts with data (screw this up and the rest of your work will produce inaccurate outcomes)

This gives a good perspective of what a Data Analyst does by showing a path to become one: Data_Analyst_How to become one 8pp.pdf

– Data accuracy (relative to your business domain) is critical to success. This includes:

  • Data Governance: data owners who can define what the data is and where it lives
  • Data Dictionary: clear definitions of what the data is
  • Which data do you need to solve your business problem?
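To make the Data Dictionary idea concrete, here is a minimal sketch in Python. The field names, owners, and definitions are made up for illustration; a real dictionary would come from your data owners and usually lives in a governance tool rather than in code.

```python
from datetime import date

# Hypothetical data dictionary: every field gets an owner, a definition, and a type.
DATA_DICTIONARY = {
    "customer_id": {"owner": "Sales Ops", "definition": "Unique customer key", "type": int},
    "signup_date": {"owner": "Marketing", "definition": "Date the account was created", "type": date},
    "region": {"owner": "Sales Ops", "definition": "Sales region code", "type": str},
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    # Every defined field must be present and have the agreed type.
    for field, spec in dictionary.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], spec["type"]):
            problems.append(f"wrong type for {field}")
    # Fields nobody has defined are flagged too.
    for field in record:
        if field not in dictionary:
            problems.append(f"undefined field: {field}")
    return problems
```

Calling check_record on each incoming row gives you a quick list of anything that does not match the agreed definitions before the data is used downstream.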

Database:

  • Many types, many use cases. Bottom line: this is the primary place where data is stored.

Note: data can also be stored in file systems (including the cloud), but without a defined, governed structure (which databases can provide when enabled) you’re probably going to get one-off results.

Not complete, but a good introduction: 1 Database fit for function good setup 1pp.jpg

Data Pipeline (part of “Data Architecture”): this is the heavy lifting and can take 70-80% of your overall Data Analysis/Science effort. It is typically performed by Data Engineers.

Perspective on Data Pipeline, see: Data_Pipeline_Basics_2pp.pdf

  • The process for taking data from a source (where it’s created or stored)
  • The data is copied, validated, and translated (like converting a string to a date, or applying business logic) to a location that can then be used for:
    • Representing the business view of the data
      • Called Data Modeling (not to be confused with Data Science Models below)
    • Storing data in a business usable format
    • Doing data analysis
    • Input to Data Science projects (and outcomes from these projects also)
    • Etc….
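The pipeline steps above can be sketched in a few lines of Python. This is only an illustration using the standard library: the order rows, the field names, and the "large order" rule are assumptions invented for the example, and a real pipeline would read from and write to actual systems.

```python
from datetime import datetime


def extract():
    # Stand-in for reading from a source system; these rows are made up.
    return [
        {"order_id": "1001", "order_date": "2024-03-01", "amount": "250.00"},
        {"order_id": "1002", "order_date": "not a date", "amount": "99.50"},
        {"order_id": "1003", "order_date": "2024-03-02", "amount": "1200.00"},
    ]


def transform(rows):
    """Validate and translate raw rows; rows that fail validation are set aside."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
            clean.append({
                "order_id": int(row["order_id"]),
                # Translate a string into a real date.
                "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
                "amount": amount,
                # Hypothetical business logic applied during the translation.
                "is_large_order": amount >= 1000,
            })
        except ValueError:
            rejected.append(row)
    return clean, rejected


def load(rows, target):
    # Stand-in for writing to a database or warehouse table.
    target.extend(rows)
    return target
```

Keeping the rejected rows, instead of silently dropping them, makes it easier to go back to the data owner and fix the problem at the source.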

Domain of the Data Engineer: Data Engineering Scope Intro.jpg

So now that you have data you can (at the most basic level):

  • Create reports/dashboards
  • Slice/Dice the data to understand basic facts about your data & business
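As a small illustration of slicing and dicing, the sketch below totals a made-up sales list by whichever column you pick. The data and the total_by helper are hypothetical; in practice you would more likely use a BI tool, SQL, or a library such as pandas.

```python
from collections import defaultdict

# Made-up sales rows for illustration only.
sales = [
    {"region": "East", "product": "Widget", "amount": 100.0},
    {"region": "East", "product": "Gadget", "amount": 200.0},
    {"region": "West", "product": "Widget", "amount": 150.0},
]


def total_by(rows, key, value="amount"):
    """Sum the value column for each distinct entry in the key column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


by_region = total_by(sales, "region")    # slice by region
by_product = total_by(sales, "product")  # same data, diced by product
```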

Taking analytics to the next level: now that you understand more about your data and business, you can leverage advanced technologies (think Artificial Intelligence) to better understand and visualize your data, and to create/enable/predict/automate business outcomes.

That brings us to the role of a Data Scientist and the field of Artificial Intelligence.

Artificial Intelligence (AI) is the science of making computers/machines “intelligent” so that they think more like a human.

Simple overview of the steps for developing an AI solution: AI Implement seamlessly Cheat sheet 1pp.jpg

Machine learning (ML) is a sub-field of AI. ML is the science of training a computer/machine what to do based on data. Essentially, you take data, create Data Science Models, and train and test those models so that what they learned can be applied automatically to new incoming data. That lets the machine make “decisions” about the new data, because you’ve trained it on your earlier data. This allows the computer (for example) to:

  • Predict outcomes (based on prior data behavior and desired behavior): xx will happen
  • Classify information (think grouping/clustering of behaviors)
  • Scan images (teaching it what a cat looks like, so it can recognize a cat). Think millions of images
  • Power chatbots: think Natural Language Processing (NLP). They learn from training data, and “learn” more as more conversations take place (more data created). More data in means more positive/negative outcomes added to the data set
  • Produce complex visualizations/analysis of data
  • Etc….
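The train, test, and predict cycle described above can be shown with a toy example. This sketch uses a very simple technique (a nearest-centroid classifier written in plain Python) on made-up customer data; the features, labels, and function names are assumptions for illustration, and real projects normally use a library such as scikit-learn and far more data.

```python
def train(examples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, features):
    """Pick the label whose centroid is closest to the new data point."""
    def distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: distance(model[label]))


def accuracy(model, examples):
    """Share of held-out examples the model labels correctly."""
    correct = sum(1 for features, label in examples if predict(model, features) == label)
    return correct / len(examples)


# Made-up data: (monthly visits, monthly purchases) -> customer value label.
training_data = [([1, 1], "low"), ([2, 1], "low"), ([8, 9], "high"), ([9, 8], "high")]
test_data = [([1.5, 2], "low"), ([8, 8], "high")]

model = train(training_data)          # learn from earlier data
score = accuracy(model, test_data)    # check the model on data it has not seen
new_label = predict(model, [7, 9])    # apply the learning to new incoming data
```

The important habit is the split: the model learns from one set of data and is scored on a different set, so you know how it behaves on data it has never seen.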

This is a nice, single page illustration of how Machine learning works: 1 Machine Learning model – how it works 1pp.jpg

Here’s a perspective of the overall domain of ML: Perspective. – ML scope 1pp.jpg

Generative AI: traditional AI is limited to processing existing data (to influence outcomes of future data). Generative AI goes a step further and generates new content (data) from the data it has. Think of how ChatGPT works: it has parsed much of the internet and gained a deep library of content. It can then take phrases/context and look for patterns in the text to generate a result. The result goes to the requestor (user) and can then feed back into the library of content.
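To get a feel for "generating a result from patterns in text", here is a deliberately tiny toy: a word-pair (bigram) generator in plain Python. This is an assumption-heavy simplification for illustration only; it is not how ChatGPT or any real LLM is built, and the corpus and function names are made up.

```python
import random
from collections import defaultdict


def build_model(text):
    """Record which words have been seen following each word."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(model, start, length=8, seed=0):
    """Generate text by repeatedly picking a word that followed the current one."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    word, output = start, [start]
    for _ in range(length - 1):
        options = model.get(word)
        if not options:  # no known follower: stop early
            break
        word = rng.choice(options)
        output.append(word)
    return " ".join(output)


corpus = "good data gives good results and bad data gives bad results"
model = build_model(corpus)
sentence = generate(model, "good", length=6)
```

Even this toy shows the dependency on input data: the generator can only ever produce word sequences that its corpus makes possible.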

Note: as you can see, bad data in means bad data out. So curated/governed data used to build AI models is absolutely critical for accurate results.

To get deeper into this, see (I’m looking for a better doc, but here’s what I have so far): Generative_ai_Mastering 41pp.pdf

With Generative AI, you now have a new job class appearing: “Prompt Engineer”. A prompt engineer defines the right question, or series of questions, to put to a Generative AI model to get the most “accurate outcome” (kind of like assembling the right words together to quickly get a Google result). Again, accurate input data is critical.
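One common prompt engineering habit is to build prompts from a fixed template so that the role, task, context, and expected output are always spelled out. The sketch below shows the idea in Python; the build_prompt helper and its four parts are a hypothetical convention for illustration, not a standard or any vendor's API.

```python
def build_prompt(role, task, context, output_format):
    """Assemble a structured prompt from its parts (a hypothetical template)."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Respond as: {output_format}"
    )


prompt = build_prompt(
    role="a data analyst",
    task="Summarize the main sales trend",
    context="Monthly sales by region for the last 12 months",
    output_format="three short bullet points",
)
```

The resulting text is what you would paste or send to the Generative AI tool of your choice; changing one part at a time makes it easier to see which wording improves the answer.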

There’s also a large, growing field of AI ethics. Again, this goes back to data and model development. People and data (the depth/breadth of the data) can influence what gets returned, and therefore the outcomes. So ethics plays a critical role here (search online for examples).

Here are some good example prompts for ChatGPT (you can see how important this can become): 1 50_Awesome_Chatgpt_Prompts__1696702805.pdf. Also see the ChatGPT cheatsheets.

Large Language Models (LLMs) are a subcategory of Generative AI that deals with language/text and primarily creates text output (and some visualizations)

  • Like ChatGPT

Good overview of how they work: 1 LLm_Really good description 31pp.pdf, or here: LLM Large Lang models how they work 14pp.pdf

Python is an interpreted programming language used by Data Analysts/Data Scientists to analyze data, build data models, and build AI capabilities. Many analytical, modeling, and visualization libraries are available for Python, making it the “simple” Swiss army knife of Data Analysts/Scientists.

Good overview with context of Artificial Intelligence: ML_Using_Python_84pp.pdf

Visualization: with Data Analytics/Data Science, how you represent data is critical. There are many visualizations/libraries/tools available.

A good, simple overview of which chart to use for which use case. Not complete, but a good way to start: 1 Data WHICH_CHART_WHEN_YOUR_GUIDE_TO_CHOOSING_THE_RIGHT_CHART__19PP.pdf or here: 1 Charts supporting story telling 1pp.jpg

Also, a simple cheatsheet to squeeze visualizations out of ChatGPT: 1 ChatGPT for Data Analysis cheat sheet 1pp.jpg

Data Science projects: below are examples of what others have done.

I’ve assembled these assets primarily from https://Linkedin.com contributors with the intent to help folks better understand the field of Data Science and beyond. (Note: I realize there’s a lot here and I need to shrink it.)

Usage Hints:

– In the file lists, you’ll see some files starting with a 1. I believe these are a good place to start (but you may feel otherwise). Keep learning, and ping me if you have any feedback/issues.

– The number of pages in each document is shown in the last 3-5 characters of the filename (for example, 8pp means 8 pages). This gives you an idea of how detailed the document is.

– Interview questions: there are some documents covering this topic in each section.
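If you download the files and want to sort or filter them, the filename conventions in the hints above can be read programmatically. This is a small sketch under assumptions: the regular expression and the helper names are mine, and they assume the page count appears as digits followed by "pp" right before the file extension, which not every file follows.

```python
import re

# Digits followed by "pp" just before the extension, e.g. "8pp.pdf" or "19PP.pdf".
PAGE_COUNT = re.compile(r"(\d+)\s*pp\.[A-Za-z0-9]+$", re.IGNORECASE)


def page_count(filename):
    """Return the page count encoded in the filename, or None if there is none."""
    match = PAGE_COUNT.search(filename)
    return int(match.group(1)) if match else None


def is_recommended(filename):
    """Files whose names start with "1 " are the suggested starting points."""
    return filename.startswith("1 ")


files = [
    "Data_Analyst_How to become one 8pp.pdf",
    "1 Database fit for function good setup 1pp.jpg",
    "ML_Using_Python_84pp.pdf",
]
# Recommended files that are short enough for a quick read.
quick_starts = [f for f in files if is_recommended(f) and (page_count(f) or 0) <= 5]
```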

Productivity and Miscellaneous Cheatsheets

Miscellaneous cheatsheets and productivity collateral.

Generative AI (like ChatGPT) and Large Language Models (LLM)

Understanding what Large Language Models are and how to take advantage of them. Really cool and very powerful, BUT you need to validate/verify anything you get. Data is king in any ML/AI/LLM: garbage in, garbage out, as they say. See examples of good prompt engineering ideas to get to your outcome more quickly.

Good ChatGPT prompts, cheatsheets that show alternative LLM tools, ChatGPT and Excel integration, etc.

Data Science: Artificial Intelligence and Machine Learning

Data Science is the overall practice of delivering ML/AI solutions. Although I use ML and AI interchangeably here, they are different, related concepts. See https://en.wikipedia.org/wiki/Machine_learning for a more professional definition.

Data, Data Analytics, and Database

Miscellaneous Data and Analytics papers

Data Science Projects

Miscellaneous Data Science projects solving business problems with ML/AI. Some are incomplete, but they should help you understand the approach, and you can figure out the additional details if you need to.

Python

Python is a great, “simple” programming language for being productive quickly in the field of Data Science (and elsewhere). It’s my programming language of choice for things that need to get done quickly.

Visualization Tools and Libraries

Collection of visualization tools and libraries for integration with Python (or other languages). A good visualization strategy will make your Data Science story come to life.