Illustration by Pablo Stanley
Words by Tapojit Debnath
Since we launched Obviously AI a few months ago, we’ve received several technical questions about our algorithms: which machine learning algorithms we offer, how they work, how accurate they are, and so on.
These are all great questions, and they need to be answered, but in a non-technical way. Since we first and foremost serve non-technical users and aim to make data science effortless for all, we’ve collected the questions we’ve received and brainstormed how to answer them in an easy-to-digest way. We want to give our users a way to explain what happens to their data in Obviously AI, and to democratize ML knowledge.
If you’ve ever wanted to know the tech specs of our algorithms and how our platform processes queries in natural language, this post describes everything that happens in our platform, from typing in a query and pressing “Go” to getting your Prediction Report.
We’ve Simplified the Traditional Machine Learning Process Into 3 Steps
If you’re an avid reader of our blog, you know we’ve covered how we’re challenging traditional data science by introducing a new way of performing machine learning predictions and analytics. Obviously, our mission starts with our platform.
We took the magic number “3” and made it easy for anyone to receive the power of ML in 3 steps.
Compared to the traditional process:
The main way we’re making data science effortless is by taking out code and replacing it with natural language, a.k.a. basic human language. Now, the same way you search Google, you can make predictions or search through your data to get reports.
What Happens When You Press “Go” Inside Obviously AI
Here’s how Obviously AI receives your question, assigns it to an algorithm, and generates a Prediction Report.
1. Preprocessing/Feature Engineering/Normalization
Once you press “Go” to make a prediction, Obviously AI begins the preprocessing phase, where it turns raw data into inputs the machine learning algorithm can understand. It removes rows or columns with empty/null values, drops feature columns with too many unique non-numeric values, upsamples and downsamples the data, and finally runs several other processes to make your data machine learning ready. This stage is often called Feature Engineering, and it is a common way to improve an ML model’s accuracy.
Obviously AI also performs normalization, where it rescales the values of the numerical columns so they fall in comparable ranges. Not every dataset requires normalization, but it mainly improves accuracy when columns have very different ranges. Say, for example, there’s a column of Age and a column of Salary. Age will primarily be a number from 0 to 100, while Salary could be anywhere between $40K and $1M. We don’t want the column with the larger range to dominate the column with the smaller range and make the prediction inaccurate, so we normalize the data and put Salary on a similar scale to Age.
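To make two of these cleanup steps concrete, here is a minimal sketch in plain Python. It is not Obviously AI’s actual code: the function names, the sample rows, and the 0.5 uniqueness threshold are all assumptions made up for illustration.

```python
def drop_incomplete_rows(rows):
    """Keep only rows where every value is present (not None or empty)."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def drop_high_cardinality_columns(rows, max_unique_ratio=0.5):
    """Drop non-numeric columns where too many of the values are unique.

    max_unique_ratio is a hypothetical threshold chosen for this example.
    """
    if not rows:
        return rows
    to_drop = set()
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            continue  # numeric columns are kept as-is
        if len(set(values)) / len(values) > max_unique_ratio:
            to_drop.add(col)
    return [{k: v for k, v in r.items() if k not in to_drop} for r in rows]


# Made-up sample data: one row is missing an age, and customer_id is
# unique per row, so it carries no pattern a model could learn from.
rows = [
    {"customer_id": "a1", "plan": "basic", "age": 34},
    {"customer_id": "b2", "plan": "pro", "age": None},
    {"customer_id": "c3", "plan": "basic", "age": 51},
    {"customer_id": "d4", "plan": "basic", "age": 29},
]

complete = drop_incomplete_rows(rows)
ready = drop_high_cardinality_columns(complete)
print(ready)
```

In this sketch the row with the missing age is removed first, and then the customer_id column is dropped because every one of its values is different.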
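As a rough illustration of the Age and Salary example, the sketch below uses min-max scaling, one common way to normalize a column. The post does not say which normalization method Obviously AI uses, so treat the method, the function name, and the sample values as assumptions.

```python
def min_max_scale(values):
    """Rescale a list of numbers so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample columns with very different ranges.
ages = [20, 40, 60]
salaries = [40_000, 520_000, 1_000_000]

scaled_ages = min_max_scale(ages)
scaled_salaries = min_max_scale(salaries)

print(scaled_ages)      # [0.0, 0.5, 1.0]
print(scaled_salaries)  # [0.0, 0.5, 1.0]
```

After scaling, both columns run from 0 to 1, so neither one outweighs the other just because its raw numbers are bigger.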
2. Training Models
This is where machine learning gets technical.
Think of building an algorithm the same way you would think of making music on a synthesizer. When you take a synthesizer out of the box, there are pre-loaded settings inside of it. The same is true for algorithms. There are basic algorithms that are essentially a blank canvas, and each one has different settings. Think of these settings as the knobs or buttons on a synth, like Attack, Release, and Decay. A professional musician could take these pre-set sounds and find the best fit for a track pretty quickly compared to a beginner. Think of Obviously AI as this professional musician: it takes a pre-set algorithm, tries out thousands of permutations of its settings based on the dataset’s properties, and finds the right combination on the fly for optimized accuracy. This is music to a non-technical business user’s ears, because an ML beginner would need a while to build the most accurate algorithm by hand.
All the user has to do is enter a query and press “Go.”
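For the curious, the knob-turning described above can be sketched as a simple search over settings (often called hyperparameters). This is a toy illustration, not Obviously AI’s implementation: the nearest-neighbor model, the single knob k, and the data are all assumptions picked to keep the example small.

```python
from itertools import product


def knn_predict(train, x, k):
    """Predict the most common label among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def accuracy(train, validation, k):
    """Share of validation points the model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def grid_search(train, validation, grid):
    """Try every combination of settings and keep the most accurate one."""
    keys = list(grid)
    best_settings, best_score = None, -1.0
    for combo in product(*(grid[key] for key in keys)):
        settings = dict(zip(keys, combo))
        score = accuracy(train, validation, **settings)
        if score > best_score:
            best_settings, best_score = settings, score
    return best_settings, best_score


# Made-up data: (value, label) pairs, with one noisy "yes" at 3.0.
train = [(1.0, "no"), (2.0, "no"), (3.0, "yes"), (4.0, "no"),
         (6.0, "yes"), (7.0, "yes"), (8.0, "yes"), (9.0, "yes")]
validation = [(2.6, "no"), (3.4, "no"), (6.5, "yes"), (8.5, "yes")]

# One knob with three positions; odd values of k avoid tied votes.
best_settings, best_score = grid_search(train, validation, {"k": [1, 3, 5]})
print(best_settings, best_score)
```

Here the search has only one knob with three positions. Adding more entries to the grid dictionary multiplies the number of combinations to try, which is how a handful of knobs quickly turns into thousands of permutations.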
3. Testing for Accuracy
On top of the previously mentioned processes Obviously AI performs for accuracy, we also take an extra step to improve the accuracy of your Prediction Report.
While Obviously AI is working with your dataset, it sets aside a section of rows to test separately for consistency. For example, out of a 1,000-row dataset, it separates 100 rows and checks that the algorithm is as accurate on them as on the rest of your dataset. This helps ensure that the algorithm is accurate for all of your data, even for rows it has not seen before.
And the crazy thing about these 3 steps is that it all happens in 30 seconds or less.
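Setting rows aside like this is commonly called a holdout split. Below is a minimal sketch of the 1,000-row example, assuming a random 10% split; the function name, the fixed seed, and the stand-in rows are illustrative assumptions, and the post does not describe how Obviously AI actually picks the rows.

```python
import random


def holdout_split(rows, holdout_fraction=0.1, seed=0):
    """Shuffle a copy of the rows and set a fraction aside for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Stand-in for a 1,000-row dataset: each "row" is just its row number.
rows = list(range(1000))
train_rows, holdout_rows = holdout_split(rows)

print(len(train_rows), len(holdout_rows))  # 900 100
```

The 100 held-out rows never overlap with the other 900, so a score measured on them shows how the algorithm does on data it was not fitted to.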
The Types of Algorithms We Use
We use several algorithms, ranging from simpler linear models like Logistic Regression and Elastic Nets to more complex neural networks like Perceptrons. We also turn to probabilistic algorithms like Naive Bayes, but it all depends on your dataset.
Typically these algorithms can perform two key functions:
- Classification: Anything and everything where you take data and try to predict labels, like “Is it a good day to play tennis?” (YES or NO) or “What groceries should I stock up on today?” (Bread, Pasta or Juice).
- Regression: Anything and everything where you try to predict a numeric output for a new item. For example, what will the price of this apartment be 2 months from now? ($2,300). How many productive hours do people get when they work from home? (5 hrs)
Classification and Regression both fall under Supervised Machine Learning.
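To show the difference between the two in code, here is a small sketch with one toy model of each kind. The models (a nearest-average classifier and a straight-line fit), the function names, and every number are assumptions chosen for illustration, not the algorithms or data behind Obviously AI.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def fit_label_averages(examples):
    """Average the input value seen for each label."""
    groups = {}
    for value, label in examples:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def classify(averages, value):
    """Pick the label whose average is closest to the new value."""
    return min(averages, key=lambda label: abs(averages[label] - value))


# Regression: the answer is a number (made-up monthly rents).
months = [0, 1, 2, 3]
rents = [2000, 2100, 2200, 2300]
slope, intercept = fit_line(months, rents)
predicted_rent = slope * 5 + intercept
print(predicted_rent)  # 2500.0

# Classification: the answer is a label (made-up temperatures in Celsius).
tennis_days = [(20, "YES"), (24, "YES"), (22, "YES"),
               (2, "NO"), (6, "NO"), (4, "NO")]
averages = fit_label_averages(tennis_days)
print(classify(averages, 18))  # YES
print(classify(averages, 7))   # NO
```

Both models learn from examples that already have the right answer attached, which is what makes them supervised.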
Further Your Knowledge of Technical ML
Our platform is only one of the ways we’re putting ML in the hands of non-technical users. We will be publishing technical posts like this periodically, but we also provide a dedicated Data Scientist to up-level machine learning knowledge and technical expertise.
Read more on how to add a Data Scientist to your team.