From the second you start predicting in Obviously AI, your data is preprocessed, and models are trained and tested on it, using custom no-code machine learning algorithms.
Today, we're happy to announce we've added two new algorithms to our platform: Random Forest Classifier and Random Forest Regressor 🌲
Refresher on How Codeless Machine Learning Works
Here’s how Obviously AI receives your question, builds a custom algorithm, and generates a Prediction Report.
1. Preprocessing/Feature Engineering/Normalization
Once you make a prediction, Obviously AI begins the preprocessing phase, where it turns raw data into inputs the machine learning algorithm can understand. It removes rows or columns with empty/null values, drops feature columns with too many unique non-numeric values, upsamples and downsamples the data, and finally runs several other processes to make your data machine learning ready. This is also called Feature Engineering, a popular ML term for preparing data in ways that improve model accuracy.
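To make two of those cleanup steps concrete, here is a minimal sketch in plain Python. It is not Obviously AI's actual pipeline: the function names, the sample rows, and the `max_unique_ratio` threshold are all assumptions made up for illustration.

```python
def drop_incomplete_rows(rows):
    """Keep only rows where no value is None or an empty string."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def drop_high_cardinality_columns(rows, max_unique_ratio=0.5):
    """Drop non-numeric columns whose share of unique values exceeds the threshold.

    The 0.5 default is a hypothetical cutoff chosen for this example only.
    """
    if not rows:
        return rows
    to_drop = set()
    for col in rows[0]:
        values = [r[col] for r in rows]
        is_numeric = all(isinstance(v, (int, float)) for v in values)
        if not is_numeric and len(set(values)) / len(values) > max_unique_ratio:
            to_drop.add(col)
    return [{k: v for k, v in r.items() if k not in to_drop} for r in rows]


# Hypothetical raw data: one row has a missing age, and customer_id is unique per row.
raw = [
    {"customer_id": "a1", "plan": "basic", "age": 34},
    {"customer_id": "b2", "plan": "pro", "age": None},
    {"customer_id": "c3", "plan": "basic", "age": 51},
    {"customer_id": "d4", "plan": "basic", "age": 28},
    {"customer_id": "e5", "plan": "pro", "age": 45},
]

clean = drop_high_cardinality_columns(drop_incomplete_rows(raw))
```

In this toy dataset the row with the missing age is removed, and `customer_id` is dropped because every value is unique, so it carries no pattern a model could learn from.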
Obviously AI also performs normalization, where it rescales the values of the numerical columns so they fall in comparable ranges. Not every dataset requires normalization, but it mainly improves accuracy when columns have very different ranges. Say, for example, there’s a column of Age and a column of Salary. Age will mostly be a number from 0 to 100, and Salary could be anywhere from $40K to $1M. We don’t want the column with the larger range to drown out the column with the smaller range and make the prediction inaccurate, so we normalize the data and put Salary on a scale similar to Age.
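One common way to put Age and Salary on the same footing is min-max scaling, which maps each column onto the range 0 to 1. The sketch below assumes that technique purely for illustration; the post does not say which normalization method Obviously AI uses, and the sample values are invented.

```python
def min_max_scale(values):
    """Rescale a list of numbers so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample columns with very different ranges.
ages = [20, 40, 60, 80]
salaries = [40_000, 100_000, 520_000, 1_000_000]

scaled_ages = min_max_scale(ages)          # every value now lies in [0, 1]
scaled_salaries = min_max_scale(salaries)  # same range as the scaled ages
```

After scaling, a salary of $520K and an age of 50 would both sit near the middle of their columns, so neither dominates just because its raw numbers are bigger.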
2. Training Models
This is where machine learning gets technical.
Think of building an algorithm the same way you would think of making music on a synthesizer. When you take a synthesizer out of the box, there are pre-loaded settings inside of it. The same is true for algorithms. There are basic algorithms that are essentially a blank canvas, and each one has different settings. Think of these settings as the knobs or buttons on a synth, like Attack, Release, Decay, etc. A professional musician could take these pre-set sounds and find the best fit for a track pretty quickly, compared to a beginner. Think of Obviously AI as this professional musician: it takes a pre-set algorithm, tries out thousands of permutations of its settings based on the dataset’s properties, and finds the right combination on the fly for optimized accuracy. This is music to a non-technical business user’s ears, because an ML beginner would need a long time to build the most accurate algorithm by hand.
All the user has to do is enter a query and press “Go.”
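Trying out permutations of settings is often done with a grid search: score every combination of "knob" values and keep the best one. The sketch below is an illustration under assumptions, not Obviously AI's tuning code. The toy threshold rule, the `grid` of settings, and the sample data are all hypothetical, and a real search would score each combination on held-out rows rather than on the same samples.

```python
from itertools import product


def accuracy(threshold, flip, samples):
    """Share of samples that a simple threshold rule labels correctly."""
    correct = 0
    for value, label in samples:
        predicted = value >= threshold
        if flip:
            predicted = not predicted
        correct += predicted == label
    return correct / len(samples)


# Hypothetical (value, label) pairs.
samples = [(1, False), (2, False), (3, False), (6, True), (7, True), (9, True)]

# The "knobs": every combination of these settings gets tried.
grid = {"threshold": [0, 2, 4, 6, 8], "flip": [False, True]}

best_settings, best_score = None, -1.0
for threshold, flip in product(grid["threshold"], grid["flip"]):
    score = accuracy(threshold, flip, samples)
    if score > best_score:  # strict '>' keeps the first of any tied combinations
        best_settings, best_score = {"threshold": threshold, "flip": flip}, score
```

Here there are only 10 combinations; the same loop shape scales to thousands, which is where automating the search saves the most time.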
3. Testing for Accuracy
On top of the previously mentioned processes Obviously AI performs for accuracy, we also take an extra step to improve the accuracy of your Prediction Report.
While Obviously AI is testing your dataset, it sets aside a section of rows to test separately for consistency. For example, out of a 1,000-row dataset, it separates 100 rows and checks that the algorithm is as accurate on them as on the rest of your dataset. This helps ensure that the algorithm stays accurate for all of your data, including rows beyond the ones it has already seen.
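Setting rows aside like this is usually called a hold-out split. Here is a minimal sketch of the 1,000-row example, assuming a simple random 10% split; the `holdout_split` helper and the fixed seed are assumptions for illustration, not a description of how Obviously AI chooses its rows.

```python
import random


def holdout_split(rows, holdout_fraction=0.1, seed=0):
    """Shuffle a copy of the rows and set aside a fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = list(range(1000))  # stand-in for a 1,000-row dataset
train_rows, test_rows = holdout_split(rows)
```

The model is fit on `train_rows` only, and its accuracy is then measured on `test_rows`, which it has never seen.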
AND the crazy thing about these 3 steps is that it all happens in 30 seconds or less.
The Types of Algorithms We Use
We use several algorithms, ranging from something as simple as Logistic Regression to Perceptrons (a simple kind of neural network) and regularized linear models like Elastic Nets. We also turn to probabilistic algorithms like Naive Bayes, but it all depends on your dataset.
Typically these algorithms can perform two key functions:
- Classification: Anything and everything where you take data and try to predict labels like “Is it a good day to play tennis?” (YES or NO) or “What groceries should I stock up on today?” (Bread, Pasta or Juice).
- Regression: Anything and everything where you try to predict a number output for a new item. For example, what will the price of this apartment be 2 months from now? ($2,300) How many productive hours do people get when they work from home? (5 hrs)
Classification and Regression both fall under Supervised Machine Learning.
We wrote a blog post covering how no-code algorithms work if you wish to read more!
About the New Random Forest Algorithms
1. Random Forest Classifier (RFC): Trains multiple decision trees on samples of the training dataset (sampling is done with replacement). Once trained, the random forest makes a prediction based on votes from the individual decision trees, i.e., the category predicted by most trees is considered to be the random forest prediction.
RFCs perform better than other models on datasets with mostly categorical features, or with categorical features that have a high unique category count (e.g., a "Country" column).
2. Random Forest Regressor: Similar to RFCs, this model also trains multiple decision trees on samples of the dataset, but the predicted output is calculated as the average of the outputs from all the decision trees. It also performs better on datasets with mostly categorical features.
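The three moving parts described above (sampling with replacement, voting, and averaging) can be sketched in a few lines of plain Python. This is only an illustration: the trees themselves are left out, and the per-tree predictions, row names, and helper function names are made-up stand-ins.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some rows repeat, others are left out."""
    return [rng.choice(rows) for _ in rows]


def classify_by_vote(tree_predictions):
    """Random Forest Classifier step: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def regress_by_average(tree_outputs):
    """Random Forest Regressor step: return the average of the trees' numeric outputs."""
    return mean(tree_outputs)


rng = random.Random(42)
rows = ["row0", "row1", "row2", "row3", "row4"]
sample = bootstrap_sample(rows, rng)  # the training data one tree would see

# Hypothetical outputs from five classification trees and four regression trees.
class_votes = ["YES", "NO", "YES", "YES", "NO"]
tree_prices = [2250, 2300, 2350, 2300]

forest_class = classify_by_vote(class_votes)    # "YES" wins 3 votes to 2
forest_price = regress_by_average(tree_prices)  # 9200 / 4 = 2300
```

Because each tree trains on a different bootstrap sample, the trees make different mistakes, and combining them by vote or by average tends to cancel those mistakes out.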
REMINDER: You can see which algorithm your prediction used and the accuracy of the model in the tech specs tab inside Obviously AI.
Log in to Obviously AI here to start making predictions. 🌲🌲🌲