How We’re Democratizing Data Science for SMBs

Illustration by Vijay Verma

In January 2020, Nirman, our CEO, and Jack, our Creative Director, were interviewed on the No Code Podcast to discuss no-code machine learning, how Obviously AI democratizes AI, and the technical aspects of how our platform works.

Here are the highlights from the interview. If you wish to listen to the entire episode, listen to the No Code Podcast on Spotify.

Obviously AI Was Founded In 2018

Alex: This is episode two of the No-Code Podcast and on today's show I'm joined by Nirman and Jack, the team behind Obviously AI. In this show we talk about a range of topics surrounding machine learning, AI and no code, and then we discuss their platform Obviously AI. Before we jump in, Obviously AI is aiming to make data science effortless for everyone. It's a platform that allows anyone to use machine learning on their own data by using plain text inputs.

I'd been planning on doing a show about no code machine learning much later, but when I had the chance to talk with Nirman and Jack, two machine-learning experts, I had to jump at it. What they're doing is really powerful stuff and I hope this discussion gives you a broad understanding of machine learning, how you could use this in your business and gets you thinking about some of the broader applications of machine learning. If you're a business analyst at a company, or a digital marketer, or even just wondered what machine learning is all about, you're going to want to listen to this episode. This was a fascinating show to research for and record and I hope you enjoy it.

So I've just given our listeners a quick intro but if you could just start off by introducing yourselves, what your roles are at Obviously AI and give a quick background as to what led you to start Obviously AI, and your founder's journey.

Nirman: Cool. So I'll quickly jump in. I'm the CEO at Obviously AI, and I started Obviously AI with my co-founder, Tapojit. The general idea was that a lot of non-tech business users had a bunch of questions around data that they couldn't get answers to without technical expertise. So we said, "Hey, is there a way to make it really easy to use and get them the data answers that they want without being too technical?" So that's the general idea that we started Obviously AI with.

Jack: Yeah, and I'm Jack, the Creative Director at Obviously AI.

Alex: Sweet. So when, when did you guys start the company? How long have you been going now?

Nirman: So I started the company with my co-founder Tapojit, who is more technical, back in 2018.

Alex: 2018. Cool. So I really want to get into Obviously AI: what it is, what it does and how it works. Because I think, honestly, it's pretty amazing. After the demo walkthrough we had last week, my mind hasn't really stopped racing. So even through the weekend I was thinking of different applications and what it could do. But before all that, I really want to take a step back and set the scene for our listeners. Machine learning versus AI. I know there's tons of talk about these two topics nowadays.

Read Obviously AI's full origin story here.

Machine Learning vs AI and How it Relates to Obviously AI

Alex: You can hardly read anything tech-related without hearing these terms. And you know, I feel like AI, it's almost just a buzzword that gets thrown around these days like blockchain or whatever you have. And also a lot of the discussion, AI is going to take over the world. Singularity's coming, whatever. So maybe if we could just set the scene, if you could break down these two terms, machine learning and AI. I feel like a lot of people use them interchangeably. Can you talk about what they are and how they differ?

Nirman: Absolutely. So AI is the larger overarching branch, and machine learning is more of a subset of AI. So if you think of AI, AI really is the idea that we can build smart intelligent machines that behave and act like human beings. And machine learning is a subset of AI where the general idea is to build machines that can find complex patterns within lots and lots of data, and then use those patterns to understand what's likely to happen next. Today we see most AI work happening through machine learning. We've got prediction algorithms, we've got algorithms that can recognize faces, we've got algorithms that can understand or replicate someone's voice. All of that comes under machine learning.

Alex: Okay, great. So with that, let's hear about Obviously AI. What is it, what does it do? Why did you build it? Who did you build it for? What were the problems you were looking to solve? I know you covered that in the intro, but if we can drill down on those points.

Nirman: For sure. So, Obviously AI is a tool that enables anyone to run complex predictions and analytics on their data without having to write any code.

So that's what Obviously AI is, that's what it does. We built it because about a year and a half back, I was working at a small startup in San Francisco. It was about 50 to 60 people at the company, and I was the only data scientist slash machine learning engineer at the company. And everyone that was a business user would often come up to me and ask me questions around data science, right? Something as simple as, "Hey, tell me the list of top hundred customers." Or something more complex like, "Hey, who's likely to cancel our subscription plan?"

And every time they had a question I would have to build up this quick and dirty algorithm, I'd have to write some scripts and give it back to them.

Then they would come back to me with follow-up questions, and it would take me away from the work that I was doing. That's where the light bulb went off, where it was like, "Hey, why is it that these non-tech business users are the ones asking questions, but all the tools to answer them are made for technical people like me?"

So can we build a tool that enables anyone to do, A: Predictions, and B: Analytics, which deep dives into the data that they already have, without writing code? And that's where the idea of Obviously AI really started: where anyone could plug in the historical data that they have and ask a question in plain English.

Where they could say, "Tell me which customer is likely to buy again," or "Give me a list of top hundred customers." And from that plain English question, our technology will automatically understand what they're asking, build a machine learning algorithm that's tailored, and give them a result back. That's the general kind of workflow for Obviously AI.

Alex: You've basically come up with a no-code solution for this process, for machine learning?

Nirman: Correct.

Why We Built a Tool That Democratizes Machine Learning

Alex: Yeah. So, I mean that speaks to my next point, which is why did you choose to build this out as a no code solution? Well, I mean most businesses are born from frustrations encountered or problems you've found. So obviously you found that non-technical people were looking to use machine learning for their needs, but that was inaccessible to them.

Nirman: Absolutely. I think we'd started to see that machine learning problems were everywhere. No matter how big a business you are, there are a bunch of machine learning problems that, once solved, could help you optimize the business. But we also started to notice a pattern: business users needed deep technical knowledge to run data science on their data.

So we really started with the vision that we wanted to make data science effortless for everyone, and that's why the whole no-code solution was the best way to go. Because it gives the power to the business user who's not deeply technical.

Alex: Totally. Basically just creating more, I guess, open access really to kind of tools and solutions, and I think that's really good. Like, I don't know where we are in the machine learning adoption curve, but I feel like for a lot of people, machine learning is still hidden away behind closed doors, and only really big tech companies like Google or Amazon might be able to use these tools. And what you showed me last week it's certainly almost, I'd say democratizes this technology even, or just the processes.

Nirman: Exactly.

How No-Code Democratizes AI

Alex: So, super cool. So back to one of the points raised, why no code, which I think we've spoken about, but what do you think the current state of no code machine learning tools are out there? Like you mentioned applying filters and pretty basic type things like identifying, is this a cat? Is this a dog? Pretty simple stuff.

So where are we in that kind of machine learning adoption curve and maybe the no-code machine learning adoption curve?

Nirman: Absolutely. Machine learning has been incredibly hard to adopt historically. It's only been accessible to people with deep technological knowledge, deep knowledge in data science, in machine learning, in statistics, and a very good understanding of how the algorithms work.

So those are the kinds of people that have access to machine learning today. Now really what we see is that, let's say, a small restaurant owner wants to optimize their business and they want to understand where they can get more sales, what they can do to get more people at the front door, stuff like that, but they really can't afford to build a data science team. It's really very limited to companies like Google and Amazon that have the ability to bring in a whole bunch of data scientists. That said, we've started to see a bunch of easy-to-use machine learning solutions coming out. But they're still too technical for someone who's a super non-tech person but is a business user.

What we need to be able to do is bring a revolution in the way people think of data science—in the way people think of machine learning—to take a leap from being too technical and making the access to algorithms insanely easy.

So when someone can ask a question in plain English and get access to algorithms, that's how we really think machine learning should become, as it becomes incredibly ubiquitous as a part of life and how business users think about it.

How Obviously AI Works and How Machine Learning Speeds Up the Data Science Process

Alex: It's going to be super interesting. I think there's a huge potential. So let's maybe get a little bit more into the machine learning side of things and then I'd actually really like you to do a deep dive into Obviously AI and how it works and what it does. And you've mentioned kind of the plain English a couple times. I really want to get into that, but machine learning can do things and solve problems that people can't, or maybe solve problems that a programmer may not know how to program for. Machine learning is teaching itself how to solve the problems, rather than a programmer coding a script and it being told what to do. Am I on the mark there, or is that...

Nirman: Yeah, to some extent that's something that we start to see on the surface today. So let me tell you a little bit about just the kind of problems that machine learning solves. A lot of people believe that machine learning will solve problems that human beings cannot do at all. But in reality what machine learning does is it expedites the process in which human beings solve problems. So say for example, I gave you a list of 10 buildings in San Francisco and I asked you to tell me one thing that everything in that list has in common. Almost instantly, if you look at those 10 buildings, you can say, "Hey, all these buildings are painted black," for example. And that's very easy to understand for us as humans because it's just a list of 10 buildings.

But if that was a list of, let's say, 10 million buildings across the United States, now finding that pattern becomes incredibly difficult. That's where we bring in machines that can easily understand and find those patterns and give us insights that we wouldn't have been able to pick up as human beings. So a lot of machine learning that's out there today, be it facial recognition, be it just finding patterns in, like, text data, what it does today is it picks up on those patterns and acts on them. So, that's what the machines do today, and it gives us a feeling that it's able to do something that humans totally cannot. But what it's really doing is making that process really, really fast.

If I gave you a year to find a pattern in that data on 10 million buildings, it would probably be fairly straightforward to do. But if you want to do it in a second, that's where machine learning comes into the picture.
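To make the buildings example concrete, here is a minimal Python sketch of the simplest kind of pattern search: finding the attribute values that every record in a list shares. The `common_attributes` function and the sample records are hypothetical illustrations, not anything from the Obviously AI platform, and real machine learning looks for far subtler patterns than exact matches.

```python
def common_attributes(records):
    """Return the attribute/value pairs that every record shares."""
    if not records:
        return {}
    shared = dict(records[0])
    for record in records[1:]:
        # Keep only the pairs that this record agrees with.
        shared = {
            key: value
            for key, value in shared.items()
            if key in record and record[key] == value
        }
    return shared


# Hypothetical sample data, standing in for the list of buildings.
buildings = [
    {"city": "San Francisco", "color": "black", "floors": 12},
    {"city": "San Francisco", "color": "black", "floors": 30},
    {"city": "San Francisco", "color": "black", "floors": 7},
]

print(common_attributes(buildings))
# {'city': 'San Francisco', 'color': 'black'}
```

The same loop handles 10 records or 10 million; only the running time changes, which is the point being made about speed.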

But there's also this other aspect of machine learning, which is programs that are writing their own code, and learning and teaching themselves and growing themselves and that's an aspect which has started to bubble up now where we see that a machine learning algorithm is writing an app by itself.

Alex: Very cool. Okay, so Obviously AI, let's blow the lid off the box here. How does it work? What does it do? Why would people want to use it and what are maybe some valuable use cases here?

Nirman: Obviously AI, if you put it very simply, helps you take the historical data that you have and predict something off of it. And the way you do it is by going to Obviously AI and plugging in your data source. It could be a database, it could be a CSV file, or even a CRM system that you use. And then you get a Google-like search bar to ask a question that you would like to predict.

So say for example, I work at a phone company and I have a list of customers that have canceled my phone service, and I want to predict which new customers are likely to cancel. All I have to do is plug in the dataset and ask a question which says, "Tell me which customers are likely to cancel." And in under 30 seconds, our technology will kind of understand what you're asking, find the right data, build a machine learning algorithm that is tailored to your use case, and give you the prediction results back.

Alex: Wow.

Nirman: You literally get a user-by-user list with a probability that says, "Hey, this is the user, this is the probability of them canceling." Now, who's going to cancel is a very simple use case.

You can do a bunch of different use cases, from user churn to predicting supply chain costs, to predicting loan defaulting, for example, whether someone will pay their credit card bills on time, yes or no. That kind of thing, to predicting which sales lead is likely to convert. We've had doctors that predicted who's not going to show up on time for their appointments.

So lots of different use cases. As long as you have historical data and you want to predict something off of it, that's what you can do on Obviously AI.
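As a rough illustration of the "user-by-user list with a probability" described in the churn example, here is a small Python sketch. It is deliberately a toy: it estimates churn probability from the historical cancellation rate of each plan type, with add-one smoothing, rather than training a real model. The column names (`plan`, `churned`, `id`) and the functions `fit_churn_rates` and `score` are assumptions made up for this example, not the platform's method.

```python
from collections import defaultdict


def fit_churn_rates(history, feature):
    """Estimate P(churn) for each value of one feature from historical rows."""
    counts = defaultdict(lambda: [0, 0])  # value -> [churned, total]
    for row in history:
        counts[row[feature]][0] += row["churned"]
        counts[row[feature]][1] += 1
    # Add-one (Laplace) smoothing so small groups never get exactly 0 or 1.
    return {value: (c + 1) / (n + 2) for value, (c, n) in counts.items()}


def score(customers, rates, feature, default=0.5):
    """Return (customer id, churn probability) pairs, riskiest first."""
    scored = [(c["id"], rates.get(c[feature], default)) for c in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical historical data: 1 means the customer canceled.
history = [
    {"plan": "monthly", "churned": 1},
    {"plan": "monthly", "churned": 1},
    {"plan": "monthly", "churned": 0},
    {"plan": "annual", "churned": 0},
    {"plan": "annual", "churned": 0},
    {"plan": "annual", "churned": 0},
]
new_customers = [{"id": "u2", "plan": "annual"}, {"id": "u1", "plan": "monthly"}]

rates = fit_churn_rates(history, "plan")
for customer_id, probability in score(new_customers, rates, "plan"):
    print(customer_id, f"{probability:.0%}")
# u1 60%
# u2 20%
```

A production system would use many columns and a proper learning algorithm, but the shape of the output, one probability per user, is the same.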

What Kind of Data Do You Need for Obviously AI?

Alex: Wow. Pretty pretty powerful stuff. And what do people need to get started? You mentioned just a dataset.

Nirman: Exactly. So all you need is either a CSV file of your dataset or, if your data is in, let's say, a Salesforce or HubSpot kind of CRM system, a connection to that.

Alex: Cool. So are there any, I guess, prerequisites of the data? Like, is it literally just a CSV file, or like how much data do you need?

Nirman: Yeah, that's a great question. So we need at least a couple thousand rows of data to get started. That will give you a really good accuracy in the kind of predictions that you build. And what we need is that the data needs to be structured to some extent, meaning your data has to be in columns and in rows, rather than being in a hundred different CSV files. So those are the kinds of few limitations that we have as you set up your data, and we have them listed on the website and so you could easily check them.
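The two requirements mentioned here, a couple thousand rows and data laid out in consistent rows and columns, are easy to check before uploading anything. The following Python sketch uses only the standard library; the `check_dataset` function and the `MIN_ROWS` value of 2000 are assumptions for illustration (the interview says only "a couple thousand"), so consult the limits listed on the website for the real numbers.

```python
import csv
import io

MIN_ROWS = 2000  # assumption: one reading of "a couple thousand rows"


def check_dataset(csv_text, min_rows=MIN_ROWS):
    """Return a list of human-readable problems found in a CSV string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        return ["file is empty"]
    problems = []
    rows = 0
    for line_number, row in enumerate(reader, start=2):
        rows += 1
        # Every row should have exactly one value per header column.
        if len(row) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} columns, got {len(row)}"
            )
    if rows < min_rows:
        problems.append(f"only {rows} rows; at least {min_rows} recommended")
    return problems


# Hypothetical sample with one malformed row.
sample = "plan,tenure_months,churned\nmonthly,3,1\nannual,24\n"
for problem in check_dataset(sample):
    print(problem)
# line 3: expected 3 columns, got 2
# only 2 rows; at least 2000 recommended
```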

We Match The Right Algorithm With Your Natural Language Question

Alex: Yeah. I just wanted to go back to what you said and clarify the point because my mind is kind of spinning here. So the algorithms used to make the predictions, so the user is typing in what they want to know and then how has that algorithm been chosen?

Nirman: Yeah, great question. So say for example, you type in, "I want to predict which female users in San Francisco are likely to cancel their subscription," right? You're getting into a little detail here, and that's what you type in. Then, first of all, Obviously AI will take that question, and it'll take your data and it'll run the different filters on the data to find the columns that are important to use.

After that, it will look at the properties of your data. So it's going to look at how sparse your data is, how many numerical and categorical columns you have, how big your dataset is and other details of your dataset. And based on that, it'll come up with a shortlist of three different algorithms that could give the highest accuracy. It will then run those three algorithms with thousands of variations, and pick the one with the highest accuracy.

So if you were to think about it in very simple terms, the question that you're asking on the backend, Obviously AI is going to run thousands of variations of different algorithms and give you one that gives the highest accuracy.
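The "try many variations, keep the most accurate" loop described above can be sketched in a few lines of Python. Everything here is a simplified stand-in: the three candidate "algorithms" are hand-written rules rather than real learners, the settings grids hold a handful of values rather than thousands, and names such as `select_best`, `tenure_rule` and the `tenure`/`tickets` columns are hypothetical. The interview does not say which algorithms or settings the platform actually uses.

```python
from itertools import product


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == r["churned"] for r in rows) / len(rows)


# Three toy candidate "algorithms"; each takes settings and returns a model.
def tenure_rule(threshold):
    return lambda r: int(r["tenure"] < threshold)


def tickets_rule(threshold):
    return lambda r: int(r["tickets"] >= threshold)


def combined_rule(tenure_threshold, tickets_threshold):
    return lambda r: int(
        r["tenure"] < tenure_threshold and r["tickets"] >= tickets_threshold
    )


def select_best(candidates, rows):
    """Try every settings combination of every candidate; keep the best."""
    best = None
    for name, (factory, grid) in candidates.items():
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            settings = dict(zip(keys, values))
            result = accuracy(factory(**settings), rows)
            if best is None or result > best[0]:
                best = (result, name, settings)
    return best


# Hypothetical evaluation rows: months of tenure, support tickets, churn label.
rows = [
    {"tenure": 2, "tickets": 5, "churned": 1},
    {"tenure": 3, "tickets": 0, "churned": 0},
    {"tenure": 30, "tickets": 4, "churned": 0},
    {"tenure": 1, "tickets": 3, "churned": 1},
    {"tenure": 24, "tickets": 1, "churned": 0},
    {"tenure": 5, "tickets": 6, "churned": 1},
]
candidates = {
    "tenure_rule": (tenure_rule, {"threshold": [3, 6, 12]}),
    "tickets_rule": (tickets_rule, {"threshold": [2, 4]}),
    "combined_rule": (
        combined_rule,
        {"tenure_threshold": [6, 12], "tickets_threshold": [2, 4]},
    ),
}

best = select_best(candidates, rows)
print(best)
# (1.0, 'combined_rule', {'tenure_threshold': 6, 'tickets_threshold': 2})
```

In practice the score would be measured on data held back from training, so that the winning variation is chosen for how well it generalizes rather than how well it memorizes.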

Alex: Right? Okay. So let's say the one algorithm that is chosen, how accurate is that algorithm? And maybe if we could take a step back here, I don't know if I'm sure that this is a thing, but being a machine learning newbie I don't know, is there a high watermark or a kind of a goal post that algorithms are judged upon for accuracy?

Nirman: Absolutely. So, ideally an algorithm would be considered good or fairly accurate if it has something around 75% accuracy. So the way accuracy is structured is that, of the total amount of data that you plug in, it's going to put aside a small percentage of that data. It's going to break your data into two chunks. One is going to be a smaller chunk and another is going to be a larger chunk, and it's going to use that larger chunk of data to build the machine learning algorithm, and then it's going to test it on that smaller chunk. It'll say, "Hey, how well did I perform?" And whatever the results come out as, that's going to be the accuracy of your algorithm. So that's how accuracy is defined in general. But 75% is a fairly standard starting point.
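The two-chunk procedure described here is usually called a holdout (train/test) split. Below is a minimal Python sketch of it using only the standard library. The 80/20 split, the synthetic `tenure` data and the one-threshold "model" are all assumptions chosen to keep the example short; they are not the platform's actual settings.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then return (larger training chunk, smaller test chunk)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def fit_threshold(train):
    """'Train' a toy model: pick the tenure cutoff that best fits the training chunk."""
    labels = [r["churned"] for r in train]

    def train_accuracy(cutoff):
        return accuracy([int(r["tenure"] < cutoff) for r in train], labels)

    return max(sorted({r["tenure"] for r in train}), key=train_accuracy)


# Synthetic data (an assumption): customers with under 12 months of tenure churn.
rows = [{"tenure": months, "churned": int(months < 12)} for months in range(1, 101)]

train, test = split_rows(rows)
cutoff = fit_threshold(train)  # built using the larger chunk only
test_accuracy = accuracy(
    [int(r["tenure"] < cutoff) for r in test],  # scored on the smaller chunk
    [r["churned"] for r in test],
)
print(f"trained on {len(train)} rows, tested on {len(test)} rows")
print(f"held-out accuracy: {test_accuracy:.0%}")
```

As a reference point for the 75% figure, `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` is 0.75: three of the four predictions match.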

Alex: For sure. So is that the benchmark for the technology or is that as the algorithms get better, and you have more data and as time goes on, like do you see that 75% is going to come up to 90, 95% as kind of a benchmark over time or?

Nirman: So say for example, you connect a database, and the database keeps updating every day as new users sign up, and you created a prediction on Obviously AI, now as new users come in the prediction will automatically update and over time the accuracy will only increase and get better. So the accuracy that you've seen today will be totally different than the accuracy that you see, let's say three months from now.

Alex: Sure. Cool. And so before, you were talking about plain text, plain English being entered by the user to tell it what they want to do. So for, let's say, a no-code machine learning solution aimed at a nontechnical person, I see one of the big drawbacks being that they may not know what algorithm to use.

Nirman: Exactly.

Alex: So, how did you guys go about, I imagine that's a key point of differentiation that you were trying to solve. How did you guys go about solving that or like making that easier for the user?

Nirman: Right. So, that's a great point. With a lot of the tools that are out there, the way they work when it comes to running predictions is that you plug in your historical data and then you often have to select the column that you want to predict. And then you have to select a few details like the algorithm that you want.

Alex: Yeah.

Nirman: You've got to select a few settings, like what loss function you want on that algorithm. And what happens is that often, as a business user, it gets a little confusing because you often don't know what algorithm to pick. You often don't know what a loss function is, and sometimes it also becomes challenging, if you don't work a lot with data, to understand what the columns are and which ones to pick. So that's why we brought in this very simple idea where we say forget about the drop-downs and picking the settings, just ask a question, like you would to another human being, right?

So if you were to ask me a question, that's exactly how I want you to interact with this platform. We want to make it as smooth as possible for the no-code business user to get started.

So that's where this idea really came into the picture. If we were to go back a couple of versions, when we built the very first version of our platform, we actually had those drop-downs. We actually had drop-downs where you got to select the algorithm, select the settings and stuff like that. And we went to one of our users and we said, "Hey, here it is, go ahead and make your prediction." And he stood blank. He was like, "Hey, I don't know, what is this dropdown?" And then we came to a point where we said, "Okay, we'll think of a few other things that you can do." Eventually we built this Google-like search bar, and we said, "Hey, all you have to do is ask the predictive question you have in mind." And with a little bit of hesitation, he kind of leaned in and he started typing, and he typed, "Which users will cancel." Right. And he asked that question. All of a sudden it found the right columns, it built the algorithm, it gave the results.

How Fast Can Obviously AI Make Predictions?

Alex: So speaking of speed, say I wanted to predict something on 250 million rows, like how long does this process take?

Nirman: So for any prediction or analytics question that you ask in Obviously AI, the results are available to you in under 30 seconds for 10 million rows of data. So just add another 30 seconds for every 10 million rows.

Alex: All right. So, okay. 20 million a minute. Got you.
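The rule of thumb in this exchange, up to 30 seconds per 10 million rows, can be turned into a quick upper-bound estimate. The Python sketch below assumes the scaling is linear in blocks of 10 million rows, which is one reading of what was said rather than a measured benchmark; the `estimated_seconds` function is a hypothetical helper.

```python
def estimated_seconds(rows, seconds_per_block=30, rows_per_block=10_000_000):
    """Upper-bound estimate: 30 seconds for each started block of 10 million rows."""
    blocks = max(1, -(-rows // rows_per_block))  # ceiling division on integers
    return blocks * seconds_per_block


for rows in (10_000_000, 20_000_000, 250_000_000):
    seconds = estimated_seconds(rows)
    print(f"{rows:,} rows: about {seconds} s ({seconds / 60:.1f} min)")
```

Under that assumption, 20 million rows comes to about 60 seconds, matching Alex's "20 million a minute", and the 250 million rows from his question comes to about 750 seconds, or 12.5 minutes.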

Nirman: Ideally, this is a process that would usually take a data scientist, or you if you were to sit and write your own algorithm, a couple of days to weeks. That's the kind of speed that we bring. So I imagine that point alone would have implications even for a data science team. Like if they're just trying to churn through their workload.

How We Approach Data Security

Alex: Cool. And so I imagine people are pretty mindful, and it's very topical as well, of data integrity, and security, and keeping data safe. What kind of issues arise here when people are running these algorithms or making predictions upon their data on, say, an external platform or a cloud-based platform, at an external company?

Nirman: Exactly. And that's a great point. We've built our entire security and app infrastructure on Google Cloud. So what it does is it enables us to take advantage of the security compliance systems that Google Cloud has in place. So we are compliant with GDPR, we are compliant with HIPAA. Those are the kinds of things that help make us really compliant.

But on top of that, there's one other very important thing that a lot of people don't know, which is that when it comes to machine learning predictions, you don't need any personally identifiable information. You don't need any PII. So by default, if you were to upload a CSV file with a list of customers, you don't need to put in their names, you don't need to put in their phone numbers, you don't need to put in their email addresses, because that's not going to help with the predictions anyway.
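Since names, phone numbers and email addresses do not help the predictions, one practical step is to drop those columns from a CSV before it leaves your machine. Here is a minimal Python sketch using the standard library's `csv` module. The `strip_pii` function and the `PII_COLUMNS` list are assumptions for illustration; a real dataset may name these columns differently or hold other identifying fields that need the same treatment.

```python
import csv
import io

# Assumption: column names that hold personally identifiable information.
PII_COLUMNS = {"name", "email", "phone"}


def strip_pii(csv_text, pii_columns=PII_COLUMNS):
    """Return the CSV text with the PII columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [
        column
        for column in (reader.fieldnames or [])
        if column.strip().lower() not in pii_columns
    ]
    output = io.StringIO()
    writer = csv.DictWriter(
        output, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `kept` are silently dropped
    return output.getvalue()


# Hypothetical customer export.
raw = (
    "name,email,plan,tenure_months,churned\n"
    "Ada,ada@example.com,monthly,3,1\n"
    "Lin,lin@example.com,annual,24,0\n"
)
print(strip_pii(raw))
# plan,tenure_months,churned
# monthly,3,1
# annual,24,0
```

If results need to be matched back to individual customers later, keeping an opaque customer ID column in place of the name is a common compromise.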

If you want to hear more, listen to the full podcast on Spotify. If you want to reach out to Alex, DM him on Twitter @NoCode_Podcast, or you can flip him an email, at nocodepod@gmail.com.

