We’re living through an extinction-level event. No, not COVID-19. I’m talking about the demise of the popular KNN algorithm that is taught in pretty much every Data Science course! Read on to find out what’s replacing this staple in every Data Scientist’s toolkit.
Finding “K” similar items to any given item is widely known in the machine learning community as a “similarity” search or “nearest neighbor” (NN) search. The most widely known NN search algorithm is the K-Nearest Neighbors (KNN) algorithm. In KNN, given a collection of objects like an e-commerce catalog of handphones, we can find a small number…
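To make the idea concrete, here is a minimal brute-force KNN sketch in plain Python. It is an illustration, not the code from the full post: the toy phone catalog and its two features (screen size and battery capacity) are made up, and distance is plain Euclidean via `math.dist`. Every query is compared against every item, which is why exact KNN slows down as the catalog grows.

```python
import heapq
import math
from typing import Sequence


def knn_search(query: Sequence[float], items: dict[str, Sequence[float]], k: int) -> list[str]:
    """Exact (brute-force) K-nearest-neighbor search.

    Scores every item against the query and keeps the k closest, so the cost
    grows linearly with the size of the collection.
    """
    nearest = heapq.nsmallest(
        k,
        items.items(),
        key=lambda item: math.dist(query, item[1]),  # Euclidean distance
    )
    return [name for name, _ in nearest]


# Hypothetical toy catalog: each phone is (screen_inches, battery_kAh).
catalog = {
    "phone_a": (6.1, 3.1),
    "phone_b": (6.7, 4.3),
    "phone_c": (5.4, 2.4),
    "phone_d": (6.5, 4.0),
}

print(knn_search((6.6, 4.2), catalog, k=2))  # ['phone_b', 'phone_d']
```

If `k` exceeds the catalog size, `heapq.nsmallest` simply returns every item, sorted by distance.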
Deciding which pre-trained model to use in your Deep Learning task ranks alongside classic dilemmas like which movie to watch on Netflix and which cereal to buy at the supermarket (P.S. buy the one with the least sugar and highest fiber content). This post uses a data-driven approach in Python to find the best Keras Pre-Trained model for the cats_vs_dogs dataset. This post and the code provided will also help you easily choose the best Pre-Trained model for your problem’s dataset.
Coding tests are pretty much standard in Data Science interview processes these days. As a Data Science hiring manager, I find a 20–30 min live coding test with some prepared tasks to be effective at identifying candidates who would be successful in the roles that I typically hire for.
Google Colab [Link] is an excellent tool for various offline and live Data Science coding interviews due to its familiar notebook environment and convenient sharing options. But Colab is pretty much limited to Python (and R with some hacks).
There comes a time in every production Data Science project when the code base has become complex, and a refactor is necessary to maintain your sanity. Perhaps you want to abstract out commonly used code into Python modules with classes and functions so that it can be reused with a single line instead of copy-pasting the whole block of code multiple times in your project. Whatever your reason, writing informative logging into your program is critical to ensure you can track its operation and troubleshoot it when things inevitably go wrong.
In this article, I’ll share ~80% of the Python…
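As a minimal sketch of the standard-library pattern for logging inside reusable modules: each module creates its own logger with `logging.getLogger(__name__)`, and handlers are configured once, in the program’s entry point. The `normalize` function and the log format below are hypothetical placeholders, not code from the full article.

```python
import logging
import sys

# One logger per module, named after the module, so log lines show where they came from.
logger = logging.getLogger(__name__)


def normalize(values: list[float]) -> list[float]:
    """Scale values to the 0-1 range, logging what happens along the way."""
    logger.debug("normalize() called with %d values", len(values))
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid a division by zero and leave a trace of why the output is flat.
        logger.warning("All values equal (%s); returning zeros", lo)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def configure_logging(level: int = logging.INFO) -> None:
    """Configure handlers once, in the entry point, never inside library modules."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
        stream=sys.stderr,
    )


if __name__ == "__main__":
    configure_logging(logging.DEBUG)
    print(normalize([3.0, 3.0, 3.0]))  # logs a warning, prints [0.0, 0.0, 0.0]
    print(normalize([1.0, 2.0, 3.0]))  # prints [0.0, 0.5, 1.0]
```

Keeping `basicConfig` out of the module means whoever imports it decides where the logs go and how verbose they are.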
Publishing Data Science stories on Medium is hard work. It takes weeks (even months) to research interesting topics, architect code in the simplest way possible, and weave it all into an engaging story. For example, last December (2020), I published a story titled “KNN is Dead,” which was the culmination of more than three months of my research into the field and is one of my finest works to date.
Unfortunately, someone had the great (*sarcasm*) idea to take a shortcut and completely plagiarised my story in January 2021, almost word for word. This person had more than 100 followers…
Docker containers are crucial for Data Science at Scale [Link]. That’s certainly the case for Approximate Nearest Neighbors (ANNs) on “big” data too!
Everything must run in a container
Speed and Accuracy (or Recall) are the top two considerations when choosing a Nearest Neighbors or Similarity Search algorithm. In my previous post, “KNN is Dead,” I demonstrated the tremendous (>300X) speed advantage that ANNs have over KNN at comparable accuracy. I also discussed how you can choose the fastest, most accurate ANN algorithm for your own dataset [Link].
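Recall is typically measured by treating an exact (brute-force) KNN search as ground truth and checking how many of its top-k neighbors the ANN search also returns; speed is simply the ratio of the two query times. A minimal sketch, assuming you already have both result lists as neighbor IDs per query. The function names and the toy IDs and timings are hypothetical.

```python
def recall_at_k(ann_results: list[list[int]], exact_results: list[list[int]]) -> float:
    """Mean fraction of each query's exact top-k neighbors that the ANN search also found."""
    if not exact_results or len(ann_results) != len(exact_results):
        raise ValueError("Need a matching, non-empty list of per-query results")
    per_query = [
        len(set(approx) & set(exact)) / len(exact)
        for approx, exact in zip(ann_results, exact_results)
    ]
    return sum(per_query) / len(per_query)


def speedup(knn_seconds: float, ann_seconds: float) -> float:
    """How many times faster the ANN search ran than brute-force KNN."""
    return knn_seconds / ann_seconds


# Hypothetical results for two queries with k = 4.
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]   # ground truth from brute-force KNN
approx = [[1, 2, 3, 9], [8, 7, 6, 5]]  # ANN output; order is ignored by this metric

print(recall_at_k(approx, exact))  # 0.875
print(speedup(12.0, 0.5))          # 24.0
```

Because the overlap is computed with sets, this version ignores ranking; an ANN that returns the right neighbors in a different order still scores full recall.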
In my previous post [KNN is Dead!], I compared an ANN algorithm called HNSW with sklearn's KNN and showed that HNSW has vastly superior performance, with a 380X speedup while delivering 99.3% of the same results.
To make things even more interesting, there are several ANN algorithms like
As a data scientist, I am a huge proponent of making data-driven decisions, as I mentioned in “How to Choose the Best Keras Pre-Trained Model.” So, in this post, I’ll demonstrate a…
Disclaimer: The opinions in this article are my own and not related to my employer in any way.
Data Science and Data Analytics are some of the hottest jobs on the market going into 2021. The field is so popular, and job descriptions so broad, that most job openings receive hundreds or even thousands of applicants, because most men know they can apply to a position even when they don’t meet 100% of the requirements [Link]. For some reason, women are more conservative [Link].
With so many applications pouring in and Data Science/Analytics being new fields in many of these…
Artificial Intelligence is a broad term that encompasses many techniques, all of which enable computers to display some level of intelligence similar to that of humans.
The most popular image of Artificial Intelligence is robots that outperform humans at many different tasks. They can fight, fly, and have deeply insightful conversations about virtually any topic. There are many examples of robots in movies, both good and bad, like Vision, WALL-E, the Terminator, and Ultron. Though this is the holy grail of AI research, our current technology is very far from achieving that level of AI, which we call General AI.
AI is not going to replace managers, but managers who use AI are going to replace those who don’t.
Machine Learning (ML) is one of those heavily used buzzwords that you often hear these days. Most managers want to use it but don’t know where to start or even what it actually means. It may seem mysterious, technical, and intimidating at first. But in this post, I’ll break down what ML is, its applications, how ML is built, and the skills you need to develop ML, all at a very high “management” level.
In the simplest terms, Machine Learning is the ability…
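As a minimal illustration of a model learning a rule from examples instead of having it hand-coded, the sketch below fits a straight line y ≈ w*x + b using ordinary least squares. The training data are hypothetical and follow y = 2x + 1 exactly, so the learned parameters should come out as w = 2 and b = 1. This is one of the simplest possible models, not a definition of Machine Learning.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn slope w and intercept b of y = w*x + b by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept puts the line through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("All x values are identical; the slope is undefined")
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(w: float, b: float, x: float) -> float:
    """Apply the learned rule to a new, unseen input."""
    return w * x + b


# Hypothetical training examples that follow y = 2x + 1.
w, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(w, b)                # 2.0 1.0
print(predict(w, b, 5.0))  # 11.0
```

The rule "multiply by 2 and add 1" is never written into the code; it is recovered from the examples, which is the core idea behind far larger ML models.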