Creating a requirements.txt file is an essential step, particularly when sharing your code, building MLOps pipelines, or packaging something into a Docker container.
It is surprisingly tricky to get a clean requirements.txt file from a Jupyter Notebook, so I’ve investigated the different ways to do it…
The most common way I’ve seen on Stack Overflow is to use pip freeze. This will list every package and version in your current virtual environment.
Simply open a terminal and navigate to the folder you want your requirements file to…
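As a sketch, freezing from the terminal looks like this (assuming pip is available in the active environment):

```shell
# Dump every installed package and its pinned version
# from the active environment into requirements.txt:
python -m pip freeze > requirements.txt

# Quick sanity check on the result:
cat requirements.txt
```

Note that this captures everything installed in the environment, not only the packages your notebook actually imports, which is part of why getting a clean file is tricky.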
MLOps is one of the most popular buzzwords in Machine Learning and Data Science at the moment, but one of the areas least covered by online courses, YouTube videos and bootcamps.
You can read up on how I define MLOps at home here:
Since writing the above article, I have been researching MLOps with the following question in mind…
How can I develop and learn MLOps at home, without expensive software, in a way that transfers to real world problems?
My solution is to develop the idea of notebook MLOps; a seamless transition from a Jupyter Notebook into a repeatable…
Data to support your search for the next meme stock.
Financial data is the backbone of modern Hedge Funds, Banks, FinTechs and many others. These organisations typically have plenty of cash and are able to spend it on data, hence financial data often fetches a hefty price tag.
As a regular person working on side projects, this cost does not feel worth it as side projects aren’t always profitable. Luckily for you, I’ve spent the last few days tracking down some of the most useful sources of financial data that you can use in your projects.
As we know, Machine Learning is ubiquitous in our day to day lives. From product recommendations on Amazon, targeted advertising, and suggestions of what to watch, to funny Instagram filters.
If something goes wrong with these, it probably won’t ruin your life. Maybe you won’t get that perfect selfie, or maybe companies will have to spend more on advertising.
How about facial recognition in law enforcement? Loan or mortgage applications? Driverless vehicles?
In these high-risk applications, we can’t go in blind. We need to be able to dissect our models, to understand and explain…
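One simple, model-agnostic way to start dissecting a model is permutation importance: shuffle one feature’s values and measure how much accuracy drops. A minimal sketch, using a toy hand-written model and made-up data (both are illustrative assumptions, not a real pipeline):

```python
import random

# Toy "model": predicts using only feature 0 and ignores feature 1.
def model(row):
    return 1 if row[0] > 0.5 else 0

# Hypothetical data: feature 0 is informative, feature 1 is noise.
data = [([0.9, 0.1], 1), ([0.8, 0.7], 1), ([0.2, 0.9], 0), ([0.1, 0.3], 0)]

def accuracy(rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def permutation_importance(rows, feature, seed=0):
    """Drop in accuracy after shuffling one feature's column."""
    rng = random.Random(seed)
    column = [x[feature] for x, _ in rows]
    rng.shuffle(column)
    shuffled = [(x[:feature] + [v] + x[feature + 1:], y)
                for (x, y), v in zip(rows, column)]
    return accuracy(rows) - accuracy(shuffled)
```

Because the toy model never looks at feature 1, shuffling it changes nothing and its importance comes out as exactly zero, which is the intuition this technique formalises.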
Gradient Boosting models such as XGBoost, LightGBM and CatBoost have long been considered best in class for tabular data. Even with rapid progress in NLP and Computer Vision, Neural Networks are still routinely surpassed by tree-based models on tabular data.
Enter Google’s TabNet in 2019. According to the paper, this Neural Network was able to outperform the leading tree-based models across a variety of benchmarks. Not only that, it offers built-in explainability, making it considerably easier to interpret than boosted tree models. It can also be used without any feature preprocessing. …
Linear Models are considered the Swiss Army Knife of models. There are many adaptations that allow them to perform well across a variety of conditions and data types.
Generalised Additive Models (GAMs) are an adaptation that allows us to model non-linear data while maintaining explainability.
A GAM is a linear model with a key difference when compared to Generalised Linear Models such…
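The comparison above can be sketched as a formula. A GLM models the (transformed) expected response as a weighted sum of the raw features, while a GAM replaces each weight-times-feature term with a learned smooth function:

```latex
% Generalised Linear Model: linear in each feature
g(\mathbb{E}[y]) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

% Generalised Additive Model: a smooth function of each feature
g(\mathbb{E}[y]) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)
```

Because each $f_j$ still acts on a single feature and the terms are simply added, we can plot each $f_j(x_j)$ on its own, which is what keeps the model explainable.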
Over the past few years, many organisations have found that although they train great models, they don’t always gain long-term value from them. The reason for this is deployment and monitoring.
Deploying a model isn’t always easy. Some models are large. How long does inference take? You might need a GPU. Do you want to predict in batches?
Monitoring is key. If we train a model on something fashion-related, it could be invalid within a few months due to rapidly changing trends.
MLOps is the solution to this problem, but it also covers much more.
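Monitoring for the kind of trend-driven decay described above can start very simply, for example by checking whether a feature’s live distribution has drifted away from training. A minimal sketch with illustrative numbers and a made-up threshold:

```python
# A minimal drift check, assuming we log a numeric feature at training
# time and again in production. The threshold here is illustrative.
def mean_shift(train_values, live_values, threshold=0.25):
    """Flag drift when the live mean moves more than `threshold`
    (as a fraction of the training mean) away from training."""
    train_mean = sum(train_values) / len(train_values)
    live_mean = sum(live_values) / len(live_values)
    shift = abs(live_mean - train_mean) / abs(train_mean)
    return shift > threshold

train = [10.0, 11.0, 9.0, 10.5]   # feature values at training time
live = [14.0, 15.5, 13.0, 14.5]   # the same feature in production

print(mean_shift(train, live))  # large shift, so drift is flagged
```

Real monitoring systems use richer statistics (e.g. full distribution comparisons), but the principle is the same: compare production data against the data the model was trained on, and alert when they diverge.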
Data Science is still a roaring field with demand continuing to outstrip supply and many businesses expecting to increase their IT spend drastically over the next few years.
Although there has been a sharp rise in online courses, bootcamps and degrees, and with them an increase in junior talent, it is still a great time to get into Data Science.
There are some amazing resources out there for project ideas but many of them have been done by most new Data Scientists. Pretty much everyone has done a Twitter sentiment analysis project (myself included!), looked at the Titanic dataset or…
When I first started learning Data Science and looking at projects, I thought you could do either a Deep Learning project or a regular one. This is not the case.
With powerful models becoming more and more accessible, we can easily leverage some of the power of deep learning without having to optimize a neural network or use a GPU.
In this post, we are going to look at embeddings. This is the way deep learning models represent words as vectors. …
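To make the idea concrete, here is a toy sketch. The vectors below are made up for illustration (real embeddings such as word2vec or GloVe have hundreds of dimensions), but measuring similarity between them works the same way:

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means identical
    direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words with related meanings end up with similar vectors:
print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

With pre-trained embeddings, this similarity comes for free, which is how we can borrow some of deep learning’s power without training a network ourselves.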
What I’ve learned after writing 9 articles on Medium — 10 tips for my 10th article.
I’ve been an avid reader on Medium for well over a year now, but I only started writing towards the end of 2020.
While I haven’t enjoyed any considerable success so far, my stories have had over 2000 views and I am on track to pay back my membership.
This article lists what I’ve learnt in the brief few months I’ve been using the platform.
1. Write, and write often
I think this is the number 1 tip in many fields and it is…