The 5 Most Important Skills You Need To Get a Data Job in 2021

A data-driven approach to understanding technical skills in the UK market.

Photo by Lukas Blazek on Unsplash

Introduction

This article uses a web-scraped dataset to analyse text and find the most common skills in data-related jobs. I’m focusing on technical/hard skills for this article and will tackle soft skills later on.

For each of the 5 most common skills, I’m going to cover:

  • What is it?
  • What the Data Says
  • My Opinion
  • Salary
  • Learning

The dataset consists of 3,015 job titles including salaries.

The mean salary is £49,543.

The median salary is £44,000.

These seem quite high and we’ll discuss why later on.

Methodology

This dataset has been scraped from Indeed.co.uk using Python’s BeautifulSoup package. The data was scraped in November and December 2020.

  • The search terms used were a range of data-related roles. Indeed also returns related roles, e.g. a search for ‘Data Analyst’ also returns ‘Data Scientist’ roles.
  • Duplicates have been removed using the job title, description and location.
  • The data was filtered to only jobs containing salaries.
  • The titles and descriptions have been cleaned to remove common words such as ‘the’, ‘and’, ‘is’, etc.
  • Some cleaning of hard skills was done to group common misspellings together e.g. ‘PowerBI’ and ‘Power BI’.
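The cleaning steps above could look roughly like the sketch below. This is not the code used for this article: the stop-word list, the alias map and the field names (`title`, `description`, `location`, `salary`) are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list and spelling map; the article's own lists are not published
STOP_WORDS = {"the", "and", "is", "a", "an", "of", "to", "in", "with"}
SKILL_ALIASES = {"powerbi": "power bi"}


def clean_text(text):
    """Lower-case the text, merge known misspellings and drop common words."""
    text = text.lower()
    for alias, canonical in SKILL_ALIASES.items():
        text = re.sub(rf"\b{re.escape(alias)}\b", canonical, text)
    words = re.findall(r"[a-z0-9+#]+", text)
    return " ".join(w for w in words if w not in STOP_WORDS)


def prepare(jobs):
    """Keep salaried jobs, drop duplicates and clean the text fields."""
    seen = set()
    cleaned = []
    for job in jobs:
        if job.get("salary") is None:
            continue  # only jobs with a salary are kept
        key = (job["title"], job["description"], job["location"])
        if key in seen:
            continue  # same title, description and location: a duplicate
        seen.add(key)
        cleaned.append({
            **job,
            "title": clean_text(job["title"]),
            "description": clean_text(job["description"]),
        })
    return cleaned


# Made-up postings: one duplicate and one without a salary
jobs = [
    {"title": "Data Analyst", "description": "SQL and PowerBI is a must",
     "location": "London", "salary": 35000},
    {"title": "Data Analyst", "description": "SQL and PowerBI is a must",
     "location": "London", "salary": 35000},
    {"title": "Data Engineer", "description": "Python and Spark",
     "location": "Leeds", "salary": None},
]
result = prepare(jobs)
```

Here only one posting survives, and its description is reduced to the words that matter for skill matching.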

The most common words are not hard/soft skills and there is no definitive list of data-related skills. The lists used to pick these out were put together qualitatively, using intuition and looking at a small number of job descriptions.
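As an illustration of how a hand-picked list can be matched against descriptions, here is a minimal sketch. The skill list is hypothetical and the matching rule is my assumption, not necessarily what was used for the charts. Short names such as ‘R’ need whole-word matching, otherwise they would be found inside almost every description.

```python
import re

# Hypothetical skill list; the lists used for this article are not published
HARD_SKILLS = ["sql", "python", "power bi", "r", "machine learning"]


def find_skills(description, skills=HARD_SKILLS):
    """Return the skills that appear as whole words or phrases in the text."""
    text = description.lower()
    found = []
    for skill in skills:
        # The lookarounds stop 'r' matching inside words such as 'server'
        pattern = rf"(?<![a-z0-9]){re.escape(skill)}(?![a-z0-9])"
        if re.search(pattern, text):
            found.append(skill)
    return found


matched = find_skills("Strong SQL Server and Python experience; R is a plus")
```

A plain substring check (`"r" in text`) would flag nearly every job as an R job, which is why the pattern insists on a non-alphanumeric character either side.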

The charts were built from the scraped data and these lists of words, using a variety of Python packages.

There is likely to be a lot of bias in this dataset but some of the trends should hold true more generally for the market.

Indeed.co.uk is very popular with recruitment companies, rather than the actual companies hiring. From experience, recruiters tend to write shorter job descriptions (highlighting common skills) and may include a wider, higher salary range in the posting.

Recruitment companies are much more likely to include salary information than individual companies, so this dataset has even more bias towards recruiters.

The main skills were picked out manually, so it’s possible something significant was missed. Let me know if you spot anything!

Job Titles

I’ve grouped titles into Analyst, Scientist, Engineer and Other. This was done in a very simple way so there could be a few mislabelled titles in the Other category.
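A very simple grouping of this kind might be a keyword check like the sketch below. The exact rule is an assumption on my part, but it shows how titles can end up in Other by mistake.

```python
def group_title(title):
    """Assign a job title to the first keyword group it mentions."""
    lowered = title.lower()
    for keyword in ("analyst", "scientist", "engineer"):
        if keyword in lowered:
            return keyword.capitalize()
    return "Other"


groups = [group_title(t) for t in (
    "Senior Data Analyst",
    "Machine Learning Engineer",
    "Analytics Manager",  # contains 'analytics', not 'analyst'
)]
```

‘Analytics Manager’ falls into Other because it never contains the word ‘analyst’, which is the kind of mislabelling mentioned above.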

For the rest of this article, I’m going to use the following notation:

  • DS: Data Scientist(s)
  • DA: Data Analyst(s)
  • DE: Data Engineer(s)

The below violin chart is effectively a smoothed histogram of the salaries for each role. We can see that DA roles are the most common with the majority sitting around the £30k-£40k mark. The dashed line is the mean salary for each group. The black dots are the individual roles and you can hover over these to see the actual (cleaned) title.

Individual jobs per category (hover for title) and Violin chart of salary ranges. Created by Author.

The clear trend here is that DA roles are the most common, but pay less than DS and DE roles. From my experience, it is much more common to find entry-level DA roles, which explains the salary trends here.

Hard Skills

Hard skills are the technical keywords included in the job description. This could be Python or Java or a more general term such as Machine Learning.

The below chart shows the count of jobs using each term, with a breakdown for a keyword in the title (Engineer/Scientist/Analyst). A box plot of the salary range is also included, which shows the median salary and any outliers.

The skills are sorted by the total number of jobs that include them. I am going to run through the top 5.

Bar Chart and Box Plot showing the most common technical skills and their salaries. Created by Author.
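The numbers behind a chart like this (job count and median salary per skill) can be computed in a few lines. The records below are made up for illustration and the structure (a salary plus a set of matched skills per job) is an assumption.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: each job carries a salary and its matched skills
jobs = [
    {"salary": 35000, "skills": {"sql", "power bi"}},
    {"salary": 40000, "skills": {"sql", "python"}},
    {"salary": 60000, "skills": {"python", "machine learning"}},
    {"salary": 55000, "skills": {"sql", "python", "r"}},
]

salaries_by_skill = defaultdict(list)
for job in jobs:
    for skill in job["skills"]:
        salaries_by_skill[skill].append(job["salary"])

# One row per skill: (skill, job count, median salary), most common first
summary = sorted(
    ((skill, len(vals), median(vals)) for skill, vals in salaries_by_skill.items()),
    key=lambda row: (-row[1], row[0]),
)
```

Note that a job listing several skills contributes its salary to each of them, so the salary ranges for different skills overlap.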

The next chart looks at the percentage occurrence of each skill for each title group. For example, we can see that SQL appears in around 50% of DA job descriptions.

Bar Chart showing the percentage of jobs in each role category that contains the skill. For example, almost 50% of Analyst jobs contain ‘SQL’. Created by Author.
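The percentage for each role category is simply the number of jobs in that category mentioning the skill, divided by all jobs in that category. A minimal sketch with made-up data:

```python
from collections import Counter

# Hypothetical (title group, matched skills) pairs
jobs = [
    ("Analyst", {"sql", "power bi"}),
    ("Analyst", {"sql"}),
    ("Analyst", {"python"}),
    ("Analyst", {"excel"}),
    ("Scientist", {"python", "sql"}),
    ("Scientist", {"python", "r"}),
]


def skill_share(jobs, skill):
    """Percentage of jobs in each title group whose skills include `skill`."""
    totals = Counter(group for group, _ in jobs)
    hits = Counter(group for group, skills in jobs if skill in skills)
    return {group: 100 * hits[group] / totals[group] for group in totals}


sql_share = skill_share(jobs, "sql")
python_share = skill_share(jobs, "python")
```

Because each group is divided by its own total, a small group such as Scientist can show a high percentage from only a handful of jobs.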

We can view the relationships between skills with a chord chart. Unfortunately, I can’t embed an interactive version into this article so I’ve only included a screenshot. Follow the link to my portfolio for the fully interactive version (it’s worth it!).

Chord Diagram highlighting the relationship between SQL and Python. Image by Author.

1. SQL

What is it?
SQL is the main coding language used to interact with relational databases.

DS and DA may use it differently to DE: DE will create, update and delete tables, whereas other roles may only retrieve and manipulate data.

What the Data Says
Part of the reason SQL is the top language (Chart 1) is that it is strongly associated with DA roles (Chart 2), which are the most common roles scraped in this dataset (Titles Chart). However, it is still essential to success in other roles.

My Opinion

Data extraction is the first technical step in any analytics workflow, so it makes sense for SQL to be the most common requirement as it is the go-to language for this.

With the amount of data increasing in size, being able to use SQL effectively allows Data Analysts to more efficiently extract information.

Salary
Salaries are lower here than for some of the other top 5 skills because of the strong association between DA roles and SQL. However, proficiency in SQL is necessary for almost all data roles and should not be ignored.

The exception might be management roles. Analytics Managers are typically more hands-off, managing databases, workloads and strategy rather than running SQL queries. This would also account for a lower median.

Learning
If you are looking to learn or upgrade your SQL skills, I have several articles on the topic.

2. Python

What is it?
Python is one of the most widely used coding languages in 2021. For DS, it is used for building Machine Learning models and production-level pipelines. It is also used for automation, building applications, web development and much more.

What the Data Says
Python has the strongest tie with DS/DE roles (Chart 2). Interestingly, the association between Python and these roles is much stronger than that between SQL and DA roles.

47% of jobs with Python also have R, whereas 78% of jobs with R also have Python (Chart 3).
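The two percentages differ because they share the same overlap but divide it by different totals. Taken at face value, 47% against 78% implies there are roughly 1.7 times as many Python jobs as R jobs. The sketch below uses made-up skill sets (not the scraped data) to show the asymmetry.

```python
def share_also_having(jobs, given, other):
    """Percentage of jobs listing `given` that also list `other`."""
    with_given = [skills for skills in jobs if given in skills]
    if not with_given:
        return 0.0
    both = sum(1 for skills in with_given if other in skills)
    return 100 * both / len(with_given)


# Hypothetical skill sets, chosen so that Python is the more common skill
jobs = [
    {"python", "r"},
    {"python", "r"},
    {"python", "sql"},
    {"python"},
    {"r"},
]

python_with_r = share_also_having(jobs, "python", "r")
r_with_python = share_also_having(jobs, "r", "python")
```

In this toy example two jobs list both languages, but that is half of the four Python jobs and two thirds of the three R jobs.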

My Opinion
The Python/R debate will continue to rage on. I feel that Python is more widely used but many jobs will specify either Python or R. It is also common for people to learn another language for certain applications, once they already know the first.

I don’t have any data to support this but since I started as an Analyst several years ago, I’ve seen more DA jobs asking for Python as an extra skill, something I’m sure will continue as the technical barrier for Machine Learning decreases.

Salary
From the data, the mean salary for DA jobs without Python is £41,969. For DA jobs with Python, it is £45,025.

Data Analysts with knowledge of Python can earn, on average, an extra £3,000.
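For reference, the arithmetic behind that figure, using only the two means quoted above (the variable names are mine):

```python
mean_without_python = 41_969  # mean salary, DA jobs without Python
mean_with_python = 45_025     # mean salary, DA jobs with Python

difference = mean_with_python - mean_without_python
uplift_pct = 100 * difference / mean_without_python

print(f"£{difference:,} ({uplift_pct:.1f}%)")
```

This is a difference between two group averages, so it describes the postings in this dataset rather than a guaranteed raise for any individual.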

Learning
If you’d like to learn Python, there are countless resources out there. The first course I did was Automate The Boring Stuff, which is a great introduction to some of the main concepts.

3. Power BI

What is it?
Power BI is Microsoft’s flagship BI tool which allows users to connect to databases and build great-looking dashboards.

What the Data Says
Chart 1 shows us that the majority of jobs which include Power BI are DA roles.

Chart 3 shows us that Power BI is strongly associated with SQL. This is the classic combination of skills for a DA with 1–2 years’ experience.

My Opinion
I was personally surprised to see a difference between Power BI and Tableau. My assumption was that they would be much closer, or that Tableau would be higher.

Generally, having skills in one BI tool is enough for hiring managers.

Salary
Due to its association with DA roles, Power BI’s salary range is slightly lower than Python’s. Very senior roles are also unlikely to list Power BI as a skill because they won’t be building dashboards, which explains the smaller number of outliers for this skill.

Learning
Power BI does have a learning curve: DAX is used to build more complex views (e.g. time-based comparisons such as year to date).

Learning a visualisation tool is important for anyone looking to work in data. It demonstrates an ability to communicate your findings, which is essential.

A good way to learn this is to include a dashboard in any side projects you work on; either your Exploratory Data Analysis or with your final model. In fact, if you can answer a question only using a dashboard, without building a model, this can be attractive as it shows you can simplify problems.

4. R

What is it?
R is a similar coding language to Python that puts more emphasis on statistical computing and graphical representation.

What the Data Says
We can see in Chart 2 that R is more common in DS roles than in DE roles, whereas Python is common in both.

R may be included in DA roles because of statements such as “Python/R is a plus”. This is supported by Chart 3, which shows a strong association between Python and R.

My Opinion
I haven’t personally used R much but have spoken with DS who have.

R is typically used over Python because it provides many more statistical metrics when modelling. This means it can be more widely used for experiment design and scientific study.

I’ve also heard really good things about Shiny and ggplot, and data visualisation in R more generally.

Salary

R has a similar salary range to Python. However, it is worth noting that there are fewer £100k+ roles which specify R. This could be either because R is less commonly used at manager level or because R may not be used as much in financial services, which is the industry for many of these six-figure roles.

Learning
If you’re looking to go down the Product Analytics route (i.e. A/B testing, hypothesis testing), I’d suggest learning R over Python. Either way, if you learn Python first, you’ll probably end up using R at some point!

Much like Python, there are many free and paid courses available.

5. Machine Learning

What is it?
Coming in 5th is our first ‘softer’ technical skill. It’s hard to know exactly what this means here: is it knowledge of algorithms or their application in scikit-learn?

What the Data Says

Machine Learning has the highest median salary in the top 5 (£60k) and has a clear association with DS job titles. This makes sense given the current buzz around Data Science and Machine Learning.

My Opinion
Knowing how to apply models in scikit-learn and understanding the stats behind them is essential. However, the market is changing as Data Science becomes a much more mature field.

I’ve noticed a shift towards the commercial application of Machine Learning: how you apply it to a given problem, plus deployment, maintenance and algorithmic fairness.

Salary
ML commands the highest salary in our top 5 due to its association with DS (Chart 2).

Learning
There are a lot of ways to learn ML. I would suggest going down the typical route of following a tutorial for a classification and regression problem. Ensure you understand why we create train/test splits, standardise data and what the metrics are.
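To make the train/test and standardisation point concrete, here is a small standard-library sketch. In practice you would more likely use scikit-learn’s `train_test_split` and `StandardScaler`; the helper names and the toy feature below are my own. The key idea is that the scaling statistics are learned from the training data only, so nothing about the test set leaks into training.

```python
import random
from statistics import mean, pstdev


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold back a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn the mean and standard deviation from the training values only."""
    return mean(values), pstdev(values) or 1.0


def standardise(values, scaler):
    """Rescale values using previously learned statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical feature: years of experience
experience = [1, 2, 3, 4, 5, 6, 7, 8]
train, test = split_train_test(experience)

scaler = fit_scaler(train)                 # statistics come from the training set
train_scaled = standardise(train, scaler)
test_scaled = standardise(test, scaler)    # the test set reuses those statistics
```

The scaled training values have mean 0 and standard deviation 1, while the test values generally will not, which is expected: the test set stands in for data the model has never seen.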

Creating an ML-based project is always an impressive way to show off your skills.

The highest median salaries in this dataset are associated with NLP and Spark (Chart 1).

NLP is undergoing a huge shift with the release of models such as GPT and BERT, with companies clamouring to better interpret text. It makes sense that this skill is offering high salaries given the buzz around it at the moment. These roles may also have a high technical requirement.

The high salary linked to Spark is interesting. I believe this supports my statements about Machine Learning — there is a great focus at the moment on productionisation and training at scale.

Regression Model

Finding this confusing? I have trained a regression model on this data which you can play with!

This model will take a job title, location and description as input. It will output a predicted salary based on this information.

Take a look at the model here

Photo by Jp Valery on Unsplash

Conclusion

The popularity of Data Science has exploded over the last 5 years and we are now seeing knock-on effects to other data-related roles as companies race to generate value from machine learning.

This dataset provides some insight into the salaries that can be offered for this type of work. It is important to remember the potential bias due to the prevalence of recruitment agencies in this dataset.

This dataset reveals information that we pretty much already know — SQL is essential for data roles, Python is supreme for Machine Learning and there aren’t many entry-level Data Scientist roles. This also shows similarities between the UK and USA markets.

In my next article, we’ll explore some other common words, as well as soft skills such as teamwork and communication, which are just as valuable.

If you’ve explored some of the data yourself and have your own conclusions, let me know!
