
How To
June 18, 2020

How to be 10x more productive than the average data scientist

Discover what it takes to become a better data scientist and how we can support you each step of the way.

Being more productive than your super competitive peer group is hard. Being 10 times more productive might sound like an impossibility, an exaggeration... or even a myth (a unicorn, you say?).

A 10x data scientist is literally 10 times more productive than the average data scientist. That kind of skillset opens up better career opportunities, higher peer recognition, and more interesting projects to work on.

But the disproportionate gap between super performers and average Joes is well known within the data community. Let’s explore how you can stand out from the crowd and become a top performer.

The 3 pillars of data science

With the proliferation of data, the increase in compute capacity, and the competitive edge that data offers for innovation, data science arose to capitalize on a novel opportunity.

The core of data science consists of three pillars: math, computer science, and domain knowledge. A data scientist takes data (math and statistics) and molds it at previously unimaginable speeds with tailored techniques (computer science) to address business challenges that were once insurmountable (domain expertise).



But that is a wide surface area to cover, and you can quickly spread yourself thin trying to master all three pillars. So, given the interdisciplinary nature of data science, can one ever be a master - a 10x-er - or is one doomed to be a Jack of all trades?


Or as the famous quote puts it: 


“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” 


The truth is, real mastery is rare. So rare, in fact, that it’s highly unlikely that anyone can reach the top of a single discipline, let alone three.

Industry hiring choices and organizational structures for data scientists reflect this as well. The best players in the data science world acknowledge that all three areas are necessary for data science, but that a person will mainly specialize in one:

  1. Airbnb divides their data scientists into Analytics (focus on domain expertise), Inference (focus on statistics), and Algorithms (focus on computer science).
  2. Stitch Fix frames data science as a distinction between an analyst (math and domain knowledge) and a builder (computer science and domain knowledge).
  3. Google has multiple career tracks for data scientists, from software engineering (computer science) and quantitative analyst (math), to data scientists in Sales Ops, Marketing and People Ops (domain expertise).

The industry giants hire some of the best players - the 10x-ers - in the field. So, to 10x your data science skills, you do not need to master all three areas. Nonetheless, you do need to become a data science expert.

The path to becoming a 10x data scientist

To improve your data science skills significantly compared to your peer group, you have to follow a certain path.

1. Solidify the foundational skills

The foundations are necessary for understanding how to deliver your work as a data scientist. Review your linear algebra, statistical inference, calculus, algorithms, and programming design patterns, and get knowledgeable about the domain that you are operating in.

Unless you are working as a machine-learning engineer, there is no need to overindulge in the theory behind it. For example, you need to know that not setting a max depth on decision trees can cause overfitting, but you do not need to know how to implement your own variant of C4.5, the algorithm behind the decision tree.
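
If you want to see the max-depth point in action, here is a minimal sketch - assuming scikit-learn and a synthetic dataset, neither of which the article prescribes - comparing an unconstrained tree with a depth-limited one:

    # Sketch: an unconstrained decision tree memorizes noise in the training
    # set, while a depth-limited one generalizes better to unseen data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in (None, 3):  # None lets the tree grow until every leaf is pure
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
        print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
              f"test={tree.score(X_test, y_test):.2f}")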

Solidify the foundations by putting them into practice. Start up that Jupyter Notebook and tackle data science projects. You need to develop enough muscle memory to call the relevant libraries, set up a classifier or regression model, and optimize it (the hyperparameters are not going to tune themselves) without turning to Google for help at each step.
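
As a rough illustration of that muscle memory (scikit-learn again assumed; the model and parameter grid are arbitrary choices for the sketch):

    # Sketch: fit a classifier and tune its hyperparameters with a grid search,
    # the kind of loop you should be able to write without looking anything up.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_iris(return_X_y=True)
    params = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
    search = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=5)
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))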

But once you’ve done your Kaggle exercises, it’s time to move on from guided learning. The tutorials, books, and lectures are a good companion on your path to 10x, but you need to carve your own way if you ever wish to be the leader in your peer group, not the follower.


2. Practice deliberately

What sets an expert apart from a novice (any expert, not just a data science expert) is the amount of practice that the individual has put into their craft.


“The master has failed more times than the beginner has even tried” - Stephen McCranie


This idea was popularized in Malcolm Gladwell’s book Outliers: The Story of Success. Gladwell gives us the magic number of 10,000 hours: every outstanding performer, from the Beatles to hockey players, put in 10,000 hours of sweat before eventually achieving greatness.

But practice in and of itself is insufficient. As we saw from the 10x developer research, programmers with comparable mileage can either be 10x-ers or average Joes, and they all had the same seven years of experience (7 years x 250 work days x 8 hours = 14,000 hours). Instead, what we need is to practice deliberately.


Deliberate practice involves consciously guiding your exercises to fill the gaps in your skillset. As a process, deliberate practice involves repeating these steps:

  1. Deconstruct your target skills. Break down the desired skillset into component parts and analyze which ones you already have and which ones are missing.
  2. Set stretch goals in alignment with the gaps you want to fill. Identify the goals which are just slightly out of your reach and push yourself to achieve them. Unlike ordinary goals, where you might just set a 1-hour coding session per night, stretch goals make you exert yourself and move out of your comfort zone to reach them. However, do not be overly optimistic - unreachable goals are highly demotivating. Align the stretch goals with the gaps in your skillset that you identified in the deconstruction phase. 
  3. Perform with focus. Concentrate on the skill that you are practicing by giving it your full attention. There is no substitute for putting the sweat in. Do not practice absentmindedly; our brains remember better when we are fully attentive to what we are studying. So, do not multitask or be carried away by distractions. Instead, devote your entire conscious effort to practicing the task at hand.
  4. Analyze performance. This is one of the most crucial steps of deliberate practice: feedback. Analyze what worked and identify the hurdles that are impeding your performance. If possible, acquire feedback from others, who might spot something that you haven’t noticed. Recognizing the barriers helps you to become aware of what is hindering your growth.
  5. Adjust. Alter the way in which you perform your skill based on the feedback. Blockers are not overcome by ignoring them or staying unaware of them; they have to be removed.
  6. Repeat. Repetition makes your performance stick because it leads to improved retention. Repeating not just the performance, but the entire cycle of deliberate practice, will set you on the road towards constant improvement.


In a nutshell, deliberate practice means that you’ll be able to code out a solution to a data science problem. Then, with a fresh set of eyes, you will look at your solution, critique it, and recode it to improve. 

3. Specialize

Being a specialist means that you dig deeper than the majority of your peers in the field. 

Specialists come in many different forms. You could specialize in the family of clustering algorithms for insight discovery, in high-performance computing for programmatically speeding up machine learning algorithms, or become a domain expert in fraud detection approaches for banking transactions (among many others).

No matter what area you specialize in, being a specialist results in higher paychecks, more interesting work problems (often at the frontier of what is currently known), and a seat at the 10x table.

Specialization automatically makes you more productive than your peers in the specific field of your expertise. But it does come at a cost. Every hour that you devote to deliberately developing your area of expertise takes time away from practicing other areas, which is why you also need to lateralize.

4. Lateralize 

Lateralization means acquiring the skills that are adjacent to your current know-how. A good way to think about lateralization is to compare your current specialty with the adjacent areas around it.



What are the benefits of lateralization?

  1. Higher independence. A lot of our work depends on others. For example, if a data scientist wants to create a model which predicts customer satisfaction, they will first need the data. This often requires a data engineer who can collect, clean, and save the data in a form that the data scientist can then use. Knowing how to set up your own ETL pipeline (usually the domain of data engineers) can help you deliver products on your own, without relying on other people (see the sketch after this list).
  2. Smoother collaboration. Being knowledgeable in different areas of work also means understanding the various constraints and challenges experienced in those areas. This smooths communication and adjusts expectations when working with experts from different specialties.
  3. Strategic overview. Understanding how data processes flow through different stages (and roles) gives you strategic insight into how to run those processes, making you better suited for strategic roles such as management.
  4. Work flexibility. Having multiple skills under your belt allows you to switch work more easily between different areas, both within your current job and across different employers.
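
To make the independence point concrete, here is a toy end-to-end ETL sketch in Python (pandas plus SQLite; the file, column, and table names are hypothetical):

    # Toy ETL pipeline: extract a raw CSV, transform it with pandas, and load
    # the analysis-ready table into a local SQLite database.
    import sqlite3
    import pandas as pd

    # Extract: read the raw data (hypothetical file name).
    raw = pd.read_csv("customer_feedback.csv")

    # Transform: drop incomplete rows and derive the label the model needs.
    raw = raw.dropna(subset=["satisfaction_score"])
    raw["is_satisfied"] = raw["satisfaction_score"] >= 4

    # Load: persist the cleaned table for the modeling step.
    with sqlite3.connect("warehouse.db") as conn:
        raw.to_sql("customer_satisfaction", conn, if_exists="replace", index=False)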


The end goal of specialization and lateralization is to acquire a T-shaped skill profile to capitalize on the benefits of both. Concentrate on your areas of specialty (the vertical line in T) while keeping a wider overview of your field of work (the horizontal line in T). 

Practice data science in Keboola Connection

Keboola was designed to set data practitioners on the path to success. The data operations platform automates the entire data process, from raw data collection to machine learning insights.

Doing data science within Keboola helps you to reach your stretch goals faster with its dedicated features:

  1. Keboola comes fully equipped with data science notebooks, including Jupyter Notebooks. As the cornerstone of any data scientist’s toolbox, Jupyter Notebooks allow you to clean, transform, model, simulate, and visualize from the comfort of your browser. 
  2. Build ETL pipelines in a couple of clicks. Deploy an entire ETL pipeline without writing your own scripts, giving you the freedom to work with new data (not just precleaned Kaggle datasets).
  3. Fully customizable. If you want to lateralize and dig deeper into data engineering, Keboola is fully customizable. You can adjust extractors, writers and transformers to your specific needs and experiment with engineering your own data pipeline.
  4. Code and data versioning. All notebooks and data buckets are versioned. This allows you to speed up experimentation and automate repetitive tasks (e.g. rerun only the data extraction pipeline to fit existing models to new data) or analyze your previous work for improvement without overwriting it when deliberately practicing.
  5. Collaborate. Jupyter Notebooks are shareable with others in a password-secure way. But Keboola takes it a notch further. You can also share your source data with Data Catalogs. This allows for more extensive collaboration by giving others the raw materials for building your projects.

Start practicing data science with Keboola. The first two weeks are on us.


Guide to first steps with Keboola

STEP 1: SET UP YOUR KEBOOLA PLATFORM

Getting started with Keboola is easy:

  1. Create a free account

The first time you log into your free account, the platform will guide you through constructing your own ETL pipeline: from raw data to cleaned and analysis-ready data. Just follow the Guide Mode.

Pro tip: If you don’t have any extractors (raw data) that you can load in LESSON 2, pick a CSV file from a dataset on Kaggle. Below, you can see us loading the famous Iris dataset (you thought it would be the Titanic, didn’t you?).
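
Once the CSV is loaded, a first exploratory cell in the notebook might look something like this (the file name and column name are assumptions about your copy of the dataset):

    # Sketch: peek at the freshly loaded Iris data before doing anything else.
    import pandas as pd

    iris = pd.read_csv("iris.csv")  # path/name depends on your setup
    print(iris.head())  # first rows as a sanity check
    print(iris["species"].value_counts())  # class balance across the three species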

STEP 2: FIRE UP YOUR JUPYTER NOTEBOOK

Spinning up your Jupyter Notebook to start coding your data science project is as simple as creating a sandbox environment:

  1. Within the platform, go to Transformations and click on the “+ NEW BUCKET” button to create a data bucket. Here, the data you loaded when setting up your ETL pipeline (e.g. during Guide Mode) is accessible.
  2. If you want, you can add additional transformations before you start coding.
  3. Click on “Sandbox” in the right-hand panel.
  4. An overlay will appear, where you can confirm that you want to create a Sandbox environment and load the data into it.
  5. A new password-protected Sandbox will be created. Click on “Connect” to run it. A Jupyter Notebook will open in a new browser window, accessible using the password from the previous step.
  6. Voila, you’re ready to start coding!

STEP 3: SHARE YOUR WORK WITH OTHERS

Sharing your work with others is as simple as copying and pasting the URL of your Jupyter Notebook. As long as the notebook is running, people can access it (provided they have the password).

Start practicing data science with Keboola. The first two weeks are on us.

