How to grow from a junior fresh out of school into a seasoned data hacker, and why we at Dateio are dedicated to developing newcomers.
Finding ready-made data seniors on the market is nearly impossible. On top of that, the problems we solve at TapiX (one of our data products) are unique and require specific experience and a particular mindset. That’s why we recruit newcomers and dedicate ourselves to their development.
That’s why each junior starts their first couple of weeks with manual payment transaction enrichment, to get a good “feel” for the data and the processes they’ll be automating over time.
We also run a training session where we discuss the areas we work on in detail: the various methods of automation and data consistency checks. This part of onboarding is crucial, because it ensures new crew members encounter all the edge cases in this complex data, which saves them time and work in the long run.
After the onboarding process, the junior starts on smaller tasks, tweaking various existing scripts to familiarize themselves with the database and the typical algorithmic procedures and database tables we use.
These first steps are followed by increasingly complex tasks that teach the newcomer to break the work down into sub-steps while learning more about the technology involved.
We’re looking for people with a basic understanding of SQL and Python, plus business sense. For fresh graduates, we can turn a blind eye to gaps in the basics if we see common sense, energy, and potential.
We generally see two types of juniors: those from engineering schools, who excel in Python, SQL, and other technologies but lack business sense, and those from business schools, who have the exact opposite profile.
In Python, we primarily use the pandas, NumPy, and SQLAlchemy libraries for database work and analysis, plus Folium and the Google Places API for visualizing data on a map, and Django. In SQL, newbies work through a cross-section of tasks, from simple queries to writing functions and procedures and optimizing code.
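To give a flavor of the pandas-plus-SQLAlchemy combination, here is a minimal, self-contained sketch. The table, column names, and merchant strings are invented for illustration (an in-memory SQLite database stands in for a real PostgreSQL connection); this is not Dateio’s actual schema.

```python
import pandas as pd
from sqlalchemy import create_engine

# An in-memory SQLite engine stands in for a real database connection.
engine = create_engine("sqlite:///:memory:")

# Seed a tiny sample table so the sketch runs end to end.
sample = pd.DataFrame({
    "merchant": ["CAFE NERO 123", "TESCO STORES", "CAFE NERO 456"],
    "amount": [4.5, 23.1, 3.9],
})
sample.to_sql("transactions", engine, index=False)

# A typical enrichment-style step: query raw transactions, derive a
# cleaned "brand" field, and aggregate spend per brand.
df = pd.read_sql("SELECT merchant, amount FROM transactions", engine)
df["brand"] = df["merchant"].str.extract(r"^([A-Z]+(?: [A-Z]+)*)", expand=False)
per_brand = df.groupby("brand")["amount"].sum()
```

The same pattern scales from one-off analyses to the recurring consistency checks mentioned above: query, transform in pandas, aggregate.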
In terms of technology, beyond the two languages above, we use Jenkins for automation and Power BI for reporting and data visualization. We are developing a web-based environment in Django for annotating individual consistency checks, including those written using ML. While ML is not the core of our work, it’s an area we are constantly developing and one a junior can also explore. We often use standard statistical concepts spiced with common sense.
For those interested in working at Dateio, we recommend brushing up on the basics of statistics and knowing the difference between the median, mean, and mode 😊
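For anyone who wants a quick refresher, the difference is easy to see on a small set of (invented) transaction amounts with one outlier:

```python
from statistics import mean, median, mode

# Illustrative transaction amounts; the 100 is a deliberate outlier.
amounts = [10, 10, 12, 15, 100]

mean(amounts)    # arithmetic average, pulled up by the outlier
median(amounts)  # middle value, robust to the outlier
mode(amounts)    # most frequent value
```

On payment data, where a few large transactions can dominate, the median is often the more honest summary of a “typical” amount than the mean.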
We have several PostgreSQL databases, each containing slightly different information, so you’ll learn how to work with data across databases and how to move it around using psql. Another specific of our work is that, besides structured data, we also handle transactional open data, such as data retrieved through PSD2, which we hack apart with regex and parse. Data cleansing awaits those who have already moved up from junior to medior.
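As an illustration of the regex side of this work, here is a hedged sketch of pulling fields out of a raw transaction description. The input string and the pattern are entirely invented; real PSD2 payloads vary from bank to bank and need far more robust handling.

```python
import re

# Hypothetical raw transaction string; real formats differ per bank.
raw = "CARD PAYMENT 2024-03-01 AMZN Mktp DE*1A2B3C 19.99 EUR"

# Illustrative pattern: named groups for date, merchant, amount, currency.
pattern = re.compile(
    r"CARD PAYMENT (?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<merchant>.+?) (?P<amount>\d+\.\d{2}) (?P<currency>[A-Z]{3})$"
)

m = pattern.match(raw)
parsed = m.groupdict() if m else {}
```

Named groups (`?P<...>`) keep the extraction readable, and the lazy `.+?` merchant group lets the anchored amount and currency at the end of the string decide where the merchant name stops.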
We build our database through automated ingestion from many different sources. During this automation, our data scientists learn how to connect the database to various APIs, such as the Google Places API, and how to scrape data using the requests and Beautiful Soup libraries.
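A scraping task with Beautiful Soup typically boils down to selecting elements and lifting out their text. The HTML snippet and class names below are invented; in production the page would come from `requests.get(url).text`, but here we parse an inline string so the sketch runs offline.

```python
from bs4 import BeautifulSoup

# Invented snippet standing in for a fetched page.
html = """
<ul class="shops">
  <li><span class="name">Cafe One</span><span class="city">Prague</span></li>
  <li><span class="name">Shop Two</span><span class="city">Brno</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selectors pick out each list item and its labeled fields.
shops = [
    {"name": li.select_one(".name").text, "city": li.select_one(".city").text}
    for li in soup.select("ul.shops li")
]
```

The resulting list of dicts drops straight into a pandas DataFrame or an insert statement, which is usually the next step in an ingestion pipeline.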
Dateio is growing rapidly, and our employees grow with it. Not only do they get to do more complex work, but we also expect them to mentor younger colleagues. They can also look forward to exciting collaborations with other departments, for example with our sales team or with the second data team, which is in charge of discount targeting.
Seniors can expect to work independently on larger projects. For example, we are currently working on calculating the CO2 footprint of individual purchases.
What does such a project consist of?
First, we study the methodology and figure out where to pull the individual input data from, what formulas to use in the calculation algorithm, and so on. We then connect to an external API, automate the job through Jenkins, and work with the backend programmers to extend the TapiX API with another helpful endpoint; this API is used by hundreds of thousands of clients every day. The final icing on the cake is helping our product and marketing people describe the new feature 😊
If this sounds good to you, check out the open positions in the data team on our careers page or email firstname.lastname@example.org