By Marco Lagi | October 1, 2016
Maybe you want to change careers and you’ve heard that lots of people are hiring front-end engineers, data scientists, and mobile developers. You’re tempted to try Coursera or Codecademy because you really want to get into this coding thing that’s in such high demand. Salaries are high and every company seems to need “hackers”.
But the big question remains: what technology should you learn? In the end, companies look for people with specific capabilities, and you want to maximize the probability of being hired. Wouldn’t it be awesome if you could just ask Siri?
Yeah, thanks Siri. Problem is, even if she understood the question, she wouldn’t have the data to answer it. But Kemvi does.
Over the last few years, we have developed technology that lets us extract information about people, things, and companies from the web in a structured, queryable form. All this information is collected in our Knowledge Graph, a semantic database of entities linked to each other by their relationships.
A fraction of this massive database is fed by job postings. We collect more than 400,000 unique job postings per week. From these we extract the hiring company, the location, the first date the job was posted, its title and description.
But our information extraction goes deeper. We also reconcile more than 100,000 different requirements, concepts and topics with our Knowledge Graph. The requirements span from “MS Office” to “marketing automation”, from “Kubernetes” to “100-ton vessel license”. Each of these entities belongs to a specific category, so, for example, we can easily separate technology requirements from transportation requirements.
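To make the idea of categorized requirements concrete, here is a minimal sketch of what one structured posting might look like. The class names, field names, category labels, and the sample posting are all hypothetical and chosen for illustration; they are not Kemvi's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requirement:
    name: str
    category: str  # hypothetical category label, e.g. "technology"


@dataclass
class JobPosting:
    # Fields mirror the ones named in the text; the names are assumptions.
    company: str
    location: str
    first_posted: str
    title: str
    requirements: list = field(default_factory=list)


def requirements_in_category(posting, category):
    """Return the names of a posting's requirements in one category."""
    return [r.name for r in posting.requirements if r.category == category]


# Made-up posting, for illustration only.
posting = JobPosting(
    company="Example Shipping Co.",
    location="Seattle, WA",
    first_posted="2015-08-03",
    title="Fleet Systems Engineer",
    requirements=[
        Requirement("Kubernetes", "technology"),
        Requirement("MS Office", "technology"),
        Requirement("100-ton vessel license", "transportation"),
    ],
)

tech = requirements_in_category(posting, "technology")
transport = requirements_in_category(posting, "transportation")
print(tech)
print(transport)
```

Once every requirement carries a category, separating technology from transportation is a simple filter.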
And that’s how this:
Chevron Phillips Chemical Company LP (CPChem) is seeking a full-time Laboratory Technician for work in the Plastics Technical Center in Bartlesville, OK. The PIC is engaged in routine and non-routine testing activities in a number of different areas related to plastics/polymers. This position…
is transformed into this:
Once you have this structured database, the question “What should I learn today?” really becomes “What set of technologies is in highest demand, and where?”. So let’s look at the data.
What and where?
Here are the programming languages and frameworks listed as requirements for US job postings over the last month (August 2015) that have at least 5% of the tech job market share:
About 6% of the job posts in the analysis contain at least one of the requirements above. The numbers next to the bars represent the probability that a requirement appears in a technical job post.
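The numbers next to the bars can be read as simple frequencies. Here is a minimal sketch of that calculation, assuming each technical post has already been reduced to a set of requirement names. The function name, the `min_share` parameter, and the toy posts are made up for illustration.

```python
from collections import Counter


def requirement_share(posts, min_share=0.05):
    """Fraction of posts that mention each requirement.

    Only requirements whose share is at least `min_share` are kept,
    ordered from most to least frequent.
    """
    if not posts:
        return {}
    counts = Counter()
    for reqs in posts:
        counts.update(set(reqs))  # count each requirement once per post
    total = len(posts)
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return {req: n / total for req, n in ranked if n / total >= min_share}


# Toy data, not real job postings.
toy_posts = [
    {"SQL", "Java"},
    {"SQL", "C#", ".NET"},
    {"Python"},
    {"SQL"},
]

shares = requirement_share(toy_posts, min_share=0.5)
print(shares)
```

With the default `min_share=0.05`, the function applies the same 5% cutoff used for the chart.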
As a comparison, nurses and transportation account for 13% and 12% of the total, respectively, which tells us that despite all of the press about demand for data scientists, there’s much higher demand for nurses. The perceived data scientist deficit is likely due to a larger supply-demand mismatch.
So, should you learn SQL? Well, yes, but it’s not enough to learn SQL alone. More often than not (80% of the time), it appears together with other requirements. If two or more technologies frequently co-occur in the same posts, they are often part of a required stack, i.e. a particular set of skills, skills I have acquired over a very long career… sorry. We can therefore get a better representation of the current job market by plotting a directed graph of the probability of co-occurrence.
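One plausible way to get directed edge weights is the conditional probability that technology B appears in a post given that technology A does, which is generally different from the reverse. The sketch below assumes that definition (the post does not spell out the exact formula), and the function names and toy posts are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(posts):
    """Return {(a, b): P(b in post | a in post)} for pairs seen together."""
    single = Counter()
    pair = Counter()
    for reqs in posts:
        unique = set(reqs)
        single.update(unique)
        # Every ordered pair of distinct requirements in this post.
        pair.update(permutations(sorted(unique), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def share_with_company(posts, tech):
    """Fraction of posts mentioning `tech` that list another requirement too."""
    with_tech = [set(p) for p in posts if tech in p]
    if not with_tech:
        return 0.0
    return sum(len(p) > 1 for p in with_tech) / len(with_tech)


# Toy data, not real job postings.
toy_posts = [
    {"SQL", "Java"},
    {"SQL", "C#"},
    {"SQL", "Java", "Hadoop"},
    {"SQL"},
    {"Java"},
]

probs = cooccurrence_probabilities(toy_posts)
print(probs[("SQL", "Java")], probs[("Java", "SQL")])
print(share_with_company(toy_posts, "SQL"))
```

The asymmetry is what makes the graph directed: in the toy data, a Hadoop post always mentions SQL, but an SQL post rarely mentions Hadoop.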
In this network, each node is a technology, with size proportional to the logarithm of its popularity. Edges are drawn with more than 50 shades of grey, according to the probability P that the two technologies appear in the same post (for clarity, we don’t draw the arrows). By running a greedy agglomerative hierarchical clustering algorithm on this network, we can revise the list above and extract the sets of technologies, or stacks, that are most requested on the US job market:
- C, C++
- C#, .NET, ASP.NET, SQL
- Python, Perl, Ruby, Java
- iOS, Android, Objective-C, Java
- Hadoop, MongoDB, NoSQL, Java, SQL
(Footnote: In order to allow for a technology to show up in more than one cluster, we added multiple copies of that node until convergence.)
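For readers curious about the clustering step, here is a minimal sketch of a greedy agglomerative procedure. It rests on several assumptions that the post does not confirm: similarities are symmetrized, clusters are compared by average linkage, and merging stops at a fixed threshold. It also leaves out the node-duplication trick from the footnote, so each technology lands in exactly one cluster. The similarity values are made up.

```python
from itertools import combinations


def average_similarity(c1, c2, sim):
    """Average-linkage similarity between two clusters of nodes."""
    total = sum(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def greedy_agglomerative(nodes, sim, threshold):
    """Repeatedly merge the most similar pair of clusters.

    Stops when no pair has average similarity of at least `threshold`.
    """
    clusters = [frozenset([n]) for n in nodes]
    while len(clusters) > 1:
        best_pair, best_score = None, None
        for c1, c2 in combinations(clusters, 2):
            score = average_similarity(c1, c2, sim)
            if score >= threshold and (best_score is None or score > best_score):
                best_pair, best_score = (c1, c2), score
        if best_pair is None:
            break  # nothing left that is similar enough to merge
        c1, c2 = best_pair
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Made-up symmetric similarities, for illustration only.
toy_sim = {
    frozenset(("C", "C++")): 0.9,
    frozenset(("C#", ".NET")): 0.8,
    frozenset((".NET", "ASP.NET")): 0.7,
    frozenset(("C#", "ASP.NET")): 0.6,
    frozenset(("C++", "C#")): 0.1,
}
toy_nodes = ["C", "C++", "C#", ".NET", "ASP.NET"]

stacks = greedy_agglomerative(toy_nodes, toy_sim, threshold=0.4)
for stack in stacks:
    print(sorted(stack))
```

The threshold controls how coarse the stacks are: a lower value merges more aggressively and produces fewer, larger groups.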
Without any input from the outside, you can see the emergence of professional profiles! Group 3 is the front-end developer, 5 the full-stack developer, 6 could be DevOps, 7 the mobile hacker, 8 the MEAN stack guy, 9 the data scientist. The C family is fairly independent, while SQL and Java are ubiquitous server-side.
So one successful strategy might be to choose one of these groups and learn the crap out of all its components.
At this point, I can hear you say “OK Kemvi, I’ve made up my mind! I want to be a mobile developer and learn Objective-C and Java. Where should I look for jobs?”.
Let’s take the locations of job postings with these two requirements over the last 6 months, and reconcile them with lat/long pairs. The size of each dot is proportional to the number of jobs in that city:
So… hope you like coffee, drizzle, and grunge, because it looks like you’re going to Seattle!
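The aggregation behind a map like this can be sketched in a few lines. Everything here is an assumption for illustration: the lookup table, the function name, the choice to scale the dot radius (rather than its area) linearly with the job count, and the posting counts, which are made up and are not Kemvi's data.

```python
from collections import Counter

# Hypothetical lookup table from city name to (latitude, longitude).
CITY_COORDS = {
    "Seattle, WA": (47.6062, -122.3321),
    "San Francisco, CA": (37.7749, -122.4194),
    "New York, NY": (40.7128, -74.0060),
}


def city_dots(posting_cities, coords, max_radius=20.0):
    """Count postings per city and scale each dot to the largest count.

    Cities missing from `coords` are skipped because they cannot be placed.
    """
    counts = Counter(city for city in posting_cities if city in coords)
    if not counts:
        return []
    largest = max(counts.values())
    return [
        {
            "city": city,
            "lat": coords[city][0],
            "lon": coords[city][1],
            "jobs": n,
            "radius": max_radius * n / largest,
        }
        for city, n in counts.most_common()
    ]


# Made-up list of posting locations, one entry per job posting.
toy_cities = (
    ["Seattle, WA"] * 3 + ["San Francisco, CA"] * 2 + ["Unknown City"]
)

dots = city_dots(toy_cities, CITY_COORDS)
for dot in dots:
    print(dot)
```

Each resulting dictionary has what a plotting library needs to draw one dot: a position and a size.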
The value trapped in open unstructured data like job postings, press releases, patents, company websites, conference calls and so on is enormous. Humans know this intuitively: when you want an answer to any question, without thinking, you Google it and read unstructured results to retrieve an answer.
Kemvi is focused on extracting more of that knowledge and using it to help businesses make better decisions. It can help answer questions like “Why do we lose deals?”, “How well are my competitors doing?”, or “How do I increase my company’s revenue?”
Over the next 5 years, AI will help us unleash this value to fuel the next generation of business intelligence.