Written by thoughtbot

Analyzing GitHub Trends

It seems like language popularity is constantly fluctuating in the tech world. I set out to test this by visualizing GitHub pushes and looking for trends.

The GitHub Archive is a project that records forks, watches, pushes, pull requests, and various other events on public GitHub repositories and archives them for public use. With this data, we can look at the programming community’s history and trends. The GitHub Archive only has push data going back to March 2012, so I could only look for trends over the past 18 months.

I first queried daily PushEvent counts per language from the GitHub Archive dataset on Google BigQuery*.

SELECT repository_language,
  -- Bucket each push by the UTC day the event was created
  STRFTIME_UTC_USEC(PARSE_UTC_USEC(created_at), "%Y-%m-%d") AS days,
  COUNT(repository_language) AS lang_count
FROM [githubarchive:github.timeline]
WHERE type = "PushEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("2012-03-01 00:00:00")
-- One row per language per day
GROUP BY repository_language, days
ORDER BY repository_language, days;
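The post doesn't show how the query results were turned into per-language series. As a minimal sketch, assuming the results were exported as CSV with a header row containing the language column plus the `days` and `lang_count` aliases (the export layout and the `group_by_language` helper are hypothetical), the rows can be grouped like this:

```python
import csv
import io
from collections import defaultdict


def group_by_language(csv_text):
    """Group exported query rows into {language: {day: push_count}}.

    Assumes a header row with the columns repository_language, days and
    lang_count (a hypothetical export layout).
    """
    series = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty language field (e.g. a NULL in the export) gets a readable label.
        language = row["repository_language"] or "(none)"
        series[language][row["days"]] = int(row["lang_count"])
    return dict(series)
```

Each language then maps to a dictionary of day strings to counts, which is a convenient shape for the weekly smoothing step.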

The original goal was to visualize daily push counts, but after I drew the first version of the graphs, I noticed that the counts varied too much from day to day, which made the graphs messy and jagged.

[Figure: v1 of this graph, plotting daily push counts]

People push to public repos on GitHub much less on weekends than on weekdays, so I smoothed out the graph by summing the counts over each week.
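The weekly smoothing described above amounts to bucketing each day into its week and adding up the counts. A minimal sketch, assuming daily counts keyed by "YYYY-MM-DD" strings; the post doesn't say which day its weeks started on, so `week_starts_on` (hypothetical, like the function name) makes that choice explicit:

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_counts, week_starts_on=0):
    """Sum daily push counts into 7-day buckets.

    daily_counts maps "YYYY-MM-DD" strings to push counts.
    week_starts_on uses date.weekday() numbering (0 = Monday, 6 = Sunday).
    Returns {first day of week: total}, sorted by date.
    """
    weeks = defaultdict(int)
    for day_str, count in daily_counts.items():
        day = date.fromisoformat(day_str)
        # Step back to the most recent week-start day (0 days if already on it).
        offset = (day.weekday() - week_starts_on) % 7
        weeks[day - timedelta(days=offset)] += count
    return dict(sorted(weeks.items()))
```

For example, `weekly_totals(counts, week_starts_on=5)` buckets Saturday through Friday. Summing, rather than averaging, keeps the totals comparable only for complete weeks, so a partial first or last week will look artificially low.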

After sampling a few languages and plotting the results as sparklines, I noticed a few things.
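The post's sparklines were drawn as graphics, and that plotting code isn't shown. As a swapped-in, text-only stand-in for a quick look in a terminal, each weekly total can be scaled onto the eight Unicode block characters; the `sparkline` helper is hypothetical:

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a one-line Unicode sparkline."""
    if not values:
        return ""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series has no variation to show.
        return BARS[0] * len(values)
    # Scale each value into 0..7 and pick the nearest bar height.
    return "".join(BARS[round((v - low) / span * (len(BARS) - 1))] for v in values)
```

Because each line is scaled to its own minimum and maximum, two languages' sparklines show shape, not relative volume, which matches how sparklines are usually read.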

During the week of August 25th, 2012, there was a huge spike in PushEvents for repositories in every language I sampled. I checked GitHub's status page for errors around that time, but requesting the status messages for that period returns a 500 error.

The lowest weekly push count for JavaScript, CoffeeScript, Ruby, Python, PHP, and Java fell right around Christmas and New Year's. That makes sense, since people tend to spend more time celebrating the holidays than programming, but the dip was especially visible in the more popular languages: JavaScript, Ruby, Python, Java, and PHP were among the six most popular languages by number of repositories created in 2012 and 2013.
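The holiday dip and the late-August spike were spotted by eye in the sparklines. A small helper can cross-check observations like these by picking out each language's lowest and highest week from weekly totals. This is a hypothetical sketch, assuming a `{language: {week_label: total}}` layout; the numbers in the test are made up, not GitHub Archive data:

```python
def extreme_weeks(weekly):
    """Return {language: (lowest_week, highest_week)} from weekly totals.

    weekly maps each language to a {week_label: total} dict; languages
    with no data are skipped. Ties resolve to the first week encountered.
    """
    return {
        language: (min(totals, key=totals.get), max(totals, key=totals.get))
        for language, totals in weekly.items()
        if totals
    }
```

If the same week shows up as the highest for every language, that points to a site-wide event rather than a change in any one language's popularity.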

The growth of Go and Elixir in 2013 is interesting to see. Pushes for Elixir seemed to taper off a little after this past summer, while pushes for Go appear to be increasing steadily.

I only analyzed a handful of popular languages I was curious about, so I have probably missed trends that a broader sample would reveal. If you're wondering about the GitHub activity of your favorite language, you can dig into the GitHub Archive yourself. You can also see how I visualized this data.

*For those trying it at home:

  • Go to the BigQuery Sign Up Page.
  • Follow the instructions to set up Google BigQuery.
  • Make sure you enable billing; the first 100 GB of data you process is free.
  • Go to the BigQuery page.
  • Click the small arrow next to your Google Cloud project name.
  • Click "Switch to Project".
  • Click "Display project…".
  • Type in "githubarchive".
  • Click the "Compose query" button just under the logo and paste the sample query above.