Making Big Data Small

The Scale and Possibilities of Web Analytics

Big Data. It's a world of incredible possibilities, where previously unmanageable quantities of data deliver insight that was unfathomable just a few years ago. It's producing meaningful one-pagers from petabytes of transaction records. It's data scientists such as Antony Green producing increasingly accurate election analytics that give us results after about half an hour of vote-counting. It's harvesting public opinion and brand perception through social media. It's predicting vulnerability to diseases and delivering personalised medical treatment through DNA analysis. It's an elusive target that will realise your wildest dreams and fulfil your analytical fantasies. You might not know what it is, but you probably know that if you could just harness big data, you'd surely triple your sales and double the size of your business.



Big Data is so hot right now! (in India, South Korea, Singapore, the United States and Australia)

If that sounds confusing and ambiguous, I think we've made a good start in getting a sense of 'big data' as it exists today. While everyone is talking about big data, there is very little clarity about exactly what 'big data' means. While the term is associated with cases where incredibly succinct insight has been produced from immense volumes of data, these genuine cases are shrouded by a dense fog of much less meaningful 'big data' hype. This billowing 'hype cloud' makes it very easy for those of us who work with, or rely on, data to treat 'big data' claims with a grain of salt, along with a healthy dose of skepticism and cynicism.

Google Analytics: Big Data for the Rest of Us

Now that we're amongst the hype of big data, with a precarious balance of skepticism and intrigue, I'm going to blow your mind. I’m going to put it to you that you're already using big data. Whoa!

Way back in 2006, a Google white paper on Bigtable showed us that the Google Analytics data set was over 200 terabytes. At that point, that was a quarter the size of the crawl database that powers search results. We can only guess how big it is now, but it's safe to say that it is a data set that can be labelled 'big data'. Some of our larger clients log tens of millions of pageviews each month. With extra data logged for events, custom variables and roll-up accounts, this could easily produce 100 million records on a monthly basis. A few years’ worth of Google Analytics data places us well into the realms of billions of records, one of the oft-cited criteria for authoritative use of the term 'big data'.
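As a rough back-of-the-envelope check on that claim, the sketch below multiplies the 100 million monthly records mentioned above across a few years. The three-year span is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of how many records accumulate over time.
RECORDS_PER_MONTH = 100_000_000  # monthly figure quoted in the text above
YEARS = 3                        # assumption: "a few years" taken as three


def total_records(records_per_month: int, years: int) -> int:
    """Return the number of records logged over the given number of years."""
    return records_per_month * 12 * years


if __name__ == "__main__":
    # Prints the total with thousands separators for readability.
    print(f"{total_records(RECORDS_PER_MONTH, YEARS):,} records")
```

Under those assumptions the total is 3.6 billion records, which is comfortably in 'billions' territory.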

If we use definitions that are based not on volume but on speed of analysis or variety of data types, Google Analytics again checks a lot of the boxes. We can look at how users are interacting with our site in real time. With advanced configuration, we can tie the length of time that a user watched a video to the traffic source or specific marketing channel that brought them to the site. We can even see how visitors arrived at our site on each of their multiple visits across a two-week research period that led to a sale or enquiry.



If your site is amongst the bulk of the sites on the internet that use Google Analytics, you are tapping into one of the single largest data sources in the world. And, if you're logging tens of millions of pageviews each month, you are conducting analysis on an immense volume of data. But this is not what you see when you use the tool. Rather, you see concise charts, dashboards and overviews produced quickly through use of data sampling. Standard reports render billions of records in a way that is understandable and, most importantly, meaningful and actionable for non-technical marketing professionals. For those with advanced knowledge of Google Analytics, custom reports enable detailed analysis of these billions of records, which is presented in the same highly legible, visual formats. Google Analytics quietly does what so many people cite as the goal of big data analysis: it makes big data small, manageable and legible.

But I Love Unmanageable Data! How to Make it Big Again

For those of us who are suckers for punishment, or who genuinely find ourselves wanting to conduct analysis on Google Analytics data that is not possible with standard or custom reports, the possibility of dealing directly with unsampled data has recently arrived. Google Analytics Premium supports a greater number of custom variables and opens up the opportunity for much richer web analytics data. Where the standard version supports report exports with up to 50 thousand records, the premium version supports up to two million. If you're routinely exporting reports of this size, you'll quickly be moving into the realms of the unmanageable – or at least into the realms of needing a database server of considerable capacity to handle your analysis. That might be an SQL server, or you might join (data geeks will have to mind the pun!) the ranks of Google and Twitter in storing your records in a NoSQL database like MongoDB, or in a hosted analytics service like Google's own BigQuery.

You might want to download your raw unsampled data for applications that are not possible with standard or custom reports, such as:

  • Quickly producing highly-specific reports such as "non-landing pages ordered by pageviews", which are not available in the standard GA interface;
  • Analysing the location and online behaviour of visitors who purchased specific combinations of products;
  • Finally doing all those things you wanted to do every time that you were reminded that the 'customise' button isn't available for the 'social sources' report;
  • Or aggregating and segmenting data from multiple Google Analytics profiles – maybe to group your brands, or define a broad product area – and then analysing that data in whatever way you see fit.
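To give a feel for the first item in that list, here is a minimal sketch of a "non-landing pages ordered by pageviews" report built from exported data. It assumes a hypothetical export of pageview rows in the form (session_id, hit_number, page_path), and it takes "non-landing" to mean any pageview that was not the first hit of its session. Neither the row layout nor that definition comes from Google Analytics itself, so treat both as illustrative.

```python
from collections import Counter

# Hypothetical export rows: (session_id, hit_number, page_path).
SAMPLE_ROWS = [
    ("s1", 1, "/home"),
    ("s1", 2, "/products"),
    ("s1", 3, "/checkout"),
    ("s2", 1, "/products"),
    ("s2", 2, "/checkout"),
    ("s3", 1, "/home"),
    ("s3", 2, "/products"),
    ("s4", 1, "/home"),
    ("s4", 2, "/checkout"),
]


def non_landing_pageviews(rows):
    """Count pageviews per page, skipping the first (landing) hit of each session."""
    # Find the lowest hit number in each session; that hit is the landing page.
    first_hit = {}
    for session_id, hit_number, _ in rows:
        if session_id not in first_hit or hit_number < first_hit[session_id]:
            first_hit[session_id] = hit_number

    # Count every pageview that is not its session's landing hit.
    counts = Counter(
        page
        for session_id, hit_number, page in rows
        if hit_number != first_hit[session_id]
    )
    # most_common() returns (page, pageviews) pairs in descending order of count.
    return counts.most_common()


if __name__ == "__main__":
    for page, views in non_landing_pageviews(SAMPLE_ROWS):
        print(f"{page}\t{views}")
```

With a real export of a few million rows the same idea would more likely live in a database query than in an in-memory script, but the logic is the same: identify each session's landing hit, exclude it, then count and sort what remains.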

Once you extract and store data in this way, analysis is unrestricted, limited only by your technical ability (OK, maybe budget and infrastructure too). However, you then lose the simplicity of Google Analytics in making the data manageable. While you’ll now be able to produce that report that links social traffic to search marketing, you'll need to build your own analytical tools, write your own queries and develop your own visualisations, or invest in visual analytics tools like Tableau. All of this is needed to go through the process that seems so easy with the Google Analytics interface: making sense of messy data, reducing millions or even billions of records into manageable, concise analysis, and making your big data small again.