Yellow Pages analyzes big data 10-20 times faster with Tableau, Cloudera, and AtScale


Yellow Pages Canada offers information on local businesses, products, and services. The company works with billions of rows of data, so much that it collected 57 billion rows within just 25 months. The enterprise data team at Yellow Pages stores this data in a Cloudera Hadoop data lake and then analyzes it in Tableau with help from AtScale, a Tableau and Cloudera partner. Today, the team analyzes live data quickly with Tableau Desktop and then shares insights across the enterprise by publishing to either Tableau Server or Tableau Cloud. In this video, Richard Langlois, Enterprise Data Management Director, speaks about his experience running Tableau and AtScale on top of Hadoop. The team can now analyze big data 10 to 20 times faster, leading to better, faster decision making.


Tableau: How big is the database that you’re working with?

Richard Langlois, Enterprise Data Management Director: In the database we're using, one table has 57 billion rows. This is only for 25 months of data. So we do a rolling 25. So if you want to do all that on Hadoop, even if you have something like Hadoop, it's fast, but it might not be fast enough.

Tableau: Can you talk about your experience with AtScale and Tableau?

Richard: Actually, connecting Tableau to AtScale is like connecting to any other source. And Tableau is quite good at connecting to a lot of different sources. The job of AtScale is to intercept the query that Tableau issues, create aggregates behind the scenes, and rewrite the SQL that Tableau generates so that the user will never know. When you're trying to do analysis, you don't want to slow people's line of thought, right? They have something they're trying to solve. So if they can have an answer back much faster, it's great.
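To make the "intercept, aggregate, rewrite" idea concrete, here is a minimal sketch of aggregate-aware query routing. It is not AtScale's implementation: the `searches` table, the `agg_month_city` aggregate, the `AGGREGATES` catalog, and the `rewrite` helper are all hypothetical, and an in-memory SQLite database stands in for Hadoop and Impala. It only illustrates why a query can be answered from a small pre-summed table without the user noticing.

```python
import sqlite3

# In-memory SQLite stands in for the data lake (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches (month TEXT, city TEXT, category TEXT, hits INTEGER);
    INSERT INTO searches VALUES
        ('2016-01', 'Montreal', 'plumbers', 3),
        ('2016-01', 'Montreal', 'plumbers', 5),
        ('2016-01', 'Toronto',  'dentists', 2),
        ('2016-02', 'Montreal', 'dentists', 7),
        ('2016-02', 'Toronto',  'plumbers', 4);
    -- Pre-computed aggregate: one row per (month, city) instead of one per event.
    CREATE TABLE agg_month_city AS
        SELECT month, city, SUM(hits) AS hits
        FROM searches GROUP BY month, city;
""")

# Hypothetical catalog: aggregate table name -> dimensions it preserves.
AGGREGATES = {"agg_month_city": {"month", "city"}}


def build_sql(dimensions, table="searches"):
    """Build a GROUP BY query over the given dimensions."""
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, SUM(hits) FROM {table} GROUP BY {cols} ORDER BY {cols}"


def rewrite(dimensions):
    """Route the query to an aggregate that covers the requested dimensions,
    falling back to the full fact table when none does."""
    for table, dims in AGGREGATES.items():
        if set(dimensions) <= dims:
            return build_sql(dimensions, table)
    return build_sql(dimensions)


def run(sql):
    return conn.execute(sql).fetchall()


# A query by month is served from the aggregate; the answer is unchanged.
print(rewrite(["month"]))
print(run(rewrite(["month"])))
# A query by category is not covered, so it still scans the fact table.
print(rewrite(["category"]))
```

The rewrite is only safe here because `hits` is an additive measure: summing pre-summed rows gives the same total as summing the raw rows. Measures such as distinct counts would need a different aggregate design.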


Tableau: How much time have you saved?

Richard: When we did that, we got anything from 10 to about 20 times faster. So on Hadoop, Impala, that's the recipe: 10 to 20 times faster using AtScale. So Tableau now just runs super fast on our implementation. Everybody is faster with that one. So by adding the AtScale pieces to it, the aggregates could give you a 10 times improvement. So that's the point: you're getting ten times faster, so the two minutes, one actually. In fact, it's even more. And so it's extremely responsive, and we have been doing a couple of events because we're so happy with the result.

Tableau: Can you tell us about your Tableau deployment?

Richard: We're playing with different types of Tableau licenses. We have Tableau Cloud, we have Tableau Desktop, and we have Tableau Server. So we're working on the Server side for our on-premise Hadoop implementation. That's one of the things we do... So for stuff that marketing produces, they produce an extract, put it (in) Online, and then the users can connect to it.