On the 26th of September I had the chance to attend the Fall edition of the Hadoop Conference Japan. The conference is held twice a year in Tokyo, with a Spring and a Fall edition. The Hadoop User Group Japan is responsible for organizing the event.

The conference was a one-day event. The morning, until the lunch break, consisted of a common track with keynotes from members of Cloudera, Hortonworks and MapR. After that, the event was split into two tracks: a technical one and another called community (real cases of applying Hadoop in the enterprise), with talks in both Japanese and English. According to the Hadoop User Group Japan, there were more than 1000 registrations, which is impressive considering that the whole conference was just about the Hadoop ecosystem. Most of the attendees were Japanese, which shows the growing interest of Japanese companies like NTT Data, Mixi or Rakuten in technologies for managing and analysing large amounts of data. Big Data technologies have certainly gone mainstream in recent years.

Hadoop, the small yellow elephant, was everywhere :)

The venue, Hotel Bellesalle Shiodome (Shinbashi), was very well located, near Tokyo station. The facilities were spacious and convenient, with free drinks and meals available at any time. Everything worked like a charm.

About to start the Hadoop Conference Japan 2011 - Fall edition

The morning started with an introduction from the Hadoop User Group Japan and a welcome message from Recruit, one of the sponsors and also a user of Hadoop. After that, Todd Lipcon from Cloudera gave the first keynote.

“The role of the distribution in the Apache Hadoop Ecosystem” by Todd Lipcon (Cloudera).

Todd gave a short introduction to Hadoop and the need for Big Data. He then presented CDH, the distribution based on Hadoop technologies (HBase, Hive, Pig, ZooKeeper, etc.) that Cloudera commercialises and that is essentially their main business line, the others being support and training. Todd explained why using a distribution is the best way of getting started with Hadoop.

“About Hortonworks” by Owen O’Malley (Hortonworks).

If you have never heard of Hortonworks, don’t worry: they are a new company in the Hadoop ecosystem (founded in July 2011), although they are not new to Hadoop. Hortonworks was founded by more than 20 developers who mostly worked on Hadoop at Yahoo!. In fact, the company is financially backed by Yahoo! and, as Owen explained in the talk, they still have Yahoo! as a client. Although the company is new, Hortonworks is the company with the highest number of Hadoop committers, which gives a good idea of their expertise. In the talk, Owen explained the areas where, in his opinion, the company can add value to Hadoop (API integration with other tools, easier-to-use administration systems, etc.), and which will probably become Hortonworks’ main business lines.

“How Hadoop needs to evolve and integrate into the enterprise” by Ted Dunning (MapR Technologies).

I liked Dunning: he is a businessman and a hacker at the same time. He was wearing a red cap; when the cap was on, it was the businessman talking, and when he took it off, it was the engineer. He gave a brief overview of the additional tools that MapR currently offers to make Hadoop work better in the enterprise. Although these tools are not free software, they can be downloaded and tested for free. Ted mentioned that Recruit, one of the sponsors, is a long-time client of MapR. Interestingly, all three companies mentioned above have, or in the case of Hortonworks are going to have, a presence in the Japanese market, and some, such as Cloudera, even have offices in Japan.

After a nice shrimp sandwich (plus salad and French fries) and a refreshing glass of Oolong tea, I came back with renewed energy for the afternoon talks. I attended the following ones:

“Apache HBase: An introduction” by Todd Lipcon (Cloudera).

Todd continued in the afternoon with a more technical talk. HBase is to Hadoop what BigTable is to GFS (Google File System) or, to put it in one sentence, “HBase is an open source, distributed, sorted map datastore modeled after Google’s BigTable”. Like BigTable, HBase falls into the category of columnar databases (the same as Cassandra or Hypertable). In terms of the CAP theorem, HBase puts the emphasis on data consistency. Todd explained how to operate HBase, when to use it and when not to, the languages and APIs compatible with it, etc. It was a very interesting and well-explained talk. It finished with a list of links and resources on the web, as well as a recommendation of the recently published book “HBase: The Definitive Guide” by Lars George, a co-worker of Todd at Cloudera.
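
To make the “sorted map” idea a bit more concrete, here is a toy sketch in Python. It is neither the HBase API nor something shown in the talk: the class `ToySortedTable`, its methods, the row keys and the column names are all made up for illustration. It only assumes the BigTable-style model of row key, then column, then timestamp, then value, with rows kept in key order so that range scans are cheap.

```python
import bisect


class ToySortedTable:
    """Toy in-memory model of a BigTable-style sorted map (NOT the HBase API)."""

    def __init__(self):
        self._row_keys = []  # row keys, always kept in sorted order
        self._rows = {}      # row key -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)  # keep the key list sorted
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, columns) for start_row <= row < stop_row, in key order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        for row in self._row_keys[lo:hi]:
            yield row, self._rows[row]


# Hypothetical data: rows are inserted out of order on purpose.
table = ToySortedTable()
table.put("user#002", "info:name", 1, "Hanako")
table.put("user#001", "info:name", 1, "Taro")
table.put("user#001", "info:name", 2, "Taro Y.")
table.put("video#001", "info:title", 1, "Keynote")

latest_name = table.get("user#001", "info:name")
user_rows = [row for row, _ in table.scan("user#", "user$")]
```

Because the row keys are sorted, all the rows sharing a prefix sit next to each other, so the scan over the `user#` prefix only touches those rows. The real system adds distribution, persistence and much more on top of this basic picture.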

“Architectural details and implications of MapR technology” by Ted Dunning (MapR Technologies).

A very technical talk from Ted. It basically consisted of an explanation of the improvements MapR has made to Hadoop. Some of these improvements have been shared with the Hadoop community; others are part of the products MapR offers as a company.

“The history and the future of Hadoop use case at Rakuten” by Terje Marthinussen (Rakuten).

Although mostly unknown in the Western world (at least to me :P), Rakuten is the 2nd largest website in Japan and one of the ten largest websites worldwide, a kind of Japanese version of Amazon. Terje’s talk was part of the community track, made up of companies sharing their experiences of using Hadoop in production environments.

According to Terje, Rakuten has recently built a new search engine which needed to handle 1.4 billion documents. They needed the capacity to index 10k docs/sec and to serve more than 400 queries per second. One of the challenges in this project was collecting and moving huge amounts of data. For that task they chose Flume, a Hadoop-ecosystem tool initially built by Cloudera (now an Apache Incubator project). Terje described the difficulties and frustrations of working with systems like HBase and Cassandra, projects which are still under heavy development, but in the end the task was done, and no other way of doing it seemed feasible. Rakuten has modified Flume extensively and has recently submitted those modifications to the main Flume trunk, hoping they will become part of the core. A very interesting talk.

“Hadoop 0.23 and MapReduce v2” by Owen O’Malley (Hortonworks).

Owen gave a brief overview of what is new in the upcoming 0.23 version of Hadoop, as well as the gaps that MapReduce v2 will try to fill. Here is a similar talk, also by Hortonworks: “Apache Hadoop 0.23”.

I also attended two other talks:

  • “Hadoop for enterprise batch processing” by Takashi Kambayashi (Nautilus technologies)
  • “Processing big data with MapReduce at Yahoo! Japan” by Naoyuki Kakuda & Issei Yoshida (Yahoo! Japan)

Unfortunately, my limited knowledge of Japanese was not enough to follow them :)

To sum up, it was definitely worth attending. Apart from the interesting technical talks, the event gave me a good insight into the Hadoop world and the ecosystem of companies around it. It also gave me an idea of the status of big data technologies in the Japanese market. I certainly never expected so many attendees.

The Hadoop User Group Japan and all the staff members deserve a special mention for their hard work and attention. It was one of the best organized events I have ever been to. If I had to suggest one improvement for future editions, it would be a wireless spot with free internet connection, which could be used to tweet about the event, for instance. But maybe I was one of the few attendees who didn’t have a 3G internet connection on their phone :) All in all, I would give the organization 9 out of 10 points. Awesome!

Lastly, I could not finish this post without mentioning another of the great things we have here at Igalia: the personal training budget. It basically allows every Igalian to take courses or attend conferences of their own interest, which may sometimes mean taking a bullet train to Tokyo and spending two nights there 😉

More information: