Volume, velocity, variety: Pat Martin and Watson
There can’t be many Queen’s professors who get to work with the most notable champ ever from TV’s long-running Jeopardy! quiz show.
Soon, that honour is going to fall to Professor Emeritus Patrick Martin of the Queen’s School of Computing, who, along with business professor Brent Gallupe, is being given the chance to use IBM’s Watson cognitive computing system as an integral part of the department’s CISC 490 course, Deep Analytics using Watson. Touted for its prowess in handling large volumes of disparate data and analyzing them in novel and complex ways, Watson is probably best known for defeating two human competitors on the popular general knowledge game show in February 2011.
Martin is the founder and long-time head of the Queen’s Database Systems Laboratory and a visiting scientist with IBM’s Centre for Advanced Studies. His specialty is data, and how computers deal with it. “I always tell my students that everything is a data problem.” For the past few years, his lab’s research has focused on improving DB2, IBM’s relational database product. (A relational database represents data as a set of related tables that can be manipulated in powerful ways.)
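For readers who have not met a relational database, the idea of "related tables" can be shown in a few lines. The sketch below uses Python's built-in sqlite3 module rather than DB2, and the table names, course codes and student names are invented purely for illustration.

```python
import sqlite3

def students_per_course():
    """Count enrolments per course by joining two related tables."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.executescript("""
        -- Hypothetical tables; not taken from any real Queen's system.
        CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE enrolment (
            student TEXT,
            course_code TEXT REFERENCES course(code)
        );
        INSERT INTO course VALUES
            ('CISC 490', 'Deep Analytics using Watson'),
            ('DEMO 101', 'Invented example course');
        INSERT INTO enrolment VALUES
            ('Ada', 'CISC 490'),
            ('Ben', 'CISC 490'),
            ('Cy',  'DEMO 101');
    """)
    # The JOIN is the "relation": rows are matched on the shared course code.
    rows = conn.execute("""
        SELECT c.code, COUNT(*)
        FROM course AS c
        JOIN enrolment AS e ON e.course_code = c.code
        GROUP BY c.code
        ORDER BY c.code
    """).fetchall()
    conn.close()
    return rows

print(students_per_course())
```

The two tables are stored separately and linked only by the shared course code, which is what lets the same data be recombined to answer many different questions.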
Today, increasingly, his lab’s emphasis is on what is called “big data,” and connected to that, “deep analysis.”
What is big data? “That is always the question,” Martin replies patiently. “I guess the standard definition is the three v’s – volume, velocity and variety.”
Simply, you have a lot of data and it’s coming at you quickly and, perhaps most importantly, the data are wildly varied. Some might be structured in relational databases, but the rest could be almost anything: images, sound recordings, “emails, tweets, all kinds of things.” These are data, says Martin, “that can’t be processed efficiently using our standard tools and algorithms.”
Being able to manage and analyze big data can have incredible payoffs. “One application I am particularly interested in,” says Martin, is neurological. The Ontario Brain Institute (OBI) is building a database called Brain-CODE here at Queen’s, and they are integrating data from a lot of different research groups. Each of these groups has its own approaches and types of data. OBI is hoping that if they can give researchers an analytics platform they may be able to combine these data in new and different ways to solve their problems. So if there is one group that uses MRIs, and if they combine that with data from people who are working on the cellular level, they may be able to find things they couldn’t otherwise find without that level of integration.
Watson (named for the founder of IBM) represents a big leap forward in the handling of big data and deep analytics. Capable of understanding natural languages, it is, says Martin, “a whole set of programs in machine learning, data mining and statistics, and all these algorithms on top of a very large parallel machine, accessed through the IBM cloud.” The Queen’s students will be developing business applications on top of Watson. That requires creating a body of knowledge for it, called a corpus – all documents and data relevant to your potential application, which Watson will process and then index. “And then you have to train it by asking it questions and then rating its answers.” Watson tries to figure out what the question means and then comes up with a whole series of hypotheses and tries to pick the best one. “It does this by running all these different algorithms. It may process hypotheses in different ways, key words or what it finds geographically, or by time, and then it will try to support them by going to its body of knowledge and using multiple algorithms to verify the answer.” The more you work with Watson, the more specific its answers get – rather like a grand version of one of those simple algorithms online that help you pick books or music based on the preferences you give them.
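The loop Martin describes (build a corpus, generate candidate answers, score them with several algorithms, pick the best) can be imitated in miniature. The toy below is not Watson's code or its API; the three-sentence corpus, the two scoring functions and the equal weights are all assumptions made for illustration, and the training step of rating answers is left out.

```python
from collections import Counter

# A tiny stand-in "corpus"; a real one would hold thousands of documents.
CORPUS = {
    "doc1": "Kingston is a city in Ontario on Lake Ontario",
    "doc2": "Queen's University is located in Kingston Ontario",
    "doc3": "IBM was founded by Thomas Watson",
}

def tokens(text):
    """Lower-case the words and strip simple punctuation."""
    return [w.strip(".,?!").lower() for w in text.split()]

def keyword_overlap(question, passage):
    """Fraction of the question's distinct words that appear in the passage."""
    q, p = set(tokens(question)), set(tokens(passage))
    return len(q & p) / len(q) if q else 0.0

def term_frequency(question, passage):
    """Share of the passage's words that are question words."""
    counts = Counter(tokens(passage))
    total = sum(counts.values())
    return sum(counts[w] for w in set(tokens(question))) / total if total else 0.0

# Each scorer plays the role of one of the "different algorithms".
SCORERS = [keyword_overlap, term_frequency]

def rank(question, weights=(1.0, 1.0)):
    """Treat every document as a hypothesis and score it with all scorers."""
    scored = [
        (sum(w * s(question, passage) for w, s in zip(weights, SCORERS)), doc_id)
        for doc_id, passage in CORPUS.items()
    ]
    return sorted(scored, reverse=True)

def best_answer(question, weights=(1.0, 1.0)):
    """Return the id of the highest-scoring hypothesis."""
    return rank(question, weights)[0][1]

print(best_answer("Who founded IBM?"))
```

In a fuller version, the ratings a trainer gives to each answer would be used to adjust the weights, so that the scorers that tend to be right count for more over time.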
Martin thinks the next few years will be spent working on ways to make deep analytics more accessible. There is a shortage of people who can work with these analytics, so if his group can create something that’s easier for researchers to work with, “a nice interface on a tablet” for instance, that would be good. As part of their work with the Southern Ontario Smart Computing Innovation Platform (SOSCIP), Martin and his group created a “front-end service,” essentially a webpage, for IBM analytic software hosted on a cloud at Western University. The service lets researchers use this software more easily. “Researchers,” he says, “shouldn’t have to know all the details about working in the cloud. Our service takes care of these details for them.”
“This is such a great area to be in,” he says. “It really keeps you on your toes.”