An EU-funded project has developed a platform that turns vast amounts of user-generated content from a source of information overload into a new, collective intelligence with a range of applications, from handling emergencies to enhancing city tourism. The project has filed for several patents, and a handful of products and results are destined for public or commercial release. Is this the beginning of data mining 3.0?
Information and knowledge are increasing at great speed. Our capacity to store, transmit and compute information has grown at an annual rate of 23 % since 1986. By 2007, the average internet user transmitted about six newspapers' worth of information every day, and received 174 newspapers' worth of data, according to a study published in Science last February.
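To give a feel for what a 23 % annual rate implies, the short sketch below compounds it over time. It takes the quoted figure at face value and assumes, for illustration only, that the rate held steadily from 1986 to 2007; the function names are hypothetical and are not taken from the study.

```python
from math import log

ANNUAL_GROWTH = 0.23                 # rate quoted in the article
START_YEAR, END_YEAR = 1986, 2007    # assumed period, for illustration only


def growth_factor(rate: float, years: int) -> float:
    """Total multiplication of capacity after compounding `rate` for `years`."""
    return (1 + rate) ** years


def doubling_time(rate: float) -> float:
    """Number of years needed for capacity to double at a constant `rate`."""
    return log(2) / log(1 + rate)


if __name__ == "__main__":
    years = END_YEAR - START_YEAR
    print(f"Growth over {years} years: x{growth_factor(ANNUAL_GROWTH, years):.0f}")
    print(f"Doubling time: {doubling_time(ANNUAL_GROWTH):.1f} years")
```

Under these assumptions, capacity doubles in a little over three years, which is why organising the data struggles to keep pace with producing it.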
Social media sharing websites are also adding to the information load, hosting billions of images and videos, most of which have been annotated and shared among friends, or published in groups that cover a specific topic of interest.
Data is increasing at such a rate that it is outpacing our capacity to organise it. 'It is easy for users and organisations to generate and share content, individually or within communities, thanks to advanced communication devices like laptops, tablets and smart phones,' explains Yiannis Kompatsiaris, coordinator of the EU-funded project WeKnowIt.
The sheer volume of content makes extracting useful information extremely complex and costly, and current applications do not fully support intelligent processing and management of the data, suggests Dr Kompatsiaris. 'Users are failing to process information efficiently and cannot exploit the underlying knowledge,' he says.
The WeKnowIt integrated project set out to change all that, starting out with a vision that conceptualises the different forms of data we encounter daily and gathers them into cohesive, actionable information that the WeKnowIt team calls 'collective intelligence'.
The project focused for the most part on mining the type of data generated by social bookmarking, tagging and networking, where collective opinions and input create a highly detailed dataset.
'There are different types of intelligence we receive and process every day,' says Dr Kompatsiaris. 'There is the digital content and contextual information we call "media intelligence". User feedback on a large scale constitutes "mass intelligence", while personal interactions engage "social intelligence", all leading to the personal and organisational intelligence of individuals and companies.'
The WeKnowIt vision started from the conviction that, combined, these various information sources could create an emergent 'collective intelligence' that can powerfully enhance the capacity of individuals and institutions to find and act upon relevant information at the right time. It was a hugely ambitious goal when WeKnowIt started work in 2008, and now the vision is a lot closer to reality.
The project recruited some of the world's leading names in data management and integration, like the Brno University of Technology in the Czech Republic, the Koblenz-Landau University in Germany, Yahoo! in Spain, and Vodafone and the Centre for Research and Technology Hellas (CERTH) in Greece. The company Software Mind in Poland provided software development and integration while the University of Sheffield and Sheffield City Council in the UK developed key tools for use-case scenarios.
In all, ten partner organisations from the Czech Republic, Germany, Spain, Greece, Poland and the UK spent three years and EUR 7.5 million, of which EUR 5.37 million was provided by the EU, to develop a platform and a series of associated tools to help people handle many different types and sources of information in a cohesive way.
'Using a wide variety of tools, the WeKnowIt platform transforms large-scale and poorly structured information into meaningful topics, entities, points of interest, social connections and events,' notes Dr Kompatsiaris.
To do this, the project developed a core platform in the form of a middleware application that can be deployed on servers to process incoming data and route it effectively.
The project's various partners then developed a very large number of tools - over 20 in all - that can be deployed and combined in different ways, either directly or via the core WeKnowIt platform. 'We developed seven tools for the case studies - an emergency response scenario and a consumer social group scenario - and the partners created another 13 for specific tasks,' explains Dr Kompatsiaris.
'City exploration by use of hybrid clustering' (ClustTour) is an example of a standalone tool developed by CERTH-ITI for a specific task. It is an online exploration application that helps users find interesting places using groups of photos, called clusters, which correspond to landmarks and events.
ClustTour uses both visual and tag information, together with a 'cluster classification' and 'merging module', to identify photos that belong together, then places each resulting cluster on a map. Users can simply click the photos to see what's there.
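The general idea of grouping photos by combining tag and visual evidence can be illustrated with a minimal sketch. This is not ClustTour's actual algorithm, which the article does not describe in detail: the `Photo` class, the two-number visual descriptors, the equal weighting and the 0.6 threshold are all assumptions chosen for illustration. The sketch blends tag overlap with visual similarity and merges any pair of photos whose blended score is high enough.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Photo:
    photo_id: str
    tags: frozenset      # user-supplied tags
    features: tuple      # hypothetical visual descriptor (toy values here)


def tag_similarity(a: Photo, b: Photo) -> float:
    """Jaccard overlap between the two photos' tag sets."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def visual_similarity(a: Photo, b: Photo) -> float:
    """Cosine similarity between the two visual descriptors."""
    dot = sum(x * y for x, y in zip(a.features, b.features))
    norm = sqrt(sum(x * x for x in a.features)) * sqrt(sum(y * y for y in b.features))
    return dot / norm if norm else 0.0


def hybrid_clusters(photos, weight=0.5, threshold=0.6):
    """Merge photos whose blended tag/visual score reaches the threshold."""
    parent = list(range(len(photos)))  # union-find forest over photo indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(photos)):
        for j in range(i + 1, len(photos)):
            score = (weight * tag_similarity(photos[i], photos[j])
                     + (1 - weight) * visual_similarity(photos[i], photos[j]))
            if score >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, photo in enumerate(photos):
        groups.setdefault(find(i), []).append(photo.photo_id)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    photos = [
        Photo("a", frozenset({"acropolis", "athens"}), (1.0, 0.0)),
        Photo("b", frozenset({"acropolis", "parthenon", "athens"}), (0.9, 0.1)),
        Photo("c", frozenset({"museum", "sculpture"}), (0.0, 1.0)),
    ]
    print(hybrid_clusters(photos))
```

In this toy example the two Acropolis photos share tags and look alike, so they end up in one cluster, while the museum photo stays on its own. A real system would use far richer visual descriptors and would then attach each cluster to a map location.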
This can dramatically enhance the information available to people exploring a city, as Dr Kompatsiaris explains. A conventional travel guide would highlight the National Archaeological Museum of Athens as a 'point of interest' (POI). But ClustTour generated far more detail by identifying interesting photos relevant to exhibits within the museum, such as the Arkhagetas Inscription, the collection of Early Cycladic Art, a collection of well-known sculptures and even a collection of golden treasures discovered by Schliemann, a German archaeologist.
'It becomes obvious that the ClustTour tool, developed in WeKnowIt, can offer a fine-grained and media-rich exploration and travel preparation experience,' notes Dr Kompatsiaris.
And that's just one of many tools. Moreover, tools can be combined. For example, the consumer social group case study sought to help people to plan a day trip with a PC and then guide them via mobile phone during the tour itself. It used a variety of WeKnowIt tools to accomplish this, including ClustTour, Fannr (a Flickr annotator) and VIRaL, a visual retrieval and localisation tool, among others.
The emergency response scenario also combined various WeKnowIt tools to help provide relevant, timely information for emergency services. It used an intelligent uploading process that could, for example, identify a location based on a photo sent by a user. The system can even gauge the level of urgency by assessing the severity of an incident from photos.
Both case studies were successfully completed, with the project finishing its formal phase earlier this year. But the project has since taken on a life of its own. 'Many of the partners are continuing to develop software and tools around the model and architecture that WeKnowIt defined, and the partners are staying in contact and sharing information on an on-going basis,' Dr Kompatsiaris reveals.
For example, CERTH-ITI, Yahoo! and Koblenz University are continuing their research activities and collaboration on real-time aspects of information extraction from social media. They are also looking at new applications including the news sector and large events, such as music and film festivals. At the same time, the University of Sheffield and Sheffield City Council are discussing updated emergency response versions of the project.
CERTH-ITI is participating in a new spin-out company, called Veribin, which will act as a content aggregator in various sectors and business areas. The two main markets targeted will be the news and e-learning areas and CERTH-ITI will apply and further develop the social media clustering techniques developed in WeKnowIt. Veribin is a spin-out company of ATC S.A. Information Technology Company and start-up funding was recently approved by the Greek Secretariat for Research and Development, according to Dr Kompatsiaris.
Meanwhile, Software Mind is developing new semantic web tools for the telecommunications and financial sectors and Vodafone will use the knowledge gained in the project to exploit network infrastructure for new services. The University of Koblenz has established a spin-off company, called Kreuzverweis, exploiting WeKnowIt project results from the organisational intelligence layer.
The project also filed for nine patents. These are just a handful of the products and results that are slated for public release. Others will emerge over time, particularly with the establishment of the WeKnowIt user group, which helps interested parties stay informed about the latest developments related to the project.
The WeKnowIt project received research funding under the EU's Seventh Framework Programme, under the 'Intelligent content and semantics' sub-programme.