Friday 26 September 2014

BIG DATA: A Powerful New Resource for the 21st Century

by Dirk Helbing

This chapter is a free translation of the introductory article "Big Data - Zauberstab und Rohstoff des 21. Jahrhunderts", originally published in Die Volkswirtschaft - Das Magazin für Wirtschaftspolitik (5/2014).

Abstract


Information and communication technology (ICT) is the economic sector that is developing most rapidly in the USA and Asia and generates the greatest value added per employee. Big Data - the algorithmic discovery of hidden treasures in large data sets - creates new economic value. This development is increasingly understood as a new technological revolution. Switzerland could establish itself as a data bank and Open Data pioneer in Europe and become a leading location for information technologies.


What is Big Data?

When the messaging service WhatsApp, with its 450 million users, was recently sold to Facebook for $19 billion, almost half a billion dollars was made per employee. "Big Data" is changing our world. The term, coined more than 15 years ago, means data sets so large that standard computational methods can no longer cope with them. Big Data is increasingly referred to as the oil of the 21st century. To benefit from it, we must learn to "drill" and "refine" data, i.e. to transform them into useful information and knowledge. The global data volume doubles every 12 months. Therefore, in just two years, we produce as much data as in the entire history of humankind.

Tremendous amounts of data have been created by four technological innovations:
  • the Internet, which enables our global communication
  • the World Wide Web, a network of globally accessible websites that evolved after the invention of the Hypertext Transfer Protocol (HTTP) at CERN in Geneva
  • the emergence of social media such as Facebook, Google+, WhatsApp, or Twitter, which have created social communication networks, and
  • the emergence of the "Internet of Things", which also allows sensors and machines to connect to the Internet. Soon there will be more machines than human users on the Internet.


Data sets bigger than the largest library

Meanwhile, the data sets collected by companies such as eBay, Walmart or Facebook reach the size of petabytes (1 million billion bytes) - one hundred times the information content of the largest library in the world, the U.S. Library of Congress. The mining of Big Data opens up entirely new possibilities for process optimization, identification of interdependencies, and decision support. However, Big Data also comes with new challenges, which are often characterized by four criteria:
  • volume: the file sizes and number of records are huge,
  • velocity: the data evaluation often has to be done in real-time,
  • variety: the data is often very heterogeneous and unstructured,
  • veracity: the data is often incomplete, not representative, and contains errors.

Therefore, completely new algorithms, i.e. new computational methods, had to be developed. Because it is inefficient for Big Data processing to load all relevant data into a shared memory, the processing must take place locally, where the data resides, on potentially thousands of computers. This is accomplished with massively parallel computing approaches such as MapReduce, which is implemented, for example, in the Hadoop framework. Big Data algorithms detect interesting interdependencies in the data ("correlations"), which may be of commercial value, for example, between weather and consumption or between health and credit risks. Today, even the prosecution of crime and terrorism is based on the analysis of large amounts of behavioral data.
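
As a rough illustration of the MapReduce idea, the following single-machine Python sketch counts words in three phases: map, shuffle, and reduce. The function names and the sample documents are made up for illustration; a real framework would run the map and reduce steps in parallel on the machines where the data resides.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here we simply loop.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for data spread over many computers.
documents = ["big data is big", "data is the new oil"]
counts = word_count(documents)
print(counts)
```

The point of the pattern is that the map and reduce functions never need to see the whole data set at once, so only small intermediate results have to travel over the network.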


What do applications look like?

Big Data applications are spreading like wildfire. They facilitate personalized offers, services and products. One of the greatest successes of Big Data is automatic speech recognition and processing. Apple's Siri understands you when you ask for a Spanish restaurant, and Google Maps can lead you there. Google Translate interprets foreign languages by comparing them with a huge collection of translated texts. IBM's Watson computer even understands human language. It can not only beat experienced quiz show players, but also take care of customer hotlines - often better than humans. IBM has recently decided to invest $1 billion to further develop and commercialize the system.

Of course, Big Data plays an important role in the financial sector. Approximately seventy percent of all financial market transactions are now made by automated trading algorithms. In just one day, the entire money supply of the world is traded. Such quantities of money also attract organized crime, so financial transactions are scanned by Big Data algorithms for abnormalities in order to detect suspicious activities. The company BlackRock uses similar software, called "Aladdin", to successfully speculate with funds amounting to multiple times the gross domestic product (GDP) of Switzerland.
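
To give a flavor of how transactions might be scanned for abnormalities, here is a minimal sketch that flags amounts far from the typical value using a robust z-score (based on the median and the median absolute deviation). The amounts, the threshold of 3.5, and the function name are assumptions for illustration; real monitoring systems are considerably more elaborate.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the amounts whose robust z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        # More than half the values are identical; this simple sketch gives up.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - center) / mad > threshold]


# Hypothetical transaction amounts; one of them is far out of line.
amounts = [120, 95, 130, 110, 105, 98, 25000, 115]
suspicious = flag_outliers(amounts)
print(suspicious)
```

The median is used instead of the mean because a single extreme transaction would otherwise distort the very baseline it is compared against.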

Box 1:
To get an overview of the ICT trends, it is worthwhile to look at Google, which runs over 50 software platforms. The company invests nearly $6 billion in research and development annually. Within just one year, Google has introduced self-driving cars, invested heavily in robotics, and started a Google Brain project to add intelligence to the Internet. Through the purchase of Nest Labs, Google has also invested $3.2 billion in the "Internet of Things". Furthermore, Google X has been reported to have around 100 secret projects in the pipeline.


The potential is great...

No country today can afford to ignore the potential of Big Data. The additional economic potential of Open Data alone - i.e. of data sets that are made available to everyone - is estimated by McKinsey to be between 3,000 and 5,000 billion dollars globally each year [2]. This can benefit almost all sectors of society. For example, energy production and consumption can be better matched with "smart metering", and energy peaks can be avoided. More generally, new information and communication technologies allow us to build "smart cities". Resources can be managed more efficiently and the environment protected better. Risks can be better recognized and avoided, thereby reducing unintended consequences of decisions and identifying opportunities that would otherwise have been missed. Medicine can be better adapted to the patients, and disease prevention may become more important than curing diseases.


... but also the implicit risks

Like all technologies, Big Data also implies risks. The security of digital communication has been undermined. Cyber crime, including data, identity and financial theft, is spreading quickly and on an ever greater scale. Critical infrastructures such as energy, financial and communication systems are threatened by cyber attacks. They could, in principle, be made dysfunctional for an extended time period.

Moreover, while common Big Data algorithms are used to reveal optimization potentials, their results may be unreliable or may not reflect causal relationships. Therefore, a naive application of Big Data algorithms can easily lead to wrong conclusions. The error rate in classification problems (e.g. the distinction between "good" and "bad" risks) is often significant. Issues such as wrong decisions or discrimination must be seriously considered. Therefore, one must find effective procedures for quality control. In this connection, universities will likely play an important role. One must also find effective mechanisms to protect privacy and the right of informational self-determination, for example, by applying the Personal Data Purse [1] concept.
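
A small worked example shows why error rates in classification matter. Suppose, purely hypothetically, that 1% of all cases are truly "bad" risks, that a classifier detects 95% of them, and that it wrongly flags 5% of the "good" ones. The sketch below uses Bayes' rule to compute which share of the flagged cases are actually "bad" risks; all numbers and names are assumptions chosen for illustration.

```python
def share_of_true_positives(base_rate, sensitivity, false_positive_rate):
    """Bayes' rule: probability that a flagged case is truly a 'bad' risk."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical numbers: 1% bad risks, 95% of them detected, 5% of good risks misflagged.
precision = share_of_true_positives(0.01, 0.95, 0.05)
print(f"{precision:.1%} of flagged cases are truly bad risks")
```

Under these assumed numbers, only about one in six flagged cases would actually be a "bad" risk, which is how a seemingly accurate algorithm can still produce many wrong decisions.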


The digital revolution creates an urgency to act

Information and communication technologies are going to change most of our traditional institutions: our educational system (personalized learning), science (Data Science), mobility (self-driving cars), the transport of goods (drones), consumption (see Amazon and eBay), production (3D printers), the health system (personalized medicine), politics (more transparency), and the entire economy (with co-producing consumers, so-called prosumers). Banks are losing more and more ground to algorithmic trading and to alternative payment systems such as Bitcoin, PayPal and Google Wallet. Moreover, a substantial part of the insurance business takes place in financial products such as credit default swaps. For the economic and social transformation into a "digital society", we may have just 20 years. This is an extremely short time period, considering that the planning and construction of a road often requires 30 years or more.

The foregoing implies an urgent need for action on the technological, legal and socio-economic level. Some years ago, the United States started a Big Data research initiative amounting to 200 million dollars, followed by further substantial investments. In Europe, the FuturICT project (www.futurict.eu) has developed concepts for the digital society within the context of the EU flagship competition. Other countries have already started to implement this concept. For example, Japan has recently launched a $100 million 10-year project at the Tokyo Institute of Technology. In addition, numerous other projects exist, particularly in the military and security sector, which often have multiples of the budgets mentioned above.


Switzerland can become a European driver of innovation for the digital era

Switzerland is well positioned to benefit from the digital age. However, it is insufficient to reinvent and build upon already existing technologies in Switzerland. New technologies that will shape the digital age must be invented. The World Wide Web was invented in Switzerland, and the largest civil Big Data competence in the world exists at CERN. To date, however, the USA and Asian countries have the lead in commercializing Big Data. With the NSA controversy, the ubiquity of wireless communication sensors, and the "Internet of Things", a new opportunity is emerging.

With targeted support of ICT activities at its universities, Switzerland could take the lead in Europe's research and development. Swiss academia has excelled with the scientific coordination of three out of six finalists of the EU FET flagship competition. At the moment, however, the focus is only on the digital modeling of the human brain and on robotics. From 2017 onwards, the ETH domain plans to invest increasingly in the area of Data Science, the emerging research field centered around the scientific analysis of data.

In view of the fast development of the ICT area, its huge economic potential, and the transformative power of these technologies, prioritized, broad and substantial financial support is a matter of Swiss national interest. With its basic democratic values, legal framework and ICT focus, Switzerland is well prepared to become Europe's innovation driver for the digital age.

Box 2:
How will the digital revolution change our economy and society? How can we use this as an opportunity for us and reduce the related risks? For illustration, it is helpful to recall the factors that enabled the success of the automobile age: the invention of cars and of systems of mass production; the construction of public roads, gas stations, and parking lots; the creation of driving schools and driver licenses; and last but not least, the establishment of traffic rules, traffic signs, speed controls, and traffic police.
What are the technological infrastructures and the legal, economic and societal institutions needed to make the digital age a big success? This question would set the agenda of the Innovation Alliance. A partial answer is already clear: we need trustworthy, transparent, open, and participatory ICT systems, which are compatible with our values. For example, it would make sense to establish the emergent "Internet of Things" as a Citizen Web. This would enable self-regulating systems through real-time measurements of the state of the world, which would be possible with a public information platform called the "Planetary Nervous System". It would also facilitate a real-time measurement and search engine: an open and participatory "Google 2.0."


To protect privacy, all data collected about individuals should be stored in a Personal Data Purse and, given informed consent, processed in a decentralized way by third-party Trustable Information Brokers, allowing everyone to control the use of their sensitive data. A Micro-Payment System would allow data providers, intellectual property right holders, and innovators to get rewards for their services. It would also encourage the exploration of new and timely intellectual property right paradigms ("Innovation Accelerator"). A pluralistic, User-centric Reputation System would promote responsible behavior in the virtual (and real) world. It would even enable the establishment of a new value exchange system called "Qualified Money," which would overcome weaknesses of the current financial system by providing additional adaptability.
A Global Participatory Platform would empower everyone to contribute data, computer algorithms and related ratings, and to benefit from the contributions of others (either free of charge or for a fee). It would also enable the generation of Social Capital such as trust and cooperativeness, using next-generation User-controlled Social Media. A Job and Project Platform would support crowdsourcing, collaboration, and socio-economic co-creation. Altogether, this would build a quickly growing Information and Innovation Ecosystem, unleashing the potential of data for everyone: business, politics, science, and citizens alike.

Further Reading

[1] Y.-A. de Montjoye, E. Shmueli, S. S. Wang, and A. S. Pentland (2014) openPDS: Protecting the Privacy of Metadata through SafeAnswers.

[2] McKinsey & Company (2013) Open data: Unlocking innovation and performance with liquid information.

