by Dirk Helbing (ETH Zurich, firstname.lastname@example.org)
(an almost identical version was forwarded to some Members of the European Parliament on April 7, 2013)
Some serious, fundamental problems to be solved
The first problem is that combining two or more anonymous data sets may allow deanonymization, i.e. the identification of the individuals whose data have been recorded. Mobility data, in particular, can easily be deanonymized.
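To make the mechanism concrete, here is a minimal Python sketch of a linkage attack, assuming two hypothetical toy tables: an "anonymized" data set without names but with quasi-identifiers (postcode, birth date, sex), and a public register with names that shares those attributes. The records, field names and the `link` helper are invented for illustration.

```python
# Hypothetical "anonymized" table: names removed, quasi-identifiers kept.
medical = [
    {"zip": "8092", "birth": "1970-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "8092", "birth": "1981-11-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "8050", "birth": "1970-03-14", "sex": "M", "diagnosis": "none"},
]
# Hypothetical public register: names plus the same quasi-identifiers.
voters = [
    {"name": "A. Muster", "zip": "8092", "birth": "1970-03-14", "sex": "F"},
    {"name": "B. Beispiel", "zip": "8092", "birth": "1981-11-02", "sex": "M"},
    {"name": "C. Example", "zip": "8050", "birth": "1990-01-01", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth", "sex")


def link(anonymized, identified, keys=QUASI_IDENTIFIERS):
    """Join two data sets on shared quasi-identifiers; a unique match re-identifies."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for record in anonymized:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the record is re-identified
            matches[names[0]] = record["diagnosis"]
    return matches


print(link(medical, voters))  # {'A. Muster': 'asthma', 'B. Beispiel': 'diabetes'}
```

Neither table alone links a name to a diagnosis; the join does. Mobility traces behave similarly, since a few place-and-time points can play the role of such a quasi-identifier.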
A second fundamental problem is that one must assume that the large majority of people in developed countries, including the countries of the European Union, have already been profiled in detail, given that individual devices can be identified with high accuracy through their individual configurations (including the software installed and its settings). There are currently about 700 million commercial data sets about users, specifying an estimated 1,500 variables per user.
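The following Python sketch illustrates the principle behind device fingerprinting, assuming a hypothetical set of configuration attributes that a website might observe: hashing a canonical serialization of them yields an identifier that stays stable across visits without any cookie. Real fingerprinting uses many more attributes; the attribute names and the 16-character truncation are assumptions.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a canonical (key-sorted) serialization of configuration attributes."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes observable by a website.
device_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Zurich",
    "fonts": ["Arial", "Frutiger"],
    "language": "de-CH",
}
device_b = dict(device_a, fonts=["Arial"])  # differs in a single attribute

print(fingerprint(device_a) == fingerprint(dict(device_a)))  # True: stable across visits
print(fingerprint(device_a) == fingerprint(device_b))  # False: distinguishable devices
```

Because the keys are sorted before hashing, the order in which attributes are collected does not change the identifier.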
A third problem is that both the CIA and the FBI have revealed that, besides publicly or semi-publicly available data on the Web and in social media, they are storing or processing, or intend to store or process, private data, including Gmail and Dropbox data. The same applies to many secret services around the world. It has also become public that the NSA apparently collects all the data it can get hold of.
A fourth fundamental problem is that Europe currently lacks the technical means, algorithms, software, data and laws to counter foreign dominance in Big Data and its potential misuse.
The age of information will only be sustainable if people can trust that their data are being used in their interest. The spirit and goal of data regulations should be to ensure this.
Personal data are data characterizing individuals or data derived from them. People should be the primary owners of their personal data. Individuals, companies or government agencies that gather, produce, process, store or buy data should be considered secondary owners. Whenever personal data concern European citizens, or are stored, processed or used in a European country or by a company operating in a European country, European law should apply.
Individuals should be allowed to use their own personal data in any way compatible with fundamental rights, including sharing them with others, either for free or at least for a small monthly fee covering the use of ALL their personal data (like the radio and TV fee). [Note: This is important to unleash the power of personal data for the benefit of society and to close Europe's data gap.]
Individuals should have the right to access a full copy of all their personal data through a central service and should be suitably protected from misuse of these data.
They should have the right to limit the use of their personal data at any time and to request their correction or deletion simply, promptly and free of charge.
Fines should apply to any person, company or institution that obtains or creates financial or other advantages by misusing personal data.
Misuse includes, in particular, sensitive uses that carry a certain probability of violating human rights or justified personal interests. Therefore, the error rate of the processing (and, in particular, the classification) of personal data must be recorded, specifying what per mille of users feel disadvantaged.
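As a hedged illustration of what such a record might contain, the Python sketch below computes the two quantities named above from hypothetical audit figures: the error rate on a manually checked sample of classified cases, and the per mille of all processed users who reported feeling disadvantaged. The class name, fields and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingAudit:
    """Hypothetical audit record for one automated classification procedure."""
    users_processed: int               # all users whose data were classified
    audited_cases: int                 # cases checked by hand
    misclassified: int                 # audited cases found to be wrongly classified
    users_reporting_disadvantage: int  # e.g. complaints received

    def error_rate(self) -> float:
        """Share of audited cases that were classified wrongly."""
        return self.misclassified / self.audited_cases

    def disadvantaged_per_mille(self) -> float:
        """Users reporting a disadvantage per 1,000 users processed."""
        return 1000 * self.users_reporting_disadvantage / self.users_processed


audit = ProcessingAudit(users_processed=250_000, audited_cases=2_000,
                        misclassified=46, users_reporting_disadvantage=180)
print(f"error rate: {audit.error_rate():.1%}")  # error rate: 2.3%
print(f"disadvantaged: {audit.disadvantaged_per_mille():.2f} per mille")  # disadvantaged: 0.72 per mille
```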
A central institution (which might be an open Web platform) is needed to collect user complaints. Sufficient transparency and decentralized institutions are required to take efficient, timely and affordable action to protect the interests of users.
The execution of user rights must be easy, not time-consuming, and cheap (essentially free). For example, users must not be flooded with requests regarding their personal data. They must be able to effectively ensure a self-determined use of their personal data with little individual effort.
To limit misuse, transparency is crucial. For example, large-scale processing of personal data (at least the queries that were executed) should be required to be made public in machine-readable form, so that public institutions and NGOs can determine how dangerous such queries might be for individuals.
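A minimal Python sketch of what such a machine-readable disclosure could look like, assuming a hypothetical JSON schema; the field names and the example entry are illustrative and not taken from any existing standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class QueryDisclosure:
    """One executed query over personal data (hypothetical schema)."""
    operator: str         # organization that ran the query
    executed_at: str      # ISO 8601 timestamp
    query: str            # the query text as executed
    purpose: str          # declared purpose
    records_touched: int  # number of personal records involved
    attributes_used: List[str] = field(default_factory=list)


def publish(disclosures: List[QueryDisclosure]) -> str:
    """Serialize disclosures as a JSON array for public, machine-readable release."""
    return json.dumps([asdict(d) for d in disclosures], indent=2)


log = [QueryDisclosure(
    operator="ExampleCorp",
    executed_at="2013-04-07T10:15:00Z",
    query="SELECT region, COUNT(*) FROM customers WHERE age > 65 GROUP BY region",
    purpose="market analysis",
    records_touched=1_200_000,
    attributes_used=["region", "age"],
)]
print(publish(log))
```

Publishing the query text itself, rather than only a description, is what would let outside reviewers judge which attributes were combined and at what scale.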
As indicated above, there are practically no data that cannot be deanonymized if combined with other data. However, the following may serve as a practical definition of anonymity:
Anonymous data are data in which a person of interest can only be identified with a probability smaller than 1/2000, i.e. there is no way to find out which one among two thousand individuals has the property of interest.
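Stated formally, under the assumption that an observer can do no better than pick uniformly among the records that are indistinguishable from the target given all available data, the definition reads as follows, where G(i) (hypothetical notation) is the set of records indistinguishable from person i:

```latex
P(\text{identify } i) = \frac{1}{|G(i)|},
\qquad
\max_i \frac{1}{|G(i)|} < \frac{1}{2000}
\;\Longleftrightarrow\;
\min_i |G(i)| > 2000 .
```

The phrase "one among two thousand" corresponds to the slightly weaker condition min |G(i)| ≥ 2000, i.e. an identification probability of at most 1/2000.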
Hence, the principle is that of diluting persons with a certain property of interest among 2,000 persons with significantly different properties, making it unlikely that persons with the property of interest can be identified. This principle is guided by the way public authorities use election data or other sensitive data. It also ensures that private companies do not have a data processing advantage over public institutions (including research institutions).
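This dilution principle resembles what is usually called k-anonymity, here with k = 2000. The Python sketch below checks it, assuming the quasi-identifiers (the attributes an outsider could link on) are known; the field names and toy data are hypothetical. It also shows a common remedy, coarsening: dropping or generalizing an attribute merges small groups into larger ones.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

THRESHOLD = 2000  # "one among two thousand" from the definition above


def group_sizes(records: Iterable[Dict], quasi_identifiers: Sequence[str]) -> Counter:
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def max_identification_probability(records, quasi_identifiers) -> float:
    """Worst case: 1 / size of the smallest indistinguishable group."""
    return 1 / min(group_sizes(records, quasi_identifiers).values())


def is_anonymous(records, quasi_identifiers, threshold: int = THRESHOLD) -> bool:
    """Strict reading: every group must exceed the threshold (probability < 1/threshold)."""
    return min(group_sizes(records, quasi_identifiers).values()) > threshold


# Hypothetical toy data: 2,500 people in region A, 50 in region B, same age band.
records = ([{"region": "A", "age_band": "30-39"}] * 2500
           + [{"region": "B", "age_band": "30-39"}] * 50)

print(max_identification_probability(records, ("region", "age_band")))  # 0.02
print(is_anonymous(records, ("region", "age_band")))  # False: region B is too small
print(is_anonymous(records, ("age_band",)))  # True: coarsened into one group of 2,550
```

Integer group sizes are compared directly rather than floating-point probabilities, which avoids rounding issues at the boundary.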
I would propose characterizing pseudonymous data as data not suited to reveal or track the user, or properties correlated with the user, that he or she has not explicitly chosen to reveal in the specific context. I would furthermore suggest characterizing pseudonymous transactions as those that process and store only the minimum amount of data required to perform a service requested by a user (which, in particular, implies not processing or storing technical details that would allow one to identify the user's device and software). Essentially, pseudonymous transactions should not be suited to identify the user or variables that might identify him or her. A pseudonym is typically a random or user-specified variable that allows one to sell a product or perform a service for a user anonymously, usually in exchange for an anonymous money transfer.
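A minimal Python sketch of a pseudonymous transaction along these lines, assuming a hypothetical shop whose service needs only a product, a quantity and a pickup locker. The pseudonym is drawn from Python's `secrets` module, so it carries no information about the user, and every field outside a whitelist (such as browser or screen details) is discarded before anything is stored. The field names and the whitelist are assumptions.

```python
import secrets

# Hypothetical whitelist: the only fields needed to deliver the requested service.
REQUIRED_FIELDS = {"product_id", "quantity", "delivery_locker"}


def new_pseudonym() -> str:
    """A random pseudonym (16 random bytes, URL-safe) unrelated to the user."""
    return secrets.token_urlsafe(16)


def minimize(request: dict, required=REQUIRED_FIELDS) -> dict:
    """Keep only what the service needs; drop device and software details."""
    return {k: v for k, v in request.items() if k in required}


incoming = {
    "product_id": "X-42", "quantity": 1, "delivery_locker": "ZH-117",
    "user_agent": "ExampleBrowser/1.0", "screen": "1920x1080", "ip": "203.0.113.7",
}
transaction = {"pseudonym": new_pseudonym(), **minimize(incoming)}
print(sorted(transaction))  # ['delivery_locker', 'product_id', 'pseudonym', 'quantity']
```

A whitelist is used rather than a blacklist so that newly added tracking attributes are dropped by default.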
To allow users to check pseudonymity, the data processed and stored should be fully shared with the user via an encrypted webpage (or similar) that is accessible for a limited but sufficiently long period through a unique and confidential decryption key made available only to the respective user. It should be possible for the user to easily decrypt, view, copy, download and transfer the data processed and stored by the pseudonymous transaction without being tracked.
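The access part of this proposal can be sketched with the Python standard library, under stated assumptions: a hypothetical store issues each user a random access key, keeps only a SHA-256 hash of it, and refuses access after a fixed period (30 days here, an arbitrary choice). Encrypting the stored data itself should rely on a vetted library (for example, Fernet from the `cryptography` package) and is deliberately omitted from this sketch.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

ACCESS_PERIOD_S = 30 * 24 * 3600  # hypothetical access window: 30 days


def _digest(key: str) -> str:
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class TransactionRecordStore:
    """Sketch: holds the data of pseudonymous transactions for user inspection."""

    def __init__(self) -> None:
        # Maps the hash of an access key to (expiry time, stored data).
        # Plaintext keys are never kept, so a leaked copy reveals no working keys.
        self._records: Dict[str, Tuple[float, bytes]] = {}

    def deposit(self, data: bytes, now: Optional[float] = None) -> str:
        """Store data and return a fresh access key, to be given only to the user."""
        now = time.time() if now is None else now
        key = secrets.token_urlsafe(32)
        self._records[_digest(key)] = (now + ACCESS_PERIOD_S, data)
        return key

    def retrieve(self, key: str, now: Optional[float] = None) -> Optional[bytes]:
        """Return the data if the key is valid and unexpired; access is not logged."""
        now = time.time() if now is None else now
        entry = self._records.get(_digest(key))
        if entry is None or now > entry[0]:
            return None
        return entry[1]


store = TransactionRecordStore()
key = store.deposit(b'{"pseudonym": "p-123", "product_id": "X-42"}', now=0.0)
print(store.retrieve(key, now=3600.0))                 # the stored bytes
print(store.retrieve(key, now=ACCESS_PERIOD_S + 1.0))  # None: access period over
```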
Difficulty of anonymizing data
- Researchers reverse Netflix anonymization, see www.securityfocus.com/news/11497
- Unique in the crowd: The privacy bounds of human mobility, see www.nature.com/srep/2013/130325/srep01376/full/srep01376.html
Danger of a surveillance society
- Google as God? Opportunities and risks of the information age, see www.synthesisips.net/blog/google-as-god/
- Big data is opening doors, but maybe too many, see www.nytimes.com/2013/03/24/technology/big-data-and-a-renewed-debate-over-privacy.html?ref=stevelohr&_r=2&
- Future planet – future of surveillance, see www.international.to/index.php?option=com_content&view=category&id=94&layout=blog&Itemid=104
- CIA and FBI strategies to mine personal data, see www.businessinsider.com/cia-presentation-on-big-data-2013-3?op=1 and www.gigaom.com/2013/03/20/even-the-cia-is-struggling-to-deal-with-the-volume-of-real-time-social-data/2/ and http://www.slate.com/blogs/future_tense/2013/03/26/andrew_weissmann_fbi_wants_real_time_gmail_dropbox_spying_power.html
- US Consumer Privacy Bill of Rights, see www.whitehouse.gov/sites/default/files/privacy-final.pdf
- Personal data: The emergence of a new asset class, see www.weforum.org/reports/personal-data-emergence-new-asset-class
- HP software allowing personalized advertisement without revealing personal data to companies, contact: Prof. Dr. Bernardo Huberman: email@example.com
- FuturICT – The road towards ethical ICT, see http://link.springer.com/article/10.1140%2Fepjst%2Fe2012-01691-2#page-1
- From social data mining to forecasting socio-economic crises, see http://link.springer.com/article/10.1140%2Fepjst%2Fe2011-01401-8
- FuturICT Facebook page: www.facebook.com/FuturICT
- FuturICT twitter channel: https://twitter.com/FuturICT
Dirk Helbing is Professor of Sociology, in particular of Modeling and Simulation, and a member of the Computer Science Department at ETH Zurich. He is also an elected member of the German Academy of Sciences. He earned a PhD in physics and was Managing Director of the Institute of Transport & Economics at Dresden University of Technology in Germany. He is internationally well known for his work on pedestrian crowds, vehicle traffic, and agent-based models of social systems. Furthermore, he coordinates the FuturICT Initiative (www.futurict.eu), which focuses on understanding techno-socio-economic systems using Big Data. His work is documented by hundreds of well-cited scientific articles, dozens of keynote talks and hundreds of media reports in all major languages. Helbing is also chairman of the Physics of Socio-Economic Systems Division of the German Physical Society, co-founder of ETH Zurich's Risk Center, and an elected member of the World Economic Forum's Global Agenda Council on Complex Systems.