Is Open Data Good Enough?

Last week, on April 16th, the Knowledge Society Forum of the Eurocities group held its Beyond Data event in Eindhoven, the Netherlands.  The KSF consists of more than 50 European policy makers focused on Open Data.  They were joined by many other open data experts and advocates.

I led off with the keynote presentation.  The theme was simple: we need to go beyond merely opening (i.e., releasing) public data, and there are a variety of new technologies that will make the Open Data movement more useful to the general public.

Since I was speaking in my role as Senior Fellow of the Intelligent Community Forum (ICF), I drew a parallel between that work and the current status of Open Data.  I pointed out that ICF has emphasized that an “intelligent city” is much more than a “smart city” with technology controlling its infrastructure.  What makes a community intelligent is whether and how it uses that technological foundation to improve the experience of living there.

Similarly, to make the open data movement relevant to citizens, we need to go beyond merely releasing public data.  Even hackathons and the encouragement of app developers have their limits, in part because developers in private companies will try to find some way to monetize their work, and not all useful public problems have profit potential.

Creating this value means focusing on data of importance to people (not just what’s easy to deliver), undertaking data analytics, following up with actions that have real impact on policies and programs, and, especially, engaging citizens in every step of the open data initiative.


I pointed out how future technology trends will improve every city’s use of its data in three ways:

1. Data collection, integration and quality

2. Visualization, anywhere it is needed

3. Analytics of the data to improve public policies and programs

For example, social data (like sentiment analysis) and the Internet of Things can be combined with data already collected by the government to paint a much richer picture of what is going on in a city.  In addition to drones, iBeacon, and visual analyzers (like Placemeter), there are now also inexpensive, often open source, sensor devices that the public can purchase and use for more data collection.
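As a small illustration of that kind of combination, here is a minimal sketch that merges hypothetical citizen-collected air-quality readings with a city’s own 311 service requests.  The file names and column layout are assumptions made only for this example, not data from any real city.

```python
import pandas as pd

# Hypothetical inputs: readings from citizen-owned air-quality sensors and the
# city's own 311 service-request extract (file names and columns are assumptions).
sensors = pd.read_csv("citizen_air_quality.csv",    # columns: neighborhood, timestamp, pm25
                      parse_dates=["timestamp"])
requests = pd.read_csv("city_311_requests.csv",     # columns: neighborhood, timestamp, complaint_type
                       parse_dates=["timestamp"])

# Aggregate both sources to the same grain (neighborhood x day) so they can be merged.
sensors["day"] = sensors["timestamp"].dt.date
requests["day"] = requests["timestamp"].dt.date

daily_pm25 = sensors.groupby(["neighborhood", "day"])["pm25"].mean().rename("avg_pm25")
daily_complaints = (requests[requests["complaint_type"] == "air quality"]
                    .groupby(["neighborhood", "day"]).size().rename("complaints"))

# A combined view: citizen-collected measurements alongside official records.
combined = pd.concat([daily_pm25, daily_complaints], axis=1).fillna(0).reset_index()
print(combined.head())
```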

Of course, all this data needs a different kind of management than businesses have used in the past.  So I pointed out NoSQL database management systems and Dat for real-time data flow.  Some of the most interesting analytics is based on the merger of data from multiple sources, which poses additional difficulties that are beginning to be overcome through linked data and the new geospatial extension of the semantic web, GeoSPARQL.
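For the multi-source, geospatial side of this, the sketch below shows what a GeoSPARQL query might look like when issued from Python with the SPARQLWrapper library.  The endpoint URL and the ex: vocabulary are hypothetical, invented only to illustrate the sfWithin spatial filter.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint: a city triple store that publishes linked open data
# with GeoSPARQL support (the URL and the ex: vocabulary are assumptions).
endpoint = SPARQLWrapper("https://data.example-city.gov/sparql")

# Find reported potholes located within a named district polygon,
# using the GeoSPARQL sfWithin spatial relation.
endpoint.setQuery("""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex:   <http://example-city.gov/ns#>

SELECT ?pothole ?wkt WHERE {
  ?pothole a ex:PotholeReport ;
           geo:hasGeometry/geo:asWKT ?wkt .
  ex:CentrumDistrict geo:hasGeometry/geo:asWKT ?districtWkt .
  FILTER(geof:sfWithin(?wkt, ?districtWkt))
}
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["pothole"]["value"], row["wkt"]["value"])
```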

If this data – and the results of its analysis – are to be useful, especially in real time, then data visualization needs to be everywhere.   That includes using augmented reality and even projecting results on surfaces, much like TransitScreen does.

And if all this data is to be useful, it must be analyzed, so I discussed the key role of predictive analytics in going beyond merely releasing data.  But I emphasized the way that residents of a city can help in this task and cited the many people already involved in Zooniverse.  There are even tools to help people overcome their statistical immaturity, as you can see on Public Health Ontario’s site.

Finally, the data can also be used by people to help envision – or re-envision – their cities through tools like Betaville.

Public officials have to go beyond merely congratulating themselves on being transparent by releasing data.  They need to take advantage of these technological developments and shift their focus to making the data useful – all with the overriding goal of improving the quality of life for their residents.


© 2015 Norman Jacknis

[http://njacknis.tumblr.com/post/117084058588/is-open-data-good-enough]

Big Data, Big Egos?

By now, lots of people have heard about Big Data, but the message often comes across as just another corporate marketing phrase, and one with multiple meanings.  That may be because people also hear from corporate executives who eagerly anticipate big new revenues from the Big Data world.

However, I suspect that most people don’t know what Big Data experts are talking about, what they’re doing, what they believe about the world, and the issues arising from their work.

Although it was originally published in 2013, the book “Big Data: A Revolution That Will Transform How We Live, Work, and Think” by Viktor Mayer-Schönberger and Kenneth Cukier is perhaps the best recent in-depth description of the world of Big Data.

For people like me, with an insatiable curiosity and good analytical skills, having access to lots of data is a treat.  So I’m very sympathetic to the movement.  But as with all such movements, the early advocates can get carried away with their enthusiasm.  After all, all that data makes you feel so powerful, as I recall from some bad sci-fi movies.

Here then is a summary of some key elements of Big Data thinking – and some limits to that thinking.

Causation and Correlation

When presented with the result of some analysis, we’ve often been reminded that “correlation is not causation”, implying we know less than we think if all we have is a correlation.

For many Big Data gurus, correlation is better than causation – or at least finding correlations is quicker and easier than testing a causal model, so it’s not worth putting the effort into building that model of the world.  They say that causal models may be an outmoded idea or, as Mayer-Schönberger and Cukier say, “God is dead”.  They add that “Knowing what, rather than why, is good enough” – good enough, at least, to try to predict things.

This isn’t the place for a graduate school seminar on the philosophy of science, but there are strong arguments that models are still needed whether we live in a world of big data or not.

All The Data, Not Just Samples

Much of traditional statistics dealt with the issue of how to draw conclusions about the whole world when you could only afford to take a sample.  Big Data experts say that traditional statistics’ focus on sampling is a reflection of an outmoded era of limited data.

Indeed, one example is a 1975 textbook titled “Data Reduction: Analysing and Interpreting Statistical Data”.  While Big Data provides lots more opportunity for analysis, it doesn’t overcome all the weaknesses that have been associated with statistical analysis and sampling.  There can still be measurement error.  Big Data advocates say the sheer volume of data reduces the necessity of being careful about measurement error, but can’t there still be systematic error?
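A quick simulation makes the point; the numbers below are invented purely for illustration and simply show that averaging over more and more observations removes random noise but leaves a systematic bias untouched.

```python
import random

random.seed(42)

TRUE_MEAN = 50.0   # the quantity we are actually trying to measure
BIAS = 2.0         # a systematic error, e.g. every sensor reads two units high

def biased_reading():
    # Random noise averages away as n grows; the bias does not.
    return TRUE_MEAN + BIAS + random.gauss(0, 10)

for n in (100, 10_000, 1_000_000):
    estimate = sum(biased_reading() for _ in range(n)) / n
    print(f"n={n:>9,}  estimate={estimate:6.2f}  error={estimate - TRUE_MEAN:+.2f}")

# The estimate converges, but to TRUE_MEAN + BIAS rather than TRUE_MEAN:
# more data cures sampling error, not systematic measurement error.
```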

Big Data gurus say that they include all the data, not just a sample.  But, in a way, that’s clearly an overstatement.  For example, you can gather all the internal records a company has about the behavior and breakdowns of even millions of devices it is trying to keep track of.  But, in fact, you may not have collected all the relevant data.  It may also be a mistake to assume that what is observed about even all people today will necessarily be the case in the future – since even the biggest data set today isn’t using tomorrow’s data.

More Perfect Predictions

The Big Data proposition is that massive volumes of data allow for almost perfect predictions and fine-grained analysis, and can almost automatically provide new insights.  While these fine-grained predictions may indicate connections between variables/factors that we hadn’t thought of, some of those connections may be spurious.  This is an extension of the issue of correlation versus causation, because there is likely to be an increase in spurious correlations as the size of the data set increases.
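A short sketch shows how easily that happens; the variables below are pure noise with no real relationships at all, and the dataset dimensions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_vars = 200, 500              # 200 observations of 500 mutually unrelated variables
data = rng.standard_normal((n_vars, n_rows))

# Correlate every variable with every other; the true correlation is zero for all pairs.
r = np.corrcoef(data)
upper = np.triu_indices(n_vars, k=1)   # each pair counted once
strong_by_chance = int((np.abs(r[upper]) > 0.2).sum())

print(f"pairs checked: {len(upper[0]):,}")
print(f"'strong' correlations (|r| > 0.2) found purely by chance: {strong_by_chance}")
```

The wider the table gets, the more of these accidental patterns appear, which is exactly why fine-grained “discoveries” still need some causal scrutiny.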

If Netflix recommends movies you don’t like, this isn’t a big problem.  You just ignore them.  In the public sector, when this approach to predicting behavior leads to something like racial profiling, it raises legal issues.

It has actually been hard to find models that achieve even close to perfect predictions – even in the well-known stories about how Farecast predicted the best time to buy air travel tickets or how Google searches predicted flu outbreaks.  For a more general review of these imperfections, read Kaiser Fung’s “Why Websites Still Can’t Predict Exactly What You Want”, published in Harvard Business Review last year.

Giving It All Away

Much of the Big Data movement depends upon the use of data from millions – billions? – of people who are making it available unknowingly, unintentionally or at least without much consideration.

Slowly but surely, though, there is a developing public policy issue around who has rights to that data and who owns it.  This past November’s Harvard Business Review – hardly a radical fringe journal – had an article that noted the problems if companies continue to assume that they own the information about consumers’ lives.  In that article, MIT Professor Alex Pentland proposes a “New Deal on Data”.

So Where Does This Leave Us?

Are we much better off and learning much more with the availability of Big Data, instead of samples of data, and the related ability of inexpensive computers and software to handle this data?  Absolutely, yes!

But, as some of the big egos of Big Data claim, is Big Data now so nearly perfect that we can withhold skepticism about its results?  Has Big Data become the omniscient god?  Not quite yet.


© 2015 Norman Jacknis

[http://njacknis.tumblr.com/post/110070952204/big-data-big-egos]

Can Prediction Markets Test Public Policy?

Prediction markets are hot again.  With the Presidential election next week and the relative accuracy of prediction markets in the 2008 election, I decided to finish this article, which I first started in March 2009 but put aside for various reasons.

What is a prediction market?  James Surowiecki, the author of the book “The Wisdom of Crowds”, explains:

The premise is that under the right circumstances, the collective judgment of a large group of people will generally provide a better picture of what the future might look like than anything one expert or even a small group of experts will come up with. [Prediction markets] work much like a futures market, in which the price of a contract reflects the collective day-to-day judgment either on a straight number – for instance, what level sales will reach over a certain period – or a probability – for example, the likelihood, measured as a percentage, that a product will hit a certain milestone by a certain date.

[F]or a crowd to be smart, it needs to satisfy certain criteria. It needs to be diverse, so that people are bringing different pieces of information to the table. It needs to be decentralized, so that no one at the top is dictating the crowd’s answer. It needs to summarize people’s opinions into one collective verdict. And the people in the crowd need to be independent, so that they pay attention mostly to their own information and don’t worry about what everyone around them thinks.

These markets have been around for a few years and have a good, if not perfect, track record.  Various studies have found prediction markets to be better than experts or public opinion polling.

But my focus is not on predicting who wins the White House or the Super Bowl, or the number of coins in a large bottle.  Rather, there is a use of prediction markets for government leaders: testing the likelihood that a particular public policy will achieve success, especially if the policy is intended to change the behavior of people.

Often it is difficult to assess how the public will react to a proposed policy: will people actually sign up for program X? Will people recycle?  I’m suggesting that prediction markets be used to estimate that reaction ahead of time.
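To make the mechanics concrete, here is a minimal sketch of how such a policy market could be priced, using Hanson’s logarithmic market scoring rule as the market maker.  The yes/no contract (“program X reaches its enrollment target”), the liquidity parameter, and the trades are illustrative assumptions, not anything proposed in the original article.

```python
import math

class LMSRMarket:
    """Minimal logarithmic market scoring rule market maker for one yes/no contract,
    e.g. 'program X reaches its enrollment target by year-end'. The liquidity
    parameter b controls how quickly prices move as people trade."""

    def __init__(self, b=100.0):
        self.b = b
        self.q = {"yes": 0.0, "no": 0.0}   # shares sold of each outcome

    def _cost(self, q):
        return self.b * math.log(sum(math.exp(v / self.b) for v in q.values()))

    def price(self, outcome):
        # Current price of an outcome, interpretable as the crowd's probability estimate.
        denom = sum(math.exp(v / self.b) for v in self.q.values())
        return math.exp(self.q[outcome] / self.b) / denom

    def buy(self, outcome, shares):
        # Cost a trader pays for `shares` of `outcome`; prices update automatically.
        before = self._cost(self.q)
        self.q[outcome] += shares
        return self._cost(self.q) - before

market = LMSRMarket(b=100.0)
print(f"initial P(yes) = {market.price('yes'):.2f}")          # 0.50 before any trades

# Residents who think the program will succeed buy 'yes'; skeptics buy 'no'.
market.buy("yes", 60)
market.buy("no", 25)
market.buy("yes", 40)

print(f"after trading, P(yes) = {market.price('yes'):.2f}")   # the crowd's implied probability
```

The final price is read as the crowd’s implied probability that the policy will meet its goal; that price series, rather than any single trade, is what a government would watch.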

Implicit in the diversity of views that Surowiecki notes is that enough people need to care about the policy.  The reason they care may be to win money, in some cases, but that’s not the only reason.  They might care because the market deals with something that affects their lives.

And the nice thing about this is that if only a few people care about a policy, that also tells you something about the policy – or, at least, about whether you’ll get into deep trouble proposing it.

You can learn from prediction markets and they may be more effective predictors of policy success than traditional tools.

So far as I can tell, this has not really been tried yet in the way I’m describing.  The closest was in 2003, when DARPA created a Policy Analysis Market to be “a market in the future of the Middle East”.  That market looked, to critics, like a betting parlor on political assassinations by terrorists and was withdrawn.

In a more futuristic vision, Robin D. Hanson, an economist at George Mason University and a fan of alternative institutions, writes about futarchy, “a form of government enhanced by prediction markets. Voters would decide broad goals of national welfare, but betting in speculative markets would determine the policy steps to achieve those goals.”

I’m not proposing anything so revolutionary as he is.  Instead, let’s try to use prediction markets on a more experimental basis.

More background on prediction markets can be found in:

  • Wikipedia

http://en.wikipedia.org/wiki/Prediction_market and

http://en.wikipedia.org/wiki/Policy_Analysis_Market

  • When the Crowd Isn’t Wise

http://www.nytimes.com/2012/07/08/sunday-review/when-the-crowd-isnt-wise.html

  • Interpreting the Predictions of Prediction Markets

http://scholar.google.com/scholar_url?hl=en&q=http://citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.159.7574%26rep%3Drep1%26type%3Dpdf&sa=X&scisig=AAGBfm3lg0YqMPEHYfN6LogKNPFcL1GZww&oi=scholarr

© 2012 Norman Jacknis

[http://njacknis.tumblr.com/post/34709617211/can-prediction-markets-test-public-policy]