Too Many Unhelpful Search Results

This is a brief follow up to my last post about how librarians and artificial intelligence experts can
get us all beyond mere curation and our frustrations using web search

.

In their day-to-day Google searches many people end up frustrated. But they assume that the problem is their own lack of expertise in framing the search request.

In these days of advancing natural language algorithms that isn’t a very good explanation for users or a good excuse for Google.

We all have our own favorite examples, but here’s mine because it directly speaks to lost opportunities to use the Internet as a tool of economic development.

Imagine an Internet marketing expert who has an appointment with a local chemical engineering firm to make a pitch for her services and help them grow their business. Wanting to be prepared, she goes to Google with a simple search request: “marketing for chemical engineering firms”. Pretty simple, right?

Here’s what she’ll get:

She’s unlikely to live long enough to read all 43,100,000+ hits, never mind reading them before her meeting. And, aside from an ad on the right from a possible competitor, there’s not much in the list of non-advertising links that will help her understand the marketing issues facing a potential client.

This is not how the sum of all human knowledge – i.e., the Internet – is supposed to work. But it’s all too common.

This is the reason why, in a knowledge economy, I place such a great emphasis on deep organization, accessibility and relevance of information.

© 2017 Norman Jacknis, All Rights Reserved

What Comes After Curation?

[Note: I’m President of the board of the Metropolitan New York Library Council, but this post is only my own view.]

A few weeks ago, I wrote about the second chance given to libraries, as Google’s role in the life of web users slowly diminishes. Of course, for at least a few years, one of the responses of librarians to the growth of the digital world has been to re-envision libraries as curators of knowledge, instead of mere collectors of documents. It’s not a bad start in a transition.

Indeed, this idea has also been picked up by all kinds of online sites, not just libraries. Everyone it seems wants to aggregate just the right mix of articles from other sources that might interest you.

But, from my perspective, curation is an inadequate solution to the bigger problem this digital knowledge century has created – we don’t have time to read everything. Filtering out the many things I might not want to read at all doesn’t help me much. I still end up having too much to read.

And we end up in the situation summed up succinctly by the acronym TL;DR, too long, didn’t read. (Or my version in response to getting millions of Google hits – TMI, TLK “too much information, too little knowledge”.)

The AQAINT project (2010) of the US government’s National Institute of Standards and Technology (NIST) stated this problem very well:

“How do we find topically relevant, semantically related, timely information in massive amounts of data in diverse languages, formats, and genres? Given the incredible amounts of information available today, merely reducing the size of the haystack is not enough; information professionals … require timely, focused answers to complex questions.”

Like NIST, what I really want – maybe what you want or need too? – is someone to summarize everything out there and create a new body of work that tells me just what I need to know in as few words as possible.

Researchers call this abstractive summarization and this is not an easy problem to solve. But there has been some interesting work going on in various universities and research labs.

At Columbia University, Professor Kathleen McKeown and her research colleagues developed “NewsBlaster” several years ago to organize and summarize the day’s news.

Among other companies, Automated Insights has developed some practical solutions to the overall problem. Their Wordsmith software has been used, for example, by the Associated Press “to transform raw earnings data into thousands of publishable stories, covering hundreds more quarterly earnings stories than previous manual efforts”.

For all their clients, they claim to produce “over 1.5 billion narratives annually”. And these are so well done that the New York Times had an article about it that was titled “If An Algorithm Wrote This, How Would You Even Know?”.

The next step, of course, is to combine many different data sources and generate articles about them for each person interested in that combination of sources.

Just a few months ago, Salesforce’s research team announced a major advance in summarization. Their motivation, by the way, is the same as mine:

“In 2017, the average person is expected to spend 12 hours and 7 minutes every day consuming some form of media and that number will only go up from here… Today the struggle is not getting access to information, it is keeping up with the influx of news, social media, email, and texts. But what if you could get accurate and succinct summaries of the key points…?”

Maluuba, acquired by Microsoft, has been continuing earlier research too. As they describe their research on “information-seeking behaviour”:

“The research at Maluuba is tackling major milestones to create AI agents that can efficiently and autonomously understand the world, look for information, and communicate their findings to humans.”

Librarians have skills that can contribute to the development of this branch of artificial intelligence. While those skills are necessary, they aren’t sufficient and a joint effort between AI researchers and the library world is required.

However, if librarians joined in this adventure, they could also offer the means of delivering this focused knowledge to the public in a more useful way than just dumping it into the Internet.

As I’ve blogged a few months ago:

Librarians have many skills to add to the task of “organizing the world’s information, and making it universally accessible”. But as non-profit organizations interested in the public good, libraries can also ensure that the next generation of knowledge tools – surpassing Google search – is developed for non-commercial purposes.

So, what comes after everyone has tried curation? Abstractive summarization aided by artificial intelligence software, that’s what!

© 2017 Norman Jacknis, All Rights Reserved