[Note: I’m President of the board of the Metropolitan New York Library Council, but this post is only my own view.]
A few weeks ago, I wrote about the second chance given to libraries, as Google’s role in the life of web users slowly diminishes. Of course, for at least a few years, one of the responses of librarians to the growth of the digital world has been to re-envision libraries as curators of knowledge, instead of mere collectors of documents. It’s not a bad start in a transition.
Indeed, this idea has also been picked up by all kinds of online sites, not just libraries. Everyone, it seems, wants to aggregate just the right mix of articles from other sources that might interest you.
But, from my perspective, curation is an inadequate solution to the bigger problem this digital knowledge century has created – we don’t have time to read everything. Filtering out the many things I might not want to read at all doesn’t help me much. I still end up having too much to read.
And we end up in the situation summed up succinctly by the acronym TL;DR, too long, didn’t read. (Or my version in response to getting millions of Google hits – TMI, TLK “too much information, too little knowledge”.)
As NIST has framed the problem:
“How do we find topically relevant, semantically related, timely information in massive amounts of data in diverse languages, formats, and genres? Given the incredible amounts of information available today, merely reducing the size of the haystack is not enough; information professionals … require timely, focused answers to complex questions.”
Like NIST, what I really want – maybe what you want or need too? – is someone to summarize everything out there and create a new body of work that tells me just what I need to know in as few words as possible.
Researchers call this abstractive summarization, and it is not an easy problem to solve. But there has been some interesting work going on in various universities and research labs.
At Columbia University, Professor Kathleen McKeown and her research colleagues developed “NewsBlaster” several years ago to organize and summarize the day’s news.
Among other companies, Automated Insights has developed some practical solutions to the overall problem. Their Wordsmith software has been used, for example, by the Associated Press “to transform raw earnings data into thousands of publishable stories, covering hundreds more quarterly earnings stories than previous manual efforts”.
For all their clients, they claim to produce “over 1.5 billion narratives annually”. And these are so well done that the New York Times ran an article about them titled “If An Algorithm Wrote This, How Would You Even Know?”.
The next step, of course, is to combine many different data sources and generate articles about them for each person interested in that combination of sources.
Just a few months ago, Salesforce’s research team announced a major advance in summarization. Their motivation, by the way, is the same as mine:
“In 2017, the average person is expected to spend 12 hours and 7 minutes every day consuming some form of media and that number will only go up from here… Today the struggle is not getting access to information, it is keeping up with the influx of news, social media, email, and texts. But what if you could get accurate and succinct summaries of the key points…?”
Maluuba, acquired by Microsoft, has been continuing earlier research too. As they describe their research on “information-seeking behaviour”:
“The research at Maluuba is tackling major milestones to create AI agents that can efficiently and autonomously understand the world, look for information, and communicate their findings to humans.”
Librarians have skills that can contribute to the development of this branch of artificial intelligence. Those skills are necessary but not sufficient, so a joint effort between AI researchers and the library world is required.
However, if librarians joined in this adventure, they could also offer the means of delivering this focused knowledge to the public in a more useful way than just dumping it into the Internet.
As I blogged a few months ago:
Librarians have many skills to add to the task of “organizing the world’s information, and making it universally accessible”. But as non-profit organizations interested in the public good, libraries can also ensure that the next generation of knowledge tools – surpassing Google search – is developed for non-commercial purposes.
So, what comes after everyone has tried curation? Abstractive summarization aided by artificial intelligence software, that’s what!
© 2017 Norman Jacknis, All Rights Reserved