
Friday, November 4, 2011

Will Hybrid Search get you better mileage?

The recent news of the acquisition of Endeca by Oracle has triggered a number of research notes by analysts. In particular Sue Feldman of IDC talked of the rise of a Hybrid Search architecture. What is it? Is it good for you? Should you have one? And where does MuseGlobal stand?

Sue defined Hybrid Search: “Search vendors perceived this logical progression in information access a number of years ago, and several were at the forefront of creating new, hybrid architectures to enable access to both structured and unstructured information from a single access point.”

She went on to point out that the new hybrid architecture was more comprehensive: “The new hybrid architectures incorporate the speed and immediacy of search with the analysis and reporting features of BI.” She also offered a justification for it: “the enterprise of the future will be information centered, and will require an agile, adaptable infrastructure to monitor and mine information as it flows into the company.”

Nick Patience and Brenon Daly of 451 Research went on to define the hybrid architecture’s capabilities in a bit more detail for Endeca’s version: “Endeca’s underlying technology is called MDEX, which is a hybrid search and analytics database used for the exploration, search and analysis of data. The MDEX engine is designed to handle unstructured data – e.g., text, semi-structured content involving data sources that have metadata such as XML files, and structured data – in a database or application.”

These definitions acknowledge the growing importance of information from everywhere, in unstructured as well as structured form, and the need to be able to access and analyze it in the modern enterprise. Information can, and does, come from anywhere – internal CRM systems, independent blogs and forums, social media, competitor websites, news services, and even raw data repositories. And it comes in the form of database records, blogs, emails, tweets, images and more. In the modern enterprise the need is to be able to analyze and use all this information immediately and easily.

Mining information from these disparate sources is not something that business analysts or product managers should be spending their time on. They need a reliable supply of information where the semantics can be trusted, the information is up-to-date, and the analyses can be set up easily. This is where the “plumbing” comes in. Two stages are involved: gathering the information and analyzing it. The user can then take action on the intelligence provided. Companies like MuseGlobal take care of the first stage, and repository and BI companies take care of the second.

Some companies, like Endeca, take care of both stages, but then you are locked into both products from a single vendor, and it is unusual for both to be “best of breed”. So MuseGlobal concentrates on what it does best – gathering, normalizing, mining and performing simple analytics on data – and seamlessly passes the information on to your choice of Data Warehouse, Repository, BI, analytics engine – whatever best suits the company’s needs.

What this means is that your organization sets up a Muse harvesting and/or Federated Search system once, pointing to the desired Sources of data, configures authentication where needed, and determines how the results are to be delivered to the analysis engine, specifying a choice of standards-based or proprietary protocols and formats. Adding new Sources (or removing unwanted ones) is a point-and-click operation, and the Muse Automatic Source Update mechanism (and our programmers and analysts) ensures the connections keep working even when the sources change their characteristics – or even their address! About as close to “set and forget” as you can get in this changing world.

On schedule, or when requested by users, the data pours out of Muse in a consistent standardized format, with normalized semantics and even added enrichments and extracted “entities” or “facets” (Endeca’s terminology) and heads to the next stage of the information stack. With this raw and analyzed data as input, the BI system (or whatever is in use) can deliver more comprehensive analyses, so staff can concentrate on the information they have in front of them, not on seeking it in bits and pieces from all over the place.
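To make the "consistent standardized format" idea concrete, here is a minimal sketch of normalizing records from two very different sources into one common shape before handing them on. It is an illustration only, not Muse's actual data model or API: the `NormalizedRecord` schema, the `from_crm` and `from_tweet` mappers, the `harvest` function and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NormalizedRecord:
    """Common shape handed to the analysis stage (hypothetical schema)."""
    source: str
    title: str
    body: str
    facets: Dict[str, str] = field(default_factory=dict)


def from_crm(row: dict) -> NormalizedRecord:
    # Structured input: the field names here are assumed for illustration.
    return NormalizedRecord(
        source="crm",
        title=row["account"],
        body=row["note"],
        facets={"region": row["region"]},
    )


def from_tweet(tweet: dict) -> NormalizedRecord:
    # Unstructured input: hashtags are pulled out as a simple facet.
    text = tweet["text"]
    tags = [w[1:] for w in text.split() if w.startswith("#") and len(w) > 1]
    return NormalizedRecord(
        source="twitter",
        title=tweet["user"],
        body=text,
        facets={"hashtags": ",".join(tags)},
    )


def harvest(
    batches: Dict[str, List[dict]],
    mappers: Dict[str, Callable[[dict], NormalizedRecord]],
) -> List[NormalizedRecord]:
    """Apply each source's mapper so every record leaves in the same shape."""
    records: List[NormalizedRecord] = []
    for source, items in batches.items():
        mapper = mappers[source]
        records.extend(mapper(item) for item in items)
    return records


if __name__ == "__main__":
    batches = {
        "crm": [{"account": "Acme", "note": "Renewal due", "region": "EMEA"}],
        "twitter": [{"user": "jo", "text": "Loving the new #widget from #Acme"}],
    }
    for record in harvest(batches, {"crm": from_crm, "twitter": from_tweet}):
        print(record)
```

The point of the design is that the downstream BI or analytics engine only ever sees one record shape, so adding a new source means adding one mapper rather than changing the analysis stage.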

And the information is not only from many sources, it is in varied formats. Forrester have just released a report which asks the question: “Have you noticed how search engine results pages are now filled with YouTube videos, images, and rich media links? Every day, the search experience is becoming more and more display-like, meaning marketers must align their search and display marketing strategies and tactics.” So the need to handle a complete range of media types and convoluted structures is becoming paramount, or the received data will be just the small amount of text left over from the rich feast of the retrieved results. This is a topic for another blog, but suffice it to say that Muse can deliver the videos as well as the text.

Tuesday, August 23, 2011

HP to acquire Autonomy

The news

Hewlett-Packard announced on August 18th an agreement to purchase Autonomy. Autonomy has moved beyond its original enterprise search capabilities by using its IDOL (Integrated Data Operating Layer) as the forward-looking platform to provide an information bus to integrate other activities. It now handles content management, analytics, and disparate connectors, as well as advanced searching, to give users access to data gathered from multiple sources and fitted to their needs. It has also moved aggressively to the cloud, and currently nearly 2/3 of its sales are for cloud services.

The impact on MuseGlobal

This is a major endorsement for MuseGlobal’s technology, with its functionality to break down the barriers between silos of information in the enterprise as well as elsewhere. In the words of the IDC analysts (ref below):

“…to a new IT infrastructure that integrates both unstructured and structured information. These newer technologies enable enterprises to forage for relationships in information that exist in separate silos…”

They call this integration a “tipping point” and see it as a new lease of life for HP in the data management and services area. Again according to IDC, it provides:

“A modular platform that can aggregate, normalize, index, search and query, analyze, visualize and deliver all types of information from legacy and current information sources will support a new kind of software application”

Although Autonomy will bring significant revenue and a large cloud footprint to HP, the major imagined benefit is seen in its ability to aggregate, normalize, analyze and distribute information across an enterprise. This is an area where MuseGlobal’s Muse system with its ICE “bus” provides a very similar set of functionality through its Connectors (6,000+ and growing), Data Model and semantically aware record conversion, and entity extraction analysis – though not content management or enterprise search. Muse is also very strong in record enrichment, so that virtual records can be provided both ad hoc and on a regular “harvested” basis to connected processing systems such as content management or enterprise search.

Various commentators suggest that this move may “encourage” the other big players who HP competes against to have a look at acquisitions of their own. OpenText is the most noted possibility, though Endeca and Vivisimo get a mention. MuseGlobal is certainly in the same functional ballpark providing functionality for enterprises, universities, libraries, public safety, and news media.

HP to Acquire Autonomy: Bold Move Supports Leo Apotheker's Shift to Software

Wednesday, March 16, 2011

Social is taking Search in a more Democratic direction...

Totally agree with this article published on

We definitely saw this trend over the last few quarters as "social communities" emerged and relevant content started to show up in these communities without a lot of effort by an individual member.

Our nRich product offers relevant content from trusted sources as well as its social rank. Gaming - like what JC Penney or Overstock tried - is not a big factor because other community members have already done the initial filtering.

Check out the demo here.

Monday, March 14, 2011

Why the Basis of the Universe Isn’t Matter or Energy—It’s Data

Enjoyed reading this interview in Wired magazine with noted science author James Gleick.

He quoted Claude Shannon's views on information:
A string of bits has a quantity, whether it represents something that’s true, something that’s utterly false, or something that’s just meaningless nonsense.
He also made a very succinct comment about how he (and perhaps more of us) should look at new technology:

When people say that the Internet is going to make us all geniuses, that was said about the telegraph. On the other hand, when they say the Internet is going to make us stupid, that also was said about the telegraph. I think we are always right to worry about damaging consequences of new technologies even as we are empowered by them. History suggests we should not panic nor be too sanguine about cool new gizmos. There’s a delicate balance.

Here's the link to the interview.

Monday, February 14, 2011

NLP based Search in the middle of Man vs Machine battle

IBM's Watson takes on Jeopardy champions tonight.

Interesting article by Bruce Upbin

Here's a brief video to give you the background.

Looking forward to it.

Tuesday, February 1, 2011

IDC - Desperately Seeking Differentiation

I got a chance to attend an IDC event last week where they shared their predictions for the Software market and discussed how companies can establish differentiation in 2011 and beyond.

Key themes:
  1. Cloud/Near-Cloud - Differentiation by delivery model. [Not surprising. This is a well-established trend now.]
  2. Mobile apps/Platform - Differentiation by Platform. [This is a significant trend and one whose impact will be felt for a number of years as a lot of applications will be re-built for the mobile platform. And not necessarily by the original owners. This will be very disruptive.]
  3. Social Business - Differentiation by business process. [This is a very interesting one as long as you are willing to open your mind to ideas from other people. Very disruptive internally, because you cannot control what/when/why conversations are happening - good or bad. So for paranoid people - this isn't good. But for open, progressive organizations, this can be very productive.]
  4. Analytics/Big Data - Differentiation by Content/Information. [This is a logical progression. With content overload a well established issue, context-sensitive information packaged appropriately will definitely be well received by employees - especially senior executives.]
We'll share our perspectives in coming weeks and in the meantime would love to hear your predictions for 2011-2014.

Saturday, January 29, 2011

Gartner publishes "Top 10 Technology Trends in Information Infrastructure in 2011"

Key Trends that are more germane to areas where we can help our customers:

- Social Search
- Content analytics
- Content Integration

You can download the entire report here.

2010 was very interesting. Companies started to realize that the "Voice of the Customer" can be heard in forums not controlled by the company. Can't control the format, can't control the timing, can't control the veracity - but - cannot ignore the content. If you want a leading indicator of customer sentiment about your products, brand or people - keep tabs on the social media forums - blogs, Facebook, Twitter, etc. The unstructured nature of this content has put a big question mark around the efficacy of existing IT strategy and investments in Master Data Management. These systems will need to evolve in 2011.

As the content continues to grow exponentially, context-aware applications will become more useful for customers. In addition, content analytics will provide better guidelines for content producers, aggregators and consumers.

Should make for an interesting 2011.