Connectors are the heart and soul of Federated Search (FS)
engines, and with the rise in importance of FS in today’s fast-paced, Big Data,
analyze-everything world, they are crucial to smooth and efficient data
virtualization and flow. MuseGlobal
has been building Connectors, the architecture to use them (the Muse/ICE
platform), and the means to maintain and support them (the Muse Source Factory) for over 12
years. The people who design and build Connectors must be computer savvy
and must also have a deep understanding of data and information and its myriad
formulations.
This second post in the series looks at the problems
arising as data is needed from outside the enterprise, and the complexities of
access and extraction that result. Not surprisingly, as a leading FS platform,
Muse and its ecosystem are in the forefront of providing solutions to data
complexity problems in the modern world. (The first post considers the growing
importance of being able to access data from inside an organization.)
Part 2: A broader perspective
All of this speaks to the volume and velocity (of change)
of the data – two of the trio of defining “v”s of Big Data. The third v is
variety and this is now encompassing much more than the internal data silos of
the enterprise. Increasingly decisions need to take account of the outside
world: competitors, news media, commentators and analysts, customer feedback,
social postings and tweets.
Most of these sources are also fleeting. Customer records
will last for years; a tweet is gone in 9 days. Even product reviews are only
relevant until the next version of the product is released. And there are
a couple more hurdles to jump to get this valuable “perspective”
data.
This data lives outside the enterprise. Some other person
or organization has control of it. And that means the old ETL trick of grabbing
everything is likely to be severely frowned on – especially if it is tried
every night. Commercial considerations mean that, if this data is valuable to
you, then it is valuable to others, and the owners will not let you have it all
for free. This means the best strategy is to ask for exactly what is needed.
It takes less time everywhere, costs less in processing and
transmission, costs less in data license fees, and does not alienate
valuable data sources. In short, “sipping gently” is the way to go.
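As a rough illustration of “sipping gently”, a request to an external source can name only the terms, fields, and time window the analysis needs, and cap the number of records returned. The sketch below is not Muse's API: the `SourceQuery` class and the parameter names `q`, `fields`, `limit`, and `since` are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SourceQuery:
    """A narrow request to an external source (hypothetical shape)."""
    terms: List[str]             # what we are looking for
    fields: List[str]            # only the fields the analysis needs
    since: Optional[str] = None  # ISO date; ignore older records
    max_records: int = 50        # hard cap so we never ask for "everything"

    def to_params(self) -> Dict[str, str]:
        # Parameter names here are illustrative; each real source has its own.
        params = {
            "q": " AND ".join(self.terms),
            "fields": ",".join(self.fields),
            "limit": str(self.max_records),
        }
        if self.since is not None:
            params["since"] = self.since
        return params


query = SourceQuery(terms=["acme", "widget"], fields=["title", "date"],
                    since="2024-01-01", max_records=20)
print(query.to_params())
```

The point of the cap and the field list is that the source only ever ships what was asked for, which is what keeps processing, transmission, and license costs down.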
Yes, in the paragraph above you saw “fees” mentioned.
Once the commercial details have been sorted out, there is still the tricky
technical matter of getting access through the paywall to the data you need,
and are entitled to. Some services will provide some of the data you want for
free, but most will require authenticated access even if there is no charge.
Those who are selling their data will certainly want to know that you are a
legitimate user, and be sure you are getting what you have paid for – and no
more.
For both of these considerations, Federated Search
engines, especially in their harvesting mode, allow all the “virtual data” to
become yours when you need it. Access control is one of the mainstays of the
better FS systems, to ensure just this fair use of data. And gentle sipping for
just the required data is their whole purpose. Again, a tool for the task
arises. MuseGlobal runs a Content Partner Program to ensure we deal fairly and
accurately with the data we retrieve from the thousands of sources we can
connect to, both technically and as a matter of respecting the contractual
relationship between the provider and consumer. We are the Switzerland of data
access – totally neutral and scrupulously fair, and secure.
Complexity everywhere
So now you are accessing internal and external data for
your BI reports. Unfortunately, while you might have a nice clean Master Data Managed
situation in your company, it is not the one the external data sources are
using (not unless you are Walmart or GM and can impose your will on your
suppliers, that is). And this means the analysis will be pretty bad unless you
can get internal product codes to match to popular names in posts and tweets.
There is a world of semantic hurt lurking here.
You need tools. Fortunately, the Federated Search engine
you are now employing to gather your virtual data is able to help. Data
re-formatting, field level semantics, content level semantics, controlled
ontologies, normalized forms, content merging and de-merging, enumeration,
duplicate control: all these are tools within the FS system. They are powerful
tools and they are very precise, and they come with a health warning: “This
Connector is for use with this source only”.
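To make one of these tools concrete, here is a minimal sketch of duplicate control: records gathered from several sources are reduced to one per normalized key. The choice of `title` and `author` as key fields, and the lower-casing and whitespace folding, are assumptions for illustration; a real FS system would use matching rules tuned to its sources.

```python
def deduplicate(records, key_fields=("title", "author")):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        # Normalize: lower-case and collapse runs of whitespace.
        key = tuple(
            " ".join(str(record.get(name, "")).lower().split())
            for name in key_fields
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


merged = deduplicate([
    {"title": "Big  Data", "author": "Smith", "source": "A"},
    {"title": "big data", "author": "SMITH", "source": "B"},
    {"title": "Other", "author": "Jones", "source": "A"},
])
print([r["source"] for r in merged])
```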
Connectors are built, and maintained, very specifically
for a single Target. They know all about that target, from its communications
protocol to the abbreviations it uses in the data. Thus they produce the
deepest possible data extraction, and can deliver that data in a
consistent format suited to the Data Model and systems which are going to use
it. They are data transformers extraordinaire. This contrasts with crawlers at
the other end of the scale where the aim is to get a simple sufficiency of data
to handle keyword indexing.
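A heavily simplified sketch of that source-specific knowledge: one Connector's mapping from its target's field names and abbreviations to a common record shape. The field names (`ttl`, `au`, `dt`, `lang`) and the abbreviation table are invented for this example and do not describe any real source or the Muse Data Model.

```python
# Hypothetical knowledge about ONE target: its field names and abbreviations.
FIELD_MAP = {"ttl": "title", "au": "author", "dt": "date", "lang": "language"}
ABBREVIATIONS = {"language": {"eng": "English", "fre": "French"}}


def normalize_record(raw: dict) -> dict:
    """Translate one source record into the common record shape."""
    record = {}
    for source_field, value in raw.items():
        target_field = FIELD_MAP.get(source_field)
        if target_field is None:
            continue  # field this Connector does not know about; skip it
        if isinstance(value, str):
            value = value.strip()
            # Expand source-specific abbreviations where a table exists.
            value = ABBREVIATIONS.get(target_field, {}).get(value, value)
        record[target_field] = value
    return record


print(normalize_record({"ttl": " Big Data ", "lang": "eng", "x": 1}))
```

Because both tables describe a single target, the same code pointed at a different source would quietly produce wrong or empty records, which is exactly why the health warning applies.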
This precision means that they are in need of “tuning”
whenever their target changes in some way. Major changes like access protocols
are rare, but a website changing the layout of its reviews is common and
frequent. Complexity like this is handled by a “tools infrastructure” for the
FS engine whereby testing, modification, testing again, and deployment are
highly automated actions, reducing the human input to the problem solving, not
the rote.
And now another wrinkle: some of the data needed for the
analysis is not contained in the records you retrieve, and the only way to
determine this is to examine those records and then go and get it. As a simple
example think of a tweet which references a blog post. The tweet has the link,
but not the content of the post. For a meaningful analysis, you need that
original post. Fortunately the better FS systems have a feature called
enhancement which allows for just this possibility. It allows the system to
build completely virtual records from the content of others. For a deeper
example, think of a hospital patient record. This will have administrative details, but no
financial data, no medical history notes, no results of blood tests, no scans,
no operation reports, no list of past and current drugs. And even if you gather
all this, the list of drugs will not include their interactions, so there could
be more digging to do. A properly configured and authenticated FS system will
deliver this complete record.
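The tweet-and-blog-post case can be sketched in a few lines. This is only an illustration of the idea of enhancement, not how Muse implements it: the record layout (`text`, `linked_content`) is assumed, and the `fetch` function is passed in by the caller so that access control and the actual retrieval stay outside the sketch.

```python
import re
from typing import Callable

URL_PATTERN = re.compile(r"https?://\S+")


def enhance(record: dict, fetch: Callable[[str], str]) -> dict:
    """Return a copy of the record with the content of linked pages attached."""
    enhanced = dict(record)  # leave the original record untouched
    linked = []
    for url in URL_PATTERN.findall(record.get("text", "")):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        linked.append({"url": url, "content": fetch(url)})
    enhanced["linked_content"] = linked
    return enhanced


# Stand-in fetcher for the example; a real one would go through the
# authenticated Connector for the linked source.
pages = {"https://example.com/post": "Full text of the blog post"}
tweet = {"text": "Great read: https://example.com/post."}
print(enhance(tweet, lambda url: pages[url]))
```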
Analysis these days is more than just a list of what
people said about your product. It involves demographics and sentiment, and timeliness
and location. All these can come from a good analysis engine – if it has the
raw data to work from. Enhanced virtual records from a wide spectrum of sources
will give a lot, but making the connections may not be that simple. We
mentioned above “official” and popular product names and the need to reconcile
them. Think for a moment of drug names. Fortunately a good FS system can do a
lot of this thinking for you, and your analytics engine. Extraction of entities
by mining the unstructured text of reviews, posts, news articles, and
scientific literature allows them to be tagged so that the analysis recognizes
the sameness of them. Good FS engines will allow this to a degree. Better ones
will also allow a specialist text miner to be incorporated in the
workflow to give each record its special treatment – all invisibly to the BI
system asking for the data.
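A toy version of that reconciliation, using drug names: a lookup table maps generic and brand names to one internal code, and each piece of text is tagged with the codes it mentions. The codes (`DRUG-0001`, `DRUG-0002`) and the table itself are hypothetical, and real text mining goes well beyond dictionary lookup; this only shows why tagging lets the analysis treat different names as the same thing.

```python
import re

# Hypothetical alias table: popular and official names -> internal code.
ALIASES = {
    "acetaminophen": "DRUG-0001",
    "paracetamol": "DRUG-0001",
    "tylenol": "DRUG-0001",
    "ibuprofen": "DRUG-0002",
    "advil": "DRUG-0002",
}

# Longest names first so a longer alias wins over a shorter one it contains.
_names = sorted(ALIASES, key=len, reverse=True)
ALIAS_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _names) + r")\b",
    re.IGNORECASE,
)


def tag_entities(text: str) -> list:
    """Return the internal codes mentioned in the text, in order, no repeats."""
    codes = []
    for match in ALIAS_PATTERN.finditer(text):
        code = ALIASES[match.group(1).lower()]
        if code not in codes:
            codes.append(code)
    return codes


print(tag_entities("Took Tylenol, then paracetamol and Advil"))
```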
Partnership at last
There is a lot of data out there, and a great deal of it
is probably very useful to you and your company. Using the correct analysis
engines and Federated Search “feeding” tools enables that data to be brought
together in a flexible, efficient, and accurate manner to give the information
needed for informed decisions.
Federated Search is still a very powerful and effective
way for humans to search, but it has grown up to be one of the most effective
tools for systems integration, the breaking down of corporate silos of data,
and the incorporation of data from the whole Internet into a unified, usable
data set to create real knowledge.
Muse is one of those tools which can supply the complete
range, from end-user federated search portals to embedded data virtualization, and we
intend to keep up with the next turn of data events.