Name: DeepHarvester

Text: BrightPlanet






Harvest "unknown and hidden"
content from the Deep Web.
For over ten years, BrightPlanet has been the
pioneer in harvesting high quality unstructured
content from the Deep Web and then making
it accessible for those who need the valuable,
untapped resources that lie beneath the
Surface Web for deep research and analysis.
The company has more than six years of
experience working with the United States
Intelligence Community (IC) in its
'War on Terror' to target and access data
hidden beyond the reach of typical Web
search engines. Now, BrightPlanet is bringing
its patented Deep Web Harvesting
and Deep Analytics Solutions to the commercial market.

[Figure: Deep Web Content Silos. BrightPlanet harvests topic-specific content for Deep Web Silos for research, analysis, monitoring and tracking.]

Visit and learn more about:
Deep Web Silos - Find, harvest and normalize "unknown and hidden", unstructured
content from the Deep Web for research and analytics.
Deep Web Monitor - Target, Monitor, Track & get Alerts from targeted web sites, blogs,
tweets, reports, newsfeeds and more.
Dashboard - Employ 3rd party technology options from the OpenPlanet
Platform for enriched visualization, entity extraction, storage, etc., to create full scope
analytic solutions.
DeepHarvester Workbench - Integrate our standalone, lightweight DeepHarvester
Workbench behind your firewall as part of an enterprise solution for high security


License a Workbench for
Hands-on Control of your
Deep Web Harvesting.
BrightPlanet's DeepHarvester Workbench
provides the most comprehensive harvesting
and content normalization system on the
market today, at the scale of the internet.
BrightPlanet offers the DeepHarvester
Workbench for those who need to be in
control of their own harvesting needs
within their own infrastructure.
The Workbench can be used as a standalone
lightweight Web user interface or can be tightly
integrated into a custom or enterprise solution
through the OpenPlanet® Platform.



[Figure: Deep Web and Proprietary Silos. For added security, license the Workbench for use behind your firewall.]

BrightPlanet has developed a patented, heuristic, rule-based expert system for automatically communicating
with Deep Web sources that does not require one-off scripts to be built by hand. The DeepHarvester
Workbench is wrapped around BrightPlanet's flagship DeepHarvester Platform, which has been
developed and refined over the past 10 years. The DeepHarvester features include:

+ Harvesting from sites that require the use of a query
+ Integrating with internal Deep Web sources
+ Leveraging existing surface web search engines
+ Harvesting using traditional crawl or surface web techniques
+ Harvesting links through RSS feeds
+ Supporting standard crawler features:
    Optionally honoring robots.txt rules
    Customizable user-agent tags
    Timeout and redirect limit settings
    Support for session cookies
+ Scripting custom source option
+ Harvesting inline content options:
    images, CSS and JavaScript files
+ Supporting proxy servers:
    anonymization through 3rd party solutions
+ Providing a multi-thread harvest engine
    built on a distributed platform
+ Accessing the OpenPlanet® Platform:
    custom normalization, analytics and storage
    with BrightPlanet's Deep Web
+ Harvesting and profile management:
    Java, Web Services or RMI API
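
The standard crawler features in the list above (optional robots.txt handling, a customizable user-agent tag, timeout and redirect limits) can be illustrated with a minimal Python sketch. This is not the DeepHarvester API: the `CrawlSettings` class, the `may_fetch` function, the user-agent string and the example URLs are all hypothetical names chosen for illustration. Only the standard library's `urllib.robotparser` is assumed.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlSettings:
    """Hypothetical settings mirroring the crawler features listed above."""
    user_agent: str = "ExampleHarvester/1.0"  # customizable user-agent tag (made-up value)
    honor_robots: bool = True                 # honoring robots.txt is optional
    timeout_seconds: float = 10.0             # per-request timeout, carried as configuration
    max_redirects: int = 5                    # redirect limit, carried as configuration


def may_fetch(settings: CrawlSettings, robots_txt: str, url: str) -> bool:
    """Return True if the URL may be harvested under the given settings."""
    if not settings.honor_robots:
        # The operator has chosen to ignore robots.txt rules.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(settings.user_agent, url)


# A small robots.txt body, inlined so the sketch runs without network access.
ROBOTS = """User-agent: *
Disallow: /private/
"""

polite = CrawlSettings()
print(may_fetch(polite, ROBOTS, "https://example.com/reports/q1.html"))
print(may_fetch(polite, ROBOTS, "https://example.com/private/notes.html"))
```

With `honor_robots` left on, the second URL is refused because it falls under the disallowed `/private/` path. Switching the flag off skips the check entirely, which is the sense in which the rule is honored only optionally.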
BrightPlanet has provided three-year licenses of the DeepHarvester to U.S. Government agencies, allowing them
to harvest behind their firewalls. While BrightPlanet highly recommends using its Content Navigators, our experienced
personnel, to navigate and harvest the Deep Web as well as optimize Deep Web Content Silos, it is now pleased to make
the DeepHarvester Workbench option available to the commercial market.

Pioneers in Harvesting the Deep Web

Access Data Harvested from
the Deep Web for Research
and Analysis.
Deep Web Silos are repositories for topic-specific
content harvested from the Deep Web, available for
either lease or purchase for Deep analytics. Each Silo
is organized to hold documents relevant to a specific
topic area and all unstructured content is normalized,
tagged (enriched) and organized for Deep Web analytics.
Using the BrightPlanet developed standard portal
interface, you can access volumes of new data for research
purposes or integration with your existing analytic
technology solutions. Unlike a traditional search engine
that undermines its results with links merely based on
popularity, each Deep Web Silo is filled with high-quality
content from topic-specific Deep Web sources that lie
beyond the reach of traditional search engines.

[Figure: Deep Web Content Silos. BrightPlanet harvests topic-specific content for Deep Web Silos for research, analysis, monitoring and tracking.]

Content Navigators

BrightPlanet's experienced Content Navigators can create, manage, and maintain Deep Web Silos as a service, thereby
taking on the complexity of harvesting, normalizing, enriching and qualifying content for its customers. Working with
your subject matter experts, BrightPlanet's Content Navigators will identify and configure the necessary Deep Web sources,
construct all harvest profiles, schedule automated harvests, and tune analytic enrichment technologies to your specific project
needs. Our refined process makes the best quality content readily available for your analysts and analytic technologies as
BrightPlanet's Content Navigators move quickly to stand up a custom Deep Web Silo specific to your business needs.
After creating Deep Web Silos, BrightPlanet offers two options for accessing enriched content:
1) lease of Silo access, or
2) purchase of Silo as a proprietary resource
BrightPlanet will host content for all Deep Web Silos or install the Silo at the customer's facility.
Multiple Uses
Deep Web Silo usages vary by customer. Some customers may want a simple enterprise search interface to easily leverage
harvested Web content. Some may choose to integrate the Silo with proprietary content to offer an even richer analytic
opportunity. Still others may need further enriched content to tune or train analytic technologies on a topic-specific
training set. No matter what the use, BrightPlanet will build the right Deep Web Silo for your content requirements.
Flexible, Scalable Technology
Deep Web Silos can be accessed using industry standard protocols, making them easy to integrate with existing content or
custom solutions. Common solutions include a standard enterprise search portal interface, JDBC access, Apache Solr interface,
direct access through MySQL Schema, MarkLogic®, Saffron Sierra™, and even standard XML.
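
As an illustration of the Apache Solr access path mentioned above, the following Python sketch builds a faceted keyword query URL. The host, the core name `example_silo`, the facet field `source_type` and the `build_silo_query` helper are assumptions made for this example, not details of an actual Deep Web Silo. The query parameters themselves (`q`, `rows`, `facet`, `facet.field`, `wt`) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_silo_query(base_url: str, keywords: str, facet_field: str, rows: int = 10) -> str:
    """Build a Solr select URL for a keyword search with one facet field."""
    params = {
        "q": keywords,               # keyword search terms
        "rows": rows,                # number of documents to return
        "facet": "true",             # enable faceted search
        "facet.field": facet_field,  # field to compute facet counts on
        "wt": "json",                # response format
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical Solr core standing in for a Silo.
url = build_silo_query(
    "http://localhost:8983/solr/example_silo",
    "patent invalidity",
    "source_type",
)
print(url)
```

The resulting URL could be fetched with any HTTP client. The sketch stops at building the request so that it runs without a live Solr instance.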

For many customers, the standard enterprise search portal interface will offer enough flexibility to begin leveraging the value
of Deep Web content without extensive integration or IT costs. Our standard enterprise search portal interface includes
keyword search, faceted searching, clustering and various optional modules through a simple-to-use, Google-like interface.
Situations that require more robust storage, enriched visualization, analytics, or non-standard protocols can always be
implemented through options within BrightPlanet's OpenPlanet Dashboard solution.










The Deep Web Monitor is a user-driven solution
that allows you to set up your own monitoring harvests
through a simple-to-use Web interface. If your research
requires constant attention, the Deep Web Monitor will
harvest content from sources YOU target. You'll always
be assured of updates to the latest information from
websites, message boards, blogs, tweets, reports,
and news feeds from wherever it is on the Web.
Deep Web Monitor will run your profiles on an
automated schedule to make sure you have constant
access to new information for deep intelligence,
competitive analysis, intellectual property, reputation
management, company activities, people profiles,
organizational intrigue and the latest trends.

Set up your own Web monitoring


Target Organizations or Companies - Stay on top of any company, organization, or group: what they're doing, what they're saying and what's being said about them. Know the biases of all
the players by deploying additional analytics to detect their sentiments, relationships, and associations.
Target Products, Technologies or Competitive Analysis - Use product names, models or other
unique characteristics to scrutinize their trend-lines, market share, enhancements, R&D and pitfalls.
Additional analytics are available to help identify the sentiments or relationships of the relevant sources.
Target Intellectual Property Research - The Monitor can be a sentinel for monitoring the competition,
product and technology updates, patent applications, invalidity reports, prior art, inventors, news, blogs,
third party threats, trade rags, and more.

Set-it-and-forget-it scheduling
Harvest from specific sites of interest
Focus on mission-critical business topics
View content trends over time
Filter only new or modified content
Display statistical tracking reports for changes over time
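
One common way to "filter only new or modified content" between scheduled harvests is to compare a fingerprint of each page against the one stored from the previous run. The Python sketch below shows that general technique under stated assumptions. It is not a description of how the Deep Web Monitor works internally, and the function names and example URL are hypothetical.

```python
import hashlib


def content_fingerprint(body: str) -> str:
    """Return a SHA-256 hex digest of the page body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def classify_page(url: str, body: str, seen: dict) -> str:
    """Classify a harvested page as 'new', 'modified' or 'unchanged'.

    `seen` maps each URL to the fingerprint recorded on the previous
    harvest and is updated in place.
    """
    fingerprint = content_fingerprint(body)
    previous = seen.get(url)
    seen[url] = fingerprint
    if previous is None:
        return "new"
    if previous != fingerprint:
        return "modified"
    return "unchanged"


# Three simulated scheduled harvests of the same (made-up) page.
seen = {}
results = [
    classify_page("https://example.com/news", "Version 1", seen),
    classify_page("https://example.com/news", "Version 1", seen),
    classify_page("https://example.com/news", "Version 2", seen),
]
print(results)
```

Only pages classified as "new" or "modified" would be passed on for alerting. Counting the classifications per run would also give the raw numbers for a change-over-time report.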

Harvesting the Deep Web for Analytics


Deep Web

Tag, Tune, Normalize &
Enrich "Unstructured"
Content to
"Semi-Structured" for

Beyond the Surface
Any Content Type
Any Source
Any Language

Faceted Search
Topic Clustering
Semantic Categorization
Link Analysis
(3rd Party Options)

Document Path: ["brochure516.pdf"]

