Corpus feature list
In short
Corpus is a search and structured information engine that provides:
- An XML interface to information queries, suitable for most Web and application needs
- Easy-to-use administration via the built-in Corpus web server
- Platform independence, using Java 1.4 as the platform and HTTP/XML as the interface
The big list
This is a list of the functions provided with Corpus. Suggestions or questions can be sent to: info@illuminet.se
Please do ask us, since we are constantly working on new functions and features to meet your needs!
Queries
To better understand your search needs, we offer language-adaptive and context-sensitive searching, more sensitive than Google(tm) or Altavista(tm).
- Near phrase matching finds pages with word sequences near the query
- Boolean search with +/- for requiring or excluding words in the pages found
- Metadata queries with boolean expressions such as "author:Andrew"
- Filtering by regular expressions or date/metadata ranges
- Sorting by property/metadata in alphabetical or numerical order (and reverse)
- Conceptual networks for statistical searching
- API and open infrastructure to explore dictionary and index information
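The +/- and metadata query forms listed above can be illustrated with a small sketch. This is not the Corpus query parser: the class `QuerySketch` and its fields are hypothetical names, and the sketch assumes that tokens are separated by whitespace and that a `field:value` token is a metadata condition. It is written in present-day Java rather than Java 1.4.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Corpus API: splits a query string
// into required (+), excluded (-), metadata (field:value) and plain terms.
class QuerySketch {
    final List<String> required = new ArrayList<String>();
    final List<String> excluded = new ArrayList<String>();
    final List<String> optional = new ArrayList<String>();
    final Map<String, String> metadata = new LinkedHashMap<String, String>();

    static QuerySketch parse(String query) {
        QuerySketch q = new QuerySketch();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields one empty token
            }
            int colon = token.indexOf(':');
            if (token.startsWith("+") && token.length() > 1) {
                q.required.add(token.substring(1));
            } else if (token.startsWith("-") && token.length() > 1) {
                q.excluded.add(token.substring(1));
            } else if (colon > 0 && colon < token.length() - 1) {
                // Assumed metadata syntax: field:value, as in author:Andrew
                q.metadata.put(token.substring(0, colon), token.substring(colon + 1));
            } else {
                q.optional.add(token);
            }
        }
        return q;
    }

    // A page passes the boolean part of the query if it contains every
    // required word and none of the excluded words.
    boolean matches(Set<String> pageWords) {
        return pageWords.containsAll(required)
                && Collections.disjoint(pageWords, excluded);
    }
}
```

For example, `QuerySketch.parse("+java -applet author:Andrew search")` puts `java` in the required list, `applet` in the excluded list, `author` mapped to `Andrew` in the metadata map, and `search` among the plain terms.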
Language and linguistics
- Stemming for increased precision on word forms (Swedish, English, etc.)
- Spelling alternatives and suggestions sorted by word frequency
- Frequency and inference statistics
- Dictionaries for more than 15 languages
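As an illustration of "spelling alternatives sorted by word frequency", the sketch below picks dictionary words within a small edit distance of the typed word and orders them by how often they occur. How Corpus actually generates its suggestions is not described here. The use of Levenshtein distance, the class name `SpellingSketch` and the `maxDistance` parameter are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Corpus API.
class SpellingSketch {

    // Classic dynamic-programming Levenshtein edit distance, two rows at a time.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns known words within maxDistance edits of the input,
    // most frequent first (ties broken alphabetically).
    static List<String> suggest(String word,
                                final Map<String, Integer> frequencies,
                                int maxDistance) {
        List<String> candidates = new ArrayList<String>();
        for (String known : frequencies.keySet()) {
            if (!known.equals(word) && editDistance(word, known) <= maxDistance) {
                candidates.add(known);
            }
        }
        Collections.sort(candidates, new Comparator<String>() {
            public int compare(String x, String y) {
                int byFrequency = frequencies.get(y).compareTo(frequencies.get(x));
                return byFrequency != 0 ? byFrequency : x.compareTo(y);
            }
        });
        return candidates;
    }
}
```

Scanning the whole dictionary is only workable for small word lists; it keeps the sketch short and is not a statement about how a production index would do it.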
Document formats
- HTML 2, 3, 4 and XHTML, and some scripts
- XML with open XSLT infrastructure for easy access
- Microsoft Office formats using MS components or wvWare with a structured OLE2 format parser
- PDF parsing using a PDF-to-HTML converter
- Add any other converter via a command prompt extension
- Add any other converter via the open Corpus API
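The idea of plugging in additional converters can be pictured as a registry that maps file extensions to converter objects. The sketch below is only a guess at the general shape of such an extension point: `TextConverter` and `ConverterRegistrySketch` are hypothetical names and do not come from the Corpus API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical converter contract: turn raw file content into indexable text.
interface TextConverter {
    String toText(byte[] content);
}

// Hypothetical registry, not part of the Corpus API: picks a converter
// from the file name extension, ignoring case.
class ConverterRegistrySketch {
    private final Map<String, TextConverter> byExtension =
            new HashMap<String, TextConverter>();

    void register(String extension, TextConverter converter) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), converter);
    }

    // Returns null when the file has no extension or no converter is registered.
    TextConverter lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.get(extension);
    }
}
```

A converter for a new format would then be registered once, for example under `"txt"`, and looked up for each harvested file by name.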
Agents
- Regular harvesting spider (multi-threaded)
- Web log (used with WebDAV)
- File system (acts on changes)
- Summarizer - automatic short descriptions
- LDAP spider - creates links connecting people, roles and documents
- Ref Parser (heuristics agent)
- Remote Warehouse (version control of remote information)
- Error reporting - keeps track of links and syntax (using mail or web page)
- Console (for interfacing with Corpus output or debug)
- API for any new agents
Interfaces
- HTTP/XML
- HTTP/RSS-Query (includes overviews, spelling and concept hints)
- HTTP/RDF-Query
- HTTP/Dictionary
- RMI/Query
- RMI/RemoteCorpus
- Java/CorpusService (servlet-like, runtime-pluggable services)
- Command line
- Application Programming Interface (Javadoc etc.)
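A client of an HTTP/XML interface typically fetches a result document and reads the hits out of it with a standard XML parser. The sketch below shows that step using the JDK's built-in DOM parser. The response layout (a `result` element containing `hit` elements with a `url` attribute) is invented for the example and is not the documented Corpus format; `XmlResultSketch` is likewise a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical client-side helper, not part of the Corpus API.
class XmlResultSketch {

    // Extracts the url attribute of every <hit> element (assumed layout).
    static List<String> hitUrls(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList hits = doc.getElementsByTagName("hit");
            List<String> urls = new ArrayList<String>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                urls.add(hit.getAttribute("url"));
            }
            return urls;
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse result XML", e);
        }
    }
}
```

In a real client the XML string would come from an HTTP response body; the parsing step stays the same whatever the actual element names turn out to be.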
Runtime environment
We aim to combine stability, robustness, platform independence and performance in our server. All of this is included in the environment.
- Java 1.4 server environment (some Corpus versions also support earlier versions of Java)
- NT service or Unix System V script
- Adjustable system logging
- Access control and users
- Massive built-in storage and database solution
- Optimized cache and object persistence
Administration
- Command line interaction
- Web-based services
- Dictionary control (list/add/remove)
- Domain control (list/add/remove)
- Filter control (list/add/remove)
- Agents control (list/add/remove)
- Web-explorer