Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt')
-rw-r--r-- data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt | 600
1 files changed, 0 insertions, 600 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt b/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt
deleted file mode 100644
index f8803be8..00000000
--- a/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt
+++ /dev/null
@@ -1,600 +0,0 @@
-SISU - SEARCH,
-RALPH AMISSAH
-*****************************
-
-SISU SEARCH
-===========
-
-1. SISU SEARCH - INTRODUCTION
------------------------------
-
-*SiSU* output can easily and conveniently be indexed by a number of standalone
-indexing tools, such as Lucene or Hyperestraier.
-
-
-Because the document structure of the sites created is clearly defined, and the
-text object citation system is available, hypothetically at least, for all
-forms of output, it is possible to search the SQL database and either read
-results from that database or, just as simply, map the results to the html
-output, which has richer text markup.
-
-
-In addition, *SiSU* can populate a relational SQL-type database with documents
-at an object level, with object numbers that are shared across the different
-output types, which makes the documents searchable with that degree of
-granularity. In essence, a search reports that your match criteria are met by
-these documents, at these locations within each document, and the matches can
-be viewed directly within the database or in the various output formats.
-
-
-2. SQL
-------
-
-2.1 POPULATING SQL TYPE DATABASES
-.................................
-
-*SiSU* feeds sisu marked-up documents into SQL-type databases, PostgreSQL[^1]
-and/or SQLite[^2], together with information related to document
-structure.
-
-
-- [1]: <http://www.postgresql.org/>
-
-- <http://advocacy.postgresql.org/>
-
-- <http://en.wikipedia.org/wiki/Postgresql>
-
-- [2]: <http://www.hwaci.com/sw/sqlite/>
-
-- <http://en.wikipedia.org/wiki/Sqlite>
-
-This is one of the more interesting output forms, as all the structural data of
-the documents is retained (though it can be ignored by the user of the database
-should they so choose). All site texts/documents are (currently) streamed to
-four tables:
-
-
- * one containing semantic (and other) headers, including title, author and
- subject (the Dublin Core...);
-
-
- * another containing the substantive texts by individual "paragraph" (or
- object), along with structural information, each paragraph being identifiable
- by its paragraph number (if it has one, which almost all of them do), and the
- substantive text of each paragraph quite naturally being searchable (both in
- formatted and clean text versions for searching);
-
-
- * a third containing endnotes cross-referenced back to the paragraph from
- which they are referenced (both in formatted and clean text versions for
- searching); and
-
-
- * a fourth, with a one-to-one relation to the headers table, containing full
- text versions of output, e.g. pdf, html, xml, and ascii.
-
-
-There is of course the possibility to add further structures.
-
-
-At this level *SiSU* loads a relational database with documents chunked into
-objects, their smallest logical structurally constituent parts, as text
-objects, with their object citation number and all other structural information
-needed to construct the document. Text is stored (at this text object level)
-both with and without elementary markup tagging; the stripped version is kept
-to make searching easier.
-
-
-Being able to search a relational database at an object level with the *SiSU*
-citation system is an effective way of locating content generated by *SiSU*. As
-the individual text objects of a document are stored (and indexed) together
-with their object numbers, and all versions of the document have the same
-numbering, complex searches can be tailored to return just the locations of the
-search results, which are relevant for all available output formats, with live
-links to the precise locations in the database or in html/xml documents.
-Alternatively, the structural information provided makes it possible to search
-the full contents of the database and return the headings under which the
-search content appears, or to search only headings, etc. (as the Dublin Core is
-incorporated it is easy to make use of that as well).
-
-
-3. POSTGRESQL
--------------
-
-3.1 NAME
-........
-
-*SiSU* - Structured information, Serialized Units - a document publishing
-system, postgresql dependency package
-
-
-3.2 DESCRIPTION
-...............
-
-Information related to using postgresql with sisu (and related to the
-sisu_postgresql dependency package, which is a dummy package to install
-dependencies needed for *SiSU* to populate a postgresql database, this being
-part of *SiSU* - man sisu).
-
-
-3.3 SYNOPSIS
-............
-
- sisu -D [instruction] [filename/wildcard if required]
-
-
- sisu -D --pg --[instruction] [filename/wildcard if required]
-
-
-3.4 COMMANDS
-............
-
-Mappings to two databases are provided by default, postgresql and sqlite. The
-same commands are used within sisu to construct and populate the databases;
-however, -d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql.
-Alternatively, --sqlite or --pgsql may be used.
-
-
-*-D or --pgsql* may be used interchangeably.
-
-
-3.4.1 CREATE AND DESTROY DATABASE
-.................................
-
-*--pgsql --createall*
-initial step, creates required relations (tables, indexes) in existing
-(postgresql) database (a database should be created manually and given the same
-name as working directory, as requested) (rb.dbi)
-
-
-*sisu -D --createdb*
-creates database where no database existed before
-
-
-*sisu -D --create*
-creates database tables where no database tables existed before
-
-
-*sisu -D --Dropall*
-destroys database (including all its content)! kills data and drops tables,
-indexes and database associated with a given directory (and directories of the
-same name).
-
-
-*sisu -D --recreate*
-destroys existing database and builds a new empty database structure
-
-
-3.4.2 IMPORT AND REMOVE DOCUMENTS
-.................................
-
-*sisu -D --import -v [filename/wildcard]*
-populates database with the contents of the file. Imports the document(s)
-specified to a postgresql database (at an object level).
-
-
-*sisu -D --update -v [filename/wildcard]*
-updates file contents in database
-
-
-*sisu -D --remove -v [filename/wildcard]*
-removes specified document from postgresql database.
-
-
-4. SQLITE
----------
-
-4.1 NAME
-........
-
-*SiSU* - Structured information, Serialized Units - a document publishing
-system.
-
-
-4.2 DESCRIPTION
-...............
-
-Information related to using sqlite with sisu (and related to the sisu_sqlite
-dependency package, which is a dummy package to install dependencies needed for
-*SiSU* to populate an sqlite database, this being part of *SiSU* - man sisu).
-
-
-4.3 SYNOPSIS
-............
-
- sisu -d [instruction] [filename/wildcard if required]
-
-
- sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if required]
-
-
-4.4 COMMANDS
-............
-
-Mappings to two databases are provided by default, postgresql and sqlite. The
-same commands are used within sisu to construct and populate the databases;
-however, -d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql.
-Alternatively, --sqlite or --pgsql may be used.
-
-
-*-d or --sqlite* may be used interchangeably.
-
-
-4.4.1 CREATE AND DESTROY DATABASE
-.................................
-
-*--sqlite --createall*
-initial step, creates required relations (tables, indexes) in existing
-(sqlite) database (a database should be created manually and given the same
-name as working directory, as requested) (rb.dbi)
-
-
-*sisu -d --createdb*
-creates database where no database existed before
-
-
-*sisu -d --create*
-creates database tables where no database tables existed before
-
-
-*sisu -d --dropall*
-destroys database (including all its content)! kills data and drops tables,
-indexes and database associated with a given directory (and directories of the
-same name).
-
-
-*sisu -d --recreate*
-destroys existing database and builds a new empty database structure
-
-
-4.4.2 IMPORT AND REMOVE DOCUMENTS
-.................................
-
-*sisu -d --import -v [filename/wildcard]*
-populates database with the contents of the file. Imports the document(s)
-specified to an sqlite database (at an object level).
-
-
-*sisu -d --update -v [filename/wildcard]*
-updates file contents in database
-
-
-*sisu -d --remove -v [filename/wildcard]*
-removes specified document from sqlite database.
-
-
-5. INTRODUCTION
----------------
-
-5.1 SEARCH - DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
-INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
-..............................................................................
-
-Sample search frontend [link:] <http://search.sisudoc.org> [^3] A small
-database and sample query front-end (search form) that makes use of the
-citation system, _object citation numbering_, to demonstrate functionality.[^4]
-
-
-- [3]: <http://search.sisudoc.org>
-
-- [4]: (which could be extended further with current back-end). As regards scaling
- of the database, it is as scalable as the database (here Postgresql) and
- hardware allow.
-
-*SiSU* can provide information on which documents are matched and at what
-locations within each document the matches are found. These results are
-relevant across all outputs using object citation numbering, which includes
-html, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of
-the other outputs or in the SQL database expand the text within the matched
-objects (paragraphs) in the documents matched.
-
-
-Note that you may set results to show either the documents matched together
-with the object number locations within each matched document that meet the
-search criteria; or the names of the documents matched along with the objects
-(paragraphs) that meet the search criteria.[^5]
-
-
-- [5]: when this feature was demonstrated to an IBM software innovations
- evaluator in 2004 he said, to paraphrase: this could be of interest to us. We
- have large document management systems; you can search hundreds of thousands
- of documents and we can tell you which documents meet your search criteria,
- but there is no way we can tell you, without opening each document, where
- within each your matches are found.
-
-*sisu -F --webserv-webrick*
-builds a cgi web search frontend for the database created
-
-
-The help command provides the following feedback on the setup of a given
-machine:
-
-
- sisu --help sql
-
-
-
- Postgresql
- user: ralph
- current db set: SiSU_sisu
- port: 5432
- dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
- sqlite
- current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
- dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
-
-Note on databases built
-
-
-By default, [unless otherwise specified] databases are built on a directory
-basis, from collections of documents within that directory. The name of the
-directory you choose to work from is used as the database name, i.e. if you are
-working in a directory called /home/ralph/ebook the database SiSU_ebook is
-used. [otherwise a manual mapping for the collection is necessary]
-
-
-5.2 SEARCH FORM
-...............
-
-*sisu -F*
-generates a sample search form, which must be copied to the web-server cgi
-directory
-
-
-*sisu -F --webserv-webrick*
-generates a sample search form for use with the webrick server, which must be
-copied to the web-server cgi directory
-
-
-*sisu -Fv*
-as above, and provides some information on setting up hyperestraier
-
-
-*sisu -W*
-starts the webrick server which should be available wherever sisu is properly
-installed
-
-
-The generated search form must be copied manually to the webserver directory as
-instructed.
-
-
-6. HYPERESTRAIER
-----------------
-
-See the documentation for hyperestraier:
-
-
- <http://hyperestraier.sourceforge.net/>
-
-
- /usr/share/doc/hyperestraier/index.html
-
-
- man estcmd
-
-
-on sisu_hyperestraier:
-
-
- man sisu_hyperestraier
-
-
- /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
-
-
-NOTE: the examples that follow assume that sisu output is placed in the
-directory /home/ralph/sisu_www
-
-
-(A) to generate the index within the webserver directory to be indexed:
-
-
- estcmd gather -sd [index name] [directory path to index]
-
-
-the following are examples that will need to be tailored according to your
-needs:
-
-
- cd /home/ralph/sisu_www
-
-
- estcmd gather -sd casket /home/ralph/sisu_www
-
-
-you may use the 'find' command together with 'egrep' to limit indexing to
-particular document collection directories within the web server directory:
-
-
- find /home/ralph/sisu_www -type f |
- egrep '/home/ralph/sisu_www/sisu/.+\.html$' | estcmd gather -sd casket -
-
-
-Check which directories in the webserver/output directory (~/sisu_www or
-elsewhere depending on configuration) you wish to include in the search index.
-
-
-As sisu duplicates output in multiple file formats, it is probably preferable
-to limit the estraier index to html output. It may also be desirable to
-exclude the files 'plain.txt', 'toc.html' and 'concordance.html', as these
-duplicate information held in other html output, e.g.
-
-
- find /home/ralph/sisu_www -type f |
- egrep '/sisu_www/(sisu|bookmarks)/.+\.html$' |
- egrep -v '(doc|concordance)\.html$' | estcmd gather -sd casket -
-
-
-from your current document preparation/markup directory, you would construct a
-rune along the following lines:
-
-
- find /home/ralph/sisu_www -type f |
- egrep '/home/ralph/sisu_www/([specify first directory for inclusion]|[specify second directory for inclusion]|[another directory for inclusion? ...])/.+\.html$' |
- egrep -v '(doc|concordance)\.html$' |
- estcmd gather -sd /home/ralph/sisu_www/casket -
-
-
-(B) to set up the search form
-
-
-(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
-
-
- sudo cp -vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi-bin
-
-
- sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi
-
-
- sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin
-
-
- [see estraier documentation for paths]
-
-
-(ii) edit estseek.conf, with attention to the lines starting 'indexname:' and
-'replace:':
-
-
- indexname: /home/ralph/sisu_www/casket
-
-
- replace: ^file:///home/ralph/sisu_www{{!}}http://localhost
-
-
- replace: /index.html?${{!}}/
-
-
-(C) to test using webrick, start webrick:
-
-
- sisu -W
-
-
-and try opening the url: <http://localhost:8081/cgi-bin/estseek.cgi>
-
-
-DOCUMENT INFORMATION (METADATA)
-*******************************
-
-METADATA
---------
-
-Document Manifest @
-<http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html>
-
-
-*Dublin Core* (DC)
-
-
-/DC tags included with this document are provided here./
-
-
-DC Title: _SiSU - Search_
-
-
-DC Creator: _Ralph Amissah_
-
-
-DC Rights: _Copyright (C) Ralph Amissah 2007, part of SiSU documentation,
-License GPL 3_
-
-
-DC Type: _information_
-
-
-DC Date created: _2002-08-28_
-
-
-DC Date issued: _2002-08-28_
-
-
-DC Date available: _2002-08-28_
-
-
-DC Date modified: _2007-09-16_
-
-
-DC Date: _2007-09-16_
-
-
-*Version Information*
-
-
-Sourcefile: _sisu_search._sst_
-
-
-Filetype: _SiSU text insert 0.58_
-
-
-Sourcefile Digest, MD5(sisu_search._sst)= _c085c2eb6d68f1b7d50435f673ede407_
-
-
-Skin_Digest:
-MD5(/home/ralph/grotto/theatre/dbld/builds/sisu/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
-_20fc43cf3eb6590bc3399a1aef65c5a9_
-
-
-*Generated*
-
-
-Document (metaverse) last generated: _Tue Sep 25 02:54:29 +0100 2007_
-
-
-Generated by: _SiSU_ _0.59.1_ of 2007w39/2 (2007-09-25)
-
-
-Ruby version: _ ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]_
-
-
-
-==============================================================================
-
- title: SiSU - Search
-
- creator: Ralph Amissah
-
- rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation,
- License GPL 3
-
- type: information
-
- subject: ebook, epublishing, electronic book, electronic publishing,
- electronic document, electronic citation, data structure,
- citation systems, search
-
- date.created: 2002-08-28
-
- date.issued: 2002-08-28
-
- date.available: 2002-08-28
-
- date.modified: 2007-09-16
-
- date: 2007-09-16
-
-
-
-
-
-==============================================================================
-nil
-
-Other versions of this document:
-manifest:
- http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html
-html:
- http://www.jus.uio.no/sisu/sisu_search/toc.html
-pdf:
- http://www.jus.uio.no/sisu/sisu_search/portrait.pdf
- http://www.jus.uio.no/sisu/sisu_search/landscape.pdf
-plaintext (plain text):
- http://www.jus.uio.no/sisu/sisu_search/plain.txt
-at:
- http://www.jus.uio.no/sisu
-* Generated by: SiSU 0.59.1 of 2007w39/2 (2007-09-25)
-* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
-* Last Generated on: Tue Sep 25 02:54:30 +0100 2007
-* SiSU http://www.jus.uio.no/sisu