From 50d45c6deb0afd2e4222d2e33a45487a9d1fa676 Mon Sep 17 00:00:00 2001 From: Ralph Amissah Date: Sun, 23 Sep 2007 05:16:21 +0100 Subject: primarily todo with sisu documentation, changelog reproduced below: * start documenting sisu using sisu * sisu markup source files in data/doc/sisu/sisu_markup_samples/sisu_manual/ /usr/share/doc/sisu/sisu_markup_samples/sisu_manual/ * default output [sisu -3] in data/doc/manuals_generated/sisu_manual/ /usr/share/doc/manuals_generated/sisu_manual/ (adds substantially to the size of sisu package!) * help related edits * manpage, work on ability to generate manpages, improved * param, exclude footnote mark count when occurs within code block * plaintext changes made * shared_txt, line wrap visited * file:// link option introduced (in addition to existing https?:// and ftp://) a bit arbitrarily, diff here, [double check changes in sysenv and hub] * minor adjustments * html url match refinement * css added tiny_center * plaintext * endnotes fix * footnote adjustment to make more easily distinguishable from substantive text * flag -a only [flags -A -e -E dropped] controlled by modifiers --unix/msdos --footnote/endnote * defaults, homepage * renamed homepage (instead of index) implications for modifying skins, which need likewise to have any homepage entry renamed * added link to sisu_manual in homepage * css the css for the default homepage is renamed homepage.css (instead of index.css) [consider removing this and relying on html.css] * ruby version < ruby1.9 * place stop on installation and working with for now [ruby String.strip broken in ruby 1.9.0 (2007-09-10 patchlevel 0) [i486-linux], 2007-09-18:38/2] * debian/control restrict use to ruby > 1.8.4 and ruby < 1.9 * debian * debian/control restrict use to ruby > 1.8.4 and ruby < 1.9 * sisu-doc new sub-package for sisu documentation debian/control and sisu-doc.install --- .../sisu_manual/sisu_search/scroll.xhtml | 905 +++++++++++++++++++++ 1 file changed, 905 insertions(+) create 
mode 100644 data/doc/manuals_generated/sisu_manual/sisu_search/scroll.xhtml (limited to 'data/doc/manuals_generated/sisu_manual/sisu_search/scroll.xhtml') diff --git a/data/doc/manuals_generated/sisu_manual/sisu_search/scroll.xhtml b/data/doc/manuals_generated/sisu_manual/sisu_search/scroll.xhtml new file mode 100644 index 00000000..9dd7445f --- /dev/null +++ b/data/doc/manuals_generated/sisu_manual/sisu_search/scroll.xhtml @@ -0,0 +1,905 @@ + + + + + + + + Title: + + SiSU - SiSU information Structuring Universe - Search [0.58] + +
+ Creator: + + Ralph Amissah + +
+ Rights: + + Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3 + +
+ Type: + + information + +
+ Subject: + + ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search + +
+ Date created: + + 2002-08-28 + +
+ Date issued: + + 2002-08-28 + +
+ Date available: + + 2002-08-28 + +
+ Date modified: + + 2007-09-16 + +
+ Date: + + 2007-09-16 + +
+ + + + + SiSU - SiSU information Structuring Universe - Search [0.58],
+Ralph Amissah +
+ 1 +
+ + + SiSU Search + + 2 + + + + 1. SiSU Search - Introduction + + 3 + + + + SiSU output can easily and conveniently be indexed by a number +of standalone indexing tools, such as Lucene and Hyperestraier. + + 4 + + + + Because the document structure of sites created is clearly defined, and +the text object citation system is available, hypothetically at least, +for all forms of output, it is possible to search the sql database, and +either read results from that database, or just as simply map the +results to the html output, which has richer text markup. + + 5 + + + + In addition to this, SiSU has the ability to populate a +relational sql type database with documents at an object level, with +object numbers that are shared across different output types, which +makes them searchable with that degree of granularity. Basically, your +match criteria are met by these documents and at these locations within +each document, which can be viewed within the database directly or in +various output formats. + + 6 + + + + 2. SQL + + 7 + + + + 2.1 populating SQL type databases + + 8 + + + + SiSU feeds documents prepared in sisu markup into sql type databases, +PostgreSQL [1] and/or SQLite [2], together with +information related to document structure. + + + 1. <http://www.postgresql.org/> +
<http://advocacy.postgresql.org/> +
<http://en.wikipedia.org/wiki/Postgresql> +
+ + 2. <http://www.hwaci.com/sw/sqlite/> +
<http://en.wikipedia.org/wiki/Sqlite> +
+ 9 +
+ + + This is one of the more interesting output forms, as all the structural +data of the documents are retained (though can be ignored by the user +of the database should they so choose). All site texts/documents are +(currently) streamed to four tables: + + 10 + + + + one containing semantic (and other) headers, including, title, +author, subject, (the Dublin Core...); + + 11 + + + + another the substantive texts by individual "paragraph" (or +object) - along with structural information, each paragraph being +identifiable by its paragraph number (if it has one which almost all of +them do), and the substantive text of each paragraph quite naturally +being searchable (both in formatted and clean text versions for +searching); and + + 12 + + + + a third containing endnotes cross-referenced back to the +paragraph from which they are referenced (both in formatted and clean +text versions for searching). + + 13 + + + + a fourth table with a one to one relation with the headers table +contains full text versions of output, eg. pdf, html, xml, and ascii. + + 14 + + + + There is of course the possibility to add further structures. + + 15 + + + + At this level SiSU loads a relational database with documents +chunked into objects, their smallest logical structurally constituent +parts, as text objects, with their object citation number and all other +structural information needed to construct the document. Text is stored +(at this text object level) with and without elementary markup tagging, +the stripped version being so as to facilitate ease of searching. + + 16 + + + + Being able to search a relational database at an object level with the +SiSU citation system is an effective way of locating content +generated by SiSU. 
As individual text objects of a document are +stored (and indexed) together with object numbers, and all versions of +the document have the same numbering, complex searches can be tailored +to return just the locations of the search results relevant for all +available output formats, with live links to the precise locations in +the database or in html/xml documents; or, the structural information +provided makes it possible to search the full contents of the database +and have headings in which search content appears, or to search only +headings etc. (as the Dublin Core is incorporated it is easy to make +use of that as well). + + 17 + + + + 3. Postgresql + + 18 + + + + 3.1 Name + + 19 + + + + SiSU - Structured information, Serialized Units - a document +publishing system, postgresql dependency package + + 20 + + + + 3.2 Description + + 21 + + + + Information related to using postgresql with sisu (and related to the +sisu_postgresql dependency package, which is a dummy package to install +dependencies needed for SiSU to populate a postgresql database, +this being part of SiSU - man sisu). + + 22 + + + + 3.3 Synopsis + + 23 + + + + sisu -D [instruction] [filename/wildcard if required] + + 24 + + + + sisu -D --pg --[instruction] [filename/wildcard if required] + + 25 + + + + 3.4 Commands + + 26 + + + + Mappings to two databases are provided by default, postgresql and +sqlite, the same commands are used within sisu to construct and +populate databases however -d (lowercase) denotes sqlite and -D +(uppercase) denotes postgresql, alternatively --sqlite or --pgsql may +be used + + 27 + + + + -D or --pgsql may be used interchangeably. + + 28 + + + + 3.4.1 create and destroy database + + 29 + + + + --pgsql --createall
initial step, creates required +relations (tables, indexes) in existing (postgresql) database (a +database should be created manually and given the same name as working +directory, as requested) (rb.dbi) +
+ 30 +
+ + + sisu -D --createdb
creates database where no database +existed before +
+ 31 +
+ + + sisu -D --create
creates database tables where no database +tables existed before +
+ 32 +
+ + + sisu -D --Dropall
destroys database (including all its +content)! kills data and drops tables, indexes and database associated +with a given directory (and directories of the same name). +
+ 33 +
+ + + sisu -D --recreate
destroys existing database and builds a +new empty database structure +
+ 34 +
+ + + 3.4.2 import and remove documents + + 35 + + + + sisu -D --import -v [filename/wildcard]
populates database +with the contents of the file. Imports document(s) specified to a +postgresql database (at an object level). +
+ 36 +
+ + + sisu -D --update -v [filename/wildcard]
updates file +contents in database +
+ 37 +
+ + + sisu -D --remove -v [filename/wildcard]
removes specified +document from postgresql database. +
+ 38 +
+ + + 4. Sqlite + + 39 + + + + 4.1 Name + + 40 + + + + SiSU - Structured information, Serialized Units - a document +publishing system. + + 41 + + + + 4.2 Description + + 42 + + + + Information related to using sqlite with sisu (and related to the +sisu_sqlite dependency package, which is a dummy package to install +dependencies needed for SiSU to populate an sqlite database, +this being part of SiSU - man sisu). + + 43 + + + + 4.3 Synopsis + + 44 + + + + sisu -d [instruction] [filename/wildcard if required] + + 45 + + + + sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if +required] + + 46 + + + + 4.4 Commands + + 47 + + + + Mappings to two databases are provided by default, postgresql and +sqlite, the same commands are used within sisu to construct and +populate databases however -d (lowercase) denotes sqlite and -D +(uppercase) denotes postgresql, alternatively --sqlite or --pgsql may +be used + + 48 + + + + -d or --sqlite may be used interchangeably. + + 49 + + + + 4.4.1 create and destroy database + + 50 + + + + --sqlite --createall
initial step, creates required +relations (tables, indexes) in existing (sqlite) database (a database +should be created manually and given the same name as working +directory, as requested) (rb.dbi) +
+ 51 +
+ + + sisu -d --createdb
creates database where no database +existed before +
+ 52 +
+ + + sisu -d --create
creates database tables where no database +tables existed before +
+ 53 +
+ + + sisu -d --dropall
destroys database (including all its +content)! kills data and drops tables, indexes and database associated +with a given directory (and directories of the same name). +
+ 54 +
+ + + sisu -d --recreate
destroys existing database and builds a +new empty database structure +
+ 55 +
+ + + 4.4.2 import and remove documents + + 56 + + + + sisu -d --import -v [filename/wildcard]
populates database +with the contents of the file. Imports document(s) specified to an +sqlite database (at an object level). +
+ 57 +
+ + + sisu -d --update -v [filename/wildcard]
updates file +contents in database +
+ 58 +
+ + + sisu -d --remove -v [filename/wildcard]
removes specified +document from sqlite database. +
+ 59 +
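The two command sets above differ only in the backend flag (-d for sqlite, -D for postgresql), and the document-level instructions take -v and a filename or wildcard. A minimal shell sketch of that mapping follows. The function name `sisu_db_command` is hypothetical and not part of sisu; it only prints the command line it would build, so it can be tried without a database.

```shell
# Hypothetical helper (not a sisu command): print the sisu command line
# for a given backend and instruction, without running it.
sisu_db_command() {
  backend=$1
  instruction=$2
  shift 2                      # remaining arguments: filenames or wildcards

  # -d (lowercase) denotes sqlite, -D (uppercase) denotes postgresql
  case $backend in
    sqlite) flag=-d ;;
    pgsql)  flag=-D ;;
    *) echo "unknown backend: $backend" >&2; return 1 ;;
  esac

  case $instruction in
    # document-level instructions are shown above with -v and a filename
    import|update|remove) echo "sisu $flag --$instruction -v $*" ;;
    # database-level instructions (createall, createdb, create, recreate)
    *) echo "sisu $flag --$instruction" ;;
  esac
}
```

For example, `sisu_db_command sqlite import doc.sst` prints `sisu -d --import -v doc.sst`, where `doc.sst` stands in for any marked-up file.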
+ + + 5. Introduction + + 60 + + + + 5.1 Search - database frontend sample, utilising database and SiSU +features, including object citation numbering (backend currently +PostgreSQL) + + 61 + + + + Sample search frontend [3]: a small database and sample query front-end (search form) +that makes use of the citation system, object citation numbering, +to demonstrate functionality. [4] + + + 3. <http://search.sisudoc.org> + + + 4. (which could be extended further with current back-end). As regards +scaling of the database, it is as scalable as the database (here +Postgresql) and hardware allow. + + 62 + + + + SiSU can provide information on which documents are matched and +at what locations within each document the matches are found. These +results are relevant across all outputs using object citation +numbering, which includes html, XML, LaTeX, PDF and indeed the SQL +database. You can then refer to one of the other outputs or in the SQL +database expand the text within the matched objects (paragraphs) in the +documents matched. + + 63 + + + + Note that you may set results to show either the documents matched and the object number +locations within each matched document meeting the search criteria; or +the names of the documents matched along with the objects +(paragraphs) that meet the search criteria. [5] + + + 5. When this feature was demonstrated to an IBM software innovations +evaluator in 2004 he said, to paraphrase: this could be of interest to +us. We have large document management systems; you can search hundreds +of thousands of documents and we can tell you which documents meet your +search criteria, but there is no way we can tell you, without opening +each document, where within each your matches are found. + + 64 + + + + sisu -F --webserv-webrick
builds a cgi web search frontend +for the database created +
+ 65 +
+ + + The following is feedback on the setup on a machine provided by the +help command: + + 66 + + + + sisu --help sql + + 67 + + + 68 + +      Postgresql
       user:             ralph
       current db set:   SiSU_sisu
       port:             5432
       dbi connect:      DBI:Pg:database=SiSU_sisu;port=5432

     sqlite
       current db set:   /home/ralph/sisu_www/sisu/sisu_sqlite.db
       dbi connect       DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db     +
+
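The database names reported here follow the per-directory convention set out in the note on databases built: the name of the working directory is prefixed with SiSU_. A minimal shell sketch of that mapping follows. `sisu_db_name` is a hypothetical helper, not a sisu command, and it does not cover the case where a manual mapping for the collection is configured.

```shell
# Hypothetical helper: derive the default database name from a working
# directory, e.g. /home/ralph/ebook -> SiSU_ebook (the example in the note).
sisu_db_name() {
  # basename keeps only the last path component (the directory name)
  printf 'SiSU_%s\n' "$(basename "$1")"
}
```

Calling `sisu_db_name "$PWD"` from the document directory shows the name sisu would be expected to use by default.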
+ + + Note on databases built + + 69 + + + + By default, [unless otherwise specified] databases are built on a +directory basis, from collections of documents within that directory. +The name of the directory you choose to work from is used as the +database name, i.e. if you are working in a directory called +/home/ralph/ebook the database SiSU_ebook is used. [otherwise a manual +mapping for the collection is necessary] + + 70 + + + + 5.2 Search Form + + 71 + + + + sisu -F
generates a sample search form, which must be +copied to the web-server cgi directory +
+ 72 +
+ + + sisu -F --webserv-webrick
generates a sample search form +for use with the webrick server, which must be copied to the web-server +cgi directory +
+ 73 +
+ + + sisu -Fv
as above, and provides some information on +setting up hyperestraier +
+ 74 +
+ + + sisu -W
starts the webrick server which should be +available wherever sisu is properly installed +
+ 75 +
+ + + The generated search form must be copied manually to the webserver +directory as instructed + + 76 + + + + 6. Hyperestraier + + 77 + + + + See the documentation for hyperestraier: + + 78 + + + + <http://hyperestraier.sourceforge.net/> + + 79 + + + + /usr/share/doc/hyperestraier/index.html + + 80 + + + + man estcmd + + 81 + + + + on sisu_hyperestraier: + + 82 + + + + man sisu_hyperestraier + + 83 + + + + /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html + + 84 + + + + NOTE: the examples that follow assume that sisu output is placed in the +directory /home/ralph/sisu_www + + 85 + + + + (A) to generate the index within the webserver directory to be indexed: + + 86 + + + + estcmd gather -sd [index name] [directory path to index] + + 87 + + + + the following are examples that will need to be tailored according to +your needs: + + 88 + + + + cd /home/ralph/sisu_www + + 89 + + + + estcmd gather -sd casket /home/ralph/sisu_www + + 90 + + + + you may use the 'find' command together with 'egrep' to limit indexing +to particular document collection directories within the web server +directory: + + 91 + + + + find /home/ralph/sisu_www -type f | egrep +'/home/ralph/sisu_www/sisu/.+?.html$' |estcmd gather -sd casket - + + 92 + + + + Check which directories in the webserver/output directory +(~/sisu_www or elsewhere depending on configuration) you wish to +include in the search index. + + 93 + + + + As sisu duplicates output in multiple file formats, it is probably +preferable to limit the estraier index to html output, and it may +also be desirable to exclude files 'plain.txt', 'toc.html' and +'concordance.html', as these duplicate information held in other html +output e.g. 
+ + 94 + + + + find /home/ralph/sisu_www -type f | egrep +'/sisu_www/(sisu|bookmarks)/.+?.html$' | egrep -v +'(doc|concordance).html$' |estcmd gather -sd casket - + + 95 + + + + from your current document preparation/markup directory, you would +construct a rune along the following lines: + + 96 + + + + find /home/ralph/sisu_www -type f | egrep +'/home/ralph/sisu_www/([specify first directory for inclusion]|[specify +second directory for inclusion]|[another directory for inclusion? +...])/.+?.html$' | egrep -v '(doc|concordance).html$' |estcmd gather +-sd /home/ralph/sisu_www/casket - + + 97 + + + + (B) to set up the search form + + 98 + + + + (i) copy estseek.cgi to your cgi directory and set file permissions to +755: + + 99 + + + + sudo cp -vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi-bin + + 100 + + + + sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi + + 101 + + + + sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin + + 102 + + + + [see estraier documentation for paths] + + 103 + + + + (ii) edit estseek.conf, with attention to the lines starting +'indexname:' and 'replace:': + + 104 + + + + indexname: /home/ralph/sisu_www/casket + + 105 + + + + replace: ^file:///home/ralph/sisu_www{{!}}http://localhost + + 106 + + + + replace: /index.html?${{!}}/ + + 107 + + + + (C) to test using webrick, start webrick: + + 108 + + + + sisu -W + + 109 + + + + and try opening the url: <http://localhost:8081/cgi-bin/estseek.cgi> + + 110 + + + + Endnotes + + 0 + + +
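The find/egrep filter in section 6 (A) can be previewed before anything is handed to estcmd. The sketch below is a hedged variant of that pipeline, not the manual's own command. The function name `list_index_candidates` is hypothetical. It uses `grep -E` in place of `egrep`, escapes the dot before `html`, and excludes only files named exactly doc.html or concordance.html. It also assumes the root path contains no characters that are special in a regular expression other than the dot.

```shell
# Hypothetical helper (not part of sisu or hyperestraier): list the html files
# under a sisu output root that the filter would pass on to `estcmd gather`.
list_index_candidates() {
  root=$1                      # e.g. /home/ralph/sisu_www
  shift                        # remaining arguments: collection directories

  # join the directory names with | to form a regex alternation
  dirs=$(printf '%s|' "$@")
  dirs=${dirs%|}               # drop the trailing |

  find "$root" -type f \
    | grep -E "^$root/($dirs)/.+\.html$" \
    | grep -E -v '/(doc|concordance)\.html$' \
    | sort
}
```

Once the listing looks right, it can be piped on in the same way as the examples above: `list_index_candidates /home/ralph/sisu_www sisu bookmarks | estcmd gather -sd casket -`.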