From a72e66db913de3a2e508080c8b1fc8d1342a899b Mon Sep 17 00:00:00 2001 From: Ralph Amissah Date: Tue, 25 Sep 2007 23:23:03 +0100 Subject: remove generated output from main package --- .../sisu_manual/sisu_search/sax.xml | 918 --------------------- 1 file changed, 918 deletions(-) delete mode 100644 data/doc/manuals_generated/sisu_manual/sisu_search/sax.xml (limited to 'data/doc/manuals_generated/sisu_manual/sisu_search/sax.xml') diff --git a/data/doc/manuals_generated/sisu_manual/sisu_search/sax.xml b/data/doc/manuals_generated/sisu_manual/sisu_search/sax.xml deleted file mode 100644 index 4b75e8d7..00000000 --- a/data/doc/manuals_generated/sisu_manual/sisu_search/sax.xml +++ /dev/null @@ -1,918 +0,0 @@ - - - - - - - Title: - - SiSU - Search - -
- Creator: - - Ralph Amissah - -
- Rights: - - Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3 - -
- Type: - - information - -
- Subject: - - ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search - -
- Date created: - - 2002-08-28 - -
- Date issued: - - 2002-08-28 - -
- Date available: - - 2002-08-28 - -
- Date modified: - - 2007-09-16 - -
- Date: - - 2007-09-16 - -
- - - - 1 - - SiSU - Search,
Ralph Amissah -
-
- - 2 - - SiSU Search - - - - 3 - - 1. SiSU Search - Introduction - - - - 4 - - SiSU output can easily and conveniently be indexed by a number -of standalone indexing tools, such as Lucene and Hyperestraier. - - - - 5 - - Because the document structure of sites created is clearly defined, and -the text object citation system is available, hypothetically at least, -for all forms of output, it is possible to search the sql database, and -either read results from that database, or just as simply map the -results to the html output, which has richer text markup. - - - - 6 - - In addition to this SiSU has the ability to populate a -relational sql type database with documents at an object level, with -object numbers that are shared across different output types, which -makes them searchable with that degree of granularity. Basically, your -match criteria are met by these documents and at these locations within -each document, which can be viewed within the database directly or in -various output formats. - - - - 7 - - 2. SQL - - - - 8 - - 2.1 populating SQL type databases - - - - 9 - - SiSU feeds sisu marked-up documents into sql type databases -(PostgreSQL[1] and/or SQLite[2]) together with -information related to document structure. - - - 1 - - <http://www.postgresql.org/> -
<http://advocacy.postgresql.org/> -
<http://en.wikipedia.org/wiki/Postgresql> -
-
- - 2 - - <http://www.hwaci.com/sw/sqlite/> -
<http://en.wikipedia.org/wiki/Sqlite> -
-
-
- - 10 - - This is one of the more interesting output forms, as all the structural -data of the documents are retained (though can be ignored by the user -of the database should they so choose). All site texts/documents are -(currently) streamed to four tables: - - - - 11 - - one containing semantic (and other) headers, including, title, -author, subject, (the Dublin Core...); - - - - 12 - - another the substantive texts by individual "paragraph" (or -object) - along with structural information, each paragraph being -identifiable by its paragraph number (if it has one which almost all of -them do), and the substantive text of each paragraph quite naturally -being searchable (both in formatted and clean text versions for -searching); and - - - - 13 - - a third containing endnotes cross-referenced back to the -paragraph from which they are referenced (both in formatted and clean -text versions for searching). - - - - 14 - - a fourth table with a one to one relation with the headers table -contains full text versions of output, eg. pdf, html, xml, and ascii. - - - - 15 - - There is of course the possibility to add further structures. - - - - 16 - - At this level SiSU loads a relational database with documents -chunked into objects, their smallest logical structurally constituent -parts, as text objects, with their object citation number and all other -structural information needed to construct the document. Text is stored -(at this text object level) with and without elementary markup tagging, -the stripped version being so as to facilitate ease of searching. - - - - 17 - - Being able to search a relational database at an object level with the -SiSU citation system is an effective way of locating content -generated by SiSU. 
As the individual text objects of a document are -stored (and indexed) together with object numbers, and all versions of -the document have the same numbering, complex searches can be tailored -to return just the locations of the search results relevant for all -available output formats, with live links to the precise locations in -the database or in html/xml documents; or, the structural information -provided makes it possible to search the full contents of the database -and return the headings under which the search content appears, or to search only -headings etc. (as the Dublin Core is incorporated it is easy to make -use of that as well). - - - - 18 - - 3. Postgresql - - - - 19 - - 3.1 Name - - - - 20 - - SiSU - Structured information, Serialized Units - a document -publishing system, postgresql dependency package - - - - 21 - - 3.2 Description - - - - 22 - - Information related to using postgresql with sisu (and related to the -sisu_postgresql dependency package, which is a dummy package to install -dependencies needed for SiSU to populate a postgresql database, -this being part of SiSU - man sisu). - - - - 23 - - 3.3 Synopsis - - - - 24 - - sisu -D [instruction] [filename/wildcard if required] - - - - 25 - - sisu -D --pg --[instruction] [filename/wildcard if required] - - - - 26 - - 3.4 Commands - - - - 27 - - Mappings to two databases are provided by default, postgresql and -sqlite. The same commands are used within sisu to construct and -populate databases; however, -d (lowercase) denotes sqlite and -D -(uppercase) denotes postgresql. Alternatively --sqlite or --pgsql may -be used. - - - - 28 - - -D or --pgsql may be used interchangeably. - - - - 29 - - 3.4.1 create and destroy database - - - - 30 - - --pgsql --createall
initial step, creates required -relations (tables, indexes) in existing (postgresql) database (a -database should be created manually and given the same name as working -directory, as requested) (rb.dbi) -
-
- - 31 - - sisu -D --createdb
creates database where no database -existed before -
-
- - 32 - - sisu -D --create
creates database tables where no database -tables existed before -
-
- - 33 - - sisu -D --dropall
destroys database (including all its -content)! kills data and drops tables, indexes and database associated -with a given directory (and directories of the same name). -
-
- - 34 - - sisu -D --recreate
destroys existing database and builds a -new empty database structure -
-
- - 35 - - 3.4.2 import and remove documents - - - - 36 - - sisu -D --import -v [filename/wildcard]
populates database -with the contents of the file. Imports the document(s) specified to a -postgresql database (at an object level). -
-
- - 37 - - sisu -D --update -v [filename/wildcard]
updates file -contents in database -
-
- - 38 - - sisu -D --remove -v [filename/wildcard]
removes specified -document from postgresql database. -
-
- - 39 - - 4. Sqlite - - - - 40 - - 4.1 Name - - - - 41 - - SiSU - Structured information, Serialized Units - a document -publishing system. - - - - 42 - - 4.2 Description - - - - 43 - - Information related to using sqlite with sisu (and related to the -sisu_sqlite dependency package, which is a dummy package to install -dependencies needed for SiSU to populate an sqlite database, -this being part of SiSU - man sisu). - - - - 44 - - 4.3 Synopsis - - - - 45 - - sisu -d [instruction] [filename/wildcard if required] - - - - 46 - - sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if -required] - - - - 47 - - 4.4 Commands - - - - 48 - - Mappings to two databases are provided by default, postgresql and -sqlite. The same commands are used within sisu to construct and -populate databases; however, -d (lowercase) denotes sqlite and -D -(uppercase) denotes postgresql. Alternatively --sqlite or --pgsql may -be used. - - - - 49 - - -d or --sqlite may be used interchangeably. - - - - 50 - - 4.4.1 create and destroy database - - - - 51 - - --sqlite --createall
initial step, creates required -relations (tables, indexes) in existing (sqlite) database (a database -should be created manually and given the same name as working -directory, as requested) (rb.dbi) -
-
- - 52 - - sisu -d --createdb
creates database where no database -existed before -
-
- - 53 - - sisu -d --create
creates database tables where no database -tables existed before -
-
- - 54 - - sisu -d --dropall
destroys database (including all its -content)! kills data and drops tables, indexes and database associated -with a given directory (and directories of the same name). -
-
- - 55 - - sisu -d --recreate
destroys existing database and builds a -new empty database structure -
-
- - 56 - - 4.4.2 import and remove documents - - - - 57 - - sisu -d --import -v [filename/wildcard]
populates database -with the contents of the file. Imports the document(s) specified to an -sqlite database (at an object level). -
-
- - 58 - - sisu -d --update -v [filename/wildcard]
updates file -contents in database -
-
- - 59 - - sisu -d --remove -v [filename/wildcard]
removes specified -document from sqlite database. -
-
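The create and import commands listed for the two backends can be chained into a short script. What follows is a minimal sketch and not part of sisu: the function names, the DRY_RUN variable and the assumed order (create the database, create its tables, then import) are illustrative assumptions. Only the sisu flags themselves (-d, -D, --createdb, --create, --import -v) are taken from the text above.

```shell
# Hypothetical helpers, not shipped with sisu.

# Map a backend name to the sisu flag described in the manual:
# -d (lowercase) for sqlite, -D (uppercase) for postgresql.
sisu_db_flag() {
  case "$1" in
    sqlite) printf '%s\n' '-d' ;;
    pg|pgsql|postgresql) printf '%s\n' '-D' ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
sisu_run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumed order: create the database, create its tables, import documents.
# Usage: sisu_db_populate <sqlite|pg> <filename/wildcard> ...
sisu_db_populate() {
  backend=$1
  shift
  flag=$(sisu_db_flag "$backend") || return 1
  sisu_run sisu "$flag" --createdb &&
    sisu_run sisu "$flag" --create &&
    sisu_run sisu "$flag" --import -v "$@"
}
```

Setting DRY_RUN=1 before calling sisu_db_populate prints the three sisu commands instead of running them, which is a cautious way to check the sequence before touching a real database.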
- - 60 - - 5. Introduction - - - - 61 - - 5.1 Search - database frontend sample, utilising database and SiSU -features, including object citation numbering (backend currently -PostgreSQL) - - - - 62 - - Sample search frontend -[3] A small database and sample query front-end (search form) -that makes use of the citation system, object citation numbering, -to demonstrate functionality.[4] - - - 3 - - <http://search.sisudoc.org> - - - - 4 - - (which could be extended further with current back-end). As regards -scaling of the database, it is as scalable as the database (here -Postgresql) and hardware allow. - - - - - 63 - - SiSU can provide information on which documents are matched and -at what locations within each document the matches are found. These -results are relevant across all outputs using object citation -numbering, which includes html, XML, LaTeX, PDF and indeed the SQL -database. You can then refer to one of the other outputs or in the SQL -database expand the text within the matched objects (paragraphs) in the -documents matched. - - - - 64 - - Note you may set results to show either the documents matched and the object number -locations within each matched document meeting the search criteria; or -the names of the documents matched along with the objects -(paragraphs) that meet the search criteria.[5] - - - 5 - - When this feature was demonstrated to an IBM software innovations -evaluator in 2004, he said, to paraphrase: this could be of interest to -us. We have large document management systems; you can search hundreds -of thousands of documents and we can tell you which documents meet your -search criteria, but there is no way we can tell you, without opening -each document, where within each your matches are found. - - - - - 65 - - sisu -F --webserv-webrick
builds a cgi web search frontend -for the database created -
-
- - 66 - - The following is feedback on the setup on a machine provided by the -help command: - - - - 67 - - sisu --help sql - - - - 68 - -      Postgresql
       user:             ralph
       current db set:   SiSU_sisu
       port:             5432
       dbi connect:      DBI:Pg:database=SiSU_sisu;port=5432

     sqlite
       current db set:   /home/ralph/sisu_www/sisu/sisu_sqlite.db
       dbi connect       DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db     -
-
- - 69 - - Note on databases built - - - - 70 - - By default, [unless otherwise specified] databases are built on a -directory basis, from collections of documents within that directory. -The name of the directory you choose to work from is used as the -database name, i.e. if you are working in a directory called -/home/ralph/ebook the database SiSU_ebook is used. [otherwise a manual -mapping for the collection is necessary] - - - - 71 - - 5.2 Search Form - - - - 72 - - sisu -F
generates a sample search form, which must be -copied to the web-server cgi directory -
-
- - 73 - - sisu -F --webserv-webrick
generates a sample search form -for use with the webrick server, which must be copied to the web-server -cgi directory -
-
- - 74 - - sisu -Fv
as above, and provides some information on -setting up hyperestraier -
-
- - 75 - - sisu -W
starts the webrick server which should be -available wherever sisu is properly installed -
-
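The directory-based naming noted earlier (working in /home/ralph/ebook selects the database SiSU_ebook) and the PostgreSQL "dbi connect" string shown in the sample help output can be reproduced with a few lines of shell. This is a sketch only: the function names are hypothetical, and it assumes the default mapping with no manual override for the collection.

```shell
# Hypothetical helpers illustrating the naming convention described above.

# Derive the database name from a directory (default: current directory):
# the last path component, prefixed with "SiSU_".
sisu_db_name() {
  dir=${1:-$PWD}
  dir=${dir%/}                      # drop one trailing slash, if present
  printf 'SiSU_%s\n' "${dir##*/}"
}

# Build a PostgreSQL DBI connect string in the form shown by the sample
# "sisu --help sql" output. $1: directory, $2: port (5432 in that sample).
sisu_pg_dbi() {
  printf 'DBI:Pg:database=%s;port=%s\n' "$(sisu_db_name "$1")" "${2:-5432}"
}
```

For example, sisu_db_name /home/ralph/ebook prints SiSU_ebook, matching the example in the note on databases built.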
- - 76 - - The generated search form must be copied manually to the webserver -directory as instructed - - - - 77 - - 6. Hyperestraier - - - - 78 - - See the documentation for hyperestraier: - - - - 79 - - <http://hyperestraier.sourceforge.net/> - - - - 80 - - /usr/share/doc/hyperestraier/index.html - - - - 81 - - man estcmd - - - - 82 - - on sisu_hyperestraier: - - - - 83 - - man sisu_hyperestraier - - - - 84 - - /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html - - - - 85 - - NOTE: the examples that follow assume that sisu output is placed in the -directory /home/ralph/sisu_www - - - - 86 - - (A) to generate the index within the webserver directory to be indexed: - - - - 87 - - estcmd gather -sd [index name] [directory path to index] - - - - 88 - - the following are examples that will need to be tailored according to -your needs: - - - - 89 - - cd /home/ralph/sisu_www - - - - 90 - - estcmd gather -sd casket /home/ralph/sisu_www - - - - 91 - - you may use the 'find' command together with 'egrep' to limit indexing -to particular document collection directories within the web server -directory: - - - - 92 - - find /home/ralph/sisu_www -type f | egrep -'/home/ralph/sisu_www/sisu/.+?.html$' |estcmd gather -sd casket - - - - - 93 - - Check which directories in the webserver/output directory -(~/sisu_www or elsewhere depending on configuration) you wish to -include in the search index. - - - - 94 - - As sisu duplicates output in multiple file formats, it is probably -preferable to limit the estraier index to html output. It may -also be desirable to exclude the files 'plain.txt', 'toc.html' and -'concordance.html', as these duplicate information held in other html -output, e.g. 
- - - - 95 - - find /home/ralph/sisu_www -type f | egrep -'/sisu_www/(sisu|bookmarks)/.+\.html$' | egrep -v -'(doc|concordance)\.html$' |estcmd gather -sd casket - - - - - 96 - - from your current document preparation/markup directory, you would -construct a rune along the following lines: - - - - 97 - - find /home/ralph/sisu_www -type f | egrep -'/home/ralph/sisu_www/([specify first directory for inclusion]|[specify -second directory for inclusion]|[another directory for inclusion? -...])/.+\.html$' | egrep -v '(doc|concordance)\.html$' |estcmd gather --sd /home/ralph/sisu_www/casket - - - - - 98 - - (B) to set up the search form - - - - 99 - - (i) copy estseek.cgi to your cgi directory and set file permissions to -755: - - - - 100 - - sudo cp -vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi-bin - - - - 101 - - sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi - - - - 102 - - sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin - - - - 103 - - [see estraier documentation for paths] - - - - 104 - - (ii) edit estseek.conf, with attention to the lines starting -'indexname:' and 'replace:': - - - - 105 - - indexname: /home/ralph/sisu_www/casket - - - - 106 - - replace: ^file:///home/ralph/sisu_www{{!}}http://localhost - - - - 107 - - replace: /index.html?${{!}}/ - - - - 108 - - (C) to test using webrick, start webrick: - - - - 109 - - sisu -W - - - - 110 - - and try opening the URL: <http://localhost:8081/cgi-bin/estseek.cgi> - - - - 0 - - Endnotes - - - -
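The find/egrep filtering used in step (A) can be wrapped in a small function so that the selection can be checked on a list of paths before anything is passed to estcmd. This is a sketch under stated assumptions: the function name is hypothetical, grep -E stands in for egrep, and the patterns escape the dot before "html" and drop the "+?" quantifier, a slight tightening of the expressions shown above.

```shell
# Hypothetical helper: reads file paths on stdin and prints only the html
# files under the sisu/ or bookmarks/ output directories, excluding
# doc.html and concordance.html.
sisu_index_candidates() {
  grep -E '/sisu_www/(sisu|bookmarks)/.+\.html$' |
    grep -E -v '(doc|concordance)\.html$'
}

# Intended use, following the manual's own pipeline:
#   find /home/ralph/sisu_www -type f | sisu_index_candidates | estcmd gather -sd casket
```

Running the find command through the function alone, without the final estcmd stage, lists exactly the files that would be indexed.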