% SiSU insert 2.0

@title: SiSU
 :subtitle: SQL and Search

@creator:
 :author: Amissah, Ralph

@date:
 :created: 2002-08-28
 :issued: 2002-08-28
 :available: 2002-08-28
 :published: 2007-09-16
 :modified: 2011-02-07

@rights:
 :copyright: Copyright (C) Ralph Amissah 2007
 :license: GPL 3 (part of SiSU documentation)

@classify:
 :subject: ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search

:A~? @title @creator

:B~? SiSU Search

:C~? Search

1~search_sql SQL

2~ populating SQL type databases

SiSU feeds documents prepared in SiSU markup into SQL-type databases, PostgreSQL~{ http://www.postgresql.org/ <br> http://advocacy.postgresql.org/ <br> http://en.wikipedia.org/wiki/Postgresql }~ and/or SQLite~{ http://www.hwaci.com/sw/sqlite/ <br> http://en.wikipedia.org/wiki/Sqlite }~, together with information related to document structure.

This is one of the more interesting output forms, as all the structural data of the documents is retained (though it can be ignored by the user of the database should they so choose). All site texts/documents are (currently) streamed to four tables:

_1* one containing semantic (and other) headers, including title, author, subject (the Dublin Core...);

_1* another containing the substantive texts by individual "paragraph" (or object), along with structural information, each paragraph being identifiable by its paragraph number (if it has one, which almost all of them do), and the substantive text of each paragraph being searchable (stored in both formatted and clean text versions, the latter for searching);

_1* a third containing endnotes cross-referenced back to the paragraph from which they are referenced (both in formatted and clean text versions for searching); and

_1* a fourth table, with a one-to-one relation with the headers table, containing full text versions of output, e.g. pdf, html, xml, and ascii.

There is of course the possibility to add further structures.
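The four-table layout described above can be sketched as follows. This is a minimal illustration using Python's standard sqlite3 module; the table and column names (headers, objects, endnotes, full_outputs, ocn, clean) are hypothetical and are not taken from the schema SiSU actually creates.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not SiSU's own.
SCHEMA = """
CREATE TABLE headers (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    creator TEXT,
    subject TEXT
);
CREATE TABLE objects (
    id        INTEGER PRIMARY KEY,
    header_id INTEGER NOT NULL REFERENCES headers(id),
    ocn       INTEGER,          -- paragraph (object) number, may be absent
    body      TEXT NOT NULL,    -- text with elementary markup
    clean     TEXT NOT NULL     -- markup stripped, used for searching
);
CREATE TABLE endnotes (
    id        INTEGER PRIMARY KEY,
    header_id INTEGER NOT NULL REFERENCES headers(id),
    ocn       INTEGER NOT NULL, -- paragraph the note is referenced from
    body      TEXT NOT NULL,
    clean     TEXT NOT NULL
);
CREATE TABLE full_outputs (
    -- primary key doubles as foreign key: one row per headers row
    header_id INTEGER PRIMARY KEY REFERENCES headers(id),
    html      TEXT,
    plaintext TEXT
);
"""

def create_schema(conn):
    """Create the four illustrative tables on an open connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Using the headers key as the primary key of the fourth table is one simple way to express the one-to-one relation; other designs are possible.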

At this level SiSU loads a relational database with documents chunked into objects (their smallest logical structural parts), stored as text objects together with their object citation number and all other structural information needed to construct the document. Text is stored (at this text object level) both with and without elementary markup tagging, the stripped version being kept to make searching easier.
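As a rough sketch of the idea of chunking a document into numbered objects and keeping both a marked-up and a stripped copy of each, consider the following. It is a simplification and not SiSU's own processing: objects are taken to be blank-line separated blocks, every block is numbered from 1, and only a handful of inline forms such as *{...}* and /{...}/ are stripped. The function names are hypothetical.

```python
import re

# Simplified assumption: inline markup of the form X{text}X
# where X is one of * ! / _ (a small subset, for illustration only).
INLINE = re.compile(r"([*!/_])\{(.+?)\}\1")

def strip_markup(text):
    """Return text with the simple inline markers removed."""
    return INLINE.sub(r"\2", text)

def to_objects(document):
    """Split on blank lines and return (number, marked_up, clean) tuples."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", document) if b.strip()]
    return [(ocn, block, strip_markup(block))
            for ocn, block in enumerate(blocks, start=1)]

sample = "First *{bold}* paragraph.\n\nSecond /{italic}/ paragraph."
objects = to_objects(sample)
for row in objects:
    print(row)
```

Each tuple corresponds to one row that could be loaded into a table of text objects, with the clean column used for searching and the marked-up column used for display.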

Being able to search a relational database at an object level with the SiSU citation system is an effective way of locating content generated by SiSU. As the individual text objects of a document are stored (and indexed) together with their object numbers, and all versions of the document have the same numbering, complex searches can be tailored to return just the locations of the search results, which are relevant for all available output formats, with live links to the precise locations in the database or in html/xml documents. Alternatively, the structural information provided makes it possible to search the full contents of the database and return the headings under which the search content appears, or to search only headings, etc. (as the Dublin Core is incorporated, it is easy to make use of that as well).
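The kind of object-level search described here might look like the sketch below. It assumes a single hypothetical objects table with a clean text column and a flag marking headings, and it assumes that a location in any output format can be written as a file name followed by the object number; neither the table layout nor the link pattern is taken from SiSU itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: one row per text object.
conn.execute(
    "CREATE TABLE objects (doc TEXT, ocn INTEGER, is_heading INTEGER, clean TEXT)")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    ("manual", 1, 1, "SQL and search"),
    ("manual", 2, 0, "Documents are loaded into a relational database."),
    ("manual", 3, 0, "Each object keeps its citation number."),
])

def search(conn, term, headings_only=False):
    """Return (doc, ocn) pairs for objects whose clean text contains term."""
    sql = "SELECT doc, ocn FROM objects WHERE clean LIKE ?"
    if headings_only:
        sql += " AND is_heading = 1"
    sql += " ORDER BY doc, ocn"
    return conn.execute(sql, (f"%{term}%",)).fetchall()

def links(hits, formats=("html", "xml")):
    """Build one illustrative link per hit and format (pattern is assumed)."""
    return [f"{doc}.{fmt}#{ocn}" for doc, ocn in hits for fmt in formats]

hits = search(conn, "database")
print(hits)
print(links(hits))
print(search(conn, "sql", headings_only=True))
```

Because the object number is the same in every output format, a single query result is enough to build links into each of them.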