%% Summary ---------------------------------------------------------------------

SiSU takes a plain-text (UTF-8) document and abstracts it into its structure
and its smaller constituent parts, objects. This abstraction is then used in
subsequent processing to reconstitute the document in disparate document
representations, many quite different from one another, e.g. HTML, ODF and
LaTeX, and to populate SQL databases.

SiSU identifies the document structure (headings and their levels) and pulls
the document apart into its constituent parts, objects (paragraphs, headings,
tables, images etc.), assigning an object number to each object that has
substantive content.

From the marked up document SiSU needs to be able to determine a document's
structure and the objects that the document contains.

%% Identify SiSU Document ------------------------------------------------------

The first line of a SiSU marked up document can identify itself with:

  % SiSU

%% The Basic SiSU Markup Document ----------------------------------------------

SiSU documents are divided into two parts: (i) the document header and
(ii) the substantive content.

(i) The document header contains (a) metadata and (b) processing
instructions, if any. Document headers take the form of a tag and the related
information. The document header metadata should contain at least:

  @title:
  @creator:
   :author:

Processing instructions are grouped under the @make: tag. In their absence,
program (or configured) defaults will be used.

(ii) For the substantive content the document structure must be defined; here
structure equates to the headings and their relative levels. This can be done
either by explicit markup where each heading occurs, or in the @make: section
of the header, or both.

The basic document objects are headings and paragraphs.
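
The header requirements above can be illustrated with a small check. This is
a rough sketch only, not how SiSU itself reads a document: the function name
inspect_header, the one-tag-per-line layout with indented sub-tags, and the
rule that the header ends at the first heading marker (:A~ or 1~ style) are
all assumptions made for the example.

```python
import re

# Header tags the text says a document should carry at minimum.
REQUIRED_TAGS = ("title", "creator")


def inspect_header(text):
    """Return (identified, tags, missing) for a SiSU-style marked up string.

    Assumption (illustrative only): the header runs from the top of the
    text to the first line starting with a heading marker such as ":A~"
    or "1~".
    """
    lines = text.splitlines()
    # Optional identifying first line, e.g. "% SiSU".
    identified = bool(lines) and lines[0].startswith("% SiSU")

    tags = {}       # tag name -> {sub-tag name: value}; inline value under ""
    current = None  # the most recent "@tag:" seen
    for line in lines:
        if re.match(r"^(:[A-C]|[1-3])~", line):
            break  # substantive content begins here
        top = re.match(r"^@(\w+):\s*(.*)$", line)
        sub = re.match(r"^\s+:(\w+):\s*(.*)$", line)
        if top:
            current = top.group(1)
            tags[current] = {"": top.group(2)} if top.group(2) else {}
        elif sub and current:
            tags[current][sub.group(1)] = sub.group(2)

    missing = [name for name in REQUIRED_TAGS if name not in tags]
    if "creator" in tags and "author" not in tags["creator"]:
        missing.append("creator:author")
    return identified, tags, missing


# Hypothetical sample document used only for this illustration.
sample = """% SiSU
@title: Example Document
@creator:
 :author: A. N. Author
:A~ Example Document
1~ First Chapter
A paragraph of substantive content.
"""

identified, tags, missing = inspect_header(sample)
print(identified, sorted(tags), missing)
```

An empty list of missing tags means the minimal metadata named above is
present; the check says nothing about whether SiSU would accept the document.
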
Paragraphs are identified automatically, while headings must be defined (with
respect to document structure); given that, SiSU is able to determine the basic
objects without anything further.

sample_1.sst

%% Document Structure (heading levels) -----------------------------------------

Document structure (the headings in a document and their relative levels) is
determined from information provided in the markup of the document. There are
two ways to identify document structure:

(i) manual markup of headings with their level;

(ii) in the SiSU header, under @make: :heading:, provide a regex, in the manner
understood by SiSU, that identifies what to look for in headings of various
levels.

There are two sets of document level markers: :A~ with an optional :B~ and :C~,
and beneath those 1~, 2~, 3~. For the first set of document level markers the
document title is the top level in the hierarchy; beneath that come book titles,
if the document contains more than one book, followed by sections.

%% Document Objects (paragraphs, headings, tables, verse etc.) -----------------

Document objects are units of text that are identified, stored and processed as
a block. The most usual document objects are paragraphs and headings. A more
complete list of objects includes: paragraphs; headings; tables; code blocks;
verse (the poem is identified, but each verse is an object); grouped text...