≅ SiSU for documents - structuring, publishing in multiple formats & search

A short description

SiSU is a lightweight-markup-based document structuring, publishing and search tool for document collections. It is command-line oriented and generates static content that is also made searchable at an object level through an SQL database.

SiSU markup helps define text objects, which are numbered sequentially for object citation. Breaking the document into objects provides interesting possibilities. Object numbers make it possible to cite and locate text precisely across different document formats and different languages (assuming the document has been translated). For search, they also make it possible to identify precisely where within each document the search criteria are met, in the form of an index. Additionally, this leaves each document format free to represent the document in the manner most suitable to it (whilst retaining its structural and citation integrity).
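The object-numbering idea above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SiSU's implementation: it assumes each blank-line-separated paragraph is one object (sisu markup defines objects more richly than this), and the helper names and the `o<number>` HTML anchor scheme are hypothetical. It shows the same sequential numbers appearing in two different output formats, which is what makes a citation such as "object 2" portable between them.

```python
import html

# Hypothetical helper names: a toy model of object numbering, not SiSU's code.


def split_objects(text: str) -> list[str]:
    """Treat each blank-line-separated block as one text object."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def number_objects(objects: list[str]) -> list[tuple[int, str]]:
    """Give every object a sequential number, starting at 1."""
    return list(enumerate(objects, start=1))


def to_plain_text(numbered: list[tuple[int, str]]) -> str:
    """Plain-text output: the object number is shown in brackets before each object."""
    return "\n\n".join(f"[{n}] {body}" for n, body in numbered)


def to_html(numbered: list[tuple[int, str]]) -> str:
    """HTML output: the same number becomes an anchor id, so #o2 cites object 2."""
    return "\n".join(
        f'<p id="o{n}"><a href="#o{n}">{n}</a> {html.escape(body)}</p>'
        for n, body in numbered
    )


doc = "A first paragraph.\n\nA second paragraph.\n\nA third paragraph."
numbered = number_objects(split_objects(doc))
print(to_plain_text(numbered))
print(to_html(numbered))
```

Because both renderers read from the same numbered list, each output is free to present the text in its own way while object 2 remains object 2 in every one of them.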

SiSU project source

SiSU projects repo (git)
- https://git.sisudoc.org

SiSU: document publishing (multiple formats + search)
- https://git.sisudoc.org/sisu

SiSU markup samples in document pods for sisu
- https://git.sisudoc.org/sisu-markup

SiSU Spine markup sample output

To give an idea of how this works, here is a small collection of documents marked up for, and generated by, the software. The curation of a collection of specialized related documents would benefit from a consistently applied bespoke ontology or thesaurus.
The documents presented have been released under various Creative Commons licences, are in the public domain, or are the author's own work, with the exception of one that is under the GPL and the old, abandoned Debian live-manual.

≅ Authors (software curated from provided document header metadata)
- https://sisudoc.org/spine/authors.html

≅ Topics (software curated from provided document header metadata)
- https://sisudoc.org/spine/topics.html

SiSU Spine search

≅ Search (granular search of text objects)
- https://sisudoc.org/spine_search

SiSU description

Here is a description that has been used for the original sisu:

With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax in your text editor of choice, SiSU can generate various document formats, including plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX and PDF files, most of which share a common object numbering system for locating content. It can also populate an SQL database with objects (roughly paragraph-sized chunks) so that searches may be performed and matches returned with that degree of granularity. Think of being able to finely match text in documents, using common object numbers, across different output formats and across languages if you have translations of the same document. For search, the results say: your criteria are met by these documents, at these locations within each document (equally relevant across different output formats and languages). To be clear (if obvious), page numbers provide none of this functionality. Object numbering is particularly suitable for "published" works (finalized texts, as opposed to works that are frequently changed or updated), for which it provides a fixed means of referencing content. Document outputs can also share provided semantic metadata.
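The object-level search described above can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table name, the column names (`doc`, `obj_no`, `body`), the use of SQLite and the naive `LIKE` match are all hypothetical choices for illustration, not SiSU's actual database schema or query logic. It shows how storing one row per numbered object lets a search answer "these documents, at these object numbers".

```python
import sqlite3

# Hypothetical schema: one row per text object, keyed by document and object number.


def build_index(docs: dict[str, list[str]]) -> sqlite3.Connection:
    """Load each document's objects into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        " doc TEXT NOT NULL, obj_no INTEGER NOT NULL, body TEXT NOT NULL,"
        " PRIMARY KEY (doc, obj_no))"
    )
    for doc, objects in docs.items():
        # Objects are numbered sequentially from 1 within each document.
        conn.executemany(
            "INSERT INTO objects (doc, obj_no, body) VALUES (?, ?, ?)",
            [(doc, n, body) for n, body in enumerate(objects, start=1)],
        )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> dict[str, list[int]]:
    """Map each matching document to the numbers of the objects containing term.

    Naive substring match with LIKE (case-insensitive for ASCII in SQLite);
    '%' and '_' inside term are treated as wildcards.
    """
    hits: dict[str, list[int]] = {}
    rows = conn.execute(
        "SELECT doc, obj_no FROM objects WHERE body LIKE ? ORDER BY doc, obj_no",
        (f"%{term}%",),
    )
    for doc, obj_no in rows:
        hits.setdefault(doc, []).append(obj_no)
    return hits


conn = build_index({
    "doc_a": ["Object numbering aids citation.", "Page numbers do not.", "Citation again."],
    "doc_b": ["Unrelated text.", "A citation here."],
})
print(search(conn, "citation"))  # {'doc_a': [1, 3], 'doc_b': [2]}
```

A real deployment would more likely use a full-text index than `LIKE`; the point here is only that each hit is a (document, object number) pair, which stays meaningful in any output format that carries the same numbering.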

SiSU Spine

SiSU Spine is the new generator for documents prepared in sisu markup. It is written in D, whereas the original sisu was first shared in Ruby.

Spine code has not yet been made publicly available.

Compared with the original sisu generator:

- Spine uses the same document markup for the document body, but uses yaml for document headers (which contain document metadata and configuration details); the original sisu uses a bespoke markup for headers.

- Spine (written in D) is considerably faster than sisu (written in Ruby): on the last test it generated native output at least 60 times faster (1 minute becomes 1 second; 1 hour, a minute :-). Admittedly that test was some time ago and Ruby has been getting faster, so hopefully this is not over-promising.

- Spine produces fewer document output types than sisu (html, epub, (odt, latex), and it populates an sql db for search).

- Where both produce the same output type, Spine generally produces more up-to-date representations of that output format.

ralph.amissah www since 1993 ;-)