author | Ralph Amissah <ralph@amissah.com> | 2008-12-02 23:54:23 -0500 |
---|---|---|
committer | Ralph Amissah <ralph@amissah.com> | 2008-12-02 23:54:23 -0500 |
commit | 0e6fc15ada3c5d9a86b227163f35a54993b32529 (patch) | |
tree | 90ac98f2dadf8a2731fac4921fb5d9263eeedeb9 /lib/sisu/v0/harvest.rb | |
parent | sha256 for 0.69.4 (diff) |
sisu harvest, introduce module along with header syntax addition & modification
* sisu markup, additional header and new format rule:
* @creator: / @author: header field, introduced author name format rules
for more usable metadata harvesting: surname comma other names, additional
authors separated by semi-colon
* param added meta-tag, @topic_register: formatting rules: topic levels are
separated from sub-levels by a colon; a semi-colon separates main topics;
if there are multiple topics at the lowest sub-level, a pipe can be used to
create multiple headings
* harvest module, harvests metadata from document set; currently extracts: (i)
authors and their writings from document set; (ii) topics and associated
writings from document set (topics use topic_register header). harvest
(when run against documents common to a directory of a site) extracts
metadata and organises the documents on a site by author and topic
information provided (there is a new "topic_register" header, with
formatting rules similar to those of the book index), results are placed in
[output_path]/sisu_site_metadata.
sisu --harvest *.sst
* by author (see change in param @creator: / @author: header field)
* by topic / subject index (see addition in param of @topic_register:
header field)
initially there should be example samples here:
http://www.jus.uio.no/sisu/sisu_site_metadata/harvest_authors.html
http://www.jus.uio.no/sisu/sisu_site_metadata/harvest_topics.html
together with updated markup source files
The authors and their writings list will be made to take on a more
bibliographical form, with the use of additional fields as required.
(concept example, suitable for medium sized sites [to remove size
constraint: implement SQL equivalent]) make feature more robust
* css, for harvest output added
* remote placement of sisu_site_metadata (output produced by metadata harvest)
* sisu markup, update document samples accordingly
* tidy copyright marks in program headers, remove repetition of dates
[version bump because formatting rule introduced to author / creator header -
where new site metadata harvest feature is used, (at present changes
should not be noticed except when using metadata harvest)]
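The author name format rule described above (surname, comma, other names; additional authors separated by a semi-colon) could be read roughly as follows. This is only a sketch under stated assumptions: `parse_authors` is a hypothetical helper and is not taken from the SiSU source, and the second author name is made up for illustration.

```ruby
# Hypothetical helper, NOT part of SiSU: splits a @creator: / @author:
# header value of the form "Surname, Other Names; Surname, Other Names".
def parse_authors(field)
  field.split(';').map(&:strip).reject(&:empty?).map do |entry|
    # Split on the first comma only, so other names may contain commas.
    surname, other = entry.split(',', 2).map(&:strip)
    { surname: surname, other_names: other.to_s }
  end
end

# Example value is an assumption made for illustration.
authors = parse_authors('Amissah, Ralph; Doe, Jane Q.')
authors.each { |a| puts "#{a[:surname]} | #{a[:other_names]}" }
```

Keeping the surname first is what makes a sorted author listing possible without guessing where a name splits.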
Diffstat (limited to 'lib/sisu/v0/harvest.rb')
-rw-r--r-- | lib/sisu/v0/harvest.rb | 102 |
1 files changed, 102 insertions, 0 deletions
diff --git a/lib/sisu/v0/harvest.rb b/lib/sisu/v0/harvest.rb
new file mode 100644
index 00000000..e8609a93
--- /dev/null
+++ b/lib/sisu/v0/harvest.rb
@@ -0,0 +1,102 @@
+# coding: utf-8
+=begin
+
+ * Name: SiSU
+
+ * Description: a framework for document structuring, publishing and search
+   harvest metadata from document corpus (suitable for medium sized sites)
+   (concept example, [to remove size constraint: implement SQL equivalent])
+
+ * Author: Ralph Amissah
+
+ * Copyright: (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
+   2007, 2008 Ralph Amissah All Rights Reserved.
+
+ * License: GPL 3 or later:
+
+   SiSU, a framework for document structuring, publishing and search
+
+   Copyright (C) Ralph Amissah
+
+   This program is free software: you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the Free
+   Software Foundation, either version 3 of the License, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+   more details.
+
+   You should have received a copy of the GNU General Public License along with
+   this program. If not, see <http://www.gnu.org/licenses/>.
+
+   If you have Internet connection, the latest version of the GPL should be
+   available at these locations:
+   <http://www.fsf.org/licensing/licenses/gpl.html>
+   <http://www.gnu.org/copyleft/gpl.html>
+
+   <http://www.jus.uio.no/sisu/gpl.fsf/toc.html>
+   <http://www.jus.uio.no/sisu/gpl.fsf/doc.html>
+   <http://www.jus.uio.no/sisu/gpl.fsf/plain.txt>
+
+ * SiSU uses:
+   * Standard SiSU markup syntax,
+   * Standard SiSU meta-markup syntax, and the
+   * Standard SiSU object citation numbering and system
+
+ * Hompages:
+   <http://www.jus.uio.no/sisu>
+   <http://www.sisudoc.org>
+
+ * Download:
+   <http://www.jus.uio.no/sisu/SiSU/download.html>
+
+ * Ralph Amissah
+   <ralph@amissah.com>
+   <ralph.amissah@gmail.com>
+
+ ** Description: system environment, resource control and configuration details
+
+=end
+def help
+  puts <<WOK
+  harvest --harvest extracts document index metadata
+
+WOK
+end
+def css(opt)
+  require "#{SiSU_lib}/css"
+  css=SiSU_Style::CSS.new
+  fn_css=SiSU_Env::CSS_default.new
+  style=File.new("#{@env.path.pwd}/#{fn_css.harvest}",'w')
+  #style=File.new("#{@env.path.pwd}/harvest.css",'w')
+  style << css.harvest
+  style.close
+end
+def cases(opt)
+  case opt.mod.inspect
+  when/--harvest/i
+    css(opt) if opt.cmd.inspect =~/M/
+    HARVEST_authors::Songsheet.new(opt).songsheet
+    HARVEST_topics::Songsheet.new(opt).songsheet
+  else
+    help
+  end
+end
+branch='v0'
+SiSU_lib="sisu/#{branch}"
+require "#{SiSU_lib}/options"
+require "#{SiSU_lib}/harvest_topics"
+require "#{SiSU_lib}/harvest_authors"
+require "#{SiSU_lib}/sysenv"
+include SiSU_Env
+@env=SiSU_Env::Info_env.new
+@@the_idx_topics,@@the_idx_authors={},{}
+argv=$*
+opt=SiSU_commandline::Options.new(argv)
+argv.shift
+#instruct = 'help' if opt.mod.nil? or instruct == ''
+mkdir_p(@env.path.output_md_harvest) unless FileTest.directory?(@env.path.output_md_harvest)
+cases(opt)
+__END__
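harvest.rb hands topic extraction to HARVEST_topics, which is not part of this diff. As a rough illustration of the @topic_register: rules given in the commit message (colon between a level and its sub-level, semi-colon between main topics, pipe for several headings at the lowest sub-level), one possible reading is sketched here. `expand_topic_register` is a hypothetical name, the sample value is invented, and the real module may treat the field differently.

```ruby
# Hypothetical helper, NOT SiSU's HARVEST_topics implementation.
# Expands a topic_register value into one path per lowest-level heading:
#   ';' separates main topics
#   ':' separates a level from its sub-level
#   '|' lists several headings at the lowest sub-level
def expand_topic_register(field)
  field.split(';').map(&:strip).reject(&:empty?).flat_map do |topic|
    levels = topic.split(':').map(&:strip)
    # The last level may carry several pipe-separated headings.
    leaves = (levels.pop || '').split('|').map(&:strip)
    leaves.map { |leaf| levels + [leaf] }
  end
end

# Sample value is an assumption made for illustration.
paths = expand_topic_register('law:contract|tort; publishing:markup')
paths.each { |path| puts path.join(' > ') }
```

Expanding each pipe alternative into its own full path means every heading can be indexed independently, which is the sort of structure a by-topic listing would need.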