
Slide.SDA

Slide Simple Document Archive subproject.

Jakarta-Slide is designed and built to be a library/component of something larger. In and of itself, Jakarta-Slide is not useful to most people who are 'users'. In addition, Jakarta-Slide is rather large, complex, and at times rather abstract, which makes it difficult to implement a solution based on it.

Slide.SDA is a suggested subproject in which an application is built on top of Jakarta-Slide that both shows its immediate usefulness and serves as an example application.

Simple Document Archive - Why Document Archive instead of 'Document Management'? Because if you get 5 people to talk about Document Management, you will get 7 different versions of what 'Document Management' entails, and it will usually entail some type of workflow/sign-off/check-in/check-out that most basic use cases do not need. Document Archive, in my opinion, consists of:

  • Storing a binary file (a TIFF or PDF is the most common case).
  • Storing searchable data related to that file (such as names, client IDs, etc).
  • A graphical UI for a user to search for that file and view it, retrieve it, etc.

Some more detailed back-end explanations include:

  • Digital preservation - keep the data in such a fashion that 10 years from now it can still be easily retrieved (e.g. file-based storage, with the metadata in XML next to the binary file).
  • Allow for archival to and retrieval from CD/DVD.
  • Allow for cross-searching of multiple 'stores' (use case: 10 CDs of data).

Sample Use Case: Fax Images

This use case is based on a company receiving 2,000 faxes a day - enough that you would want a better way to manage and archive the data. Whether it's Sarbanes-Oxley, HIPAA, or some other regulation, you are required to keep every fax you ever get and keep some data related to the fax for easy retrieval. Also, you have an Accounting Department, a Marketing Department, and a couple of others that receive a good portion of these faxes and, rather than handing paper around, would like a better way to search and view the faxes they get (no workflow, just search and view).

Since most fax software (assume HylaFAX, RightFax, Telcom, or a dozen others) allows for exporting a TIFF image and some type of CSV or similar data file, we will proceed on the assumption that the TIFF images and the DATA to go with them already exist.

DATA:

  • faxreceivenum: the number the fax was received from.
  • faxtonum: the number the fax was sent to (or the DID number, or another identifier saying 'go to this department' or 'to this person').
  • company: the company that sent the fax, based on caller ID or a DB/file lookup on faxreceivenum.
  • receivedate: the date the fax came in.
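
As a rough sketch of how the DATA above might be read from a fax software export, the Python below parses a CSV with a header row into one record per fax. The column layout, the YYYY-MM-DD date format, and the extra tiff_path column are assumptions made for illustration; real exports will differ per product.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass(frozen=True)
class FaxRecord:
    """One row of fax DATA, paired with the TIFF it describes."""
    faxreceivenum: str  # number the fax was received from
    faxtonum: str       # number (or DID) the fax was sent to
    company: str        # sender, from caller ID or a lookup
    receivedate: date   # date the fax came in
    tiff_path: str      # hypothetical extra column: location of the image


def read_fax_records(csv_text: str) -> List[FaxRecord]:
    """Parse a CSV export (with header row) into FaxRecord objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append(
            FaxRecord(
                faxreceivenum=row["faxreceivenum"].strip(),
                faxtonum=row["faxtonum"].strip(),
                company=row["company"].strip(),
                # Assumed date format; adjust to whatever the export uses.
                receivedate=datetime.strptime(
                    row["receivedate"].strip(), "%Y-%m-%d"
                ).date(),
                tiff_path=row["tiff_path"].strip(),
            )
        )
    return records
```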

Now, going back to the use case, you want to keep all this DATA with each fax TIFF image. Hmm... what free software out there can already do this, using an open, documented protocol, where neither the protocol nor the software is OS-specific, and keeping the data in a non-proprietary format so that 10 years from now I can get back to it? Jakarta-Slide!

Implementation ideas:

  • File-based XML for the metadata repository; use Lucene to speed up searching.
  • Allow for a 'FAX' Namespace to hold that related data as well as the default 'DAV' for the DASL searching.
  • File-based object storage for the files.
  • Archive a 'store' by Month/Year based on receivedate (JAN04, FEB04, etc) to CD. Maybe sub-CDs for each department if they need their own. 'archive store' = read-only.
  • Allow for searching across multiple stores that are mounted in Slide as a single group (all faxes) or by department (Accounting faxes).
  • Optional: versioning (not useful in this use case, but would be for others).
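
To make the Month/Year store idea a little more concrete, here is a minimal Python sketch that derives a store name such as JAN04 from receivedate and picks the stores that belong to a search group. The "-DEPARTMENT" suffix for per-department stores and all function names are invented for this example; nothing here is Slide configuration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Fixed table so the result does not depend on the system locale.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def store_name(receivedate: date, department: Optional[str] = None) -> str:
    """Return e.g. 'JAN04', or 'JAN04-ACCOUNTING' for a department store."""
    base = f"{MONTHS[receivedate.month - 1]}{receivedate.year % 100:02d}"
    return f"{base}-{department.upper()}" if department else base


def assign_to_stores(
    faxes: Iterable[Tuple[date, Optional[str], str]]
) -> Dict[str, List[str]]:
    """Group (receivedate, department, filename) tuples by target store."""
    stores: Dict[str, List[str]] = defaultdict(list)
    for receivedate, department, filename in faxes:
        stores[store_name(receivedate, department)].append(filename)
    return dict(stores)


def stores_in_group(all_stores: Iterable[str],
                    department: Optional[str] = None) -> List[str]:
    """Stores to search: every store, or only one department's stores."""
    if department is None:
        return sorted(all_stores)
    suffix = "-" + department.upper()
    return sorted(s for s in all_stores if s.endswith(suffix))
```

A search across "all faxes" would then run against stores_in_group(names), while "Accounting faxes" would use stores_in_group(names, "Accounting").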

User requirements for SDA - this is where it really matters. Allow the user to utilize Jakarta Slide in a way that makes sense.

  • For new Namespaces like 'FAX', require a DTD or Schema if you want to restrict/enforce attributes/properties.
  • Export based on range of data to another store (for archive?).
  • The default SDA setup has good default roles, such as 'searcher', 'writer', and 'admin', that can be mapped to Realm-based security.
  • Default web UI for a 'searcher' user that is useful and adapts to namespaces (allows searching in the 'FAX' namespace based on the 'receivedate' attribute defined by the DTD/Schema). Also allow the user to quickly identify the search range, whether a single store or a defined group.
  • Allow for easy addition of existing archived stores (i.e. on CD/DVD, or on remote or local file systems).
  • Allow for easy grouping of stores for cross-search context.
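
A search on 'receivedate' in the 'FAX' namespace could be expressed as a WebDAV SEARCH (DASL basicsearch) request body. The sketch below only builds that XML with Python's standard library. The namespace URI http://example.org/sda/fax and the scope path used in the usage note are placeholders, and whether a given Slide setup accepts exactly this query is not confirmed here.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
FAX_NS = "http://example.org/sda/fax"  # hypothetical URI for the 'FAX' namespace

ET.register_namespace("D", DAV_NS)
ET.register_namespace("FAX", FAX_NS)


def _dav(tag: str) -> str:
    return f"{{{DAV_NS}}}{tag}"


def _fax(tag: str) -> str:
    return f"{{{FAX_NS}}}{tag}"


def build_search(scope_href: str, receivedate: str) -> str:
    """Build a basicsearch body matching faxes received on one date."""
    root = ET.Element(_dav("searchrequest"))
    basic = ET.SubElement(root, _dav("basicsearch"))

    # select: which properties to return for each hit
    select_prop = ET.SubElement(ET.SubElement(basic, _dav("select")), _dav("prop"))
    for name in ("faxreceivenum", "faxtonum", "company", "receivedate"):
        ET.SubElement(select_prop, _fax(name))

    # from: the store (or group root) to search, at any depth
    scope = ET.SubElement(ET.SubElement(basic, _dav("from")), _dav("scope"))
    ET.SubElement(scope, _dav("href")).text = scope_href
    ET.SubElement(scope, _dav("depth")).text = "infinity"

    # where: FAX:receivedate equals the given literal
    eq = ET.SubElement(ET.SubElement(basic, _dav("where")), _dav("eq"))
    ET.SubElement(ET.SubElement(eq, _dav("prop")), _fax("receivedate"))
    ET.SubElement(eq, _dav("literal")).text = receivedate

    return ET.tostring(root, encoding="unicode")
```

For example, build_search("/files/faxes/JAN04", "2004-01-15") would produce the body for searching a single store; pointing scope_href at a parent collection is one way a group of stores might be searched at once.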

User Responsibilities:

  • User is responsible for 'injecting' the binary file and related data into SDA. SDA simply allows them to do so and handles storing and searching for documents. Recommended: the WebDAV protocol, WCK for Java, a sample in .NET, etc.
  • User is responsible for backup/restore of the filesystem where SDA archives data.
  • User is responsible for setting up Realm security (Tomcat/JBoss) and mapping it to roles. However, SDA should have some examples for common use cases (Active Directory, Tomcat-configured users, etc.).
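
As an illustration of what 'injecting' over WebDAV might look like from the user's side, the Python sketch below uploads the TIFF with PUT and then attaches the fax DATA as properties with PROPPATCH. The namespace URI, the function names, and the paths are assumptions for this example; conn is expected to behave like an http.client.HTTPConnection pointed at the SDA server, and authentication is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

DAV_NS = "DAV:"
FAX_NS = "http://example.org/sda/fax"  # hypothetical URI for the 'FAX' namespace

ET.register_namespace("D", DAV_NS)
ET.register_namespace("FAX", FAX_NS)


def build_proppatch(props: Dict[str, str]) -> bytes:
    """Build a PROPPATCH body that sets each item as a FAX property."""
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    for name, value in props.items():
        ET.SubElement(prop_el, f"{{{FAX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="utf-8")


def inject(conn, path: str, tiff_bytes: bytes,
           props: Dict[str, str]) -> Tuple[int, int]:
    """PUT the image, then PROPPATCH its data; return both status codes."""
    conn.request("PUT", path, body=tiff_bytes,
                 headers={"Content-Type": "image/tiff"})
    put_resp = conn.getresponse()
    put_resp.read()  # drain the response so the connection can be reused

    conn.request("PROPPATCH", path, body=build_proppatch(props),
                 headers={"Content-Type": 'application/xml; charset="utf-8"'})
    patch_resp = conn.getresponse()
    patch_resp.read()
    return put_resp.status, patch_resp.status
```

A caller would create the connection itself, for instance with http.client.HTTPConnection and a host and port of its own choosing, and should check the returned status codes before treating the fax as archived.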

Slide.SDA (last edited 2009-09-20 22:02:39 by localhost)