Save formats

Two things, among others, were really annoying in the "old" Sipina: (a) there was no way to keep track of the analyses you were working on; (b) even if you had written them down on paper, you always had to re-create them by selecting the right commands in the same order.

The introduction of stream diagrams to describe the successive processing steps, together with a new save format, solves both problems: we can now save the chain of analyses we have built, along with the associated parameters, and then, after loading the diagram, run it again with a single mouse click.

Only the diagram description (the program, in other words) is saved; the results are not saved at all. Two formats are available, and they meet different requirements.

Binary description of the stream diagram (*.bdm)

With this format, the imported data are included in the saved file. The resulting file can only be used by TANAGRA.

The main advantage is that, since the data are imported once and only once, loading the diagram into memory at the next execution is very quick. For example, with the COVTYPE file, containing 581,102 records with 55 attributes, loading takes 2 seconds (Pentium IV, 1.5 GHz, 256 MB).

On the other hand, the main drawback is that the analyses composing the diagram are permanently tied to the imported data. So if the data change, for example because records are added, the data must be imported again and the diagram must be redefined.

So prefer this format if: (a) the data are not expected to change any more; (b) loading time is an important factor in your work.

Textual description of the stream diagram (*.tdm)

This format, based on the Windows INI file format, describes the analyses composing the diagram in a text file. The file can therefore be opened and examined in any text editor.

Example of textual diagram file (*.tdm) on IRIS data
[Diagram]
Title=Default title
Database=D:\DataMining\Databases_for_mining\Iris\iris.txt

[Dataset]
MLClassGenerator=TMLGenDataset
successors=1
succ_1=Define status 1

[Define status 1]
MLClassGenerator=TMLGenFSDefStatus
target_count=0
input_count=4
input_1=sep_length
input_2=sep_width
input_3=pet_length
input_4=pet_width
illus_count=0
successors=1
succ_1=Principal Component Analysis 1

[Principal Component Analysis 1]
MLClassGenerator=TMLGenCompFactPCA
nb_axis=2
successors=0

There are two main advantages: (a) the saved file contains only a reference to the data, so if the data change, the next execution works on the new version and produces updated results; (b) the file follows the INI format, so new diagrams can be defined without opening Tanagra.
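As an illustration of point (b), here is a minimal sketch that builds the IRIS diagram from the example with Python's standard configparser module, writes it out as text, then reads it back and follows the chain of operators. The helper names (new_parser, build_diagram, to_text, chain) are hypothetical. That Tanagra accepts a file produced this way is an assumption, since it relies on a generic INI writer rather than Tanagra's own code.

```python
import configparser
import io


def new_parser():
    # Generic INI parser; compatibility with Tanagra's reader is assumed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "MLClassGenerator"
    return parser


def build_diagram(database, inputs, nb_axis=2):
    """Build the Dataset -> Define status -> PCA chain from the example."""
    parser = new_parser()
    parser["Diagram"] = {"Title": "Default title", "Database": database}
    parser["Dataset"] = {
        "MLClassGenerator": "TMLGenDataset",
        "successors": "1",
        "succ_1": "Define status 1",
    }
    status = {
        "MLClassGenerator": "TMLGenFSDefStatus",
        "target_count": "0",
        "input_count": str(len(inputs)),
    }
    for i, name in enumerate(inputs, start=1):
        status[f"input_{i}"] = name
    status["illus_count"] = "0"
    status["successors"] = "1"
    status["succ_1"] = "Principal Component Analysis 1"
    parser["Define status 1"] = status
    parser["Principal Component Analysis 1"] = {
        "MLClassGenerator": "TMLGenCompFactPCA",
        "nb_axis": str(nb_axis),
        "successors": "0",
    }
    return parser


def to_text(parser):
    buffer = io.StringIO()
    # Write "key=value" without surrounding spaces, as in the example file.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


def chain(parser, start="Dataset"):
    """Follow succ_1 links from the start section (linear, acyclic chains only)."""
    order = [start]
    while parser.getint(order[-1], "successors") > 0:
        order.append(parser.get(order[-1], "succ_1"))
    return order


diagram = build_diagram(
    r"D:\DataMining\Databases_for_mining\Iris\iris.txt",
    ["sep_length", "sep_width", "pet_length", "pet_width"],
)
text = to_text(diagram)

reloaded = new_parser()
reloaded.read_string(text)
print(" -> ".join(chain(reloaded)))
# prints: Dataset -> Define status 1 -> Principal Component Analysis 1
```

Saving the value of text to a file with the .tdm extension would give a diagram description with the same sections and keys as the example. Only the data path and the list of input attributes need to change to target another dataset.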

One may also see the tdm file as a sort of script describing the operations to perform. However, that is not yet its purpose, so no checks on the structure are implemented. One possible extension would be to use XML files to describe the diagrams. The XML tree structure is well suited to this, and with a well-defined DTD we could validate the described diagrams without loading them in Tanagra (if someone is interested...).
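Short of an XML schema, some structural checks can already be done outside Tanagra on the INI form. The sketch below is only an illustration: the rules it enforces (every operator section has an MLClassGenerator key and a successors count, and every succ_i names an existing section) are inferred from the IRIS example, not taken from a specification. The function name check_diagram and the data path iris.txt are hypothetical.

```python
import configparser


def check_diagram(text):
    """Return a list of structural problems found in a .tdm description."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "MLClassGenerator"
    parser.read_string(text)
    problems = []
    for name in parser.sections():
        if name == "Diagram":  # header section, not an operator
            continue
        section = parser[name]
        if "MLClassGenerator" not in section:
            problems.append(f"[{name}] has no MLClassGenerator key")
        try:
            count = int(section.get("successors", fallback=""))
        except ValueError:
            problems.append(f"[{name}] has no valid successors count")
            continue
        for i in range(1, count + 1):
            target = section.get(f"succ_{i}")
            if target is None:
                problems.append(f"[{name}] is missing succ_{i}")
            elif not parser.has_section(target):
                problems.append(f"[{name}] points to unknown section [{target}]")
    return problems


# A deliberately incomplete diagram: the successor section is not defined.
BROKEN = """\
[Diagram]
Title=Default title
Database=iris.txt

[Dataset]
MLClassGenerator=TMLGenDataset
successors=1
succ_1=Define status 1
"""

for problem in check_diagram(BROKEN):
    print(problem)
# prints: [Dataset] points to unknown section [Define status 1]
```

An empty list means that no problem was detected under these assumed rules. It does not guarantee that Tanagra will load the diagram.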

The main drawback of this save format is the need to import the data each time you execute the stream diagram. To continue with the COVTYPE example, importing the data takes 4 minutes before any of the operators can be executed.


Last modification : January 21st, 2004.