GUI
Since its introduction by SPAD (DECISIA) in the early 1990s, modelling a chain of processing steps as a "stream diagram" has been adopted by many software packages. Consciously or not, most of the major vendors in the data mining industry have reused this notion of visual programming to describe the successive operations applied to data.
TANAGRA follows the same approach, so the interface is classically composed of three parts: the description of the stream diagram, shown as a "treeview" in our project; the set of nodes (operators, components), in the bottom frame; and finally the results report, in HTML format.
Data access
The data source is accessed and analysed when a new diagram is defined. The data are loaded into memory after being encoded internally. Speed is a design goal of the import step.
At present, only tab-separated text files can be imported, whether of UNIX or DOS origin. Variable names appear on the first line; the type of each variable (discrete or continuous) is deduced from the next line.
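The import convention described above can be sketched in a few lines of Python. This is only an illustration, not TANAGRA's actual importer: the function names `infer_type` and `load_tab_text` are hypothetical, and the typing rule (a value that parses as a number is continuous, anything else is discrete) is an assumption, since the text only says that the type is deduced from the second line.

```python
def infer_type(value):
    """Guess a variable type from one sample value (assumed rule)."""
    try:
        float(value)
        return "continuous"
    except ValueError:
        return "discrete"


def load_tab_text(text):
    """Parse tab-separated text: names on line 1, types inferred from line 2."""
    # splitlines() accepts both UNIX and DOS line endings.
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("need a header line and at least one data line")
    names = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    # Only the first data line is inspected, as in the description above.
    types = [infer_type(value) for value in rows[0]]
    return names, types, rows


# Hypothetical sample mixing UNIX and DOS line endings.
sample = "age\tcolor\n23\tred\r\n31.5\tblue\r\n"
names, types, rows = load_tab_text(sample)
print(list(zip(names, types)))
```

Inspecting a single line keeps the import fast, at the cost of misclassifying a variable whose first value is not representative.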
Operators (components)
The bottom frame contains the data mining operators (also called icons, nodes or components). All of them take data as input, perform analyses and produce results; however, only a few of them make predictions. In that case, one or more variables are added to the data, which are then passed on to the following operator.
Operators are arranged in categories. Some of these categories are commonly accepted (description / structuring / explanation-prediction / association, for example); others are more debatable. In practice, one underlying constraint was to avoid having too many categories of methods...
Data mining diagram
As in other software in this domain, the data miner can define an analysis by starting with the data and adding operators one after another. To test various assumptions and compare the results obtained, the user can explore several branches of the diagram.
Choosing a tree structure (treeview) makes diagrams easier to manage, both at the programming level and at the end-user level. Complex analyses can thus be easily represented and carried out. On the other hand, it is not possible to merge branches in the analysis diagram, as some other graphical software allows. For example, it is not possible to automatically combine several data sources.
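A tree-shaped diagram of this kind can be sketched as follows. The `Node` class and the three toy operators are hypothetical and are not taken from TANAGRA's source code; the sketch only assumes what the text states, namely that each operator receives data, produces a report, may add variables, and passes the data on to the operators below it.

```python
class Node:
    """One operator in the diagram; it has a single parent, so branches never merge."""

    def __init__(self, name, operator):
        self.name = name
        self.operator = operator  # callable: data -> (report, new_columns)
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def run(self, data, reports=None):
        reports = [] if reports is None else reports
        report, new_columns = self.operator(data)
        reports.append((self.name, report))
        # Variables added by this operator are visible only to its own subtree.
        augmented = {**data, **new_columns}
        for child in self.children:
            child.run(augmented, reports)
        return reports


def dataset(data):
    return f"{len(data)} variable(s)", {}


def threshold_rule(data):
    # Toy "predictor": adds one variable to the data.
    pred = ["high" if x > 25 else "low" for x in data["age"]]
    return "rule: age > 25", {"pred": pred}


def describe(data):
    return "variables: " + ", ".join(sorted(data)), {}


root = Node("Dataset", dataset)
branch_a = root.add(Node("Threshold rule", threshold_rule))
branch_a.add(Node("Describe A", describe))
root.add(Node("Describe B", describe))

for name, report in root.run({"age": [23, 31.5]}):
    print(name, "->", report)
```

Because every node has a single parent, running the diagram is a plain depth-first walk. The price is the limitation noted above: two branches cannot be joined back together.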
Results
Most often, Tanagra operators produce output in HTML format. It is therefore simple to export the results to editing software, such as EXCEL, for subsequent processing.
Generally, the output is composed of two parts: a description of the analysis parameters, and the results.
Choosing the HTML format also has the advantage that results are easy to export, so that they can be viewed after the software has been closed, or printed.
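The two-part report layout can be illustrated with a short Python sketch. The function `render_report`, the headings and the sample values are all hypothetical; the actual markup produced by Tanagra is not described in this text.

```python
from html import escape


def render_report(title, parameters, results):
    """Build a two-part HTML report: parameters first, then results."""

    def table(rows):
        cells = "".join(
            f"<tr><td>{escape(str(key))}</td><td>{escape(str(value))}</td></tr>"
            for key, value in rows.items()
        )
        return f"<table>{cells}</table>"

    return (
        f"<html><body><h1>{escape(title)}</h1>"
        f"<h2>Parameters</h2>{table(parameters)}"
        f"<h2>Results</h2>{table(results)}"
        "</body></html>"
    )


# Illustrative values only; they do not come from a real analysis.
report = render_report(
    "Threshold rule",
    {"input variable": "age", "cut-off": 25},
    {"rows processed": 2},
)
print(report)
```

Since the report is an ordinary HTML string, it can be saved to a file and opened later in a browser, or printed, without the software that produced it.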
When necessary, results can be displayed in a window with which the user can interact. This is the case for the "X-Y graph" operator: the user can use the mouse to select the variables used for the horizontal and vertical axes, in order to better understand how the points are distributed.