Laboratory publications
AIGLE

(117) Publication(s) by BOUSSAID O.

|
|
ETL-Text: Extract-Transform-Load Processes for Textual Data Warehousing
Author(s): AKNOUCHE R., ASFARI O., BENTAYEB F., BOUSSAID O.
Conference proceedings: Conference: EPIA 2013 (16th Portuguese Conference on Artificial Intelligence) (, PT, 2013-09-09)
Published: Proceedings. Lecture Notes in Computer Science (LNCS), Springer, vol. (2013) p.to appear
Abstract: The construction of the ETL (Extract-Transform-Load) process is one of the biggest tasks in building a warehouse. ETL processes have received little research attention, because of their difficulty and the lack of a formal model for representing the ETL activities that map incoming data from different sources into a format suitable for loading into the warehouse. A main problem in data warehousing of multidimensional text databases is dealing with the content of the text cells. In this paper, we propose a model for textual data warehouse ETL processes called ETL-Text. It combines classical data warehousing tasks, information retrieval (IR) techniques, and information processing, in particular language modeling. Our approach uses Wikipedia as an external knowledge source to extract the semantics of the textual documents. To validate our approach, we developed a prototype composed of several processing modules that illustrate the different ETL-Text processes. We also use the 20 Newsgroups corpus for our experiments.
|

|
|
Cube de textes et opérateur d'agrégation basé sur un modèle vectoriel adapté [Text cube and aggregation operator based on an adapted vector space model]
Author(s): Oukid Lamia, ASFARI O., BENTAYEB F., BOUSSAID O.
Conference proceedings: Conference: 9èmes Journées Francophones sur les Entrepôts de données et Analyse en Ligne (Blois, FR, 2013-06-13)
Published: RNTI, vol. (2013) p.to appear
Abstract: Data warehousing and OLAP (On-Line Analytical Processing) technologies have largely proven themselves for the analysis of structured data, but they are ill-suited to the analysis of textual data, for lack of appropriate tools and methods. In this paper, we propose a text cube model named TCube, which includes several semantic dimensions, to better handle the semantics of textual data. The attributes of each semantic dimension are grouped in a concept hierarchy extracted from a domain ontology used as an external resource. Our text cube includes a textual analysis measure that relies on a vector space model adapted to OLAP analysis and on a relevance propagation technique. It is also associated with a new aggregation operator for aggregating textual data in an OLAP environment. The preliminary results of our experimental study show the value of our approach.
|

|
|
Les entrepôts de données pour les nuls... ou pas ! [Data warehouses for dummies... or not!]
Author(s): FAVRE C., BENTAYEB F., BOUSSAID O., DARMONT J., GAVIN G., HARBI N., KABACHI N., LOUDCHER S.
Conference proceedings: Conference: 2ème Atelier aIde à la Décision à tous les Etages (EGC/AIDE 13) (Toulouse, FR, 2013-01-29)
Published: 2ème Atelier aIde à la Décision à tous les Etages (EGC/AIDE 13), vol. (2013) p.-
Ref HAL: hal-00783638_v1
Abstract: In this paper, we look at decision support from the standpoint of decision-support systems, in the sense of data warehouses and on-line analysis. After defining the concepts underlying these systems, we address the related research issues from four points of view: data, storage environments, users, and security. Finally, we discuss the problems that remain open in the field of data warehouses.
|

|
|
Community Extraction based on Topic-Driven-Model for Clustering Users Tweets
Author(s): Hannachi Lilia, ASFARI O., BENTAYEB F., KABACHI N., BOUSSAID O.
Conference proceedings: Conference: The 8th International Conference on Advanced Data Mining and Applications (ADMA 2012) (Nanjing, CN, 2012-12-15)
Published: Springer, Lecture Notes in Artificial Intelligence (LNAI), vol. (2012) p.39-51
Abstract: Twitter has become a significant means by which people communicate with the world and describe their current activities, opinions, and status in short text snippets. Tweets can be analyzed automatically to derive potentially useful information such as interesting topics, social influence, and user communities. Extracting communities within social networks has been a focus of recent work in several areas. Unlike most community discovery methods, which focus on the relations between users, we aim to derive user communities based on the common topics in users' tweets. For instance, if two users always talk about politics in their tweets, they can be grouped in the same community, which is related to the politics topic. To achieve this goal, we propose a new approach called CETD: Community Extraction based on Topic-Driven-Model. This approach combines our proposed model, which detects the topics of users' tweets based on a semantic taxonomy, with a community extraction method based on hierarchical clustering. Our experiments with the proposed approach show the relevance of the user communities extracted based on their common topics and domains.
|

|
|
Particle swarm optimisation for data warehouse logical design 
Author(s): Hacène Derrar, Mohamed Ahmed-Nacer, BOUSSAID O.
(Article) Published:
International Journal of Bio-Inspired Computation, vol. (2012) p.In press
Ref HAL: hal-00712141_v1
Abstract: Particle swarm optimisation for data warehouse logical design
|

|
|
Active XML-based Web Data Integration 
Author(s): SALEM R., DARMONT J., BOUSSAID O.
(Article) Accepted:
Information Systems Frontiers, vol. (2013) p.In press
Ref HAL: hal-00712547_v1
DOI: 10.1007/s10796-012-9405-6
Abstract: Today, the Web is the largest source of information in the world. There is currently an increasing demand that decision-making applications such as Data Warehousing (DW) and Business Intelligence (BI) move onto the Web, especially in the cloud. Integrating data into DW/BI applications is still the most critical and time-consuming task. To support better decisions in DW/BI applications, next-generation data integration poses new requirements on data integration systems, beyond those posed by traditional data integration. Among these requirements is the need to integrate data in real time and autonomously, with minimal user intervention.
Therefore, we propose in this paper a metadata-based, event-driven, and service-oriented framework for integrating real-time data autonomously. The framework uses Web standards to integrate data from heterogeneous and distributed sources. It exploits the XML language to address data heterogeneity, Web services to tackle data distribution, and Active XML (AXML) to store integrated data. The framework also subscribes to Web services for real-time change data capture. Moreover, besides logging the different framework events into a specified repository for on-line analysis and reporting, we propose a novel Frequency XML-based Tree (FXT) structure for mining association rules from logged event streams using XQuery. The framework also incorporates active rules to automate and activate different integration services. Finally, we provide a web application prototype as a proof of concept.
Comments: Special issue on Business Intelligence and the Web
|

|
|
Spatial OLAP and Map Generalization: Model and Algebra
Author(s): Bimonte Sandro, Bertolotto Michela, Gensel Jérôme, BOUSSAID O.
(Article) Published:
International Journal of Data Warehousing and Mining, vol. 8 (2012) p.52-92
Abstract: Map generalization can be used as a central component of Spatial Decision Support Systems to provide a simplified and more readable cartographic visualization of geographic information. Indeed, it supports the user's mental process of discovering important and unknown geospatial relations, trends, and patterns. Spatial OLAP (SOLAP) integrates spatial data into OLAP and data warehouse systems. SOLAP models and tools are based on the concepts of spatial dimensions and measures, which represent the axes and the subjects of the spatio-multidimensional analysis. Although powerful in some respects, current SOLAP models cannot support map generalization capabilities. This paper provides the first effort to integrate Map Generalization and OLAP. First, the authors define the modeling and querying requirements for this integration, and then present a SOLAP model and algebra that support map generalization concepts. The approach extends SOLAP spatial hierarchies by introducing multi-association relationships, supports imprecise measures, and takes into account the spatial dimension constraints generated by multiple map generalization hierarchies.
|