Processing Data Streams

Toon Calders

Abstract:

Sometimes data is generated without bound and at such a fast pace that it is no longer possible to store all of it in a database. Developing techniques for handling and processing such data streams is very challenging, as the streaming context imposes severe constraints on the computation:

  • We are often unable to store the whole data stream, and making multiple passes over the data is no longer possible.
  • As the stream is never finished, we need to be able to provide, upon request and at any time, up-to-date answers to analysis queries.

Even problems that are trivial in an off-line context, such as “How many different items are there in my database?”, become very hard in a streaming context.
Nevertheless, over the past decades several clever algorithms have been developed to deal with streaming data. This talk covers several of these indispensable tools, which should be present in every big data scientist’s toolbox.
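
To make the distinct-count question concrete, here is a minimal sketch of one well-known approach, the k-minimum-values estimator. It is an illustration only and not necessarily one of the algorithms covered in the talk. It reads the stream once and keeps only the k smallest hash values seen so far. The class name KMVDistinctCounter, the parameter k, and the use of SHA-1 as the hash function are assumptions made for this example.

```python
import hashlib
import heapq


class KMVDistinctCounter:
    """Approximate distinct count in one pass using a k-minimum-values sketch.

    Memory use is bounded by k hash values, regardless of stream length.
    """

    HASH_RANGE = 2 ** 64  # hash values are integers in [0, 2**64)

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes: a max-heap of the k smallest
        self._members = set()  # hash values currently stored in the heap

    @staticmethod
    def _hash(item):
        # Deterministic 64-bit hash (assumption: SHA-1 is "random enough").
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash(item)
        if h in self._members:
            return  # this hash is already among the k smallest
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # h is smaller than the largest stored hash: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        # Can be called at any time, matching the "answers upon request" need.
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct hashes: exact count
        kth_smallest = max(-self._heap[0], 1)  # guard against division by zero
        return (self.k - 1) * self.HASH_RANGE // kth_smallest


if __name__ == "__main__":
    counter = KMVDistinctCounter(k=1024)
    for i in range(100_000):
        counter.add(i % 10_000)  # 10,000 different items, each seen 10 times
    print("estimated distinct items:", counter.estimate())
```

The estimate is approximate: its relative error shrinks roughly as 1/sqrt(k), so a larger k buys accuracy at the cost of memory.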

Biography:

Toon Calders graduated in 1999 from the University of Antwerp with a diploma in Mathematics. He received his PhD in Computer Science from the same university in May 2003, in the database research group ADReM, and continued working in the ADReM group as a postdoc until 2006. From 2006 until 2012 he was an assistant professor in the Information Systems group at the Eindhoven University of Technology. In 2012 he joined the CoDE department at the ULB as a “Chargé de Cours” (associate professor). His main research interests include data mining and machine learning. Toon Calders has published over 60 conference and journal papers in this research area and has received several scientific awards for his work, including the recent “10 Year most influential paper” award for papers published in ECML/PKDD 2002. He regularly serves on the program committees of important data mining conferences, including ACM SIGKDD, IEEE ICDM, ECML/PKDD, and SIAM DM. He was conference chair of the BNAIC 2009, EDM 2011, and ECML/PKDD 2014 conferences, and is an editor for the Springer Data Mining journal.