2014-04-16

Gregor Štiglic: Comprehensible Predictive Modelling in Healthcare

With the growing acceptance of electronic health records, interest in applying data mining approaches in this field is increasing. One of the challenges where predictive modelling can play an important role is the prediction of hospital readmissions shortly after a patient has been discharged. In some cases, readmission classification at the patient level is not enough, because national-level regulations demand tracking of readmissions at the hospital level. Additionally, the structure of the collected data is constantly changing in terms of variable coding or changes in medical disease or procedure classifications. One should also not forget the issue of data privacy, which limits sample sizes and thereby the performance of the developed predictive models. This talk will address the above-mentioned issues and limitations through an overview of our work in hospital readmission classification and visualization of trends from large hospital discharge datasets.

Gregor Štiglic is an Associate Professor and Head of Research Institute at the Faculty of Health Sciences, University of Maribor (FHS UM). He worked as a Visiting Researcher at the Data Analysis and Biomedical Analytics (DABI) Center at Temple University (2012) and as a Visiting Assistant Professor at Shah Lab, Stanford School of Medicine (2013). His research interests encompass the application of knowledge discovery and predictive modelling techniques to large healthcare datasets. Specific areas of his technical interest include comprehensibility of classifiers, human-interaction-based classification, stability of feature selection algorithms, meta-learning and longitudinal rule discovery. His work has been published in multiple conferences, journals and book chapters. Gregor has given talks on his knowledge discovery in healthcare and bioinformatics research at renowned research institutions such as IBM Watson Research Center, Stanford University and University of Tokyo. He co-organized the 2nd and 3rd Workshops on Data Mining for Medicine and Healthcare, held in conjunction with the SIAM International Conference on Data Mining. Currently, he serves as a guest co-editor for the Special Issue on Data Mining in Medicine and Healthcare in Springer’s Data Mining and Knowledge Discovery journal.

IJS, Orange Room, May 15th 2014, 13:00

2013-11-26

Martin Žnidaršič, IJS: Sampling large pattern statistics in large graphs

I will present an approach to approximate estimation of statistics (e.g. frequencies) of patterns in graphs, which is also suitable for large patterns with many embeddings. Although initially motivated by a task in statistical relational learning, this method is potentially useful in various situations. Theoretical characteristics and limitations of the approach will be presented, along with the results and practical insights gained from experimental evaluation.

The presentation is based on joint work with Jan Ramon and Jesse Davis from KU Leuven.
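
To give a rough idea of what estimating a pattern statistic by sampling means, here is a minimal sketch that estimates the number of triangles in a small graph by drawing random vertex triples. It is a generic Monte Carlo illustration and not the method presented in the talk; the function name `estimate_triangles`, the adjacency-dictionary representation and the fixed seed are assumptions made for this example.

```python
import random
from math import comb


def estimate_triangles(adjacency, samples, seed=0):
    """Estimate the triangle count of an undirected graph by sampling vertex triples.

    `adjacency` maps each vertex to the set of its neighbours (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    vertices = list(adjacency)
    hits = 0
    for _ in range(samples):
        # Draw three distinct vertices uniformly at random.
        a, b, c = rng.sample(vertices, 3)
        # The triple is an embedding of the pattern if all three edges exist.
        if b in adjacency[a] and c in adjacency[a] and c in adjacency[b]:
            hits += 1
    # Scale the observed hit rate by the number of possible triples.
    return hits / samples * comb(len(vertices), 3)


# A complete graph on 5 vertices: every triple of vertices forms a triangle.
complete_graph = {v: set(range(5)) - {v} for v in range(5)}
# A graph with 5 vertices and no edges: no triple forms a triangle.
empty_graph = {v: set() for v in range(5)}

complete_estimate = estimate_triangles(complete_graph, samples=200)
empty_estimate = estimate_triangles(empty_graph, samples=200)
```

For rare patterns in large graphs, uniform sampling of this kind needs very many samples before it sees a single embedding, which is one reason more refined sampling schemes are of interest.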

Thursday, 28.11., 13:00, Orange room

2013-10-23

Aljaž Osojnik: Modeling dynamical systems with data stream mining


The presentation introduces the task of modeling dynamical systems in discrete time using regression trees, model trees, option trees and ensembles. Some challenges of mining data streams are discussed. The FIMT-DD algorithm for mining data streams with regression or model trees is introduced, and several other FIMT-DD-based algorithms are described, namely ORTO, which uses option trees, and OBag and ORF, both of which learn an ensemble of model trees. Then the experimental setup and the dynamical-system datasets are described. In the last part, the experimental results are presented and the described algorithms are compared.

Thursday, 12:00, Orange Room

2013-05-18

Discussion: ClowdFlows and MUSE


The workflow construction and execution environment ClowdFlows will be used in the EU MUSE project (Machine Understanding for interactive StorytElling), which brings with it questions of interoperability of data formats, including third-party tools, and better organisation of the widget library.
This brown bag seminar is meant as a discussion forum for these issues, focusing on MUSE-produced HLT tools.

Tuesday, 21.5.2013, 13:00, Orange room

2012-10-03

Petra Kralj Novak and Anže Vavpetič, JSI: Risk forecasting analysis

Within the First and FOC projects, news articles from 305 news sites (3457 RSS feeds) have been collected, processed and semantically annotated for the period of one year, thus resulting in a rich playground for data miners.

In this brown bag seminar, we will present the data and the results of the experiments we have done so far. We are looking for interesting (and feasible) problem formulations that bring us closer to predicting financial systemic risks.

Thursday, 4.10.2012, E7 Meeting room

2012-06-20

Darko Čerepnalkoski, IJS: The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems

I will present a journal publication that has just been accepted. I will talk about ProBMoT, a tool for automated modeling of dynamical systems that addresses both structure identification and parameter estimation.

It takes into account domain knowledge formalized as templates for components of the process-based models: entities and processes. Taking a conceptual model of the system, the library of domain knowledge, and measurements of a particular dynamical system, it identifies both the structure and numerical parameters of the appropriate process-based model. ProBMoT has two main components corresponding to the two subtasks of modeling. The first component is concerned with generating candidate model structures that adhere to the conceptual model specified as input. The second subsystem uses the measured data to find suitable values for the constant parameters of a given model by using parameter estimation methods. ProBMoT uses model error to rank model structures and select the one that fits measured data best.

I will also present an analysis of how the choice of parameter estimation method influences structure identification. I will discuss one local (derivative-based) and one global (meta-heuristic) parameter estimation method. As opposed to other comparative studies of parameter estimation methods, which focus on identifying the parameters of a single model structure, this presentation will compare the parameter estimation methods in the context of repeated parameter estimation for a number of candidate model structures.

The results confirm the superiority of the global optimization methods over the local ones in the context of structure identification.
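
The difference between local and global parameter estimation can be illustrated with a toy example. The sketch below is not ProBMoT and does not use the methods compared in the talk: the error surface `model_error` is hypothetical, plain gradient descent stands in for a derivative-based local method, and seeded random search stands in for a global meta-heuristic.

```python
import random


def model_error(p):
    """Hypothetical error surface: local minimum near p = 1, global minimum near p = -1."""
    return (p * p - 1.0) ** 2 + 0.3 * p


def error_gradient(p):
    """Derivative of model_error with respect to the parameter p."""
    return 4.0 * p * (p * p - 1.0) + 0.3


def local_fit(start, step=0.01, iterations=2000):
    """Gradient descent: follows the slope downhill from the starting point."""
    p = start
    for _ in range(iterations):
        p -= step * error_gradient(p)
    return p


def global_fit(low, high, samples=2000, seed=0):
    """Random search: evaluates many points across the whole interval and keeps the best."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    candidates = [rng.uniform(low, high) for _ in range(samples)]
    return min(candidates, key=model_error)


local_p = local_fit(start=2.0)    # settles in the nearby local minimum
global_p = global_fit(-2.0, 2.0)  # finds the deeper minimum on the other side

print(f"local:  p = {local_p:.3f}, error = {model_error(local_p):.3f}")
print(f"global: p = {global_p:.3f}, error = {model_error(global_p):.3f}")
```

When model structures are ranked by the error reached after fitting, a structure whose parameters were fitted only to a local minimum may be ranked worse than it deserves, which is why the choice of estimation method can affect structure selection.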

Thursday, 21.6.2012, 13:00, Orange room, IJS

2012-04-10

Janez Kranjc, IJS: Cloud data mining: constructing and executing data mining workflows using a service oriented web application

I will present a freshly launched browser-based platform for construction and execution of data mining workflows with an interface similar to those of Orange, RapidMiner, and Weka. What differentiates my platform from the aforementioned is its cloud computing nature: workflows are executed by the server and may utilize (but are not limited to) web services. The non-local nature of the workflows and experiments also makes workflows easier to share, as no installation is required (apart from a semi-decent web browser).
The presentation will include a brief description of the development of the platform, a use case for novice users, and an example of an advanced user adding features that suit their own needs.

Friday, 13.4.2012, 14:00, Orange room (note the different time!)

2012-03-30

Matic Perovšek, IJS: Visual divisive hierarchical clustering using k-means

In this presentation I will show Vdhcuk, a browser-based semi-automatic taxonomy construction tool similar to Ontogen. The Vdhcuk system incorporates text and data mining algorithms into a user-friendly ontology construction interface. The main features of the presented system are that it is browser-based, works with both textual and numerical data, and uses unsupervised learning for concept suggestion and visualization.
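
The idea named in the title, divisive hierarchical clustering using k-means, can be sketched as follows: split the data into two clusters with k-means (k = 2), then split each cluster again until the clusters are small enough. This is only an illustration under stated assumptions and not the Vdhcuk implementation; the function names, the deterministic seeding of centroids and the `min_size` stopping rule are all hypothetical.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def two_means(points, iterations=20):
    """Split points into two groups with k-means (k = 2)."""
    # Seed the centroids with two far-apart points so the run is repeatable.
    c1 = max(points, key=lambda p: dist(p, points[0]))
    c0 = max(points, key=lambda p: dist(p, c1))
    left, right = list(points), []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        left = [p for p in points if dist(p, c0) <= dist(p, c1)]
        right = [p for p in points if dist(p, c0) > dist(p, c1)]
        if not left or not right:
            break
        # Update step: move each centroid to the mean of its group.
        c0, c1 = centroid(left), centroid(right)
    return left, right


def divisive_tree(points, min_size=3):
    """Recursively bisect the data, returning a nested dictionary (the taxonomy)."""
    node = {"points": points, "children": []}
    if len(points) > min_size:
        left, right = two_means(points)
        if left and right:
            node["children"] = [divisive_tree(left, min_size),
                                divisive_tree(right, min_size)]
    return node


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = divisive_tree(data)
```

In an interactive tool, each node of such a tree would be a suggested concept that the user can accept, rename or split further.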

Thursday, 5.4.2012, 13:00, Orange room

2012-03-13

Biljana Mileva Boshkoska: Applications of qualitative option ranking with copulas

Copulas are joint cumulative distribution functions with uniform marginals. They have been widely used in finance, hydrology and biology, and have recently gained popularity in machine learning. In this presentation I will show how copulas may be used for decision making, in particular for multi-attribute qualitative option ranking. First I will give a short introduction to copulas, then I will explain the process of performing regression using copulas, and finally I will demonstrate their applicability for option ranking on two real examples:
1. Ranking of workflows;
2. Ranking of 840 real case EC motors.
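
As a small illustration of what a copula is, the sketch below builds a joint distribution function from two marginals and a copula, F(x, y) = C(F_X(x), F_Y(y)). The abstract does not say which copulas or marginals are used in the talk, so the Clayton copula, the standard normal marginals and the parameter value are assumptions chosen only for the example; the regression and ranking procedure itself is not shown.

```python
from statistics import NormalDist


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for 0 < u, v <= 1 and theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_cdf(x, y, theta):
    """Joint CDF built from standard normal marginals coupled by a Clayton copula."""
    marginal = NormalDist()  # assumption: both attributes are standard normal
    return clayton_copula(marginal.cdf(x), marginal.cdf(y), theta)


# With v = 1 the copula returns the other marginal unchanged.
boundary = clayton_copula(0.3, 1.0, theta=2.0)

# Positive dependence: the joint probability exceeds the independent product 0.25.
dependent = joint_cdf(0.0, 0.0, theta=2.0)
```

The copula carries only the dependence between the attributes, so the marginals can be swapped without changing the dependence structure.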

Thursday, 15.3.2012 13:00, Orange room

2012-02-15

Dragana Miljković, IJS: Constructing signalling network topology for modelling plant-virus interaction

The presentation will address my work that will be submitted as a journal publication. It addresses the issue of developing a topology for a global defence response model in plants. No such model exists yet, even though biologists have been motivated to develop one for decades. As experimental data is still lacking in this research field, we started to build the model topology of plant-virus interaction manually, by knowledge elicitation from biology experts and the literature. To accelerate this time-consuming approach, we additionally employed an automatic method to extract information from the biological literature. To sum up, this work results in two major contributions to the scientific community:

1. A workflow that extracts the relations between compounds from biological text in the form of triplets: .
2. An augmented topology of the plant defence response model, obtained by combining the manual and automatic approaches.

Thursday, 16.2.2012 13:00, Orange room