2013-05-18
Discussion: ClowdFlows and MUSE
The workflow construction and execution environment ClowdFlows will be used in the EU MUSE project (Machine Understanding for interactive StorytElling), which brings with it questions of interoperability of data formats, including third-party tools, and better organisation of the widget library.
This brown bag seminar is meant as a discussion forum for these issues, focusing on MUSE-produced HLT tools.
Tuesday, 21.5.2013, 13:00, Orange room
2012-10-03
Petra Kralj Novak and Anže Vavpetič, JSI: Risk forecasting analysis
Within the First and FOC projects, news articles from 305 news sites (3457 RSS feeds) have been collected, processed and semantically annotated for the period of one year, thus resulting in a rich playground for data miners.
In this brown bag seminar, we will present the data and the results of the experiments we have done so far. We are looking for interesting (and feasible) problem formulations that would bring us closer to predicting financial systemic risks.
Thursday, 4.10.2012, E7 Meeting room
2012-06-20
Darko Čerepnalkoski, IJS: The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems
I will present a journal publication that has just been accepted. I will talk about ProBMoT, a tool for automated modeling of dynamical systems that addresses both structure identification and parameter estimation.
It takes into account domain knowledge formalized as templates for components of the process-based models: entities and processes. Taking a conceptual model of the system, the library of domain knowledge, and measurements of a particular dynamical system, it identifies both the structure and numerical parameters of the appropriate process-based model. ProBMoT has two main components corresponding to the two subtasks of modeling. The first component is concerned with generating candidate model structures that adhere to the conceptual model specified as input. The second subsystem uses the measured data to find suitable values for the constant parameters of a given model by using parameter estimation methods. ProBMoT uses model error to rank model structures and select the one that fits measured data best.
I will also present the analysis of the influence of the selection of the parameter estimation methods on the structure identification. I will discuss one local (derivative-based) and one global (meta-heuristic) parameter estimation method. As opposed to other comparative studies of parameter estimation methods that focus on identifying parameters of a single model structure, this presentation will compare the parameter estimation methods in the context of repetitive parameter estimation for a number of candidate model structures.
The results confirm the superiority of the global optimization methods over the local ones in the context of structure identification.
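The two subtasks described above (generating candidate structures, then fitting parameters and ranking by model error) can be illustrated with a minimal sketch. This is not ProBMoT's code or API: the two candidate structures, the Euler integrator, and the uniform random search (a crude stand-in for a global meta-heuristic) are all assumptions chosen for illustration.

```python
import random

def simulate(rate, params, x0, steps, dt=0.1):
    """Integrate dx/dt = rate(x, params) with the explicit Euler method."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * rate(xs[-1], params))
    return xs

# Hypothetical candidate structures: (rate function, parameter bounds).
CANDIDATES = {
    "exponential": (lambda x, p: p[0] * x, [(0.0, 2.0)]),
    "logistic": (lambda x, p: p[0] * x * (1 - x / p[1]),
                 [(0.0, 2.0), (1.0, 20.0)]),
}

def sse(rate, params, data, x0):
    """Sum of squared errors between simulation and measurements."""
    sim = simulate(rate, params, x0, len(data) - 1)
    return sum((s - d) ** 2 for s, d in zip(sim, data))

def fit(rate, bounds, data, x0, trials=2000, seed=0):
    """Crude global search: sample parameters uniformly, keep the best."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(trials):
        params = [rng.uniform(lo, hi) for lo, hi in bounds]
        err = sse(rate, params, data, x0)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

def rank_structures(data, x0):
    """Fit every candidate structure and sort by model error (best first)."""
    results = []
    for name, (rate, bounds) in CANDIDATES.items():
        params, err = fit(rate, bounds, data, x0)
        results.append((err, name, params))
    return sorted(results)

# Synthetic "measurements" generated from a logistic model (r=0.8, K=10).
data = simulate(CANDIDATES["logistic"][0], [0.8, 10.0], 1.0, 60)
ranking = rank_structures(data, 1.0)
print(ranking[0][1], ranking[0][0])
```

Swapping `fit` for a local, derivative-based optimizer would reproduce the kind of comparison the talk describes: the same set of candidate structures, ranked under a different parameter estimation method.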
Thursday, 21.6.2012, 13:00, Orange room, IJS
2012-04-10
Janez Kranjc, IJS: Cloud data mining: constructing and executing data mining workflows using a service oriented web application
I will present a freshly launched browser-based platform for construction and execution of data mining workflows with an interface similar to those of Orange, RapidMiner, and Weka. What differentiates my platform from these is its cloud-based nature: workflows are executed on the server and may utilize (but are not limited to) web services. Because the workflows and experiments are not local, sharing them is also easier, as no installation is required (apart from a semi-decent web browser).
The presentation will include a brief description of the development of the platform, a use case for novice users, and an example of an advanced user adding features that suit their own needs.
Friday, 13.4.2012, 14:00, Orange room (note the different time!)
2012-03-30
Matic Perovšek, IJS: Visual divisive hierarchical clustering using k-means
In this presentation I will show Vdhcuk, a browser-based semi-automatic taxonomy construction tool similar to Ontogen. The Vdhcuk system incorporates text and data mining algorithms into a user-friendly ontology construction interface. The main features of the presented system are that it is browser-based, works with both textual and numerical data, and uses unsupervised learning for concept suggestion and visualization.
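The technique named in the title, divisive hierarchical clustering using k-means, can be sketched as a recursive 2-means split. This is a minimal illustration and not Vdhcuk's implementation: the function names, the deterministic seeding, and the `min_size` stopping rule are assumptions.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def two_means(points, iters=20):
    """Split points into two groups with Lloyd's algorithm (k = 2)."""
    # Deterministic seeding: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    centers = [c0, c1]
    left, right = [], []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            if dist2(p, centers[0]) <= dist2(p, centers[1]):
                left.append(p)
            else:
                right.append(p)
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        centers = [mean(left), mean(right)]
    return left, right

def divisive_tree(points, min_size=3):
    """Recursively bisect clusters top-down, yielding a taxonomy-like tree."""
    node = {"points": points, "children": []}
    if len(points) > min_size:
        left, right = two_means(points)
        if left and right:
            node["children"] = [divisive_tree(left, min_size),
                                divisive_tree(right, min_size)]
    return node

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = divisive_tree(points)
for child in tree["children"]:
    print(child["points"])
```

In a semi-automatic tool, each node of such a tree would be a concept suggestion that the user can accept, rename, or split further.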
Thursday, 5.4.2012, 13:00, Orange room
2012-03-13
Biljana Mileva Boshkoska, Applications of qualitative option ranking with copulas
Copulas are joint cumulative distribution functions that have been widely used in finance, hydrology and biology, and have recently gained popularity in machine learning. In this presentation I will show how copulas may be used for decision making, in particular for multi-attribute qualitative option ranking. First I will give a short introduction to copulas, then I will explain the process of performing regression using copulas, and finally I will demonstrate their applicability for option ranking on two real examples:
1. Ranking of workflows;
2. Ranking of 840 real case EC motors.
Thursday, 15.3.2012 13:00, Orange room
2012-02-15
Dragana Miljković, IJS - Constructing signalling network topology for modelling plant-virus interaction
The presentation will address my work that will be submitted as a journal publication. It addresses the issue of developing a topology for a global defence response model in plants. Such a model still does not exist, even though biologists have been motivated to develop one for decades. As experimental data is still lacking in this research field, we started to build the model topology of plant-virus interaction manually, by knowledge elicitation from biology experts and the literature. To accelerate this time-consuming approach, we additionally employed an automatic method to extract information from biological literature. To sum up, this work results in two major contributions to the scientific community:
1. A workflow that extracts the relations between compounds from biological text in the form of triplets.
2. An augmented topology of the plant defence response model, obtained by combining the manual and automatic approaches.
Thursday, 16.2.2012 13:00, Orange room
2012-01-24
Tomaž Erjavec: Language Resources and Tools for Historical Slovene, 2.2.2012
Recent years have seen an explosive growth in the number of texts that are available in digital libraries, such as Google books and the dLib.si library of the National and University Library of Slovenia. Most of these books are old - they promote our cultural heritage and are out of copyright, which makes publishing them on the Web much easier. However, digital historical texts bring with them a number of problems. It is difficult to do full-text search on them, as spelling of words has changed over time and there is no support for lemmatisation. Furthermore, such books are typically available only as PDF scans, and automatic OCR is of poor quality, esp. for materials older than 1900.
This talk presents the work of the last two years on producing a set of language resources and associated tools for historical Slovene, aimed at alleviating these problems as well as enabling computer-supported studies of historical Slovene. I will present an annotated reference corpus of 1,000 pages of historical texts, a text collection of a few million words, a computational lexicon, and a tool for text annotation. In the resources and tools, each word is first modernised, and then tagged and lemmatised. The modernisation relies on transcription rules, and has the benefit of making the text easier to read for today's speakers, as well as enabling standard tagging and lemmatisation models to be used on the text. I will present the workflow and tools used in developing the resources, and show the results.
This work in progress, to be finished in 2012, is supported by the EU project IMPACT "Improving Access to Text" and the Google Humanities Research Award "Developing computational models for historical Slovene".
2011-02-22
Daniela Stojanova: Global and Local Spatial Autocorrelation in Predictive Clustering Trees
Spatial autocorrelation is the correlation among data values that is strictly due to the relative location proximity of the objects that the data refer to. This statistical property clearly indicates a violation of the assumption of observation independence, a precondition assumed by most data mining and statistical models. Ignoring spatial autocorrelation when handling data with spatial dependencies can obscure important insights.
We propose a data mining method that explicitly considers autocorrelation when building the models.
The proposed approach combines the possibility of capturing both global and local effects (common to top-down model tree learners) and detecting / dealing with positive spatial autocorrelation (common to spatial statistical methods). As a consequence, the discovered models adapt to local properties of the data, providing at the same time spatially smoothed predictions.
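To make the notion of global spatial autocorrelation concrete, the sketch below computes Moran's I, a standard global statistic, on a toy one-dimensional grid. It is an illustration only and not necessarily the measure used in the proposed method; the adjacency weights and the example values are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    total_w = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(d * d for d in dev)

def line_adjacency(n):
    """Binary weights for n cells in a row: neighbours are adjacent cells."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]

w = line_adjacency(6)
clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbours always differ

print(morans_i(clustered, w))     # positive spatial autocorrelation
print(morans_i(alternating, w))   # negative spatial autocorrelation
```

Values near zero would suggest no spatial structure, which is the situation in which the usual independence assumption is least harmful.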
Thursday, 24 February 2011, 11:00, MPŠ lecture room
2011-01-31
Brownbag seminar: Primož Škraba: Topological Data Analysis
Topology is the branch of mathematics which studies spaces by how they are locally connected. Recently, quantitative topology, specifically the notion of persistence, has found numerous applications in computer science. In this talk, I will introduce three applications of these ideas to data analysis. Specifically, I will discuss: 1.) clustering/unsupervised learning, 2.) study of periodic systems, 3.) robustness of maps.
The notion of stability is crucial if we are to have any hope of working with real data sets. Using these examples, I will define the stability of persistence and show how it helps overcome noise and limited knowledge of the underlying system/space. Finally, I present preliminary results on extending stability using statistical tools.
Thursday, 3 February 2011, 13:00, Orange room