2012-02-14

Dragana Miljković, IJS - Constructing signalling network topology for modelling plant-virus interaction

The presentation will address my work that will be submitted as a journal publication. It addresses the issue of developing a topology for a global defence response model in plants. No such model exists yet, even though biologists have been motivated to develop one for decades. As experimental data is still lacking in this research field, we started to build the model topology of plant-virus interaction manually, by knowledge elicitation from biology experts and the literature. To accelerate this time-consuming approach, we additionally employed an automatic method to extract information from the biological literature. To sum up, this work results in two major contributions to the scientific community:

1. A workflow that extracts the relations between compounds from biological text in the form of triplets.
2. An augmented topology of the plant defence response model, obtained by combining the manual and automatic approaches.

Thursday, 16.2.2012 13:00, Orange room

2012-01-24

Tomaž Erjavec: Language Resources and Tools for Historical Slovene, 2.2.2012

Recent years have seen an explosive growth in the number of texts that are available in digital libraries, such as Google Books and the dLib.si library of the National and University Library of Slovenia. Most of these books are old: they promote our cultural heritage and are out of copyright, which makes publishing them on the Web much easier. However, digital historical texts bring with them a number of problems. It is difficult to do full-text search on them, as the spelling of words has changed over time and there is no support for lemmatisation. Furthermore, such books are typically available only as PDF scans, and automatic OCR is of poor quality, especially for materials from before 1900.

This talk presents the work of the last two years in producing a set of language resources for historical Slovene, and associated tools, aimed at alleviating these problems, as well as enabling computer-supported studies of historical Slovene. I will present an annotated reference corpus of 1,000 pages of historical texts, a text collection of a few million words, a computational lexicon, and a tool for text annotation. For the resources and tools, each word is first modernised, and then tagged and lemmatised. The modernisation relies on transcription rules, and has the benefit of making the text easier to read for today's speakers, as well as enabling standard tagging and lemmatisation models to be used on the text. We present the workflow and tools used in developing the resources, and show the results.

This work in progress, to be finished in 2012, is supported by the EU project IMPACT "Improving Access to Text" and the Google Humanities Research Award "Developing computational models for historical Slovene".

2011-02-22

Daniela Stojanova: Global and Local Spatial Autocorrelation in Predictive Clustering Trees

Spatial autocorrelation is the correlation among data values that is strictly due to the relative location proximity of the objects that the data refer to. This statistical property clearly indicates a violation of the assumption of observation independence, a precondition assumed by most data mining and statistical models. Inappropriate treatment of data with spatial dependencies, in particular ignoring spatial autocorrelation, can obscure important insights.

We propose a data mining method that explicitly considers autocorrelation when building the models.
The proposed approach combines the possibility of capturing both global and local effects (common to top-down model tree learners) and detecting / dealing with positive spatial autocorrelation (common to spatial statistical methods). As a consequence, the discovered models adapt to local properties of the data, providing at the same time spatially smoothed predictions.
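
As a rough illustration of what "positive spatial autocorrelation" means, the sketch below computes Moran's I, a standard global autocorrelation statistic. It is not taken from the talk: the choice of Moran's I, the function names, and the toy one-dimensional neighbourhood are all assumptions made for the example.

```python
def chain_weights(n):
    """Hypothetical neighbourhood: n locations on a line, adjacent ones are neighbours."""
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        weights[i][i + 1] = 1.0
        weights[i + 1][i] = 1.0
    return weights


def morans_i(values, weights):
    """Moran's I = (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, with d_i = x_i - mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_sum


w = chain_weights(4)
smooth = [1.0, 2.0, 3.0, 4.0]       # neighbours have similar values
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbours have dissimilar values

i_smooth = morans_i(smooth, w)            # positive: similar values cluster in space
i_alternating = morans_i(alternating, w)  # negative: neighbouring values differ
```

Values of Moran's I above zero indicate that nearby locations tend to carry similar values, which is the situation in which treating observations as independent is most misleading.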

Thursday, 24 February 2011, 11:00, MPŠ lecture room

2011-01-31

Brownbag seminar: Primož Škraba: Topological Data Analysis

Topology is the branch of mathematics which studies spaces by how they are locally connected. Recently, quantitative topology, specifically the notion of persistence, has found numerous applications in computer science. In this talk, I will introduce three applications of these ideas to data analysis. Specifically, I will discuss: 1) clustering/unsupervised learning, 2) the study of periodic systems, 3) robustness of maps.

The notion of stability is crucial if we are to have any hope of working with real data sets. Using these examples, I will define the stability of persistence and show how it helps overcome noise and limited knowledge of the underlying system/space. Finally, I will present preliminary results on extending stability using statistical tools.

Thursday, 3 February 2011, 13:00, Orange room



2011-01-10

Rok Piltaver: Generating accurate AND understandable hybrid classification trees

Most algorithms that induce classifiers aim primarily for high accuracy and take understandability into account only as a secondary measure of a classifier's quality. The algorithm that will be presented efficiently generates a range of hybrid classification trees, from the most understandable to the most accurate, by combining an easy-to-understand decision tree with a highly accurate black-box classifier. This enables the user to choose how much accuracy they are willing to sacrifice for higher understandability, or vice versa.


Thursday, 13.1.2011, 13:00, Orange room

2010-12-22

Mitja Trampuš: Graph pattern mining for article template discovery

We will have a look at a generalized variant of frequent tree pattern mining where each pattern's instances don't need to match exactly, but rather have to be "similar enough", where similarity is measured with the help of a taxonomy of node labels.
We will then apply the approach to discovering article templates for a set of related articles (e.g. news articles on bombings, or Wikipedia biographies of physicists).

Thursday, 23.12.2010, 13:00, Orange room

2010-12-14

Andreea Bizau: Determining Adjective Semantic Orientation. The beginnings of TweetMDb

This Thursday, Andreea Bizau will present her work on sentiment mining, focused on constructing a map of interesting adjectives, related to expressing sentiment orientation in social media.

Abstract:
There is a huge volume of user-generated content expressing sentiments, preferences and opinions. We are interested in how people express opinions in a certain domain (e.g. movies) and how opinion indicators determined from one opinion source (reviews) can be used to analyze another opinion source (Twitter).
 


Thursday, 16.12.2010, 13:00, Orange room

2010-10-26

Laura Langohr: Biomine and beyond - a biomedical data warehouse and related graph mining research

Information is often modeled as a network of objects or concepts. In the talk I will introduce Biomine, a biomedical network and data warehouse, which consists of about one million nodes (biological concepts) and about nine million edges (relationships). Further, I will discuss some related graph mining research, in particular finding representative nodes and retrieving relevant and non-redundant objects.

Thursday, 11.11.2010, 13:00, Orange room

2010-10-06

Nada Lavrač: Advances in data mining for biomedical research (7.10.2010)

This Thursday, Nada Lavrač will present her keynote talk from the Medinfo 2010 Conference with the title Advances in data mining for biomedical research.

Thursday 7.10.2010, Orange room, 13:00

2010-09-16

Blaž Fortuna: best of KDD 2010

Today, 16.9.2010, we are opening a new season of Brown Bag seminars with Blaž Fortuna, who will give a talk on the highlights of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Everyone is welcome.

Thursday, 16.9.2010, 13:00, Orange Room