In the Austrian parliament, all speeches by politicians are recorded and archived in the official record. This data was recently made publicly available: parliament speeches.
This project will focus on analyzing these parliamentary speeches by combining methods of text and visual analysis, with a focus on how political communication has changed over time. One idea, for instance, is to use sentiment analysis to identify and categorize opinions expressed in the speeches. A visual interface could then not only show how sentiments have changed over time, but also reveal potential differences between parties, individual candidates, specific time periods (e.g. before an election), or known polarized topics. An additional idea is to take a closer look at the reprimands (German: Verwarnungen) documented in the data.
In 2012, the Austrian parliament passed a law that requires governmental organizations to disclose their advertising expenses in different media (TV, radio, print, as well as online). This so-called “media transparency database” is publicly available: data. The data contains the accumulated amount of money transferred in a given quarter between each governmental organization and media company. Visualizing this data is interesting because it allows investigating (direct and indirect) governmental advertising in the media, a common way to influence press opinion.
In this project, you will build an interactive visualization tool for exploring and understanding this data and how it has changed over time. The data itself can be modeled as a graph with organizations as nodes and money flows as edges. The graph is dynamic, that is, it changes over time. The tool should not only give an overview of the data and its temporal changes, but also support interactively drilling down to investigate specific hypotheses in more detail.
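The time-sliced graph model described above can be sketched in a few lines. The record format and field names below are hypothetical (the real database uses its own schema); the sketch only illustrates modeling the quarterly payments as one weighted edge list per quarter.

```python
from collections import defaultdict

# Hypothetical record format: (quarter, payer_org, media_company, amount_eur).
# The actual media transparency data uses different field names.
RECORDS = [
    ("2013Q1", "Ministry A", "Daily Paper", 12000.0),
    ("2013Q1", "Ministry B", "Daily Paper", 5000.0),
    ("2013Q2", "Ministry A", "Radio X", 8000.0),
]

def build_dynamic_graph(records):
    """One weighted edge list per quarter: {quarter: {(payer, medium): amount}}."""
    graph = defaultdict(lambda: defaultdict(float))
    for quarter, payer, medium, amount in records:
        graph[quarter][(payer, medium)] += amount  # accumulate repeated transfers
    return graph

def top_recipients(graph, quarter, n=3):
    """Media companies ranked by total money received in one quarter."""
    totals = defaultdict(float)
    for (_, medium), amount in graph[quarter].items():
        totals[medium] += amount
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n]

g = build_dynamic_graph(RECORDS)
print(top_recipients(g, "2013Q1"))  # → [('Daily Paper', 17000.0)]
```

A drill-down view could then diff the per-quarter edge lists to highlight which money flows appeared, grew, or vanished between quarters.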
Gaia is an ambitious ESA mission to chart a three-dimensional map of our Galaxy, with a first data release towards the end of 2016. Gaia will provide unprecedented positional and radial velocity measurements with the accuracies needed to produce a stereoscopic and kinematic census of about one billion stars in our Galaxy. The astronomical community has never been faced with such an exciting but challenging task.
The goal of this project is to make first contact with the newly released astronomical database and perform a clustering analysis in search of new types of astronomical objects or groups of objects. This will be done on a subset of the full data for which distances are known (about 2 million stars). We will first identify known groups of stars, extract parameters, and then search for so far unidentified or previously ill-defined objects or structures.
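One clustering approach commonly used in astronomy is friends-of-friends linking, where points closer than a chosen linking length are grouped together. The sketch below is a naive O(n²) toy version on 2D positions; the linking length and the toy data are made up, and a Gaia-scale run would need a spatial index (e.g. a k-d tree) instead.

```python
import math

def friends_of_friends(points, linking_length):
    """Group points whose mutual distance is below linking_length.

    Naive O(n^2) friends-of-friends pass: every pair is tested, and
    transitively linked points end up with the same cluster label.
    """
    n = len(points)
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = cluster
        while stack:
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1 and math.dist(points[a], points[b]) <= linking_length:
                    labels[b] = cluster
                    stack.append(b)
        cluster += 1
    return labels

# Two well-separated toy "star groups" in 2D (e.g. projected positions).
stars = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
print(friends_of_friends(stars, linking_length=0.5))  # → [0, 0, 0, 1, 1]
```

In the real project the input would be higher-dimensional (positions plus kinematics), and the interesting objects are exactly the small groups that no catalogued cluster accounts for.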
Spreadsheets are arguably among the most used computer tools in today's world -- not just in financial and budget settings, but also for scientific and personal purposes. However, one of the most crucial considerations in understanding and predicting budgets is the uncertainty of the outcomes. Whether it is the financial forecast of a company or the budgeting of conferences and workshops, incorporating the variance of possible scenarios ranges from cumbersome to impossible. In this project you will create FuzzySpreadSheets, which incorporate an elaborate uncertainty analysis into a common spreadsheet, with a focus on usability, proper visual encoding, computational concerns, as well as cognitive aspects. Currently a spreadsheet cell holds just one numerical value; a FuzzySpreadSheet cell will hold a number, a set of numbers, an interval, or, more generally, a probability distribution function. This requires some thought on how to specify the complex cell contents in an effective and user-friendly way.
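One way to think about such a cell is as a sampler over scenarios rather than a single number, with formulas propagating uncertainty via Monte Carlo. The class name and interface below are hypothetical, purely to illustrate the idea:

```python
import random
import statistics

random.seed(0)  # reproducible sampling for this illustration

class FuzzyCell:
    """Hypothetical cell model: a sampler over scenarios, not one number."""

    def __init__(self, sampler):
        self.sampler = sampler  # callable returning one scenario value

    @classmethod
    def exact(cls, value):
        return cls(lambda: value)

    @classmethod
    def uniform(cls, low, high):
        return cls(lambda: random.uniform(low, high))

    def __add__(self, other):
        # Formulas combine cells by combining their samplers.
        return FuzzyCell(lambda: self.sampler() + other.sampler())

    def summarize(self, n=10000):
        """Monte Carlo summary of the cell's value distribution."""
        samples = [self.sampler() for _ in range(n)]
        return statistics.mean(samples), statistics.stdev(samples)

# Budget = fixed venue cost + uncertain catering cost.
venue = FuzzyCell.exact(1000.0)
catering = FuzzyCell.uniform(400.0, 600.0)
mean, stdev = (venue + catering).summarize()
print(round(mean), round(stdev))  # mean near 1500, stdev near 58
```

A production version would also need closed-form propagation where possible (Monte Carlo is expensive per recalculation) and a visual encoding of the resulting distribution inside the cell.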
Histograms are often used as the first method to gain a quick overview of the statistical distribution of a collection of values, such as the pixel intensities in an image.
Depending, for example, on the data type of the underlying data (categorical, ordinal, or continuous) and the number of available data values, several visualization parameters can be considered when constructing a histogram:
The perception of a histogram can vary quite a bit depending on the exact parameters chosen, and this may also influence its interpretation. For some of the above points, you should already be able to find literature.
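To make the parameter question concrete, here is a minimal sketch of fixed-width binning plus one classic heuristic for choosing the bin count (Sturges' rule); other rules such as Scott's or Freedman-Diaconis would give different histograms of the same data.

```python
import math

def bin_counts(values, num_bins):
    """Fixed-width binning over [min, max]; returns the count per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return counts

def sturges_bins(n):
    """Sturges' rule: a classic heuristic for the number of bins."""
    return 1 + math.ceil(math.log2(n))

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
k = sturges_bins(len(values))
print(k, bin_counts(values, k))  # → 5 [3, 5, 1, 0, 1]
```

Note how the single outlier at 9 stretches the bin range and leaves an empty bin, already changing how the distribution reads.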
Come up with visualisations of Dota 2 match results (or better tools/UIs for observing games)! The whole world is talking about data, ahem, Dota. Dota 2 is a popular and probably the most complex MOBA out there. It also involves several big tournaments, the so-called Majors and The International, which all have prize money in the millions of dollars. Visualise trends and patterns, and give users the chance to visualise their own performance development.
References:
Related Work: https://arxiv.org/ftp/arxiv/papers/1603/1603.07738.pdf
The lecture on Algorithms and Data Structures includes an assignment in which each student has to implement a data structure for sorting. At the end of the course, it is possible to compare one's own results with everybody else's. However, a large number of graphs is produced, and one quickly loses the overview. The task is to create an interactive program that gives a better overview of all the data, with good interaction techniques to quickly drill down into the relevant details a user would like to see. (download data, 365 KB)
LaTeX is a very common document formatting language used for writing academic papers, grants, and reports. Its default layout parameters result in very nicely formatted text but do not take the final page length into account. Page length, however, is often one of the major limiting factors of these publications: papers are typically first written and refined without considering this limitation and then edited down to the length requirement very close to the deadline. Because it is somewhat unclear how LaTeX's layout algorithm will react to changes in the source text and formatting parameters, writers must resort to a long process of trial and error: changing document content, adjusting formatting parameters, and recompiling the document to see whether it is within the page limit and still aesthetically acceptable. With some heuristics of document layout, plus parameter space exploration and visualization of the results, it would be much clearer how the document will change; multiple changes could then be made at once, with fewer recompilations, making the process much faster.
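The parameter space exploration could start with something as simple as enumerating preamble variants. The particular knobs below (font size, \baselinestretch) are only illustrative assumptions; the right set depends on the document class and venue requirements.

```python
import itertools

# Hypothetical parameter space; the actual knobs worth sweeping (font size,
# \baselinestretch, margins, ...) depend on the document class used.
PARAMETERS = {
    "fontsize": ["10pt", "11pt"],
    "baselinestretch": ["0.97", "1.0"],
}

def preamble_variants(parameters):
    """Yield (settings, preamble) for every combination of parameter values."""
    names = sorted(parameters)
    for values in itertools.product(*(parameters[n] for n in names)):
        settings = dict(zip(names, values))
        preamble = (
            f"\\documentclass[{settings['fontsize']}]{{article}}\n"
            f"\\renewcommand{{\\baselinestretch}}{{{settings['baselinestretch']}}}\n"
        )
        yield settings, preamble

variants = list(preamble_variants(PARAMETERS))
print(len(variants))  # → 4
```

Each variant would then be prepended to the document body, compiled (e.g. with pdflatex), and the resulting page count recorded as one sample point for the visualization.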
The idea is to support your fellow students in the Computer Graphics class. One of the most difficult things there is debugging your code. One way to do this is to try out many different parameters in an efficient way: lots of different material parameters, light intensities, camera placements, and so on. Imagine you had a tool that would create lots of images with these different settings and quickly let you browse through the results. Wouldn't that be great? Well, you could create such a tool!
One of the difficulties with machine learning is really understanding how an algorithm or algorithm family works. The goal here is to pick a particular algorithm or family and help the user better understand it by visualizing its behaviour. One way (but a promising one) is to expose its parameters and generate many different results by varying them. A summary of the results then gives an overview of what this "black box" is capable of. Pick your favourite algorithm or algorithm family (SVM, clustering, deep learning, neural networks, etc.) and develop such a tool.
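As a toy version of this idea, the sketch below implements a minimal k-means (chosen here only as an example algorithm) and sweeps its single exposed parameter k, printing a one-number summary (inertia) per run; a visualization tool would show such per-run summaries side by side instead.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """A minimal k-means, just detailed enough to expose its parameters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center's group.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        # Update step: centers move to their group's mean (kept if empty).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    inertia = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, inertia

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Sweep the parameter k and summarize each run.
for k in (1, 2, 3):
    _, inertia = kmeans(points, k)
    print(k, round(inertia, 2))
```

Even this trivial sweep exposes typical black-box behaviour: inertia always drops as k grows, so the summary visualization must help the user judge where the drop stops being meaningful.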
Of particular interest to us are:
In recent years Deep Networks, a special kind of artificial neural network with many layers, have revolutionised many fields such as natural language processing and computer vision.
For image classification, Deep Networks are able to distinguish thousands of different classes. Unfortunately, it is not always clear for which types of classes (e.g. dogs) a network works better and for which it does not. Classic machine learning offers the concept of the confusion matrix, a way to organize classification and misclassification results in a simple matrix. While standard visualizations of these matrices are usable up to about 12 classes, they unfortunately do not scale to matrices of size 1000x1000 as encountered in modern computer vision datasets.
Your job is to create new visualisations that scale to very large confusion matrices and enable a computer vision expert to understand the classification accuracy of their current algorithm, i.e., a convolutional neural network.
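Two simple building blocks for scaling are shown below: surfacing only the strongest confusions, and aggregating classes into superclasses (e.g. all dog breeds into "dog"). The labels and counts are made-up toy data.

```python
def top_confusions(matrix, labels, k=3):
    """Return the k largest off-diagonal entries as (true, predicted, count).

    Instead of drawing all n*n cells, surface only the strongest
    confusions for a detail view.
    """
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(pairs, key=lambda t: -t[2])[:k]

def group_matrix(matrix, groups):
    """Aggregate rows/columns by superclass index for an overview level."""
    m = len(set(groups))
    out = [[0] * m for _ in range(m)]
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            out[gi][gj] += matrix[i][j]
    return out

labels = ["husky", "poodle", "tabby", "siamese"]
matrix = [
    [50, 8, 1, 0],
    [6, 60, 0, 2],
    [0, 1, 70, 9],
    [1, 0, 7, 55],
]
groups = [0, 0, 1, 1]  # husky/poodle → dog (0), tabby/siamese → cat (1)
print(top_confusions(matrix, labels, k=2))  # → [('tabby', 'siamese', 9), ('husky', 'poodle', 8)]
print(group_matrix(matrix, groups))         # → [[124, 3], [2, 141]]
```

Combined, these two views already suggest a drill-down design: an aggregated overview matrix, with the raw top confusions shown on demand inside each superclass block.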
Description: The recent advent of high-capacity flash has initiated a revolution in the design of data-center storage software and hardware. Modern designs use a hybrid approach, combining a high-performance flash tier with a high-capacity, much slower spinning disk tier. Effective designs of these hybrid storage systems require understanding how user workloads interact with the underlying system. The concept of workload locality is a powerful way to measure this interaction. The Counter Stack is a derived, low-space representation of a workload trace that preserves locality and provides useful measures such as miss-ratio curves and histograms of the hotness and coldness of the underlying data stream.
Analysis tasks in computer storage are ripe for assistance from modern visualization tools. Any contributions your project can make to accelerating the task of understanding a set of workloads will likely have high impact in this field.
Task ideas:
Task 1: Build interactive visualizations that provide a useful overview of the entire workload history in the system, as well as the facility to drill down into smaller time periods. Where are periods with lots of new data?
Task 2: Build a visualization to understand the temporal characteristics of miss-ratio curves over time. Do these curves exhibit periodic behavior? Are there periods of stability, in which the miss-ratio curve doesn't change much over time?
Task 3: Build a visualization to understand how workloads interact with each other over time. Given a set of workloads, is there one which "pollutes" the high-speed tier, pushing out useful data in the process? Is there an interactive way to schedule a set of workloads to reduce these interactions?
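For the miss-ratio curves in Task 2, the underlying quantity can be computed exactly on small traces via LRU stack distances, as sketched below. This naive O(n²) version is only for building intuition; Counter Stacks exist precisely because it does not scale to real traces.

```python
def miss_ratio_curve(trace, max_cache_size):
    """Miss ratio of an LRU cache of each size 1..max_cache_size.

    Classic stack-distance observation: an access hits in an LRU cache of
    size c iff fewer than c distinct addresses were touched since the
    previous access to the same address.
    """
    stack = []  # most recently used address last
    distances = []
    for addr in trace:
        if addr in stack:
            # Number of distinct addresses touched since the last access.
            distances.append(len(stack) - 1 - stack.index(addr))
            stack.remove(addr)
        else:
            distances.append(None)  # cold miss: misses at every cache size
        stack.append(addr)
    n = len(trace)
    return [
        sum(1 for d in distances if d is None or d >= c) / n
        for c in range(1, max_cache_size + 1)
    ]

trace = ["a", "b", "a", "c", "b", "a"]
print(miss_ratio_curve(trace, 3))  # → [1.0, 0.8333333333333334, 0.5]
```

Computing one such curve per time window, then visualizing how the curves drift, is exactly the temporal-stability question posed in Task 2.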
Info/Code/Data:
Background Information: A good visualization design for this project will require a high-level understanding of miss-ratio curves and LRU caches. Below are links to a paper and a presentation video describing the concepts needed to build this understanding.
Code: A set of binaries and scripts for running queries against the counter stack database. These binaries are provided under a non-commercial license exclusive to the University of Vienna. They are accompanied by an API description for integrating the database into a project. Please see the instructor about getting this code.
Data: We will provide a counter-stack database derived from a week-long set of storage workloads published by Microsoft Research.
High-frequency trading is becoming more and more dominant at stock exchanges. Understanding and analyzing the market of high-frequency trading is a hot research area in financial statistics. The overall goal of this project is to create a visual analysis tool that helps to explore a larger number of stocks and their trading behavior throughout a certain time frame.
The data: The basis of the analysis will be high-frequency trades as collected and distributed by the lobster web service. Each record consists of a time stamp, event type, order size, price, and direction (buy or sell). Further, so-called order books are available that register all ask and bid prices and volumes.
A selection of (basic) tasks: The overall challenge is to visually display the order books for different/several stocks in order to
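A first processing step is reconstructing quotes from the message stream. The message format below is a deliberately simplified assumption (the real lobster files carry more fields, including event types for submissions, cancellations, and executions); it only illustrates aggregating messages into per-side price levels and deriving the best bid, best ask, and spread.

```python
# Hypothetical simplified message format: (timestamp, side, price, size),
# where a positive size adds liquidity at that price level.
MESSAGES = [
    (34200.01, "bid", 100.10, 50),
    (34200.02, "ask", 100.20, 30),
    (34200.03, "bid", 100.15, 20),
    (34200.04, "ask", 100.18, 10),
]

def best_quotes(messages):
    """Maintain price levels per side and return (best_bid, best_ask, spread)."""
    bids, asks = {}, {}
    for _, side, price, size in messages:
        book = bids if side == "bid" else asks
        book[price] = book.get(price, 0) + size
    best_bid = max(p for p, s in bids.items() if s > 0)
    best_ask = min(p for p, s in asks.items() if s > 0)
    return best_bid, best_ask, round(best_ask - best_bid, 4)

print(best_quotes(MESSAGES))  # → (100.15, 100.18, 0.03)
```

Time series of such derived quantities (best quotes, spread, depth per level) are the natural inputs for the visual displays asked for below.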
Different higher-level goals:
For the purposes of this project we can help you gain access to the lobster database.
There has been a deluge of open data from various government and governmental organizations over the last few years. While this is admirable, what good is all this data if the common citizen is not able to understand, explore, or learn from it? Hence, the goal is to develop an (ideally web-based) tool that helps people explore such data. One of the challenges will be to gear this tool toward a broad audience, so you cannot assume great visual literacy (a problem The New York Times has been struggling with, and which perhaps provides some ideas). Further, it is unrealistic to build a universal tool with which all types of data can be explored and all questions answered. Hence, it will be important to narrow your focus to a specific aspect of civic life. There are quite a number of open data sources to choose from:
While we do not expect you to create winning entries for these visualization challenges, these are often well-thought-out problems that are fun and solvable. See whether any tickle your interest.