Data science has emerged as a new research paradigm

"Data science" is about how to process, analyse and extract knowledge from very large quantities of data, "big data". The area is growing with the speed of the data sets themselves and the IT Faculty is going to start a new master's programme in data science in 2017.

How do we use all the data that are generated in our society? What are the opportunities and risks from a social perspective? What can the data bring to research? How can we combine data sets in order to derive greater value? And how do we avoid misinterpreting the data?

Data science, or large-scale data processing, has emerged as a result of wider access to a growing amount of complex data. These datasets open up new opportunities in diverse fields, from genome mapping to business analysis and the prediction of climate scenarios. Data science is also used in many areas to support decision-making, where patterns identified in existing datasets become the basis for forecasts.

Data science is affecting all areas where data volumes are generated

Data science affects all areas where large amounts of data are generated – and almost every area generates data today. Public transport, Internet searches, medical records, access cards, security cameras, intrusion detection, EAN codes, social insurance statistics, GPS systems, call statistics, financial operations, environmental stations, recording equipment, embedded computers in consumer electronics and in our cars, incident reporting, motion detectors… The list is endless.

What is somewhat unusual is that technology largely drives development in the field of data science: first, the amount of data generated; second, the possibility of storing the data; and finally, the availability of computer programs that make analysis possible.

Complex combination of technology, interdisciplinary work and analysis

Developments in data science place great demands on computer scientists and analysts, since the area lies at the intersection of statistics, artificial intelligence and database management. To get something useful out of the huge amounts of data, the right questions must be asked and very well-defined datasets must be combined in a measurable way. The people involved must also have very good analytical skills to interpret the results obtained and to understand thoroughly which variables influence others.

Data science requires good knowledge of the area being explored, whether it concerns biological data, web statistics or data generated by the financial market. This calls for interdisciplinary work: the biologist or stockbroker needs insight into the conditions for analysing data sets, and, vice versa, computer scientists need to understand the conditions and relations in the area being examined.

New opportunities for research – and also new demands for new research

Data science also means new opportunities for research, since the data sets can yield research material that was previously inaccessible. This involves both more established research, such as DNA mapping, and entirely new research areas that have arisen because the data sets make them possible.

Another aspect is that the field itself now needs to be researched: how should one handle huge amounts of data?

 



Some of our researchers’ views on the area:

What does data science mean to you and how is your research related to the area?


Graham Kemp, Department of Computer Science and Engineering:

– Data-intensive science, which has been called "the fourth research paradigm", is what comes to my mind. Here scientific investigations are carried out by using computers to explore data, rather than by observing physical entities directly.

– In structural bioinformatics, the main data resource we use is the Protein Data Bank, which today contains data on over 100,000 experimentally determined macromolecular structures. By analysing the surfaces of proteins in this data bank we have found that some spatial arrangements of atomic groups have a higher propensity than others for being located in ligand binding sites – this work was done in collaboration with the University of Edinburgh. In another project in collaboration with Biognos AB, we used machine learning methods to analyse data from the Protein Data Bank and also a collection of experimentally measured binding affinity data to build models for predicting whether the binding energy of a protein-ligand complex is dominated by the enthalpy or the entropy term. A current project is using conformational knowledge obtained from exploring many known protein structures to help build new protein models.

– Big data from scientific applications presents challenges for database technology. For over 15 years I've been interested in the technical challenges of processing queries in a federation of heterogeneous database systems. Such capabilities will be increasingly important in the future since in the big data era it is not always feasible to copy large quantities of data to a central location for analysis, so effective data exploration might rely instead on sending queries to the data held at various locations and in a variety of formats, and retrieving and integrating small result sets or summaries.
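
The idea of sending queries to the data, rather than copying the data to a central location, can be illustrated with a minimal sketch. It is not taken from the research described above. It assumes three hypothetical sites, each simulated with an in-memory SQLite database holding an invented `measurements` table, and it ignores the heterogeneity of real federated systems. Each site answers an aggregate query locally and returns only a small summary (a count and a sum), which the coordinator then integrates into a global mean.

```python
import sqlite3


def make_site(rows):
    """Create an in-memory SQLite database standing in for one remote site.

    The table name and columns are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (sample TEXT, value REAL)")
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    return conn


def local_summary(conn):
    """Run the aggregate query at the site and return only (count, sum)."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(value), 0.0) FROM measurements"
    ).fetchone()
    return count, total


def federated_mean(sites):
    """Integrate per-site summaries into a global mean without moving raw rows."""
    summaries = [local_summary(conn) for conn in sites]
    n = sum(count for count, _ in summaries)
    total = sum(subtotal for _, subtotal in summaries)
    return total / n if n else None


# Three simulated sites; the last one holds no data at all.
sites = [
    make_site([("a", 1.0), ("b", 3.0)]),
    make_site([("c", 5.0)]),
    make_site([]),
]
global_mean = federated_mean(sites)
print(global_mean)
```

Only two numbers per site cross the (simulated) network, however many rows each site stores. A real federation would also have to translate the query into each system's own query language and data format.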

 

Marie Eneman, Department of Applied IT:

– When it comes to "data science" in relation to my area of research, which deals with IT and the sexual exploitation of children, I refer to a new project where I will examine what practice looks like when the police investigate child pornography offences. The large amount of material, in the form of pictures and films, that the police have to deal with in their investigative work means that it often takes a very long time to go through it. Existing methods are not adapted to handle such large volumes of data efficiently.

– The complexity of using and processing these large volumes of data also becomes very obvious when I look at how the police work to identify the victims who appear in the material. At present, victim identification work is not always carried out, partly because the amount of material is difficult to handle. Some police officers also said that they did not know how to use the technology to carry out this type of work.

 

Devdatt Dubhashi, Department of Computer Science and Engineering:

– The terms "data science" and "big data" are popping up everywhere today – for example in my Finnair in-flight magazine a few weeks ago. Businesses need data science, natural science needs it – "the Fourth Paradigm" – and even the social sciences and humanities have embraced it with smart cities and digitized culture.

– In the project "Culturomics" we use data science to automatically infer in what sense an ambiguous word is used, e.g. does "rock" mean rock music or a stone, and does "java" mean the holiday island or a specific kind of coffee? We are also using data science to see how language changes over time. The word "gay" means a sexual preference today, for example, whereas a century ago it meant a nice party!

– Through the Vinnova/Marie Curie Career Development mobility grant, we will work to strengthen links with industry, both nationally in Sweden and internationally. One example application area is using data science methods to repurpose drugs, that is, to discover new diseases that an existing drug could be redirected to treat.

– When the IT Faculty starts a new master’s programme in "Data Science" in 2017, we look forward to taking part in developing the course structure for this exciting new venture!

 

Text: Catharina Jerkbrant

The IT Faculty is starting a new master's programme in data science in 2017

A growing demand for expertise within the field of data science has resulted in the decision to start a new master's programme at the University of Gothenburg in 2017.

One of the main ideas is to take a multidisciplinary approach, and a number of faculties have been contacted to discuss the composition of courses and possibilities for cooperation.

New international master's programme: Applied Data Science

Page Manager: Catharina Jerkbrant|Last update: 8/22/2015