MF9155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data

Course content

The course considers methods integral to data analysis in modern molecular medical research. As such it is relevant to all PhD students and researchers who need to analyze large-scale molecular data themselves, as well as those who need to interpret results and understand publications in the molecular life sciences.

High-throughput techniques are increasingly prevalent in life science research and in the clinic. However, making effective use of the resulting large datasets requires understanding and applying more advanced statistical methods, as well as good practices in programming and data analysis. We will describe guidelines for good practice, such as the FAIR data principles, and introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:

a) high-throughput screening (multiple testing and group tests),

b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),

c) supervised learning (classification and prediction, cross-validation and bootstrapping).

We will also introduce reference sources and molecular databases that can aid interpretation and will show how they can be accessed and integrated into a data analysis.

Methods will be demonstrated by replicating analyses from publications, and real-life gene expression data will be used in the computer labs.

To encourage continued learning after the course, we will also provide an overview of available web-based courses and exercises.

Learning outcome

Knowledge:

  • Learn important statistical and bioinformatics concepts for analysing molecular data, including good practices in programming and data analysis.
  • Have knowledge of the specific statistical challenges associated with the analysis of high-throughput biological data.
  • Know important molecular databases and relevant statistics/bioinformatics software tools.
  • Understand some of the challenges you will face when trying to apply this knowledge to the analysis of real datasets.

Skills:

  • Be able to identify the data analysis problem and match the appropriate type of statistical method and corresponding software.
  • Perform basic analyses of high-throughput biological data using R and Bioconductor.
  • Be able to understand and critically evaluate the data analysis procedures in publications in molecular biology/molecular medicine.

Admission to the course

Maximum number of participants is 30-35. PhD candidates at UiO will be prioritized.

Applicants admitted to a PhD programme at UiO apply to this course in StudentWeb.

Applicants who are not admitted to a PhD programme at UiO must apply for a right to study before they can apply to this course. See information here: How to apply for a right to study and admission to elective PhD courses in medicine and health sciences.

Applicants will receive a reply to their course application in StudentWeb no later than one week after the application deadline.

Students should have passed the exam in an introductory course in statistics (for example MF9130 or MF9130E).

Students should also have working knowledge and practical experience in analysing data with the statistical programming language R. Basic familiarity with the Unix shell is also required, for example by having completed a software carpentry workshop.

It is recommended that students have a basic understanding of molecular biology, roughly corresponding to at least 5-10 university study points in molecular biology or similar. Students who have not completed an introductory course in R could, for example, complete an introductory online course or follow a software carpentry course at UiO.

Overlapping courses

  • 5 credits overlap with