Hypothesis testing challenge

Introduction

The purpose of this challenge is to deepen your understanding of hypothesis testing methods. You will work with a dataset of 100 features (e.g., gene expression levels) measured across 100 samples, divided into ‘Treatment’ and ‘Control’ groups.

You can read the data using:

expression_data <- read.csv("https://raw.githubusercontent.com/PayamEmami/biostats1_challenge/main/expression_data.csv",row.names = 1)
metadata <- read.csv("https://raw.githubusercontent.com/PayamEmami/biostats1_challenge/main/metadata.csv",row.names = 1)

Task 1:

Your task is to identify differentially expressed genes using the hypothesis testing methods we learned on Day 2 of the course, such as bootstrap, permutation, and parametric tests. In addition, provide simple descriptive statistics, create a table summarizing the key data points, and generate a boxplot to visually compare the distribution of gene expression levels between conditions.
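As a starting point, the three approaches can be sketched for a single feature. This is a minimal sketch and not a reference solution: it uses simulated values in place of one column of expression_data, and the object names (x, group, mean_diff) as well as the number of resamples are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated stand-in for one feature: 50 Control and 50 Treatment samples.
# (Hypothetical values; the real ones come from expression_data and metadata.)
group <- factor(rep(c("Control", "Treatment"), each = 50))
x <- c(rnorm(50, mean = 10, sd = 1), rnorm(50, mean = 11, sd = 1))

# Descriptive statistics per group, collected in a small table.
desc <- data.frame(
  n    = as.vector(table(group)),
  mean = as.vector(tapply(x, group, mean)),
  sd   = as.vector(tapply(x, group, sd)),
  row.names = levels(group)
)
boxplot(x ~ group, ylab = "Expression")

# Test statistic: difference in group means (Treatment minus Control).
mean_diff <- function(values, labels) {
  mean(values[labels == "Treatment"]) - mean(values[labels == "Control"])
}
obs <- mean_diff(x, group)

# Permutation test: shuffle the labels to build the null distribution.
n_perm <- 2000
perm <- replicate(n_perm, mean_diff(x, sample(group)))
p_perm <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)

# Bootstrap: resample within each group for a percentile confidence interval.
n_boot <- 2000
boot <- replicate(n_boot, {
  ctrl <- sample(x[group == "Control"], replace = TRUE)
  trt  <- sample(x[group == "Treatment"], replace = TRUE)
  mean(trt) - mean(ctrl)
})
ci_boot <- quantile(boot, c(0.025, 0.975))

# Parametric test: Welch two-sample t-test.
p_t <- t.test(x ~ group)$p.value
```

To cover all 100 features, the same steps could be wrapped in a function and applied to each feature in turn, collecting one p-value per feature and per method.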

Task 2:

Fit a GLM for each feature, choosing an appropriate covariate and link function based on the distribution of your data (Day 3). Assess the significance of the group effect for each feature using the model coefficients and their p-values.
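The per-feature fitting loop might look like the sketch below. Everything about the data here is an assumption made for illustration: the values are simulated counts, the covariate age and the column name group are hypothetical, and the Poisson family with a log link is only one possible choice. Inspect the real data before settling on a family, link, and covariate.

```r
set.seed(1)

# Simulated stand-in for the real data (hypothetical values and names):
# rows are samples, columns are features.
n <- 100
meta <- data.frame(
  group = factor(rep(c("Control", "Treatment"), each = n / 2)),
  age   = round(runif(n, 20, 70))   # hypothetical covariate
)
expr <- as.data.frame(matrix(rpois(n * 5, lambda = 20), nrow = n))
colnames(expr) <- paste0("gene", 1:5)

# Fit one GLM per feature and extract the group coefficient and its p-value.
# For a Gaussian family the p-value column is named "Pr(>|t|)" instead.
fit_one <- function(y) {
  fit <- glm(y ~ group + age, data = meta, family = poisson(link = "log"))
  coefs <- summary(fit)$coefficients
  coefs["groupTreatment", c("Estimate", "Pr(>|z|)")]
}

res <- t(sapply(expr, fit_one))
colnames(res) <- c("estimate", "p_value")
res
```

The row name "groupTreatment" follows from R treating "Control" as the reference level of the factor, so the coefficient describes the Treatment effect relative to Control.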

You can check your results using a Venn diagram.
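Before drawing a Venn diagram, it can help to compute the overlaps between your gene lists directly. The sketch below uses base R set operations only. The three gene lists and their names are hypothetical placeholders for the significant genes found by each method.

```r
# Hypothetical lists of significant genes from three methods.
sig_perm  <- c("gene1", "gene2", "gene3", "gene5")
sig_ttest <- c("gene1", "gene2", "gene4")
sig_glm   <- c("gene1", "gene3", "gene4", "gene5")

sets <- list(permutation = sig_perm, t_test = sig_ttest, glm = sig_glm)

# Genes reported by every method (the centre of the Venn diagram).
in_all <- Reduce(intersect, sets)

# Matrix of pairwise overlap counts; the diagonal holds the size of each list.
overlap <- sapply(sets, function(a) {
  sapply(sets, function(b) length(intersect(a, b)))
})
overlap
```

Genes that appear in only one list can be found with setdiff, which may point to method-specific false positives worth a closer look.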

Check your results

Enter gene names (one per line) to check the overlap with the ground truth.