In the last part of this thesis, we perform experiments on different benchmark datasets to compare the performance of the proposed methods with existing techniques. Specifically, we check experimentally whether additionally preserving the local neighborhood structure improves the feature selection process. As expected, the proposed methods show an improvement in terms of measures that quantify how well the selected subset of features differentiates among the natural clusters in the data.
Quackenbush Dr. Burkholz Mar Abstract: Genetic mutations accumulate over time and contribute to the development of cancer.
The order in which such mutations occur provides insights into cancer progression and might help in the long run to identify key time points for drug intervention via targeted treatments. We model the successive accumulation of mutations by a cascade process evolving in discrete time on an unobserved directed acyclic graph (DAG) formed by interacting mutations that are equipped with a binary state. The aim of this work is to infer the DAG and the cascade model parameters based on mutational data summarized at the gene level.
These imply likely orders of mutations in time, yet we observe only the end state of the proposed cascade process. In this setting, we utilize the Bayesian network learning framework developed in the R package BiDAG to infer a posterior distribution of DAGs and cascade model parameters. We apply this methodology to colorectal cancer data and contrast results from two more general parametric propagation models.
Additionally, we show the consistency of our method on synthetic data.
Hongkyu Kim Regression Discontinuity Design Prof. Marloes Maathuis Mar Abstract: The regression discontinuity (RD) design is a type of observational study that locally resembles a randomized experiment. It is used to estimate the local causal effect of a treatment that is assigned fully or partly by the value of a certain variable relative to a threshold.
As the RD design belongs to the field of causal inference, important concepts of causal inference are covered first to ground the discussion. Based on those concepts, the fundamental idea and structure of the RD design are explained, including two subtypes of the design: the sharp and the fuzzy RD designs. Furthermore, the assumptions of the RD design, which differ slightly between fields, are formulated. In order to accurately estimate the local causal effect without confounding, we introduce the bandwidth and use only the data that lie within the bandwidth of the threshold.
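The role of the bandwidth can be illustrated with a minimal sketch of a sharp RD estimate. This is not the thesis's estimator: it assumes a uniform kernel, a separate straight-line fit on each side of the threshold, and treatment assigned when the running variable is at or above the threshold. The names `fit_line` and `sharp_rd_estimate` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations on each side")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct x values on each side")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_estimate(
    running: Sequence[float],
    outcome: Sequence[float],
    threshold: float,
    bandwidth: float,
) -> float:
    """Jump in the fitted outcome at the threshold, using only nearby data.

    Assumption: units with running >= threshold are treated (sharp design).
    """
    left_x: List[float] = []
    left_y: List[float] = []
    right_x: List[float] = []
    right_y: List[float] = []
    for r, y in zip(running, outcome):
        centered = r - threshold
        if abs(centered) > bandwidth:
            continue  # discard observations outside the bandwidth
        if centered >= 0:
            right_x.append(centered)
            right_y.append(y)
        else:
            left_x.append(centered)
            left_y.append(y)
    # After centering, each intercept is the fitted value at the threshold.
    left_intercept, _ = fit_line(left_x, left_y)
    right_intercept, _ = fit_line(right_x, right_y)
    return right_intercept - left_intercept
```

A smaller bandwidth keeps the comparison local but leaves fewer observations for each fit, which is the trade-off that bandwidth selection methods try to balance.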
The performance of these bandwidth selection methods is compared on simulated data, and it can be inferred that the newly proposed method may yield better results. At the end, we intentionally violate the unconfoundedness assumption and analyze three potential confounding models with simulated data.
Fabian Patronic Predictions in Mixed Models Dr. Lukas Meier Mar Abstract: This thesis gives an introduction to mixed models and empirically compares different methods of estimating prediction intervals in a simulation study.
The difficulty in estimating such intervals lies in the additional source of variation that is induced by the random effects. The point estimate is usually obtained with the best linear unbiased estimator (BLUE), except for the Bayesian method, which uses the mean of the posterior distribution. Marginal and conditional prediction use the prediction error made by the best linear unbiased predictor (BLUP).
Those errors are estimated by the distribution of the BLUP (Henderson). Prediction intervals estimated by bootstrap methods simulate the error made by the BLUE compared to the bootstrapped samples. The results showed that the method using Bayesian statistics has the best coverage rate. It uses quantiles of the posterior distribution as the prediction interval.
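As a rough illustration of the quantile-based interval, the sketch below computes an equal-tailed interval from a list of draws. It assumes that draws from the relevant posterior (predictive) distribution are already available, for example from an MCMC sampler; the function name `quantile_interval` and the linear interpolation rule are illustrative choices, not taken from the thesis.

```python
from typing import Sequence, Tuple


def quantile_interval(draws: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed interval covering `level` of the given draws.

    Assumption: `draws` are samples from the posterior predictive distribution.
    Quantiles are computed by linear interpolation between order statistics.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    ordered = sorted(draws)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two draws")

    def quantile(p: float) -> float:
        position = p * (n - 1)
        lower = int(position)            # index of the order statistic below
        upper = min(lower + 1, n - 1)    # index of the order statistic above
        weight = position - lower
        return ordered[lower] * (1.0 - weight) + ordered[upper] * weight

    alpha = (1.0 - level) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)
```

The coverage rate reported in a simulation study is then the fraction of simulated new observations that fall inside such an interval.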
Finally, a cheese tasting example is shown and guidance on how to use the different methods in R is given.
Armeen Taeb Mar Abstract: In this thesis, we study causal Gaussian models that have a small number of latent confounders. More precisely, our goal is to estimate the Markov equivalence class of the underlying directed acyclic graph structure of the observed variables, given a sample of independent observations.
This problem can be formulated as an optimization problem in which the latent confounder structure corresponds to a low-rank constraint, which is non-convex and difficult to deal with. We study and implement algorithms to approach this optimization problem and make it more computationally feasible. One of the proposed algorithms is based on nuclear norm regularization and one is based on projected gradient descent.
Daniela Nguyen The Rasch model and its extensions from a statistical point of view Dr. Lukas Meier Mar Abstract: The Rasch model is the most classical and popular model for psychological and educational testing. In this thesis, we introduce the Rasch model and its properties along with its correspondence to logistic regression. Due to the inconsistency of the full maximum likelihood estimators, we present two alternative approaches for the item parameter estimation based on the consideration of the person parameters.
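The correspondence to logistic regression can be made concrete with a short sketch of the standard dichotomous Rasch model, in which the log-odds of a correct response equal the person parameter minus the item parameter. The parameter names `ability` and `difficulty` and both function names are illustrative assumptions, not taken from the thesis.

```python
import math
from typing import Sequence


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response: logistic function of ability - difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def rasch_log_likelihood(
    responses: Sequence[Sequence[int]],
    abilities: Sequence[float],
    difficulties: Sequence[float],
) -> float:
    """Joint log-likelihood of a 0/1 response matrix (rows: persons, columns: items)."""
    total = 0.0
    for person_responses, ability in zip(responses, abilities):
        for response, difficulty in zip(person_responses, difficulties):
            p = rasch_probability(ability, difficulty)
            total += math.log(p) if response == 1 else math.log(1.0 - p)
    return total
```

Maximizing this joint likelihood over all person and item parameters at once is the full maximum likelihood approach whose inconsistency motivates the alternative estimators.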
We also present different extensions of the Rasch model which can handle polytomous responses or relax some restrictive assumptions of the Rasch model. Throughout this work, estimation methods and models are supported by implementations in R.
Lorraine Electre Bersier Estimating the real demand based on censored data Prof. Maathuis Alexandra Stieger-Federer Mar Abstract: In order to plan the rail transport of goods across Switzerland accurately, the real demand has to be known in advance.
However, during the booking process at SBB Cargo, said demand is not stored. The booking system of SBB Cargo is first thoroughly investigated and explained. The focus in this thesis is on the last one. Then, various statistical methods used to estimate the distribution of censored data are explained. Several censored parametric models are proposed and their right-censored log-likelihood functions are derived. Furthermore, it is described how the goodness-of-fit of censored parametric distributions can be tested and two nonparametric maximum likelihood estimation methods for censored data are explained.
In order to obtain the unconstrained demand from this variable, censored parametric distributions are fitted.
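To make the idea of a right-censored log-likelihood concrete, here is a minimal sketch for a Weibull model. It assumes that an observation is right-censored when the transported quantity hit a capacity limit, so the true demand is only known to be at least the recorded value. The function name and the shape/scale parameterization are illustrative and are not taken from the thesis.

```python
import math
from typing import Sequence


def weibull_right_censored_loglik(
    values: Sequence[float],
    censored: Sequence[bool],
    shape: float,
    scale: float,
) -> float:
    """Log-likelihood of Weibull(shape, scale) data with right censoring.

    Uncensored values contribute the log density, censored values the log
    survival function log P(X > x) = -(x / scale) ** shape.
    """
    if shape <= 0.0 or scale <= 0.0:
        raise ValueError("shape and scale must be positive")
    total = 0.0
    for x, is_censored in zip(values, censored):
        if x <= 0.0:
            raise ValueError("observations must be positive")
        z = x / scale
        if is_censored:
            total += -(z ** shape)
        else:
            total += math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape
    return total
```

Fitting the model then amounts to maximizing this function over `shape` and `scale`, for example with a numerical optimizer.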
Master's theses – Seminar for Statistics | ETH Zurich
The Weibull distribution is shown to be the most accurate one. Using this, it is observed that around 8. This value is very encouraging.
Marloes Maathuis Leonard Henckel Mar Abstract: Missing data is a common problem which can significantly affect any analysis. Partial Deletion (PD) and Multiple Imputation (MI) are two methods which allow us to use estimators that require complete data. In this thesis, we provide sufficient graphical conditions which allow us to modify the Adjustment Formula for inferring causal effects from data with missing values while using Available Case Analysis, a special case of Partial Deletion, or MI.
The graphical conditions make use of m-graphs, or missingness graphs, which are a useful extension of causal graphs that allow us to encode causes of missingness into causal graphs. For MI, we focus on Joint Modelling (JM), which imputes each missing value per row simultaneously.
A short discussion on Fully Conditional Specification, an alternative to JM, follows. We also present a modified Adjustment Formula which takes advantage of both Partial Deletion (PD) and MI, by first removing rows with missing values in a subset of the variables and then imputing the remaining missing values. Finally, we relax the conditions of previous results such that any bias caused by the missing data is avoided, but confounding bias remains.
Ramona Wechsler Detecting predictive biomarkers for non-small cell lung cancer patients undergoing immunotherapy: A retrospective analysis of data from the University Hospital Zurich Prof. Maathuis PD Dr. Alessandra Curioni Dr. Stefanie Hiltbrunner Mar Abstract: The aim of this thesis is to reveal possible predictive biomarkers for the effect of immunotherapy on overall survival and tumor response for non-small cell lung cancer patients.
The data used were collected from patients treated with immunotherapy from March to January at the University Hospital of Zurich. The analyses are performed on two groups of patients. Patients who received chemotherapy before immunotherapy are called further line IT patients in this work, and patients who received immunotherapy as first line treatment, sometimes combined with chemotherapy, are referred to as first line IT patients.
The applied methods are survival analysis, random forest and logistic regression. With survival analysis, one possible predictive biomarker for the effect of immunotherapy on overall survival was detected for each of the patient groups. Further line IT patients with a higher PD-L1 expression in tumor cells had a longer overall survival than patients with a lower expression. In the first line IT group, patients with a higher lymphocyte count had a longer overall survival than patients with a lower lymphocyte count. These results have to be treated with caution, since the risk of false positive findings is very high in this work.
The revealed biomarkers should be verified with further studies. No predictive biomarker was detected from classification with random forest or logistic regression of tumor response after three and after six months of immunotherapy.
The patient groups were small and therefore the risk of missing real predictive biomarkers is high for all analyses in this thesis. As a separate part of this work, the risk of p-value hunting is described and a simulation is performed to illustrate the issue. P-value hunting and multiple testing are huge problems in many clinical studies.
Marloes Maathuis Alexandra Stieger-Federer Mar Abstract: In order to plan the rail transport of goods across Switzerland accurately, the real demand has to be known in advance.
However, for various reasons related to the booking process at SBB Cargo, the real demand is not available. As a result, most predictions are based on the quantity of goods actually transported and not on the customers' demand. To be able to propose a more attractive offer in the future, this master's thesis derives estimates of the percentage of unsatisfied customer demand.
The most important factors constraining the demand are found to be the number of train drivers and locomotives, the availability of routes and the capacity of the trains. Several censored parametric models are proposed and their right-censored log-likelihood functions are derived. Furthermore, it is described how the goodness-of-fit of censored parametric distributions can be tested and two nonparametric maximum likelihood estimation methods for censored data are explained. Using this, the percentage of unsatisfied customer demand due to the limited train capacity is computed.
The value found is very encouraging. Indeed, this value is close to the value computed at SBB Cargo using all the limiting factors.
Finally, this shows that using the censored Weibull distribution to estimate the demand from the bounded train capacity variable returns very promising results.
Davide Luzzati Self-induced crises in a DSGE model Dr. Fadoua Balabdaoui Prof. However, Dynamic Stochastic General Equilibrium (DSGE) models have been and still are the workhorse for monetary policy despite their poor performance in the face of the Financial Crisis. In this work we set out a model which takes inspiration from a standard money-in-the-utility DSGE, but which differs in the presence of a feedback mechanism on the propensity of individuals to hold cash and bonds.
Namely, people also look at what others do before optimizing their standard utility function, in a similar fashion to the Keeping Up With the Joneses (KUWJ) phenomenon. Our aim is twofold. Firstly, we want to show that the presence of such a mechanism causes the system to go through a phase transition, from an economy with one equilibrium point to one with three.