# Research

## Statistics in Metric Spaces

We're working on modeling and testing methods for random objects that exist in one or more metric spaces (e.g. networks, probability distributions, positive definite matrices).

**More detail:**

Traditional network research has often focused on the local or global properties of a single observed network (e.g. dyads or community structure within a social network). However, data involving many networks are increasingly available (e.g. fMRI coactivation networks, daily computer network traffic), and the associated research questions require treating each entire network as a random object. This departure from Euclidean spaces requires reconstructing familiar statistical tools (e.g. ANOVA, regression) in the more general setting of a metric space. We're also investigating multivariate settings, where each observation may consist of dependent objects from different metric spaces (e.g. a random histogram and a random network).
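To make the idea of rebuilding a familiar tool concrete, here is a minimal sketch of a metric-space analogue of the mean and variance: the Fréchet mean, restricted to the observed sample, together with the corresponding Fréchet variance. It is an illustration only, not our method. The function names, the choice of Frobenius distance between adjacency matrices, and the toy networks are all assumptions made for the example.

```python
import numpy as np


def frobenius_distance(a, b):
    """Distance between two networks given as adjacency matrices (assumed metric)."""
    return float(np.linalg.norm(a - b, ord="fro"))


def sample_frechet_mean(objects, distance):
    """Return (index, variance) for the sample-restricted Frechet mean.

    The Frechet mean minimizes the average squared distance to the data.
    Here the search is restricted to the observed objects, so the result
    is the index of the best sample member and its average squared distance.
    """
    n = len(objects)
    squared = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            squared[i, j] = squared[j, i] = distance(objects[i], objects[j]) ** 2
    frechet_values = squared.mean(axis=1)  # Frechet function at each candidate
    best = int(np.argmin(frechet_values))
    return best, float(frechet_values[best])


# Toy data: five random undirected networks on four nodes.
rng = np.random.default_rng(0)
networks = []
for _ in range(5):
    upper = np.triu(rng.integers(0, 2, size=(4, 4)), k=1)
    networks.append(upper + upper.T)

center_index, frechet_variance = sample_frechet_mean(networks, frobenius_distance)
print(center_index, frechet_variance)
```

Only the `distance` argument depends on the type of object, so the same function could be reused for histograms or positive definite matrices by swapping in a suitable metric.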

## Nonuniform Sampling of Binary Matrices

We developed a way to sample binary matrices with fixed margins, where the probability distribution is intentionally not uniform.

**More detail:**

Binary matrices capture relationships between sets of people, animals, things, etc. To determine how noteworthy an observed binary matrix is, one can compare it to randomly generated matrices with the same margins (row and column sums). Often the sampling is done so that each matrix has an equal probability of being sampled (uniform sampling). However, in some instances we may wish to give more weight to some matrices and less to others (nonuniform sampling). We introduce a method for sampling nonuniformly from binary matrices with fixed margins.
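As a rough illustration of what nonuniform sampling with fixed margins can look like, the sketch below combines the standard checkerboard swap (which preserves row and column sums) with a Metropolis acceptance step, so that matrices are visited with probability proportional to a user-supplied weight. This is a generic textbook-style construction and should not be read as the method we introduce. The function name `weighted_swap_sampler` and the example weight are hypothetical.

```python
import numpy as np


def weighted_swap_sampler(matrix, log_weight, n_steps, rng):
    """Metropolis chain over binary matrices with fixed margins.

    Each step picks two rows and two columns at random. If the 2x2
    submatrix is a checkerboard ([[1, 0], [0, 1]] or [[0, 1], [1, 0]]),
    flipping it keeps all row and column sums unchanged. The flip is
    accepted with probability min(1, w(proposal) / w(current)).
    """
    current = np.array(matrix, dtype=int)
    n_rows, n_cols = current.shape
    current_lw = log_weight(current)
    for _ in range(n_steps):
        rows = rng.choice(n_rows, size=2, replace=False)
        cols = rng.choice(n_cols, size=2, replace=False)
        sub = current[np.ix_(rows, cols)]
        is_checkerboard = (
            sub[0, 0] == sub[1, 1]
            and sub[0, 1] == sub[1, 0]
            and sub[0, 0] != sub[0, 1]
        )
        if not is_checkerboard:
            continue  # no margin-preserving move here; stay put
        proposal = current.copy()
        proposal[np.ix_(rows, cols)] = 1 - sub
        proposal_lw = log_weight(proposal)
        # Symmetric proposal, so the acceptance ratio is just the weight ratio.
        if rng.random() < np.exp(min(0.0, proposal_lw - current_lw)):
            current, current_lw = proposal, proposal_lw
    return current


rng = np.random.default_rng(1)
observed = np.array([[1, 1, 0],
                     [0, 1, 1],
                     [1, 0, 0]])


def diagonal_log_weight(m):
    """Hypothetical weight: favor matrices with more ones on the diagonal."""
    return 2.0 * float(np.trace(m))


draw = weighted_swap_sampler(observed, diagonal_log_weight, 1000, rng)
print(draw)
```

Setting `log_weight` to a constant recovers a uniform target over the matrices reachable by swaps, which makes the contrast between uniform and nonuniform sampling easy to explore.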

## Protein Interface Prediction

We trained a model to predict which parts of two proteins form the interface between them.

**More detail:**

Many cellular processes rely on proteins, which facilitate these processes via their interactions with one another and with small molecules within the cell. Understanding protein interactions is key to disease and pharmaceutical research, as well as to our understanding of basic cellular biology. Proteins normally interact via an interface, a localized region of the protein with special properties. Experimentally identifying the interface between two proteins is a time-consuming and expensive process that involves crystallization of the protein complex and imaging via X-ray crystallography or nuclear magnetic resonance. In total, more than one hundred thousand proteins have been crystallized and imaged in the last 40 years, but doing so for weakly interacting protein complexes remains a challenging problem. In contrast, computational methods are faster and cheaper, and they complement wet lab experiments by identifying the most relevant and worthwhile experiments to attempt in vivo.

*The first principle is that you must not fool yourself, and you are the easiest person to fool.* - Richard Feynman