Certified Computation from Unreliable Datasets (Sep 12 2017, Jun 13 2018). A wide range of learning tasks require human input in labeling massive data. The collected data, though, are usually of low quality and contain inaccuracies and errors. As a result, modern science and business face the problem of learning from unreliable data ...
Sampling Correctors (Apr 24 2015). In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported to have, in ...
Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover (Feb 22 2018, Jan 07 2019). We present $O(\log\log n)$-round algorithms in the Massively Parallel Computation (MPC) model, with $\tilde{O}(n)$ memory per machine, that compute a maximal independent set, a $1+\epsilon$ approximation of maximum matching, and a $2+\epsilon$ approximation ...
Communication and Memory Efficient Testing of Discrete Distributions (Jun 11 2019). We study distribution testing with communication and memory constraints in the following computational models: (1) the one-pass streaming model, where the goal is to minimize the sample complexity of the protocol subject to a memory constraint, and ...
Local Computation Algorithms for the Lovász Local Lemma (Sep 21 2018). We consider the task of designing Local Computation Algorithms (LCA) for applications of the Lovász Local Lemma (LLL). LCA is a class of sublinear algorithms proposed by Rubinfeld et al. that has received a lot of attention in recent years. The LLL ...
$L^p-L^q$ Carleman estimates with convex power weights (Sep 09 2016). We prove $L^p-L^q$ Carleman estimates with convex power weights $|x|^\beta$, extending previous work by J. O. Strömberg.
Collision-based Testers are Optimal for Uniformity and Closeness (Nov 11 2016). We study the fundamental problems of (i) uniformity testing of a discrete distribution, and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm. These problems have been extensively studied in distribution testing and ...
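The statistic behind collision-based uniformity testing can be illustrated with a short sketch. It is not the paper's tester: the function names are hypothetical and the acceptance threshold $(1+2\epsilon^2)/n$ is an illustrative choice. A random pair of samples collides with probability $\|p\|_2^2$, which equals $1/n$ for the uniform distribution and is at least $(1+4\epsilon^2)/n$ when $p$ is $\epsilon$-far from uniform in total variation distance, so a threshold between the two separates the cases given enough samples.

```python
import random
from collections import Counter


def collision_count(samples):
    """Number of unordered pairs i < j with samples[i] == samples[j]."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())


def collision_uniformity_test(samples, n, eps):
    """Return True (accept "uniform") if the empirical collision rate is small.

    The fraction of colliding pairs is an unbiased estimate of ||p||_2^2.
    The threshold (1 + 2*eps**2) / n lies between the uniform value 1/n and
    the lower bound (1 + 4*eps**2) / n for eps-far distributions; it is an
    illustrative choice, and the sample size needed for a given error
    probability is not worked out here.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    rate = collision_count(samples) / (m * (m - 1) / 2)
    return rate <= (1 + 2 * eps**2) / n


# Usage: uniform samples over 10 elements versus samples concentrated on 2.
rng = random.Random(0)
uniform = [rng.randrange(10) for _ in range(2000)]
skewed = [rng.randrange(2) for _ in range(2000)]
print(collision_uniformity_test(uniform, 10, 0.5))
print(collision_uniformity_test(skewed, 10, 0.5))
```

Counting collisions through a `Counter` takes time linear in the number of samples rather than quadratic, since each value with multiplicity $c$ contributes $\binom{c}{2}$ pairs.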
Optimal Identity Testing with High Probability (Aug 09 2017, Jan 15 2019). We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0< ...
Efficient Statistics, in High Dimensions, from Truncated Samples (Sep 11 2018). We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a $d$-variate ...
Testing Shape Restrictions of Discrete Distributions (Jul 13 2015, Jan 21 2016). We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between $D\in\mathcal{P}$ ...
Menshov's "adjustment theorem" with respect to general measures (May 27 2016). A classical theorem of Menshov states that every measurable function can be redefined on a set of arbitrarily small Lebesgue measure so that the resulting function has a uniformly convergent Fourier series. We prove that the same is true if we replace Lebesgue ...
The essential norm of a composition operator on the minimal Mobius invariant space (May 26 2010, Aug 04 2010). We derive a formula for the essential norm of a composition operator on the minimal Mobius invariant space of analytic functions. As an application, we show that the essential norm of a non-compact composition operator is at least 1. We also obtain lower ...
Faster Sublinear Algorithms using Conditional Sampling (Aug 16 2016). A conditional sampling oracle for a probability distribution D returns samples from the conditional distribution of D restricted to a specified subset of the domain. A recent line of work (Chakraborty et al. 2013 and Canonne et al. 2014) has shown that ...
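The conditional sampling oracle can be simulated for an explicitly known distribution, which is handy for experimenting with testers built on it. A minimal sketch, assuming the distribution is given as a dictionary of probabilities; the name `cond_sample` is hypothetical, and raising an error when the chosen subset has zero probability is this sketch's own assumption rather than a convention taken from the works cited.

```python
import random


def cond_sample(dist, subset, rng=random):
    """Draw one sample from `dist` conditioned on the outcome lying in `subset`.

    `dist` maps domain elements to probabilities. It is passed explicitly here
    only to simulate the oracle; a tester would see nothing but the samples.
    Raising an error when `subset` has zero mass is an assumption of this sketch.
    """
    support = [x for x in subset if dist.get(x, 0) > 0]
    if not support:
        raise ValueError("subset has zero probability under dist")
    # random.choices normalizes the weights, which yields D restricted to subset.
    return rng.choices(support, weights=[dist[x] for x in support], k=1)[0]


D = {"a": 0.5, "b": 0.3, "c": 0.2}
rng = random.Random(1)
draws = [cond_sample(D, {"b", "c"}, rng) for _ in range(1000)]
```

Conditioned on $\{b, c\}$, the element $b$ should appear with probability $0.3/0.5 = 0.6$, so roughly 600 of the 1000 draws are expected to be `"b"`.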
A fractal perspective on optimal antichains and intersecting subsets of the unit $n$-cube (Jul 16 2017). An $n$-cube antichain is a subset of the unit $n$-cube $[0,1]^n$ that does not contain two distinct elements $\mathbf{x}=(x_1, x_2,\ldots, x_n)$ and $\mathbf{y}=(y_1, y_2,\ldots, y_n)$ satisfying $x_i\le y_i$ for all $i\in \{1,\ldots,n\}$. Using a chain ...
Error-tolerant Exemplar Queries on RDF Graphs (Sep 10 2016, Sep 13 2016). Edge-labeled graphs are widely used to describe relationships between entities in a database. We study a class of queries, referred to as exemplar queries, on edge-labeled graphs where each query gives an example of what the user is searching for. Given ...
Efficient Error-tolerant Search on Knowledge Graphs (Sep 10 2016, Dec 01 2017). Edge-labeled graphs are widely used to describe relationships between entities in a database. Given a query subgraph that represents an example of what the user is searching for, we study the problem of efficiently searching for similar subgraphs in a ...
A technology agnostic RRAM characterisation methodology protocol (Sep 18 2018). The emergence of memristor technologies brings new prospects for modern electronics by enabling novel in-memory computing solutions and affordable and scalable reconfigurable hardware implementations. Several competing memristor technologies have been ...
Node Classification in Uncertain Graphs (May 22 2014). In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the ...
Zero Impact Parameter White Dwarf Collisions in FLASH (Sep 17 2012). We systematically explore zero impact parameter collisions of white dwarfs with the Eulerian adaptive grid code FLASH for 0.64+0.64 M$_{\odot}$ and 0.81+0.81 M$_{\odot}$ mass pairings. Our models span a range of effective linear spatial resolutions from ...
The de Bruijn-Erdős theorem from a Hausdorff measure point of view (May 28 2018). Motivated by a well-known result in extremal set theory, due to Nicolaas Govert de Bruijn and Paul Erdős, we consider curves in the unit $n$-cube $[0,1]^n$ of the form \[ A=\{(x,f_1(x),\ldots,f_{n-2}(x),\alpha): x\in [0,1]\}, \] where $\alpha$ is ...
A continuous analogue of Erdős' $k$-Sperner theorem (Apr 21 2019). A chain in the unit $n$-cube is a set $C\subset [0,1]^n$ such that for every $\mathbf{x}=(x_1,\ldots,x_n)$ and $\mathbf{y}=(y_1,\ldots,y_n)$ in $C$ we either have $x_i\le y_i$ for all $i\in [n]$, or $x_i\ge y_i$ for all $i\in [n]$. We show that ...
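The chain condition is a pairwise comparability test in the coordinatewise order. A minimal sketch for finite point sets only (the paper concerns general subsets of $[0,1]^n$, which a list of tuples cannot represent); the function name `is_chain` is hypothetical.

```python
from itertools import combinations


def is_chain(points):
    """True if every two points are comparable in the coordinatewise order.

    Works on a finite list of equal-length tuples; the paper concerns
    (possibly uncountable) subsets of [0,1]^n, which this cannot represent.
    """
    for x, y in combinations(points, 2):
        x_le_y = all(a <= b for a, b in zip(x, y))  # x_i <= y_i for all i
        y_le_x = all(a >= b for a, b in zip(x, y))  # x_i >= y_i for all i
        if not (x_le_y or y_le_x):
            return False
    return True


print(is_chain([(0.0, 0.0), (0.5, 0.2), (1.0, 1.0)]))
print(is_chain([(0.0, 1.0), (1.0, 0.0)]))
```

The first set is a chain because its points increase in every coordinate; the second is not, since $(0,1)$ and $(1,0)$ are incomparable.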
Learning to Match (Feb 09). … is a virtual two-sided marketplace where guests and accommodation providers are the two distinct stakeholders. They meet to satisfy their respective and different goals. Guests want to be able to choose accommodations from a huge and diverse ...
Uncertain Time-Series Similarity: Return to the Basics (Aug 09 2012). In recent years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering ...
A Survey of Blocking and Filtering Techniques for Entity Resolution (May 15 2019). Efficiency techniques have been an integral part of Entity Resolution since its infancy. In this survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We also provided ...
End-to-End Entity Resolution for Big Data: A Survey (May 15 2019). One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. ...
Schema-agnostic Progressive Entity Resolution (extended version) (May 15 2019). Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In practice, its goal ...
Projection inequalities for antichains (Dec 16 2018). A set $A \subseteq {\mathbb{R}}^n$ is called an antichain (resp. weak antichain) if it does not contain two distinct elements ${\mathbf x}=(x_1,\ldots, x_n)$ and ${\mathbf y}=(y_1,\ldots, y_n)$ satisfying $x_i\le y_i$ (resp. $x_i < y_i$) for all $i\in \{1,\ldots,n\}$. ...
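The two notions differ only in whether the coordinatewise inequalities are strict, and every set satisfying the non-strict condition also satisfies the strict one. A minimal sketch for finite point sets (the names `dominated` and `is_antichain` are hypothetical) that checks either condition and exhibits a two-point set satisfying only the strict one:

```python
from itertools import permutations


def dominated(x, y, strict):
    """x <= y in every coordinate, or x < y in every coordinate if strict."""
    if strict:
        return all(a < b for a, b in zip(x, y))
    return all(a <= b for a, b in zip(x, y))


def is_antichain(points, strict=False):
    """Check a finite set of points in R^n against the definition.

    strict=False: no two distinct points with x_i <= y_i for all i.
    strict=True:  no two distinct points with x_i <  y_i for all i.
    """
    distinct = list(dict.fromkeys(points))  # drop repeated points, keep order
    return not any(dominated(x, y, strict) for x, y in permutations(distinct, 2))


# (0, 0) <= (0, 1) coordinatewise, but not strictly in the first coordinate.
pts = [(0, 0), (0, 1)]
print(is_antichain(pts), is_antichain(pts, strict=True))
```

Ordered pairs from `permutations` are used because domination is not symmetric, so both $(\mathbf x,\mathbf y)$ and $(\mathbf y,\mathbf x)$ have to be checked.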
Analytical Approach For Solving Population Balances: A Homotopy Perturbation Method (Dec 25 2017). In the present work, a new approach is proposed for finding the analytical solution of population balances. This approach relies on the idea of the Homotopy Perturbation Method (HPM). The HPM solves both linear and nonlinear initial and boundary value problems ...
Sublinear-Time Algorithms for Counting Star Subgraphs with Applications to Join Selectivity Estimation (Jan 17 2016). We study the problem of estimating the value of sums of the form $S_p \triangleq \sum \binom{x_i}{p}$ when one has the ability to sample $x_i \geq 0$ with probability proportional to its magnitude. When $p=2$, this problem is equivalent to estimating ...
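Magnitude-proportional sampling gives a simple unbiased estimator of $S_p$ when the total $X=\sum_i x_i$ is known: if index $i$ is drawn with probability $x_i/X$, then $X\binom{x_i}{p}/x_i$ has expectation exactly $S_p$. The sketch below illustrates that identity only; it is not the paper's algorithm, it assumes $X$ is known and $p \ge 1$, and the name `estimate_S_p` is hypothetical. When the $x_i$ are vertex degrees, $\binom{x_i}{p}$ counts the $p$-stars centered at vertex $i$.

```python
import random
from math import comb


def estimate_S_p(xs, p, num_samples, rng=random):
    """Estimate S_p = sum_i C(x_i, p) using magnitude-proportional samples.

    Assumes nonnegative integers xs with known total X = sum(xs) > 0 and p >= 1.
    Index i is drawn with probability x_i / X (simulated with rng.choices);
    X * C(x_i, p) / x_i then has expectation exactly S_p, so the sample mean
    is unbiased. Indices with x_i = 0 are never drawn.
    """
    total = sum(xs)
    idx = rng.choices(range(len(xs)), weights=xs, k=num_samples)
    return sum(total * comb(xs[i], p) / xs[i] for i in idx) / num_samples


# Hypothetical degree sequence; S_2 counts the 2-stars (paths of length 2).
degrees = [1, 2, 5, 10]
print(sum(comb(d, 2) for d in degrees))  # exact S_2, for comparison
print(estimate_S_p(degrees, 2, 20000, random.Random(0)))
```

The estimator's variance grows when a few large $x_i$ dominate $S_p$, so the number of samples needed depends on the input; bounding it is what the paper's analysis is about.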
Chondrule Formation in Bow Shocks around Eccentric Planetary Embryos (Apr 03 2012). Recent isotopic studies of Martian meteorites by Dauphas & Pourmand (2011) have established that large (~ 3000 km radius) planetary embryos existed in the solar nebula at the same time that chondrules - millimeter-sized igneous inclusions found in meteorites ...
A Turán-type theorem for large-distance graphs in Euclidean spaces, and related isodiametric problems (Apr 16 2019). Given a measurable set $A\subset \mathbb R^d$ we consider the "large-distance graph" $\mathcal{G}_A$, on the ground set $A$, in which each pair of points from $A$ whose distance is bigger than 2 forms an edge. We consider the problems of maximizing the ...
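The large-distance graph is easy to build on a finite sample of points, which can help build intuition even though the paper works with measurable sets rather than finite ones. A minimal sketch with a hypothetical function name; it uses the distance threshold 2 from the definition (pairs at distance exactly 2 are not joined) and needs Python 3.8+ for `math.dist`.

```python
import math
from itertools import combinations


def large_distance_edges(points, threshold=2.0):
    """Edges (i, j), i < j, of the large-distance graph on a finite point set:
    points i and j are joined when their Euclidean distance exceeds threshold."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) > threshold
    ]


print(large_distance_edges([(0.0, 0.0), (3.0, 0.0), (1.0, 0.0)]))
```

Here only points 0 and 1 (distance 3) are joined; points 1 and 2 sit at distance exactly 2, so the strict inequality leaves them unconnected.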
Progressive Data Science: Potential and Challenges (Dec 19 2018). Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining depend heavily on iterative trial-and-error processes that could be sped up significantly by ...
On Silicon Group Elements Ejected by Supernovae Type Ia (Apr 17 2013, May 08 2014). There is compelling evidence that the peak brightness of a Type Ia supernova is affected by the electron fraction Ye at the time of the explosion. The electron fraction is set by the aboriginal composition of the white dwarf and the reactions that occur ...
3D Reconstruction of Coronary Arteries and Atherosclerotic Plaques based on Computed Tomography Angiography Images (Mar 13 2019). The purpose of this study is to present a new semi-automated methodology for three-dimensional (3D) reconstruction of coronary arteries and their plaque morphology using Computed Tomography Angiography (CTA) images. The methodology is summarized in seven ...
Performance of a simple proxy for U.S. cloud-to-ground lightning (Oct 01 2018). The product of convective available potential energy (CAPE) and precipitation rate has previously been used as a proxy for cloud-to-ground (CG) lightning flash counts in climate change applications. Here the ability of this proxy, denoted CP, to represent ...
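As described, the CP proxy is a cell-by-cell product of two fields. A minimal sketch over gridded values stored as flat lists; the function name, the list-based grid and the units (J/kg for CAPE, mm/h for precipitation rate) are assumptions, and any scaling that maps CP onto flash counts is left out because none is specified here.

```python
def cp_proxy(cape, precip_rate):
    """Pointwise CP proxy: CAPE times precipitation rate, cell by cell."""
    if len(cape) != len(precip_rate):
        raise ValueError("cape and precip_rate must have the same length")
    return [c * p for c, p in zip(cape, precip_rate)]


# Hypothetical values for three grid cells (units assumed: J/kg and mm/h).
print(cp_proxy([1500.0, 0.0, 800.0], [2.0, 4.0, 0.5]))
```

Because it is a product, CP vanishes wherever either factor does: heavy rain with no CAPE (the second cell) contributes nothing.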
Low power commissioning of an innovative laser beam circulator for inverse Compton scattering Gamma-ray source (Jan 16 2019). We report on the optical commissioning of the high power laser beam circulator (LBC) for the high brightness Compton γ-ray source Extreme Light Infrastructure for Nuclear Physics. Tests aiming at demonstrating the optical performances of the LBC ...