This paper studies how to estimate an individual's taste for forming a connection with another individual in a network. It compares the difficulty of estimation with and without the assumption that ut...
Autoformalization aims to translate natural-language mathematics into compilable, machine-checkable statements. However, semantic consistency does not imply prover effectiveness: even semantically con...
Reconstructing dynamic non-rigid objects from monocular video requires integrating visual cues from direct observations with data-driven priors over geometry and appearance. Prior approaches either le...
Interstellar objects (ISOs) motivate a coupled mission-design and inference question relevant to spacecraft dynamics and control in extreme environments: if volatile-rich, rotating comet-like bodies w...
The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained m...
This paper develops a novel change point identification method for high-dimensional data using random projections. By projecting high-dimensional time series into a one-dimensional space, we are able ...
Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge...
In causal analysis, understanding the causal mechanisms through which an intervention or treatment affects an outcome is often of central interest. We propose a test to evaluate (i) whether the causal...
Unnormalized probability distributions are frequently used in machine learning for modeling complex data generating processes. Though Markov chain Monte Carlo (MCMC) algorithms can approximately sampl...
Recent advances in machine learning and large-scale biological data collections have revived the prospect of building a virtual cell, a computational model of cellular behavior that could accelerate b...
Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting a mixture of experts, to improve sample quality on task-specific objectives. However, m...
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are larg...
We study risk-neutral density extraction from short-dated option chains. As expiry approaches, option premia decline and bid-ask spreads can be large relative to prices, making mid quotes particularl...
Financial statement auditing is conducted under a risk-based evidence approach to obtain reasonable assurance. In practice, auditors often perform additional sampling or related procedures when an ini...
In variety testing, multi-environment trials (MET) are essential for evaluating the genotypic performance of crop plants. A persistent challenge in the statistical analysis of MET data is the estimati...
Data assimilation (DA) in subsurface flow entails calibrating model parameters to match observed data, typically at wells, while preserving geological realism. Latent diffusion models (LDMs) provide e...
This paper introduces a comprehensive framework for complex-valued probability measures and explores their novel applications in information theory and statistical analysis. We define a complex probab...
Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, exist...
Diesel engine particulate matter (PM) is one of the most challenging emission constituents to predict. As engines become cleaner and emissions levels drop, manufacturers need reliable methods to quant...
Phytolith analysis is a crucial tool for reconstructing past vegetation and human activities, but traditional methods are severely limited by labour-intensive, time-consuming manual microscopy. To add...