Missing data imputation

Missing values complicate the analysis of large-scale observational datasets such as electronic health records. Our work has introduced several foundational models for missing value imputation, including low rank models and Gaussian copula models. We have also shown how the missing indicator method improves the handling of informative missingness, where values are missing not at random.
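The missing indicator method can be illustrated with a small sketch. It assumes a numeric table stored as a list of rows, with `None` marking a missing entry. The hypothetical helper `add_missing_indicators` fills each missing entry with a constant (zero by default, or the observed column mean). It then appends one binary column for each feature that has any missing values, so a downstream model can use the missingness pattern itself as a signal. This is plain Python for illustration only. It is not code from gcimpute, LowRankModels, or the papers on this page.

```python
from typing import List, Optional

Row = List[Optional[float]]


def add_missing_indicators(rows: List[Row], fill: str = "zero") -> List[List[float]]:
    """Impute missing entries (None) and append binary missingness indicators.

    fill="zero" fills missing entries with 0.0; fill="mean" uses the observed
    column mean (falling back to 0.0 if a column is entirely missing).
    Indicator columns are appended only for features with a missing entry.
    """
    if fill not in ("zero", "mean"):
        raise ValueError("fill must be 'zero' or 'mean'")
    n_cols = len(rows[0])
    # Features that get an indicator column: those with at least one None.
    missing_cols = [j for j in range(n_cols) if any(r[j] is None for r in rows)]

    # One fill value per column.
    fills = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if fill == "mean" and observed:
            fills.append(sum(observed) / len(observed))
        else:
            fills.append(0.0)

    out = []
    for r in rows:
        imputed = [fills[j] if r[j] is None else float(r[j]) for j in range(n_cols)]
        indicators = [1.0 if r[j] is None else 0.0 for j in missing_cols]
        out.append(imputed + indicators)
    return out
```

For `X = [[1.0, None, 3.0], [4.0, 5.0, None], [7.0, 8.0, 9.0]]`, the default call returns `[[1.0, 0.0, 3.0, 1.0, 0.0], [4.0, 5.0, 0.0, 0.0, 1.0], [7.0, 8.0, 9.0, 0.0, 0.0]]`. The first column is fully observed, so it gets no indicator.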

Software

  • gcimpute: imputation with the Gaussian copula

  • LowRankModels: low rank models for missing value imputation
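The core idea behind low rank imputation can be sketched without any library. Fit a low rank model to the observed entries only, then read predictions for the missing entries off the fitted factors. For brevity, the hypothetical function `rank1_impute` below assumes rank one, quadratic loss, and a small ridge penalty `lam`. It fits the factors by alternating least squares. It illustrates the technique under those assumptions and is not the LowRankModels API, which supports general ranks, losses, and regularizers.

```python
from typing import List, Optional


def rank1_impute(rows: List[List[Optional[float]]], lam: float = 1e-6,
                 iters: int = 500) -> List[List[float]]:
    """Fill missing entries (None) with a rank-one fit u_i * v_j.

    Minimizes sum over observed (i, j) of (a_ij - u_i * v_j)^2
    + lam * (|u|^2 + |v|^2) by alternating exact minimization over u and v.
    lam must be positive so that fully missing rows or columns stay well defined.
    """
    m, n = len(rows), len(rows[0])
    obs = [(i, j) for i in range(m) for j in range(n) if rows[i][j] is not None]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # With v fixed, each u_i solves a one-dimensional ridge regression
        # over the observed entries in row i.
        num, den = [0.0] * m, [lam] * m
        for i, j in obs:
            num[i] += rows[i][j] * v[j]
            den[i] += v[j] ** 2
        u = [num[i] / den[i] for i in range(m)]
        # With u fixed, update each v_j the same way over column j.
        num, den = [0.0] * n, [lam] * n
        for i, j in obs:
            num[j] += rows[i][j] * u[i]
            den[j] += u[i] ** 2
        v = [num[j] / den[j] for j in range(n)]
    # Keep observed entries and predict missing ones from the factors.
    return [[rows[i][j] if rows[i][j] is not None else u[i] * v[j]
             for j in range(n)] for i in range(m)]
```

For `[[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [3.0, 6.0, 9.0, None]]`, which is exactly rank one apart from the gap, the filled-in entry comes out very close to 12. Real data usually need a higher rank, chosen for example by holding out observed entries.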

Papers

In Defense of Zero Imputation for Tabular Deep Learning
M. Van Ness and M. Udell
Table Representation Learning Workshop at NeurIPS, 2023
[bib]

The Missing Indicator Method: From Low to High Dimensions
M. Van Ness, T. Bosschieter, R. Halpin-Gregorio, and M. Udell
29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Applied Data Science Track, 2023
[arxiv][bib]

gcimpute: A Package for Missing Data Imputation
Y. Zhao and M. Udell
Accepted at the Journal of Statistical Software, 2023
[arxiv][code][bib]

Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data
Y. Zhao, A. Townsend, and M. Udell
NeurIPS, 2022
[arxiv][url][bib]

Sparse Data Reconstruction, Missing Value and Multiple Imputation through Matrix Factorization
N. Sengupta, M. Udell, N. Srebro, and J. Evans
Sociological Methodology, 2022
[url][bib]

TenIPS: Inverse Propensity Sampling for Tensor Completion
C. Yang, L. Ding, Z. Wu, and M. Udell
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
[arxiv][url][bib]

Online Missing Value Imputation and Correlation Change Detection for Mixed-type Data via Gaussian Copula
Y. Zhao, E. Landgrebe, E. Shekhtman, and M. Udell
AAAI, 2021
[arxiv][url][bib]

Online Mixed Missing Value Imputation Using Gaussian Copula
E. Landgrebe, Y. Zhao, and M. Udell
ICML Workshop on the Art of Learning with Missing Values (Artemiss), 2020
[bib]

TenIPS: Inverse Propensity Sampling for Tensor Completion (Workshop)
C. Yang, L. Ding, Z. Wu, and M. Udell
OPT2020: 12th Annual Workshop on Optimization for Machine Learning, 2020
[url][bib]

Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula
Y. Zhao and M. Udell
Advances in Neural Information Processing Systems (NeurIPS), 2020
[arxiv][pdf][bib]

Polynomial Matrix Completion for Missing Data Imputation and Transductive Learning
J. Fan, Y. Zhang, and M. Udell
Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
[arxiv][url][bib]

Missing Value Imputation for Mixed Data Through Gaussian Copula
Y. Zhao and M. Udell
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020
[arxiv][pdf][slides][bib]

Online High-Rank Matrix Completion
J. Fan and M. Udell
Computer Vision and Pattern Recognition (CVPR), 2019
Oral Presentation
[pdf][bib]

Causal Inference with Noisy and Missing Covariates via Matrix Factorization
N. Kallus, X. Mao, and M. Udell
Advances in Neural Information Processing Systems (NeurIPS), 2018
[arxiv][code][bib]

Graph-Regularized Generalized Low Rank Models
M. Paradkar and M. Udell
CVPR Workshop on Tensor Methods in Computer Vision, 2017
[pdf][bib]

Generalized Low Rank Models
M. Udell, C. Horn, R. Zadeh, and S. Boyd
Foundations and Trends in Machine Learning, 2016
[arxiv][pdf][url][slides][code][bib]

Generalized Low Rank Models
M. Udell
PhD thesis, Stanford University, 2015
[pdf][code][bib]

PCA on a Data Frame
M. Udell and S. Boyd
2015
[pdf][code][bib]

Beyond Principal Component Analysis (PCA)
M. Udell and S. Boyd
Biomedical Computation Review, 2014
[pdf][url][bib]

Generalized Low Rank Models
M. Udell, C. Horn, R. Zadeh, and S. Boyd
NeurIPS Workshop on Distributed Machine Learning and Matrix Computations, 2014
[pdf][code][bib]