In this blog post, we explore the importance of accurate data labeling in entity resolution systems and share some best practices for data labeling.
This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm...
Many populations are "hidden" from the point of view of traditional probabilistic surveys. They are populations for which we have no meaningful sampling frame and whose members may be difficult to identify. Examples include victims of human trafficking and civilian casualties in armed conflicts. Understanding these populations is central to policymaking and to prosecuting human rights violations, yet statistical inference remains extremely difficult in practice. [...]
I'll be giving a talk on Wednesday at 2 pm at JSM in D.C. Come see it! It'll be an interesting session featuring winners of the best student paper award of the Survey Methods, Government Statistics, and Social Statistics sections.
I will be talking about my recent paper, "Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org".
As a professional working with sensitive data, I am responsible for keeping my computers and accounts secure. Here's how I do it and how you can too.
It's easy to come up with stories and plausible explanations. What's hard is to be *right*.
Play "Welcome to the Moon" on the couch or remotely.
I review a few black-box hyperparameter optimization techniques at a high conceptual level: grid search, randomized search and sequential model-based optimization.
The Duke Graduate and Professional Student Government (GPSG) Community Pantry is a student-operated food pantry serving the student community at Duke University. In this post, I describe the record linkage system used at the Pantry to identify individual customers and obtain their order history. This is done using a Python module for deterministic record linkage and model evaluation techniques which I describe in detail.
Notes on some research in progress.
When p < 0.05 provides evidence in favor of the null...
Some notes regarding various 'optimalities' of posterior distributions.
Some techniques to bound the Jensen functional.
A sound re-interpretation of the adjusted $R^2$ value for model comparison.
A complete proof of the tubular neighborhood theorem for submanifolds of Euclidean space. I was unable to find an elementary version in the literature.
Copyright 2021 Olivier Binette
Made using Distill for Rmarkdown