FNN package
EXERCISE: How many levels do we need to go down to reach \(\approx k\) candidate neighbors?
SOLUTION: Set \(n 2^{-d}\) equal to \(k\) and solve for \(d\): \[\begin{eqnarray} n 2^{-d} & = & k\\ \log_2{n} - d & = & \log_2{k}\\ d & = & \log_2{(n/k)} \end{eqnarray}\]
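As a worked example, with illustrative numbers that are not from the text above, take \(n = 10^6\) points and \(k = 10\) neighbors: \[ d = \log_2{(10^6/10)} = \log_2{10^5} \approx 16.6 \] Since the depth must be a whole number, going down 17 levels leaves about \(10^6/2^{17} \approx 7.6\) points per cell, while stopping at 16 levels leaves about \(10^6/2^{16} \approx 15.3\).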
Because we split at the median, each child contains half of its parent's points, so a node \(d\) levels down holds about \(n 2^{-d}\) points.
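The halving argument can be checked numerically. The following is a minimal sketch, not the implementation used by any particular package: the function name `cell_sizes`, the choice of two-dimensional uniform random points, and the values of `n` and `k` are all assumptions made for illustration. It splits the points at the median of one coordinate at a time, cycling through the coordinates, and reports how many points sit in each cell after \(\log_2{(n/k)}\) levels.

```python
import math
import random


def cell_sizes(points, depth, axis=0):
    """Split `points` at the median of one coordinate, cycling through the
    coordinates, and return the sizes of the cells `depth` levels down."""
    if depth == 0 or len(points) <= 1:
        return [len(points)]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                   # median split: half to each child
    next_axis = (axis + 1) % len(points[0])   # move on to the next coordinate
    return (cell_sizes(ordered[:mid], depth - 1, next_axis)
            + cell_sizes(ordered[mid:], depth - 1, next_axis))


random.seed(0)
n, k = 1024, 8                                # hypothetical sample size and k
points = [(random.random(), random.random()) for _ in range(n)]

depth = round(math.log2(n / k))               # d = log2(n/k)
sizes = cell_sizes(points, depth)
print(depth, len(sizes), min(sizes), max(sizes))  # prints "7 128 8 8"
```

Here \(n\) and \(k\) are powers of two, so every cell ends up with exactly \(k\) points. For other values the cell sizes differ by one or so and \(\log_2{(n/k)}\) has to be rounded, which is why the exercise only asks for \(\approx k\) candidates.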
Azadkia, Mona. 2019. “Optimal Choice of \(k\) for \(k\)-Nearest Neighbor Regression.” E-print, arXiv:1909.05495. http://arxiv.org/abs/1909.05495.
Bentley, Jon Louis. 1975. “Multidimensional Binary Search Trees Used for Associative Searching.” Communications of the ACM 18:508–17. https://doi.org/10.1145/361002.361007.
Charikar, Moses S. 2002. “Similarity Estimation Techniques from Rounding Algorithms.” In Proceedings of the 34th Annual ACM Symposium on Theory of Computing [STOC ’02], edited by John Reif, 380–88. New York: ACM. https://doi.org/10.1145/509907.509965.
Claeskens, Gerda, and Nils Lid Hjort. 2008. Model Selection and Model Averaging. Cambridge, England: Cambridge University Press.
Datar, Mayur, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. “Locality-Sensitive Hashing Scheme Based on P-Stable Distributions.” In Proceedings of the 20th Annual Symposium on Computational Geometry [SCG ’04], edited by Jack Snoeyink and Jean-Daniel Boissonnat, 253–62. New York: ACM. https://doi.org/10.1145/997817.997857.
Gershenfeld, Neil. 1999. The Nature of Mathematical Modeling. Cambridge, England: Cambridge University Press.
Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. 1999. “Similarity Search in High Dimensions via Hashing.” In Proceedings of the 25th International Conference on Very Large Data Bases [VLDB ’99], edited by Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, 518–29. San Francisco: Morgan Kaufmann.
Leskovec, Jure, Anand Rajaraman, and Jeffrey D. Ullman. 2014. Mining of Massive Datasets. 2nd ed. Cambridge, England: Cambridge University Press. http://www.mmds.org.