
For a camera-pose tag with roughly 100 m precision, used for lookup over a limited region, it doesn't take much to concatenate the literal, human-interpretable hash. Tagging sets of features with ortho images or prior in situ images is fine, and using global/aggregate features to get a pose prior in that sense is clear. The bag-of-visual-words model, though, suggests we might not need that structure for a weaker estimate of position: the positions of the query features and the features themselves, used in the inverted-file sense, may be enough (on the condition that the features aren't completely uninformative). Clustering on position metadata seems reasonable, but I'm curious whether you have seen anything similar I could read.
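A minimal sketch of the inverted-file idea, under stated assumptions: features are already quantized to visual-word ids, positions are given in a local metric frame in metres, and the 100 m grid cell stands in for the position tag. The names `cell_of`, `build_inverted_file`, and `vote_position` are hypothetical, and the IDF-style weighting is just one possible way to discount uninformative features.

```python
import math
from collections import Counter, defaultdict

CELL_M = 100.0  # assumed cell size, matching the ~100 m precision target


def cell_of(x_m, y_m, cell_m=CELL_M):
    """Quantize a local metric position (metres) to an integer grid cell."""
    return (math.floor(x_m / cell_m), math.floor(y_m / cell_m))


def build_inverted_file(tagged_features):
    """Map each visual-word id to a count of the cells it was observed in.

    tagged_features: iterable of (word_id, x_m, y_m) tuples.
    """
    index = defaultdict(Counter)
    for word_id, x_m, y_m in tagged_features:
        index[word_id][cell_of(x_m, y_m)] += 1
    return index


def vote_position(index, query_words):
    """Return the cell with the highest weighted vote, or None if no evidence."""
    n_cells = len({c for cells in index.values() for c in cells})
    votes = Counter()
    for word_id in query_words:
        cells = index.get(word_id)
        if not cells:
            continue  # word never seen in the database
        idf = math.log(n_cells / len(cells))
        if idf <= 0:
            continue  # word occurs in every cell, so it carries no position information
        total = sum(cells.values())
        for cell, count in cells.items():
            votes[cell] += idf * count / total
    return votes.most_common(1)[0][0] if votes else None


# Toy database: words 1 and 9 appear in both cells, words 2 and 3 in one each.
db = [(1, 10, 10), (2, 20, 30), (9, 10, 10),
      (1, 250, 40), (3, 260, 50), (9, 250, 40)]
index = build_inverted_file(db)
print(vote_position(index, [1, 2, 9]))  # -> (0, 0)
```

A query made up only of words that occur in every cell returns `None`, which is the "features aren't completely uninformative" condition in code form.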














