region category within it. This bag-of-visual-words (BoVW) inspired representation [16] enables efficient
indexing and comparison.
3. Flexible and Expressive Query Formulation: We introduce a query interface that supports logical
composition. Users can build complex queries by applying Boolean operators (AND, OR, NOT) to region
categories, moving beyond single-example queries to express nuanced information needs.
MOTIVATIONS AND NOVELTY OF OUR WORK
Several researchers [17] have noted that global image retrieval is not suitable for all situations. This limitation arises
because an image may contain multiple objects, while the user is often interested in only one of them. In such cases,
allowing the user to select only a part of the image (for example, a region) can be highly beneficial [2], [6].
a) Flexible and Expressive Query Formulation:
In some cases, allowing the user to select a single region of interest may be sufficient to fulfil their search needs.
However, in many scenarios, users require more expressive queries that combine multiple regions to represent
complex search intents. For example, as illustrated in Figure 2, a user may wish to retrieve images containing
different objects or semantic concepts located in separate regions. Instead of being limited to a single image, the user
can select one region of interest from one image and another from a different image to construct a more
comprehensive and meaningful query.
Figure 2: The user formulates the query by selecting an object of interest.
Our query interface supports this kind of logical composition over region categories. In this work, the user
can literally construct a query by:
a) Selecting the desired regions from any of the candidate images.
b) Combining these regions using logical connectors such as AND, OR, and XOR.
One of the innovative aspects of our approach lies in how logical connectors are processed during retrieval.
In existing studies, the relationship between regions was often modelled using set-theoretic operators. For example,
a logical AND between two regions was replaced by an intersection, and an OR by a union. If the query is “find images
containing a region similar to A and a region similar to B,” then these methods proceed as follows:
a) Retrieve all images containing a region similar to A;
b) Retrieve all images containing a region similar to B;
c) Return the intersection of the two result sets.
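The three steps above can be sketched as follows. This is a minimal illustration of the set-based baseline only, under stated assumptions: the in-memory `index` (image id mapped to a list of region descriptors), the cosine similarity measure, the 0.9 threshold, and all function names are hypothetical and are not taken from the cited studies.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def images_with_similar_region(
    index: Dict[str, List[Vector]], query: Vector, threshold: float = 0.9
) -> Set[str]:
    """Steps (a)/(b): ids of images with at least one region similar to the query."""
    return {
        image_id
        for image_id, regions in index.items()
        if any(cosine(query, region) >= threshold for region in regions)
    }


def set_based_and(
    index: Dict[str, List[Vector]], a: Vector, b: Vector, threshold: float = 0.9
) -> Set[str]:
    """Step (c): logical AND replaced by the intersection of the two result sets."""
    hits_a = images_with_similar_region(index, a, threshold)
    hits_b = images_with_similar_region(index, b, threshold)
    return hits_a & hits_b


def set_based_or(
    index: Dict[str, List[Vector]], a: Vector, b: Vector, threshold: float = 0.9
) -> Set[str]:
    """Logical OR replaced by the union of the two result sets."""
    hits_a = images_with_similar_region(index, a, threshold)
    hits_b = images_with_similar_region(index, b, threshold)
    return hits_a | hits_b
```

In this sketch each sub-query is evaluated separately over the whole index, and the final result is an unordered set: an image either passes the threshold for both regions or is discarded, so the output carries no ranking information.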
In contrast, our approach integrates the logical connectors directly into the similarity computation itself, as
detailed later. Compared to set-based methods, it is faster and more intuitive, and it yields a natural ranking
of all retrieved images.
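To make the idea of score-level combination concrete, the sketch below shows one possible way to fold connectors into a single similarity score. It is not the formulation used in this work, which is detailed later; it assumes a common fuzzy-logic convention (AND as minimum, OR as maximum), cosine similarity, and a hypothetical in-memory `index` mapping image ids to region descriptors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_match(regions: List[Vector], query: Vector) -> float:
    """Similarity of an image to a query region: its best-matching region."""
    return max((cosine(query, region) for region in regions), default=0.0)


def rank(
    index: Dict[str, List[Vector]], a: Vector, b: Vector, connector: str = "AND"
) -> List[Tuple[str, float]]:
    """Score every image once and return (image id, score), best first.

    Assumed convention (illustrative only): AND -> min, OR -> max.
    """
    if connector == "AND":
        combine = min
    elif connector == "OR":
        combine = max
    else:
        raise ValueError(f"unsupported connector: {connector}")

    scored = [
        (image_id, combine(best_match(regions, a), best_match(regions, b)))
        for image_id, regions in index.items()
    ]
    # sorted() is stable, so images with equal scores keep their index order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions every image receives a graded score in a single pass over the index, so partial matches are ranked lower instead of being removed by a threshold.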
RELATED WORKS
Our work sits at the intersection of Content-Based Image Retrieval (CBIR), Region-Based Image Retrieval (RBIR),
and modern deep learning. This section reviews the evolution of these fields, highlighting the foundational techniques
and the specific challenges our framework aims to address.
A. The Evolution of Content-Based Image Retrieval (CBIR)