The framework enhances local and global image retrieval through semantic region classification and deep learning techniques. It focuses on unsupervised clustering of image regions into meaningful categories, allowing users to create precise queries using logical operators. Experimental results indicate a significant improvement in retrieval accuracy, achieving up to a 20% increase in mean Average Precision (mAP). This research is particularly relevant for computer science students and professionals interested in advanced image retrieval systems and deep learning applications. The paper outlines a novel approach to bridging the semantic gap in image retrieval.

Key Points

  • Introduces a novel framework for semantic image retrieval using deep learning.
  • Employs unsupervised clustering for categorizing image regions into meaningful classes.
  • Allows users to formulate complex queries with logical operators like AND, OR, and NOT.
  • Reports up to a 20% increase in mean Average Precision (mAP) over traditional methods.
Khamis Sirine
Authors: Mohammed Salim Meflah, Mohammed Lamine Kherfi, Sihem Kechida
12 pages
Language: English
Type: Research Paper
Journal of Information Systems Engineering and Management
2025, 10(61s)
e-ISSN: 2468-4376
https://www.jisem-journal.com/
Research Article
839
Copyright © 2024 by Author/s and Licensed by JISEM. This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A Classification-Based Framework for Semantic Local and Global
Image Retrieval
Mohammed Salim Meflah 1, Mohammed Lamine Kherfi 2, Sihem Kechida 3
1 Department of Computer Science, Faculty of Technology, University Badji Mokhtar Annaba, Algeria.
2 Department of Computer Science, Faculty of Technology, University of Kasdi Merbah, Ouargla, Algeria.
3 Laboratoire d’Automatique et Informatique de Guelma (LAIG), University 8 mai 1945, Guelma, Algeria.
ARTICLE INFO
Received: 24 Dec 2024
Revised: 12 Feb 2025
Accepted: 26 Feb 202
INTRODUCTION
Over the past decade, Content-Based Image Retrieval (CBIR) has undergone a major transformation, shifting from
handcrafted feature-based approaches to deep learning-driven methods. Early CBIR systems such as QBIC, Virage,
VisualSEEK, and MARS relied primarily on global visual descriptors (color, texture, and shape) to retrieve visually
similar images [1], [2], [3]. While these systems achieved promising results for simple images, they often failed to
capture the user’s true semantic intent when images contained multiple distinct objects. Users are rarely interested
in an entire image but rather in specific regions or objects (Figure 1) that correspond to their query goals. To address
this issue, Region-Based Image Retrieval (RBIR) was proposed. RBIR systems, including BlobWorld, Netra, and
SIMPLIcity [4], [5], [6], decompose images into meaningful Regions of Interest (ROIs) and perform retrieval based
on region-level similarity rather than global features.
However, these early systems were limited by their dependence on low-level visual features and accurate
segmentation, which hindered their scalability and semantic robustness. With the rise of deep learning,
Convolutional Neural Networks (CNNs) have revolutionized visual representation by learning hierarchical, semantic
features [7], [8], [9]. Modern deep retrieval frameworks, such as R-MAC and DELF [10], [11], leverage CNN
embeddings to achieve state-of-the-art performance on benchmark datasets. More recently, multimodal models such
as CLIP [12] have demonstrated the power of aligning visual and textual semantics within a unified embedding space,
opening new opportunities for semantic-level image retrieval.
Beyond retrieval, recent research has revisited the problem of region classification and representation. For instance,
Hinami et al. (2017) proposed region-aware retrieval that allows the specification of objects and spatial relations
rather than relying purely on global similarity [13]. Shlapentokh-Rothman et al. (2024) demonstrated that
segmentation models combined with self-supervised embeddings produce compact region representations suitable
for semantic search [14]. RegionCLIP (Zhong et al., 2021) further integrates region-text pretraining to enable zero-
shot region retrieval and fine-grained localization [15]. These studies highlight that region classification remains an
essential component of advanced RBIR systems.
Figure 1. Example of queries where the user is not interested in the entire image.
In our framework, each image is first decomposed into multiple ROIs. These regions are categorized into semantic
classes using a deep feature-based clustering or classification model. The region categorization process converts
low-level features into high-level semantic vectors capturing visual concepts such as car, person, or bicycle. As
illustrated in Figure 1, the system matches query regions with learned categories and represents each image by a
vector of category membership degrees. This representation allows users to construct logical queries that combine
multiple semantic regions using operators such as AND, OR, and XOR, as well as negation-based
operators such as NOT and BUT-NOT. This integration results in a more expressive and human-like retrieval
process that unifies symbolic reasoning with deep representation learning.
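To make the idea of a vector of category membership degrees concrete, the following minimal sketch scores each category by its best-matching region in an image. It is an illustration under stated assumptions, not the formulation used in this paper: the cosine similarity, the max-pooling over regions, the clipping to [0, 1], and the names `cosine`, `membership_vector`, `centroids`, and `regions` are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def membership_vector(region_features, centroids):
    """One degree per category: the best similarity over the image's regions.

    Negative similarities are clipped to 0 so every degree lies in [0, 1].
    This pooling rule is an assumption made for illustration.
    """
    return [
        max((max(0.0, cosine(region, centroid)) for region in region_features),
            default=0.0)
        for centroid in centroids
    ]


# Toy 2-D "deep features"; real region embeddings would have hundreds of dimensions.
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical: car, person, bicycle
regions = [[2.0, 0.0], [0.0, 3.0]]                 # two regions found in one image

vec = membership_vector(regions, centroids)
print(vec)
```

In this toy case the image gets full membership in the first two categories and none in the third, whatever the number of regions it contains, so every image ends up with a vector of the same length.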
The main contributions of this work are as follows:
  • We propose a deep learning-based semantic region classification model for RBIR.
  • We introduce logical query composition using operators such as AND, OR, and NOT to enhance retrieval
    expressiveness.
  • We combine symbolic reasoning with deep region embeddings to improve retrieval precision and
    interpretability.
In this paper, we propose a novel RBIR framework that fundamentally shifts the retrieval paradigm from low-level
region matching to high-level semantic region categorization. Our method is designed to overcome the scalability
and semantic limitations of prior work by introducing an abstracted, efficient representation. The principal
contributions of this work are listed below:
1. Unsupervised Semantic Vocabulary Construction: We employ unsupervised clustering on deep
features extracted from salient regions across the database to automatically generate a vocabulary of visual
concepts. This process groups semantically similar regions into distinct categories, forming a visual
dictionary.
2. Semantic Image Representation via Category Membership Vectors: Each image in the database is
compactly represented by a single, fixed-length vector that encodes the presence and significance of each
region category within it. This bag-of-visual-words (BoVW) inspired representation [16] enables efficient
indexing and comparison.
3. Flexible and Expressive Query Formulation: We introduce a sophisticated query interface that
supports logical composition. Users can construct complex queries using Boolean operators (AND, OR, NOT)
over region categories, moving beyond single example queries to express nuanced information needs.
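The vocabulary construction described in the first contribution can be sketched with a plain k-means loop over region features. This is a minimal illustration, assuming k-means as the clustering algorithm and a deterministic initialisation from the first k points; the text above does not state which clustering method is used, and the names `kmeans`, `squared_distance`, and `nearest_category` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_category(feature, centroids):
    """Index of the closest centroid, i.e. the region's visual-word category."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(feature, centroids[j]))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; centroids start at the first k points (an assumption
    made here so that the example is reproducible)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each region goes to its nearest centroid.
        assignments = [nearest_category(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Toy 2-D region features pooled from the whole database.
region_features = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
vocabulary, labels = kmeans(region_features, k=2)
print(labels)

# A region from a new image is mapped to an existing category.
print(nearest_category([9, 9], vocabulary))
```

Once the vocabulary is fixed, new regions are categorised with a single nearest-centroid lookup, which is what keeps indexing cheap compared with matching every region against every other region.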
MOTIVATIONS AND NOVELTY OF OUR WORK
Several researchers [17] have noted that global image retrieval is not suitable for all situations. This limitation arises
because an image may contain multiple objects, while the user is often interested in only one of them. In such cases,
allowing the user to select only a part of the image (for example, a region) can be highly beneficial [2], [6].
a) Flexible and Expressive Query Formulation:
In some cases, allowing the user to select a single region of interest may be sufficient to fulfil their search needs.
However, in many scenarios, users require more expressive queries that combine multiple regions to represent
complex search intents. For example, as illustrated in Figure 2, a user may wish to retrieve images containing
different objects or semantic concepts located in separate regions. Instead of being limited to a single image, the user
can select one region of interest from one image and another from a different image to construct a more
comprehensive and meaningful query.
Figure 2: The user formulates the query by selecting an object of interest.
We introduce a sophisticated query interface that supports logical composition. Users can construct complex queries
using Boolean operators (AND, OR, NOT) over region categories, moving beyond single example queries to express
nuanced information needs. In this work, the user builds a query directly by:
a) Picking the desired regions from among the multitude of candidate images.
b) Combining these regions using logical connectors such as AND, OR, and XOR.
One of the innovative aspects of our approach lies in how logical connectors are processed during retrieval.
In existing studies, the relationship between regions was often modelled using set-theoretic operators. For example,
a logical AND between two regions was replaced by an intersection, and an OR by a union. If the query is “find images
containing a region similar to A and a region similar to B,” then these methods proceed as follows:
a) Retrieve all images containing a region similar to A;
b) Retrieve all images containing a region similar to B;
c) Return the intersection of the two result sets.
In contrast, in our approach, logical connectors are integrated directly into the similarity computation itself, as
detailed later. Compared to set-based methods, our approach is faster, more intuitive, and enables natural ranking
of all retrieved images.
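The difference between the two strategies can be illustrated with a small sketch. The set-based baseline follows the three steps listed above; the integrated variant assumes fuzzy-logic connectives (min for AND, max for OR, 1 - x for NOT) applied to category membership degrees. That choice of connectives is an assumption made only for this example, since the actual similarity computation is detailed later in the paper, and the names `database`, `category`, `q_and`, `q_or`, `q_not`, and `rank` are hypothetical.

```python
# Hypothetical membership degrees per image (category -> degree in [0, 1]).
database = {
    "img1": {"car": 0.9, "person": 0.8, "bicycle": 0.1},
    "img2": {"car": 0.9, "person": 0.2, "bicycle": 0.7},
    "img3": {"car": 0.4, "person": 0.6, "bicycle": 0.0},
}


def set_based_and(db, cat_a, cat_b, threshold=0.5):
    """Baseline: retrieve two result sets, then intersect them (no ranking)."""
    has_a = {name for name, m in db.items() if m[cat_a] >= threshold}
    has_b = {name for name, m in db.items() if m[cat_b] >= threshold}
    return has_a & has_b


# Integrated variant: a query is a function from a membership dict to a score.
def category(name):
    return lambda m: m.get(name, 0.0)


def q_and(a, b):
    return lambda m: min(a(m), b(m))


def q_or(a, b):
    return lambda m: max(a(m), b(m))


def q_not(a):
    return lambda m: 1.0 - a(m)


def rank(db, query):
    """Score every image with the composed query and sort best-first."""
    return sorted(db, key=lambda name: query(db[name]), reverse=True)


car_and_person = q_and(category("car"), category("person"))
car_but_not_bicycle = q_and(category("car"), q_not(category("bicycle")))

print(set_based_and(database, "car", "person"))  # only images passing both thresholds
print(rank(database, car_and_person))            # every image, ordered by score
print(rank(database, car_but_not_bicycle))
```

In this toy database the intersection returns a single image, while the scored query also keeps the partially matching images and orders them, which is the ranking behaviour that a hard set intersection cannot provide.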
RELATED WORKS
Our work sits at the intersection of Content-Based Image Retrieval (CBIR), Region-Based Image Retrieval (RBIR),
and modern deep learning. This section reviews the evolution of these fields, highlighting the foundational techniques
and the specific challenges our framework aims to address.
A. The Evolution of Content-Based Image Retrieval (CBIR)

FAQs

What is the main focus of the framework for image retrieval?
The framework primarily focuses on enhancing image retrieval accuracy by leveraging semantic region classification and deep learning techniques. It addresses the limitations of traditional content-based image retrieval systems by allowing users to query specific regions of interest rather than entire images. This method improves the relevance of search results by aligning them more closely with user intent, thereby reducing the semantic gap that often exists in image retrieval.
How does the framework categorize image regions?
Image regions are categorized using an unsupervised clustering approach that groups semantically similar regions into distinct categories. This process involves extracting deep features from the images and applying clustering algorithms to identify meaningful visual concepts. Each image is then represented by a category membership vector, which quantifies the presence of different categories within the image, enabling more precise retrieval.
What advantages do logical operators provide in query formulation?
Logical operators such as AND, OR, and NOT allow users to create complex queries that reflect nuanced search intents. This flexibility enables users to combine multiple regions or categories in their queries, leading to more accurate and relevant search results. By integrating these operators directly into the similarity computation, the framework enhances the retrieval process, making it faster and more intuitive.
What experimental results support the effectiveness of this framework?
The experimental results demonstrate a substantial improvement in retrieval accuracy, with the framework achieving up to a 20% increase in mean Average Precision (mAP) compared to conventional methods. These results indicate that the classification-based approach effectively bridges the semantic gap in image retrieval, making it a valuable tool for users seeking specific visual content.