What is the main focus of the framework for image retrieval?

The framework primarily focuses on enhancing image retrieval accuracy by leveraging semantic region classification and deep learning techniques. It addresses the limitations of traditional content-based image retrieval systems by allowing users to query specific regions of interest rather than entire images. This method improves the relevance of search results by aligning them more closely with user intent, thereby reducing the semantic gap that often exists in image retrieval.

How does the framework categorize image regions?

Image regions are categorized using an unsupervised clustering approach that groups semantically similar regions into distinct categories. This process involves extracting deep features from the images and applying clustering algorithms to identify meaningful visual concepts. Each image is then represented by a category membership vector, which quantifies the presence of different categories within the image, enabling more precise retrieval.

What advantages do logical operators provide in query formulation?

Logical operators such as AND, OR, and NOT allow users to create complex queries that reflect nuanced search intents. This flexibility enables users to combine multiple regions or categories in their queries, leading to more accurate and relevant search results. By integrating these operators directly into the similarity computation, the framework enhances the retrieval process, making it faster and more intuitive.

What experimental results support the effectiveness of this framework?

The experimental results demonstrate a substantial improvement in retrieval accuracy, with the framework achieving up to a 20% increase in mean Average Precision (mAP) compared to conventional methods. These results indicate that the classification-based approach effectively bridges the semantic gap in image retrieval, making it a valuable tool for users seeking specific visual content.
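
For readers unfamiliar with the metric, the following is a minimal sketch of how mean Average Precision is commonly computed from ranked result lists. The relevance lists are invented toy data, not results from the paper, and the sketch assumes every relevant item appears somewhere in the returned list.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """AP for one ranked list; relevance[i] is 1 if the item at rank i+1 is relevant.

    Assumption: all relevant items are present in the list, so we normalise
    by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[int]]) -> float:
    """Mean of the per-query AP values."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


# Toy relevance judgements for two queries (illustrative only).
baseline = [[1, 0, 0, 1], [0, 1, 0, 0]]
improved = [[1, 1, 0, 0], [1, 0, 0, 0]]

map_baseline = mean_average_precision(baseline)
map_improved = mean_average_precision(improved)
```

Moving relevant items toward the top of each list raises AP for that query, which is why the second toy system scores higher than the first.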

A Classification-Based Framework for Semantic Image Retrieval

The framework enhances local and global image retrieval through semantic region classification and deep learning techniques. It focuses on unsupervised clustering of image regions into meaningful categories, allowing users to create precise queries using logical operators. Experimental results indicate a significant improvement in retrieval accuracy, achieving up to a 20% increase in mean Average Precision (mAP). This research is particularly relevant for computer science students and professionals interested in advanced image retrieval systems and deep learning applications. The paper outlines a novel approach to bridging the semantic gap in image retrieval.

Key Points

Introduces a novel framework for semantic image retrieval using deep learning.
Employs unsupervised clustering for categorizing image regions into meaningful classes.
Allows users to formulate complex queries with logical operators like AND, OR, and NOT.
Demonstrates up to a 20% increase in mean Average Precision (mAP) over traditional methods.

Journal of Information Systems Engineering and Management

2025, 10(61s)

e-ISSN: 2468-4376

https://www.jisem-journal.com/

Research Article

839

which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A Classification-Based Framework for Semantic Local and Global Image Retrieval

Mohammed Salim Meflah, Mohammed Lamine Kherfi, Sihem Kechida

Department of Computer Science, Faculty of Technology, University Badji Mokhtar–Annaba, Algeria

Department of Computer Science, Faculty of Technology, University of Kasdi Merbah, Ouargla, Algeria.

Laboratoire d’Automatique et Informatique de Guelma (LAIG), University 8 mai 1945, Guelma, Algeria.

ARTICLE INFO

Received: 24 Dec 2024

Revised: 12 Feb 2025

Accepted: 26 Feb 2025

ABSTRACT

This paper presents a novel framework for enhancing both local and global image

retrieval by leveraging semantic region classification and deep learning. The core of our

approach involves the unsupervised clustering of image regions into semantically

meaningful categories. Each image in the database is subsequently represented by a

membership weight vector indicating its affinity to these region categories, refined

through a Convolutional Neural Network (CNN). This

representation enables a flexible and expressive query paradigm, allowing users to

formulate precise queries by logically combining example regions using operators such

as AND, OR, and NOT, as well as by specifying positive (example) and negative

(counter-example) constraints. Experimental results demonstrate a substantial

improvement in retrieval accuracy and user satisfaction compared to conventional

methods, achieving up to a 20% increase in mean Average Precision (mAP) and

confirming the effectiveness of our classification-based approach in bridging the

semantic gap.

Keywords: Region-based image retrieval, semantic region classification, logical

operators, deep learning, counter-examples, semantic representation.

INTRODUCTION

Over the past decade, Content-Based Image Retrieval (CBIR) has undergone a major transformation, shifting from

handcrafted feature-based approaches to deep learning-driven methods. Early CBIR systems such as QBIC, Virage,

VisualSEEK, and MARS relied primarily on global visual descriptors (color, texture, and shape) to retrieve visually

similar images [1], [2], [3]. While these systems achieved promising results for simple images, they often failed to

capture the user’s true semantic intent when images contained multiple distinct objects. Users are rarely interested

in an entire image but rather in specific regions or objects (Figure 1) that correspond to their query goals. To address

this issue, Region-Based Image Retrieval (RBIR) was proposed. RBIR systems, including BlobWorld, Netra, and

SIMPLIcity [4], [5], [6], decompose images into meaningful Regions of Interest (ROIs) and perform retrieval based

on region-level similarity rather than global features.

However, these early systems were limited by their dependence on low-level visual features and accurate

segmentation, which hindered their scalability and semantic robustness. With the rise of deep learning,

Convolutional Neural Networks (CNNs) have revolutionized visual representation by learning hierarchical, semantic

features [7], [8], [9]. Modern deep retrieval frameworks, such as R-MAC and DELF [10], [11], leverage CNN

embeddings to achieve state-of-the-art performance on benchmark datasets. More recently, multimodal models such

as CLIP [12] have demonstrated the power of aligning visual and textual semantics within a unified embedding space,

opening new opportunities for semantic-level image retrieval.

Beyond retrieval, recent research has revisited the problem of region classification and representation. For instance,

Hinami et al. (2017) proposed region-aware retrieval that allows the specification of objects and spatial relations

rather than relying purely on global similarity [13]. Shlapentokh-Rothman et al. (2024) demonstrated that

segmentation models combined with self-supervised embeddings produce compact region representations suitable

for semantic search [14]. RegionCLIP (Zhong et al., 2021) further integrates region-text pretraining to enable

zero-shot region retrieval and fine-grained localization [15]. These studies highlight that region classification remains an

essential component of advanced RBIR systems.

Figure 1. Example of queries where the user is not interested in the entire image.

In our framework, each image is first decomposed into multiple ROIs. These regions are categorized into semantic

classes using a deep feature-based clustering or classification model. The region categorization process converts

low-level features into high-level semantic vectors capturing visual concepts such as *car*, *person*, or *bicycle*. As

illustrated in Figure 1, the system matches query regions with learned categories and represents each image by a

vector of category membership degrees. This representation allows users to construct logical queries that combine

multiple semantic regions using operators such as **AND**, **OR**, and **XOR**, as well as negation-based

operators such as **NOT** and **BUT-NOT**. This integration results in a more expressive and human-like retrieval

process that unifies symbolic reasoning with deep representation learning.
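
To make the operators concrete, here is a minimal sketch of how they could act on category membership degrees. It assumes standard fuzzy-logic semantics (minimum for AND, maximum for OR, complement for NOT); this section does not give the paper's own formulas, so the definitions, function names, and membership values below are illustrative assumptions only.

```python
from typing import Dict

# Hypothetical membership degrees of one image, each in [0, 1].
Membership = Dict[str, float]


def f_and(a: float, b: float) -> float:
    return min(a, b)          # both concepts must be present


def f_or(a: float, b: float) -> float:
    return max(a, b)          # at least one concept is present


def f_not(a: float) -> float:
    return 1.0 - a            # concept is absent


def f_xor(a: float, b: float) -> float:
    # (a OR b) AND NOT (a AND b): exactly one of the two concepts
    return f_and(f_or(a, b), f_not(f_and(a, b)))


def f_but_not(a: float, b: float) -> float:
    # a AND NOT b: the first concept without the second
    return f_and(a, f_not(b))


image: Membership = {"car": 0.9, "person": 0.2, "bicycle": 0.0}
car, person = image["car"], image["person"]

scores = {
    "car AND person": f_and(car, person),
    "car OR person": f_or(car, person),
    "car XOR person": f_xor(car, person),
    "car BUT-NOT person": f_but_not(car, person),
    "NOT person": f_not(person),
}
```

Under these assumed definitions, an image dominated by a car and containing almost no person scores low for the AND query and high for the OR, XOR, and BUT-NOT queries.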

The main contributions of this work are as follows:

• We propose a deep learning-based semantic region classification model for RBIR.

• We introduce logical query composition using operators such as AND, OR, and NOT to enhance retrieval

expressiveness.

• We combine symbolic reasoning with deep region embeddings to improve retrieval precision and

interpretability.

In this paper, we propose a novel RBIR framework that fundamentally shifts the retrieval paradigm from low-level

region matching to high-level semantic region categorization. Our method is designed to overcome the scalability

and semantic limitations of prior work by introducing an abstracted, efficient representation. The principal

contributions of this work are as follows:

1. Unsupervised Semantic Vocabulary Construction: We employ unsupervised clustering on deep

features extracted from salient regions across the database to automatically generate a vocabulary of visual

concepts. This process groups semantically similar regions into distinct categories, forming a visual

dictionary.

2. Semantic Image Representation via Category Membership Vectors: Each image in the database is

compactly represented by a single, fixed-length vector that encodes the presence and significance of each

region category within it. This bag-of-visual-words (BoVW) inspired representation [16] enables efficient

indexing and comparison.

3. Flexible and Expressive Query Formulation: We introduce a sophisticated query interface that

supports logical composition. Users can construct complex queries using Boolean operators (AND, OR, NOT)

over region categories, moving beyond single example queries to express nuanced information needs.
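
The first two contributions can be sketched as follows. This is a toy illustration under several assumptions: k-means stands in for the unspecified clustering algorithm, 2-D points stand in for deep CNN region features, and each image vector is a hard-assignment histogram normalised by the number of regions. All names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points: List[Vector], init: List[Vector], iters: int = 20) -> List[List[float]]:
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in init]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for k, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def membership_vector(regions: List[Vector], centroids: List[List[float]]) -> List[float]:
    """Fraction of an image's regions assigned to each visual category."""
    counts = [0.0] * len(centroids)
    for r in regions:
        counts[nearest(r, centroids)] += 1.0
    total = len(regions)
    return [c / total for c in counts] if total else counts


# Toy region features per image (stand-ins for deep features).
images: Dict[str, List[Vector]] = {
    "img_a": [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)],
    "img_b": [(5.2, 4.9), (4.8, 5.0)],
    "img_c": [(0.1, 0.2), (9.0, 0.0)],
}

# Step 1: build the visual vocabulary from all regions in the database.
all_regions = [r for regs in images.values() for r in regs]
vocabulary = kmeans(all_regions, init=[(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)])

# Step 2: represent every image by one fixed-length membership vector.
vectors = {name: membership_vector(regs, vocabulary) for name, regs in images.items()}
```

Every image ends up with a vector of the same length as the vocabulary, regardless of how many regions it contains, which is what makes indexing and comparison straightforward.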

MOTIVATIONS AND NOVELTY OF OUR WORK

Several researchers [17] have noted that global image retrieval is not suitable for all situations. This limitation arises

because an image may contain multiple objects, while the user is often interested in only one of them. In such cases,

allowing the user to select only a part of the image (for example, a region) can be highly beneficial [2], [6].

a) Flexible and Expressive Query Formulation:

In some cases, allowing the user to select a single region of interest may be sufficient to fulfil their search needs.

However, in many scenarios, users require more expressive queries that combine multiple regions to represent

complex search intents. For example, as illustrated in Figure 2, a user may wish to retrieve images containing

different objects or semantic concepts located in separate regions. Instead of being limited to a single image, the user

can select one region of interest from one image and another from a different image to construct a more

comprehensive and meaningful query.

Figure 2: The user formulates the query by selecting an object of interest.

We introduce a sophisticated query interface that supports logical composition. Users can construct complex queries

using Boolean operators (AND, OR, NOT) over region categories, moving beyond single example queries to express

nuanced information needs. In this work, we provide the user with the ability to construct their query by:

a) Picking the desired regions from among the multitude of candidate images.

b) Combining these regions using logical connectors such as AND, OR, and XOR.

One of the innovative aspects of our approach lies in how logical connectors are processed during retrieval.

In existing studies, the relationship between regions was often modelled using set-theoretic operators. For example,

a logical AND between two regions was replaced by an intersection, and an OR by a union. If the query is “find images

containing a region similar to A and a region similar to B,” then these methods proceed as follows:

a) Retrieve all images containing a region similar to A;

b) Retrieve all images containing a region similar to B;

c) Return the intersection of the two result sets.

In contrast, in our approach, logical connectors are integrated directly into the similarity computation itself, as

detailed later. Compared to set-based methods, our approach is faster, more intuitive, and enables natural ranking

of all retrieved images.
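
The contrast between the two strategies can be illustrated with a small sketch. The similarity values, the 0.5 threshold, and the use of the minimum as the AND score are assumptions made for illustration; the paper's own similarity computation is described later in the article and may differ.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical similarity of each database image to query regions A and B.
similarity: Dict[str, Dict[str, float]] = {
    "img_1": {"A": 0.9, "B": 0.8},
    "img_2": {"A": 0.6, "B": 0.95},
    "img_3": {"A": 0.95, "B": 0.1},
    "img_4": {"A": 0.45, "B": 0.9},
}


def set_based_and(sim: Dict[str, Dict[str, float]], threshold: float = 0.5) -> Set[str]:
    """Set-theoretic AND: two separate retrievals followed by an intersection."""
    has_a = {img for img, s in sim.items() if s["A"] >= threshold}
    has_b = {img for img, s in sim.items() if s["B"] >= threshold}
    return has_a & has_b  # unordered result; near misses are discarded


def integrated_and(sim: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """AND folded into the score: one pass, every image receives a rank."""
    scored = [(img, min(s["A"], s["B"])) for img, s in sim.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


matches = set_based_and(similarity)
ranking = integrated_and(similarity)
```

In this toy example the set-based version returns only img_1 and img_2 with no ordering between them, while the score-based version ranks all four images and keeps img_4, which narrowly misses the threshold on region A, ahead of img_3.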

RELATED WORKS

Our work sits at the intersection of Content-Based Image Retrieval (CBIR), Region-Based Image Retrieval (RBIR),

and modern deep learning. This section reviews the evolution of these fields, highlighting the foundational techniques

and the specific challenges our framework aims to address.

A. The Evolution of Content-Based Image Retrieval (CBIR)
