Research

My research explores various areas, including Medical Image Analysis, Explainable Artificial Intelligence, Computer Vision, Generative Adversarial Networks, Multimodal Deep Learning, Large Language Models, Natural Language Processing, Machine Learning, and the applications of Deep Learning.

1. Explainable AI in Medical Image Analysis

Medical image analysis plays an important role in diagnosing and treating diseases, particularly in eye care and cancer treatment, using advanced machine learning models like convolutional neural networks (CNNs). However, the complexity of these models can make their decisions difficult to understand, which is problematic in clinical settings. To address this, explainable AI techniques clarify how CNNs classify eye conditions from retinal images, while new segmentation models accurately identify blood vessels, aiding in vascular health assessment. By combining classification and segmentation, doctors can make better decisions. For lung and colon cancer detection, explainable AI also helps healthcare professionals understand and trust the predictions, improving patient communication and outcomes.

2. Multimodal Deep Learning

Multimodal deep learning improves understanding by combining images and text using three techniques: early, late, and intermediate fusion. Early fusion merges raw images and text before processing, allowing shared representation but risking noise sensitivity. Late fusion processes them separately, offering flexibility but possibly missing important connections. Intermediate fusion combines features at different stages, balancing the strengths of both data types. A challenge for Bangla language applications is the lack of diverse annotated image-text datasets. Bangla's unique features complicate text processing, and without specialized models, areas like fake news detection and disaster identification are difficult to improve. Custom models are needed to address these challenges in Bangla.
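
The three fusion strategies can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the random vectors stand in for image and text inputs, the weights are untrained, and the names (linear_classifier, w_joint, and so on) are hypothetical rather than taken from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a batch of 4 image-text pairs.
image_feats = rng.normal(size=(4, 8))  # placeholder image inputs
text_feats = rng.normal(size=(4, 6))   # placeholder text inputs
num_classes = 2


def linear_classifier(x, weights):
    """Single linear layer followed by a softmax over classes."""
    logits = x @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Early fusion: merge the two inputs first, then process them jointly.
w_joint = rng.normal(size=(8 + 6, num_classes))
joint_input = np.concatenate([image_feats, text_feats], axis=1)
early_probs = linear_classifier(joint_input, w_joint)

# Late fusion: process each modality separately, then average the predictions.
w_img = rng.normal(size=(8, num_classes))
w_txt = rng.normal(size=(6, num_classes))
late_probs = (linear_classifier(image_feats, w_img)
              + linear_classifier(text_feats, w_txt)) / 2

# Intermediate fusion: encode each modality to a hidden vector, merge the
# hidden vectors, then classify the merged representation.
w_img_hidden = rng.normal(size=(8, 5))
w_txt_hidden = rng.normal(size=(6, 5))
w_out = rng.normal(size=(10, num_classes))
hidden = np.concatenate([np.tanh(image_feats @ w_img_hidden),
                         np.tanh(text_feats @ w_txt_hidden)], axis=1)
intermediate_probs = linear_classifier(hidden, w_out)
```

In a real system the random weights would be replaced by trained image and text encoders; the sketch only shows where in the pipeline the two modalities are combined.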

3. Sentiment Analysis and Assessing the Level of Toxicity in Social Media

Sentiment analysis categorizes emotions in text as positive, negative, or neutral, while toxic comment detection focuses on identifying harmful or abusive language, such as personal attacks and discriminatory remarks. Both are crucial in today’s online environment. Large Language Models (LLMs) like Gemini 1.5 Pro and GPT-3.5 Turbo have transformed NLP by learning from raw text, improving performance in tasks like sentiment analysis and toxic comment detection, especially for low-resource languages like Bangla. Sensitive topics such as transgender rights, indigenous issues, and migration are frequent targets of toxic language, but traditional models struggle due to the lack of curated datasets in Bangla. LLMs help mitigate this by performing well even with limited data, but high-quality, issue-specific datasets are still necessary to enhance accuracy, foster inclusivity, and protect marginalized communities from harmful content. Accurate models are essential for detecting toxicity, promoting fairness, and improving comment classification, particularly in languages with fewer resources like Bangla.

4. Natural Language Inference

Natural Language Inference (NLI) is a crucial NLP task that determines whether a premise supports, contradicts, or is unrelated to a hypothesis, aiding applications like question answering, information retrieval, and chatbots. NLI has three categories: entailment (the hypothesis follows from the premise), contradiction (both cannot be true), and neutral (the hypothesis is independent of the premise). It is especially important for languages like Bangla, improving NLP models' ability to interpret Bangla text and meet the growing demand for tools like chatbots and virtual assistants. Large Language Models (LLMs) enhance NLI by learning subtle sentence relationships and can be fine-tuned for specific tasks, making them effective even with limited labeled data.
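
As a small illustration of the three NLI categories, the sketch below represents premise-hypothesis pairs as labeled records. The class names (NLILabel, NLIExample) and the English example sentences are invented for illustration and do not come from any dataset.

```python
from dataclasses import dataclass
from enum import Enum


class NLILabel(Enum):
    ENTAILMENT = "entailment"        # the hypothesis follows from the premise
    CONTRADICTION = "contradiction"  # premise and hypothesis cannot both be true
    NEUTRAL = "neutral"              # the hypothesis is independent of the premise


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: NLILabel


# Invented examples sharing one premise, one per category.
premise = "A man is playing a guitar on stage."
examples = [
    NLIExample(premise, "A person is performing music.", NLILabel.ENTAILMENT),
    NLIExample(premise, "Nobody is on the stage.", NLILabel.CONTRADICTION),
    NLIExample(premise, "The man is a famous musician.", NLILabel.NEUTRAL),
]


def label_counts(data):
    """Count how many examples carry each label (useful for checking balance)."""
    counts = {label: 0 for label in NLILabel}
    for example in data:
        counts[example.label] += 1
    return counts
```

A Bangla NLI dataset would follow the same record structure, with premise and hypothesis written in Bangla.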

5. Text Generation in Bengali

Text generation in NLP involves creating human-like text using models, with tasks like paraphrase generation, reading comprehension, and formal document creation. Large Language Models (LLMs) excel at understanding context and generating varied expressions of the same idea, which is useful for Bengali educational content and creative writing. Fine-tuning LLMs with Bengali datasets enhances their ability to answer questions, summarize information, and generate formal documents, aided by techniques like Retrieval-Augmented Generation (RAG). In the mental health domain, LLMs can offer empathetic, culturally relevant advice by training on specialized datasets. While challenges like limited data hinder development for low-resource languages like Bangla, methods like fine-tuning and few-shot learning help LLMs perform well. Ethical considerations are vital to ensure generated content, especially in mental health, is safe, reliable, and culturally sensitive.
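
The retrieval and prompt-assembly steps of RAG can be sketched as follows. This is a toy version under stated assumptions: it ranks passages by bag-of-words cosine similarity instead of learned embeddings, it stops before the generation step, and the function names and sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(passages,
                    key=lambda p: cosine(query_vec, bag_of_words(p)),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Prepend the retrieved passages to the question before generation."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


passages = [
    "Dhaka is the capital of Bangladesh.",
    "Rice is a staple food in Bangladesh.",
    "The Padma is a major river.",
]
prompt = build_prompt("What is the capital of Bangladesh?", passages)
```

The resulting prompt would then be passed to an LLM, which generates the answer conditioned on the retrieved context.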

Ongoing Work:

6. Image-to-Text Generation

Image-to-text generation in agriculture, particularly for disease diagnosis and recommendations, is a growing field with great potential. Large vision-language models like LLaVA 1.5, InstructBLIP, GPT-4, and Fuyu can analyze and interpret visual features in plant images, such as color changes, texture, and shapes, to diagnose plant diseases. These models generate detailed textual descriptions that explain the disease and its symptoms and suggest treatments. Key applications include identifying plant diseases through visual symptoms like discoloration or wilting, helping farmers and agronomists make informed decisions for effective disease management.

Ongoing Work:

7. Natural Language Processing for Medical Question Answering

Developing a question-answering (QA) system in low-resource languages like Bangla, particularly in the medical field, presents challenges due to limited datasets and pre-trained models. The system must process medical literature, clinical notes, and patient records to respond accurately to queries, despite the scarcity of Bangla medical resources. It also needs to handle complex medical terminology and English code-switching. Such a system could support healthcare professionals in making informed decisions, especially in rural areas, and improve patient education by answering medical questions in Bangla. The system should handle various question types (factoid, list, confirmation, etc.), and data scarcity can be addressed through methods like crowdsourcing and domain expert involvement to build annotated medical datasets in Bangla.

Ongoing Work:

8. Machine Translation and Regional Dialect Detection

Machine Translation (MT), a task in natural language processing (NLP), automatically translates text between languages, and Transformer models have improved translation speed and accuracy. However, low-resource languages, including dialects spoken by marginalized communities, remain challenging because linguistic resources such as annotated datasets are limited. In Bangladesh, regional Bangla dialects such as those from Sylhet, Noakhali, and Mymensingh differ significantly from Standard Bangla, creating challenges for Dialect Machine Translation (DMT). These dialects have unique expressions that may not easily translate, and the lack of dialect-specific datasets further complicates model development. Similarly, Dialect Text Classification organizes text by dialect, enabling applications like regional content targeting, public sentiment analysis, and social media insights. Both tasks require careful handling of dialect variability and cultural nuances for effective translation and classification.

9. Generative Adversarial Networks in Agriculture

Generative Adversarial Networks (GANs) have transformed machine learning, particularly in agriculture, by enabling synthetic data generation to improve disease detection, such as for potato crops. Gathering images of infected potatoes at various stages is often difficult, but GANs create realistic, diverse images that enhance training datasets, improving models' ability to generalize and diagnose diseases accurately. This innovation helps researchers and farmers develop more effective diagnostic tools. Additionally, explainable AI builds trust by offering transparency in disease classification, fostering confidence among agricultural professionals. Instance segmentation further aids potato disease detection by identifying infected areas at the pixel level, enabling precise analysis of diseases like Black Scurf, Common Scab, Dry Rot, and Pink Rot. This technique helps differentiate between healthy and diseased tissues, assess disease severity, and track its progression, allowing for timely interventions and better crop management.
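
One simple way to turn pixel-level segmentation output into a severity estimate is the fraction of tuber pixels marked as diseased. The sketch below assumes two binary masks are already available from a segmentation model; the function name severity_ratio and the toy masks are hypothetical.

```python
import numpy as np


def severity_ratio(tuber_mask, lesion_mask):
    """Fraction of tuber pixels that are also marked as diseased."""
    tuber = np.asarray(tuber_mask, dtype=bool)
    # Ignore any lesion pixels that fall outside the tuber region.
    lesion = np.asarray(lesion_mask, dtype=bool) & tuber
    tuber_pixels = tuber.sum()
    if tuber_pixels == 0:
        return 0.0  # no tuber detected, so severity is undefined; report 0
    return float(lesion.sum() / tuber_pixels)


# Toy 6x6 masks: a 4x4 tuber region with a 2x2 diseased patch inside it.
tuber = np.zeros((6, 6), dtype=bool)
tuber[1:5, 1:5] = True    # 16 tuber pixels
lesion = np.zeros((6, 6), dtype=bool)
lesion[1:3, 1:3] = True   # 4 diseased pixels

print(severity_ratio(tuber, lesion))  # 0.25
```

Computing this ratio on images of the same tuber taken at different times gives a rough way to track disease progression.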

10. Computer Vision Applications in Agriculture

Disease classification is essential for sustainable farming and food security, as crop diseases can cause financial losses and disrupt food supply chains. Timely detection helps manage outbreaks and ensure healthy yields. Potatoes, a key staple, are prone to diseases like Black Scurf and Common Scab. Machine learning models, particularly convolutional neural networks (CNNs), effectively detect such diseases by analyzing spatial patterns in images. Hybrid models that combine CNNs with LSTM, GRU, and Bi-LSTM architectures enhance predictions by capturing both spatial features and the progression of symptoms over time, enabling more robust and comprehensive disease detection.