Selected Projects

This page lists representative projects I have been involved in and traces the progression of my research interests and directions.

Tonmoay Deb, Lichen Wang, Zachary Bessinger, Naji Khosravan, Eric Penner, Sing Bing Kang

CVPRW, 2024

We presented an extension to the Zillow Indoor Dataset that adds natural language descriptions of layouts, bridging the gap between layout understanding and natural language description.

Taotao Jing, Lichen Wang, Naji Khosravan, Zhiqiang Wan, Zachary Bessinger, Zhengming Ding, Sing Bing Kang

WACV, 2024

We proposed a method that addresses the challenges posed by the long-tailed distribution of room layouts. Our method is more robust than state-of-the-art methods to appearance, structure, and shape variations in real-world room layouts.

Negar Nejatishahidin, Will Hutchcroft, Manjunath Narayana, Ivaylo Boyadzhiev, Yuguang Li, Naji Khosravan, Jana Košecká, Sing Bing Kang

CVPRW, 2023 (OmniCV Best Paper Award)

We addressed the problem of wide-baseline camera pose estimation from a group of 360° panoramas, proposing a deep learning method based on graph neural networks that handles textureless, empty indoor spaces using co-visible structure from multiple views.

Yu Yin, Will Hutchcroft, Naji Khosravan, Ivaylo Boyadzhiev, Yun Fu, Sing Bing Kang

ICMR, 2022

We presented an iterative topological learning algorithm, built on a graph neural network, that learns to predict the topological structure of floorplans from individual room attributes.

Zhixiang Min, Naji Khosravan, Zachary Bessinger, Manjunath Narayana, Sing Bing Kang, Enrique Dunn, Ivaylo Boyadzhiev

CVPR, 2022 (Oral)

We presented LASER, an image-based Monte Carlo Localization (MCL) framework. We proposed the concept of latent space rendering, where 2D pose hypotheses on the floor map are directly rendered into a geometrically structured latent space by aggregating viewing ray features. Our method achieved state-of-the-art performance and speed for visual localization in textureless indoor spaces.

Steve Cruz, Will Hutchcroft, Yuguang Li, Naji Khosravan, Ivaylo Boyadzhiev, Sing Bing Kang

CVPR, 2021

We released the Zillow Indoor Dataset (ZInD). ZInD is the largest real dataset with layout annotations, providing 3D room layouts, 2D and 3D floor plans, panorama locations in the floor plan, and locations of windows and doors. ZInD follows a real-world distribution (cuboid, more general Manhattan, and non-Manhattan layouts), as opposed to the mostly cuboid or Manhattan layouts in current publicly available datasets.

Rodney LaLonde, Naji Khosravan, Ulas Bagci

Advanced Intelligent Systems, 2021

We proposed a new family of capsule neural networks, deformable capsules (DeformCaps), to address the object detection problem. DeformCaps addresses the memory-efficiency limitations of existing capsule networks by using deformable convolutions rather than imposing geometric constraints. We also proposed two new algorithms associated with DeformCaps: a novel capsule structure (SplitCaps) and a novel dynamic routing algorithm (SE-Routing).

Naji Khosravan, Shervin Ardeshir, Rohit Puri

CVPRW, 2019

We proposed a convolutional neural network (CNN) architecture with a spatio-temporal attention module that identifies the important portions of a video (eliminating background audio) to determine the synchronization between the audio and visual signals. We explored formulating audio-video synchronization as both a regression problem and a classification problem.

Naji Khosravan

Ph.D. Dissertation, 2019

In this dissertation, I proposed a range of novel computer vision and machine learning algorithms to tackle current challenges in radiology screening. I proposed incorporating expert radiologist knowledge through eye-tracking, making the whole process a human-centered AI system. The proposed machine learning methods address image-based diagnosis, detection, and segmentation while using gaze as a minimally distracting and natural medium of interaction with the human in the loop.

Naji Khosravan, Aliasghar Mortazi, Michael Wallace, Ulas Bagci

MICCAI, 2019

PAN is an adversarial learning approach designed to effectively capture long-range and high-level label consistencies in semantic segmentation. Specific to medical images, we proposed capturing 3D semantics in a computationally efficient way through 2D projections. Furthermore, we introduced an attention module into our framework that selectively integrates global information.

Naji Khosravan, Haydar Celik, Baris Turkbey, Elizabeth C Jones, Bradford Wood, Ulas Bagci

Medical Image Analysis, 2019

In this study, we developed a paradigm-shifting computer-aided diagnosis (CAD) system, called collaborative CAD (C-CAD), that unifies CAD and eye-tracking systems in realistic radiology room settings. We presented a new graph-based clustering and sparsification algorithm that transforms eye-tracking data (gaze) into a graph model, enabling effective interaction with the radiologist through eye-tracking. C-CAD incorporates a deep learning algorithm in a newly designed multi-task learning platform to segment and diagnose suspicious areas simultaneously.

Naji Khosravan, Ulas Bagci

IEEE EMBC, 2018

The limited availability of expert-annotated data is one of the major challenges in developing deep learning models for diagnosis and segmentation of abnormalities in medical images. This paper shows that multi-task learning, when auxiliary tasks are selected carefully, helps address the limited performance of semi-supervised methods when little data is available.

Naji Khosravan, Ulas Bagci

MICCAI, 2018

S4ND is the first single-shot lung nodule detection method. Modeling detection as a point detection problem and using dense residual connections within the architecture allowed us to rely on a single feed-forward pass of a 3D CNN to find tiny objects in a large 3D gray-scale search space (lung nodules within volumetric CT scans).

Naji Khosravan, Haydar Celik, Baris Turkbey, Ruida Cheng, Evan McCreedy, Matthew McAuliffe, Sandra Bednarova, Elizabeth Jones, Xinjian Chen, Peter Choyke, Bradford Wood, Ulas Bagci

MICCAIW, 2017

Gaze2Segment was our very first step towards designing and implementing a computer-aided diagnosis system that integrates radiologists' gaze into computer vision algorithms in an interactive and dynamic way.
