Modality-agnostic decoders leverage modality-invariant representations in human subjects' brain activity to predict stimuli irrespective of their modality (image, text, mental imagery).
Abstract: Text detection and recognition in natural scene imagery pose formidable challenges due to variations in orientation, distortions, intricate backgrounds, and inconsistent illumination.
MORAN is a network with rectification mechanism for general scene text recognition. The paper (accepted to appear in Pattern Recognition, 2019) in arXiv, final version is available now.
Omni, a fully omnimodal AI model with strong benchmark results, multilingual support, and new audio-visual coding capabilities.
Abstract: Image fusion involves integrating the information of multiple source images into one fused image, maximizing each source image's advantages. Existing image fusion methods combining text ...