The accurate extraction of visemes, the minimal distinguishable units in lip reading, lacks a systematic solution. Existing methods primarily rely on audio alignment and phoneme mapping, which suffer ...