Seminar Multimodale Räume (Multimodal Rooms), SS 2003 – Introduction, 7 May – Rainer Stiefelhagen: Multimodal Rooms, Smart Rooms, Intelligent Environments (Seminar SS 03)

Presentation transcript:

1 Seminar Multimodale Räume SS 2003 – Introduction, 7 May – Rainer Stiefelhagen. Multimodale Räume (Multimodal Rooms), Smart Rooms, Intelligent Environments. Seminar SS 03

2 User Interfaces
In the beginning: Wimpy Computing
– Windows, Icons, Menus, Pointing

3 2nd Generation: Human-Machine Interaction
Speaking
Pointing, Gesturing
Handwriting
Drawing
Presence / Focus of Attention
Combination
– Speech + Handwriting + Gesture
– Repair
Multimodal NLP & Dialog

4 Perceptual User Interfaces
Perceptive
– Human-like perceptual capabilities (what is the user saying, who is the user, where is the user, what is he doing?)
Multimodal
– People use multiple modalities to communicate (speech, gestures, facial expressions, …)
Multimedia
– Text, graphics, audio and video
(Matthew Turk (Ed.), Proceedings of the 1998 Workshop on Perceptual User Interfaces)

5 Next: Pervasive Computing
Human-computer interaction is not the only exchange
Humans want to interact with other humans
– Computers in the Human Interaction Loop (CHIL)
– The transparent, invisible computer
– Computers need to be context-aware
– Should require little or no learning or attention
– Should be proactive rather than command-driven
– Produce little or no distraction
– Permit an HCI and CHIL mix

6 Smart/Intelligent Rooms
Use of computation to enhance everyday activity
Integrate computers seamlessly into the real world (e.g. offices, homes)
Use natural interfaces for communication (voice, gesture, etc.)
Computer should adapt to the human, not vice versa!

7 Perception
In order to respond appropriately, objects/rooms need to pay attention to
– People
– Context
Machines have to be aware of their environment:
– Who, What, When, Where and Why?
Interfaces must be adaptive to
– Overall situation
– Individual user

8 Intelligent Environments
Classroom 2000 (Georgia Tech)
Mozer's Adaptive House
Enhanced Meeting Rooms
Kids Room (MIT)
…
Enhanced objects such as whiteboards, desks, chairs, …
See also the Intelligent Environments Resource Page (http://www.research.microsoft.com/ierp/)

9 Intelligent Rooms, Univ. California, San Diego

10 Classroom 2000
Capturing activity in a classroom
– Speaker's voice
– Video
– Slides
– Handwritten notes

11 Classroom 2000
Presenting (recorded) lectures through a web-based interface
Integration of slides, notes, audio, video
Searching
Adding additional material

12 Microsoft Easy Living Project
XML-based distributed agent system
Computer vision for person tracking and visual user interaction
Multiple sensor modalities combined
Use of a geometric model of the world to provide context
Automatic or semi-automatic sensor calibration and model building
Fine-grained events and adaptation of the user interface
Device-independent communication and data protocols
Ability to extend the system in many ways

13 Mozer's Adaptive House
Operated as an ordinary home
– Usual light switches, thermostats, doors etc.
Adjustments are measured and used to train the house to
– automatically adjust temperature
– adjust lighting
– choose music or TV channel
The house infers the users' desires from their actions and behaviours

14 Adaptive House (Mozer)
Sensors:
– Light level
– Sound level
– Temperature
– Motion
– Door status
– Window status
– Light settings
– Fan
– Heaters
– …
(M. Mozer, Univ. of Colorado, Boulder)

15 Issues in Perception
Visual
– Face detection / tracking
– Body tracking
– Face recognition
– Gesture recognition
– Action recognition
– Gaze tracking / tracking focus of attention
Auditory
– Speech recognition
– Speaker tracking
– Auditory scene analysis
– Speaker identification
Other: haptic, olfactory, …?

16 Enhanced Meeting Rooms
Capturing of meetings
Transcription
Summarization
Dialog processing
Who was there?
Who talked to whom?

17 Work at ISL
Face tracking
Facial feature tracking (eyes, nose, mouth)
Head pose estimation / gaze tracking
Lip-reading (audio-visual speech recognition)
3D person tracking
Pointing gesture tracking
Other modalities: speech (!!!, see John), dialogue, translation, handwriting, ...

18 Tracking of Human Faces
A face provides different functions:
– identification
– perception of emotional expressions
Human-computer interaction requires tracking of faces:
– lip-reading
– eye/gaze tracking
– facial action analysis / synthesis
Video conferencing / video telephony applications:
– tracking the speaker
– achieving low bit rate transmission

19 Demo: FaceTracker

20 Color Based Face Tracking
Human skin colors:
– cluster in a small area of a color space
– skin colors of different people mainly differ in intensity!
– variance can be reduced by color normalization
– distribution can be characterized by a Gaussian model
Chromatic colors:
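The color normalization and Gaussian model named on this slide can be sketched as follows. This is a minimal illustration, not the FaceTracker code: it assumes the common intensity normalization r = R/(R+G+B), g = G/(R+G+B) as the chromatic color space, and the sample pixel values and all function names are hypothetical.

```python
def chromatic(rgb):
    """Map an (R, G, B) pixel to intensity-normalized (r, g) coordinates.

    Assumption: r = R/(R+G+B), g = G/(R+G+B); the slide's own formula
    is not preserved in the transcript.
    """
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # assumption: treat black as neutral
    return (r / total, g / total)


def fit_gaussian(samples):
    """Estimate the mean and the 2x2 covariance (crr, crg, cgg) of (r, g) samples."""
    n = len(samples)
    mr = sum(s[0] for s in samples) / n
    mg = sum(s[1] for s in samples) / n
    crr = sum((s[0] - mr) ** 2 for s in samples) / n
    cgg = sum((s[1] - mg) ** 2 for s in samples) / n
    crg = sum((s[0] - mr) * (s[1] - mg) for s in samples) / n
    return (mr, mg), (crr, crg, cgg)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of an (r, g) point to the Gaussian."""
    crr, crg, cgg = cov
    det = crr * cgg - crg * crg
    dr, dg = point[0] - mean[0], point[1] - mean[1]
    return (cgg * dr * dr - 2.0 * crg * dr * dg + crr * dg * dg) / det


# Hypothetical skin-colored training pixels (values invented for illustration).
skin_pixels = [(200, 120, 100), (180, 110, 90), (220, 140, 110), (150, 95, 80)]
mean, cov = fit_gaussian([chromatic(p) for p in skin_pixels])
```

A pixel would then be classified as skin when its squared distance falls below a chosen threshold; adapting the model amounts to re-estimating `mean` and `cov` from recent face pixels.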

21 Color Model
Advantages:
– very fast
– orientation invariant
– stable object representation
– not person-dependent
– model parameters can be quickly adapted
Disadvantages:
– environment dependent (light sources heavily affect color distribution)

22 Tracking Gaze and Focus of Attention
In meetings:
– to determine the addressee of a speech act
– to track the participants' attention
– to analyse who was in the center of focus
– for meeting indexing / retrieval
Interactive rooms:
– to guide the environment's focus to the right application
– to suppress unwanted responses
Virtual collaborative workspaces (CSCW)
Human-robot cooperation
Cars (driver monitoring)

23 Tracking a User's Focus of Attention
Focus of attention tracking:
– To detect a person's interest
– To know what a user is interacting with
– To understand his actions/intentions
– To know whether a user is aware of something
In meetings:
– to determine the addressee of a speech act
– to understand the dynamics of interaction
– for meeting indexing / retrieval
Other areas:
– Smart environments
– Video conferencing
– Human-robot interaction

24 Head Pose Estimation
Model-based approaches:
– Locate and track a number of facial features
– Compute head pose from 2D to 3D correspondences (Gee & Cipolla '94, Stiefelhagen et al. '96, Jebara & Pentland '97, Toyama '98)
Example-based approaches:
– Estimate new pose with function approximator (such as ANN) (Beymer et al. '94, Schiele & Waibel '95, Rae & Ritter '98)
– Use face database to encode images (Pentland et al. '94)

25 Model-based Head Pose Estimation
[Figure: image, 3D model and real world (axes X, Y, Z); stages: feature tracking, pose estimation]
Find correspondences between points in a 3D model and points in the image
Iteratively solve linear equation system to find pose parameters (r_x, r_y, r_z, t_x, t_y, t_z)

26 Demo: Facial Feature Tracking

27 Demo: Model-based Head Pose

28 Model-based Head Pose
Pose estimation accuracy depends on correct feature localization!
Problems:
– Choice of good features
– Occlusion due to strong head rotation
– Fast head movement
– Detection of tracking failure / re-initialization
– Requires good image resolution
[Video]

29 Estimating Head Pose with ANNs
Train neural network to estimate head orientation
Preprocessed image of the face used as input

30 Network Architecture
Output: pan (tilt)
Hidden layer: 40 to 150 units
Input retina: up to 3 x 20x30 pixel units

31 Tracking People in a Panoramic View
[Figure panels: camera view, panoramic view, perspective view]

32 Training
Separate nets for pan and tilt
Trained with standard backpropagation with momentum term
Datasets:
– Training on 6100 images from 12 users
– Cross-evaluation on 750 images from the same users
– Tested on 750 images from the same users
Additional user-independent test set:
– 1500 images from two new users
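As a reminder of what the momentum term does, here is a minimal sketch of one gradient-descent update with momentum, applied to a toy quadratic objective rather than to the head-pose networks. The learning rate, momentum value and function name are illustrative assumptions; the slides do not state the settings used.

```python
def momentum_step(weights, grads, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent update with a momentum term.

    The velocity accumulates a decaying sum of past gradients, which
    smooths the update direction between steps.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = sum(w_i^2) with gradient 2*w (stand-in for backprop gradients).
w = [1.0, -2.0]
v = [0.0, 0.0]
for _ in range(200):
    g = [2.0 * wi for wi in w]
    w, v = momentum_step(w, g, v)
```

In backpropagation training, `grads` would be the error gradients with respect to each network weight, computed per pattern or per batch.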

33 Results
histo: histogram-normalized image used as input
edges: horizontal and vertical edge images used as input
both: histogram image plus edge images used as input

34 Demo

35 Spatial Awareness in Smart Rooms
Tracking people indoors
– To focus sensors on people
– To resolve spatial relationships
– To avoid bumping into humans
– To analyze activity

36 Person Tracking
Vision-based localization of people/objects:
– Single perspective: Pfinder, W 3 S, Hydra, etc.
– Multiple perspective: AVIARY, Easy Living

37 Person Tracking in the ISL Smart Room
[Diagram labels: Cam0, Cam1, Cam2, Cam3, feature extractor, features, tracking agent, people]

38 Person Tracking with Multiple Cameras
Goal: 3D tracking of people in rooms
Segmentation of foreground objects in each image
3D intersection of the rays through the object centers
Kalman filter
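The ray-intersection step can be illustrated with a small sketch. Two viewing rays from calibrated cameras rarely meet exactly, so a common choice (assumed here, not stated on the slide) is to take the midpoint of the shortest segment between them. Camera positions, ray directions and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two 3D vectors given as sequences."""
    return sum(x * y for x, y in zip(u, v))


def triangulate(p1, d1, p2, d2):
    """Midpoint of the shortest segment between rays p1 + s*d1 and p2 + t*d2.

    Returns None when the rays are (nearly) parallel.
    """
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom   # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom   # parameter of the closest point on ray 2
    q1 = [p + s * x for p, x in zip(p1, d1)]
    q2 = [p + t * x for p, x in zip(p2, d2)]
    return [(x + y) / 2.0 for x, y in zip(q1, q2)]


# Two hypothetical cameras looking at the same object center.
position = triangulate([0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                       [4.0, 0.0, 0.0], [-1.0, 1.0, 1.0])
```

With more than two cameras, the same idea extends to a least-squares point closest to all rays; the resulting 3D position is what the Kalman filter would then smooth over time.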

39 Adaptive Silhouette Extraction
Background subtraction: adaptive multi-Gaussian background model [Stauffer et al., CVPR 1998]
Morphological operators smooth foreground output
Connected components form silhouettes
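The pipeline on this slide can be sketched in simplified form. Note the substitutions: the sketch uses a single adaptive Gaussian per pixel instead of the multi-Gaussian mixture model cited above, omits the morphological smoothing step, and works on small grayscale grids. The thresholds and function names are assumptions.

```python
from collections import deque


def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """Return a foreground mask and adapt the per-pixel mean/variance in place.

    Simplification: one Gaussian per pixel (not a mixture). A pixel is
    foreground when it deviates by more than k standard deviations.
    """
    mask = []
    for y, row in enumerate(frame):
        mask_row = []
        for x, value in enumerate(row):
            diff = value - mean[y][x]
            is_foreground = diff * diff > k * k * var[y][x]
            mask_row.append(is_foreground)
            if not is_foreground:  # adapt only where the pixel matches the background
                mean[y][x] += alpha * diff
                var[y][x] += alpha * (diff * diff - var[y][x])
        mask.append(mask_row)
    return mask


def connected_components(mask):
    """Group 4-connected foreground pixels; returns a list of pixel lists."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


# Hypothetical 4x5 scene: flat background of 10 with two bright blobs.
mean = [[10.0] * 5 for _ in range(4)]
var = [[4.0] * 5 for _ in range(4)]
frame = [[10, 10, 10, 10, 10],
         [10, 200, 200, 10, 10],
         [10, 10, 10, 10, 200],
         [10, 10, 10, 10, 200]]
silhouettes = connected_components(update_background(mean, var, frame))
```

Each returned component corresponds to one candidate silhouette; a real system would typically also discard very small components as noise.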

40 Locating People
Extract reference point: centroid
Use calibrated sensors to calculate absolute position
Create list of location hypotheses
[Figure: silhouettes a and b seen by the cameras; location hypotheses i) (X,Y), ii) (X,Y)]

41 Tracking People
Best hypothesis tracking:
– Match location hypotheses to tracks
– Smooth tracks with Kalman filter
[Figure: hypotheses i) (X,Y) and ii) (X,Y) assigned to Track 1 and Track 2]
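A minimal sketch of the two steps named above, matching hypotheses to tracks and smoothing with a Kalman filter, could look as follows. It assumes greedy nearest-neighbour matching with a distance gate and an independent scalar constant-position Kalman filter per coordinate; the noise variances, the gate and the data layout are illustrative assumptions, not the ISL tracker's settings.

```python
def kalman_update(estimate, variance, measurement, process_var=0.01, meas_var=0.25):
    """One predict/update cycle of a scalar constant-position Kalman filter."""
    variance = variance + process_var              # predict: uncertainty grows
    gain = variance / (variance + meas_var)        # Kalman gain
    estimate = estimate + gain * (measurement - estimate)
    variance = (1.0 - gain) * variance
    return estimate, variance


def dist_sq(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def match_and_update(tracks, hypotheses, gate=1.0):
    """Greedily assign the nearest hypothesis to each track and smooth it.

    tracks: dict id -> {'pos': [x, y], 'var': float}
    hypotheses: list of (x, y) location hypotheses
    Returns the hypotheses that were not assigned to any track.
    """
    unused = list(hypotheses)
    for track in tracks.values():
        if not unused:
            break
        best = min(unused, key=lambda h: dist_sq(h, track['pos']))
        if dist_sq(best, track['pos']) <= gate * gate:
            unused.remove(best)
            x, new_var = kalman_update(track['pos'][0], track['var'], best[0])
            y, _ = kalman_update(track['pos'][1], track['var'], best[1])
            track['pos'] = [x, y]
            track['var'] = new_var
    return unused


tracks = {1: {'pos': [0.0, 0.0], 'var': 1.0},
          2: {'pos': [5.0, 5.0], 'var': 1.0}}
leftover = match_and_update(tracks, [(5.2, 4.8), (0.1, -0.1), (9.0, 9.0)])
```

Unmatched hypotheses such as the one left in `leftover` could start new tracks or be treated as false alarms, depending on the tracker's policy.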

42 Tracking Problems
Imperfect and merged silhouettes
Counterstrategies:
– Better vision algorithm
– Probabilistic multi-hypothesis tracking
– Reference point: head

43 Reference Point: Head
Use head as reference point instead of centroid
Head tracker has significantly lower tracking error and false alarm rate
[Charts: tracking error, false alarm rate]

44 Demo

45 Recognition of Pointing Gestures
Goals:
– Recognize human pointing gestures
– Extract the pointing direction in 3D
Applications:
– Human-robot interaction
– Smart rooms
Requirements:
– Person-independent
– Real-time operation
– Camera movement possible

46 Recognition of Pointing Gestures
[Figure: stereo camera; left/right image]

47 3D Tracker: Processing Steps
[Figure panels: camera, skin color, disparity]
3D clustering of skin-color pixels yields hints about the position of head and hands.

48 Gesture Recognition: Movement Phases
Pointing gestures consist of three intuitively distinguishable movement phases:
– Begin
– Hold
– End
Precise localization of the hold phase is important for determining the pointing direction
[Table: mean duration of the movement phases, columns μ [sec] and σ [sec]; rows: complete gesture, begin, hold, end]

49 Gesture Recognition: Models
The 3 phases are modeled with separate models
Continuous HMMs with 2 Gaussians per state
Null model as threshold for the phase models
Training on hand-labeled data

50 Gesture Recognition: Detection
A pointing gesture is recognized if 3 time points t_B < t_H < t_E are found such that
– P_E(t_E) > P_B(t_E) and P_E(t_E) > 0
– P_B(t_B) > P_E(t_B) and P_B(t_B) > 0
– P_H(t_H) > 0
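The detection rule can be written down directly as a search over three time points. The sketch below assumes that P_B, P_H and P_E are per-frame scores of the begin, hold and end models relative to the null model, so that a value above 0 means the phase model beats the null model; that reading and the function name are assumptions.

```python
def detect_pointing(p_b, p_h, p_e):
    """Return (t_B, t_H, t_E) with t_B < t_H < t_E satisfying the rule, or None.

    p_b, p_h, p_e: equally long lists of per-frame scores for the
    begin, hold and end models (assumed relative to the null model).
    """
    n = len(p_b)
    begins = [t for t in range(n) if p_b[t] > p_e[t] and p_b[t] > 0]
    holds = [t for t in range(n) if p_h[t] > 0]
    ends = [t for t in range(n) if p_e[t] > p_b[t] and p_e[t] > 0]
    for t_b in begins:
        for t_h in holds:
            if t_h <= t_b:
                continue
            for t_e in ends:
                if t_e > t_h:
                    return (t_b, t_h, t_e)
    return None


# Toy score sequences: begin at frame 0, hold at frame 2, end at frame 3.
result = detect_pointing([1, -1, -1, -1], [-1, -1, 2, -1], [-1, -1, -1, 3])
```

The hold time point found this way marks the frame range from which the pointing direction would be read.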

51 Gesture Recognition: Features
Feature vector: (r, Δθ, Δy)
Experiments: cylindrical coordinates work better than spherical and Cartesian ones
Hand relative to head: independent of the position in the room
Δθ, Δy: no adaptation to pointing targets from the training data
Spline interpolation of the feature sequences to a constant 40 Hz
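A possible way to compute such a feature vector from tracked 3D head and hand positions is sketched below. The axis convention (y vertical, θ measured in the horizontal x-z plane) and the function names are assumptions; resampling to 40 Hz is not shown.

```python
import math


def cylindrical(head, hand):
    """Hand position relative to the head as (r, theta, y).

    Assumption: y is the vertical axis; r and theta describe the
    horizontal offset in the x-z plane.
    """
    dx = hand[0] - head[0]
    dy = hand[1] - head[1]
    dz = hand[2] - head[2]
    return math.hypot(dx, dz), math.atan2(dz, dx), dy


def features(heads, hands):
    """Per-frame feature vectors (r, delta_theta, delta_y) from position sequences."""
    coords = [cylindrical(h, p) for h, p in zip(heads, hands)]
    feats = []
    for (_, th0, y0), (r1, th1, y1) in zip(coords, coords[1:]):
        # wrap the angle difference into [-pi, pi)
        dth = (th1 - th0 + math.pi) % (2.0 * math.pi) - math.pi
        feats.append((r1, dth, y1 - y0))
    return feats


heads = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
hands = [(1.0, 0.0, 0.0), (0.0, 0.5, 1.0)]
feats = features(heads, hands)
```

Because only the hand-to-head offset and the frame-to-frame changes of θ and y enter the vector, moving the person to another place in the room leaves the features unchanged.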

52 Pointing Direction
Head-hand line
– Line of sight from eye to hand
– Easy to measure
Forearm line
– Potentially superior when the arm is bent
– Harder to measure

53 Audio-Visual Speech Recognition

54 Lip Tracking Module
Feature-based
Detects localization failures and automatically recovers from them
Tracks facial features (pupils, nostrils, lips)

55 Audio-Visual Recognition
hyp_c = λ_a · hyp_a + λ_v · hyp_v, with λ_a + λ_v = 1
Combination methods:
– SNR weights
– Entropy weights
– Trained weights
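Reading the combination rule on this slide as a weighted sum of the acoustic and visual hypothesis scores with weights that sum to one, an SNR-based weighting could be sketched as follows. The linear mapping from SNR to the audio weight, its 0 to 30 dB range and all names are hypothetical; the slides do not give the actual mapping.

```python
def snr_to_audio_weight(snr_db, low_db=0.0, high_db=30.0):
    """Map an SNR in dB to an audio weight in [0, 1].

    Assumption: linear ramp between low_db and high_db, clipped outside.
    """
    weight = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, weight))


def combine(hyp_a, hyp_v, weight_a):
    """Weighted sum of audio and visual scores; the visual weight is 1 - weight_a."""
    weight_v = 1.0 - weight_a
    return {unit: weight_a * hyp_a[unit] + weight_v * hyp_v[unit] for unit in hyp_a}


# Hypothetical scores for two competing units from each modality.
audio = {'b': 0.2, 'p': 0.8}
video = {'b': 0.9, 'p': 0.1}
combined = combine(audio, video, snr_to_audio_weight(15.0))
```

At high SNR the audio scores dominate; in noise the weight shifts toward the visual scores, which is the intended effect of such a weighting.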

56 Fusion Levels
Word level (vote, decide based on A and V score)
Phoneme level (combine by different weighting schemes)
Feature level (combine features)

57 Audio-Visual Speech

58 Possible Topics
Person tracking
Gesture recognition
Attentive interfaces
Face detection
Lip-reading (audio-visual speech recognition)
Audio-visual tracking
Emotion recognition
Person identification
Microphone arrays
Sensor fusion
Smart room infrastructure
Intelligent camera control
Self-calibration
Other smart room projects (MIT, Georgia Tech, IM2)
Other sensors: pressure, IR, etc.
Speech recognition
– in meetings
– far-field
– efficient microphone arrays
