Wednesday, 4 April 2018

Lip-sync_Maris_Z







CASE STUDIES


Probably the very first animation that really surprised me was the appearance of the G-Man in Half-Life 2. Can you imagine that level of facial animation for its time? It looked more than real in my inflamed imagination. We can confidently say that facial expressions greatly contribute to immersion in the game world (at least in games where facial expressions are one of the main tools for building the player's relationship with an NPC).

G-man facial expressions showcase.



Valve's developers took the visualization of dialogue very seriously. At that time (2004), it stood out (in a good way) and gave life to digital models. The characters' nature, intentions and emotional state were obvious and understandable. It was also obvious that the developers did not try to overact the emotions, but tried to bring them closer to realism. NOTE! No motion capture was involved in producing this level of facial animation.

Unfortunately, I am unable to add a video where Valve showcases how the lip-sync can be adjusted to match the language of the release. I remember this myself: I was playing the Russian release and could not figure out why the lip-sync matched some Russian words so perfectly, even though the same words in English are pronounced with completely different facial movements.


http://www.escapistmagazine.com/forums/read/9.408683-Did-HL2-Cap-Facial-Animation

Here we can observe some interesting discussions about facial expressions and how Valve's animation still looked better many years after release.




In my personal opinion, Half-Life 2 should be shown before any lesson about facial expression starts. Valve achieved an unreal level of animation in 2004, and even now, in 2018, it looks amazing and can compete with 90% of games out there. Their dedication and level of expertise are incredible. If I were creating facial expressions for a game, I would definitely reference Half-Life 2 first.

















RESOURCES
References/Images/Information used to create this animation



lip′-sync`


or lip′-synch`,


v.t.
1. to synchronize (recorded sound) with lip movements, as of an actor in a film.
2. to match lip movements with (recorded speech or singing).
v.i.
3. to synchronize or match lip movements and recorded sound.
[1960–65]

Random House Kernerman Webster's College Dictionary, © 2010 K Dictionaries Ltd. Copyright 2005, 1997, 1991 by Random House, Inc. All rights reserved.


FACIAL EXPRESSION ANIMATION




I used this image as a reference for realistic emotions, free of the exaggerated 3D animation style. I needed to understand how real human muscles change a facial expression before going into the 3D realm.


It is quite easy to identify identical emotions in these two images. However, stylised CGI characters are often subject to well-known animation principles such as squash and stretch, timing, secondary action, etc. These principles are easy to trace in almost all animated films that use this style: overexpressed emotions, sharp changes between them, rapid changes in eye and lip position. In a word, the character does not even hint at the presence of bones.


Coming back to Half-Life 2, I would like to note an obvious desire to give a certain realism to the characters' emotions. This technique immediately makes it clear that the story will be serious, that the characters are important and that they are alive. Personally, I like realism much more than those toy characters with rubber bones. I would like to learn how to rig faces following the concept of realism. Of course, the main technique for achieving realistic muscle action is motion capture. It is this invention that enables us to create stunning facial animation for any character. Doing this manually, well, I do not know; I would not undertake it.


There is a special coding system called FACS (Facial Action Coding System), developed by the psychologists Paul Ekman and Wallace Friesen. Ekman proposes that there are emotional expressions universal to all of us: Happiness, Sadness, Surprise, Fear, Anger, Disgust, and Contempt. These 7 emotions are the foundation; a transition from one active muscle group to another can represent, for example, a middle ground between anger and fear. Animated cartoon characters are deliberately made more expressive so that the character's mood comes across better. More narrowly focused, I would say.
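The idea of a "middle" between two emotions can be sketched as a simple linear blend between two sets of controller weights. This is only an illustration: the controller names and weight values below are hypothetical placeholders, not values taken from FACS or from any rig used in this project.

```python
def blend_expressions(pose_a, pose_b, t):
    """Linearly blend two expression poses.

    t = 0.0 returns pose_a, t = 1.0 returns pose_b.
    A controller missing from one pose is treated as 0.0 (relaxed).
    """
    names = sorted(set(pose_a) | set(pose_b))
    return {
        name: (1 - t) * pose_a.get(name, 0.0) + t * pose_b.get(name, 0.0)
        for name in names
    }


# Hypothetical controller weights, for illustration only.
ANGER = {"brow_lower": 1.0, "lip_press": 0.8}
FEAR = {"brow_raise": 1.0, "eye_widen": 0.9, "lip_stretch": 0.6}

halfway = blend_expressions(ANGER, FEAR, 0.5)
for name, weight in halfway.items():
    print(f"{name}: {weight:.2f}")
```

A plain linear blend like this is the simplest possible choice; on a real face the muscle groups do not all move at the same rate, which is exactly why video references are so useful.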



When manually animating a character, for example in a serious film, we have to keep in mind how many different muscles can be involved in a smile (17). But what if the character is hiding his emotions under the fear of death? Combining emotions can be quite a difficult task, especially when realism is the goal. This level of animation requires serious experience and competence in facial expression and micro-movement. But again, we have motion capture for that...




An excellent example of the transition from stylisation to realism. It is easy to guess that the first level can be done by a second-year animation student. Each subsequent level complicates the task and inevitably requires additional knowledge of facial anatomy.



This is the reference I used to create my animation. I decided not to use illustrations of lip positions and mouth shapes because I wanted to experiment with the forms of the mouth based on a real face. I want to note that this photo helped me the most: it contains the shapes for every letter, both on their own and in combination.



This is an approximation of the English phonemes each mouth shape can be used to represent:

  • A: m, b, p, h
  • B: s, d, j, i, k, t
  • C: e, a
  • D: A, E
  • E: o
  • F: u, oo
  • G: f, ph
  • X: Silence, undetermined sound
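The list above can be sketched as a small lookup table that turns a sequence of sounds into mouth shapes. This is a minimal illustration, not the tool used for this animation; the function name `shapes_for` is made up for the example, and the upper-case "A" and "E" are kept distinct from lower-case "a" and "e" exactly as the list writes them.

```python
# Mouth shapes and the sounds each can stand in for, copied from the list above.
MOUTH_SHAPES = {
    "A": ["m", "b", "p", "h"],
    "B": ["s", "d", "j", "i", "k", "t"],
    "C": ["e", "a"],
    "D": ["A", "E"],
    "E": ["o"],
    "F": ["u", "oo"],
    "G": ["f", "ph"],
}
SILENCE = "X"  # silence or undetermined sound

# Invert the table so each sound points at its mouth shape.
SOUND_TO_SHAPE = {
    sound: shape
    for shape, sounds in MOUTH_SHAPES.items()
    for sound in sounds
}


def shapes_for(sounds):
    """Return one mouth shape per sound; unknown sounds fall back to X."""
    return [SOUND_TO_SHAPE.get(sound, SILENCE) for sound in sounds]


print(shapes_for(["m", "a", "oo", "f", "?"]))  # ['A', 'C', 'F', 'G', 'X']
```

Because the lookup is case-sensitive, `shapes_for(["A"])` gives shape D while `shapes_for(["a"])` gives shape C.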


This reference is intended for hand-drawn animation, although some aspects can be used for 3D. For example, the shapes for F and O are drawn in a simple manner, which helps to understand the basic shape of the lips when they are pronounced.










A detailed explanation of the basic principles of positioning the eyebrows, lips, and the folds of the nose and forehead. A very good example of how to position the character's muscles to achieve a clearly readable emotion.



ENVIRONMENT


After listening to some sound clips on the 11 Second Club site, the first thing I downloaded was the August competition soundtrack.

It was a fragment of a track by some rapper; I did not even bother to find out who it was. I chose this track first of all because my first character for rigging was a dog named Cody, and I did the experimentation part with him. I will talk about that in the PRODUCTION page. Since I associated this track with streets, graffiti and disorder, I decided to model the environment in that style: some ghetto street.




The idea was to put the dog in this type of environment. But difficulties with the implementation appeared immediately. The character is very simple... take a look:




The pleasant body lines and soft forms required a stylized environment. I understood this, so I had to look for other references. I was fairly limited in time, so I could not derive forms and styles from a real photo. Some aspects could still be borrowed from one, but it was simpler and more efficient to model from a simpler example.





Screenshots from the animated television series "Hey Arnold!"

This environment was what I needed: simple forms, minimal detail. After modeling and basic texturing (based on the visual characteristics of the character, who lacks detailed fur and is obviously primitive), I got something like this:



Created in Cinema 4D


The main emphasis was on the door and the rubbish bins. The second building was created only in case the composition required moving the camera parallel to the stage. I tried to stick to a simple design with explicitly stylized proportions: disproportionate windows and curved polygons. In my opinion, this would fit the character. Later I replaced Cody with another rig, but the scene remained the same.

PRODUCTION

Cody//experimentation


The first character I chose was Cody, a free rig with advanced functionality. The built-in limits on the range of some muscles make it possible to avoid extreme topology changes. It was thanks to these clear controls that I decided to start my synchronization experiments with it. Audio clip - here


Since I had some experience in animating head movements, it was easier for me to work out more natural positions of the body. Following the rather sad mood of the clip, I tried to depict sadness on the dog's face, with a drop of hope in his eyes.


Using this reference, I could study which muscles pull the eyebrows to reach the peak of a certain emotion. This video also helped me with the subsequent character.



It is not very difficult to portray a sad dog; it is much harder to combine human facial expressions with the anatomy of a dog. That is, the references I collected also work for animal characters. Using the guides for muscle positions at specific stages of an emotion, I managed to achieve the desired result when changing the character's mood from sad to determined.



Preparing the rig





It is crucial to record the character's original position in case something goes wrong. This little step will act as a backup.


Loading audio




Real-time
Sets the playback speed of your current scene to 30 frames per second.

Lip syncing




From the very beginning, I started with basic exercises: opening the mouth on vowels and closing or half-closing it on consonants. For the most visually distinct letters, such as o, f, a, u, I started animating the lips, following the references mentioned above. Also, when working with the small controls, I recorded myself on camera to see which muscles tighten and which compress on certain sounds.

Screenshot of the recording process.

I found that the text is spoken quite quickly, and simply applying all the knowledge gained so far did not work: the animation did not live up to expectations. It was quite sharp, and the overly expressive jaw movements did not make the character feel alive.



I tried to soften the sharp changes in coordinates, but even this did not help. The keyframes were too close to each other, for example when the character says "In my opinion".

Only after a careful inspection of my own jaw movement did I notice that the "muscle blending" from MY to O happens only through lip movement and a change in cheek muscle position. The jaw itself opens wide only on MY and afterwards just accompanies the rest of the sentence with very small Y-axis changes. Understanding this allowed me to rely mainly on animating the lips, cheeks and the muscles of the skull for a much smoother facial animation.






For the eyebrow animation and eyelid position, I used a video showing which muscles are responsible for each position. Following pictures or photos would not have had the same effect: I would see only the final result of the emotion without understanding the trajectory of the muscles' movement.


Evaluation of experiment:

The first experience of animating facial muscles was successful. I began not only to copy the reference positions of the muscles and jaw but also to understand how one emotion transforms into another. Techniques such as video references and photographs of the extreme pose of an emotion were also good preparation for working with controllers. I no longer thought that using unrestricted controllers was too complex; on the contrary, they became a necessity for achieving more realistic emotions. Now I understand why the experiment was needed. As it turned out, the first steps towards lip-sync raised quite a few questions, which I was able to answer only by applying the knowledge and the photo/video references in practice.

At this point, I felt much more confident about starting the final animation with a human character...




Final animation



The reasons why the quality is so low and the resolution so small are described in the RENDERING section.




A GUY


This character no longer had any restrictions on animation, but at the same time each controller was configured to follow the correct trajectory for its muscle group. I imported the environment made for the first character, put the guy on the steps and began to animate the face.


The initial pose of the character had to match the mood of the sound clip: derisive pose, sad look.




The differences between animating an animal and a human should be pointed out straight away. A good rig should have enough controllers for more detailed work. Fortunately, this character met all the requirements. Viewing photos and videos now became an integral part of the work on this animation. I am not going to describe the animation process, as it was exactly the same as at the experimentation stage, but I must mention some subjective limitations.

As I said before, there are several levels of realism for a character. We can exaggerate emotions and place bones and muscles unnaturally for a more expressive appearance. Or we can follow anatomy and the real limits of the facial muscles if we are pursuing a more realistic result. In my case, I did the latter. This required me to be careful when changing controller values, keeping them within organic constraints.

Animation of the body/head/arms

To animate an unsteady posture and nervous movements of the head and eyes, I tried to imagine how my own hands and head would move if I were this character. I also tried to add movement to the fingers, which clench on certain keywords to show that the character feels what he says.

I also added bottles to the scene to indicate that the character is probably drunk and talking to an empty beer bottle.





Evaluation of the animation 





From a critical point of view, the work could have been done better. But I ran into some limitations I could not have anticipated. After a certain number of keyframes, the viewport began to slow down noticeably. In the middle of the animation, I realized that the work was becoming more difficult, especially when trying to achieve micro-movements using the graph editor. This became simply impossible, since I could make changes to the muscles but could not preview them in real time. I tried every trick I knew to optimize the viewport, but even with wireframe enabled and the character layer hidden, the controllers did not move smoothly. I believe that the rig was not optimized enough, or that some files were damaged, since I got an error about missing parts of the rig. But even this did not stop me from completing the full animation, with some secondary movement.



RENDERING

Camera setup



Placing the camera in this position, in my opinion, leaves room for surprises in how events develop, and it also includes both key characters in the shot: the guy and the bottle.
Of course, at this angle most of the environment remained out of frame, but there was no other way: I had to get close to the character to show the animation of his face.

Lighting setup



An HDRI map used for SkyDome






TEXTURING




Since the character itself consisted of flat colors, I decided to match this trend. I used only the aiStandardSurface shader and, by playing with its settings, obtained the material characteristics I needed for the scene. I paid special attention to the specular settings for the bottle, post hole, and the door handle. Some materials used a granite effect instead of a plain color channel, to create the look of a concrete construction. Naturally, for full texturing I would use Substance Painter, but again, a tight deadline goes hand in hand with optimization. The color palette was mostly copied from the "Hey Arnold!" street screenshot mentioned above.




SkyDome Settings

The number of intensity levels directly affects how clean the image itself is. You can also increase the diffuse value in the render settings, but this will only increase the rendering time. In my case, with the standard settings, it took up to 6 minutes per frame.

Arnold's standard settings can already produce correct lighting and shadows. However, I had to increase the exposure value to illuminate the scene properly, as the standard value gave me a darkened output image.


Standard renderer settings
(Even these settings can affect the final footage: it can look too grainy. To correct this, it is worth increasing the diffuse and camera AA levels.)
Since I had light-reflecting elements in the scene, I had to play with the transmission and specular settings. The final image using these settings, without the directional light, looks like this:





As you can see, the image is grainy, and the reflections on the bottles are not very noticeable. I would like to note that the graininess of the steps and walls themselves comes from a specially created texture. (6 minutes per frame, 1280x710)

Let's try better settings (only for a still image) 


All phases increased
+Directional light with a decreased intensity level



The render wasn't finished, but we can see the difference straight away, especially in the neck area and palms. But with such settings, my laptop would simply burn out while rendering 264 frames, since it took about 20 minutes to render half of the image. I understood that the quality of the video could affect my grade, but I decided not to torture my hardware for the sake of endless rendering. Instead, I lowered almost all the phases, leaving only the diffuse and specular, so that reflections could still be seen.

Render settings for the final image sequence.


In general, I lowered the resolution and eased the settings, and thanks to this I could reduce the rendering time to 80 seconds per frame. But, distracted by the render settings, I completely forgot about the composition; looking straight at the render, I did not really see how bad it was...
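To put the three render setups into perspective, the total time for the whole 264-frame sequence can be estimated with simple arithmetic. This is a rough sketch under two assumptions: that the 80 seconds is a per-frame figure, and that a full high-quality frame would take about twice the 20 minutes measured for half of the image (render time rarely scales that neatly).

```python
FRAMES = 264  # length of the final sequence


def total_render_hours(seconds_per_frame, frames=FRAMES):
    """Total render time in hours for the whole sequence."""
    return seconds_per_frame * frames / 3600


# Standard settings: up to 6 minutes per frame.
standard_hours = total_render_hours(6 * 60)

# Assumption: half of the image took ~20 minutes, so a full frame ~40 minutes.
high_quality_hours = total_render_hours(40 * 60)

# Final settings: assumed 80 seconds per frame.
final_hours = total_render_hours(80)

print(f"standard:     {standard_hours:.1f} h")      # 26.4 h
print(f"high quality: {high_quality_hours:.1f} h")  # 176.0 h
print(f"final:        {final_hours:.1f} h")         # 5.9 h
```

Under these assumptions the high-quality settings would need roughly a week of continuous rendering on the same laptop, which is why they were only usable for a still image.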




But it was too late. I was very limited in time, about a week short, and all this work was done in a hurry. But thanks to this I learned how to solve problems quickly and create optimized render settings, which are quite enough to showcase the actual lip-sync skills. If I am allowed, I will make a new render, which can be assessed properly.


COMBINING



After loading the image sequence, I found that its length was 8 seconds instead of 11, even though it had 264 frames at 24 fps. I had to increase the time stretch by 25% so that it would fit this duration and not lose synchronization. After that, I sent it all to the encoder and exported it as an mp4.
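The 25% figure can be sanity-checked with a little arithmetic. The sketch below assumes the image sequence was interpreted at 30 fps on import instead of 24 fps; that frame rate is a guess that happens to fit the numbers (264 frames at 30 fps is 8.8 seconds), not something confirmed by the project settings.

```python
from fractions import Fraction

FRAMES = 264
AUTHORED_FPS = 24        # frame rate the animation was made for
ASSUMED_IMPORT_FPS = 30  # assumption: frame rate applied on import

# Exact fractions avoid floating-point rounding in the ratio.
intended_seconds = Fraction(FRAMES, AUTHORED_FPS)        # 11 s
imported_seconds = Fraction(FRAMES, ASSUMED_IMPORT_FPS)  # 8.8 s
stretch_factor = intended_seconds / imported_seconds     # 5/4

print(float(intended_seconds))             # 11.0
print(float(imported_seconds))             # 8.8
print(f"{float(stretch_factor - 1):.0%}")  # 25%
```

If that assumption holds, setting the footage frame rate to 24 fps on import would give the same result as the 25% time stretch.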



EVALUATION


This is not my first project requiring knowledge of animation, although many aspects of this brief were new to me. Copying an emotion is not difficult: just pull the right levers and enjoy the result. It is another thing when you need to animate smooth transitions and complement the voiceover with the right facial muscle movement and position. The most difficult thing for me was the sense of timing: when and how quickly particular parts should move, how to balance the cartoonish look with realistic movements, and how to keep it simple. By this I mean the organic feel: as animators we try to show that we have actually animated something, and as a result we can easily spoil the animation with overly obvious manipulations. That adds neither aesthetics nor, especially, meaning. As for rendering, this is a separate topic. Arnold was new to me; however, many of its options are similar to those of the Corona renderer, which I have used in my older interior design projects.

Like these:




You can easily notice the similar quality of the bedroom render and my final animation. Obviously, in both cases, I rendered fewer phases than are required for a crisp image.


While rendering, the image goes through different passes, and if you have an HDRI map stretched around the dome, the shadows are going to be perfect. For my animation render, I changed the shadow color to dark grey to soften the contrast.

It is necessary to understand Arnold's settings to know what changes you can make. For example, if you don't have any transparent objects in your scene, there is no point in using the transmission settings, as they won't affect the render at all.

In general, I learned how to animate a character's face to an audio track, how to set up Arnold for a good render, and how to follow references for facial expressions. But it was important to try the movements on myself before animating the character, since through this technique I better understood how my facial muscles move and how the chin behaves when pronouncing different letters. I can also note that I feel much more confident about lip synchronization now that I have studied the fundamentals of facial animation. Now I need to increase my competence by studying the anatomy of the face and the behavior of facial muscles.


Q&A

Why did I not animate the camera?

In my personal opinion, the camera should not be moved just for the sake of movement. If I had had more time, I would have split the shots: one for the main character and another for the bottle. Camera movement mostly follows some action, or conveys deep thought if the character is thinking about something. Panning wouldn't give any benefit either. In general, I decided to match the mood of the sound clip with a static camera that captures both characters in the same shot.

Why didn't you use any other lights apart from the directional light?

In this case, I had enough light information computed from the HDRI map. Arnold rendered AO using this information, and the directional light continued the sun's position so I could achieve sharper shadows. And because of laziness... you can just add a directional light, and all you need to do is rotate it: its rays are infinite, so it doesn't matter where you place it.

Why are you using flat colour materials?


Only because of the relative simplicity of the character. It lacks any additional detail. If I used normal, specular and diffuse maps for the house texture, the character wouldn't fit into this scenery.






BIBLIOGRAPHY

Support.solidangle.com. (2018). Rendering Your First Scene - Arnold for Maya User Guide 4 - Solid Angle. [online] Available at: https://support.solidangle.com/display/AFMUG/Rendering+Your+First+Scene [Accessed 4 Apr. 2018].


Highend3d.com. (2018). Cody Rigged Dog Character for Maya - Free Character Rigs Downloads for Maya. [online] Available at: https://www.highend3d.com/maya/downloads/character-rigs/c/cody-rigged-dog-character-for-maya [Accessed 4 Apr. 2018].



11secondclub.com. (2018). 11 Second Club - August Competition. [online] Available at: http://www.11secondclub.com/competitions/august17 [Accessed 4 Apr. 2018].

Lazzeri, N., Mazzei, D., Greco, A., Rotesi, A., Lanatà, A. and De Rossi, D. (2018). Can a Humanoid Face be Expressive? A Psychophysiological Investigation.

BethMelia (2018). Music video script_template. [online] Slideshare.net. Available at: https://www.slideshare.net/BethMelia/music-video-scripttemplate-15425849 [Accessed 4 Apr. 2018].

YouTube. (2018). The Evolution of Facial Animation In Video Games. [online] Available at: https://www.youtube.com/watch?v=kvCxyDwHeag [Accessed 4 Apr. 2018].



Pinterest. (2018). Animation Lip Syncing. [online] Available at: https://www.pinterest.com/zakzych/animation-lip-syncing/ [Accessed 4 Apr. 2018].


 Cherry, K. (2017). Scientists Suggest That Humans Really Only Have Four Emotions. [online] Verywell Mind. Available at: https://www.verywellmind.com/how-many-emotions-are-there-2795179 [Accessed 4 Apr. 2018].


Schieve, T., Science, L. and Mind, I. (2018). How many muscles does it take to smile?. [online] HowStuffWorks. Available at: https://science.howstuffworks.com/life/inside-the-mind/emotions/muscles-smile.htm [Accessed 4 Apr. 2018].



Bryn Farnsworth, P., Lokajová, M. and Kryder, C. (2016). Facial Expression Pictures Chart & Facial Movements - iMotions. [online] iMotions. Available at: https://imotions.com/blog/facial-expression-pictures/ [Accessed 4 Apr. 2018].

YouTube. (2018). Slow Motion Facial Expressions - Eyes. [online] Available at: https://www.youtube.com/watch?v=tNuqfqNFrEQ [Accessed 4 Apr. 2018].


Cgtarian.com. (2018). Download Character Ray. [online] Available at: http://www.cgtarian.com/character-ray/download-character-ray.html [Accessed 4 Apr. 2018].