CBR and VBR in mp4 H264 video files

CBR versus VBR in video encoding

When referring to codecs, CBR (constant bitrate) encoding means that the rate at which the codec’s output data is consumed stays constant. VBR (variable bitrate) encoding, by contrast, varies the amount of output data per time segment; VBR allows you to set a maximum and a minimum bitrate. The advantage of VBR is that it produces a better quality-to-space ratio than a CBR file of the same size: the available bits are used more flexibly to encode the sound or video data more accurately, with fewer bits spent on less demanding passages and more bits on difficult-to-encode passages.

The disadvantage is that encoding takes more time, as the process is more complex. VBR may also pose problems when streaming over a web connection, since it is the maximum bitrate that matters, not the average.

The generally accepted best practice is to use CBR when producing for streaming delivery, and VBR when producing for progressive download.
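
As a minimal sketch with ffmpeg and the x264 encoder (file names, bitrates and audio handling are placeholders and may need to be adapted to your build), a constrained CBR encode and a quality-based VBR encode could look like this :

ffmpeg -i input.mov -c:v libx264 -b:v 1000k -minrate 1000k -maxrate 1000k -bufsize 2000k -c:a copy output_cbr.mp4
ffmpeg -i input.mov -c:v libx264 -crf 23 -maxrate 1500k -bufsize 3000k -c:a copy output_vbr.mp4

The first command pins the average, minimum and maximum bitrate to the same value; the second lets the bitrate vary with the content (constant quality) while the maxrate/bufsize pair caps the peaks for streaming.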

 

MPEG-4 containers : mp4, m4a, m4p, m4v

Last update : September 16, 2013

To play H264 encoded movies, the encoded video and audio streams must be packaged in a specific type of container following the MPEG-4 Part 12 specification. Stream packaging, also known as muxing, is the process of combining the encoded streams and the elements needed to control the delivery process (such as subtitles, chapters and metadata) into a single multiplexed media file.

The most common container for H264 encoded videos is specified by MPEG-4 Part 14 (standard ISO 14496-14) and has the file extension mp4. This format, often called the MP4 container, is based on the QuickTime format mov. Audio-only MPEG-4 files have an m4a extension, or m4p when they are encrypted.

Apple introduced the extension m4v for its iTunes application. It is very close to .mp4; the main differences are Apple’s optional DRM copy protection and the treatment of AC3 (Dolby Digital) audio, which is not standardized for the MP4 container.

The following command line tools are available to create and modify MP4 files by combining (multiplexing) previously encoded video or audio tracks, as well as subtitles, chapter information and metadata.

  • AtomicParsley : lightweight program for reading, parsing and setting metadata into MPEG-4 files
  • MP4Creator : tool from Cisco’s mpeg4ip suite that combines video, audio, text and other media to create MPEG-4 streams.

GUIs for both programs are also available: AtomicParsleyGUI is a GUI for AtomicParsley, and MP4Muxer is a GUI for MP4Creator. None of these programs has been updated in the last four years. A better choice for MPEG-4 tools today is FFmpeg or MP4Box.
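
As an illustration (input file names are placeholders), an H264 elementary stream and an AAC audio track can be multiplexed into an MP4 container with either tool :

MP4Box -add video.264 -add audio.aac -new movie.mp4
ffmpeg -i video.264 -i audio.aac -c copy movie.mp4

Both commands only repackage the already encoded tracks; no re-encoding takes place. Depending on the source, ffmpeg may need an explicit frame rate option for a raw H264 stream.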

More information about the MPEG-4 containers is listed hereafter :

Indexing (MOOV atom) in H264 video files

Last update : July 8, 2013

MOOV atom

QTIndexSwapper 2 (to move MOOV atom)

When a streaming mp4 H264 video file won’t play immediately in a Flash video player, the reason could be a QuickTime (QT) index problem. This index is called the moov atom. The moov atom, also referred to as the movie atom, defines the timescale, duration and display characteristics of the movie, as well as subatoms containing information for each track in the movie.

Often the moov atom is located at the end of the video file and the Flash player needs to load the entire file to read this information. The solution is simple: Move the moov atom from the end of the file to the beginning. Renaun Erickson, Developer Evangelist for Adobe Systems Inc., created a simple tool called QTIndexSwapper 2 (version 2.3.8) to do this job. This AIR application can be downloaded from his blog. Another tool to move the moov atom is MP4 FastStart (version 1.0.0). A tool with a similar name qt-faststart is available for ffmpeg :

qt-faststart old.mp4 new.mp4

Because the moov atom contains absolute byte offsets into the file, it cannot simply be written at the beginning before the entire file has been created. The only way to move it to the beginning is to generate the entire file with the moov atom at the end, and then re-process the file to relocate the atom. This is what HandBrake does: if you use this tool to convert videos, the moov atom is placed correctly when you select “web optimized”.
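
With a reasonably recent ffmpeg build, the same relocation can be done while remuxing, without re-encoding (file names are placeholders) :

ffmpeg -i old.mp4 -c copy -movflags +faststart new.mp4

The faststart flag tells the MP4 muxer to write the file normally and then run a second pass that moves the moov atom to the front.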

To see if the moov atom is at the beginning of a video, you can open the file in a text editor and look for the “moov” string in the raw output:

^@^@^@ ftypisom^@^@^B^@isomiso2avc1mp41^@^Eï moov^@^@^@lmvhd

More information about atoms in mp4 files is available at the following links :

AVC (H264) video settings

Last update : August 21, 2013

It’s not easy to configure an AVC (H264) codec to create videos that will play on different devices and stream from various servers on the web, including Amazon S3/CloudFront. Some basic information about the different frame types of AVC is given in the post Smart editing of MPEG-4/H264 videos. The following list gives some information about the common H264 parameters :

CABAC : stands for Context Adaptive Binary Arithmetic Coding. It improves encoding efficiency at the expense of playback/decoding efficiency. The default option is on, unless the encoded video is to be played back on devices with limited decoding power (for example an iPod). CABAC is only supported by the main and higher profiles.

Trellis : Trellis is only available with CABAC on. It improves quality while maintaining a small file size, but it increases conversion time slightly. The default value is on.
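
With ffmpeg and libx264, these two options can be sketched as follows (hypothetical file names; the x264opts keys are passed straight to the x264 library) :

ffmpeg -i input.mov -c:v libx264 -profile:v main -x264opts cabac=1:trellis=1 output.mp4
ffmpeg -i input.mov -c:v libx264 -profile:v baseline output_lowpower.mp4

The baseline profile in the second command implicitly disables CABAC, which is why it is the safer choice for devices with limited decoding power.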

Encoding mode :

  • Single Pass – Bitrate: encodes the video once with a set constant bitrate for each frame
  • Single Pass – Quantizer: encodes the video with a set quantizer (a higher quantizer means lower quality) for each frame. The default value is 26, the maximum value is 51.
  • Single Pass – Quality: encodes the video with a set quality rating for each frame
  • Two Pass: encodes the video twice (once to determine its properties, a second time to ensure the selected output file size is reached with maximum efficiency). This is the most common setting; a command-line sketch follows this list.
  • Multi Pass: same as Two Pass except for extra encoding passes to ensure even better quality and a more accurate file size. During multipass encoding, the video results of the first pass are saved into a log file; in a second step the encoding is done based on the logfile data.
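
A two-pass encode with ffmpeg/x264 could be sketched as follows (file names and bitrate are placeholders; on Windows the null device is NUL instead of /dev/null) :

ffmpeg -y -i input.mov -c:v libx264 -b:v 1000k -pass 1 -an -f mp4 /dev/null
ffmpeg -i input.mov -c:v libx264 -b:v 1000k -pass 2 -c:a copy output.mp4

The first pass analyses the video and writes a log file in the working directory; the second pass reads that log to distribute the bits so that the target bitrate, and therefore the file size, is met as accurately as possible.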

Bit Rate : the average bitrate varies between 0 and 5000 kbit/s; the default values are 800 kbit/s for low quality, 1000 kbit/s for medium quality and 1200 kbit/s for high quality.

  • Keyframe Boost : high values give better visual quality but also bigger file sizes. The default value for I-frames is 40%; values vary from 0 to 70.
  • B-Frame reduction : B-frames are responsible for the interpretation of motion in the video. This setting determines the reduction of quality in B-frames in favor of P-frames (predicted pictures). The default value is 30%; the range varies from 0 to 60%. For cartoons, higher values are recommended.
  • Bitrate variability : this attribute indicates how far the bitrate is allowed to vary in relation to the target bitrate. A variable bitrate tells the encoder to vary the bitrate as needed, based on the information in the frames. The default value is 60%; the range varies from 0 to 100%.

Quantization limits : these values are only used when the Single Pass – Quantizer encoding mode is selected (a command-line sketch follows the list below).

  • Min QP : Values vary from 0 to 50, the default value is 10.
  • Max QP : Values vary from 0 to 51, the default value is 51.
  • Max QP step : Values vary from 0 to 50, the default value is 4.
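
In ffmpeg/x264 terms, the quantizer mode and its limits roughly correspond to the following sketch (the mapping of the GUI fields above to x264’s qp, qpmin, qpmax and qpstep settings is an assumption; file names are placeholders) :

ffmpeg -i input.mov -c:v libx264 -qp 26 output_cqp.mp4
ffmpeg -i input.mov -c:v libx264 -crf 23 -qmin 10 -qmax 51 -qdiff 4 output.mp4

The first command uses a constant quantizer of 26; the second is a quality-based encode in which the quantizer is allowed to move between 10 and 51 in steps of at most 4.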

Scene cuts : this option sets how H264 determines when a scene change has occurred and hence when a key frame is needed (see the sketch after this list).

  • Scene cut threshold : The default value is 40. A higher value makes H264 less sensitive to scene changes. A lower value is recommended for dark videos.
  • Min IDR frame interval : IDR means Instantaneous Decoding Refresh; this parameter sets the minimum number of frames between two IDR frames (key frames). Setting it too high results in not detecting enough scene changes, setting it too low results in an unnecessarily high bitrate. The range varies from 0 to 100,000, the default value is 25.
  • Max IDR frame interval : Setting this too low results in too many key frames and thus wastes bitrate. The range varies from 0 to 100,000, the default value is 250.
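
These three settings roughly correspond to x264’s scenecut, min-keyint and keyint parameters; a sketch with ffmpeg (file name and values are placeholders) :

ffmpeg -i input.mov -c:v libx264 -sc_threshold 40 -keyint_min 25 -g 250 output.mp4

Here -g sets the maximum and -keyint_min the minimum distance between key frames, while -sc_threshold controls how aggressively scene changes are detected.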

Partitions : during the encoding process, the encoder breaks the video down into so-called macroblocks and then searches for similar blocks in order to discard redundant data. The macroblocks can be subdivided into 16×8, 8×16, 8×8, 4×8, 8×4 and 4×4 partitions. The partition searches increase accuracy and compression efficiency. As a general rule, the more search types are performed, the better the compression will be while maintaining a high quality output (a command-line sketch follows this list).

  • 8×8 transform : the 8×8 adaptive DCT transform is a very powerful compression technique, but it is not compatible with every device. It makes the video High Profile AVC.
  • 8×8, 8×16 and 16×8 P-Frame search : This setting enables these partitions on P-frames and thus improves the visual quality of these frames.
  • 8×8, 8×16 and 16×8 B-Frame search : This setting enables these partitions on B-frames and thus improves the visual quality of these frames.
  • 4×4, 4×8 and 8×4 P-Frame search : This setting enables the smallest partitions on P-frames, but the quality improvement is usually negligible. The option is therefore not worth the additional encoding time and can safely be turned off.
  • 8×8 intra search : This setting enables the 8×8 partitions on I-frames and thus improves the visual quality of these frames, but it requires the 8×8 adaptive DCT transform.
  • 4×4 intra search : This setting enables the 4×4 partitions on I-frames and thus improves the visual quality of these frames.
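
With ffmpeg/x264, a typical combination of these partition options could be sketched as follows (hypothetical file name; note that the 8×8 DCT forces the High profile) :

ffmpeg -i input.mov -c:v libx264 -profile:v high -x264opts 8x8dct=1:partitions=p8x8,b8x8,i8x8,i4x4 output.mp4

Leaving p4x4 out of the partitions list corresponds to switching off the 4×4/4×8/8×4 P-frame search described above.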

B-Frames :

  • Use as a reference : allows a B-frame to reference another B-frame to provide better quality. Only useful when using more than 2 consecutive B-frames.
  • Adaptive : turns on adaptive B-frames, which allows H264 to determine the number of B-frames to use. The default value is on. This option is only available when at least 1 B-frame has been set.
  • Bidirectional ME : allows predictions based on motion both before and after the B-frame. Default value is on.
  • Weighted biprediction : allows B-frames to be predicted more heavily from P-frames, which results in improved accuracy and therefore more efficient encoding. Default value is on. This option is only available when at least 1 B-frame has been set.
  • Direct B-Frame mode : temporal or spatial. The default value is temporal; the spatial mode handles animated content better.
  • Max consecutive : the number of consecutive B-frames. The values vary from 0 to 5, the default value is 3.
  • Bias : sets how much bias H264 should give to the usage of B-frames (higher means more use of B-frames). Setting this to 100 is the equivalent of not selecting the “Adaptive” option. The default value is 0, possible values vary from -100 to +100. A command-line sketch with typical B-frame settings follows this list.
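
A sketch of typical B-frame settings with ffmpeg/x264 (file name is a placeholder; -bf is the maximum number of consecutive B-frames and -b_strategy enables adaptive B-frame placement) :

ffmpeg -i input.mov -c:v libx264 -bf 3 -b_strategy 1 -x264opts direct=spatial output.mp4

Setting direct=spatial selects the spatial Direct B-frame mode which, as noted above, tends to handle animated content better.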

Motion estimation :

  • Partition decision : this controls the precision with which the motion in the video is estimated. Values range from 1 to 6. The default value is 5; a setting of 6 is even better, but it strongly increases the time needed for the conversion.
  • Method : the better the method, the more efficient the compression and the higher the output quality. Hexagonal Search is the default setting. Uneven Multi-Hexagon is meant for powerful computers, while Exhaustive Search is only practical on very powerful machines.
  • Range : this field is disabled when you select Hexagonal Search. It only works with the more powerful methods and specifies the motion search range in pixels. The more pixels are examined, the more processor power is needed, but the better the outcome. The values vary from 0 to 64, the default value is 16.
  • Max Ref Frames : this value indicates how many previous frames can be referenced by a P-frame or B-frame. The higher this value, the better the quality, at the expense of speed. The values vary from 0 to 16, the default value is 0.
  • Mixed references : offers the codec greater freedom to make references on a smaller scale. This option is only available when the Max Ref Frames value is greater than 1.
  • Chroma ME : uses the color information in the video to estimate motion, which increases the visual quality. It is recommended to turn this option on (see the sketch after this list).
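
A motion estimation sketch with ffmpeg/x264 (file name and values are placeholders) :

ffmpeg -i input.mov -c:v libx264 -me_method umh -me_range 16 -refs 4 output.mp4

umh selects the Uneven Multi-Hexagon search, -me_range sets the motion search range in pixels and -refs the maximum number of reference frames.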

Misc. options :

  • Threads : This sets the number of CPU threads to use in encoding. Default value is 1.
  • Noise reduction : this setting depends on whether there is noise in the video images or not. Videos with noise appear grainy; noise reduction filters out that noise, and the more noise there is, the higher you need to set the value. Values vary from 0 to 65535. Default value is 0.
  • Deblocking filter : a deblocking filter is a video filter applied to blocks in decoded video to improve visual quality and prediction performance by smoothing the sharp edges which can form between macroblocks when block coding techniques are used. The strength (values from -6 to +6) and threshold (values from -6 to +6) of the filter can be set. The default values are 0 and 0.

The AVC specifications define a number of different profiles specifying which compression features of H.264 are allowed or forbidden. In addition to the profiles, the AVC specifications also define a number of levels putting further restrictions on other properties of the video. These restrictions include the maximum resolution, the maximum bitrate and the maximum framerate. The common notation for profiles and levels is “Profile@Level”, for example Main@3.1.

The most common profiles for webstreaming are baseline (BP) and main (MP). Some differences in the features for these profiles are shown hereafter :

Compression feature     Baseline Profile   Main Profile
B-Frames                no                 yes
CABAC                   no                 yes
FMO, ASO, RS            yes                no
PicAFF, MBAFF           no                 yes

The next table shows the maximum values for some common levels :

Level number   Max video bitrate   Max resolution & frame rate
1.3            768 kbit/s          352×288 ; 30 fps
2.2            4 Mbit/s            352×576 ; 25 fps
3.1            14 Mbit/s           720×576 ; 25 fps
4.0            20 Mbit/s           1920×1080 ; 30 fps
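
With ffmpeg/x264, the profile and level can be requested explicitly; a sketch for a stream that should stay within Baseline@3.1 (file name and bitrates are placeholders) :

ffmpeg -i input.mov -c:v libx264 -profile:v baseline -level 3.1 -b:v 1000k -maxrate 1400k -bufsize 2800k -movflags +faststart output.mp4

The encoder then disables or adapts settings (such as CABAC or B-frames) that are not allowed in the selected profile.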

To display mp4 videos in all browsers and devices, especially in IE9, it’s necessary to serve them with the right MIME type. If the videos are stored on Amazon AWS S3, the default content type is “application/octet-stream”. It’s easy to change the content type to video/mp4 in the Properties > Metadata menu of the S3 console.
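
If you prefer the command line over the S3 console, the content type can also be set while uploading, for example with the AWS command line tool (bucket and file names are placeholders; this assumes the AWS CLI is installed and configured) :

aws s3 cp video.mp4 s3://mybucket/video.mp4 --content-type video/mp4

The object is then served with the Content-Type header video/mp4, which IE9 and other strict browsers require.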

Further information about AVC is available at the following websites :

MPEG-4 Part 2 and Part 10

Last update : September 16, 2013

MPEG-4 Part 2

MPEG-4 Part 2 (MPEG-4 Visual) is a video compression technology developed by MPEG, similar to previous standards such as MPEG-1 and MPEG-2 and compatible with H.263. Several popular codecs including DivX and Xvid implement this standard.

MPEG-4 Part 10

MPEG-4 Visual should not be confused with MPEG-4 Part 10 which is commonly referred to as H.264 or AVC (Advanced Video Coding), and was jointly developed by ITU-T and MPEG.

AVC is currently one of the most commonly used formats for the recording, compression, and distribution of high definition video.

Bio:Fiction

Bio:Fiction was the world’s first synthetic biology film festival. The first and original festival took place at the Museum of Natural History in Vienna, Austria, on 13-14 May 2011. Since then, Bio:Fiction has officially been on tour around the world. The festival provided information and dialogue about synthetic biology in an attractive, factual and entertaining way.

Synthetic biology aims at applying engineering principles to biology. The DNA of an organism is no longer manipulated, but programmed on a computer and built up from scratch.

The festival also marked the beginning of the synbio art exhibition Synth-ethic in the Museum, which presented biotech art objects related to synthetic biology. The exhibition featured 10 artists and lasted from May 14th to June 26th, 2011.

The exhibition was produced by Biofaction KG and is part of the Cinema and Synthetic Biology project, funded by GEN-AU ELSA.

Strandbeests : Kinetic Sculptures by Theo Jansen

Last update : August 9, 2013

Theo Jansen, born 14 March 1948, is a Dutch artist and kinetic sculptor. Since 1990 he has been building large works, called strandbeests, which resemble animal skeletons and are able to walk on the beaches of the Netherlands using wind power. Not pollen or seeds but yellow plastic tubes are used as the basic material of these new creatures. Eventually he wants to put these animals out in herds on the beaches, so they will live their own lives.

Some beach animals have a stomach consisting of recycled plastic bottles containing air that can be pumped up to a high pressure by the wind. Others are able to detect when they have entered water and walk away from it, and one species will even anchor itself to the ground if it senses a storm approaching.

The artworks of Theo Jansen have been presented on numerous websites, TV shows, videos, books, conferences and exhibitions. The movie Strandbeesten, directed by Alexander Schlichter (2008), was presented at the Bio:Fiction festival in 2011.

An assembly kit of a miniature version of the strandbeest Animaris Ordis Parvus is available at the website of the artist. It is produced by Gakken Education Publishing Co. Ltd., Japan. After assembly, the mini strandbeest walks driven by the wind, by hand or by blowing against the propeller. Another strandbeest, the mini Rhinoceros, has been published by the same company in Japan.

MMDAgent toolkit

Last update : August 9, 2013

MMDAgent is a toolkit for building voice interaction systems, released to contribute to the popularization of speech technology. Users can design their own dialogue scenarios, 3D agents and voices. The software is released under the New and Simplified BSD license. Version 1.3.1 was released on December 25, 2012, and is available at the SourceForge website. The toolkit was created by the Department of Computer Science at the Nagoya Institute of Technology, Japan. The current members of the project team are Keiichi Tokuda, Akinobu Lee and Keiichiro Oura.

MMDAgent employs MikuMikuDance (MMD) as a foundation for its 3D rendering system and allows users to maintain a lively conversation with their 3D companion. Hatsune Miku is one of the many models that can be used to hold a conversation with.

MMDAgent speech recognition

The speech recognition module of MMDAgent is based on Julius, the open-source large-vocabulary continuous speech recognition engine. New words to be recognized by MMDAgent can be added by creating a user dictionary.

An MMDAgent WordPress Blog with news about the project was launched in August 2012.

Miku Miku Dance (MMD)

MikuMikuDance (MMD) is a freeware animation program that lets users animate and create 3D animation movies with Vocaloid models. MikuMikuDance was programmed by Yu Higuchi and has gone through significant upgrades since its creation. It was produced as part of the Vocaloid Promotion Video Project (VPVP).

The software allows users to import 3D models into a virtual space that can be moved and animated accordingly. The following features are available :

  • import of .wav files to create music videos
  • import and export of motion data
  • integrated physics engine
  • use of Microsoft’s Kinect
  • map shadowing

The software comes with a number of 3D models based on the mascots of Crypton Future Media’s Vocaloids. The default models Miku, Meiko, Kaito, Kagamine Rin/Len, Akita Neru and Haku Yowane were created by Animasa; the default Sakine Meiko model was created by Kio. All content, including the 3D models, is distributed freely by the users, and most of the additional content is produced by fans using 3D modeling software. As the recognition and popularity of Vocaloids grew, the Japanese video hosting platform Nico Nico Douga became a place for collaborative content creation.

The first version of Miku Miku Dance was released on February 24, 2008. An English version was released one month later. On May 26, 2011, Yu Higuchi announced he would retire from developing MMD. The last stable release of the program is version 7.30.

Additional useful information about Miku Miku Dance is available at the following links :

Vocaloids

Vocaloid is a singing synthesizer application whose signal processing part (concatenative synthesis) was developed through a joint research project between the Pompeu Fabra University in Spain and Japan’s Yamaha Corporation, which developed the software into a commercial product. Vocaloid enables users to synthesize singing by typing in lyrics and a melody. The main parts of the Vocaloid system are the Score Editor, the Singer Library and the Synthesis Engine. The project started in 2000; the first commercial Vocaloid version was presented by Yamaha at the Musikmesse in Germany in 2003 and Vocaloid version 3 was launched in October 2011.

Each Vocaloid is sold as “a singer in a box” designed to act as a replacement for an actual singer. Today seven studios are involved in the production and distribution of Vocaloids; three of them create English Vocaloids, the other four create solely Japanese Vocaloids.

  • Zero-G (English virtual vocalists) : Zero-G Limited was founded in 1990, trading under the name Time+Space, by Ed Stratton and Julie Stratton. Zero-G rapidly became the largest distributor of soundware in the UK and one of the most critically acclaimed sound developers in the world.
  • PowerFX (English virtual vocalists) : PowerFX is a small recording company based in Stockholm, Sweden. The company has been producing music samples, loops and sound effects since 1995.
  • Crypton Future Media (Japanese and English virtual vocalists) : Crypton is a media company based in Sapporo, Japan, created in 1995. It develops, imports, and sells products for music, such as sound generator software, sampling CDs and DVDs, sound effect and background music libraries.
  • Internet Co., Ltd. (Japanese virtual vocalists) : Internet Co. is a software company based in Osaka, Japan. It is best known for the music sequencer Singer Song Writer and Niconico Movie Maker for the video sharing website Nico Nico Douga.
  • AH Software (Japanese virtual vocalists) : AH-Software is the software brand of AHS Co., Ltd., an importer of digital audio workstations and encoders in Tokyo, Japan. It is also known as the developer of Voiceroid, a speech synthesizer application only available in the Japanese language.
  • Bplats (Japanese virtual vocalists) : Bplats, Inc. is an application service provider (ASP) based in Tokyo, Japan. The company offers Software as a Service (SaaS) and Platform as a Service (PaaS) solutions, such as the Vocaloid VY1 series and a Vocaloid online shop.
  • Ki/oon Records (Japanese virtual vocalists) : Ki/oon Records is a Japanese record label, a subsidiary of Sony Music Japan.

Hatsune, Kagamine, Leon, Sonika, Big AL, Nekomura

A complete list of the Vocaloid products is available at the Wiki website. The marketing of the Vocaloids is done by the studios.

Just like any music synthesizer, the software is treated as a musical instrument and the vocals as sound belonging to the software user. The mascots for the software can be used to create vocals for commercial or non-commercial use, as long as the vocals do not offend public policy. On the other hand, the copyrights to the mascot images and names belong to their respective studios and cannot be used without the consent of the studio that owns them.

There are a number of derivative products, for example Vocaloid-Flex, Vocal Listener, Miku Miku Dance, Project Diva and MMDAgent. An online Vocaloid service (NetVocaloid) in English and Japanese is available at the Y2 Project website.

The virtual vocalists listed above are among the most famous.

A number of figurines and plush dolls were released for some of these singers, some have their own Twitter, Facebook and MySpace accounts.

In Japan, Vocaloids have a great cultural impact and have led to a number of legal implications. Vocaloid music is available on CDs, iTunes, Amazon MP3, etc. Open air concerts with virtual vocalists have been organized recently with great success :

  • 1st live concert (Animelo Summer Live) : August 22, 2009, Saitama Super Arena, Saitama, Japan
  • 2nd live concert (MikuFes 09) : August 31, 2009
  • 1st overseas concert (Anime Festival Asia) : November 21, 2009, Singapore
  • 3rd live concert (Miku no Hi Kanshasai 39’s Giving Day) : March 9, 2010, Odaiba, Tokyo, Japan
  • 1st American live concert : September 18, 2010, San Francisco, USA
  • Vocarock Festival : January 11, 2011
  • Vocaloid Festa : February 12, 2011
  • 4th live concert : March 9, 2011, Tokyo, Japan
  • 2nd American live concert : October 11, 2010, Viz Cinema, San Francisco, USA; screening at the New York Anime Festival
  • 3rd American live concert (Mikunopolis) : July 2, 2011, Nokia Theatre, Anime Expo, Los Angeles, USA

During the concerts, 3D animations of the Vocaloid mascots are projected on a transparent screen, giving the effect of a pseudo-hologram. Videos of different Vocaloid concerts are available in the following YouTube playlist.

A similar piece of software, developed by Ameya/Ayame and released as freeware, is called UTAU. Cracked copies of Vocaloids are called Pocaloids.