5.1 Surround Sound and FLAC

Posted on March 20, 2013 by Marco Barnig

Suggested configuration for 5.1 music listening (Wikipedia)

Five point one (5.1) is the name for six channel surround sound multichannel digital audio systems, most commonly used in commercial cinemas and home theaters. It uses 5 full bandwidth channels (the “five”) and one low-frequency effects channel (the “point one”). The 5.1 system is used by Dolby Digital (AC3 codec), Sony Dynamic Digital Sound (SDDS), Digital Theater Systems (DTS), and Dolby Pro Logic II.

All 5.1 systems use the same speaker channels and configuration, having a front left (L) and right (R), a center channel (C), two surround channels (SL and SR) and a subwoofer (LFE).

Audio files for 5.1 systems are often encoded with the lossless FLAC codec. FLAC is an open format with royalty-free licensing and a reference implementation which is free software. FLAC has support for metadata tagging, album cover art, and fast seeking.
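
As a small illustration of the channel layout described above, the following Python sketch maps one decoded 5.1 sample frame to named channels. The order used (front left, front right, center, LFE, surround left, surround right) is the six-channel assignment given in the FLAC format documentation; the helper names and the assumption that the decoder hands over interleaved frames of six samples are mine.

```python
# Channel order FLAC assigns to a six-channel (5.1) stream.
FLAC_5_1_ORDER = ["L", "R", "C", "LFE", "SL", "SR"]


def split_frame(samples):
    """Map one interleaved frame of six samples to named channels.

    Assumption: the decoder delivers samples frame by frame in FLAC order.
    """
    if len(samples) != len(FLAC_5_1_ORDER):
        raise ValueError("expected 6 samples for a 5.1 frame")
    return dict(zip(FLAC_5_1_ORDER, samples))


def full_bandwidth(frame):
    """Return only the five full-bandwidth channels (the "five")."""
    return {name: value for name, value in frame.items() if name != "LFE"}
```

Calling `split_frame` on a frame of six samples gives a dictionary keyed by channel name, and `full_bandwidth` drops the low-frequency effects channel (the "point one").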

Common lossy compression schemes for digital audio are MP3 and its successor AAC (Advanced Audio Coding). AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. AAC is the standard audio format for YouTube, Apple (iPhone, iPod, iPad, …) and Sony devices (PlayStation, Walkman, …). AAC is more advanced than the Dolby Digital AC3 codec.

Online music : Last.fm, Deezer and Spotify

Posted on March 4, 2013 by Marco Barnig

A renowned online music service is iTunes, based on SoundJam MP and launched by Apple in 2001. Jeff Robbin and Bill Kincaid developed SoundJam MP in 1998 with assistance from Dave Heller. They chose Casady & Greene to publish SoundJam MP. Jeff Robbin is now the vice president of consumer applications at Apple Inc and he remains the lead software designer for iTunes.

Other online music services are less well known, among them Last.fm, Deezer and Spotify.

Last.fm is a music website, founded in the United Kingdom in 2002, acquired by CBS Interactive in May 2007. Using a music recommender system called Audioscrobbler, Last.fm builds a detailed profile of each user’s musical taste by recording details of the songs the user listens to. Audioscrobbler began as a computer science project of Richard Jones. Last.fm was founded in 2002 by Felix Miller, Martin Stiksel, Michael Breidenbruecker and Thomas Willomitzer as an internet radio station and music community site. Last.fm won the Europrix 2002 and was nominated for the Prix Ars Electronica in 2003. Last.fm and Audioscrobbler were merged in 2005 and are still active today. A new desktop player was released on January 15, 2013.

Deezer is a French web-based music streaming service. It allows users to listen to music on various devices. It currently has more than 20 million licensed tracks and over 30,000 radio channels. The first version of Deezer, called Blogmusik, was developed by Daniel Marhely in Paris in 2006. The company became successful in 2010 when it entered a partnership with Orange. Deezer has three account types : discovery (free), premium and premium-plus. Deezer was launched in Luxembourg in March 2012 in partnership with Tango.

Spotify is a commercial music streaming service providing DRM-protected content from a range of major and independent record labels, including Sony, EMI, Warner Music Group and Universal. The service was launched in October 2008 by Swedish startup Spotify AB. The company was founded by Daniel Ek and Martin Lorentzon. Since November 2012 the service has also been available in Luxembourg.

The system is currently accessible using Microsoft Windows, Mac OS X, Linux, iOS, Android, BlackBerry, Windows Mobile, Windows Phone, S60 (Symbian), Sonos, and other devices. Music can be browsed by artist, album, record label, genre, playlist and radio channel, as well as by direct search. About 20 million songs have been available since December 2012. Some artists are missing because of licensing restrictions imposed by the record labels or by the artists. The Beatles, for example, are not available because of a digital distribution agreement that is exclusive to iTunes.

Three subscriptions, with trials, are available : open, unlimited, premium. A free service is only available upon invitation. Spotify operates under a so-called ‘Freemium’ model, which offers simple and basic services free for the user to try and more advanced or additional features at a premium price, based on the Open Music Model (OMM). The incorporation of DRM, however, diverges from the OMM.

In 2011 Spotify was named a Technology Pioneer by the World Economic Forum (WEF).

ID3 in MP3 metadata container

Posted on June 14, 2012 by Marco Barnig

Last update : August 25, 2013

ID3 is a metadata container most often used in conjunction with the MP3 audio file format. It allows information such as the title, artist, album, track number, and other information about the file to be stored in the file itself.

There are two unrelated versions of ID3: ID3v1 and ID3v2. They are de facto standards, because no standardization body was involved in their creation or has given them a formal approval status.
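
To make the idea of a metadata container concrete, here is a minimal Python sketch that reads an ID3v1 tag. It relies on the well-known ID3v1 layout: the tag occupies the last 128 bytes of the file, starts with the marker "TAG", and holds fixed-width fields for title, artist, album, year, comment and a one-byte genre code. The ID3v1.1 convention of storing a track number in the last comment byte is handled as well. The function name and the returned dictionary keys are my own choices, and ID3v2 (which sits at the start of the file and uses a different structure) is not covered.

```python
import struct


def parse_id3v1(data: bytes):
    """Parse an ID3v1 tag from the raw bytes of an MP3 file.

    Returns a dict of fields, or None if no ID3v1 tag is present.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    # 30 + 30 + 30 + 4 + 30 + 1 = 125 bytes after the 3-byte marker.
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", tag[3:]
    )

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes (sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    info = {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "genre_code": genre,
    }

    # ID3v1.1: a zero byte followed by a non-zero byte at the end of the
    # comment field means the last byte is the track number.
    if comment[28] == 0 and comment[29] != 0:
        info["track"] = comment[29]
        comment = comment[:28]
    info["comment"] = text(comment)
    return info
```

In practice one would pass the result of `open(path, "rb").read()` to `parse_id3v1`; for real projects a maintained tagging library is the safer choice.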

A tag editor is used to read or write ID3 metadata. Many media players, including the standard players iTunes and Windows Media Player, provide tagging features.

The most common fields displayed in a DLNA renderer are the following :

Artist Name
Album Name
Title
Genre
Year
Comments
Length
Picture of the artist or album (high resolution panoramic pictures, for example 640×480 pixels, are best suited)

Sometimes the track number is also shown. Less usual are the Album Artist, the Composer or the Disc Number.

Widely used tag editors are Mp3tag (current version 2.57 released on July 6, 2013, created by Florian Heidenreich) and EasyTAG (current version 2.1.8 released on February 10, 2013, created by Jérôme Couderc).

The ID3 standard is very flexible with regard to the type and number of images that can be embedded in a single MP3 file (21 types are specified : icons, leaflets, cover art, artist, composer, conductor, location, …). In practice only one picture is used and displayed by the player. More information about embedding album art in MP3 files is available at the weblog of Richard Farrar.
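
For reference, the 21 picture types can be written down as a simple lookup table. The sketch below lists the type codes of the ID3v2 attached-picture (APIC) frame as I recall them from the ID3v2 frame documentation; please check the official list before relying on the exact wording. The dictionary and function names are hypothetical.

```python
# Picture type codes of the ID3v2 APIC (attached picture) frame.
APIC_PICTURE_TYPES = {
    0x00: "Other",
    0x01: "32x32 pixels file icon (PNG only)",
    0x02: "Other file icon",
    0x03: "Cover (front)",
    0x04: "Cover (back)",
    0x05: "Leaflet page",
    0x06: "Media (e.g. label side of CD)",
    0x07: "Lead artist/lead performer/soloist",
    0x08: "Artist/performer",
    0x09: "Conductor",
    0x0A: "Band/Orchestra",
    0x0B: "Composer",
    0x0C: "Lyricist/text writer",
    0x0D: "Recording location",
    0x0E: "During recording",
    0x0F: "During performance",
    0x10: "Movie/video screen capture",
    0x11: "A bright coloured fish",
    0x12: "Illustration",
    0x13: "Band/artist logotype",
    0x14: "Publisher/Studio logotype",
}


def picture_type_name(code: int) -> str:
    """Return a readable name for an APIC picture type code."""
    return APIC_PICTURE_TYPES.get(code, "Unknown")
```

Since most players show only one picture, tagging the album art as type 3, "Cover (front)", is the usual choice.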

Information about metadata editors for image files is available in the following post.

Vocaloids

Posted on October 12, 2011 by Marco Barnig

Vocaloid is a singing synthesizer application, with its signal processing part (concatenative synthesis) developed through a joint research project between the Pompeu Fabra University in Spain and Japan’s Yamaha Corporation, who developed the software into a commercial product. Vocaloid enables users to synthesize singing by typing in lyrics and melody. The main parts of the Vocaloid system are the Score Editor, the Singer Library and the Synthesis Engine. The project started in 2000, the first commercial Vocaloid version was presented by Yamaha at the Musikmesse in Germany in 2003 and the Vocaloid version 3 was launched in October 2011.

Each Vocaloid is sold as “a singer in a box” designed to act as a replacement for an actual singer. Today seven studios are involved in the production and distribution of Vocaloids; three of them create English Vocaloids, while the other four create only Japanese Vocaloids.

Zero-G (English virtual vocalists) : Zero-G Limited was founded in 1990, trading under the name Time+Space, by Ed Stratton and Julie Stratton. Zero-G rapidly became the largest distributor of soundware in the UK and one of the most critically acclaimed sound developers in the world.
PowerFX (English virtual vocalists) : PowerFX is a small recording company, based in Stockholm, Sweden. The company has been producing music samples, loops and sound effects since 1995.
Crypton Future Media (Japanese and English virtual vocalists) : Crypton is a media company based in Sapporo, Japan, created in 1995. It develops, imports, and sells products for music, such as sound generator software, sampling CDs and DVDs, sound effect and background music libraries.
Internet Co. Ltd. (Japanese virtual vocalists) : Internet Co. is a software company based in Osaka, Japan. It is best known for the music sequencer Singer Song Writer and Niconico Movie Maker for the video sharing website Nico Nico Douga.
AH Software (Japanese virtual vocalists) : AH-Software is the software brand of AHS Co., Ltd., an importer of digital audio workstations and encoders in Tokyo, Japan. It is also known as the developer of Voiceroid, a speech synthesizer application only available in the Japanese language.
Bplats (Japanese virtual vocalists) : Bplats, Inc. is an application service provider (ASP) based in Tokyo, Japan. The company offers Software as a Service (SaaS) and Platform as a Service (PaaS) solutions, such as the Vocaloid series VY1 and a Vocaloid online shop.
Ki/oon Records (Japanese virtual vocalists) : Ki/oon Records is a Japanese record label, a subsidiary of Sony Music Japan.

Hatsune

Kagamine

Leon

Sonika

Big AL

Nekomura

A complete list of the Vocaloid products is available at the Wiki website. The marketing of the Vocaloids is done by the studios.

Just like any music synthesizer, the software is treated as a musical instrument and the vocals as sound, belonging to the software user. The mascots for the software can be used to create vocals for commercial or non-commercial use as long as the vocals do not offend public policy. On the other hand, copyrights to the mascot image and name belong to their respective studios and cannot be used without the consent of the studio that owns them.

There are a number of derivative products, for example Vocaloid-Flex, Vocal Listener, Miku Miku Dance, Project Diva and MMDAgent. An online Vocaloid service (NetVocaloid) in English and Japanese is available at the Y2 Project website.

The following virtual vocalists are the most famous :

Hatsune Miku (by Crypton Future Media)
Kagamine Rin & Len (Twins : boy & girl by Crypton Future Media)
Lola (by Zero-G)
Leon (by Zero-G)
Miriam (by Zero-G)
Megurine Luka (by Crypton Future Media)
Meiko (by Crypton Future Media)
Kaito (by Crypton Future Media)
Sweet Ann (PowerFX)
VY1 alias Mitzki (by Bplats)
Cantor (by Virsyn)

A number of figurines and plush dolls have been released for some of these singers, and some of them have their own Twitter, Facebook and MySpace accounts.

In Japan, Vocaloids have a great cultural impact and raise many legal questions. Vocaloid music is available on CDs, iTunes, Amazon MP3 etc. Live concerts with virtual vocalists have been organized recently with great success :

1st live concert (Animelo Summer Live) : August 22, 2009, Saitama Super Arena, Saitama, Japan
2nd live concert (Mikufes 09) : August 31, 2009,
1st overseas concert (Anime Festival Asia) : November 21, 2009, Singapore
3rd live concert (Miku no Hi Kanshasai 39’s Giving Day) : March 9, 2010, Odaiba, Tokyo, Japan
1st American live concert : September 18, 2010, San Francisco, USA
Vocarock Festival : January 11, 2011
Vocaloid Festa : February 12, 2011
4th live concert : March 9, 2011, Tokyo, Japan
2nd American live concert : October 11, 2010, Viz Cinema, San Francisco, USA; screening in the New York Anime Festival
3rd American live concert (Mikunopolis) : July 2, 2011, Nokia Theater, Anime Expo, Los Angeles, USA

During the concerts, 3D animations of the Vocaloid mascots are projected on a transparent screen, giving the effect of a pseudo-hologram. Videos of different Vocaloid concerts are available at the following YouTube playlist.

A similar program, developed by Ameya/Ayame, is called UTAU and has been released as freeware. Cracked copies of Vocaloids are called Pocaloids.

Microsoft Tellme

Posted on October 7, 2011 by Marco Barnig

Microsoft Tellme simplifies everyday tasks with the natural power of your voice. You can talk to your PC, tablet, phone, TV or car.

The Microsoft Tellme technologies (“Say it. Get it”) provide speech recognition and synthesis capabilities in products ranging from Xbox Kinect for fun, to Microsoft Tellme IVR for customer care, to Windows Phone 7 for life and work.

In Windows 7 you can use voice recognition to control your computer and to dictate and edit text. A guide on how to set up your computer for this task is available at the Microsoft website.

The technologies provided for business applications are Microsoft Tellme IVR and embedded speech features in Office, Lync and Exchange. Different platforms are available : cloud, server, desktop, phone.

To extend the built-in speech recognition functionality included in Windows on the desktop, you can use Windows Speech Recognition Macros or, for more advanced uses, the Microsoft Speech API (SAPI).

SAPI has been an integral component of all Microsoft Windows versions since Windows 98. Microsoft Windows XP and Windows Server 2003 include SAPI version 5.1. Windows Vista and Windows Server 2008 include SAPI version 5.3, while Windows 7 includes SAPI version 5.4. Code written for SAPI 5.3 (Vista) will run on SAPI 5.4 (Windows 7) without recompiling.

Google Text-to-Speech (TTS) support

Posted on July 13, 2010 by Marco Barnig

Last update : 30 April 2011

On November 16, 2009, Google announced on its official blog that English text-to-speech had been added to the translation tools. For this service Google used eSpeak, an open source software speech synthesizer.

In May 2010, Google Translate added more languages for audio output, including Afrikaans, Albanian, Catalan, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Haitian Creole, Hindi, Hungarian, Icelandic, Indonesian, Italian, Latvian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Turkish, Vietnamese and Welsh.

The speech audio is in MP3 format and is queried via a simple HTTP GET (REST) request. For English, an example URL is:

http://translate.google.com/translate_tts?tl=en&q=how are you?

The TTS web service restricts the text to 100 characters and returns 404 (Not Found) if the request includes a Referer header.
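
The request described above can be prepared with a few lines of Python. This is only a sketch under the assumptions stated in this post (the `translate_tts` endpoint, the `tl` and `q` parameters, and the 100-character limit); the function and constant names are mine, the service may have changed since, and no request is actually sent. The code just builds a properly encoded URL.

```python
from urllib.parse import urlencode

# Endpoint and limit as described in this post; both may have changed.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"
MAX_CHARS = 100


def build_tts_url(text: str, language: str = "en") -> str:
    """Build the GET URL for a text-to-speech request.

    Raises ValueError if the text exceeds the 100-character limit.
    """
    if len(text) > MAX_CHARS:
        raise ValueError("text is longer than %d characters" % MAX_CHARS)
    # urlencode takes care of spaces and special characters such as "?".
    return TTS_ENDPOINT + "?" + urlencode({"tl": language, "q": text})
```

Longer texts would have to be split into chunks of at most 100 characters, with one request per chunk.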

On December 3, 2010, Google acquired Phonetic Arts, a company specialised in speech synthesis. Phonetic Arts Limited delivers technology that generates natural expressive speech. The products include Phonetic Morpher, Phonetic LipSync and Phonetic Synthesizer. Phonetic Arts, formerly known as Tayvin 356 Limited, was founded in 2006 and is based in Cambridge, UK. The Phonetic Arts technology generates natural computer speech from small samples of recorded voice and should improve the voice output quality of Google’s text-to-speech applications.

Google provides not only speech output tools, but also speech input tools (Voice Search, Voice Input, Voice Actions), mainly in connection with the mobile phone OS Android.

Version 11 of the Google Chrome browser includes the HTML5 Speech Input API.

An amusing application of the Google TTS system is the Google Translate Beatbox.

Dewplayer : MP3 player in Flash

Posted on December 28, 2009 by Marco Barnig

Alsacréations, a web agency in Strasbourg, Alsace, specialized in designing websites that comply with the international W3C standards, has for several years offered a Flash MP3 audio player by Dew that is simple to install and use.

Called Dewplayer, this player is distributed under a Creative Commons license; it is free to use, even in a professional or commercial setting.

An XHTML code generator is available on the site; it produces code to copy and paste according to the user’s needs. Using swfobject is recommended for embedding the player.

The player can be controlled with JavaScript, and numerous options are available. I have been using the player successfully for years. The most recent version is 1.9.6.

SoundFonts (.sf2)

Posted on December 9, 2009 by Marco Barnig

SoundFont, a registered trademark of E-mu Systems, Inc., is a name that collectively refers to a file format and associated technology to synthesize audio in the context of computer music composition. The exclusive license for re-formatting and managing historical SoundFont content has been acquired by Digital Sound Factory.

A SoundFont file, or SoundFont bank, contains one or more sampled audio waveforms (or samples), which can be re-synthesized at different pitches and dynamic levels. SoundFont banks are related to MIDI devices and can be seamlessly used in place of General MIDI (GM) patches in many computer music sequencers.

The original SoundFont file format was developed in the early 1990s by E-mu Systems and Creative Labs (used in Sound Blaster AWE32). Files in this format conventionally have the file extension sbk. The SoundFont 2.0 version was released in 1996 and was fully disclosed as a public specification to make it an industry standard. New versions up to 2.4 have been released in the past years and the new SoundFont files conventionally have the file extension sf2.
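
A quick way to recognize an sf2 file is to look at its first bytes. SoundFont 2 banks are RIFF files whose form type is "sfbk", so the following Python sketch reads the 12-byte RIFF header and checks that marker. The function names are my own, and the check says nothing about whether the rest of the bank is valid.

```python
import struct


def read_riff_header(data: bytes):
    """Return the RIFF chunk size and form type, or None if not RIFF."""
    if len(data) < 12:
        return None
    # 4-byte chunk id, little-endian 32-bit size, 4-byte form type.
    chunk_id, size, form_type = struct.unpack("<4sI4s", data[:12])
    if chunk_id != b"RIFF":
        return None
    return {"size": size, "form_type": form_type.decode("ascii", "replace")}


def looks_like_soundfont2(data: bytes) -> bool:
    """True if the data starts with a RIFF header of form type 'sfbk'."""
    header = read_riff_header(data)
    return header is not None and header["form_type"] == "sfbk"
```

Reading the first 12 bytes of a file with `open(path, "rb").read(12)` is enough for this test.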

There are other sound formats available, e.g. the DownLoadable Sounds (DLS) format standardized by the MIDI Manufacturers Association (MMA), DLS Level 2 and the Structured Audio Sample Bank Format (SASBF), standardized by the MPEG standards body in collaboration with MMA and MIT, and proprietary formats developed by Yamaha and other music companies. Nevertheless, sf2 SoundFonts became a de facto standard and are widely used today.

There are a lot of websites available that offer free and commercial sf2 soundfonts :

The following tools are best suited to use SoundFonts :

SynthFont : a free MIDI file player using SoundFonts
Viena : a free SoundFont editor
FluidSynth : an open source real-time software synthesizer used in several music applications
Gervill : a software sound synthesizer for use with the Java Sound API
SFPack and SFArk : archivers for SoundFont banks which use different compression techniques

VoicePHP : build voice enabled applications directly in PHP without any 3rd party APIs

Posted on November 8, 2009 by Marco Barnig

VoicePHP is not an extension to PHP; in fact it’s the same PHP which now outputs voice instead of text and also takes input as voice instead of text. In technical terms, it’s PHP whose standard text-based input and output (stdio, stdout in programmers’ terms) are replaced by voice equivalents.

VoicePHP diagram

VOXEO hosting platform

Posted on December 30, 2008 by Marco Barnig

Voxeo offers three main application platforms for free to developers: CallXML, CCXML, and VoiceXML.

The Prophecy 8.0 – CallXML 3.0 platform allows developers to build robust IVR applications using only static content. CallXML suits the needs of most telephony applications that use touchtone input (DTMF).

The Prophecy 8.0 – CCXML W3C 1.0 platform allows developers to deploy next-generation conferencing and call routing applications and to ensure that they will stand the test of time.

The Prophecy 8.0 – VoiceXML 2.1 platform includes the Voxeo ASR engine (available only in US English) and is the world’s first and only 100% certified-compliant VoiceXML 2.0 browser. The Prophecy Platform supports all the VoiceXML 2.1 additions and enhancements, as well as the SISR/SRGS grammar formatting standards. It also includes legacy support for the older GSL and JSGF grammar formats.
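
To give an idea of what a VoiceXML application looks like, the Python sketch below generates a minimal document with one form, one block and one spoken prompt. It uses only standard VoiceXML elements (`vxml`, `form`, `block`, `prompt`); the function name and the default message are my own, and this is an illustration, not the test application hosted on the platform.

```python
import xml.etree.ElementTree as ET

VXML_NAMESPACE = "http://www.w3.org/2001/vxml"


def hello_world_vxml(message: str = "Hello World") -> str:
    """Return a minimal VoiceXML 2.1 document that speaks one prompt."""
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NAMESPACE})
    form = ET.SubElement(vxml, "form")    # a dialog
    block = ET.SubElement(form, "block")  # executable content
    prompt = ET.SubElement(block, "prompt")
    prompt.text = message                 # text rendered by the TTS engine
    return ET.tostring(vxml, encoding="unicode")
```

The resulting string can be saved as a `.vxml` file and served over HTTP to a VoiceXML browser.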

I created an account a few years ago to do my first trials with VoiceXML. Today I updated the account and started with a new HelloWorld test application. The telephone numbers to access the application are the following:

Skype VoIP : +99000936 9991425592
SIP VoIP : sip:9991425592@sip.voxeo.net
iNum Number : +883510001801392

iNum Number from Luxembourg : +352 20880108 p 883510001801392

When calling an iNum number from a mobile phone (for instance a BlackBerry), you have to insert a pause between the local number and the iNum number by using the menu while editing the number.

My first iNum call to the HelloWorld application was successfully established today at 21h21 with my mobile phone.
