Malcolm's Page of Publications Malcolm Slaney Dr. Malcolm Slaney is a Research Scientist in the Machine Hearing Group at Google Research. He is a at Stanford CCRMA, where he has led the for more than 20 years, and an in the Electrical Engineering Department at the University of Washington. He is a coauthor, with A. Kak, of the IEEE book 'Principles of Computerized Tomographic Imaging.' This book was republished by SIAM in their He is coeditor, with Steven Greenberg, of the book Before joining Google, Dr.
Slaney worked at Bell Laboratory, Schlumberger Palo Alto Research, Apple Computer, Interval Research, IBM's Almaden Research Center, Yahoo! Research, and Microsoft Research. For many years, he has led the auditory group at the Dr. Slaney's recent work is on understanding the role of attention in conversational speech and general audio perception. He is a
Publications and Pointers
A more complete list of my publications is at this link:. I now work in the Machine Hearing group, which is part of the, at in Mountain View, CA.
I used to work in the in Mountain View, CA. I used to work at and.
My IBM work is described on this. Before that I worked for, Apple Computer's Advanced Technology Group, and Schlumberger Palo Alto Research. Several of my technical reports and papers are available on the net for downloading.
The following is a brief list. I have a for the fun stuff. Many of my papers can be found online via the or the portals. This page shows my, work, my work, some of my, and pointers to. My is now online. Get more information.
The book is back in print and you can order it now from. SIAM honored us by including it in their 'series of books!!!
Multimedia Analysis
I spent a few years investigating an algorithm known as Locality Sensitive Hashing (LSH) that is used to efficiently find nearest neighbors. I wanted to understand how to make LSH more efficient. I wrote a tutorial with Michael Casey and Christoph Rhodes. Then with colleagues at Yahoo I wrote a 'definitive' article about how to choose the optimum parameters.
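As a rough illustration of the idea behind LSH, here is a minimal Python sketch of a hash index built from random projections. It is not the code from the tutorial or the parameter article mentioned above; the class name `LSHIndex` and the parameters `w` (bucket width), `k` (projections per table), and `num_tables` are assumptions chosen for this example, with arbitrary default values.

```python
import math
import random
from collections import defaultdict


class LSHIndex:
    """Toy locality-sensitive hash index using random projections.

    Each table hashes a vector x with k functions of the form
    floor((a . x + b) / w), where a is Gaussian and b is uniform in [0, w).
    All names and default values here are illustrative assumptions.
    """

    def __init__(self, dim, k=4, num_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.points = []
        self.tables = []
        for _ in range(num_tables):
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                      rng.uniform(0.0, w))
                     for _ in range(k)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, x):
        # Concatenate the k quantized projections into one bucket key.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in funcs)

    def add(self, x):
        """Store x in every table and return its index."""
        idx = len(self.points)
        self.points.append(list(x))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, x)].append(idx)
        return idx

    def query(self, x):
        """Return the index of the closest point sharing a bucket, or None."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, x), []))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], x))


index = LSHIndex(dim=3)
for point in [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (0.1, 0.0, -0.1)]:
    index.add(point)
nearest = index.query((10.0, 10.0, 10.0))
```

Only points that land in the same bucket as the query are compared exactly, which is where the speedup comes from. Larger `k` makes buckets more selective, while more tables raise the chance that a true neighbor collides at least once; choosing those values well is the parameter question described above.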
Both the Matlab (optimization) and Python (implementation) code is online too. How to I wrote a column for IEEE Multimedia Magazine about my vision of the multimedia world. The columns are online. I get to work with lots of wonderful image data and some very smart computer-vision people. For a couple of years, I worked with Rainer Lienhart and Eva Hoerster on image classification in large databases. With Eva and Rainer:,,, With Srinivasan: I've been working on finding similar songs in large music databases with at Dartmouth and Goldsmiths College, University of London.
We want to find matches that are similar, but not exact (fingerprinting finds exact matches). Michael wrote a great overview of l, and I helped edit a. I've also been working with from Yahoo's media group to better understand how to deliver music. We've characterized the of people's musical interests, studied (using 480,000 subjects), and, most recently, surveyed several techniques for content-based similarity. And work with Benjamin Marlin when he was an intern at Yahoo! Research turned into a nice paper about. The best overview of our music-similarity work is in. See earlier work at, and 2006 - 2007 - 2008 - I've been working with several talented students on their research.
This work covers everything from basic research towards understanding the perception of timbre, to machine-learning approaches to recognizing chords, to tracking song popularity. Demodulation is inherently an ill-posed problem so we solve it with an optimization approach. Greg Sell's papers and code are available at: built the world's best chord recognizer.
The is best, and earlier work is from,, and.
Timbre Perception Work
I've also been fortunate to work with many talented faculty and students at the Telluride every summer in Telluride, Colorado. I wrote a column for a speech-recognition newsletter about our work on auditory attention in Telluride: Nima Mesgarani on s for speech discrimination David Anderson and Sourabh Ravindran at Gatech: and Low-power classification at ICASSP2004
Auditory Modeling
There is now a new version of the Auditory Toolbox. It contains functions to implement many different kinds of auditory models.
The toolbox includes code for Lyon's passive longwave model, Patterson's gammatone filterbank, Meddis' hair cell model, Seneff's auditory model, correlograms and several common representations from the speech-recognition world (including MFCC, LPC and spectrograms). This code has been tested on Macintosh, Windows, and Unix machines using Matlab 5.2. Note: This toolbox was originally published as Apple Computer Technical Report #45. The old technical report ( PDF and ) and old code ( and ) are available for historical reasons.
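Gammatone filterbanks are usually laid out with center frequencies spaced evenly on an equivalent rectangular bandwidth (ERB) scale. The Python sketch below shows one common way to compute such a spacing. It is not taken from the toolbox; the function names `erb_space` and `erb_bandwidth`, the default frequency range, and the use of the Glasberg and Moore constants are assumptions made for illustration.

```python
import math


def erb_space(low_freq=100.0, high_freq=8000.0, num_channels=10,
              ear_q=9.26449, min_bw=24.7):
    """Return center frequencies (Hz), highest first, evenly spaced in ERBs.

    ear_q and min_bw are the Glasberg and Moore constants; the defaults
    and the function name are assumptions made for this sketch.
    """
    c = ear_q * min_bw
    # Equal steps in log(f + c) correspond to equal steps on the ERB scale.
    step = (math.log(low_freq + c) - math.log(high_freq + c)) / num_channels
    return [-c + math.exp(i * step) * (high_freq + c)
            for i in range(1, num_channels + 1)]


def erb_bandwidth(freq, ear_q=9.26449, min_bw=24.7):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter at freq."""
    return freq / ear_q + min_bw


center_freqs = erb_space()
bandwidths = [erb_bandwidth(f) for f in center_freqs]
```

The last channel lands on `low_freq` and the first sits just below `high_freq`, so the filters are dense at low frequencies and sparse at high ones, which roughly mirrors the frequency resolution of the cochlea.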
My primary scientific goal is to understand how our brains perceive sound. My role in this research area is that of a modeler: I build models that explain the neurophysiological and psychoacoustic data. I hope these models will help other researchers understand the mechanisms involved and result in better experiments.
My latest work in this area is titled 'Connecting Correlograms to Neurophysiology and Psychoacoustics' and was presented at the in Grantham, England from 1-6 August, 1997. Two correlograms, one computed using autocorrelation and the other computed using AIM, are shown on the left.
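For readers unfamiliar with the autocorrelation form, a correlogram stacks the short-time autocorrelation of each cochlear channel, so that periodic sounds show up as peaks at a common lag. The Python sketch below computes a single row of that picture for one channel. The function names, the test signal, and the lag limits are assumptions for illustration, and no cochlear filtering is included.

```python
import math


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def strongest_period(frame, sample_rate, min_lag, max_lag):
    """Return the frequency (Hz) of the strongest autocorrelation peak."""
    acf = autocorrelation(frame, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return sample_rate / best_lag


sample_rate = 8000
# Hypothetical test signal: three harmonics of a 100 Hz fundamental.
frame = [sum(math.sin(2 * math.pi * 100 * h * t / sample_rate)
             for h in (1, 2, 3))
         for t in range(400)]
pitch = strongest_period(frame, sample_rate, min_lag=20, max_lag=200)
```

For this signal the strongest peak falls at a lag of one fundamental period, so `pitch` comes out at the 100 Hz fundamental even though the frame also contains energy at 200 and 300 Hz.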
The information in most auditory models flows exclusively bottom-up, yet there is increasing evidence that a great deluge of information is flowing down from the cortex. A paper I wrote for the is called 'A Critique of Pure Audition'. This paper has been greatly refined and is published in the book in 1998 by Erlbaum. The figure at the left shows the spectrogram of sine-wave speech. I have written several papers describing how to convert auditory representations into sounds.
I have built models of the cochlea and central auditory processing, which I hope both explain auditory processing and will allow us to build auditory sound separation tools. These papers describe the process of converting sounds into cochleagrams and correlograms, and then converting these representations back into sounds.
Unlike the printed versions of this work, the web page includes audio file examples. It includes better spectrogram inversion techniques, a description of how to invert Lyon's passive cochlear model, and a description of correlogram inversion. This material was first presented as part of the Proceedings of the ATR Workshop on 'A Biological Framework for Speech Perception and Production' published in September 1994. A more refined version of this paper was an invited talk at the. The image on the left shows the spectrogram of one channel of cochlear output, one step in the correlogram inversion process. Pattern Playback is the term used by Frank Cooper to describe his successful efforts to paint spectrograms on plastic and then convert them into sound. I wrote of Pattern Playback techniques, from Frank Cooper's efforts to my own efforts with auditory model inversion, in a paper which was published at the 1995 IEEE International Conference on Systems, Man, and Cybernetics.
My paper is titled 'Pattern Playback from 1950 to 1995'. The image at the left shows a portion of one of Cooper's spectrograms. The following are publications during my time at Apple.
The Mathematica notebooks are designed to be self-documenting and in each case the postscript and PDF files are also available. Those files that are Matlab toolboxes include source and documentation. All these files are available with the gracious permission of Apple. 'Auditory Model Inversion for Sound Separation' is the first paper to describe correlogram inversion techniques. We also discuss improved methods for inverting spectrograms and a cochlear model designed by Richard F. Lyon. This paper was published at ICASSP '94.
'A Perceptual Pitch Detector' is a paper that describes a model of human pitch perception. It is similar to work done by Meddis and Hewitt and published in JASA, but this paper has more real-world examples.
This paper was published at ICASSP '90. 'On the importance of time' is an invited chapter by Dick Lyon and myself in the book (edited by Martin Cooke, Steve Beet and Malcolm Crawford, John Wiley & Sons). This tutorial describes why we think time-domain processing is important when modeling the cochlea and higher-level processing. 'Lyon's Cochlear Model' is a Mathematica notebook that describes an implementation of a simple (but efficient) cochlear model designed by Richard F. Lyon. It is also known as Apple Technical Report #13. A software package called MacEar implements the latest version of Lyon's Cochlear Model. MacEar is written in very portable C for Unix and Macintosh computers.
This link points to the last published version (2.2). (Note the README file included has old program results. The names of the output files have changed and there are a couple of extra channels being output. I'm sorry for the confusion.) Gammatone Math is a Mathematica notebook that describes a new more efficient implementation of the Gammatone filters that are often used to implement critical band models. It is also known as Apple Technical Report #35. Apple Hearing Demo Reel was published as Apple Technical Report #25. It includes more than one hour of correlogram videos, including a large fraction of the ASA Auditory Demonstration CD.
I have a limited number of NTSC copies left. Send email to to request a copy.
Signal Processing
I recently finished some nice work establishing a linear operator connecting the audio and video of a speaker.
A paper describing this work has been accepted for presentation at the NIPS'2000 conference. Chris Bregler, Michele Covell, and I developed a technique we call Video Rewrite to automatically synthesize video of talking heads. This technology is cool because we use a purely data driven approach (concatenative triphone video synthesis) to create new video of a person speaking. Given new audio, we concatenate the best sequence of lip images and morph them into a background sequence. We can automatically create sequences like the Kennedy and Johnson scenes in the movie 'Forrest Gump.' We studied how adults convey affective messages to infants using prosody.
We did not attempt to recognize the words, let alone to distill more nebulous concepts such as satire or irony. We analyzed speech with low-level acoustic features and discriminated approval, attentional bids, and prohibitions from adults speaking to their infants. We built automatic classifiers to create a system, Baby Ears, that performs the task that comes so naturally to infants. The image on the left shows one of the decision surfaces which classifies approval, attention and prohibition utterances on the basis of their pitch. We wrote a more detailed article describing this work for the journal We can't post that article, but I can send you a copy if you send me email. Send for a copy of journal article.
I was able to help Michele Covell do some neat work on time-compression of audio. Lots of people know how to compress a speech utterance by a constant amount. But if you want to do better, which parts of the speech signal can be compressed the most? This paper describes a good technique and shows how to test the resulting comprehension. Eric Scheirer and I worked on a system for discriminating between speech and music in an audio signal. This paper describes a large number of features, how they can be combined into a statistical framework, and the resulting performance on discriminating signals found on radio stations.
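To make the idea of a low-level discrimination feature concrete, the Python sketch below computes a zero-crossing rate for a frame of audio. This is offered only as an example of the kind of simple feature such a classifier can use; the text above does not list the paper's features, and the function names and the synthetic test tones are assumptions.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum((a >= 0.0) != (b >= 0.0)
                    for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def tone(freq, sample_rate=8000, num_samples=800, phase=0.1):
    """Hypothetical test signal: a sine tone with a small phase offset."""
    return [math.sin(2 * math.pi * freq * t / sample_rate + phase)
            for t in range(num_samples)]


low_rate = zero_crossing_rate(tone(100.0))
high_rate = zero_crossing_rate(tone(1000.0))
```

A single number like this is a weak cue on its own. In a statistical framework, many such features measured per frame, along with how they vary over time, are combined before a decision is made.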
The results are better than anybody else's results. (That comparison is not necessarily valid since there are no common testing databases. We did work hard to make our test set representative.) This paper was published at the 1997 ICASSP in Munich. The image on the left shows clouds of our data. Work we've done to morph between two sounds is described in a paper at the 1996 ICASSP. This work is new because it extends previous audio morphing work to include inharmonic sounds. This paper uses results from Auditory Scene Analysis to represent, match, warp, and then interpolate between two sounds.
The image on the left shows the smooth spectrogram, one of two independent representations used when morphing audio signals. I wrote an article describing my experiences writing 'intelligent' signal processing documents. My Mathematica notebook 'Lyon's Cochlear Model' was the first large document written with Mathematica. While I don't use Mathematica as much as I used to, I still believe that intelligent documents are a good way to publish scientific results. These ideas were also published in a book titled 'Knowledge Based Signal Processing' that was published by Prentice Hall.
Software Publications
I have written Matlab m-functions that read and write QuickTime movies. The WriteQTMovie code is more general than previous solutions for creating movies in Matlab.
It runs on any platform that Matlab runs on. It also lets you add sound to the movie. The ReadQTMovie code reads and parses JPEG compressed movies. And I coded an implementation of an image processing technique known as snakes. There are two m-files that implement a type of dynamic contour following popular in computer vision.
First proposed by Kass, Witkin and Terzopoulos in 1987, snakes are a variational technique to find the best contour that aligns with an image. The basic routine, snake.m, aligns a sequence of points along a contour to the maximum of an array or image. Provide it with an image, a set of starting points, limits on the search space and it returns a new set of points that better align with the image. The second m-file is a demonstration script. Using your own array of image data, or a built-in default, a demo window is displayed where you can click to indicate points and see the snake program in action.
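The sketch below illustrates the general idea in Python with a deliberately simplified contour: one point per image row, moved greedily within a limited search range to trade off image brightness against smoothness. It is not a port of snake.m and does not use the original variational solver; the function name `greedy_snake`, the energy weights, and the test image are assumptions for illustration.

```python
def greedy_snake(image, start_cols, alpha=0.1, search=1, max_iterations=50):
    """Fit one contour point per image row by greedy energy minimization.

    Energy of placing row r's point at column c:
        -image[r][c] + alpha * (squared column distance to each neighbor).
    alpha weights smoothness; search limits how far a point moves per pass.
    """
    num_cols = len(image[0])
    cols = list(start_cols)
    for _ in range(max_iterations):
        moved = False
        for r in range(len(cols)):
            neighbors = [cols[j] for j in (r - 1, r + 1)
                         if 0 <= j < len(cols)]

            def energy(c):
                elastic = sum((c - nb) ** 2 for nb in neighbors)
                return -image[r][c] + alpha * elastic

            best = cols[r]
            lo = max(0, cols[r] - search)
            hi = min(num_cols - 1, cols[r] + search)
            for c in range(lo, hi + 1):
                if energy(c) < energy(best):
                    best = c
            if best != cols[r]:
                cols[r] = best
                moved = True
        if not moved:
            break  # No point improved, so the contour has settled.
    return cols


# Hypothetical test image: a bright vertical ridge along column 5.
image = [[10 - 2 * abs(c - 5) for c in range(8)] for _ in range(5)]
fitted = greedy_snake(image, start_cols=[3, 3, 3, 3, 3])
```

Starting two columns away from the ridge, each point climbs toward brighter pixels one step per pass while the elastic term keeps neighbors from drifting apart. Raising `alpha` gives a stiffer contour that resists isolated bright pixels.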
And his colleagues wrote a nice and provided some I added a Graphical User Interface (GUI) so I could play with all the options and put lots of data through it. With the GUI, you select points with the mouse. After you tell it what kind of distance metric you want, you get several plots showing the results. The links at the right show a number of points separated by a fourth-order polynomial. Michele Covell and I wrote some Matlab code to compute multi-dimensional scaling (MDS). MDS allows you to reconstruct an estimate of the position of points, given just relative distance data. These routines handle both metric (where you know distances) and non-metric (where you just know the order of distances) data.
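As an illustration of the metric case only, here is a minimal Python sketch of classical MDS: the squared distances are double centered to give inner products, and the leading eigenvectors (found here by power iteration with deflation) give the coordinates. This is not the Matlab code described above; the function name `classical_mds`, the fixed iteration count, and the start vector are assumptions, and the input is assumed to be a symmetric distance matrix.

```python
import math


def classical_mds(distances, dims=1, iterations=200):
    """Metric MDS: recover coordinates, up to rotation and reflection."""
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [float(i + 1) for i in range(n)]  # Arbitrary start vector.
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        eigenvalue = sum(x * y for x, y in zip(v, bv)) / vv
        # Negative eigenvalues (non-Euclidean input) are clamped to zero.
        scale = math.sqrt(max(eigenvalue, 0.0) / vv)
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next-largest component.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j] / vv
    return coords


# Hypothetical input: pairwise distances of a 3-4-5 right triangle.
distances = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
positions = classical_mds(distances, dims=2)
```

The recovered positions reproduce the input distances but not the original orientation, since distances alone cannot fix rotation, reflection, or translation. The non-metric case, where only the ordering of distances is known, needs an iterative fit and is not covered by this sketch.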
Apple Publications
The SoundAndImage toolbox is a collection of Matlab tools to make it easier to work with sounds and images. On the Macintosh, tools are provided to record and play back sounds through the sound system, and to copy images to and from the scrapbook. For both Macintosh and Unix systems, routines are provided to read and write many common sound formats (including AIFF).
Only 68k MEX files are included. Users on other machines will need to recompile the software. This toolbox is published as Apple Computer Technical Report #61. Filter Design is a Mathematica notebook that describes (and implements) many IIR filter design techniques. It was published as Apple Technical Report #34.
I created a Hypercard stack to make it easier for people with a Macintosh and CDROM drive to interact with the Acoustical Society of America's This CD is a wonderful collection of auditory effects and principles. The ASA Demo Hypercard stack includes the text and figures from the book and lets you browse the Audio CD. I wrote a program for the Macintosh 660/AV and 840/AV computers that uses the DSP (AT&T3210) to monitor audio levels. VUMeters runs on any Macintosh with the AT&T DSP chip. Source and binaries are included.
Bill Stafford and I wrote TCPPlay to allow us to play sounds from a Unix machine over the network to the Macintosh on our desks. This archive includes Macintosh and Unix source code and the Macintosh application. There are other network audio solutions, but this works well on the Macintosh.
Previous Publications
In a past life, I worked on medical imaging. A book on tomographic imaging (cross-sectional x-ray imaging) was published by IEEE Press: Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging, (New York: IEEE Press, c1988).
The software used to generate many of the tomographic images in this book is available. The parallel beam reconstruction on the left was generated with the commands
    gen n=100 k=100 if=lib.d.s
    filt n=100 k=100
    back n=100 k=100
    disn min=1.0 max=1.05
The book is now online. Download the or order the book from. Code to implement the diffraction tomography algorithms in my PhD Thesis is also available.
Carl Crawford, Mani Azimi and I wrote a simple Unix plotting package called qplot. Both two-dimensional and 3d-surface plots are supported. Now obsolete code to implement a DITroff previewer under SunView is available.
This program was called suntroff and is an ancestor of the X Window System Troff previewer. It was written while I was an employee of Schlumberger Palo Alto Research. All files are compressed Unix TAR files.
Other Research Pointers
I organize the Stanford CCRMA Hearing Seminar. Just about any topic related to auditory perception is considered fair game at the seminar. An archive of seminar announcements can be found at or at as a chronological listing of email announcements. Send email to if you would like to be added to the mailing list.
For more Information
I can be reached at The best way to reach me is to send email. This page last updated on September 3, 2012. Malcolm Slaney ().