If you have a speech or dictation saved as an MP3 file and want to transcribe it, you have two options: listen to the file and type it out yourself, or have a speech recognition program do the typing for you. Converting recorded speech to text takes some setup and preparation, but if the audio is clear enough, speech recognition software can save you a lot of time.
Open your audio conversion software.
Import the MP3 file you wish to transcribe.
Save a copy of the file as a WAV file.
Open your speech recognition software.
Create a new dictation source for the WAV file. You might want to name it after the person speaking in the recording.
Click on the menu option to have the program begin taking dictation from the WAV file.
Review the text that the program creates, correcting any errors. If you're planning on transcribing many MP3 files from the same speaker, you may wish to have the speech recognition program learn your corrections.
Save the text file, or import it into a word processing program.
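If you would rather script the conversion step than do it by hand, the sketch below shows one possible approach. It assumes the free command-line tool ffmpeg is installed and on your PATH, which the steps above do not require; any audio converter will do. The function names are made up for this example, and the 16 kHz mono setting is only an assumption that suits many recognizers, so check what your own program expects.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_path=None, sample_rate=16000):
    """Return the ffmpeg argument list that turns an MP3 into a mono 16-bit PCM WAV."""
    src = Path(mp3_path)
    # Default to the same file name with a .wav extension.
    dst = Path(wav_path) if wav_path else src.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(src),           # input file
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz (assumed value)
        "-c:a", "pcm_s16le",      # uncompressed 16-bit PCM audio
        str(dst),
    ]


def convert_mp3_to_wav(mp3_path, wav_path=None, sample_rate=16000):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_ffmpeg_command(mp3_path, wav_path, sample_rate)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(command[-1])
```

Calling `convert_mp3_to_wav("interview.mp3")` would write `interview.wav` next to the original, leaving the MP3 untouched so you still have your source file.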
Most speech recognition programs work with WAV files rather than MP3s, which is why you need to convert your file to WAV before the program can transcribe it. Speech recognition programs become more accurate as they "learn" an individual's speaking style, so transcription works best when you're transcribing many files from the same speaker. Audio quality also matters a great deal: your transcriptions will be more accurate if the MP3 was recorded with high-quality sound, with the speaker's mouth at a consistent distance from the microphone and with little or no background noise.
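Before handing a converted file to your speech recognition program, it can help to confirm what format the WAV actually has. The sketch below uses Python's standard `wave` module to read the file header; the function name is invented for this example. It reports only the format (channels, sample rate, bit depth, length) and cannot tell you how clean the recording itself is.

```python
import wave


def describe_wav(path):
    """Read basic format information from the header of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "duration_seconds": frames / rate,
        }
```

If the values do not match what your recognition program's documentation asks for, convert the file again with different settings rather than feeding it in as is.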
MP3 files are compressed; WAV files generally are not. This means the WAV file you create from your MP3 will take up much more disk space than the original did, so make sure you have plenty of hard drive space before you begin. Low-quality recordings and files with multiple speakers, cross talk or background noise will produce very inaccurate transcriptions. In these cases, correcting the text may take longer than typing the transcript yourself.
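To get a rough idea of how much space a converted file will need, you can work it out from the recording length. Uncompressed PCM audio takes sample rate × channels × bytes per sample for every second of sound. The sketch below applies that formula and checks free disk space with Python's standard `shutil.disk_usage`; the function names and the default settings (44.1 kHz, stereo, 16-bit) are assumptions chosen for illustration.

```python
import shutil

WAV_HEADER_BYTES = 44  # size of the standard header on a simple PCM WAV file


def estimate_wav_bytes(duration_seconds, sample_rate=44100, channels=2, bits_per_sample=16):
    """Estimate the size in bytes of an uncompressed PCM WAV file."""
    bytes_per_second = sample_rate * channels * (bits_per_sample // 8)
    return WAV_HEADER_BYTES + int(duration_seconds * bytes_per_second)


def has_room_for(required_bytes, folder="."):
    """Return True if the drive holding `folder` has at least `required_bytes` free."""
    return shutil.disk_usage(folder).free >= required_bytes


one_hour = estimate_wav_bytes(3600)
print(one_hour)                # 635040044 bytes, roughly 635 MB
print(has_room_for(one_hour))  # depends on your drive
```

For comparison, an MP3 encoded at 128 kbit/s uses 16,000 bytes per second, so an hour comes to about 57.6 MB. Under these assumptions the WAV is roughly eleven times larger than the MP3 it came from.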