DivX ;-) Repair
Written by Candela
Official Site - please refer to this site for the latest version
Version 1.0, August 2000
1. Introduction
After many hours of tedious downloading, and finally sitting down to watch
your movie, some of you may have noticed one of the following things:
· Suddenly the image froze but the sound kept playing. (example video)
· Disoriented or equally coloured blocks of pixels distorted the image for a
short time. (example screenshot)
You probably didn't pay too much attention to
the latter, but in the first case you had to get out of your lazy chair and
fast-forward a bit to make the movie play again. If it happened more than once,
you probably deleted the file while wishing the ripper of the movie a one-way
ticket to hell. But were these 'bad frames', as they are generally called,
really the ripper's fault? The answer, in most cases, is quite simply NO. They
may in fact be your own fault. In this document I'm going to talk about the
cause of these errors and how to solve them. It is based on my own experience
and therefore may not be 100% accurate, but I'm always open to suggestions and
improvements. I also do not claim to have invented the method for repairing. It
is in fact an obvious thing to do, but since I have yet to find someone already
using it or a specific program for the task, the idea apparently hasn't crossed
anyone's mind. Finally, I would like to mention that even though I'm going to
focus on movies, the method will actually work on any type of file.
You can contact me at the following email address: divx_repair@hotmail.com.
You can also try to find me on IRC (EFNET or DALNET), where I'm known as
Candela.
2. The cause
2.1. Some history
When I first saw these errors, I thought
something had gone wrong during encoding and that the ripper hadn't bothered to
check the movie before releasing it on the Internet. I didn't really give it
any more attention until one day I was talking to someone on IRC. It turned out
that he had the same movie as me but without the image freezing. On top of
that, the filesize of his movie was exactly the same as mine, which proved that
it was in fact the same rip and not a different version made by someone else.
After playing my version on different systems to make sure my computer wasn't
to blame, there was only one possibility left: the file I had was corrupt.
After some further investigation this turned out to be true and I was able to
replace the corrupt data (only 2046 bytes to be exact) to successfully repair
my movie.
2.2. Who's to blame?
Why or how does data in the movies get corrupted? In an ideal world it shouldn't happen,
but then again in an ideal world I would have lots of money, buy all movies on
DVD and wouldn't even bother to write this document. When downloading from the
Internet, and especially when resuming broken downloads, things apparently can
go wrong. Here is an extract from the GetRight help:
Rollback XX K on resumed connections: Because some data may have been corrupted when GetRight was
disconnected, this allows you to backup a little bit and reget a small amount
of data to be sure that no errors are in the file on your computer. It is suggested that this be the about
number of kilobytes (K) downloaded in 2 seconds, which will depend on the
speed of your Internet connection.
For most modems, the default of 4K is fine.
As far as
I know, GetRight is the only program that performs this kind of 'rollback' and
I have yet to encounter an IRC or FTP client with the same
functionality. This means that every other program you resume with can corrupt
your files. And there are probably other sources of errors too. Don't despair
though, these errors are rare and there are good ways to prevent them. But the
errors in the movies show that they do happen.
The
following is already common practice and everyone should be doing it to
transfer large files over the Internet. But as said before, the world is not
perfect and some people never learn.
· Firstly, do NOT download huge files like movies in one part. Split them into
smaller parts first. That way, you don't have to download the entire file again
when something goes wrong. Splitting can be done by a multitude of programs
(e.g. WinRAR).
· Secondly, always verify the integrity of the files and download corrupted
ones again. The verification can either be done internally by the splitter
itself (e.g. WinRAR), by a CRC checker (e.g. WIN-SFV32) or by both to be
absolutely sure.
If you
are already familiar with all of this you will have no trouble understanding
how to repair the movies as it is very similar.
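For readers who would rather see the integrity check as a script, here is a minimal sketch in Python. It is not one of the programs mentioned in this text, and the function names file_crc32 and format_crc are made up for the illustration. It assumes the common CRC-32 variant that Python's zlib module provides and reads the file block by block so a large movie never has to fit in memory.

```python
import zlib


def file_crc32(path, block_size=1 << 20):
    """Return the CRC-32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as handle:
        # Feed the running CRC one block at a time.
        while True:
            block = handle.read(block_size)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def format_crc(crc):
    """Format a CRC as 8 upper-case hex digits, the usual way to print it."""
    return f"{crc:08X}"
```

Two people who compute this value on the same part should get the same 8 hex digits. If the values differ, the part has to be downloaded again.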
3. The cure
3.1. The basics
Let us
first get some things straight. This is NOT a 'press one button and your movie
is fixed' type of solution. It requires some effort, spare time and a bit of
common sense. Even if you think you do not meet these simple requirements, I
advise you to read on and decide afterwards.
The
method is based on the following assumptions (if these are not correct for you
don't even bother sending any comments):
· The error is caused by corrupt data and good versions of the movie are
available, hence encoding errors cannot be fixed. Also, files where data has
not been altered but inserted or removed (see note1) cannot be repaired, as
this method is unable to resynchronise after an error.
· You do not want to download the entire movie again to fix the errors, only
the small amount of data that causes the error.
· You want to restore the movie to its original state, i.e. the state it was in
when it was encoded. There are other and easier possibilities to remove the
errors (e.g. cutting out the bad frames in an editor like VirtualDub) but these
'mutilate' the file and I find this bad practice. Therefore I will not explain
how to do it and neither will I answer questions about it. And it would be best
if you did not redistribute this kind of 'fixed' movie.
Since you
want to replace the corrupt data it's obvious you have to get hold of the good
data. So the first thing you have to do is find someone who has the same movie
without the error. When you have found someone, the only thing left is to ask him
to send the good data. But how do you know where the file is corrupt and which
bytes to copy? Most of you probably know you can find the differences between 2
files by comparing them with programs like a hex editor. Unfortunately this requires both files to be on
your hard disk which makes this useless here, as you don't have the error-free
movie.
Luckily, there is such a thing as a CRC, which stands for Cyclic Redundancy
Code. The theoretical background of this is not important here. All you need to
know is that this number (often 32-bit) is calculated based on the content of a
file. If a single byte in the file changes, the CRC will also change. In order
to find out if a file is different from another one you only need to compare
the CRC (see note2) of both files. This only allows you to see if the file is
different though, not what the differences are or where they are located. To
circumvent this, you can split (see note3) the files into smaller pieces and
compare the CRCs of these pieces. Then you will have a good approximation (see
note4) of the location of the error (e.g. in part 5 of 20). Only the bad parts
have to be downloaded and replaced. Finally you can merge the pieces back
together and your file is repaired.
note1: Until now I have only encountered
corrupt files where data was altered, so inserted or deleted data probably
almost never occurs.
note2: There is a slight possibility that 2 different
files of equal size will have the same CRC but the odds are practically zero
and can be neglected.
note3: Even though they are perfectly
suitable for error prevention, compression programs like RAR cannot be used as
a file splitter here (even in store mode). This is because they alter the data
of the file they split depending on which options you set. A program that
merely copies the data into different pieces is needed here.
note4: The only way to find the error
exactly is to do a byte by byte comparison of both files which, as I said
before, is not possible.
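The split-and-compare idea of this section can be sketched in a few lines of Python. This is only an illustration under some assumptions: the names piece_crcs and differing_pieces are invented here, the CRC is the CRC-32 from Python's zlib module, and the pieces are read in place instead of being written to disk as separate files. Each person would run piece_crcs on their own copy and only the resulting list of numbers has to be exchanged.

```python
import zlib


def piece_crcs(path, piece_size=2_000_000):
    """Return one CRC-32 per consecutive piece of piece_size bytes."""
    crcs = []
    with open(path, "rb") as handle:
        while True:
            piece = handle.read(piece_size)
            if not piece:
                break
            crcs.append(zlib.crc32(piece) & 0xFFFFFFFF)
    return crcs


def differing_pieces(crcs_mine, crcs_theirs):
    """Return the 1-based numbers of the pieces whose CRCs do not match."""
    if len(crcs_mine) != len(crcs_theirs):
        # A different piece count means a different file size, so this is
        # not the same rip (or data was inserted or removed, see note1).
        raise ValueError("piece counts differ, cannot compare piece by piece")
    return [
        number
        for number, (mine, theirs) in enumerate(zip(crcs_mine, crcs_theirs), start=1)
        if mine != theirs
    ]
```

A result such as [5] would mean that only part 5 has to be fetched from the other person.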
3.2. Tools of the trade
Assuming
you already have a player for your movies, you are going
to need 2 other Windows (see note1) programs that can be downloaded from
the Internet, Topsplit and WIN-SFV32.
They are free, small in size, require no installation after unzipping and are
very easy to use. You are not obligated to
use these specific programs. There are other programs which do exactly the same
thing (and are 'compatible'), but these are the best in my opinion. Topsplit,
as the name suggests, splits large files into smaller pieces. WIN-SFV32 is a
CRC calculator and validator. It uses .SFV files, which are plain text, to
store the CRC values.
note1: I apologise to users of other OS.
I'm sure you can find similar programs yourselves.
3.3. Step by step, day by d… (oops, almost got carried away there for a moment)
Before
going any further I would like to emphasise the importance of having a BACKUP
copy of your movie. I don't want to be held responsible if you delete or mess
up your movie. If you follow my directions closely and read section 4
attentively nothing should go wrong. But you never know because life just isn't
fair. Also read the entire document before even thinking about trying anything
and never ever do something you don't fully understand.
Step 1.
The first
part is also the hardest. In order to know where your movie is corrupt and to
get hold of replacement data, you need access to an error-free copy of the
movie. This means you have to search for someone that has it and is willing to
help you. IRC is a good place to start. Once you have found a nice person (like
myself ;) make sure you don't have different versions of the movie. The perfect
way to check is to compare the filesizes in BYTES (see note1). If they are
exactly the same, you can be almost 100% sure it's the same rip. Additional
checks include the resolution of the picture, framerate, bitrate, sound
quality, etc. These can also be used when comparing incompletely downloaded
(see note2) movies, where you obviously are not able to compare filesizes. All this
information can easily be obtained in Windows Explorer (Select file · Right click · Properties · General & Details
tab).
note1: Do not compare sizes expressed in
KB or MB! These are only approximations of the real size (converted and
rounded).
note2: You can also use this method as a
safe way to finish incompletely downloaded movies because resuming might
corrupt your file. And here's a nice thing to know: you can play incomplete
movies in VirtualDub or by looking at the Preview
tab visible in the above screenshots.
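The warning in note1 about comparing sizes in bytes can be illustrated with a small Python sketch. The helper names exact_size and shown_size_mb are hypothetical, and the second file size in the usage note below is an invented example value, not a real movie.

```python
import os


def exact_size(path):
    """Return the size of a file in bytes, the number you should compare."""
    return os.path.getsize(path)


def shown_size_mb(size_bytes):
    """Return the size as a rounded MB figure, as a file manager might show it."""
    return round(size_bytes / (1024 * 1024), 1)
```

For example, shown_size_mb(681574400) and shown_size_mb(681574912) both give 650.0, although the two sizes differ by 512 bytes and therefore cannot belong to the same rip.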
Step 2.
Next you
both have to split the movie into smaller parts. The first thing you need is enough
free hard disk space to hold the movie. Then create a directory where you will
put the files. Now start up Topsplit and select the movie (Source File Information · Select Source File) and the output directory (Output File Information · Select Output Folder). Decide on a split size to use (I recommend 2.000.000 bytes, read section 4
for further information) and configure Topsplit accordingly (Split Size · Change · By Size).
Make sure the output filename is exactly the same for both of you to avoid some
minor inconvenience later. Change it if necessary (Miscellaneous · Miscell · Change Split Name).
Split the
movie (Start Process) and it will
finish in a couple of minutes depending on your PC configuration (650MB on CD
takes about 12 minutes on my lousy P133 with 6x CD-drive).
When it's
done and everything went ok you'll find a lot of files in the output directory
named .001, .002, .003, … (see note1), each with a size of 2.000.000 bytes
(the last one holds whatever remains and is usually smaller).
note1: You may also find a .BAT file in
the same dir. You can just ignore it or you can prevent it from being created (Miscellaneous · Setting · Split Setting · Batch File Setting · Automatically
Generate Merge Batch File). It is used to join the parts again (with the DOS copy command) when
you do not have Topsplit installed.
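If you cannot run Topsplit, the splitting in this step could be approximated with a short Python sketch like the one below. It only copies the data into numbered pieces, which is what the method needs. The function name split_file is invented for this example, and the naming scheme (the split name followed by .001, .002, ...) is an assumption that may not match Topsplit's exact output names.

```python
import os


def split_file(source, out_dir, piece_size=2_000_000, split_name=None):
    """Copy source into numbered pieces of piece_size bytes inside out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    # Both people must use the same split name so the piece names match.
    base = split_name or os.path.basename(source)
    piece_paths = []
    number = 1
    with open(source, "rb") as src:
        while True:
            data = src.read(piece_size)
            if not data:
                break
            piece_path = os.path.join(out_dir, f"{base}.{number:03d}")
            with open(piece_path, "wb") as dst:
                dst.write(data)
            piece_paths.append(piece_path)
            number += 1
    return piece_paths
```

The source file is never modified, so your backup copy stays intact. Only the last piece can be smaller than the split size.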
Step 3.
Next you
have to find out where the corrupt data is. One of you will have to create an SFV
table. Start WIN-SFV32, select the directory with the split files and press Next.
Then
select the files, choose Create table
and press Next again.
Wait for
it to finish (about 5 minutes here) and send the .SFV file that was created to
the other person. He will use this .SFV file to compare his files with yours.
To do this, start WIN-SFV32, select the correct directory and press Next. Point to the .SFV file, select the
files and choose Verify files instead
of Create table (make sure Delete failed is unchecked if you have
the good movie).
It will
start comparing and files that are good will get a green square but when the CRC is
different (i.e. the file contains corrupt data) the square will turn red.
note:
I mentioned the filename of both movies had to be the same. If they are
not, all files will get a blue
square during verifying because they cannot be found. You can edit the .SFV
file with a text editor and do a search & replace on the filenames to
match yours. Another possibility is to rename all your files or to split again
with the correct name.
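The create and verify steps can also be sketched in Python. This is not how WIN-SFV32 is implemented. It is a minimal sketch that assumes the usual plain-text .SFV layout: one line per file with the filename followed by its CRC-32 as 8 hex digits, and comment lines starting with a semicolon. The names crc_of, create_sfv and verify_sfv are made up for the example, and the three result values stand in for the green, red and blue squares.

```python
import os
import zlib


def crc_of(path):
    """Return the CRC-32 of a file, read in blocks of 1 MB."""
    crc = 0
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def create_sfv(directory, filenames, sfv_path):
    """Write a checksum table for the given files in directory."""
    with open(sfv_path, "w", encoding="utf-8") as sfv:
        sfv.write("; checksum table, one 'filename CRC32' entry per line\n")
        for name in sorted(filenames):
            sfv.write(f"{name} {crc_of(os.path.join(directory, name)):08X}\n")


def verify_sfv(directory, sfv_path):
    """Return {filename: 'ok' | 'bad' | 'missing'} for every table entry."""
    results = {}
    with open(sfv_path, encoding="utf-8") as sfv:
        for line in sfv:
            line = line.strip()
            if not line or line.startswith(";"):
                continue  # skip blank lines and comments
            # The CRC is the last field, so filenames may contain spaces.
            name, expected = line.rsplit(None, 1)
            path = os.path.join(directory, name)
            if not os.path.exists(path):
                results[name] = "missing"
            elif crc_of(path) == int(expected, 16):
                results[name] = "ok"
            else:
                results[name] = "bad"
    return results
```

Every entry reported as 'bad' is a part you have to request from the other person. 'missing' usually means the filenames do not match, as described in the note above.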
Step 4.
As you
have probably guessed by now, you will only have to download the files where
the CRC check failed (red square). Often the errors are small and you'll need
only a couple of files. Once you've downloaded the good files, replace your own
with them. Now it's time to join (see note1) the parts (Merge tab), so fire up
Topsplit again and select the first of the split files (Source File Information · Select Source File). Also select an output folder (Output
File Information · Select Output Folder) and an output filename (Merge File List · Miscell · Change Merge Name).
When it's
finished you'll find a completely repaired movie on your hard disk. That wasn't
so hard was it? ENJOY!
note1: You can let Topsplit delete the
files as they are joined together (Setting · Merge Setting · Delete Each split
file after merge).
This is very useful if you do not have enough free space to hold yet another
copy of the movie. However, if the files have their read-only attribute set (e.g. when splitting from CD) Topsplit can't
delete them. You can turn this attribute off in Windows Explorer (Select files · Right click · Properties · General tab · Attributes · Read-only).
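For completeness, joining the parts can be sketched in Python as well. The function name merge_pieces is invented, and it assumes pieces named with the split name followed by .001, .002, ... as in a simple splitter. It stops at the first missing number, so check the returned count against the number of pieces you expect.

```python
import os
import shutil


def merge_pieces(piece_dir, base_name, output_path, delete_after=False):
    """Append base_name.001, .002, ... to output_path; return the piece count."""
    number = 1
    with open(output_path, "wb") as out:
        while True:
            piece = os.path.join(piece_dir, f"{base_name}.{number:03d}")
            if not os.path.exists(piece):
                break  # no more pieces (or one is missing, see the count)
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, out)
            if delete_after:
                # Frees disk space as you go, like the option in note1.
                os.remove(piece)
            number += 1
    return number - 1
```

After merging, the size of the output file in bytes should again be exactly the same as the size of the error-free movie.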
I have
decided to put some warnings and remarks in a new chapter instead of including
them in the previous one. This is a very important section in my opinion so be
sure to read it very thoroughly. It deals with the rare occasion where both
people have a corrupt movie, but the errors are at different locations in the
file. As you may have guessed, this allows you to fix both movies but it could
also be the reason why your movie is still corrupt after repairing albeit in a
different place. I will also give some comments about the splitsize. I'll say
it again: read this section thoroughly! Don't come crying to me when you screw
up your movie because you didn't think it was necessary to listen.
Up until
now you didn't have to pay any special attention to the time where the errors
occurred in the movie. You only needed it to check if the other movie was ok at
that particular moment. When you replaced all your files that had a different
CRC your movie was fixed.
But let's
suppose the following situation: movie A is corrupt around time [0:15:10]
(hours:minutes:seconds) and movie B has a problem around [1:10:05]. If you
repaired movie A with movie B by merely replacing the files with a different
CRC, you would get an exact copy of movie B with an error around time
[1:10:05]. This is of course not what you want. In this case it is still easy
to solve because there is almost 1 hour between the errors. If the CRCs of files
.057 and .265 don't match you can be sure that .057 contains the error at
[0:15:10] and that file .265 represents the error at [1:10:05]. To fix movie A
you copy file .057 from movie B and to fix movie B you copy file .265 from
movie A.
But if
the errors are too close to each other (a few seconds apart) they might be in
the same file .057. That means that part .057 would contain both good AND bad
data and you would be unable to use it to fix either movie. One possibility is
to use a smaller splitsize so the errors get separated into different files. Or
you could split up file .057 into smaller files and repeat the process. It
could get even more complicated, but save yourself some trouble in these cases
and find someone that has an error-free version.
Another
problem arises with non-visible errors. If only a few bytes in the file are
corrupt it is very likely these errors don't show up when watching the movie.
However, they will result in a CRC mismatch but since you cannot see the
errors, you will be unable to determine which one of you has the correct data.
In that case, don't fix anything unless your movie has other visible errors.
Then it is more likely that your movie is also corrupt in several other places,
but of course you can never be sure.
The main
problem in cases where errors are hard to spot is that you are unable to link
the time in the movie to the byte position in the file because the video uses a
variable bitrate. If you get a CRC mismatch of file .075 you cannot determine
at what time in the movie the error should be (or the other way around). So you
can't go watch the movie meticulously at a particular time to find out if there
are any distortions in the image. I have not found a program that gives me this
information. The only thing possible at the moment is to make a rough guess.
The best I think you can do is the following. Suppose a movie is 650MB and
lasts 1,5 hours. Let S be the size of the file in bytes and L the length of the
movie in seconds. The error is at time Te and byte offset Oe.
S = 681.574.400 bytes
L = 5.400 seconds
The rate in bytes/second is then: R = S/L = 126.217 bytes/second
Let's look at 2 cases where either the time or the byte offset is known:
Known: Te = 1800 seconds, Oe = ?
Estimation of the byte offset: Oe = Te*R = 227.191.467 bytes
Known: Oe = 200.000.000 bytes, Te = ?
Estimation of the time: Te = Oe/R = 1.585 seconds
Remember
these are only rough estimations and they can be quite different from the real
values, but they should give you an idea where to look.
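The same rough estimation can be written as a small Python sketch. The function names are invented for the illustration, the defaults are the example values from above (650MB, 1,5 hours, split size 2.000.000 bytes), and the result is only as good as the assumption of a constant bitrate, which a real movie does not have.

```python
SIZE_BYTES = 650 * 1024 * 1024   # S = 681574400 bytes
LENGTH_SECONDS = 90 * 60         # L = 5400 seconds


def estimate_offset(time_seconds, size=SIZE_BYTES, length=LENGTH_SECONDS):
    """Estimate the byte offset Oe for a known time Te (Oe = Te * S / L)."""
    return round(time_seconds * size / length)


def estimate_time(offset_bytes, size=SIZE_BYTES, length=LENGTH_SECONDS):
    """Estimate the time Te in seconds for a known offset Oe (Te = Oe * L / S)."""
    return round(offset_bytes * length / size)


def piece_number(offset_bytes, piece_size=2_000_000):
    """Return the 1-based number of the split file containing the offset."""
    return offset_bytes // piece_size + 1
```

With these defaults, estimate_offset(1800) gives 227191467 and estimate_time(200000000) gives 1585, the same values as in the example. piece_number then tells you roughly which split file to look at, give or take a few files.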
I'm going
to conclude by saying something about the splitsize. The size of 2.000.000
bytes has been chosen as a compromise between several factors. Firstly, files of
this size can be downloaded fairly quickly, even at slower speeds. Using the
above approximations they contain about 15 seconds of video, so errors can be
close in time and still be separated in the files. You get around 340 files per
CD, which is an acceptable amount because directories with many files are slow
to handle. And finally, since errors are usually only a few KB in size, the
bigger the files the more superfluous data you have to download. Therefore I
have decided to use this size, but you are of course free to use another.
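If you want to try another splitsize, the two figures quoted above (seconds of video per file and number of files per CD) can be recomputed with a sketch like this one. The function names are hypothetical and the defaults are again the 650MB, 1,5 hour example movie with an assumed constant bitrate.

```python
import math


def seconds_per_piece(piece_size, size=650 * 1024 * 1024, length=5400):
    """Approximate seconds of video held by one split file."""
    return piece_size * length / size


def pieces_per_movie(piece_size, size=650 * 1024 * 1024):
    """Number of split files, counting the smaller last one."""
    return math.ceil(size / piece_size)
```

For 2.000.000 bytes this gives a little under 16 seconds per file and 341 files. Halving the splitsize halves the seconds per file and doubles the number of files.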
5. Conclusions
5.1. Why repair movies?
The
motivation for this is mainly personal. Maybe you don't mind the errors or you
feel it is too much work to repair and would rather download the movie again on your
super-fast T3 connection. This is fine by me but let me give you some reasons
why some people would prefer to repair the movies:
· You don't want to waste a CDR on a corrupt movie. Sometimes there is more
than one error and they can last several seconds (I even had a movie with so
much corrupt data it refused to play).
· Downloading 650MB takes time no matter how fast your connection is. Often,
repairing will take less time, especially if you have a 56K modem and spent 3
days of non-stop downloading to get your movie :-)).
· If your provider only allows you to generate a certain amount of traffic
(e.g. 2GB/week), you probably do not want to sacrifice 650MB to fix a small
error.
· You got the movie from a ratio server (or traded for it) and you have to
upload another 650MB to download it again.
· Most people are more likely to send you 2MB than 650MB.
· The movie was already corrupt on the server, which means downloading it again
will have no effect.
· You risk getting new errors by downloading again.
· You might want to send the movie to other people and don't want to give them
a corrupt movie.
· It's a bad habit to cause unnecessary traffic on the Internet, just like in
real life. Think about other people for a change :-p.
· Etc.
5.2. The reason for this document
Very simple,
to become rich and famous. No seriously, up until now I have had to fix 10% of
the movies I have downloaded. This is quite a lot and I hope you have been more
fortunate than me. Almost all of them were already corrupt on the server, so it
wasn't my own fault, which means there are a lot of people out there experiencing
the same problems.
Now I
thought the time was right to share my findings with the rest of the world.
Mainly because I don't think the method can be optimised even further without
someone writing a program (any volunteers?).
Another
reason is to get those corrupt movies out of circulation. I'm sick of them and
I know I am not alone.
And
finally, I wanted to make people aware of current problems and show them how
they can be avoided in the future (see section 2.3.). Even though
this document describes a way to fix the errors, it's always better to prevent
a disease than to find a cure afterwards.
5.3. What's next?
Should
you decide to go out and try to fix your corrupt movies you will soon discover
that finding someone with a good version is hard and takes a lot of time. The
repairing itself can be finished in as little as half an hour. But it would be a
whole lot easier if people decided to work together. I was thinking of maybe
creating an IRC channel or ICQ Activelist where you could come
and ask for help. Keeping an archive of .SFV files for error-free movies would
also come in handy. People would just have to download it, verify their movie
and ask for the parts they need. If anyone has any other good ideas or would
like to help, please contact me.
5.4. Credits
I would
like to thank all the people on IRC who took the time to help me fix my movies.
You know who you are.
Congratulations
to all the authors of some of my favourite programs mentioned and/or linked to
in this text. Keep up the good work (and sorry but I can't afford to register
:).
BIG
thanks of course to all the guys of the Microsoft Corporation
involved in the development of their wonderful MPEG-4 codec,
and also to the hackers who made DivX ;-) possible.
To all
the people that laughed and tried to make me look like an idiot when I proposed
my method on Usenet, I would like to say FUCK YOU! You know who you are too.
I would also
like to thank the people who proofread this document for their comments and
suggestions.
And
finally I'd like to say hello to all my friends both on- and offline.