SVN Importer is dead slow

rauar
Posts: 8
Joined: Sat Jun 17, 2006 7:50 am

SVN Importer is dead slow

Postby rauar » Sat Jun 17, 2006 8:11 am

Hi,

I need to convert a large CVS repository to Subversion. As the source repository is really big (several hundred MB), I first tried migrating parts of it, and everything seemed OK. However, I noticed that the SVN Importer is really slow. By slow I mean the console/log file shows a (file = /tmp/cvs.tempdir/....... ) entry only every 1 to 3 seconds. I know there's some conversion work to be done under the hood, but what really makes me wonder is that the CPU usage stays between 0.0% and 1.0%. Unix "top" even "tops" all running processes when sorting by CPU time.

OK, since the CPU is not the bottleneck, I checked the network traffic. However, at this point SVN Importer seems to have already checked out the complete CVS repository to the cvs.tempdir, and NO network traffic occurs at all.

Is this a really bad internal synchronization issue in the code of SVN Importer, or in the CVS client code from netbeans.org?

PS: After 15 hours of running, the importer has already created a 1 GB dump, and it keeps running. But I suppose it could be MUCH, MUCH, MUCH faster :(

Cheerz, Alex

dobisekm
Posts: 118
Joined: Wed Mar 23, 2005 3:29 pm
Location: Prague, Czech Republic

Postby dobisekm » Thu Jun 29, 2006 11:19 am

Hi Alex,

from your description I am not completely sure which import phase you are speaking about. Could you please paste some part of the log file showing the slow entries (maybe also with some context)?

Michal

rauar
Posts: 8
Joined: Sat Jun 17, 2006 7:50 am

Postby rauar » Sun Jul 02, 2006 11:24 am

dobisekm wrote:Hi Alex,

from your description I am not completely sure which import phase you are speaking about. Could you please paste some part of the log file showing the slow entries (maybe also with some context)?

Michal


Hi Michal,

just for info: we've converted the project with cvs2svn, as svnimporter crashed with an out-of-memory exception after 36 hours. The conversion with cvs2svn (including the complete history, branches and tags) took 4 minutes to create an SVN dump.

You were asking which phase I was talking about. I don't know which phases exist at all, but I'm aware that the conversion process can be done in two "phases": export from CVS and import into SVN. I've been using the export functionality only.

here are the most relevant parts of my config.properties:

Code:

srcprovider=cvs
import_dump_into_svn=no
clear_svn_parent_dir=no
use_only_last_revision_content=no
use_file_copy=yes
dump.file.sizelimit.mb=0
trunk_path=trunk
branches_path=branches
tags_path=tags
svnimporter_user_name=SvnImporter
only_trunk=no
svnadmin.executable=svnadmin
svnadmin.repository_path=/home/arau/svnimporter/SVN
svnadmin.parent_dir=.
svnadmin.tempdir=/tmp/
svnclient.executable=svn
svnadmin.verbose_exec=yes
cvs.modulename=*
cvs.tempdir=/tmp/


To me it looks like the CVS backend is the bottleneck. I even profiled svnimporter: 2/3 of the profiled time was spent in Thread.sleep() in the CVS backend.

I tried the conversion on two different platforms (Linux, Windows), both inside the network and via VPN. Absolutely no difference in the duration or success of the conversion.

Here's a small snippet of the log:


Code:

03:45:47,476 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/tiledata/CashingChartTileData.java" rev.1.2
03:45:52,901 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/servlet/ChartManagementXMLServlet.java" rev.1.5
03:45:58,662 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/utils/WebAppCommonUtils.java" rev.1.15
03:46:04,632 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/utils/WebAppCommonUtils.java" rev.1.16
03:46:10,217 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/utils/WebAppCommonUtils.java" rev.1.17
03:46:16,085 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/tiledata/CommonTreeTileData.java" rev.1.3
03:46:21,408 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/tiledata/DefaultTileData.java" rev.1.3
03:46:28,195 [main] DEBUG CvsProvider:80 - Checkout "src/com/xc/webapp/utils/WebAppCommonUtils.java" rev.1.18
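
Just to put a number on it: going by the timestamps above, each checkout takes roughly 5-6 seconds in this snippet. Here is a rough back-of-the-envelope sketch (the revision count below is only a placeholder, I don't know the real figure for our repository):

Code:

# rough estimate of total export time from the per-checkout latency in the log
checkout_seconds = 5.5      # average gap between the "Checkout ..." lines above
total_revisions = 50000     # placeholder; substitute your repository's revision count
hours = total_revisions * checkout_seconds / 3600.0
print("estimated export time: %.0f hours" % hours)   # ~76 hours for 50000 revisions

So with one checkout per revision at that speed, any repository with tens of thousands of revisions ends up in the multi-day range.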




Regards

Alex

dobisekm
Posts: 118
Joined: Wed Mar 23, 2005 3:29 pm
Location: Prague, Czech Republic

Postby dobisekm » Tue Jul 04, 2006 8:27 am

Hi Alex,

thanks for the log snippet. We will have a look at this.

Best,

Michal

joewilliams
Posts: 10
Joined: Tue Jul 25, 2006 11:59 pm

Postby joewilliams » Mon Aug 07, 2006 7:13 am

Michal,

So what’s your verdict?

Cheers joe

dobisekm
Posts: 118
Joined: Wed Mar 23, 2005 3:29 pm
Location: Prague, Czech Republic

Postby dobisekm » Mon Aug 07, 2006 9:12 am

Hi Joe,

sorry, I completely forgot about this thread. What we did, after observing the same behaviour as you, was create a new CVS provider which works the same way as cvs2svn, and that gives a performance boost. It's available in trunk.

Michal

joewilliams
Posts: 10
Joined: Tue Jul 25, 2006 11:59 pm

Postby joewilliams » Tue Aug 08, 2006 1:50 am

Thanks Michal for your response..

I wanted to know if this fix helps only CVS-to-SVN conversions, or if it improves performance for ClearCase-to-SVN (cc2svn) as well. Last week I was converting from ClearCase and it ran for about 6 days and dumped about 35 GB of dump file before running out of disk space. Is there a way I can speed up the process?

Thanks in advance

Cheers Joe.

dobisekm
Posts: 118
Joined: Wed Mar 23, 2005 3:29 pm
Location: Prague, Czech Republic

Postby dobisekm » Tue Aug 08, 2006 7:30 am

Hi Joe,

this fix is CVS-specific: it adds an extra CVS provider. I don't have any experience with ClearCase myself, so I can't advise much. But if you have a feeling that the dumped data does not correspond to reality (e.g. you have a 200 MB repository and get a 35 GB dump), then it's possible there is some bug. I would suggest checking the log file produced by the importer: it writes which files it examines, how many revisions there are, etc., so you can get a quite good idea of whether everything goes well (especially watch for warnings/errors in the log file). Some simple errors (e.g. parsing dates in an incorrect format) can cause quite strange results.
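
For a big log, something like this quick sketch can pull out just the suspicious lines (the log file name here is only an assumption, use whatever path your setup writes to):

Code:

# minimal sketch: list WARN/ERROR lines from a (possibly huge) importer log
# "svnimporter.log" is an assumption - use your actual log file path
import re

pattern = re.compile(r"\b(WARN|ERROR|FATAL)\b")
with open("svnimporter.log", encoding="utf-8", errors="replace") as log:
    for number, line in enumerate(log, start=1):
        if pattern.search(line):
            print("%d: %s" % (number, line.rstrip()))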

I hope this helps,

Michal

joewilliams
Posts: 10
Joined: Tue Jul 25, 2006 11:59 pm

Postby joewilliams » Tue Aug 08, 2006 8:17 am

Thanks Michal,

Maybe what you say is true. But for example, I have a test project in ClearCase where the view size is 45.8 KB and the dump is about 1.22 MB, and I don't know where the rest is coming from. That one ran with no errors. So could 35 GB of dump for a 430 MB view size be an error? Hmm, not sure, as the log has not recorded any errors...

09:20:52,057 [main] INFO Model:84 - Summary:
09:20:52,057 [main] INFO Model:84 - Files: 4613
09:20:52,057 [main] INFO Model:84 - Revisions: 47538
09:20:52,057 [main] INFO Model:84 - Commits: 45053
Is this worth 35+ GB? I don't know...

Now I don't know how to find the real VOB size, as ClearCase views are not the actual size of the VOB. So the extra data must be coming from the database that ClearCase is associated with... that's the next challenge. Anyway, thanks...

Cheers Joe

dobisekm
Posts: 118
Joined: Wed Mar 23, 2005 3:29 pm
Location: Prague, Czech Republic

Postby dobisekm » Tue Aug 08, 2006 8:26 am

Hi Joe,

I am not sure what a CC view is, but the sizes relate like this:

1) The actual size of the current version of the files
2) The actual size of the repository (including history) on disk
3) The dump file size

1) < 2) < 3)

The repository usually uses some delta storage for changed files, so a small change to your source creates only a little extra data. But the dump file stores the full file content for every change, so changing one character in a 100 KB source file will increase the dump file by 100 KB.
So if you have 4600 files summing up to 430 MB, with 47538 changes made to them, then 35 GB could be possible. But it's hard to estimate.
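
If you want a very rough sanity check, you can do the arithmetic with your own numbers. This is only a sketch and it assumes every revision touches an "average-sized" file, which usually underestimates badly when the frequently changed files are larger than average:

Code:

# very rough dump-size estimate from the summary numbers in the post above
# assumes every revision touches an average-sized file (often an underestimate)
view_size_mb = 430.0     # current size of all files
files = 4613
revisions = 47538

avg_file_mb = view_size_mb / files          # ~0.09 MB per file
naive_dump_mb = revisions * avg_file_mb     # full content stored per revision
print("naive estimate: %.1f GB" % (naive_dump_mb / 1024.0))   # ~4.3 GB

If the heavily revised files are much larger than the average, the real dump can easily be an order of magnitude bigger, so 35 GB is not automatically a sign of a bug.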

Michal

joewilliams
Posts: 10
Joined: Tue Jul 25, 2006 11:59 pm

Postby joewilliams » Tue Aug 08, 2006 10:24 pm

Thanks Michal for the insight, it really helps…

My log file is about 500 MB and I could not see any noticeable errors in it, so I'm guessing the tool was running fine. I also noticed that the dump files are sequence-numbered, for example:

Name Type
full_dump_20060801_093437 Text Document
full_dump_20060801_111130.txt_part1 TXT_PART1File
full_dump_20060801_123517.txt_part2 TXT_PART2File
full_dump_20060801_133837.txt_part3 TXT_PART3File
full_dump_20060801_150046.txt_part4 TXT_PART4File
full_dump_20060801_165110 TXT_PART5File
full_dump_20060801_181703 TXT_PART6File
full_dump_20060801_193906.txt_part7 TXT_PART7File
full_dump_20060801_205712.txt_part8 TXT_PART8File

Do you know why the part 5 and part 6 dumps are missing the .txt_part5 and .txt_part6 extensions? Is that some sort of error? Do you think I should rename them or leave them as they are? I am just hoping SVN doesn't go cranky looking at those two files...

Cheers joe

dobisekm
Posts: 118
Joined: Wed Mar 23, 2005 3:29 pm
Location: Prague, Czech Republic

Postby dobisekm » Thu Aug 10, 2006 7:52 am

Hi Joe,

I don't know why the file name for some dumps is different; probably it's some kind of bug. However, SVN does not care about the dump file name, so it should not be a problem if you load the dump files manually. If you let the importer load them, it might get confused by the different file names, or maybe not, since it generated them ;-)
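
If you do load them manually, just make sure the parts go in in chronological order. Something along these lines would work (only a sketch; the paths are assumptions, and the ordering relies on the timestamp at the start of the file names, so the two parts with the missing extension still load in the right place):

Code:

# minimal sketch: load multi-part dump files into SVN in timestamp order
# the paths below are assumptions - adjust them to your setup
import subprocess
from pathlib import Path

dump_dir = Path("/path/to/dumps")     # where the full_dump_* files live
repo = "/path/to/svn/repository"      # target repository (created beforehand with svnadmin create)

# file names start with full_dump_YYYYMMDD_HHMMSS, so a plain name sort
# gives chronological order even for the parts with a missing extension
for dump in sorted(dump_dir.glob("full_dump_*")):
    print("loading", dump.name)
    with dump.open("rb") as f:
        subprocess.run(["svnadmin", "load", repo], stdin=f, check=True)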

Michal

