Ticket #92 (closed task: fixed)

Opened 6 years ago

Last modified 6 years ago

Migrate subversion data

Reported by: chris Owned by: john
Priority: minor Milestone:
Component: Drupal modules & settings Keywords:
Cc: jim, ed, john Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 4.0

Description

Currently the code is in this repo https://svn.webarch.net/transition/

We want to insert it into the new repo here: https://tech.transitionnetwork.org/svn/

We should create top level directories so that we can host different projects in the one repo.

The https://svn.webarch.net/transition/ repo can remain available as an archive -- we should probably *not* migrate all the history across?

Change History

comment:1 Changed 6 years ago by chris

  • Cc jim, ed added
  • Estimated Number of Hours set to 0.0
  • Billable? unset

Migrating the code from the existing subversion repo to the trac one was mentioned in the meeting yesterday; this is the ticket to track it on. The trac usernames and passwords are the same as for the new trac-linked repo.

comment:2 Changed 6 years ago by chris

  • Cc john added

How is this for a plan:

  1. Copy the live site to a tmp directory.
  1. Use grep to work out which directories and files came from the https://svn.webarch.net/transition repo and delete everything else and delete all the .svn directories
  1. Commit what is left to some directory in the new repo https://tech.transitionnetwork.org/svn/
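Step 1 and the .svn part of step 2 could look something like the sketch below. It is only an illustration: the function name strip_svn_dirs is made up, and it uses find rather than grep, because find can match the .svn directories by name and stop descending into them.

```shell
#!/bin/bash
# Hypothetical helper: delete every Subversion .svn metadata directory
# below a tree, e.g. the temporary copy of the live site from step 1.
strip_svn_dirs() {
  local root="$1"
  # -prune stops find from descending into each .svn directory it matches,
  # and "-exec ... {} +" hands the matches to rm in as few calls as possible.
  find "$root" -type d -name .svn -prune -exec rm -rf -- {} +
}

# Usage (paths from this ticket):
#   cp -a /web/transitionnetwork.org/www /home/chris/svn-import
#   strip_svn_dirs /home/chris/svn-import
```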

comment:3 follow-up: ↓ 4 Changed 6 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 2.0
  • Total Hours changed from 0 to 2.0
  • Billable? set

> How is this for a plan:
>
>   1. Copy the live site to a tmp directory.

I have done this:

cp -a /web/transitionnetwork.org/www /home/chris/svn-import
>   1. Use grep to work out which directories and files came from the https://svn.webarch.net/transition repo and delete everything else and delete all the .svn directories

I have written a script that generates a list of the local files which are not present in the repo at https://svn.webarch.net/transition/code/branches/DEV; the resulting file, files.delete, can be run as a script to delete all these files:

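#!/bin/bash

# comm needs both lists sorted with the same collation order
export LC_ALL=C

# list files here (relative paths, skipping "." itself and this script's files.* lists)
find . -mindepth 1 ! -path './files.*' | sed -e 's/^\.\///' | sort -u > files.here

# list remote files (svn list marks directories with a trailing "/")
svn list -R https://svn.webarch.net/transition/code/branches/DEV | sed -e 's/\/$//' | sort -u > files.there

# get the list of files not in the svn repo for deletion; "rm -rf" also handles
# directories (including .svn) and %q quotes names with spaces or quotes in them
comm -23 files.here files.there | while IFS= read -r f; do printf 'rm -rf -- %q\n' "$f"; done > files.delete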
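The heart of the script above is a set difference done with comm -23, which only gives correct results if both lists were sorted with the same collation order; letting sort and comm run under different locales can silently produce a wrong list. A minimal sketch of just that step, with a made-up function name and the C locale forced throughout:

```shell
#!/bin/bash
# Hypothetical helper: print the paths listed in HERE but not in THERE.
# Both lists are re-sorted (and de-duplicated) in the C locale, and comm
# runs in the same locale, so the two sort orders always agree.
paths_only_here() {
  local here="$1" there="$2"
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$here") <(LC_ALL=C sort -u "$there")
}

# Usage:
#   paths_only_here files.here files.there
```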

>   1. Commit what is left to some directory in the new repo https://tech.transitionnetwork.org/svn/

Which directory should this be? https://tech.transitionnetwork.org/svn/www/trunk perhaps (using www for the Drupal web project, so that other top-level directories can be used for future projects). I'll use this unless anyone has a better suggestion in the next hour; in any case it's easy enough to move things around if anyone wants a different repo layout.
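Step 3 could be done with svn import, which commits an unversioned tree to a repository URL and creates any missing parent directories (www/ here) along the way. A sketch, assuming the cleaned tree no longer contains any .svn directories; import_tree is a hypothetical name. Import does not turn the source directory into a working copy, so a fresh checkout is still needed afterwards.

```shell
#!/bin/bash
# Hypothetical helper: commit an unversioned tree SRC to repository URL,
# creating any missing parent directories in the repository.
import_tree() {
  local src="$1" url="$2" msg="$3"
  svn import --non-interactive -m "$msg" "$src" "$url"
}

# Usage (URL from this ticket):
#   import_tree /home/chris/svn-import \
#     https://tech.transitionnetwork.org/svn/www/trunk "Initial import of the site code"
```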

What we would then have in the trac svn repo is all the files for the live site minus their history, but the history will still be available in the old repo.

Once that is done we need to make the live, dev and test sites use the new repo. I suggest we do this by deleting all the .svn directories, doing a fresh checkout of the new code to a tmp directory and then rsyncing the checkout over the web sites -- this will put the new .svn directories in the right places.

I will test this on the dev server first.
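The switch-over described above (delete the old .svn directories, check out the new repo to a tmp directory, rsync it over the site) might be sketched as follows. The function name and the use of rsync -a without --delete are assumptions: without --delete, files that only exist on the site, such as uploads, are left alone. The sketch also does the checkout first, so that a failed checkout leaves the site untouched.

```shell
#!/bin/bash
# Hypothetical helper: make the site at SITE a working copy of URL.
switch_site_to_repo() {
  local url="$1" site="$2" tmp rc
  tmp=$(mktemp -d) || return 1
  # Fresh checkout of the new repository into a temporary directory.
  if ! svn checkout --quiet --non-interactive "$url" "$tmp/wc"; then
    rm -rf "$tmp"
    return 1
  fi
  # Remove the old working-copy metadata from the site.
  find "$site" -type d -name .svn -prune -exec rm -rf -- {} +
  # Copy the checkout, .svn directories included, over the site. No
  # --delete, so files that exist only on the site are kept.
  rsync -a "$tmp/wc/" "$site/"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}

# Usage (dev site first, as planned):
#   switch_site_to_repo https://tech.transitionnetwork.org/svn/www/trunk /path/to/dev/docroot
```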

comment:4 in reply to: ↑ 3 Changed 6 years ago by john

Replying to chris:

> How is this for a plan:
>
>   1. Copy the live site to a tmp directory.
>
> I have done this:
>
> cp -a /web/transitionnetwork.org/www /home/chris/svn-import
>
>   1. Use grep to work out which directories and files came from the https://svn.webarch.net/transition repo and delete everything else and delete all the .svn directories

I am a bit puzzled here and may be misunderstanding, but if you are copying the docroot to a temp location and then removing stuff that is not in the repo on the DEV branch, why not just check it out from the repo to begin with and use that as your starting point?

Note also that svn export will "check out" all the files without the .svn directories, for cases where you just want the literal code base.
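To illustrate the difference: svn checkout gives a working copy with .svn metadata that can be updated and committed from, while svn export gives only the files. The wrapper names below are hypothetical; the svn subcommands and options are standard.

```shell
#!/bin/bash
# Hypothetical wrappers around the two ways of fetching a branch.
# checkout: a working copy, with .svn metadata, that can update/commit.
get_working_copy() { svn checkout --quiet --non-interactive "$1" "$2"; }
# export: a plain copy of the files with no .svn metadata at all.
get_plain_copy()   { svn export --quiet --non-interactive "$1" "$2"; }

# Usage (URL from this ticket):
#   get_plain_copy https://svn.webarch.net/transition/code/branches/DEV /tmp/dev-export
```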

I have no idea exactly what state the repo is currently in, i.e. whether trunk reflects live or the DEV branch reflects DEV. But as a general point, if you use the repo as a starting point and then bring things back into line from there, you should not lose the history (assuming you want to keep it), and you should then be able to import the repo directly into the TRAC svn location with the history intact.

comment:5 follow-up: ↓ 6 Changed 6 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 2.0 to 2.5

> I am a bit puzzled here and may be misunderstanding, but if you are copying the docroot to a temp location and then removing stuff that is not in the repo on the DEV branch, why not just check it out from the repo to begin with and use that as your starting point?

Sure, it turns out that this would have been fine -- I wanted to be sure that the live site didn't have any files that were not in the repo.

The only differences I found were that the repo didn't have the /favicon.ico file in the DocumentRoot, and the live site didn't have this file:

https://svn.webarch.net/transition/code/branches/DEV/sites/all/modules/panels/plugins/cache/simple.inc

I guess it got removed by accident when clearing out the cache files.
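A comparison like this can be done with a recursive diff that skips the .svn directories; lines starting "Only in" show files present on one side only, which is how cases like favicon.ico and simple.inc show up. A sketch with a made-up function name, assuming GNU diff:

```shell
#!/bin/bash
# Hypothetical helper: report files that differ between two trees, ignoring
# Subversion's .svn metadata. Exit status is 0 if identical, 1 if not.
compare_trees() {
  diff -rq --exclude=.svn "$1" "$2"
}

# Usage: live DocumentRoot against a clean export of the branch
#   compare_trees /web/transitionnetwork.org/www /tmp/dev-export
```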

> Note also that svn export will "check out" all the files without the .svn directories, for cases where you just want the literal code base.

Right.

> I have no idea exactly what state the repo is currently in, i.e. whether trunk reflects live or the DEV branch reflects DEV.

The live site is currently running the code from https://svn.webarch.net/transition/code/branches/DEV/

> But as a general point, if you use the repo as a starting point and then bring things back into line from there, you should not lose the history (assuming you want to keep it), and you should then be able to import the repo directly into the TRAC svn location with the history intact.

Sure, if we do want to keep the history (do we?), we could use svndumpfilter (http://svnbook.red-bean.com/en/1.5/svn.ref.svndumpfilter.html) to put just the files under https://svn.webarch.net/transition/code/branches/DEV/ into the new repo, but there is a gotcha that I suspect might apply to us:

copied paths can give you some trouble. Subversion supports copy operations in the repository, where a new path is created by copying some already existing path. It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. In order to make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path—including the contents of any files created by the copy—and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. But because the Subversion repository dump format only shows what was changed in each revision, the contents of the copy source might not be readily available. If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths, perhaps including the paths that served as sources of your troublesome copy operations, too.

http://chestofbooks.com/computers/revision-control/subversion-svn/Filtering-Repository-History-Reposadmin-Maint-Filtering.html
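For the history-preserving route, a filtered load might look like the sketch below. It assumes shell access to both repositories on disk (svnadmin works on local repository paths, not https:// URLs), and the function name and paths are hypothetical. Because svndumpfilter drops the records that created code/ and code/branches/, those parents are created in the new repo first. If DEV was created by copying a path outside the include list (the gotcha quoted above), svndumpfilter is expected to stop with an error, and that source path would have to be included too.

```shell
#!/bin/bash
# Hypothetical sketch: load only the history of code/branches/DEV from
# OLD_REPO into NEW_REPO. Both must be absolute local repository paths.
filter_dev_history() {
  local old_repo="$1" new_repo="$2" st
  # The filtered dump will not create code/ or code/branches/, so make them.
  svn mkdir --quiet --parents -m "Parents for filtered load" \
    "file://$new_repo/code/branches" || return 1
  # Dump everything, keep only paths under code/branches/DEV, drop revisions
  # that become empty and renumber the rest, then load into the new repo.
  svnadmin dump --quiet "$old_repo" |
    svndumpfilter include --quiet --drop-empty-revs --renumber-revs code/branches/DEV |
    svnadmin load --quiet "$new_repo"
  st=("${PIPESTATUS[@]}")
  # Succeed only if all three stages of the pipeline succeeded.
  [ "${st[0]}" -eq 0 ] && [ "${st[1]}" -eq 0 ] && [ "${st[2]}" -eq 0 ]
}

# Usage (hypothetical paths):
#   filter_dev_history /srv/svn/transition /srv/svn/tech
```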

Do we think we will need the history in the new repo? Without it the repo will be a lot smaller; currently a dump of it is 364M...

comment:6 in reply to: ↑ 5 Changed 6 years ago by john

Replying to chris:

> Do we think we will need the history in the new repo? Without it the repo will be a lot smaller; currently a dump of it is 364M...

Yep, personally I think we can live without the history; I wasn't sure if you wanted to keep it.

comment:7 Changed 6 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 2.5 to 3.0

> Yep, personally I think we can live without the history; I wasn't sure if you wanted to keep it.

OK, the dev site, http://dev.transitionnetwork.org.webarch.net/, is now running off the trunk in the new repo, https://tech.transitionnetwork.org/svn/www/trunk/, and this repo can now also be browsed via trac: https://tech.transitionnetwork.org/trac/browser

This all seems to be OK so I'll also switch the test and live sites to use the new repo.

comment:8 Changed 6 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Status changed from new to closed
  • Resolution set to fixed
  • Total Hours changed from 3.0 to 4.0

OK, all done, the live, test and dev sites are all now running off the new trunk, which can be browsed here: browser:www/trunk

Also, backupninja is now set up to back up the repo onto the local backup server, which is then synced to an off-site backup server. This means the data will be on 6 physical disks (due to RAID) on 3 different machines. In addition, backupninja is set to retain 60 days' worth of data, so we can roll back to the version of the data for any day in the last two months.
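The backups above are handled by backupninja. Purely as an illustration of the same idea with plain svnadmin and find (this is not backupninja's configuration, and the function names and paths are hypothetical), a dated hot copy plus a 60-day prune could look like:

```shell
#!/bin/bash
# Illustration only, not backupninja's configuration.

# Copy the repository to DEST/YYYY-MM-DD. svnadmin hotcopy is safe to run
# while the repository is in use; run it at most once a day per DEST.
backup_repo() {
  local repo="$1" dest="$2"
  svnadmin hotcopy "$repo" "$dest/$(date +%F)"
}

# Delete dated copies last modified more than DAYS (default 60) full days ago.
prune_backups() {
  local dest="$1" days="${2:-60}"
  find "$dest" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf -- {} +
}

# Usage, e.g. nightly from cron (hypothetical paths):
#   backup_repo /srv/svn/tech /var/backups/svn && prune_backups /var/backups/svn 60
```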

I'm closing this ticket as all the tasks in it, as originally formulated, are complete.

However, there is still the issue of documenting how the new repo will be used, the plan being to use svn switch rather than svn merge.
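One possible reading of "use svn switch rather than svn merge" is: tag a release from trunk, then point the live site's working copy at that tag. The sketch below assumes a www/tags/ directory and hypothetical function names; the actual process is whatever ends up on the CodeManagementReleaseProcess wiki page.

```shell
#!/bin/bash
# Hypothetical switch-based release flow: tag trunk, then move the live
# working copy onto the tag with svn switch (no merging into the live copy).
# REPO_ROOT is the repository URL; www/tags/ is an assumed layout.

release_tag() {
  local repo_root="$1" tag="$2"
  # Server-side copy; --parents creates www/tags/ if it does not exist yet.
  svn copy --quiet --parents -m "Tag release $tag" \
    "$repo_root/www/trunk" "$repo_root/www/tags/$tag"
}

switch_live() {
  local repo_root="$1" tag="$2" wc="$3"
  # Repoint the working copy at the tag, updating only the files that differ.
  svn switch --quiet --non-interactive "$repo_root/www/tags/$tag" "$wc"
}

# Usage (tag name hypothetical):
#   release_tag https://tech.transitionnetwork.org/svn release-1
#   switch_live https://tech.transitionnetwork.org/svn release-1 /web/transitionnetwork.org/www
```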

PS: The usernames and passwords for the new repo match the trac ones.

comment:9 Changed 6 years ago by jim

I've created a wiki page for CodeManagementReleaseProcess; it's not quite ready yet, and I will complete it ASAP.

Front page of wiki updated, too.

comment:10 Changed 6 years ago by chris

> I've created a wiki page for CodeManagementReleaseProcess

I've added a comment to it, wiki:CodeManagementReleaseProcess?version=2#CommentfromChris

comment:11 Changed 6 years ago by jim

Typos/tweaks aside, this is done... Leaving this dead issue alone now.

See wiki:CodeManagementReleaseProcess
