Git Development: Dealing with CVS $id$ tags

Git Development: Dealing with CVS $id$ tags

by Dan Poltawski -
Number of replies: 31
Sorry for starting yet another git thread..

I just went through an infuriating time with Petr, cursing ide's not accepting my patches when we discovered a patch I had created was not applying because of the mismatch between the unexpanded $id$ tag and the expanded one from a cvs checkout.

What I am surprised about is that i've never really heard others come across this problem before today (though Tim actually mentioned it earlier).

As I have never heard about this before I am wondering if I have missed some way to prevent these conflicts with git tools, because as I see it, this will effect any git user going near commited $id$ tags when sharing their changes for others to review or even trying to commit without review using git-cvsexportcommit. Is there a solution to this i'm missing?

Also on a related note - the $id$ tags tend to get used lots round here and people find them useful, is there a way to do this sort of thing in git with hooks? Its another thing to consider for a migration plan..
Average of ratings: -
In reply to Dan Poltawski

Re: Git Development: Dealing with CVS $id$ tags

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
I think a low tech but simple solution is that every php file in Moodle core should start

<?php // $Id$

(where $Id$ is probably expanded), followed by the large NOTICE OF COPYRIGHT boilerplate. E.g. http://cvs.moodle.org/moodle/admin/filters.php?view=markup

If that is the case, diffs never go anywhere near the $Id$ tags, and there is no problem. The mess I got into was in patches where I was adding that boilerplate to files that were missing it. In future, if I start changing near the start of a file without the standard start, I am going to add the boilerplate in CVS, and wait for it to come through in a git pull.

(By the way, in the git repostiory, all the $Id$ tags are collapsed to $Id$, so they never change and you can ignore them - until you try a git cvsexportcommit.

If we do anything, I would suggest and enhancement to cvsexportcommit, to collapse all $Id$ tags in the CVS workspace before trying to apply the patch - or try it if the patch faile to apply the first time.


In reply to Tim Hunt

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
Imagine for a second... if we used a DB that decided that the id column of the table is so important that its value needs to be embedded in the data that goes in the text fields. Hmmm, no, the id is the id, and the text field is the text field. You shall not change the text we put in the text field, not one bit.

Following the same rule... the version control tool is supposed to keep the files you've given it... without changing them one bit!

We should kill kill kill $id$ tags. They are a serious chunk of misdesign in CVS. It is good and valid to have metadata about your files... but not inside of them! smile
In reply to Dan Poltawski

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
> What I am surprised about is that i've never really heard others come across this problem before today (though Tim actually mentioned it earlier).

If you use cvsexport commit, which works with cvs itself, it won't give you trouble. My guess is that patch has some magic to auto-ignore expanded $Id$ tags and their ilk.

I am surprised that his IDE doesn't have that magic, but then again, those tags are deeply evil.

We can get that data from git's metadata trivially, and if we have a user that says "I have this file, dunno where it came from" we can instantly know what file & version it is, or what file it resembles the most.

But polluting the file with external metadata. Ugh!
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Dan Poltawski -

> If you use cvsexport commit, which works with cvs itself, it won't give you trouble. My guess is that patch has some magic to auto-ignore expanded $Id$ tags and their ilk.


Actually I think Tim came up with this issue on git-cvsexportcommit too - but as I just posted below the fix is just to checkout cvs with -kk.
In reply to Dan Poltawski

Re: Git Development: Dealing with CVS $id$ tags

by Petr Skoda -
Picture of Core developers Picture of Documentation writers Picture of Peer reviewers Picture of Plugin developers
Hello,

just spent a few hours trying to work out so solution suitable for all people, unfortunately failed pretty badly. Here is a short summary.

At present
  • All files in cvs must have -kkv flag.
  • All patches submitted into cvs should use -kkv.
So if you just checkout your code without any tweaking of command line cvs or IDE you will get expanded form $Id: readme_moodle.txt,v 1.16 2008/12/09 22:48:59 skodak Exp $


Problems
  • People using git often submitting patches using -kk ($Id$), Netbeans and Eclipse can not apply these to -kkv checkouts.
  • -kkv patches do not apply cleanly to later versions because they lack the -kkv patching magic which is available in patch binary. Unfortunately patching from command line while using IDEs confuses both Netbeans and Eclipse.

The solution proposed by Dan suggests we start using -kk on development machines and start submitting -kk patches. Eloy suggested several tricks how to fool IDEs into thinking that the files are -kk on cvs server, unfortunately none of them are usable.


So it means we either switch to --kk everyhere or stay with -kkv, mixing of these two is going to cause a lot of trouble for anybody using IDE sad
In reply to Petr Skoda

Re: Git Development: Dealing with CVS $id$ tags

by Martin Dougiamas -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers
OK, so we've been discussing this in the dev chat.

Positives to expanded keywords are:
a) You get a quick idea of who edited this last.
b) Users downstream can easily tell you what version they are using.

Negatives are:
a) git doesn't support them
b) IDEs have trouble with them
c) some developers hate metadata embedded in code

There seems to be four solutions:

1) Add a GPL boilerplate to every file in one huge commit so that the keywords never get caught up in future diffs. This is something we should do anyway.

2) Switch the main CVS server to use -kk by default (unexpanded), but switch the read-only CVS mirrors to use -kkv. This means developers won't be bothered and users still have the infoon their sites. All developers would need to check their settings.

3) Remove keywords from HEAD, but add them back to the packages using an evil script on download.moodle.org.

4) Just abandon keywords entirely. This is an easy search-replace.

Make your choice!
In reply to Martin Dougiamas

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
*Blush*, I don't particularly hate them, but with my SCM developer hat on, it's just... wrong (see my example earlier about messing up database content). Sorry.

Some notes:

- Doing a CVS-wide -kk, you have to be mindful of only doing it in text files. Binary files should be -kb, if we apply -kk to binary files, we'll corrupt them.

- I think the safest thing to do is to set -kb on all files. This will _retain_ the currently expanded tags, and from that point onwards respect what we commit, literally. At that point we can remove the Id tags.

- Note that $Id tags are not the only tags we carry. There's a few $Log tags hiding in Moodle that will need manual fixups after the search-and-replace.

Edit: Like Carly Simon sings, this song ain't about me. From the poll (as of now) it seems everyone wants to get rid of them.

And: I'm also ok with adding a bit of top-of-file boilerplate wink We need a choice module that can do condorcet voting.
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Martin Dougiamas -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers
I think the vote currently just shows that people who don't need keywords don't want them. smile

Binary files were already being treated as -kb (we explicitly specify a long list by file extension)

I'm not sure about setting everything to -kb all the time. It makes sense to me but are there any gotchas? Any other opinions?
In reply to Martin Dougiamas

Re: Git Development: Dealing with CVS $id$ tags

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
Well, it also shows that you haven't found many people who do need them. :>

I quite like the 'get rid of them in the source, but have a script that puts them back for the read-only download cvs mirrors' approach - and this is something that could be continued even after a switch to git. The data that's put in could either be the current information, or something similar (eg date/time the file was last changed) that would allow it to be found in source control.

If you're in that position of adding things with scripts that aren't in the official source control, you could also put a different standardised version tag at the top of 'official' downloads - so that instead of having the file-specific data it has 'Moodle 1.9.5 release version' in every single file (which can then be tracked in source control using the tag, obviously).

And it would even be possible for this script to include some kind of hash of the file contents (after the version) so that an admin report could be written which would list any file that has been modified from its claimed Moodle version...! If you really really want to know what version of a file somebody has and if it's 'true'...

--sam
In reply to sam marshall

Re: Git Development: Dealing with CVS $id$ tags

by Nicolas Martignoni -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers Picture of Translators

Well, it also shows that you haven't found many people who do need them. :>

Or maybe that only followers of dev chat are aware of the vote grand sourire

In reply to Nicolas Martignoni

Re: Git Development: Dealing with CVS $id$ tags

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
I know being in dev chat gives me an unfair advantage, but actually in this case, I didn't notice the vote in dev chat - only from the forum subscription. Which is open to everyone. smile
In reply to Martin Dougiamas

Re: Git Development: Dealing with CVS $id$ tags

by Nicolas Martignoni -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers Picture of Translators

> I think the vote currently just shows that people who don't need keywords don't want them.

I happen to think just like you, Martin, as I use them quite frequently to quickly note the version of the file I'm modifying.

In reply to Nicolas Martignoni

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
MD makes a good point, so I am interested in understanding the use cases where people actually use the (nasty! evil! smack!) Id tags.

So, Nico, how and why do you use them? My guess is that you are using CVS checkouts or daily/weekly builds? Can you tell us more? I want to make sure that if/when we switch to Id-less usage, people have usable alternatives.

--

In my (limited) experience with this, there are two main cases to cater for.

For users that use CVS checkouts, doing a cvs status on the file is the first option, or if they are using a (fancy, rocket-propelled) IDE, the IDE has that information, probably even in the same corner of the window that shows the filename.

For users that use daily/weekly builds, an option is to add a small "version identifier" to the build process -- I am thinking of a hash that describes the state of the whole tree. That would be trivial to implement -- a bit of extra code in the daily build scripts. With that "tree identity" version in place, if you grab a random daily build and change it, you can then find again *which* daily build it was and diff it.

--

MD - what are the daily builds written in - bash? any of the P* scripting languages?
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Nicolas Martignoni -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers Picture of Translators

Martín,

I use them actually to deal with the fr and fr_ca translations.

Here is an example: When some dev deletes a string (which should not happen for long standing strings smile, but is ok if the string was added during dev process, then deleted for some reason, never reaching in STABLE branches), quickly looking at the date (from the expansion of $Id$) of my commit for the translated string helps me to decide if I should delete the string from the translation or just inform the dev of a translated string that shouldn't have been deleted.

I certainly could search through the commit history I have through the mailing-list or search in cvs.moodle.org, but consulting the simple date is really really quicker.

In reply to Nicolas Martignoni

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
Do I understand you right that the strategy you are describing only helps if that particular string is in the last commit you made the translated files?


It seems like a quick check of the 'annotate' of the file is what you need. This commandline:

cvs annotate lang/fr/moodle.php | grep 'string..login'

will give you the right answer to "when was this string last changed in the french translation", regardless of whether it is your last commit or not... There are similar tricks in the various IDEs.
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Annotate does not tell you who removed a line when!
In reply to Tim Hunt

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
I think he wants to know if the line ever got added in the translations.

But in any case, even annotate isn't the best tool, because it'll only show the latest change, and as you say, only if it's an addition.

If you want to know all the added/changed/removed lines matching a particular regex, then git's pickaxe is your friend smile
Average of ratings: Useful (1)
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Iñaki Arenaza -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Your mention of 'git pickaxe' made me have a look at it, just to find that the original command is gone and it's now an option (-S) to 'git log' (-S, --pickaxe-all and --pickaxe-regex). HTH.

Saludos. Iñaki.

Average of ratings: Useful (1)
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Nicolas Martignoni -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers Picture of Translators

Thanks for your suggestion (didn't know the "cvs annotate" command).

Anyway, in the example given (I use to use the $Id$ tag in other circumstances), the problem is that sometimes, it's not the last commit. It could even be several weeks away. Don't know if annotate can help in this case, but if you've another solution than the $Id$ one, I'll switch without problem: I'm not dogmatic about $Id$ tag wink

In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Penny Leach -
I voted to get rid of them as I think it's completely daft to embed metadata into the file that you can get from the vcs!

But I just had a thought (sorry if it's already been raised) - don't we use them in the script that generates the unmerged files list, as an easy way to find the culprit?

Not advocating keeping them for this reason, just pointing out this script would need to be smarter smile
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Peter Lai -
The one thing I like about CVS is the use of $Id$ tags. Mostly because it fits the the design paradigm where you are versioning individual files, not entire trees, directories, or changesets. The more modern SCMs operate under the concept of changeset=tree version, or a more project-oriented approach. But *not* having to 'scm status' or 'scm log' just to show me who committed the last change to the only file I care about at the time is a timesaver.
In reply to Martin Dougiamas

Re: Git Development: Dealing with CVS $id$ tags

by Anthony Borrow -
Picture of Core developers Picture of Plugin developers Picture of Testers
OK, I'm coming into this conversation late but here is why I find having the version number in the file. I not infrequently am asked to look at a particular file - say a patched file - on someone else's site. They are not using CVS, they just downloaded a version and started hacking. They cannot tell me when they downloaded it. They may have added something like the DragMath code to it and who knows what else they may have done. I'm looking at the file or they send the file and need to figure out which version it is. For me, having the version number in the file has always been very helpful when I am going into a situation somewhat blind (and this is often the case). For that reason, if we can somehow keep the version numbers of each file in there it would really help. At least then I can take a diff of the file compared to a particular version. Peace - Anthony
In reply to Anthony Borrow

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -

Hi Anthony,

It is useful, and Git has a mechanism that is often used for exactly that. We call it "git describe" -- it gives you a unique name for the tree. And what people do is use that identifier in the script that grabs the daily snapshots. And people use it also to make the name of the snapshot file.

For example, in Moodle, we would add it into our top-level version.php. It is a unique identifier that...

  • If it matches a tag (the case of a release) -- then it is the name of the tag.

  • Otherwise, it counts patches since that tag, and adds a unique identifier for the commit.

For example, you would have

  • For a release: moodle-v1.9.8.tar.gz with version.php defining $gitver='v1.9.8'

  • For a snapshot: moodle-v1.9.8-23-gf5b223a.tar.gz and version.php defining $gitver='v1.9.8-23-gf5b223a'

The first case tells you it's straight from a release. The second case tells you it's a snapshot, 23 commits after v1.9.8, and it tells you the exact ID of the commit.

Remove the leading 'g' and you get f5b223a . Tell git

git checkout f5b223a

and you can peruse the exact contents of that tree. Want to make a temp branch (locally) to work on that problem?

git checkout -b tmpbranch f5b223a

Want to see the history?

gitk f5b223a

Note: the "patches since the tag" count is to help us humans visualise. Given the distributed version of git, many different branches could have a 23rd patch on top of v1.9.8 .

The real info is in the commit identifier.

(For example, I am running git v1.6.3.1-26-gf5b223a )

In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Anthony Borrow -
Picture of Core developers Picture of Plugin developers Picture of Testers
Martin - Thanks, I'm not that familiar with git yet and initially I realize it seems like a lot to wrap my head around. My concern is that a lot of folks don't use CVS because it is too complicated so I just want to make sure that I'll have a way of figuring out what version they are using. If I were installing the version for them that would be one thing but often I am using ssh to go in and look at their version of a particular file with no indication of when they downloaded it, what the name of the tar.gz or zip file was, whether they used git or cvs, etc. I am just using vi to see a particular file where they are having trouble and I have never had to worry about how they installed it and what patches they made to figure out the base of that file. I knew that any changes they made were based off of whatever version was indicated in the expanded $id$ and then I could create a diff file. It seems that a lot of this presumes one knew what and how things were installed on a site. That is a luxury I am seldom afforded. I am presuming that many of the people I have will not have done a git checkout. Peace - Anthony
In reply to Anthony Borrow

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
Sure. But if moodle is using git, their version.php file will be a strong hint of what snapshot they have there.

And _you_ will have a git checkout where you can run git-findmatch on a file that may be have been added from a random source...

CVS would probably be still running in readonly mode if you need to check about olden files...
In reply to Anthony Borrow

Re: Git Development: Dealing with CVS $id$ tags

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
A feel sure that git should be able to work out which version of a file you have got, even if it is slightly modified, just from the file contents.

After much struggling, here is my best attempt:

(I downloaded moodlelib.php from http://cvs.moodle.org/contrib/patches/activity_locking/lib/ and put it in my lib folder as a test.)

Then, in that folder, I ran the following command line:


git log --format=format:'echo %H; git diff -b --shortstat %H moodlelib.php; echo' MOODLE_19_STABLE moodlelib.php | sh


What that does is works back through the history of moodlelib.php (git log ... moodlelib.php).

It uses the --format option of git log to generate another git command for each version in history: git diff -b --shortstat ... moodlelib.php. That counts then number of lines of different between two version of the file, ignoring whitespace changes.

Then we pipe that to sh, to actually execute the commands we have constructed. The extra echos output the commit name, and an blank line.


The output starts looking like:
503b8b480a39ef4817dd2abf77dba0ff476ddce5
1 files changed, 2198 insertions(+), 4506 deletions(-)

e92e4b8ab0ad9f1c8c89c4de0fc02190fa196410
1 files changed, 2198 insertions(+), 4500 deletions(-)

47e32d2955229f205743a0a84de951095fb1cd49
1 files changed, 2198 insertions(+), 4500 deletions(-)

...

and the numbers gradually count down (at the rate of several versions per second) until you get to

...

8ee1b0afabbeb4b18b0c699e826d6e110b4ac7b6
1 files changed, 98 insertions(+), 55 deletions(-)

373da1d463846a48392cae937013e36daf8ad46b
1 files changed, 90 insertions(+), 59 deletions(-)

b78fbdac9df4db1504731e534256a7148f3b5e62
1 files changed, 90 insertions(+), 19 deletions(-)

3a61ccbe0a07c65d2671f3b25ccea3ea8719ca25
1 files changed, 95 insertions(+), 20 deletions(-)

a40803130b5ce31f466efef6cbeb2141a3f292db
1 files changed, 108 insertions(+), 21 deletions(-)

...


So the minimum point there, b78fbdac9df4 is probably the revision that they started editing.
In reply to Tim Hunt

Re: Git Development: Dealing with CVS $id$ tags

by Martín Langhoff -
Good one! If people are interested, I'll be happy to write a git-findmatch based on a similar principle.

That will work for older releases and people using oddball setups with random files grabbed from random places.

For releases of a project that is using git, there is no reason to not include the version identifier (tag for releases, commit id for snapshots) somewhere.
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Anthony Borrow -
Picture of Core developers Picture of Plugin developers Picture of Testers
Thanks Tim and Martín - I will take your word for it that such things will be discernible. I just wanted to explain how I typically used the current version numbers that were there but there is no guarantee that they are accurate. In some ways it sounds like Tim's approach may actually get me closer to the actual version in the case of someone working with a patch in which the $Id$ tag was not updated but subsequent CVS commits to the file were patched into the patched file in an attempt to keep the file up to date. I appreciate you both taking the time to reassure me that despite the oddball situations that I walk into that there is a way of sorting things out. Peace - Anthony
In reply to Martín Langhoff

Re: Git Development: Dealing with CVS $id$ tags

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Well, I started by going to the documentation expecting to find a command like git-findmatch already there! It seems so obvious that I was disappointed not to find it. If you could write a git-findmatch, that would be very cool. I don't think my Perl/shell/C hacking is good enough for me to try. (If you do, there are some options from git blame that it would be worth stealing, like --contents, and -L.)

And I was going to make the point about being able to diagnose files that people had done weird manual hacking on, but Anthony and Hubert have made that point for me.

Also, I completely agree with you about using git describe in the main version.php, or somewhere like that. Actually, we could start doing that now, based on the read-only git mirror: MDLSITE-911 if you want to vote. (What do we read into the fact that got assigned bug number 911?)
In reply to Anthony Borrow

Re: Git Development: Dealing with CVS $id$ tags

by Hubert Chathi -
One problem with the CVS $id$ tag is that it is specific to a repository. If I copy the code from Moodle, and then stick it in another CVS repository, and then someone copies it from my CVS repository, the $id$ tag will contain the version number from my own repository, and will be meaningless, if not misleading, to someone who wants to compare with the main Moodle repository.
In reply to Dan Poltawski

Re: Git Development: Dealing with CVS $id$ tags

by Dan Poltawski -
OK, so we just talked more about this and the solution is simple:

* Use the -kk option when updating CVS for the checkout to not expand the $id$ tags - the same as git.

(Think that used to be the default behaviour which is why it might not have come up before...)