I just went through an infuriating time with Petr, cursing ide's not accepting my patches when we discovered a patch I had created was not applying because of the mismatch between the unexpanded $id$ tag and the expanded one from a cvs checkout.
What I am surprised about is that i've never really heard others come across this problem before today (though Tim actually mentioned it earlier).
As I have never heard about this before I am wondering if I have missed some way to prevent these conflicts with git tools, because as I see it, this will effect any git user going near commited $id$ tags when sharing their changes for others to review or even trying to commit without review using git-cvsexportcommit. Is there a solution to this i'm missing?
Also on a related note - the $id$ tags tend to get used lots round here and people find them useful, is there a way to do this sort of thing in git with hooks? Its another thing to consider for a migration plan..
<?php // $Id$
(where $Id$ is probably expanded), followed by the large NOTICE OF COPYRIGHT boilerplate. E.g. http://cvs.moodle.org/moodle/admin/filters.php?view=markup
If that is the case, diffs never go anywhere near the $Id$ tags, and there is no problem. The mess I got into was in patches where I was adding that boilerplate to files that were missing it. In future, if I start changing near the start of a file without the standard start, I am going to add the boilerplate in CVS, and wait for it to come through in a git pull.
(By the way, in the git repostiory, all the $Id$ tags are collapsed to $Id$, so they never change and you can ignore them - until you try a git cvsexportcommit.
If we do anything, I would suggest and enhancement to cvsexportcommit, to collapse all $Id$ tags in the CVS workspace before trying to apply the patch - or try it if the patch faile to apply the first time.
Following the same rule... the version control tool is supposed to keep the files you've given it... without changing them one bit!
We should kill kill kill $id$ tags. They are a serious chunk of misdesign in CVS. It is good and valid to have metadata about your files... but not inside of them!
If you use cvsexport commit, which works with cvs itself, it won't give you trouble. My guess is that patch has some magic to auto-ignore expanded $Id$ tags and their ilk.
I am surprised that his IDE doesn't have that magic, but then again, those tags are deeply evil.
We can get that data from git's metadata trivially, and if we have a user that says "I have this file, dunno where it came from" we can instantly know what file & version it is, or what file it resembles the most.
But polluting the file with external metadata. Ugh!
> If you use cvsexport commit, which works with cvs itself, it won't give you trouble. My guess is that patch has some magic to auto-ignore expanded $Id$ tags and their ilk.
Actually I think Tim came up with this issue on git-cvsexportcommit too - but as I just posted below the fix is just to checkout cvs with -kk.
just spent a few hours trying to work out so solution suitable for all people, unfortunately failed pretty badly. Here is a short summary.
At present
- All files in cvs must have -kkv flag.
- All patches submitted into cvs should use -kkv.
Problems
- People using git often submitting patches using -kk ($Id$), Netbeans and Eclipse can not apply these to -kkv checkouts.
- -kkv patches do not apply cleanly to later versions because they lack the -kkv patching magic which is available in patch binary. Unfortunately patching from command line while using IDEs confuses both Netbeans and Eclipse.
The solution proposed by Dan suggests we start using -kk on development machines and start submitting -kk patches. Eloy suggested several tricks how to fool IDEs into thinking that the files are -kk on cvs server, unfortunately none of them are usable.
So it means we either switch to --kk everyhere or stay with -kkv, mixing of these two is going to cause a lot of trouble for anybody using IDE
Positives to expanded keywords are:
a) You get a quick idea of who edited this last.
b) Users downstream can easily tell you what version they are using.
Negatives are:
a) git doesn't support them
b) IDEs have trouble with them
c) some developers hate metadata embedded in code
There seems to be four solutions:
1) Add a GPL boilerplate to every file in one huge commit so that the keywords never get caught up in future diffs. This is something we should do anyway.
2) Switch the main CVS server to use -kk by default (unexpanded), but switch the read-only CVS mirrors to use -kkv. This means developers won't be bothered and users still have the infoon their sites. All developers would need to check their settings.
3) Remove keywords from HEAD, but add them back to the packages using an evil script on download.moodle.org.
4) Just abandon keywords entirely. This is an easy search-replace.
Make your choice!
Some notes:
- Doing a CVS-wide -kk, you have to be mindful of only doing it in text files. Binary files should be -kb, if we apply -kk to binary files, we'll corrupt them.
- I think the safest thing to do is to set -kb on all files. This will _retain_ the currently expanded tags, and from that point onwards respect what we commit, literally. At that point we can remove the Id tags.
- Note that $Id tags are not the only tags we carry. There's a few $Log tags hiding in Moodle that will need manual fixups after the search-and-replace.
Edit: Like Carly Simon sings, this song ain't about me. From the poll (as of now) it seems everyone wants to get rid of them.
And: I'm also ok with adding a bit of top-of-file boilerplate
Binary files were already being treated as -kb (we explicitly specify a long list by file extension)
I'm not sure about setting everything to -kb all the time. It makes sense to me but are there any gotchas? Any other opinions?
I quite like the 'get rid of them in the source, but have a script that puts them back for the read-only download cvs mirrors' approach - and this is something that could be continued even after a switch to git. The data that's put in could either be the current information, or something similar (eg date/time the file was last changed) that would allow it to be found in source control.
If you're in that position of adding things with scripts that aren't in the official source control, you could also put a different standardised version tag at the top of 'official' downloads - so that instead of having the file-specific data it has 'Moodle 1.9.5 release version' in every single file (which can then be tracked in source control using the tag, obviously).
And it would even be possible for this script to include some kind of hash of the file contents (after the version) so that an admin report could be written which would list any file that has been modified from its claimed Moodle version...! If you really really want to know what version of a file somebody has and if it's 'true'...
--sam
Well, it also shows that you haven't found many people who do need them. :>
Or maybe that only followers of dev chat are aware of the vote
Re: Git Development: Dealing with CVS $id$ tags
> I think the vote currently just shows that people who don't need keywords don't want them.
I happen to think just like you, Martin, as I use them quite frequently to quickly note the version of the file I'm modifying.
So, Nico, how and why do you use them? My guess is that you are using CVS checkouts or daily/weekly builds? Can you tell us more? I want to make sure that if/when we switch to Id-less usage, people have usable alternatives.
--
In my (limited) experience with this, there are two main cases to cater for.
For users that use CVS checkouts, doing a cvs status on the file is the first option, or if they are using a (fancy, rocket-propelled) IDE, the IDE has that information, probably even in the same corner of the window that shows the filename.
For users that use daily/weekly builds, an option is to add a small "version identifier" to the build process -- I am thinking of a hash that describes the state of the whole tree. That would be trivial to implement -- a bit of extra code in the daily build scripts. With that "tree identity" version in place, if you grab a random daily build and change it, you can then find again *which* daily build it was and diff it.
--
MD - what are the daily builds written in - bash? any of the P* scripting languages?
Martín,
I use them actually to deal with the fr and fr_ca translations.
Here is an example: When some dev deletes a string (which should not happen for long standing strings , but is ok if the string was added during dev process, then deleted for some reason, never reaching in STABLE branches), quickly looking at the date (from the expansion of $Id$) of my commit for the translated string helps me to decide if I should delete the string from the translation or just inform the dev of a translated string that shouldn't have been deleted.
I certainly could search through the commit history I have through the mailing-list or search in cvs.moodle.org, but consulting the simple date is really really quicker.
It seems like a quick check of the 'annotate' of the file is what you need. This commandline:
cvs annotate lang/fr/moodle.php | grep 'string..login'
will give you the right answer to "when was this string last changed in the french translation", regardless of whether it is your last commit or not... There are similar tricks in the various IDEs.
But in any case, even annotate isn't the best tool, because it'll only show the latest change, and as you say, only if it's an addition.
If you want to know all the added/changed/removed lines matching a particular regex, then git's pickaxe is your friend
Your mention of 'git pickaxe' made me have a look at it, just to find that the original command is gone and it's now an option (-S) to 'git log' (-S, --pickaxe-all and --pickaxe-regex). HTH.
Saludos. Iñaki.
Thanks for your suggestion (didn't know the "cvs annotate" command).
Anyway, in the example given (I use to use the $Id$ tag in other circumstances), the problem is that sometimes, it's not the last commit. It could even be several weeks away. Don't know if annotate can help in this case, but if you've another solution than the $Id$ one, I'll switch without problem: I'm not dogmatic about $Id$ tag
But I just had a thought (sorry if it's already been raised) - don't we use them in the script that generates the unmerged files list, as an easy way to find the culprit?
Not advocating keeping them for this reason, just pointing out this script would need to be smarter
Hi Anthony,
It is useful, and Git has a mechanism that is often used for exactly that. We call it "git describe" -- it gives you a unique name for the tree. And what people do is use that identifier in the script that grabs the daily snapshots. And people use it also to make the name of the snapshot file.
For example, in Moodle, we would add it into our top-level version.php. It is a unique identifier that...
If it matches a tag (the case of a release) -- then it is the name of the tag.
Otherwise, it counts patches since that tag, and adds a unique identifier for the commit.
For example, you would have
For a release: moodle-v1.9.8.tar.gz with version.php defining $gitver='v1.9.8'
For a snapshot: moodle-v1.9.8-23-gf5b223a.tar.gz and version.php defining $gitver='v1.9.8-23-gf5b223a'
The first case tells you it's straight from a release. The second case tells you it's a snapshot, 23 commits after v1.9.8, and it tells you the exact ID of the commit.
Remove the leading 'g' and you get f5b223a . Tell git
git checkout f5b223a
and you can peruse the exact contents of that tree. Want to make a temp branch (locally) to work on that problem?
git checkout -b tmpbranch f5b223a
Want to see the history?
gitk f5b223a
Note: the "patches since the tag" count is to help us humans visualise. Given the distributed version of git, many different branches could have a 23rd patch on top of v1.9.8 .
The real info is in the commit identifier.
(For example, I am running git v1.6.3.1-26-gf5b223a )
And _you_ will have a git checkout where you can run git-findmatch on a file that may be have been added from a random source...
CVS would probably be still running in readonly mode if you need to check about olden files...
After much struggling, here is my best attempt:
(I downloaded moodlelib.php from http://cvs.moodle.org/contrib/patches/activity_locking/lib/ and put it in my lib folder as a test.)
Then, in that folder, I ran the following command line:
git log --format=format:'echo %H; git diff -b --shortstat %H moodlelib.php; echo' MOODLE_19_STABLE moodlelib.php | sh
What that does is works back through the history of moodlelib.php (git log ... moodlelib.php).
It uses the --format option of git log to generate another git command for each version in history: git diff -b --shortstat ... moodlelib.php. That counts then number of lines of different between two version of the file, ignoring whitespace changes.
Then we pipe that to sh, to actually execute the commands we have constructed. The extra echos output the commit name, and an blank line.
The output starts looking like:
503b8b480a39ef4817dd2abf77dba0ff476ddce5
1 files changed, 2198 insertions(+), 4506 deletions(-)
e92e4b8ab0ad9f1c8c89c4de0fc02190fa196410
1 files changed, 2198 insertions(+), 4500 deletions(-)
47e32d2955229f205743a0a84de951095fb1cd49
1 files changed, 2198 insertions(+), 4500 deletions(-)
...
and the numbers gradually count down (at the rate of several versions per second) until you get to
...
8ee1b0afabbeb4b18b0c699e826d6e110b4ac7b6
1 files changed, 98 insertions(+), 55 deletions(-)
373da1d463846a48392cae937013e36daf8ad46b
1 files changed, 90 insertions(+), 59 deletions(-)
b78fbdac9df4db1504731e534256a7148f3b5e62
1 files changed, 90 insertions(+), 19 deletions(-)
3a61ccbe0a07c65d2671f3b25ccea3ea8719ca25
1 files changed, 95 insertions(+), 20 deletions(-)
a40803130b5ce31f466efef6cbeb2141a3f292db
1 files changed, 108 insertions(+), 21 deletions(-)
...
So the minimum point there, b78fbdac9df4 is probably the revision that they started editing.
That will work for older releases and people using oddball setups with random files grabbed from random places.
For releases of a project that is using git, there is no reason to not include the version identifier (tag for releases, commit id for snapshots) somewhere.
And I was going to make the point about being able to diagnose files that people had done weird manual hacking on, but Anthony and Hubert have made that point for me.
Also, I completely agree with you about using git describe in the main version.php, or somewhere like that. Actually, we could start doing that now, based on the read-only git mirror: MDLSITE-911 if you want to vote. (What do we read into the fact that got assigned bug number 911?)
* Use the -kk option when updating CVS for the checkout to not expand the $id$ tags - the same as git.
(Think that used to be the default behaviour which is why it might not have come up before...)