Move the private Puppet repository to puppetserver1001
Open, High, Public

Description

Currently people commit to the private repo on puppetmaster1001. We should move this to puppetserver1001.

Cergen certs can only be generated on puppetmaster1001, but most use cases have already been moved to cfssl (T357750), so this doesn't need to block the switch of the private repo. In the unlikely case that we still need to update a cergen-issued cert, we can run the commands on puppetmaster1001 and then copy the results to puppetserver1001 for committing.

Event Timeline

After the change gets made, kerberos_kadmin_keytabs_repo in hieradata/common.yaml needs to be adapted to point to puppetserver1001

And prior to the migration, puppetserver1001 needs to be allowed in profile::tcpircbot

Change #1050601 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] TESTING ONLY - profile::puppetserver::git: add an option to exclude servers

https://gerrit.wikimedia.org/r/1050601

Change #1050607 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::puppetserver: skip puppet-merge

https://gerrit.wikimedia.org/r/1050607

Change #1050601 merged by Elukey:

[operations/puppet@production] profile::puppetserver::git: add an option to exclude servers

https://gerrit.wikimedia.org/r/1050601

Reporting a summary of various chats with Moritz:

  • On puppetmasterXXXX (Puppet 5 infra), the authoritative private puppet repo isn't designated via the is_git_master hiera setting, so changes to the private repository can be made via puppetmaster1001 and puppetmaster2001.
  • On puppetmasterXXXX (Puppet 5 infra), puppetmaster::gitclone defines the git repositories to be checked out. In the if $secure_private conditional block there's a case for if $is_git_master which defines a writable copy of the repository. The profile::puppetmaster::frontend class passes is_git_master = true to the puppetmaster class (all the other Puppet 5 servers like puppetmaster1002 use the puppetmaster::backend role which doesn't include it).
  • On Puppet 7 the git repositories to deploy are configured via profile::puppetserver::git and specifically the repos hiera setting, which at the moment also includes the Puppet private repo.

The key takeaway is that committing a puppet private change on puppetserverXXXX should already work out of the box, but we need to test it first.

High level ideas about how to proceed:

  1. First and foremost - do we have a backup of the private repo somewhere?
  2. We should make sure that committing a small change from puppetserverXXXX works fine. Besides checking all the hooks/configs/etc. and trying a small change, I don't see how this could be easily tested. Suggestions welcome.
  3. We should make sure that tools like requestctl will continue to work fine with the new settings (I think so but better safe than sorry).
  4. After announcing the move properly, we should probably deploy specific pre/post-commit hooks on puppetmaster::frontend nodes, maybe ones that simply tell the user "please don't use this host for private, but a puppetserverXXXX instead (preference for 1001 if not down)". Changing the commit hooks could be easy enough and it would give us an easy way to roll back if needed. Ideally we could move the puppetmaster::frontend repos to bare repositories, but I don't think that is strictly needed now, and it would be more difficult to roll back.
  5. After the move, we should update puppet and tcpircbot as Moritz suggested above, plus any other thing that may be needed.
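
As a rough illustration of item 4, a pre-commit hook that blocks commits and points people at the new hosts could look like the sketch below. The function name refuse_private_commit and the exact message wording are made up for this example; the real hooks are whatever puppetmaster::gitclone deploys. Note that only pre-commit can abort a commit: post-commit runs after the commit already exists, so it can at most print a reminder.

```shell
#!/bin/sh
# Hypothetical pre-commit hook for the private repo on puppetmaster::frontend nodes.
# Assumption: the function name and the message text are illustrative only.

refuse_private_commit() {
    # Print the redirect notice on stderr so it shows up in the committer's terminal.
    echo "Please don't use this host for private, use a puppetserverXXXX instead (preference for 1001 if not down)." >&2
    # A non-zero status from pre-commit makes git abort the commit.
    return 1
}

# Only act when git runs this file as the pre-commit hook itself,
# so the function can also be sourced and tried out by hand.
if [ "${0##*/}" = "pre-commit" ]; then
    refuse_private_commit
    exit $?
fi
```

With a hook like this, rolling back would only mean restoring the previous hook file.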

@jhathaway lemme know if what I wrote makes sense!

Change #1050607 abandoned by Elukey:

[operations/puppet@production] role::puppetserver: skip puppet-merge

Reason:

Abandoning this change for the moment

https://gerrit.wikimedia.org/r/1050607

Change #1052261 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] puppetmaster::gitclone: disarm pre-commit and post-commit hooks

https://gerrit.wikimedia.org/r/1052261

Change #1052914 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::puppetserver: deploy the gitpuppet admin group

https://gerrit.wikimedia.org/r/1052914

Change #1052914 abandoned by Elukey:

[operations/puppet@production] role::puppetserver: deploy the gitpuppet admin group

Reason:

Not needed, the issue is more https://gerrit.wikimedia.org/r/c/operations/puppet/+/1015032

https://gerrit.wikimedia.org/r/1052914

While prepping for making a commit on puppetserver1001, I ended up filing a revert: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052934

On puppetmaster1001 ops members are part of the gitpuppet group, and its git version doesn't yet enforce the use of safe.directory. On puppetserver1001, git requires that a user executing git commands on a repo is among the owners of its files, otherwise the command fails. This is to avoid "traps" on shared environments, but in my opinion we should use safe.directory for all repos that we manage/deploy.
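
To make the difference concrete, here is a simplified model of the check described above: a repo is accepted if the current user owns it, or if its path is listed under safe.directory (git also accepts `*` as a catch-all entry). The helper repo_is_trusted is hypothetical and only approximates git's behaviour; git's own check looks at more than the top-level directory.

```shell
#!/bin/sh
# Simplified, hypothetical model of git's ownership check; not git's actual code.
# Usage: repo_is_trusted REPO_PATH [SAFE_DIRECTORY_ENTRY...]
repo_is_trusted() {
    repo="$1"
    shift
    # Case 1: the repo directory is owned by the current (effective) user.
    # Note: -O is not strictly POSIX, but bash and dash both support it.
    if [ -O "$repo" ]; then
        return 0
    fi
    # Case 2: the repo is explicitly listed as safe, or everything is ("*").
    for entry in "$@"; do
        if [ "$entry" = "$repo" ] || [ "$entry" = "*" ]; then
            return 0
        fi
    done
    return 1
}
```

When /srv/git/private is passed as a safe entry, the function succeeds regardless of who owns the files, which mirrors the effect of listing the path in the [safe] section of a git config.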

Next steps:

  • Wait until a fix for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052934 is live and git commands work on puppetservers.
  • Change the private repo's README file (which is now outdated) from puppetserver1001 and make sure that the change works fine across all other nodes. This is the test to make sure that the infrastructure really works outside the puppetmaster realm.
  • Schedule a switch day, and merge https://gerrit.wikimedia.org/r/1052261 on that day to prevent people from using puppetmaster1001's /srv/private in the future.

Change #1053272 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Add safe directory settings to the prod private repo's git config

https://gerrit.wikimedia.org/r/1053272

Change #1053272 merged by Elukey:

[operations/puppet@production] Add safe directory settings to the prod private repo's git config

https://gerrit.wikimedia.org/r/1053272

Way better now (no more errors when executing git commands):

elukey@puppetserver1001:~$ cat /etc/gitconfig.d/10-mark_puppet_repo__srv_git_private_as_safe.gitconfig
# git::systemconfig for 'mark puppet repo /srv/git/private as safe'
[safe]
directory = /srv/git/private

elukey@puppetserver1001:~$ cd /srv/git/private/

elukey@puppetserver1001:/srv/git/private$ git status
On branch master
nothing to commit, working tree clean

Change #1053616 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::tcpircbot: allow inbound conn from puppetserver nodes

https://gerrit.wikimedia.org/r/1053616

Change #1053619 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::kerveros::kadminserver: allow more nodes in rsync

https://gerrit.wikimedia.org/r/1053619

First test didn't go well:

root@puppetserver1001:/srv/git/private# git add README                                                          
root@puppetserver1001:/srv/git/private# git commit
There is no tracking information for the current branch. 
Please specify which branch you want to merge with.                                                             
See git-pull(1) for details.         

    git pull <remote> <branch>                                                                                  

If you wish to set tracking information for this branch you can do so with:
                                                        
    git branch --set-upstream-to=<remote>/<branch> master

⚠️  Something went wrong!  Maybe you attempted to rewrite history?    
You can't rebase or commit --amend in this repo.  If you tried that,
perform the repair steps under = What NOT to do = in the README.
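
The first part of that output is git's standard message when git pull runs on a branch that has no upstream configured, so presumably something triggered by the commit tried to pull on a branch without tracking information. The hook's code isn't shown in this task, so the following is only a sketch of how a script could detect that condition before pulling; has_upstream is a hypothetical helper, not part of the deployed hooks.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the checked-out branch of the repo
# has tracking information (branch.NAME.remote and branch.NAME.merge).
has_upstream() {
    repo="$1"
    # Name of the current branch; this fails on a detached HEAD.
    branch="$(git -C "$repo" symbolic-ref --quiet --short HEAD)" || return 1
    git -C "$repo" config --get "branch.${branch}.remote" >/dev/null &&
        git -C "$repo" config --get "branch.${branch}.merge" >/dev/null
}
```

A script could call has_upstream /srv/git/private first and skip or report the pull when it fails, instead of letting git pull error out.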

Mentioned in SAL (#wikimedia-operations) [2024-07-11T08:46:06Z] <elukey> cd /srv/git/private; git reset --hard HEAD^ on puppetserver1001 to remove my last local commit (test before migration of the private repo to puppetserver1001) - T368023

Change #1053623 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::puppetserver::gitprivate: fix post-commit hook

https://gerrit.wikimedia.org/r/1053623