General morale
Generally optimistic. As I write this on the 24th of September 2022, things have been progressing well.
2 All the ways of a man are pure in his own eyes,
but the Lord weighs the spirit.
3 Commit your work to the Lord,
and your plans will be established.
Proverbs 16:2-3, ESV UK
What went well?
This month I:
- Implemented a way to test memory leaks in comms,
- Set up datadog with comms,
- Looked into coturn, built it locally, and attempted to resolve a security issue with it, then opted to switch to github.com/pion/turn instead.
Progress was slightly slow this month, largely due to wrestling with third-party codebases and seeking to understand them.
What didn’t go so well?
Things were a bit slow this month, but I did do a fair amount of careful thinking about how to deal with memory issues in comms, as well as some basic security concerns.
I’ve realised that I likely face a significant setback. For comms to do what I want in its current form, I would need to
1) plug a large number of antipatterns + memory leaks,
2) convert it to docker & deploy via kubernetes and
3) likely spend a small fortune on k8s replicas in order to service a relatively tiny number of requests.
To avoid all of that, I should strongly consider rewriting it in Golang. I may still try to do the above, but I think it is rational at this point to start investigating a rewrite and implementing some basic functionality in one. So I will start looking into doing these things in parallel.
Another matter with a mixed outlook was the sunk cost on coturn. I did, however, make some inroads that should inform some issues I have with compiling protongraph, and at least now I know that I should switch to a different and more modern turn server.
What’s the outlook?
Reasonable.
My basic plan is:
- Start a prototype Golang comms rewrite while chipping away at the technical debt with the existing Nodejs comms,
- Plumb payments from the client through to the organisation service (which has a link to Stripe),
- Swap out Coturn with Pion/Turn.
The road ahead
Per the roadmap, to slightly extemporise upon the points from here:
Near term (from now through to the end of the April 2023 sprint).
- Integrate payments with the UI. Although I’ve established a connection between the backend and the Stripe payment processor gateway service, I still need to integrate this with the UI. This should be moderately straightforward to do now: I just need to ensure that I hit the correct routes on the Organisation service from comms invoked via the UI, and facilitate a semi-reasonable UX using control nodes for it. (March)
- Plug memory leaks in comms. It would be good to make a start on this. (February -> April)
- Secure turn server. Swap out coturn with pion/turn. (March)
- Deployment of services. Terraform code for Tokengraph and the Protongraph-Provider. (April)
- Implement basic limits. Make a start on implementing basic limits. (May -> June)
- Improve development environment. This is starting to become a hassle, so I’ll look towards containerising a few things and improving my bash scripts for local development. (April -> May)
- Create several basic generators. Sketch several tpgns for basic testing (forest, city, road) and add these to the palette. (For “city” this won’t use the city generator service, which is a much more ambitious planned undertaking for 2024; one should hopefully be able to get away with a more primitive approach for now, using native Tokengraph features.) (April)
Not so near term (from the May 2023 sprint onwards through to December 2023, i.e. likely Q1 ’23 -> Q2 ’23 in real time).
- Procgen CRUD actions. Can move / rotate / delete procedurally generated object groups. (May -> July)
- Deployment of services. Deployment of everything that remains, leveraging terraform and other techniques. (May -> June)
- Improve avatar / shadow functionality. There are a few bugs associated with this logic. Increase test coverage in the client around this feature, improve the readability and correctness of the code, and fix any identified issues. (May -> June)
- Resolve memory leaks in comms. Close out all identified leaks in comms so that one can be 95% confident that all the key ones are plugged. (May -> August)
- Enforce basic limits. Complete the work on implementing basic limits (with different limits depending on free or paid organisation account status). (May -> August)
- UI / UX polish. Continue to chip away at improving the UI within the client. (May -> August)
- Basic test coverage for client. Test coverage for the client in place, and client coverage at 20% overall. n.b. the tech to measure gdscript test coverage leveraging GUT doesn’t currently exist, but hopefully it will by the time I get around to looking into it again. (August -> October)
- Basic test coverage for book-keeping subsystem services. Rudimentary test harnesses in place to focus on controllers within the instance service, the user service, the organisation service, and the payment service respectively. Test coverage hovering at about 10% for each of these services, and ideally pushing past 20%. (July -> September)
- Basic content available for use. Basic content available to users in palette (users should have a reasonably varied range of options to choose from in order to populate their instances). In particular, manually add, wire up and configure a basic selection of canned assets, and try to architect things with an eye for future generalisation and extensibility. (June -> August)
Deferred
- Fix previews in the procgen engine UI. The original alpha implementation of Protongraph had a feature wherein one could preview procedurally generated results, i.e. the output of the datagraph (tpgn file). To fix this without transmitting information over the wire, I will need to figure out how to work around the Gatekeeper limitations in macOS if applicable, fix compilation of the third-party meshoptimizer library (https://github.com/zeux/meshoptimizer), and introduce a workflow wherein if the standalone application accesses a tpgn file, then it makes default assumptions about the location of assets; alternatively there might be a metadata attribute in the tpgn that gives the locations of relevant assets as relative paths. (Deferred, 2024)
- Instance configurability. (Not complex but not absolutely necessary for a prototype, so kicking this down the road.) Try to improve the user experience in an instance, e.g. by making the size of it configurable (and have this tied to limits for an organisation gated by subscription level). See if a terrain mesh can be set for an instance too, maybe? Probably not too complicated, but having a degree of basic configurability before creating an instance would be good. (Deferred, 2024)
- Parametrised procgen + improved procgen experience. Probably not a top priority for the initial prototype, so I’ll punt this further down the road. That said, having configurable generation within the UI of things like cities is something I’m very keen to implement. Under this milestone is introducing the abstraction of a ServiceNode type in tpgn graphs for calling out to separate services (like the City Generator Service based loosely perhaps on this). (Deferred, 2024).
- Recomputation of procgen object group. Can recompute a procedurally generated object group with different parameters (maybe out of scope, to be decided). (Deferred, 2024).
- Procgen placement previews. Mouse over preview of where things will be moved / rotated to. This should be possible by judicious use of collision masks (because I don’t want avatared tokens to be blocked by pending clipboard pastes, if I end up networking the preview view), as well as using ray casting to detect a collision as to where the token or object group will go, the same way I place these things currently, just not “locked in”. (Deferred, 2024).
- Marketplace. Allow users to upload their own assets / procgen algorithms etc. Maybe support a marketplace. (Deferred, 2025).
Summary
The setback I’m facing should hopefully not slow things down too much, and I should still be able to get to alpha by June 2023 in actual time.
Narrowing things down along the lines of this post, the main pieces of complexity remaining to solve for are these:
- Procgen CRUD actions (~ 20 points)
- Rewriting comms in Golang to make it a robust production service (~ 100 points)
- Payment integration with the UI (~ 10 points)
- Limits (~ 20 points)
One of these (payments) should be more or less done by the end of the next sprint.
The items above total 150 points, which should be actionable in theory. Assuming that I actually end up needing to do around 200 points, and that I get through about 20 points per sprint on the lower side of things, I will need 10 more sprints (200 / 20) to get things to an “alpha ready” state. That will take me through to the end of the December 2023 sprint.
Taking into account an additional margin of 2 months on top of that, I should conservatively get to alpha by the end of the February 2024 sprint, i.e. another 12 sprints. If I can increase my lead by another 3 months (from the existing lead of 5 months to 8) by June 2023 in actual time, which is potentially doable, I should be alpha prototype ready by the end of June 2023 in actual time.
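As a sanity check on the arithmetic above, here is a minimal Python sketch of the estimate. It assumes one sprint per calendar month, and that the current sprint label is simply the real date plus the lead (so September 2022 plus 5 months gives a February 2023 sprint). The names and the `add_months` helper are purely illustrative, not part of any codebase.

```python
import math

# Point estimates copied from the list of remaining complexity above.
estimates = {
    "Procgen CRUD actions": 20,
    "Golang comms rewrite": 100,
    "Payment integration with the UI": 10,
    "Limits": 20,
}


def add_months(year, month, delta):
    """Shift a (year, month) pair by delta months; delta may be negative."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


identified_points = sum(estimates.values())  # what has been itemised so far
assumed_points = 200                         # padded figure used in the text
velocity = 20                                # points per sprint, lower bound
margin_sprints = 2                           # extra safety margin

sprints_needed = math.ceil(assumed_points / velocity)
total_sprints = sprints_needed + margin_sprints

# Assumption: one sprint per calendar month, and
# current sprint label = real date + current lead.
real_now = (2022, 9)
current_lead = 5
target_lead = 8

current_sprint = add_months(*real_now, current_lead)
alpha_sprint = add_months(*current_sprint, sprints_needed)
alpha_sprint_with_margin = add_months(*current_sprint, total_sprints)
alpha_real_time = add_months(*alpha_sprint_with_margin, -target_lead)

print("identified points:", identified_points)
print("sprints needed:", sprints_needed, "plus margin:", total_sprints)
print("alpha sprint (no margin):", alpha_sprint)
print("alpha sprint (with margin):", alpha_sprint_with_margin)
print("alpha in real time at target lead:", alpha_real_time)
```

Under those assumptions the numbers line up with the text: 10 sprints lands on the December 2023 sprint, the 2 month margin pushes that to February 2024, and an 8 month lead maps February 2024 back to June 2023 in actual time.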
Anyway we’ll see how it all pans out in the wash.