Posts Tagged ‘django’

Django on GAE using Python NDB

July 19, 2013

Recently, in a comment on a post on dba.stackexchange, I learned about NDB, a Python datastore library for App Engine, which presumably allows one to model many-to-many entity relationships efficiently in the Google Datastore. Apparently it can be plugged into Django, too, via this:

To use NDB with the Django web framework, add 'google.appengine.ext.ndb.django_middleware.NdbDjangoMiddleware', to the MIDDLEWARE_CLASSES entry in your Django settings.py file.
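A minimal sketch of the resulting settings.py fragment is below. The two Django middleware entries after the NDB one are placeholders for whatever the project already lists. Putting NDB's middleware first is my assumption, on the reasoning that any later middleware making datastore calls then runs inside whatever per-request state NDB sets up.

```python
# settings.py (fragment): enable NDB's Django middleware.
# The entries after the NDB middleware are placeholders for the project's
# existing middleware; putting NDB first is an assumption, not a requirement
# taken from the quoted instruction.
MIDDLEWARE_CLASSES = (
    'google.appengine.ext.ndb.django_middleware.NdbDjangoMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
)
```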

In terms of getting started with this particular kit, there is a tutorial here which I may investigate at some point.

As an additional remark, a comment on this post on Stack Overflow points out that I’ve been a bit remiss all along: intermediate tables are the way to store keys and manage joins when using django-nonrel. Nonetheless, I think I’ll look into NDB in my future investigations. Either solution (NDB or django-nonrel) requires reworking the models used in a Django project, and I’d rather use the one that offers automatic caching and batching of datastore calls, both of which cut down on round trips to the datastore.
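To make the intermediate-table idea concrete, here is a minimal sketch in plain Python, with dictionaries standing in for datastore kinds. The kind names (`posts`, `tags`, `post_tags`) and the helper functions are hypothetical. They are not django-nonrel or NDB APIs. The point is only that the many-to-many relationship lives in its own table of key pairs, and each direction of the "join" becomes a lookup on that table followed by fetches by key.

```python
# Dictionaries stand in for three datastore "kinds" (hypothetical names).
posts = {1: {"title": "Django on GAE"}, 2: {"title": "NoSQL joins"}}
tags = {10: {"name": "django"}, 11: {"name": "nosql"}}

# Intermediate table: one entity per (post key, tag key) pair.
post_tags = [
    {"post_id": 1, "tag_id": 10},
    {"post_id": 2, "tag_id": 10},
    {"post_id": 2, "tag_id": 11},
]


def tags_for_post(post_id):
    """Return the tag entities linked to a post via the intermediate table."""
    tag_ids = [row["tag_id"] for row in post_tags if row["post_id"] == post_id]
    return [tags[tid] for tid in tag_ids]


def posts_for_tag(tag_id):
    """Return the post entities linked to a tag (the reverse direction)."""
    post_ids = [row["post_id"] for row in post_tags if row["tag_id"] == tag_id]
    return [posts[pid] for pid in post_ids]


print([t["name"] for t in tags_for_post(2)])    # ['django', 'nosql']
print([p["title"] for p in posts_for_tag(10)])  # ['Django on GAE', 'NoSQL joins']
```

In a real datastore the list scans would be indexed queries on `post_id` / `tag_id`, and the final step would be a batch fetch by key, which is presumably where NDB's batching and caching would earn their keep.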

Django on GAE – final thoughts for now

May 10, 2013

I came across the following project the other day: http://code.google.com/p/gae-django-cms/ . It works right out of the box, and seems fully equipped with an admin interface and all sorts of other useful things. It also seems quite easy to integrate with other (simple) Django projects, like django-monetise. Consequently, this is probably where I will park my investigations of GAE for now.

Data Structures in SQL vs noSQL – with thoughts to django-nonrel

April 27, 2013

So I’ve been thinking again about the problem of joins in SQL vs noSQL. For background: I am interested in how one could write another Python package for the django-nonrel project, such that when a Django project running on a noSQL backend like MongoDB asks for a many-to-many query, the query is sent to that package. The package would then construct the data structures required to support such a query on a noSQL store (i.e. interpret and build the necessary tables), and manage the interface between Django’s ORM, which assumes a standard SQL database, and the table mapping so constructed. The application would then be essentially indistinguishable from one running on plain Django with, say, a MySQL backend. This would allow one to use any Django plugin currently available without having to rewrite its code (i.e. directly alter its implicit data model(s)/data organisation, as described in its models.py file(s)).
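One way to picture the "manage the interface" part is a small facade that exposes the same kind of calls as Django's many-to-many related managers (`add`, `remove`, `all`) but keeps the relationship in an explicit intermediate table underneath. This is a hypothetical sketch, not django-nonrel code. The class name `JoinTableManager` and the in-memory `store` are assumptions, and a real backend would issue datastore queries instead of scanning a set.

```python
class JoinTableManager:
    """Hypothetical stand-in for a Django ManyToMany related manager,
    backed by an explicit intermediate table of (source, target) key pairs."""

    def __init__(self, store, join_table, target_table, source_key):
        self.store = store                # dict: table name -> table
        self.join_table = join_table      # name of the intermediate table
        self.target_table = target_table  # name of the related table
        self.source_key = source_key      # key of the object owning the manager

    def add(self, *target_keys):
        # Record one row in the intermediate table per related object.
        for key in target_keys:
            self.store[self.join_table].add((self.source_key, key))

    def remove(self, *target_keys):
        for key in target_keys:
            self.store[self.join_table].discard((self.source_key, key))

    def all(self):
        # Resolve the "join": intermediate-table lookup, then fetch by key.
        # Keys are sorted only to give a deterministic order.
        keys = sorted(t for s, t in self.store[self.join_table] if s == self.source_key)
        return [self.store[self.target_table][k] for k in keys]


store = {
    "post_tags": set(),
    "tags": {10: "django", 11: "nosql", 12: "gae"},
}
post_1_tags = JoinTableManager(store, "post_tags", "tags", source_key=1)
post_1_tags.add(10, 12)
post_1_tags.remove(12)
post_1_tags.add(11)
print(post_1_tags.all())  # ['django', 'nosql']
```

A plugin calling `post.tags.add(...)` and `post.tags.all()` would not need to know that the pairs live in a separate table, which is the kind of transparency the package would have to provide.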

Hence I’ve done some research, and found the following post on the MongoDB blog. For comparison, here is a post describing a simple data structure for pure SQL. The problem then seems quite clear: in MySQL or SQLite the database itself joins tables on their key fields within a single query, whereas MongoDB and CouchDB (at least at present) cannot join collections that way. So it is necessary to keep a separate table associating the primary key values of the different collections, and to resolve the join outside the database. Apparently this is an artifact of MongoDB requiring data in third normal form, whereas databases like MySQL are less restrictive and can be in first, second or third normal form. Presumably the fact that data must be structured in a certain way in MongoDB, as opposed to the freedom of MySQL or SQLite, allows it to scale better, and consequently makes it the platform of choice for big-data services.
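For comparison, here is a small, self-contained sketch of both sides. SQLite, via Python's standard `sqlite3` module, stores the many-to-many link in a junction table, much as Django does for a `ManyToManyField`, and the database performs the join in one query. The second half does the same join by hand over dictionaries, which is roughly what application code has to do on a store without server-side joins. Table and column names are illustrative assumptions, loosely following Django's naming style.

```python
import sqlite3

# SQL side: the database resolves the join through the junction table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blog_tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_post_tags (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES blog_post(id),
        tag_id INTEGER REFERENCES blog_tag(id)
    );
    INSERT INTO blog_post VALUES (1, 'Django on GAE'), (2, 'NoSQL joins');
    INSERT INTO blog_tag VALUES (10, 'django'), (11, 'nosql');
    INSERT INTO blog_post_tags (post_id, tag_id) VALUES (1, 10), (2, 10), (2, 11);
""")
sql_result = [row[0] for row in conn.execute("""
    SELECT p.title FROM blog_post p
    JOIN blog_post_tags pt ON pt.post_id = p.id
    JOIN blog_tag t ON t.id = pt.tag_id
    WHERE t.name = ?
    ORDER BY p.id
""", ("django",))]
conn.close()

# NoSQL-style side: the same data as separate "collections", joined in code.
posts = {1: {"title": "Django on GAE"}, 2: {"title": "NoSQL joins"}}
tags = {10: {"name": "django"}, 11: {"name": "nosql"}}
post_tags = [(1, 10), (2, 10), (2, 11)]

tag_ids = {tid for tid, tag in tags.items() if tag["name"] == "django"}
post_ids = sorted({pid for pid, tid in post_tags if tid in tag_ids})
manual_result = [posts[pid]["title"] for pid in post_ids]

print(sql_result)                   # ['Django on GAE', 'NoSQL joins']
print(sql_result == manual_result)  # True
```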

So this reduces the noSQL many-to-many django-nonrel problem to the following: given a query that is predicated on the assumption of first (or perhaps second, but I think first) normal form for a join, how can one build boilerplate code that does the switch behind the scenes, creating data structures in third normal form that represent the same information (or query) as the original?
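A rough sketch of that behind-the-scenes switch: take a Django-style lookup such as `tags__name='django'` (the kind of filter a plugin would pass to `Post.objects.filter`), split it on the double underscore, and replace the single SQL join with a chain of lookups on the related collection, the intermediate collection and the base collection. Everything here is hypothetical. The `schema` dictionary saying which field is many-to-many, the collection layout and `filter_m2m` itself are assumptions, not anything django-nonrel actually provides.

```python
# Hypothetical model description: Post.tags is many-to-many to Tag,
# stored in the intermediate collection "post_tags" as (post_id, tag_id) pairs.
schema = {"post": {"tags": {"target": "tag", "through": "post_tags"}}}

db = {
    "post": {1: {"title": "Django on GAE"}, 2: {"title": "NoSQL joins"}},
    "tag": {10: {"name": "django"}, 11: {"name": "nosql"}},
    "post_tags": [(1, 10), (2, 10), (2, 11)],
}


def filter_m2m(model, lookup, value):
    """Evaluate e.g. filter_m2m('post', 'tags__name', 'django') without a join."""
    field, target_field = lookup.split("__", 1)
    relation = schema[model][field]
    # Step 1: find matching keys in the related collection.
    target_ids = {key for key, entity in db[relation["target"]].items()
                  if entity.get(target_field) == value}
    # Step 2: map them to source keys through the intermediate collection.
    source_ids = sorted({src for src, tgt in db[relation["through"]]
                         if tgt in target_ids})
    # Step 3: fetch the source entities by key.
    return [db[model][key] for key in source_ids]


print([p["title"] for p in filter_m2m("post", "tags__name", "nosql")])
# ['NoSQL joins']
```

Only one level of relation and exact matches are handled here. Chained lookups (`tags__author__name`) or other lookup types (`__in`, `__contains`) would each need their own translation, which is presumably where most of the difficulty lies.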

Forum Experiment, part one

April 20, 2013

I now have a database of forums available, at this location: http://myinfo-scs.appspot.com .  This is based on code located here: https://github.com/kjk/fofou and described in more detail here: http://blog.kowalczyk.info/software/fofou/ .

I’ve also been looking at DjangoBB for something similar for a single forum, from which I could probably then abstract fairly easily to multiple forums. DjangoBB is rather more polished than fofou and, of course, is written in the Django framework, which is probably a bit more solid than a hand-rolled implementation like the one above. The packages I am using, apart from Django Bulletin Board, are:

The price of the latter, it would seem, is the difficulty of getting the thing to work. In fact, I have since discovered that the only way to make it work properly is to rewrite either djangobb (hard) or django-nonrel (very hard!). Although it is certainly instructive to have a bit of a fumble with the djangotoolbox and django-nonrel code, there are inherent limitations of NoSQL that make a full solution to some of these problems more or less impossible. Since many of Django’s pluggable components (such as djangobb) implicitly rely on relational queries, running them on a NoSQL store such as the GAE datastore can be very difficult.

There are then several approaches one can take. One is to use Google Cloud SQL, which is not horrendously expensive and, I believe, can serve multiple apps from a single instance. Another is to use alternative forum software, such as fofou, or gforum, which I discovered recently. gforum comes with a user advisory regarding its hunger for user information via its user logon widget, but is otherwise relatively promising, so I am currently looking at defanging that particular piece of code. For the adventurous/curious, the googlecode repository is here: http://code.google.com/p/gforum/source/checkout .

The third approach is to somehow solve the NoSQL many-to-many field problem and then incorporate the solution into django-nonrel. This is apparently one of those ‘untouchable’ or overly ambitious problems in computer science. But it does seem to be solvable (although the solution is not currently open-sourced), as per this fairly recent announcement (September 2012) at http://fatfractal.com/prod/joins-and-nosql/ :

I’m an engineer and not usually given to making sweeping statements like, “we’ve solved the many-to-many relationships problem for NoSQL;” but in this case, I hope you’ll agree, it’s merited.

FatFractal is another platform like GAE. Apparently it solves the problem before the data reaches its NoSQL database, which I presume is something like MongoDB (App Engine, by contrast, uses its own Bigtable-based datastore). Regardless, simply knowing that the problem can be solved is half the battle. Indeed, if FatFractal’s claim is true, the problem is not impossible, so presumably it is only a matter of time before someone independently works out how to solve it and the knowledge becomes public domain. It could then be applied to the current django-nonrel distribution / GitHub project (currently at version 1.5 development, 1.4 stable), here: https://github.com/django-nonrel/django/tree/nonrel-1.5-beta .

But until this happens, I think the best strategy for non-experts like myself is to use technologies other than Django (as above) for specific applications. The biggest advantage of Django (apart from the built-in admin) is the pluggability of its components (as with Ruby on Rails), and on App Engine more or less everything breaks and needs to be rewritten (if the data models even allow it). Consequently, one might as well use web2py, which is fully supported on App Engine. That said, I’d certainly like to learn a bit more about how SQL and noSQL model data; if nothing else, it would be quite instructive.

As to the question of why one would use NoSQL if it makes joins so abominably difficult, the quick answer is speed and scale. NoSQL databases drop features of relational databases such as joins (and often multi-record transactions), which tends to make them faster for simple lookups and easier to scale horizontally across many machines. They are therefore more suited to applications/services that need colossal amounts of data (e.g. location-specific information, weather data). SQL databases are harder to scale out in the same way, so they are more suitable for applications where the data is somewhat more limited in volume, but more highly interwoven and connected (e.g. forums, blogs).

dotCloud is going OpenSource

April 13, 2013

Hi folks,

I have now moved my blog to this location, since dotCloud’s Sandbox mode is moving to an open-source implementation, which I have to say is quite commendable of the chaps. Since I can’t currently afford to maintain my own servers 24/7, I consider this an affordable, low-maintenance solution.

However, this does make me more motivated to actually build a live application on dotCloud that meets their desired criteria: “Our core competency is and will continue to be the operation and support of large-scale cloud services, for tens of millions of visitors, 24 hours a day, every day.”  Certainly something to aim for!  I’m not entirely certain of the nature of the service I would like to build, but, while I’m casually brainstorming:

  • it could be a service for building rooms for online collaboration via multi-user video conferencing, leveraging TokBox; I would charge some nominal (small) hourly fee for the service and pay TokBox a portion of the take. Since TokBox is fairly cheap (I think $200 / 250 for the first 75000 hours of usage per month total), this doesn’t seem like a poor strategy.
  • an online wiki.  Running on MediaWiki, I could host a not-for-profit specialist wiki focusing on some area that Wikipedia might not cover in the depth I would like, e.g. a research wiki.
  • a forum registration service.  Running multiple user forums, the profit model would be advertising revenue and banner space.
  • a game.  I have had difficulties with this before, however, in terms of where I wanted to take it.
  • an online provider of digital course content that I create, or
  • a provider for MOOCs (massive open online courses) that simplifies the process of creating an online network of materials for students to use in their courses.

My impression is that the first and third points are probably the ones I’m most likely to want to try.  And the third would probably be the best to get started with.

The first step in enacting the forum idea in particular might be to consider Django as the technology for it.  I know that the Lightbird tutorials cover forums, so I would not be starting from scratch.  It would then simply be a matter of extending / abstracting from a single forum to a database of forums.

As for testing prior to implementation, I would deploy on a self-hosted Sandbox server, then contact dotCloud should I decide to move to Live / Enterprise mode.  Broadly speaking though, I think a live service is really only worth considering if one has a passion for, or a driving need to work in, an area and sees an opportunity for improvement.  And, largely speaking, I’m currently learning more by fooling around with Unity! I also see no obvious way to simplify online transactional foolishness.  Learning is perhaps the most important thing for me at the moment; knowledge is, after all, the best enabler of serendipity.

Google App Engine now supports Django

March 24, 2013

It looks like GAE is not just NoSQL any more (see here).

In other news, I have a strategy for extracting the MySQL tables from this WordPress blog for manual backup: essentially, a copy of ~/data/wp-content made via this method here. I can check that this is the correct location by running the following in a terminal window:

dotcloud run -A <name of application> <name of database service>
ls

Evidently I will need to test that this works. My plan is to create a new temporary application and see if I can “restore” the backup data to that application. If I can, I won’t have to mess around too much with automating the process (I suppose automation is especially useful if one is setting up something like this for a client, and doesn’t necessarily want to trust them with command-line access to the code on the server itself).
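Since the plan is to copy the data and then restore it into a fresh application, one way to rehearse the copy/restore round trip locally is a sketch like the one below, using Python's standard `tarfile` module. The directory names and the `backup`/`restore` helpers are hypothetical, and this is not the method linked above. It simply archives and unpacks whatever files are in a directory; it does not talk to MySQL itself.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def backup(src_dir, dest_dir):
    """Write a timestamped .tar.gz of src_dir into dest_dir; return its path."""
    src = Path(src_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store entries under the folder's own name, e.g. "wp-content/...".
        tar.add(str(src), arcname=src.name)
    return archive


def restore(archive, target_dir):
    """Unpack an archive made by backup() into target_dir.
    Only extract archives you created yourself."""
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(target_dir))


# Round trip in a temporary directory; "old_app" and "new_app" are stand-ins.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    content = tmp / "old_app" / "wp-content"
    (content / "uploads").mkdir(parents=True)
    (content / "uploads" / "photo.txt").write_text("hello")

    archive = backup(content, tmp)
    restore(archive, tmp / "new_app")
    restored_text = (tmp / "new_app" / "wp-content" / "uploads" / "photo.txt").read_text()

print(restored_text)  # hello
```

If the round trip works against a throwaway application, the same two steps could later be wrapped in a scheduled job, which is roughly what automating it for a client would amount to.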

It is worth noting that Google App Engine now also supports backups (although this is still experimental; see here). It looks very workable, though!

A very django meandering

January 7, 2012

I’m currently trying to get this tutorial to work: http://www.lightbird.net/dbe/photo.html .  I’ve succeeded in getting the basics to display, except for the images!  Which, of course, is a bit silly, since they are the whole point of the tutorial.  So far I’ve discovered that Firefox seems happy with files referenced in the HTML as file://localhost/[your local location of file], but I understand that this is bad practice; it is far better to keep paths relative.

Even so, when I referenced an image directly in Django, the HTML (even the generated HTML) displayed correctly when opened directly in Firefox, but not when served by the Django server!  This has caused me some puzzlement.  I am currently investigating the suggestion on the original page, which is also backed up by a comment (with zero upvotes) on Stack Overflow here: http://stackoverflow.com/questions/2443752/django-display-image-in-admin-interface , to see if modifying the base admin template might work.

My development environment is Eclipse.

UPDATE: It turns out that the problem was in my urls.py file.

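# urls.py ...

# ... import statements ...
from django.conf import settings
from django.conf.urls.defaults import patterns  # Django 1.3; on 1.4+ import from django.conf.urls

urlpatterns = patterns('',
    # ... ,
    # Development-only media serving: document_root must be the directory
    # on disk (MEDIA_ROOT), not the URL prefix (MEDIA_URL).
    (r'^media/(?P<path>.*)$', 'django.views.static.serve',
        {'document_root': settings.MEDIA_ROOT}),
)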

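It turns out that I had been passing the MEDIA_URL value from settings.py as the document_root, instead of MEDIA_ROOT.  MEDIA_URL is the URL prefix under which media is requested, whereas document_root needs the directory on disk where the files actually live.  Rookie mistake.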
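To see why the mix-up breaks things, the sketch below imitates, in simplified form, what a static-file view does with the `path` captured by the URL pattern: it joins it onto `document_root`. The real `django.views.static.serve` also normalises the path and guards against escaping the root. The settings values and the `resolve` helper are hypothetical, for illustration only.

```python
import posixpath

# Hypothetical settings values, for illustration only.
MEDIA_ROOT = "/home/me/tutorial3_lightbird/media/"  # directory on disk
MEDIA_URL = "/media/"                                # URL prefix seen by the browser


def resolve(document_root, path):
    """Simplified: join the captured URL path onto document_root."""
    return posixpath.join(document_root, path)


# A request for /media/images/cat.jpg captures path="images/cat.jpg".
print(resolve(MEDIA_ROOT, "images/cat.jpg"))
# /home/me/tutorial3_lightbird/media/images/cat.jpg  (where the file really is)
print(resolve(MEDIA_URL, "images/cat.jpg"))
# /media/images/cat.jpg  (almost certainly not a real location on disk)
```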