When you study computers-science at the University of Paderborn you get taught the Java programming language in your first two semesters (which you might already know by then). Afterwards you get a brief introduction on other programming paradigms and the corresponding languages: C, Pascal, Lisp, ML, SmallTalk, SQL, Prolog, …
Starting from your third semester you are expected to learn any arbitrary programming language by yourself. That’s it. I don’t know if this equations works out or not. But if you encounter someone who claims to be a “software-architect” and did study in Paderborn and considers programming to be beneath him,… well I think you catch my drift.
I can only pity all people who consider programming as beneath them, because they miss out on one of the important lessons of computer science. Language does define how we think. If we can express some topic in such a way, that someone other with a certain background on that topic can grasp and understand it, then we have truly conceived that topic ourselves. Here the computer is nothing more than an unforgiving and impolite listener. You need to be concise and exact or everything will blow.
If all programming languages were equivalent from the programmers point of view, everything would still be written in Assembler. This is where we Nerds who collect programming languages get our revenge. If you now still think of programming as something unworthy of a gentleman-engineer, well then: Fuck you very much.
I recently held a small workshop on Python at the PC^2. Here are the slides for the language introduction. I will post the slides of the project and it’s source code soon. Nonetheless my ambition for professionality in holding this workshop, I am still an undergrad student. A student who collects programming languages and has some expertise on this field, but still a student.
Now some fellow student of computer science asks whether he can have a certificate for attending this workshop. For his CV most probably. Does this guy want to ridicule himself? From the third semester he was expected to learn any arbitrary programming language in approx. one week and now he wants to have a certificate which states that took a workshop, held by a fellow undergrad, in order to learn a language as simple as Python.
His request might be correlated to the fact that in germany, more attention is paid to titles and certificates than to real skills. A behaviour which certainly applies to job applications. This feels like a Dilbert Comic, or a sketch from Loriot, that I now have to hand out certificates to fellow students, so they can pimp their CV with yet another certificate. I mean, where’s the justice? Who gives me a certificate for holding such a workshop, or even certifying my expertise in the Python language? Sadly: no one. This leaves me in the tricky question how I am going to pimp my CV. A difficult question indeed.
This is as disturbing as it is ridiculous and since we recently celebrated the 85th birthday of of Loriot, whose subtle, insightful and gentle humour has still to be surpassed in germany, I have taken the time of hand crafting a certificate of attendance for this workshop I held and I will happily hand it out to every attendee who want’s to add it to his collection. I hope you will find it as artistically pleasing as informatory. Behold:
Edit: I have redone that certificate, because the photocopier had problems with the original one.
P.S.: I hope you all have one of those valuable “Yodel Diplomas”.
At the moment I’m preparing a workshop for the Python programming language and thus I’m looking for interesting examples. They should be short and sweet and I already stumbled over The sieve of Erastosthenes. Unfortunately, I was a bit keen to present some advanced data structures from Python. Because of that the performance of my implementation was bad,
-
def sieve1(N):
-
primes = set()
-
numbers = set(range(2,N))
-
while numbers:
-
current_prime = min(numbers)
-
multiples = set([n for n in numbers if not n % current_prime])
-
numbers -= multiples
-
primes.add(current_prime)
-
return primes
If you are a little bit into computational complexity and python, you should spot several problems with performance:
- I’m using
min(...)which has a runtime of O(N) in the number of elements in the set. So we are doing an O(N) operation every while-loop. Not good. - The outer while-loop runs, until all numbers are removed from the set. Here some unnecessary loops might be made.
Frank pointed these problems out and I made a futile attempt at reducing the runtime cost of min() by using lists instead of sets. Since now the lists are ordered, min() can be done in O(1).
-
def sieve2(N):
-
primes = list()
-
numbers = range(2,N)
-
while numbers:
-
current_prime = min(numbers)
-
numbers = [n for n in numbers if n % current_prime]
-
primes.append(current_prime)
-
return primes
I compared these two approaches with the cProfile module, for sieves up to 100000 and got the following results:
38375 function calls in 63.917 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 63.917 63.917 :1()
1 24.233 24.233 40.626 40.626 erastosthenes.py:10(sieve1)
1 18.460 18.460 23.277 23.277 erastosthenes.py:20(sieve2)
1 0.014 0.014 63.917 63.917 erastosthenes.py:52(main)
9592 0.024 0.000 0.024 0.000 {method 'add' of 'set' objects}
9592 0.016 0.000 0.016 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
19184 21.163 0.001 21.163 0.001 {min}
2 0.007 0.004 0.007 0.004 {range}
We can already see a speedup of approximately 50% from sieve1 to sieve2 but Frank then proposed an even faster approach. It uses an array of boolean values, whose indices represent the corresponding numbers. If ‘x’ is prime, the value at that position in the array is True. The algorithm now sweeps over the array and looks for already found prime numbers up to sqrt(N), if a number is marked as prime (i.e. array field is set to true) we mark all it’s multipliers as not prime. This can be done easily with a second for-loop. The array is initialized with all values as True and we start with 2, mark all even numbers as non-prime and go to 3. Now 3 is found to be prime and all it’s multipliers are marked as non-prime, then 4 is found as non-primed and skipped,… and this is continued up to sqrt(N).
At first I could not grasp why we only need to look until sqrt(N). But consider that we have reached sqrt(N), and marked all non-prime numbers according to our algorithm. Frank now told me boldly that at this point there are only correctly marked primes left in the array above sqrt(N) and up to N. This is true, even if it’s not obvious at first. But consider the following: every non-prime number greater than sqrt(N) can be factorized into two primes, where one prime is smaller than sqrt(N), otherwise we have a non-prime number which is greater than sqrt(N)*sqrt(N) = N, so it’s not in our search range. So if we have correctly found all primes up to sqrt(N) and marked all their multipliers as non-primes, only correctly marked primes greater sqrt(N) and smaller N remain.
-
from array import array
-
-
def sieve3(N):
-
numbers = array('b', [1] * N)
-
sN = int(math.sqrt(N))
-
-
for i in xrange(2,sN):
-
if numbers[i]:
-
for j in xrange(2*i, N, i):
-
numbers[j] = 0
-
-
return [i for i in xrange(2,N) if numbers[i]]
This algorithm does not need high-level data structures like linked-lists, so we tried to get the most simple array which could be found in Python and hence tried the array module. But for reference we also tried the standard lists of Python.
-
def sieve4(N):
-
numbers = [1] * N
-
sN = int(math.sqrt(N))
-
-
for i in xrange(2,sN):
-
if numbers[i]:
-
for j in xrange(2*i, N, i):
-
numbers[j] = 0
-
-
return [i for i in xrange(2,N) if numbers[i]]
These are our performance results:
38379 function calls in 98.769 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 98.769 98.769 :1()
1 37.161 37.161 60.856 60.856 erastosthenes.py:10(sieve1)
1 29.776 29.776 37.521 37.521 erastosthenes.py:20(sieve2)
1 0.211 0.211 0.211 0.211 erastosthenes.py:29(sieve3)
1 0.120 0.120 0.120 0.120 erastosthenes.py:40(sieve4)
1 0.061 0.061 98.769 98.769 erastosthenes.py:52(main)
2 0.000 0.000 0.000 0.000 {math.sqrt}
9592 0.027 0.000 0.027 0.000 {method 'add' of 'set' objects}
9592 0.017 0.000 0.017 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
19184 31.387 0.002 31.387 0.002 {min}
2 0.009 0.005 0.009 0.005 {range}
Now the speedup is quite gigantic, at a factor of roundabout 180. Interestingly, the standard Python lists are two times faster than the arrays provided by the array module.I must have missed the point of that module. But we wanted to go down to the bare metal and compare the performance with a C program.
Due to the fact, that all you need is an array and two for-loops, we tried to implement the algorithm in C and embed it as a module in Python, so we could make a better comparison. The most difficult part here was getting the algorithm properly wrapped and packed into a .so-library usable from Python. We saved the following source code in a file called erastomodule.c and compiled it into a .so-library for dynamic linking.
-
#include "Python.h"
-
#include <math .h>
-
#include <stdlib .h>
-
#include <string .h>
-
#include <stdio .h>
-
-
static PyObject *
-
erasto_sieve(PyObject *self, PyObject *args)
-
{
-
unsigned long N = 0;
-
unsigned char *numbers = NULL;
-
-
PyObject *primes = NULL;
-
PyObject *prime = NULL;
-
-
unsigned long i,j;
-
-
/* Parse the given parameter */
-
if (!PyArg_ParseTuple(args, "k", &N))
-
return NULL;
-
-
-
/* Initialize datastructure for sieve */
-
numbers = malloc(sizeof(unsigned char) * N);
-
memset(numbers, 1, N);
-
-
/* This is the actual sieve !!! */
-
for (i = 2; i < sqrt(N); i++) {
-
if (numbers[i]) {
-
for(j = 2*i; j < N; j+=i) {
-
numbers[j] = 0;
-
}
-
}
-
}
-
-
/* create a new PyList */
-
primes = PyList_New(0);
-
if (primes == NULL)
-
return NULL;
-
-
/* Put primes into list */
-
for(i = 2; i < N; i++) {
-
if (numbers[i]) {
-
prime = Py_BuildValue("k", i);
-
if(PyList_Append(primes, prime))
-
return NULL;
-
}
-
}
-
-
return primes;
-
}
-
-
static PyMethodDef ErastoMethods[] = {
-
{"sieve", erasto_sieve, METH_VARARGS, "Sieve of Erastosthenes."},
-
{NULL, NULL, 0, NULL} /* Sentinel */
-
};
-
-
-
PyMODINIT_FUNC
-
initerasto(void) {
-
PyObject *m;
-
-
m = Py_InitModule("erasto", ErastoMethods);
-
if (m == NULL)
-
return;
-
}
Then we could import it like a normal module from Python and use the function we called erasto.sieve for our computation. The following is a comparison of sieve4 and the erasto.sieve function we implemented in C. Again we have a speedup of roundabout 180, resulting in a total speedup of roundabout 32400 when compared to my initial approach.
6 function calls in 0.149 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.149 0.149 :1()
1 0.126 0.126 0.126 0.126 erastosthenes.py:40(sieve4)
1 0.017 0.017 0.149 0.149 erastosthenes.py:52(main)
1 0.006 0.006 0.006 0.006 {erasto.sieve}
1 0.000 0.000 0.000 0.000 {math.sqrt}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The conclusion should be clear:
- Think twice about your algorithm
- Profile mercilessly
- And port expensive functions to C
Happy optimizing!
The more I use Python, the more it becomes my favourite Programming Language. Two days ago I started learning 3D-Programming with OpenGL.
The Python bindings proved to be fast and well documented and everything was quite nice and easy. But as I’m taking a course on 3D-Programming at University (called Computergraphics 2) I was forced to port all 3 Assignments already done in Python to JOGL. And it proved to be more of a pain in the ass than I had expected. Just compare this lines of Java:
float light0_position[] = { 2.0f, 5.0f, 1.0f, 1.0f };
gl.glLightfv(GL.GL_LIGHT0, GL.GL_POSITION, light0_position, 0);
whit this line of Python:
glLightfv(GL_LIGHT0, GL_POSITION, (1.0, 1.0, 1.0))
Do you recognize the significant reduction of namespace-clutterage in Python. There’s no
`gl.glSomeFunction` just `glSomeFunction`. Less is more, I do say. Thats due to Pythons ingenious
System of importing things into the local namespace. Sure, you have to be careful, but for things like OpenGl it’s
dead useful and you do not need to fear name-collisions: OpenGL origins from C, which is totally namespace-ignorant, thus
everything carries an unique identifier. This is almost obvious when you take a look at constants and functions: its like the
namespace was pulled into the name itself.
The usefulness of namespaces arises from the possibility that you can load stuff from one namespace into yours and thus omit
the necessity of prefixing things with the considered namespace. Java itself is ignorant regarding Module-Functions. Either you
use Object-Methods or Class-Methods and hence you need to prefix those with the Object or Class.
What bothers me a little is the fact that the JOGL-Developer did not try to port the OpenGL-API a little more conveniently to Java.
On one Hand they could have stripped all the Class-Constants of the GL-Class from their GL thus transforming a `GL.GL_SOME_CONSTANT`
to just `GL.SOME_CONSTANT` and secondary they could have done this to the gl-Objects too, and (I think you are already guessing it)
transforming `gl.glSomeFunction` to just `gl.someFunction`.
And by the way, the PythonOpenGL-API-Reference is way better than the JOGL-Reference.
That’s the rant for today. I will need to find a way to get comfortable with JOGL, since regarding my course I will have no other choice.
Cruel World!
At the moment I’m at a most painful taskt of converting different Java types into byte-array for sending over the network.
I need to remember that Java stores everything in a 2complement notation.
But to handle all that stuff I’ve written my own Byte-O-Rama Class in Java. It’s a shit that Java isn’t even able to encode
its basic data-types into byte[] in a simple and convenient manner. I don’t talk ByteBuffers/writers and stupid stuff i talk
byte[] barr = (byte[]) intValue; Plain and simple Type-Conversion. Shouldn’t even be explicit. But since XML is overhyped in Java
no one seems to be concerned that anybody out there might use playn byte[] ’s as a fast, overhead-free Method to serialize his objects.
Python conveniently offers this: http://www.python.org/doc/current/lib/module-struct.html
If I’ve got time and anybody bothers I will go and publish my Byte-O-Rama.
