It was a few years back, in my first year of studies, when a friend told me about Python. Back then it seemed great to me. One-liners could read FTP folders and web pages or interact with the operating system, and there were list comprehensions, injecting code into live programs, nested functions, procedural as well as object-oriented programming styles, etc. All of this came in built-in libraries, with a simple yet quite functional editor and great documentation. I wrote my first university project in Python and I was a little proud of it ;). Over the next years I used Python for writing simple scripts to automate my professional work as a PHP programmer. Python was, and I still think is, a better language than PHP. After that I started to work as a .NET/C# developer. C# has a different syntax, but its similarities to C let me learn it quite quickly. The .NET Framework is a powerful tool too, so I didn't touch Python for 2 or 3 years, at least not for anything longer than a few lines.

But a week or so ago I decided to write a simple script to automate reading a whole RSS channel. I wanted it to filter the channel for only the news I was interested in. I decided to use Python for it. I was also considering PowerShell, since it is built into Windows and it can benefit from .NET. But I don't like its weird syntax, and since I wanted to just write this script and not struggle with syntax errors, I chose Python. I don't want to say that I made a mistake. It is still a powerful language and I managed to write this script quickly, but... there were some misunderstandings. Maybe I am more experienced. Or spoiled by C#, or used to it. Or just picky. Nonetheless I decided to share this with the world.

 

One of these things was the clash between global and local scope. I probably would have gotten over it if not for the other things.

Let us consider a little program like this:

global_var= 1

def func_with_local_scope():
    print(global_var)

func_with_local_scope()
print(global_var)


It does a really simple thing: it prints the value '1' two times. Despite the fact that the function has its own local scope apart from the global scope, it works because the interpreter treats 'global_var' as a global variable. This is correct behavior. Now let us change it a bit:

global_var= 1

def inc():
    print(global_var)
    global_var+=1

inc()
print(global_var)

And... it will fail. Why?

  File "test.py", line 4, in inc
    print(global_var)
UnboundLocalError: local variable 'global_var' referenced before assignment

My first reaction was: "Huh?" How is it not assigned? It is assigned right before this function. Let's inspect the scopes inside the 'inc' function:

GLOBALS:

{'__name__': '__main__', 'global_var': 1, '__package__': None, '__doc__': None, 'inc': <function inc at 0x00000000031EC488>, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__file__': 'F:\\skrypty\\music_rss\\test.py', '__builtins__': <module 'builtins'>}

LOCALS:

{}

 

It is perfectly fine! Our variable is defined in the global scope and it has a value! And the local scope is empty since I did not declare any local variables. Why doesn't the system that resolves variable names check the global scope when it is trying to find the value? Because any assignment to a name inside a function tells the interpreter that this variable WILL be assigned in the function, and without declaring it with the 'global' keyword it WILL be a local variable everywhere in that function, including the print before the assignment. I know that. I understand that. And I think that it is inconsistent and should be done better. I would agree to always using the 'global' keyword, or to never using it and just letting the interpreter do the job. Either way would be fine, since it would be consistent and the designer of the language decided to do it that way. It is his call. It might not be ideal, but it would be intuitive.

Let's do one more test:
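The rule described above can be checked directly. The sketch below is only an illustration: the names inc_broken and inc_fixed are made up for it. It shows that the scope of the name is decided when the function is compiled, and that a 'global' declaration is what makes the increment reach the module-level variable.

```python
global_var = 1

def inc_broken():
    print(global_var)      # fails: the name is local to this function
    global_var += 1        # this assignment is what makes it local

def inc_fixed():
    global global_var      # opt in to the module-level variable
    print(global_var)
    global_var += 1

# The scope was decided at compile time, before any call was made.
print('global_var' in inc_broken.__code__.co_varnames)  # True
print('global_var' in inc_fixed.__code__.co_varnames)   # False

try:
    inc_broken()
except UnboundLocalError as error:
    print('inc_broken failed:', type(error).__name__)

inc_fixed()                # prints 1
print(global_var)          # prints 2
```

Only the version with the 'global' declaration changes the module-level value. The other one never gets past its first line.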

global_var = 1

class Incrementer:
    print(global_var)
    global_var+=1
    print(global_var)
    def inc(self):
        print(global_var)
        print(self.global_var)

i = Incrementer()
i.inc()
print(global_var)

When I was writing this I expected it to fail, given what I knew about scopes in functions. But I also suspected that it might actually work and that I would find more scope inconsistencies. And there it is. It works just fine. I mean it runs, but the way it does so is as unexpected as the Spanish Inquisition. The result will be:

1
2
1
2
1

The first '1' comes from print(global_var) inside the class declaration. A variable with this name is assigned later in the class body, and since that would cause an error in a function, I would expect it to fail here too. Here comes another 'Huh?'. Instead the name is resolved to the global variable.

The second print is done after incrementing the variable with the same name. Which one is it? Is it the global variable? Or is it a new local class variable? If the code ended here, it would be a good, tricky question for a Python job interview (maybe it is one, I don't know). A simple programmer like me would expect that if you reference a variable with the same name in the same way, it SHOULD be the same variable. No. It is not.

The third print shows that to us. It prints the global variable with the same name, and it is still a simple '1'. Incrementing the global variable in class scope (or so I would expect) creates a new class variable and assigns to it the value of the global variable PLUS one. How the hell is this incrementation?

The fourth print shows us the value two and the actual place of the incremented value. It is now in a class variable with the same name as the global variable. I will call this incrementation with displacement. Python-like.

The last print shows us the unchanged global variable. It is still there, untouched.
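The "incrementation with displacement" from the walk-through above can be made visible with vars(). This is a minimal sketch: the second class, GlobalIncrementer, is a hypothetical addition that assumes a 'global' declaration is put in the class body, which makes the same statement rebind the global instead.

```python
global_var = 1

class Incrementer:
    # Reads the global (1), then stores the result in the class namespace.
    global_var += 1

print(global_var)                          # 1, the global is untouched
print(Incrementer.global_var)              # 2, a new class attribute
print('global_var' in vars(Incrementer))   # True

class GlobalIncrementer:
    # With the declaration, the same statement rebinds the global.
    global global_var
    global_var += 1

print(global_var)                                # 2
print('global_var' in vars(GlobalIncrementer))   # False
```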

Uhm... I don't know. Maybe this is explained somewhere in the documentation. Maybe it is logical somewhere in the bowels of the interpreter. Or whatever magic thing makes this work. I don't know. I don't have the time or the will to dig through a few hundred pages of documentation for the specifics of the inner workings of something as basic as arithmetic operators. And frankly I don't care. Throw an error when you are trying to increment a value that is not there, like C. Resolve it to the global variable, like JS. Forbid naming a variable the same as another variable in the scope one level above, like C#. Just pick one, please. And be consistent.

Another one exists only in Python 3, unlike the previous one. It begins with the decision that all strings should be Unicode. That is great. Resolving encodings is a pain in the ass. But... the console has its own encoding too. I don't know how this works on systems other than Windows, but I suspect it is a problem there too, since when googling for an answer I found that it happened not only to me. So what happens when you get some Unicode character that can't be encoded in the console's encoding? It is supposed to show some unreadable characters like │┐čŠč╣˝ˇ│ or just '?' signs, don't you think? No. It throws an exception. The program crashes. End of story. It is a command line. It is a tool for interacting with users. If the user has to make something of information 10% of which is gibberish, that is bad, but it is not critical. When an application fails to encode a string to some specific encoding while saving data to, for example, a DB, it is probably desirable to throw an error. But not in output! It is even more horrible because when writing an app in Python you cannot know for sure what the encoding of the user's interface will be, so you cannot be prepared for anything with a tool like this.

So the script below will work fine in the Python Shell and will fail in cmd.exe:

import sys

string = 'показано'
print(sys.stdout.encoding)
print(string)

With an exception like this:

  File "C:\Python33\lib\encodings\cp852.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>

I added the line print(sys.stdout.encoding) to show that Python internally knows what the output encoding is. This can be used to do something like this BEFORE print:

string.encode('cp852','replace')

so every character that can't be encoded will be replaced with something else. But why can't this be done inside print? Why can't print at least have another parameter to silently pass over encoding errors?

Another one is connected with version 3. In previous releases I used Python to debug the HTTP responses of websites. It was a really nice tool for that, with the set_http_debuglevel method of the HTTPHandler class:
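As a workaround, the encode step can be wrapped in a small helper. This is only a sketch under my own assumptions: the names to_printable and safe_print are made up, and the helper decodes the bytes back to text after encoding, so that print receives a string and not a bytes object.

```python
import sys

def to_printable(text, encoding):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    # Round-trip through the target encoding; 'replace' swaps in '?'.
    return text.encode(encoding, 'replace').decode(encoding)

def safe_print(text):
    """Print text without raising UnicodeEncodeError on the current console."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    print(to_printable(text, encoding))

if __name__ == '__main__':
    print(to_printable('показано', 'cp852'))   # ????????
    safe_print('показано')
```

On a console that can show Cyrillic the text passes through unchanged. On cp852 the user gets question marks and no crash.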

send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.wp.pl\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: aris
header: Expires: Thu, 07 Feb 2013 21:21:24 GMT
header: Last-Modified: Thu, 07 Feb 2013 21:21:24 GMT
header: Pragma: no-cache
header: Cache-Control: no-cache
header: Content-type: text/html; charset=UTF-8
header: Set-Cookie: reksticket=1360272084; expires=Sat, 09-Feb-2013 21:21:24 GMT; path=/; domain=.www.wp.pl
header: Set-Cookie: rekticket=1360272084; expires=Sat, 09-Feb-2013 21:21:24 GMT; path=/; domain=.wp.pl
header: Set-Cookie: statid=89.71.103.226.25161:1360272084:324990297:v1; path=/; expires=Sun, 07-Feb-16 21:21:24 GMT
header: Set-Cookie: statid=89.71.103.226.25161:1360272084:324990297:v1; domain=.wp.pl; path=/; expires=Sun, 07-Feb-16 21:21:24 GMT
header: Content-Length: 94719
header: Connection: close

It is not working in Python 3:

import urllib.request

# Ask the HTTP handler to print the request and response headers.
h = urllib.request.HTTPHandler()
h.set_http_debuglevel(1)
b = urllib.request.build_opener(h)
a = b.open('http://www.wp.pl').read()

This feature was not documented as far as I know. I found it in the 'Dive Into Python' book (http://www.diveintopython.net/). I suspect it was too unimportant to keep maintaining in new versions. So much so that it was not even removed.

I could go on about the lack of nice functions for handling dates and times. Or the 'self' (non)keyword inside class declarations. But this rant is long enough as it is. Maybe I will write the next script in PowerShell; after the fight with its syntax, it may be more usable for simple scripts.