Alexander Beletsky's development blog

My profession is engineering

CS101 Building a Search Engine: Week 1

Disclaimer: this blog post expresses some impressions and details of Udacity CS101 “Building a Search Engine” online course. If you are either currently participating it or plan to do so in nearest future, this blog post could be a spoiler. Even though, I’m trying to make it generic as possible and do not spoil important things.

CS101 is fundamental course that supposes you have no background in programming at all. That’s why all lectures was very-very basic and sometimes I felt really bored. If you fell the same, that’s probably ok, since the most interesting stuff is about to start from Unit 3.

In the same time, for guys who has no programming skill’s that might be even a little tough. To be honest, I had really tough moment during my first homework, that should not be a problem for professional programmer, but I’ll describe it later.

Unit 1: Basics, Python, Numbers and Strings

What I understood from my entire career is that: backing to basics is always great. The years of enterprise development makes you strong in technologies and frameworks, but I managed to lost almost everything I got during my university days. Restoring that knowledge is very good brain exercise, constant repetition of basic is the way to mastery.

So, even that simple unit gave me a lot of things to remember, plus I learned some elementary of Python language.

Backus Naur Form

What was really interesting to me during Unit 1 is so called Backus-Naur Form, for describing the computer language grammar. This is a method of formalizing any (probably) computer language syntaxes. It has been invented by John Backus American scientist, how is famous as creator of FORTRAN and ALGOL computer languages, as well as his researches in functional programming.

Backus-Naur form is really simple and really powerful. It is described by the set of non-terminals and terminals. Each language expression is derived from BN form. Let’s take and example,

<sentence> ::= <subject> <verb> <object>
        <subject> ::= <noun>
        <object> ::= <noun>
    

Here is the primitive BNF for English language. Each sentence in English should contain Subject, Verb, Object to be complete and have a meaning. Of course, BNF is not suppose to describe natural languages as English or Russian, since it much more complex.. but it works very fine with computer languages, which are strict. So, everything in brackets are so called non-terminals, it simply means that expression could not be terminated (completed) based on them. To complete we need terminals. Each non-terminal is replaced by terminal till it’s done.

<sentence> ::= <subject> <verb> <object>
        <subject> ::= <noun>
        <object> ::= <noun>
        noun ::= I
        noun ::= Python
        noun ::= Cookies
        verb ::= Eat
        verb ::= Like
    

Now, the form is completed, so we can try to derive expression out of it. Let’s try that. So, we start from the top line

<sentence> ::= <subject> <verb> <object>
    

Derive all non-terminals from expression

<sentence> ::= <noun> <verb> <noun>    
    

First “noun” is still non-terminal, so we proceed. Due to form, noun could be any of three (I, Python, Cookies) - so I can pick up any.

<sentence> ::= I <verb> <noun>    
    

“I” is the terminal, so we process next non terminal which is verb. Verb could be any of (Eat, Like). I’ll take “Like”.

<sentence> ::= I Like <noun>    
    

The last non-terminal is noun again. The same three options (I, Python, Cookies). Let’s pick “Python”.

<sentence> ::= I Like Python    
    

All non-terminals are replaced with terminals, that means we derieved the expression completely. Based on that simple algorithm I can derive other expressions that would be valid for that form.

<sentence> ::= I Like Cookies    
        <sentence> ::= I Like I    
        <sentence> ::= I Eat Python    
        <sentence> ::= Python Like Cookies
        <sentence> ::= Python Eat I
    

As you can see, some of them are completely non-sense, but still they are totally valid expressions. If you are curious, you can find BNF’s for many know languages here.

Starting up Python

Start with Python programming language was one of my goals, dusted for quite long time on goals shelf. Hope that CS101 and further courses would be motivating enough to finally learn it. So, if you are like me - .NET, no Python background - don’t worry, that’s easy enough. Basically, all you need is Python interpreter and some text editor.

I really like Chocolatey for installing that stuff. Chocolatey is like NuGet package manager, but for software. I encourage you to try. So, instead of going to site, looking for latest version etc. I just opened my Power Shell command like and put:

cinst python
    

In 3 minutes, Python was on my machine.

The editor, you can pick up any you like. I prefer Sublime Text 2, it’s really cool. Again, you can install it by Chocolatey.

cinst sublimetext2
    

After that you are almost Python developer. Just need to learn the language.

Strings, Find in strings

The rest of Unit 1, was mostly string operations in Python. And I was surprised how easy Python syntax is. First, you don’t need to ‘mark’ varible declaration anyhow.. No types, no ‘var’ just the name and value.

s = "Hello World"
    

You can access each char inside the string just with [] operator.

s = "Hello World"
        print s[0]
    

It would print “H” char into console. It’s not really cool, what cool is - substrings by index and negative indexes.

print s[0:5] # -> Hello
        print s[:5]  # -> Hello
        print s[6:]  # -> World
        print s[:]   # -> Hello World
        print s[-1]  # -> d
        print s[:-5] # -> Hello     
    

The find operation is very similar to what we have in C++ and C#. It, tried to find substring in string if it’s find, position returned or -1.

print s.find('World') # -> 6
        print s.find('o')  # -> 4
        print s.find('o', 5) # -> 7
    

The last thing in the unit was str() method, that able convert any number (integer or float) into string representation.

print str(3.14)  # -> 3.14
        print str(100)  # -> 100
    

That’s it. Based on that knowledge I suppose to complete my homework.

Homework

Again, as whole unit - there was really simple problems. The ‘real’ tough guy to me was very simple problem - “Rounding numbers”. So, you are given with float number and you have to return it’s integer representation. If numbers fraction is greater than 0.5, it should go ceil otherwise it goes floor. Not big deal I thought to my self and start to write code..

I spend around 10 minutes to create code like that:

number_as_string = str(x)
        dot_position = number_as_string.find('.')
        if dot_position != -1:
         integer_part = int(number_as_string[0:dot_position])
         decimal_part = int(number_as_string[dot_position + 1:])
         decimal_length = len(number_as_string[dot_position + 1:]) 
         dividor = pow(10, decimal_length - 1) * 5 

         if decimal_part >= dividor :
          integer_part += 1

         print integer_part
    

While writing that I had a bad feeling, that I’m using something that I’m not supposed to know, actually. The code worked and I submitted the solution. I received a response, that it actually giving right answer.. but, I was asked to create solution without if, int() or round().

Believe me or not, but I really frustrated on that task. I just didn’t understood how it’s possible to do not use any if here, but I have a condition inside the problem. I spend additional 10 minutes, starting to think it’s just impossible. It’s really funny, but indeed I thought it’s something strange and had a great temptation to go and check for correct answer. Fortunately, I got this this online discussion (each course has it’s discussion board, where student’s can share the info). It turns out I’m not alone, some professional programmers did the look like I did with if’s and calls to other functions.

Finally, I just tried to concentrate and really pretend to be a person how is just listen to that material first time, using the only knowledge I got in Unit 1. And solution came up to my mind! It was sooo easy, so I felt really ashamed for the code I wrote above. It’s 3 lines of code, using just str() and find() method, so simple.

That was definitely facepalm situation. But, it really encourage me to continue!