CS101 Building a Search Engine: Week 5 and 6

May 31st, 2012

Disclaimer: this blog post shares some impressions and details of the Udacity CS101 “Building a Search Engine” online course. If you are currently taking it or plan to do so in the near future, this post could be a spoiler. That said, I’m trying to keep it as generic as possible and not spoil the important things.

With quite a delay, I’ve finished Units 5 and 6. I’m in a big rush now, since the exam week has already started and I haven’t completed Unit 7 yet. Fortunately, Unit 7 is not a technical one; it is more of a general computer science overview that helps tie together all the knowledge gained over the seven weeks.

I would say these two units are where I started to feel some complexity. In Unit 5 we focused on making things faster, basically by introducing more advanced data structures for the same job. We went from a list-based index implementation to a self-implemented hash table and then switched to the Python dictionary type. Again, abstracting away the many simple things is what a good developer should always do, but I was surprised how much I had forgotten about the main properties of hash functions and hash tables. We also did some very basic algorithm analysis.
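
To make the step from a list to a hash table a bit more concrete, here is a minimal sketch in Python 3 syntax. The function names, the character-sum hash and the bucket count are illustrative assumptions, not the course's actual code; the last lines show how the built-in dictionary covers the same job.

```python
# Illustrative sketch only: all names below are hypothetical.

def hash_string(keyword, buckets):
    """Map a keyword to a bucket number by summing character codes."""
    total = 0
    for ch in keyword:
        total = (total + ord(ch)) % buckets
    return total

def make_hashtable(nbuckets):
    """Create a table with nbuckets independent empty buckets."""
    return [[] for _ in range(nbuckets)]

def hashtable_add(table, keyword, url):
    """Record that keyword appears at url, scanning only one bucket."""
    bucket = table[hash_string(keyword, len(table))]
    for entry in bucket:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    bucket.append([keyword, [url]])

def hashtable_lookup(table, keyword):
    """Return the list of URLs for keyword, or None if it is unknown."""
    bucket = table[hash_string(keyword, len(table))]
    for entry in bucket:
        if entry[0] == keyword:
            return entry[1]
    return None

table = make_hashtable(8)
hashtable_add(table, 'crawl', 'http://example.com/a')
hashtable_add(table, 'crawl', 'http://example.com/b')

# The built-in dict does the same bookkeeping with far less code.
index = {}
index.setdefault('crawl', []).append('http://example.com/a')
index.setdefault('crawl', []).append('http://example.com/b')
```

The point of the hash function is that a lookup scans a single bucket instead of the whole list of keywords.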

Unit 6 is real computer science. Besides playing with recursive algorithms, we did more advanced things such as graph theory. All of that was the foundation for implementing the page ranking mechanism. We used the famous Google (Larry Page) algorithm that everybody has heard of: PageRank. This is where my brain started to heat up. To be honest with you, I’m still missing some parts of it, so it will take some time to get a clear picture.

So, the crawler is starting to have real search engine features. It no longer just extracts links and indexes keywords, which is definitely not enough for a search engine. It also builds the link graph and computes page ranks, which are then used in lookup functions to provide the best result for a search keyword. It’s a very simplified but working model of what Google has (or, more likely, of what Google had back in 1998).
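
As a rough illustration of the ranking step, here is a minimal sketch of a simplified iterative PageRank in Python 3 syntax. The function name, the damping factor of 0.8, the 10 iterations and the tiny example graph are all assumptions made for this sketch; it also assumes every page has at least one outgoing link.

```python
def compute_ranks(graph, damping=0.8, loops=10):
    """graph maps each page to the list of pages it links to."""
    npages = len(graph)
    # Start with every page equally important.
    ranks = {page: 1.0 / npages for page in graph}
    for _ in range(loops):
        new_ranks = {}
        for page in graph:
            # Base rank every page gets regardless of links.
            rank = (1 - damping) / npages
            # Add a share of the rank of every page that links here.
            for other, outlinks in graph.items():
                if page in outlinks:
                    rank += damping * ranks[other] / len(outlinks)
            new_ranks[page] = rank
        ranks = new_ranks
    return ranks

# Hypothetical three-page web: nobody links to 'b'.
graph = {
    'a': ['c'],
    'b': ['c'],
    'c': ['a'],
}
ranks = compute_ranks(graph)
lowest = min(ranks, key=ranks.get)
```

In this example `lowest` is `'b'`, the page nobody links to, which is the intuition behind the algorithm: a page is important when important pages link to it.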

Python. I like the language more and more and am starting to feel some confidence. At the same time, there are several things that I dislike. Nothing serious, almost cosmetic, but they bug me a little.

Anyway, I have only a few days left to submit my exam work. I have already glanced at the exam tasks and they don’t appear too complex, so I have a good chance of finishing in time. Wish me luck! I’ll update you as soon as I get any results!

The Limit of Keystrokes

May 26th, 2012

Code Kata, Mind

Some time ago I posted an article called How to be better developer or 3 accounts rule, where I shared my view on the importance of having certain accounts if you care about professionalism and mastery. The basic idea is that you have to produce a lot and write a lot. You have to write code, write blog posts and help people on forums.

If you have seen one of Scott Hanselman’s recent Productivity Tips videos, you have probably heard the nice metaphor of the “Limit of Keystrokes”. It sounds really funny; as Scott puts it, “everybody has a limited count of keystrokes; if you reach your count, you’re dead”. That means we have to think about each keystroke with great care.

And the most efficient use of keystrokes is learning. Reading is very important, but nowadays there is so much information that you simply cannot consume everything. That’s why I’ve recently unsubscribed from a lot of RSS feeds in my Google Reader, and I think I will filter Twitter as well. Instead, I try to read as few things as possible. And not only read them, but try them in practice: just write some small code to try out the idea, pattern, approach or framework. Turning theory into practice is something that gives me a lot of fun and satisfaction.

Yesterday, we ran a Coding Dojo with a good friend of mine, @skalinets, at the Agile Base Camp conference. Instead of going with my default language (which is C# for now), I paired with a PHP guy. We did the String Calculator kata in PHP. It was so great! First of all, I was amazed how much PHP has improved since the last time I looked at it (about 10 years ago); second, I used my keystrokes to learn something new.

Writing code for fun and learning has huge value. This is definitely a good investment of keystrokes.

Github for Windows - Yay!

May 21st, 2012

Git, GitHub

For quite a long time, Windows users of Github envied Github for Mac, the application that makes working with Github-based repositories as simple as possible. A lot of people, especially those who are not familiar with Git, experience some issues with Github initially. No surprise: an extensive command line, SSH and public/private keys might sound scary to GUI-addicted people.

In December 2011, Phil Haack joined Github, and the world held its breath to see what Phil and the team would do to improve the overall Github experience on Windows. And the day has come! Today Github:Windows has officially shipped.

What’s the point?

Just like Mac users, Windows users are very used to a UI. For a long time, if you wanted to deal with Git on Windows, you had to go and install msysgit. It’s a great product and actually works great, but you have to spend some time learning Git to do even very basic operations. Moreover, if you haven’t had any experience with distributed version control systems, you’ll be quite confused by new words like pull, push, clone, fork, cherry-pick and so on. Github for Windows is about to fix that.

Go ahead and install it

Installation is very easy. Just click the download link to get the web setup file. Two things are going to be installed on your machine: the Github client itself and Git Shell, the PowerShell command line for Git.

Github for Windows client

On the first run it does some configuration. It asks you for your Github credentials.

Once you are logged in, it shows you some basic account information.

It also adds a new public SSH key to your account. That was a little unexpected, as I received an email notification from Github about it. The fact that it’s going to do that should probably be mentioned during setup.

Then it tries to locate all repositories. It scans the home folder, but I don’t keep repositories there (just some temp copies), so I unselected everything.

When I tried to go straight ahead and create a new repository, the application crashed. Oops. It reproduced several times, but later went away. Anyway, I contacted support@github.com with detailed steps and info.

Finally, after I changed the default folder and waited until the scan completed (that took about 3 minutes on my machine), I got the client working.

Even though I’m not a huge fan of the METRO style, I was really pleased with the UI. It looks very nice, and the application is fast and responsive. It takes almost no effort to get an overview of the application’s features; everything is very intuitive.

It’s not perfect, of course. I tried to do some commits and that seemed fine, but syncing the repo failed. It also fails to switch branches in 95% of cases.

Git Shell for command line

The next good addition is Git Shell, powered by PowerShell (nice!). It uses the famous posh-git project. The most useful features for me right now are ‘Tab’ support, which suggests completions for commands, and ‘Stats’ shown at the command prompt, which display the current repository state. There are probably a lot of other cool things there that I haven’t discovered yet.

If you are a fan of Bash, or pure Cmd, or custom stuff (such as Console 2), it’s very easy to change that right in the application configuration.

Conclusions

Even though it’s just the first release, it’s very solid and a lot of features are already there. Issues exist, but I hope they will be cleared up soon.

What makes me wonder a little is why the project is not open source. I hope it’s just a question of time; I’m pretty sure a lot of people are waiting to see what’s inside and submit some pull requests.

Will I personally use the product? Probably not. I’ve spent too much time in the command line of my favorite Far Manager, so for me the UI is more noise than help. What I will use is Git Shell, though. I’ve heard a lot about posh-git; now it’s time to try it.

But for all people who are just starting to use Github on Windows, Github:Windows would be my first recommendation.

CS101 Building a Search Engine: Week 4

May 20th, 2012

CS101, SearchEngine, Udacity

I completed Unit 4 of the course this week. It’s getting more and more interesting, and the crawler we are building there is getting more complicated.

This week we went through basic data structures, mainly based on lists. The most interesting thing was the index data structure, though. We built a simple page indexer. Now the result of crawling is not simply a list of crawled links, but an index that keeps track of the content (as words) and the URLs where each word is mentioned. If some of you don’t know what an index is, the simplest explanation is to open any technical book. At the end of the book you will see an “Index” section. When looking for information you have two options: either go from one page to another looking for the keyword, or go to the index and see the exact pages where the keyword is mentioned. Indices are essential for quick data search.

The index that my crawler produces when crawling the test page is:

[['This', ['http://www.udacity.com/cs101x/index.html']], 
   ['is', ['http://www.udacity.com/cs101x/index.html']], 
   ['a', ['http://www.udacity.com/cs101x/index.html']], 
   ['test', ['http://www.udacity.com/cs101x/index.html']], 
   ['page', ['http://www.udacity.com/cs101x/index.html']], 
   ['for', ['http://www.udacity.com/cs101x/index.html']], 
   ['learning', ['http://www.udacity.com/cs101x/index.html']], 
   ['to', ['http://www.udacity.com/cs101x/index.html', 'http://www.udacity.com/cs101x/crawling.html']], 
   ['crawl!', ['http://www.udacity.com/cs101x/index.html']]
   # ...
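
A list-based index of this shape could be built and queried roughly as follows. This is a minimal sketch in Python 3 syntax; the function names and the example.com URLs are illustrative assumptions, not the code from the gist.

```python
def add_to_index(index, keyword, url):
    """Append url to the entry for keyword, creating the entry if needed."""
    for entry in index:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    index.append([keyword, [url]])

def add_page_to_index(index, url, content):
    """Split page content into words and index each of them."""
    for word in content.split():
        add_to_index(index, word, url)

def lookup(index, keyword):
    """Return the list of URLs where keyword appears (empty if none)."""
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return []

index = []
add_page_to_index(index, 'http://example.com/index.html', 'This is a test page')
add_page_to_index(index, 'http://example.com/crawling.html', 'learning to test')
print(lookup(index, 'test'))
```

Every lookup walks the list from the start, which is fine for a test page but becomes slow as the number of keywords grows.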

I went a little beyond the given task and improved the crawler with “clean-up HTML tags” functionality. So, I get the body part of the document, strip out all HTML tags and then index the content. The latest version of the crawler is in this gist.
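
One possible way to do such a clean-up is sketched below in Python 3 syntax, using the standard library `html.parser` module. This is an assumption about the general approach, not the code from the gist; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text found between <body> and </body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'body':
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == 'body':
            self.in_body = False

    def handle_data(self, data):
        # Text outside the body (for example the title) is ignored.
        if self.in_body:
            self.chunks.append(data)

def strip_tags(html):
    """Return the body text of html with tags removed and spaces collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(' '.join(parser.chunks).split())

page = ('<html><head><title>T</title></head>'
        '<body><p>This is a <a href="x.html">test</a> page</p></body></html>')
text = strip_tags(page)
```

Here `text` ends up as `This is a test page`, which can then be split into words and indexed.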

We also looked at some Internet fundamentals such as bandwidth, latency, traceroutes and protocols.

I haven’t started any Python project yet except the crawler. As the applications get more complex, I’m starting to feel the lack of an IDE with a debugger. I currently use Sublime Text 2 plus the print statement as my IDE and debugging tool. It might be time to look for something better.

Everything is going fine so far, except for the fact that I’m one week behind. The final exam is going to be posted on the 27th of May and there will be one week to pass it. So, my goal is to complete 2 units this week. Half of the course is done!

A year with Git

May 20th, 2012

E-conomic, Git, GitHub, GitSVN

It’s almost a year since I posted the article How to start using Git in SVN-based organization. It has been viewed more than 3,000 times on my blog and more than 9,000 times in the repost on DZONE.

Time has passed and some of you might be interested in what happened next. A lot, actually.

I think we followed a “Baby steps to..” strategy of Git adoption. Our baby steps to Git were small, careful and took quite a long time. But at the end of the day, I’m happy to say we are a pure Git organization now. Moreover, some of our projects are hosted as public and private repositories under the e-conomic organization account on Github.

Retrospective

So, let’s do a kind of retrospective to see what happened.

1. Git-SVN mode. As described in the original article, the best way to try Git inside an organization is to go with Git-SVN mode. It lets you keep SVN as the primary repo while enjoying Git features like local branching, cool merging, stashes and other stuff described here.
2. People awareness. Started by just a few developers, the word about Git spread among colleagues in our department. Some were very enthusiastic about it, some were not. But anyway, it got the attention of many of our people, including our CTO. The greatest thing is that the initiative was not cancelled; instead we started to think of some kind of plan that might bring us into the pure Git world.
3. Planning. Our plan included different evaluations: choosing between Git and HG, local or cloud hosting, centralized or decentralized mode, etc. During these planning sessions we also identified some infrastructure dependencies that blocked the switch to Git. We have the so-called “Language System”, an application that helps our copywriters and translators change the content of the app. It was created on the assumption that SVN is used, doing checkouts and commits there. Another thing is that our deployment procedure happened to be SVN-dependent. Obviously, it had to be changed to work with Git. But the highest priority went to the “Education” task. Everybody should be able to work with Git.
4. Execution. Planning is easy, execution is hard. We did the initial education as a series of meetings where the basics were described. Fortunately, almost all teams contained a Git-aware person who was the initial knowledge keeper. Some important details were moved to the company’s Wiki. The problems started with the local infrastructure setup. Being a Windows organization, we tried to set up the primary repository and server on a Windows box. To keep it short, I’ll just say: don’t do that. It’s not trivial at all to configure a Git server there, set up accounts and permissions and make it work with TeamCity. We tried different scenarios of Git hosting on IIS, including Bonobo-Git-Server and Git-Dot, but each had its own limitations, blocking us from full-featured Git usage.
5. Trying github. In parallel, one team that was starting a new project and was quite independent tried hosting sources in the cloud. Github is the obvious choice here. I think that was a great experience, and this project is still hosted on Github. We tried the so-called “organization” mode. It ideally fits small software development shops. Easy start, easy go.
6. Linux local server. Having failed to run it properly on Windows, we had to switch to a Linux box. It’s deployed as a virtual box, which is more than enough for a Git server. That solved a lot of infrastructure issues, including authentication and permissions as well as CI problems. As far as I can see, the effort to set it up was not that big (compared to the effort spent trying to do the same on Windows, it was close to 0). Once set up, it just started to work.
7. Mirroring repositories. The setup of the local central server was the first great milestone. Even though all developers could start using it instead of SVN, the infrastructure problems that I mentioned in #3 were not yet solved. So, we made a partial decision. The developers switched to the Git repository, but deployments and language work were still done on the SVN repository. Since those are not very frequent operations, it’s possible to synchronize Git and SVN with each other. It means all new code started to appear in Git, but once a week (or more often) all change sets were pushed to SVN. This is of course an overhead and required some manual work, but it was the only way for us.
8. Fixing the dependencies. It took some time until everybody got comfortable with Git. Our Wiki was extended with some policies for working with Git. We mainly follow “A successful Git branching model”, keeping features in local branches, having remote branches for code review, etc. After our deployment dependency was fixed, we were able to push code to production very fast. I think it took almost a year to clearly understand the benefits of Git not just theoretically, but practically. And that was an amazing experience, for me as well.
9. Pure git world. After the last dependency was fixed, we happily entered the “Pure Git Environment” world. The SVN server has been stopped. All developers, DevOps and copywriters are Git users now.

It took a while…

Yeah, it took a while, but we never had this migration as a top-priority item in our backlog. The process was long and smooth, allowing us to do our primary job of bringing value to the product while at the same time improving the internal infrastructure.

Not being in a hurry is also a good option, sometimes. The whole transition took a lot of effort from different people along the way. I say thank you, guys, for making this happen.

What’s next?

Things have stabilized now. We are on one solid solution that works very nicely. As I said, we have both local and Github environments. If I pretend to be a medium and look into the future, I would say we will go to a “pure Github” environment from here. Sure, this transition now has its own dependencies. But let’s see what happens during the next year.

CS101 Building a Search Engine: Week 3

May 7th, 2012

CS101, SearchEngine, Udacity

Yesterday, I finished Unit 3 of the CS101 Building a Search Engine class. I had a little lag, since I was on a short vacation at the beginning of the previous week and only got a chance to get back to class on Thursday. So, I still have one homework task in my to-do list.

It’s been an interesting unit, though it’s still a very basic one. I’m a little more confident with Python, powered by knowledge of collections, indexes etc. Again, I’m really pleased with the language’s simplicity. Here are just a few code snippets I like:

# creates generic list
some_list = []

# add something inside
some_list.append(1)
some_list.append('z')
some_list.append([3,2,1])

# iterate by for loop
for e in some_list:
    pass

# get index of element
index = some_list.index(1)

# or iterate with while (pop() removes elements, so the list ends up empty)
while some_list:
    e = some_list.pop()

I also started to familiarize myself with the functional style of Python programming. You can find some good material here. Everything looks very interesting so far.

This week we moved further with the “real” implementation of the web crawler. Instead of going through the set of quizzes, I went my own way and created my own implementation of a simple crawler. What it currently does is start from a ‘seed’ page and collect all the links it is able to find on the target pages and related pages. I went a little further, since I made it run on real web requests instead of the test data that the current unit assumes. If you are interested, the code can be found here.

Still, I pretend to be a CS101 student, trying to apply only the knowledge I have gained over the past weeks. It’s a great exercise, I believe, revealing some gaps in my education or my understanding of concepts.

Homework was interesting as well. Anna Patterson was the guest star of the homework session. Together with Anna we tried to improve the crawler with some real-life requirements, like max_pages and max_depth parameters that prevent the crawler from getting stuck in an infinite loop. Anna is a great expert in this field, so for each homework task I highly recommend checking the answer; there are a lot of interesting details there.
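
To show how such limits keep a crawl bounded, here is a minimal sketch in Python 3 syntax. It is not the homework solution: the `FAKE_WEB` dictionary stands in for real pages so the code runs offline, and the way `max_pages` and `max_depth` are interpreted here (total pages crawled, and number of link hops from the seed) is an assumption of this sketch.

```python
# A tiny in-memory "web" (hypothetical data) so the sketch runs offline.
FAKE_WEB = {
    'seed': ['a', 'b'],
    'a': ['c', 'seed'],
    'b': ['c'],
    'c': ['d'],
    'd': [],
}

def get_links(page):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(page, [])

def crawl(seed, max_pages, max_depth):
    to_crawl = [seed]      # pages at the current depth
    next_level = []        # pages discovered for the next depth
    crawled = []
    depth = 0
    while to_crawl and depth <= max_depth and len(crawled) < max_pages:
        page = to_crawl.pop(0)
        if page not in crawled:
            crawled.append(page)
            for link in get_links(page):
                if link not in crawled and link not in next_level:
                    next_level.append(link)
        if not to_crawl:
            # Current depth is exhausted: move one hop further from the seed.
            to_crawl, next_level = next_level, []
            depth += 1
    return crawled

print(crawl('seed', 10, 1))
print(crawl('seed', 2, 10))
```

Even though page `a` links back to `seed`, the crawl terminates, because already crawled pages are skipped and both limits are checked on every iteration.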

CS101 Building a Search Engine: Week 2

Apr 28th, 2012

CS101, SearchEngine, Udacity

This week I finished Unit 2 of the CS101 Building a Search Engine class. Like the previous one, it was very basic. We went through programming fundamentals: if statements, while loops, conditions and boolean operations. In contrast to Unit 1, there was nothing really new for me. All the information is presented very nicely, preparing the listener for more serious stuff.

I think I’ve started to get used to Python a little bit. Unfortunately I don’t practice it much now, so I have to find a simple project that I could accomplish in Python besides the search engine. As always, doing code katas is a very nice introduction to any new language, so I can do that.

Homework was simple enough, but again, like last time, I got one problem that made me think for some extra time. This is the “median search” problem. Say you are given 3 numbers: (1,2,3). The median is the one between the biggest and the smallest number; in this case it’s “2”. In (9,3,6) it’s “6” and in (7,8,7) it’s “7”. Like last time, I started with things I’m not supposed to know yet, like lists and sorting. Solving this problem with just the knowledge gained so far is more problematic. So, I spent some time on it, and was really happy when I found a simple and nice solution. You should try to solve it, pretending you know only procedures and conditional operations.

Now I’m looking forward to Unit 3. It’s still basic, but there we are supposed to create a simple crawler. I hope it will be fun!

JSON Model Binding to IDictionary<> is Broken

Apr 26th, 2012

MVC, asp.net

Yesterday, I was creating a small web service based on our existing ASP.NET MVC infrastructure. The task was really simple. The web service itself should be just a proxy for an existing internal API. The API method takes a Dictionary that contains some fields. So, I created a data model like this:

public class Notification
{
    public int Id { get; set; }
    public string Recipient { get; set; }
    public IDictionary<string, string> Fields { get; set; }
}

and a simple HttpPost handler, like

[HttpPost]
public ActionResult Send(string token, Notification notification)
{
    // ...
}

The payload posted to the method is:

{"id":32,"recipient":"a@a.com","fields": { "EMAIL": "a@a.com"} }

I tried to test the method, but the Fields property of the model was always null. At first I thought the problem was somewhere in the JSON payload, but after some time I saw that everything was correct.

Google showed I’m not alone: the issue has been raised on SO. Darin Dimitrov responded that this is a bug in JsonValueProviderFactory. At the same time, some comments below contained a link to the reported bug, which was already marked as Fixed.

I forgot to mention that I did all this on ASP.NET MVC 2. I decided to try it on ASP.NET MVC 3, since I have the sources, and if it worked I could try to backport the fix into our MVC 2 infrastructure.

To my great disappointment, it fails in exactly the same way on ASP.NET MVC 3. That’s not funny anymore. I blamed JavaScriptSerializer, the JSON serializer used inside JsonValueProviderFactory, for simply not being able to handle dictionaries right. I knew that ASP.NET Web API uses the Newtonsoft JSON.NET framework, which is really powerful for serialization/deserialization of JSON.

So, I ran VS 2011 and created a test ApiController that receives the model with an IDictionary inside. What do you think happened? OK, the model is no longer null, but it contains a Dictionary with zero elements. Fail.

public void Post(Notification notification)
{
    // ...
}

I raised the issue again, now on the ASP.NET Web Stack site on Codeplex. I also tried to quickly write a unit test that demonstrates the problem, but it’s not that easy to do, so it requires some time. I hope I can do that later.

CS101 Building a Search Engine: Week 1

Apr 21st, 2012

CS101, SearchEngine, Stanford, Udacity

CS101 is a fundamental course that assumes you have no background in programming at all. That’s why all the lectures were very, very basic and sometimes I felt really bored. If you feel the same, that’s probably OK, since the most interesting stuff is about to start from Unit 3.

At the same time, for people who have no programming skills it might even be a little tough. To be honest, I had a really tough moment during my first homework, which should not be a problem for a professional programmer, but I’ll describe it later.

Unit 1: Basics, Python, Numbers and Strings

What I have learned from my entire career is that going back to basics is always great. Years of enterprise development make you strong in technologies and frameworks, but I managed to lose almost everything I learned during my university days. Restoring that knowledge is a very good brain exercise; constant repetition of the basics is the way to mastery.

So, even that simple unit gave me a lot of things to remember, plus I learned some elements of the Python language.

Backus Naur Form

What was really interesting to me during Unit 1 is the so-called Backus-Naur Form for describing computer language grammar. This is a method of formalizing the syntax of (probably) any computer language. It was invented by John Backus, an American scientist who is famous as the creator of FORTRAN and a contributor to ALGOL, as well as for his research in functional programming.

Backus-Naur form is really simple and really powerful. It is described by a set of non-terminals and terminals. Each language expression is derived from the BNF rules. Let’s take an example:

<sentence> ::= <subject> <verb> <object>
<subject> ::= <noun>
<object> ::= <noun>

Here is a primitive BNF for the English language. Each sentence in English should contain a Subject, Verb and Object to be complete and have meaning. Of course, BNF is not supposed to describe natural languages such as English or Russian, since they are much more complex, but it works very well with computer languages, which are strict. Everything in angle brackets is a so-called non-terminal, which simply means that an expression cannot be terminated (completed) with it. To complete an expression we need terminals. Each non-terminal is replaced according to the rules until only terminals remain.

<sentence> ::= <subject> <verb> <object>
<subject> ::= <noun>
<object> ::= <noun>
<noun> ::= I
<noun> ::= Python
<noun> ::= Cookies
<verb> ::= Eat
<verb> ::= Like

Now the form is complete, so we can try to derive an expression from it. Let’s try. We start from the top line:

<sentence> ::= <subject> <verb> <object>

Replace the <subject> and <object> non-terminals according to their rules:

<sentence> ::= <noun> <verb> <noun>

The first “noun” is still a non-terminal, so we proceed. According to the form, noun could be any of three (I, Python, Cookies), so I can pick any.

<sentence> ::= I <verb> <noun>

“I” is a terminal, so we process the next non-terminal, which is verb. Verb could be either of (Eat, Like). I’ll take “Like”.

<sentence> ::= I Like <noun>

The last non-terminal is noun again. The same three options (I, Python, Cookies). Let’s pick “Python”.

<sentence> ::= I Like Python

All non-terminals are replaced with terminals, which means we have derived the expression completely. Using that simple algorithm I can derive other expressions that are valid for that form.

<sentence> ::= I Like Cookies
<sentence> ::= I Like I
<sentence> ::= I Eat Python
<sentence> ::= Python Like Cookies
<sentence> ::= Python Eat I

As you can see, some of them are complete nonsense, but they are still totally valid expressions. If you are curious, you can find BNFs for many known languages here.
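
The derivation steps above can be sketched in code. The following is a minimal illustration in Python 3 syntax that enumerates every sentence this toy grammar can produce; the `GRAMMAR` dictionary and the `expand` function are names invented for the sketch, and the approach only works for grammars without recursive rules.

```python
import itertools

# The toy grammar from above: each non-terminal maps to its alternatives.
GRAMMAR = {
    '<sentence>': [['<subject>', '<verb>', '<object>']],
    '<subject>': [['<noun>']],
    '<object>': [['<noun>']],
    '<noun>': [['I'], ['Python'], ['Cookies']],
    '<verb>': [['Eat'], ['Like']],
}

def expand(symbol):
    """Return every list of terminals that can be derived from symbol."""
    if symbol not in GRAMMAR:
        # A terminal derives only itself.
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production, then combine the choices.
        parts = [expand(s) for s in production]
        for combo in itertools.product(*parts):
            results.append([word for part in combo for word in part])
    return results

sentences = [' '.join(words) for words in expand('<sentence>')]
print(len(sentences))
```

With three nouns and two verbs there are 3 × 2 × 3 = 18 valid sentences, including both “I Like Python” and the nonsensical “Python Eat I”.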

Starting up Python

Starting with the Python programming language was one of my goals that had been gathering dust on the goals shelf for quite a long time. I hope that CS101 and further courses will be motivating enough to finally learn it. So, if you are like me (.NET, no Python background), don’t worry, it’s easy enough. Basically, all you need is a Python interpreter and a text editor.

I really like Chocolatey for installing that stuff. Chocolatey is like the NuGet package manager, but for software. I encourage you to try it. So, instead of going to the site, looking for the latest version etc., I just opened my PowerShell command line and typed:

cinst python

In 3 minutes, Python was on my machine.

As for the editor, you can pick any you like. I prefer Sublime Text 2; it’s really cool. Again, you can install it with Chocolatey.

cinst sublimetext2

After that you are almost a Python developer. You just need to learn the language.

Strings, Find in strings

The rest of Unit 1 was mostly string operations in Python. And I was surprised how easy Python syntax is. First, you don’t need to ‘mark’ a variable declaration in any way. No types, no ‘var’, just the name and value.

s = "Hello World"

You can access each char inside the string with just the [] operator.

s = "Hello World"
print s[0]

It prints the “H” char to the console. That’s not really cool; what is cool is substrings by index and negative indexes.

print s[0:5] # -> Hello
print s[:5]  # -> Hello
print s[6:]  # -> World
print s[:]   # -> Hello World
print s[-1]  # -> d
print s[:-6] # -> Hello

The find operation is very similar to what we have in C++ and C#. It tries to find a substring in a string; if it is found, its position is returned, otherwise -1.

print s.find('World') # -> 6
print s.find('o')  # -> 4
print s.find('o', 5) # -> 7

The last thing in the unit was the str() function, which is able to convert any number (integer or float) into its string representation.

print str(3.14)  # -> 3.14
print str(100)  # -> 100

That’s it. Based on that knowledge I am supposed to complete my homework.

Homework

Again, as in the whole unit, the problems were really simple. The ‘real’ tough guy for me was a very simple problem: “Rounding numbers”. You are given a float number and you have to return its integer representation. If the number’s fractional part is 0.5 or greater, it should be rounded up (ceil); otherwise it is rounded down (floor). No big deal, I thought to myself, and started to write code.

I spent around 10 minutes creating code like this:

number_as_string = str(x)
dot_position = number_as_string.find('.')
if dot_position != -1:
    # split the number into the parts before and after the dot
    integer_part = int(number_as_string[0:dot_position])
    decimal_part = int(number_as_string[dot_position + 1:])
    decimal_length = len(number_as_string[dot_position + 1:])
    # half of the fractional range, e.g. 5 for one digit, 50 for two
    dividor = pow(10, decimal_length - 1) * 5

    if decimal_part >= dividor:
        integer_part += 1

    print integer_part

While writing it I had a bad feeling that I was using things I’m not actually supposed to know yet. The code worked and I submitted the solution. I received a response that it does give the right answer, but I was asked to create a solution without if, int() or round().

Believe it or not, I got really frustrated with that task. I just didn’t understand how it’s possible not to use any if here when there is a condition inside the problem. I spent an additional 10 minutes, starting to think it was just impossible. It’s really funny, but I honestly thought something was strange and had a great temptation to go and check the correct answer. Fortunately, I came across this online discussion (each course has its own discussion board, where students can share info). It turns out I’m not alone: some professional programmers did the same as I did, with ifs and calls to other functions.

Finally, I just tried to concentrate and really pretend to be a person who is hearing this material for the first time, using only the knowledge I got in Unit 1. And the solution came to my mind! It was sooo easy that I felt really ashamed of the code I wrote above. It’s 3 lines of code, using just the str() and find() methods, so simple.

That was definitely a facepalm situation. But it really encouraged me to continue!

I Enrolled to Udacity ‘Build Search Engine’ Online Course

Apr 17th, 2012

SearchEngine, Stanford

I’m astonished how many opportunities we have to learn and self-improve these days. One of the greatest things was the announcement of online courses by Stanford. Stanford is a world-class university with the best professors and the highest reputation. When I first heard that some Stanford courses are online with videos, quizzes and materials, I thought to myself: I should not miss this.

With the great support of Coursera and Udacity, we now have a great list of interesting courses, including programming, artificial intelligence, cryptography etc. Led by well-known specialists, those courses are just priceless; nevertheless, they are available for free.

I have enrolled in a new course by Udacity, CS101: Building a Search Engine, which started on 16 Apr 2012 and is taught by David Evans and Sebastian Thrun. This should be an interesting journey into web crawling, data mining, ranking etc. What is good for me is that it should not involve hardcore math, and it also gives a good introduction to the Python programming language. By the way, CS101 does not assume you know any computer language or have any special math knowledge. So, my assumption is that this course should not take hours of digging into the difference between o(n) and O(n), but rather has more practical aspects.

So, why am I writing this? Millions of people are joining these courses, and a lot of people already have successful records for several of them. OK, let’s face the truth. I have enrolled in several already (ml-class and saas), but I completed none of them successfully. Due to my busyness (read: laziness) I quickly fell off the schedule and it was too difficult to catch up again. I don’t want that to happen again.

I have some small goals for the next 7 weeks:

Start learning a new language
Learn something new in data mining and data processing
Improve my self-learning discipline
Encourage myself to take more online courses

I think CS101: Building a Search Engine is a very nice candidate, because:

As I said, it should not include very complex math (which I have already managed to forget)
It has no strict deadlines for units
It is interesting enough not to get bored in the middle (at least I hope so)

Moreover, I’ll be doing a weekly blog post (on Saturdays) highlighting the things I learned during the week. If you are interested, please jump in, since the train has not left yet. I’m sure it will be a great experience.

Let’s do that together!


Alexander Beletsky's development blog

My profession is engineering
