Alexander Beletsky's development blog

My profession is engineering

Agileee 2010: Pavel Gabriel: Successful projects without testers

Disclaimer: the text below is a compilation of notes I made at the Agileee 2010 conference while listening to different speakers. I do it to keep the knowledge I got at the conference and to share it with my colleagues and anyone else who is interested. It is only how I heard, interpreted and wrote down the original speech. It also includes my subjective opinion on some topics, so it may not 100% reflect the author's opinion and original ideas.

This speech was given by Pavel Gabriel, an IT professional from Minsk. Maybe it was influenced by Pavel's limited experience with public speaking in English, but the speech turned out to be very generic, with a lot of obvious things, and not really convincing.

The story

2 programmers, 1 manager. They started without testers, and after the release users found a lot of bugs. The message from the boss was that hiring a tester would help to solve the problems. But Pavel had another opinion: developers could solve quality problems by themselves.

So Pavel initiated an improvement process, aiming to build products of acceptable quality with developers only.

Right tools help make things right

The team was using Ruby as its language and Ruby on Rails as their framework of choice. This really influenced the process, because the framework is done with TDD in mind, Ruby is a dynamic language that makes unit testing easy, and there are a lot of already created solutions like Cucumber etc. But tools are not enough; there has to be something that changes developers' minds to care about quality, to be responsible for quality.

Responsibility

Most issues happen because there is no responsibility for one's actions. If there are testers, developers stop testing the application. The whole team is responsible for quality: everyone and no one.

In Pavel's opinion, responsibility is gained through asking questions, code review and demonstration.

Awareness

The team has to be aware of what is going on around it. TDD/BDD is a way to improve developer awareness; retrospectives are a way to discuss progress and problems.

Doing BDD with Cucumber makes it really clear to a developer what exactly is expected and what roles are involved.

Communication

Communication is very important, so improving communication is always a goal.

Questions and Answers

The QnA session turned out to be a nightmare :). 90% of the questions were out of the context of the speech, like "What tools do you use?", "How do you test pages?", "How do you do security tests?" and so on. I tried to ask a more general question: "You tried to make improvements, you convinced your boss you don't need testers and that developers can do the testing.. what were the results? Did the number of bugs increase or decrease, did customers become happy?". The answer was: "Customers are still unhappy, but users do not find critical bugs; exact statistics do not exist". OK, fair enough. The idea of "testing by developers" turned out to be: give developers the right tools to make them happy to do testing (TDD/BDD, acceptance), make them like doing it; as soon as they are happy they will produce better results. Also, everything depends on professionalism. So we came back to the old saying by Kent Beck, "XP is a team of responsible professionals": if people are not professional and responsible, nothing will help.

I personally like the idea of creating software with developers only. But as for me, it can work well only in small product shops (where the customers are actually the product creators). It does not fit big organizations and outsourcing companies, mainly due to the reason above: in such organizations there is just a very low percentage of really professional developers.

Agileee 2010: Mary Poppendieck: It is not about working software

Disclaimer: the text below is a compilation of notes I made at the Agileee 2010 conference while listening to different speakers. I do it to keep the knowledge I got at the conference and to share it with my colleagues and anyone else who is interested. It is only how I heard, interpreted and wrote down the original speech. It also includes my subjective opinion on some topics, so it may not 100% reflect the author's opinion and original ideas.

Mary's speech was quite long, with a lot of details and information, and it was quite hard for me to understand everything she said. Moreover, I was sitting too far from the stage, so I could hardly see anything on the screen. This summary is definitely not complete. If anyone has the presentation file, it would be great to share it. Anyway, I put here what I caught so far.

Strategic Inflection Point

Mary introduced the concept of a "Strategic Inflection Point": a point in the timeline of every organization where strategic decisions are made. Such decisions are vital for further organization development and growth.

Strategic Inflection Point for Agile?

Agile was created 10 years ago; it is quite old stuff by now.. and it is time to change!

The milestones of Agile could be described as:

  • version 1.0: processes and tools, comprehensive documentation, contract negotiation, follow plan
  • version 2.0: individuals and interactions, working software, customer collaboration
  • version 3.0: team vision and initiatives, validated learning, customer discovery, initiating change

Mary emphasized that now it is time for Management version 3.0!

Let's review the key points of version 3.0.

Team vision and initiatives

"There is nothing so useless as doing efficiently that which should not be done at all" - Peter Drucker.

Most product failures are caused by a lack of customers.

Bring the team to the customers, to understand what the customer actually wants.

Mary introduced the concept of the MVP, Minimum Viable Product. It is something with a minimal scope that can be run to see actual customer demand: does it do the job? Will customers pay for it? What do we need to learn next? You have to measure and repeat. Experiment -> Learn -> Adjust!

And it is vital to make sure that you are building the right thing, and then to make sure that it is built right!

Customer discovery

It is like ethnography: watch people! What do they do? Learn it! Try to understand what is happening. Ideation: do prototypes, do iterations.

"Brilliant systems are the result of matching the mental models of those developing a system and those who will be using the system."

Initiating change

Just make change happen. In every iteration it is vital to make sure that customers see the results and give feedback on them. If you know that customers download your software every month, you'll have to make it stable! Make what customers want, not what was promised to executives.. the change comes from the team, by elaborating customer feedback.

Extra features are the biggest waste in software development! 45% of features are never used.. Write less code! Developer productivity: less code, fewer features.

Later on Twitter I saw a link to Mary's new book regarding Management 3.0 that you might be interested in.

Agileee 2010: Henrik Kniberg speech: The essence of Agile

Disclamer: text below is compilation of notes I made on Agileee 2010 conference, listening to different speakers. I do it to keep knowledge I got on conference, share it with my colleagues and anyone else who interested. It is only about how I heard, interpret, write down the original speech. It also includes my subjective opinion on some topics. So it could not 100% reflects author opinion and original ideas.

First session at Agileee 2010. Due to its name I thought it would be yet another lecture on Agile/Scrum etc. It was, but Henrik made it really interesting and detailed. I heard a lot of new things there. So far it is my favorite speech of the conference.

Beginning

"I'm scared now :)" - these were the words Henrik started with. But his whole speech was clear and fluent.

Introduction

All the current agile abbreviations confuse us. Scrum, Kanban, XP, TDD and so on. What are they about, what problems do they try to solve?

Doing a software project can be compared to shooting at a target with a cannon. There is a target and a cannon.. and only one shot. After the shot we hope we hit the target. That's why we try to do all the planning, risk management and all the other "do before" stuff. But the issue is that the target moves with time, so we always miss.

Taking into account the percentage of successful and failed projects over the last years, Henrik gives these figures:

Project success rates:
  1. Year 1994 - 15% successful
  2. Year 2004 - 34% successful

So, we definitely learned something through those 10 years. What were the problems, and what have we learned so far?

Estimations

Different people, within different contexts, with slightly different input, produce different estimations. Henrik gave some really nice examples where an estimate doubled only because the spec was written on a bigger number of pages, or included some irrelevant details, or contained different items.

Let's put it really simply: none of us can do estimations! We can, actually, but our estimations suck. A lot of projects are treated as failed because they haven't met the original estimations.

But why does the number of successful projects increase? Because projects got a lot smaller, and (there is no silver bullet) Agile is something that improved the situation. Why?

  • User involvement
  • Executive management support
  • Clear business objectives
  • Optimizing scope

Agile in a nutshell

  1. Customers discover what they want
  2. Developers discover how to build it
  3. Responding to change is vital (to track the moving goal)

Agile approach to planning

So, if we are not able to do exact estimations and exact plans, what should our approach to planning be?

Agile principles: early delivery. If we are late with a release, we have to at least deliver the most important features at the planned time. Releases have to be short. Henrik gives a nice metaphor: if we wait for a train that goes every 5 minutes, it is no big deal if we miss it. We just wait 5 minutes and get on the next train. If it goes every 30 minutes, it will annoy us, but we can still wait. But consider a ship that goes to another city once a week or once a month. If we miss it, we are in trouble.

By keeping releases short, we act like the 5-minute train.

Scrum in a nutshell

Scrum tries to split things up. The organization is split into teams, the product is split into backlog items, time is split into iterations. Delivering every sprint is the key factor.

Scrum defines different roles for the people in a team:

  • Stakeholders: users, helpdesk, customers
  • Product Owner: the one who defines the vision and priorities
  • Team: a group of people responsible for how much to pull in and how to build it
  • Scrum Master: the one responsible for process leadership and coaching

The definition of done plays an important role in Scrum. It has to be clearly defined by the team: what exactly has to be done to treat some particular piece of work as done (tested, merged, release noted and so on).

How to estimate? Don't estimate time, estimate by the people who do the work, estimate continuously. Planning poker is a nice agile estimation technique.

Sprint planning, Daily scrum (making the process a white box, not a black box), something has to come out of each sprint, Demo (looking into the product), Retrospective (looking into the process).

XP in a nutshell

Scrum wraps XP. Scrum knows nothing about programming, XP does.. it introduces engineering practices. Scrum + XP is a great combination (but not really required).

XP contains a number of practices, but the most important are: Pair programming - a short feedback loop (seconds); Unit tests - a short feedback loop (minutes); Continuous integration - a short feedback loop (hours).

Agile architecture

It might seem that Agile doesn't care about architecture, since there is no direct customer value in architecture. That is partially true. Agile is goal oriented: oriented to do things quickly, to release quickly, to fail quickly.

Quick'n'dirty turns into Slow'n'dirty. If you don't care about architecture at all and do patch-style programming, you are in trouble after a while. It becomes just too difficult to extend and maintain the product.

On the other hand, big upfront design work can make customers unhappy. Too much is spent on thinking about a beautiful architecture with no actual output (remember that Working Software is the major Agile metric of progress).

It has to be balanced. Architecture has to be simple: as simple as the architecture is, so simple is the code. Clean and simple code is very important. But code is not an asset, code is a cost. Only some code is value. Simple code is easy to change.

What is the definition of simple code? Simple code meets all of these requirements:

  1. All tests passed
  2. No duplication
  3. Readable
  4. Minimal

Sure, such criteria as readability and minimalism are subjective; different people can have different opinions on them. But the team has to share the same vision.

Kanban in a nutshell

Kanban is a way of visualizing work. Kanban means "visual card". David Anderson visualized the flow: see the board, talk near the board.

A board could consist of such sections: Backlog, Next, Dev (Ongoing, Done), Acceptance (Ongoing, Done), In production. It is great to see when Next becomes empty. Analysis of the board is the way of project management with Kanban. Don't put in too much, don't pull too much. When a printer jams, you don't put in more paper. The theory of constraints is applied when doing projects with Kanban.

Comparing methodologies

The comparison has to be done with only one goal: understanding. It is like comparing a knife to a fork: it doesn't make a lot of sense to compare tools as such; decide what tool is best to solve your problem.

Comparison by number of rules:

  • More prescriptive: XP (13), RUP (120+)
  • More adaptive: Kanban (3), Do Whatever (0)
  • In the middle: Scrum (9)

An approach to adopting any new methodology

Henrik gives a really nice explanation of how to approach a new thing. It is Shuhari. It comes from Japanese martial arts; I liked that a lot, since I'm a fan of karate :).

  • Shu - follow the process (shut up and listen to master)
  • Ha - adapt the process, innovate the process
  • Ri - never mind the process, forget the process, create your process

Conclusions

At the end of the session Henrik gave several known (but usually forgotten) principles to follow:

  • Use the right tools - choose exactly what you need; it can be difficult, and mistakes can happen here
  • Don't be dogmatic - don't treat any practice as dogma, because it is not dogma; reach the Ri level of learning

Perfection is a direction, not a place - don't forget it!

Agileee 2010: Biggest Agile conference in Eastern Europe

First of all I would like to say thank you to my company for sending me to this conference. It is really great to be part of such a community, to see a lot of famous faces, to listen to wise speeches.

I'll share my overall impressions of the conference as well as some notes on the sessions I attended.

Registration

Registration was fine and easy: no waiting lines, no crowd. I just gave my registration letter and got lunch tickets for both days as well as a big bag. When I looked in the bag I found a lot of interesting stuff! There was a map of Kiev (I live in Kiev, but I believe guests find it very useful), a notepad, pens, some ads. What I really liked were the notebook stickers by VersionOne (which I already put on my notebook :) and the planning poker cards by GlobalLogic. Also, I was quite surprised to see that I represent Nigeria at Agileee 2010 :). So, the organizers were in good humor.

Morning coffee

Since I was in a really big hurry, I was happy to see that I still had time before the Opening Session. All guests could enjoy morning coffee and tea, listening to some classic tunes played by a small orchestra. During the coffee it was great to see such guests as Mary Poppendieck, Allan Kelly and others.

Opening Session

Time to begin! Alexey Krivitsky and Natalia Trenina were the major people on stage! The introduction speech was nice and clean, even if it was a bit messed up by the microphone issues that took place. Anyway, Agileee 2010 is opened, so it is time for the speakers to do their work!

Sessions

I really enjoyed some sessions; some were boring, as for me. With a notebook on my knees, I tried to take as many notes as possible, so I'm planning to do a small review of each session I visited. My channel is going to be a little noisy the next several days, but please stay tuned :).

My little feedback on the organization

The organization was really great, keep doing that guys :). What I personally complain about: the bad quality of the WiFi connection, and too few plugs to recharge notebook batteries.

Regex to match words from a dictionary in a page body

Using Regex is pretty easy in .NET applications. All you have to use is the Regex class and a basic understanding of regular expression patterns.
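For instance, here is a minimal sketch of the kind of whole-word check used throughout this post (the words and text are made up for illustration):

using System;
using System.Text.RegularExpressions;

class RegexBasics
{
  static void Main()
  {
    // "\b" marks a word boundary, so "tdd" matches only as a whole word.
    Console.WriteLine(Regex.IsMatch("we practice tdd daily", @"\btdd\b")); // True
    Console.WriteLine(Regex.IsMatch("embedded tdds", @"\btdd\b"));         // False
  }
}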

My goal was to create code that would answer the question: does this particular text contain some words from a dictionary or not? Using regular expressions is an obvious choice when you do this type of operation. I was trying to understand which technology is demanded by a job offer (Cpp, Java or .NET) and whether TDD skill is demanded. To achieve that I created a set of "matchers": small classes, each for its own area. The crawler just uses those matchers to get the actual data.

    protected bool MatchToTdd(string description)
    {
      return new TddMatcher().Match(description);
    }

    protected bool MatchToJava(string description)
    {
      return new JavaMatcher().Match(description);
    }

    protected bool MatchToCpp(string description)
    {
      return new CppMatcher().Match(description);
    }

    protected bool MatchToDotNet(string description)
    {
      return new DotNetMatcher().Match(description);
    }


As you see, I have 4 matchers to cover my requirements: CppMatcher, DotNetMatcher, JavaMatcher, TddMatcher. All of them implement the simple IMatcher interface.

namespace Crawler.Core.Matchers
{
  public interface IMatcher
  {
    bool Match(string input);
  }
}


Now, let's review a matcher. Because all the matchers do basically the same operations and differ only by their dictionary contents, they contain a dictionary of target words and delegate the matching functionality to the MatchUtil class. Let's see the C++ matcher, for instance.

using System.Collections.Generic;

namespace Crawler.Core.Matchers
{
  public class CppMatcher : IMatcher
  {
    private static IList<string> _patterns = new List<string>()
      {
        "c\\+\\+",
        "cpp",
        "stl",
        "cppunit"
      };

    public bool Match(string input)
    {
      return MatchUtil.Match(input, _patterns);
    }
  }
}



I wanted to design MatchUtil.Match to be as universal as possible and not to depend on the kind of input words. Matching words with the boundary "\b" works perfectly as long as you have simple words like 'java', 'nunit', 'tests' and so on, but my tests started to fail as soon as I tried 'c++' or '.net'. That is because "\b" matches the boundary between an alphanumeric and a non-alphanumeric symbol, and in my case '+' and '.' are not alphanumeric. That made a problem for me, and I asked StackOverflow for help. I finished up with the following implementation, which I hope could be useful if you do similar stuff.

using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Crawler.Core.Matchers
{
  class MatchUtil
  {
    public static bool Match(string input, IList<string> patterns)
    {
      var lower = input.ToLower();
      foreach (var pattern in patterns)
      {
        var start = pattern.StartsWith("\\.") ? "(?!\\w)" : "\\b";
        if (Regex.IsMatch(lower, start + pattern + "(?!\\w)"))
        {
          return true;
        }
      }
      return false;
    }
  }
}


So, the Regex.IsMatch static method is used to perform the match.
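To illustrate the boundary issue, here is a minimal usage sketch, assuming the MatchUtil class above (the "\\.net" pattern is my guess at what DotNetMatcher might contain, since its dictionary is not shown):

using System;
using System.Collections.Generic;

namespace Crawler.Core.Matchers
{
  class MatchUtilDemo
  {
    static void Main()
    {
      // '+' is not a word character, so a trailing "\b" would never match;
      // the "(?!\w)" lookahead handles it.
      var cppPatterns = new List<string> { "c\\+\\+" };
      Console.WriteLine(MatchUtil.Match("Senior C++ developer", cppPatterns)); // True
      Console.WriteLine(MatchUtil.Match("CppUnit experience", cppPatterns));   // False

      // A hypothetical pattern starting with an escaped dot gets "(?!\w)"
      // instead of "\b", so ".net" is found even though '.' is not a word
      // character either.
      var dotNetPatterns = new List<string> { "\\.net" };
      Console.WriteLine(MatchUtil.Match("ASP.NET MVC developer", dotNetPatterns)); // True
    }
  }
}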

That's it. If you see any issues or improvements, please let me know: http://github.com/alexbeletsky/TddDemand

Crawling a web sites with HtmlAgilityPack

Introduction

This is the first post of a small series in which I'm going to describe the implementation and design of the Crawler that I've done recently for TDD demand analysis. I'll split it up into several parts, covering its major architectural parts.

  • Part 1 - Crawling a web sites with HtmlAgilityPack
  • Part 2 - Regex to match words from a dictionary in a page body
  • Part 3 - EF4 Code First approach to store data


For references, you could use a source code - http://github.com/alexbeletsky/tdd.demand

Warning: it's quite a long post, because it contains code examples. If you understand the basic ideas I put here, the best way is to go directly to the repository and see the code, as it is the best explanation material.

Using HtmlAgilityPack

HtmlAgilityPack is one of the great open source projects I have ever worked with. It is an HTML parser for .NET applications that works with great performance and supports malformed HTML. I successfully used it in one of my projects and really liked it. It has very little documentation, but it is designed so well that you can get a basic understanding just by looking at the Visual Studio Object Browser.

So, when you need to deal with HTML in .NET, HtmlAgilityPack is definitely the framework of choice.

I've downloaded the latest version and was very pleased that it now supports Linq to Objects. That makes the usage of HtmlAgilityPack more simple and fun. I'll give you just a simple idea of how it works. The task of every crawler is to extract some information from a particular HTML page. Say we need to get the inner text from a div element with class "required". We have 2 options here: a classical one using XPath, and a brand new one using Linq to Objects.

XPath approach

public string GetInnerTextWithXpath()
{
  var document = new HtmlDocument();
  document.Load(new FileStream("test.html", FileMode.Open));
  var node = document.DocumentNode.SelectSingleNode(@"//div[@class=""required""]");
  return node.InnerText;
}

Linq to Objects approach

public string GetInnerTextWithLinq()
{
  var document = new HtmlDocument();
  document.Load(new FileStream("test.html", FileMode.Open));
  var node = document.DocumentNode.Descendants("div")
    .Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("required"))
    .SingleOrDefault();
  return node.InnerText;
}

Although I personally like the Linq to Objects approach, sometimes XPath is more convenient and elegant (especially in cases where you refer to page elements without ids or special attributes).

Loading pages using WebRequest

In the previous example I loaded the page content from a file located on disk. Now our goal is to load pages by URL using HTTP. The .NET framework has a special WebRequest class for that. I've created a separate class, HtmlDocumentLoader (implementing the IHtmlDocumentLoader interface), that hides all the details inside.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Threading;

namespace Crawler.Core.Model
{
  public class HtmlDocumentLoader : IHtmlDocumentLoader
  {
    private WebRequest CreateRequest(string url)
    {
      var request = (HttpWebRequest)WebRequest.Create(url);
      request.Timeout = 5000;
      request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
      return request;
    }

    public HtmlAgilityPack.HtmlDocument LoadDocument(string url)
    {
      var document = new HtmlAgilityPack.HtmlDocument();
      try
      {
        using (var responseStream = CreateRequest(url).GetResponse().GetResponseStream())
        {
          document.Load(responseStream, Encoding.UTF8);
        }
      }
      catch (Exception)
      {
        // just do a second try
        Thread.Sleep(1000);
        using (var responseStream = CreateRequest(url).GetResponse().GetResponseStream())
        {
          document.Load(responseStream, Encoding.UTF8);
        }
      }
      return document;
    }
  }
}

Several comments here. First, you can see that we set the UserAgent property of the WebRequest. We are making our request look the same as if it came from a Firefox web browser. Some web servers could reject requests from "unknown" agents, so this is a kind of preventive action. Second is how the document object is initialized.. as you might see, we have a try/catch block here and just repeat the same initialization steps in the catch block. It might happen that the web server fails to process the request (due to different reasons), so the WebRequest object will throw an exception. We just wait for one second and retry. I've noticed that such a simple approach can really improve the robustness of the crawler.
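Here is a minimal usage sketch, loading a page over HTTP and querying it the same way as the file-based examples above (the URL and the query are made up for illustration):

using System;
using System.Linq;

namespace Crawler.Core.Model
{
  class LoaderDemo
  {
    static void Main()
    {
      var loader = new HtmlDocumentLoader();

      // Hypothetical URL, just for demonstration.
      var document = loader.LoadDocument("http://example.com/jobs");

      // Query the loaded document with Linq to Objects.
      var headings = document.DocumentNode.Descendants("h2").Select(h => h.InnerText);
      foreach (var heading in headings)
      {
        Console.WriteLine(heading);
      }
    }
  }
}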

Generic Crawler

So, now we know how to load HTML documents by using WebRequest and specifying a document URL, and we know how to use HtmlAgilityPack to extract data from a document. Now we have to create an engine that automatically goes through the document, extracts the links to the next portion of data, processes the data and stores it. That is what is called a crawler.

As I implemented and tested several crawlers, I saw that all of them have the same structure and operations and differ only in the particular details of how data is extracted from pages. So I came up with a generic crawler, implemented as an abstract class. If you need to build the next crawler, you just inherit from the generic crawler and implement all the abstract operations. Let's see the heart of the crawler, the StartCrawling() method.

protected virtual void StartCrawling()
{
  Logger.Log(BaseUrl + " crawler started...");
  CleanUp();

  for (var nextPage = 1; ; nextPage++)
  {
    var url = CreateNextUrl(nextPage);
    var document = Loader.LoadDocument(url);

    Logger.Log("processing page: [" + nextPage.ToString() + "] with url: " + url);

    var rows = GetJobRows(document);
    var rowsCount = rows.Count();

    Logger.Log("extracted " + rowsCount + " vacancies on page");

    if (rowsCount == 0)
    {
      Logger.Log("no more vacancies to process, breaking main loop");
      break;
    }

    Logger.Log("starting to process all vacancies");

    foreach (var row in rows)
    {
      Logger.Log("starting processing div, extracting vacancy href...");

      var vacancyUrl = GetVacancyUrl(row);
      if (vacancyUrl == null)
      {
        Logger.Log("FAILED to extract vacancy href, not stopped, proceed with next one");
        continue;
      }

      Logger.Log("started to process vacancy with url: " + vacancyUrl);

      var vacancyBody = GetVacancyBody(Loader.LoadDocument(vacancyUrl));
      if (vacancyBody == null)
      {
        Logger.Log("FAILED to extract vacancy body, not stopped, proceed with next one");
        continue;
      }

      var position = GetPosition(row);
      var company = GetCompany(row);
      var technology = GetTechnology(position, vacancyBody);
      var demand = GetDemand(vacancyBody);

      var record = new TddDemandRecord()
      {
        Site = BaseUrl,
        Company = company,
        Position = position,
        Technology = technology,
        Demand = demand,
        Url = vacancyUrl
      };

      Logger.Log("new record has been created and initialized");

      Repository.Add(record);
      Repository.SaveChanges();

      Logger.Log("record has been successfully stored to database.");
      Logger.Log("finished to process vacancy");
    }

    Logger.Log("finished to process page");
  }

  Logger.Log(BaseUrl + " crawler has successfully finished");
}

It uses the abstract fields Loader, Logger and Repository. We have already reviewed the Loader functionality; Logger is a simple interface with a Log method (I've created one implementation that puts log messages to the console, which is enough for me), and Repository is what we will review next time.
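Just to give an idea of what the Repository might look like, here is a minimal sketch assuming EF4 Code First conventions; the class and property names are my guesses based on the Repository.Add/SaveChanges calls above, and the actual implementation is covered in the next post of the series.

using System.Data.Entity;

namespace Crawler.Core
{
  // Hypothetical Code First context: one table of crawled records.
  public class CrawlerContext : DbContext
  {
    public DbSet<TddDemandRecord> Records { get; set; }
  }

  // Hypothetical repository wrapping the context; the generic crawler
  // only needs Add and SaveChanges.
  public class CrawlerRepository : ICrawlerRepository
  {
    private readonly CrawlerContext _context = new CrawlerContext();

    public void Add(TddDemandRecord record)
    {
      _context.Records.Add(record);
    }

    public void SaveChanges()
    {
      _context.SaveChanges();
    }
  }
}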

The GetTechnology and GetDemand methods are the same for all crawlers, so they are part of the generic crawler; the rest of the operations are "site-dependent", so each crawler overrides them:

protected abstract IEnumerable<HtmlAgilityPack.HtmlNode> GetJobRows(HtmlAgilityPack.HtmlDocument document);
protected abstract string CreateNextUrl(int nextPage);
protected abstract string GetVacancyUrl(HtmlAgilityPack.HtmlNode row);
protected abstract string GetVacancyBody(HtmlAgilityPack.HtmlDocument htmlDocument);
protected abstract string GetPosition(HtmlAgilityPack.HtmlNode row);
protected abstract string GetCompany(HtmlAgilityPack.HtmlNode row);

Here we'll review one of the crawlers and how it implements all the methods required by the CrawlerImpl class.

namespace Crawler.Core.Crawlers
{
  public class RabotaUaCrawler : CrawlerImpl, ICrawler
  {
    private string _baseUrl = @"http://rabota.ua";
    private string _searchBaseUrl = @"http://rabota.ua/jobsearch/vacancy_list?rubricIds=8,9&keyWords=&parentId=1";

    public RabotaUaCrawler(ILogger logger)
    {
      Logger = logger;
    }

    public void Crawle(IHtmlDocumentLoader loader, ICrawlerRepository context)
    {
      Loader = loader;
      Repository = context;
      StartCrawling();
    }

    protected override string BaseUrl
    {
      get { return _baseUrl; }
    }

    protected override string SearchBaseUrl
    {
      get { return _searchBaseUrl; }
    }

    protected override IEnumerable<HtmlAgilityPack.HtmlNode> GetJobRows(HtmlAgilityPack.HtmlDocument document)
    {
      var vacancyDivs = document.DocumentNode.Descendants("div")
        .Where(d =>
          d.Attributes.Contains("class") &&
          d.Attributes["class"].Value.Contains("vacancyitem"));

      return vacancyDivs;
    }

    protected override string GetVacancyUrl(HtmlAgilityPack.HtmlNode div)
    {
      var vacancyHref = div.Descendants("a").Where(
        d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("vacancyDescription"))
        .Select(d => d.Attributes["href"].Value).SingleOrDefault();

      return BaseUrl + vacancyHref;
    }

    private static string GetVacancyHref(HtmlAgilityPack.HtmlNode div)
    {
      var vacancyHref = div.Descendants("a").Where(
        d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("vacancyDescription"))
        .Select(d => d.Attributes["href"].Value).SingleOrDefault();

      return vacancyHref;
    }

    protected override string CreateNextUrl(int nextPage)
    {
      return SearchBaseUrl + "&pg=" + nextPage;
    }

    protected override string GetVacancyBody(HtmlAgilityPack.HtmlDocument vacancyPage)
    {
      if (vacancyPage == null)
      {
        //TODO: log event here and skip this page
        return null;
      }

      var description = vacancyPage.DocumentNode.Descendants("div")
        .Where(
          d => d.Attributes.Contains("id") && d.Attributes["id"].Value.Contains("ctl00_centerZone_vcVwPopup_pnlBody"))
        .Select(d => d.InnerHtml).SingleOrDefault();

      return description;
    }

    protected override string GetPosition(HtmlAgilityPack.HtmlNode div)
    {
      // note the grouping: the "class" attribute must exist before its value is checked
      return div.Descendants("a").Where(
        d => d.Attributes.Contains("class") &&
        (d.Attributes["class"].Value.Contains("vacancyName") || d.Attributes["class"].Value.Contains("jqKeywordHighlight"))
        ).Select(d => d.InnerText).First();
    }

    protected override string GetCompany(HtmlAgilityPack.HtmlNode div)
    {
      return div.Descendants("div").Where(
        d => d.Attributes.Contains("class") &&
        d.Attributes["class"].Value.Contains("companyName")).Select(d => d.FirstChild.InnerText).First();
    }
  }
}

To make the picture complete, just review the implementation of the rest of the crawlers: http://github.com/alexbeletsky/tdd.demand/tree/master/src/Crawler/Core/Crawlers/

Conclusions

You can see that the implementation of a simple crawler is a simple thing as soon as you've got good tools for it. Of course, its functionality is very specific and limited, but I hope it gives you ideas for your own crawlers.

In the next blog posts I'll cover the usage of Regex in .NET and the brand-new Entity Framework 4 Code First approach to working with databases.

Update - Is TDD skill actually required by employers? - with data from StackOverflow

TDD

This is a follow-up to my last blog post, in which I showed some data gathered by the crawler to check how much TDD skill is valued by development shops and how much they ask for it in job offers. As you remember, I wasn't satisfied with the quality of the data provided by prgjobs.com.

Today I was reading a blog post from Coding Horror and realized that I had missed one good source of information: StackOverflow Careers.

Thanks to the latest architectural changes I've made to the Crawler and the good structure of Careers, it took about an hour to create a new crawler and test it. Now I'm ready to share the data.

Careers.StackOverflow results

Here we go: 212 vacancies have been extracted from this site. 49 of them were requesting TDD (23%, not so bad).

Technologies breakdown:

Conclusions

For sure, this is more accurate data. We can see that the results are really close to the ones we got for the Ukrainian market from the analysis of the rabota.ua site. It also makes it possible to generalize the results somewhat.

We could say that ~20% of employers demand TDD skill. The rest of the employers either do not mention it in job offers or do not care about such a skill at all.

Ciprian Mustiata made a nice point in the comments on the previous post: such demand for TDD could be reasonable for countries like Ukraine, where the major market is maintenance of existing code bases (typically legacy code, no tests). But we see similar figures for the USA, a country where a lot of brand new products are born.

That’s another piece of information to think about. What’s your opinion on that?

Is TDD skill actually required by employers?

TDD

Is TDD popular among developers? Do managers know about the benefits of TDD? Are employers really looking for TDD-skilled people?

I was thinking about these kinds of questions and decided to perform some initial research. My research was quite simple: I wanted to check popular job search sites and review the latest job offers, especially their "skills" sections. How many employers actually seek developers who know/use/love TDD? Since I'm a geek, I would not do it manually, so I've written an application for that: a crawler that gets data from job search sites and stores it in a DB for further analysis. I've already got the data and would like to share it in this post.

How does it work?

Like any other crawler, it has one big cycle that makes web requests to a site, gets the response, extracts the links and data from the response, and stores the data somewhere. It proceeds as long as relevant data is present on the pages. The vacancy crawler does a search request and extracts links to all vacancy pages. As soon as a link is extracted, it makes a request to the vacancy page. It analyzes the body of the vacancy description by a very simple method: searching for keywords in the text. So, to detect whether TDD skill is required or not, the crawler tries to match some words from a vocabulary. A similar approach is used to understand which technology skills (.NET, Java, C++) are required. Finally, it creates a record containing the site name, position, technology used and a TDD demand flag, and stores the data in the database.

Sources of information

I've taken two sites as sources of information. First, RabotaUa (a Ukrainian one; Ukraine is one of the big players in IT outsourcing in Europe, so the data is really relevant). The second one I wanted to pick from the USA, but it was difficult to find, since I'm not aware of the sites' popularity, reputation and so on. I even asked a question on StackOverflow, but my question was closed. I chose JobsForProgrammers as one that Google suggested.

RabotaUa results

I've extracted 978 records from RabotaUa. These are the latest, recently posted job offers. 150 of the 978 vacancies contained requirements for TDD skills.

How many vacancies require TDD, per technology?

PrgJobs results

I've extracted 1000 records from JobsForProgrammers. The crawler could have processed more, but I noticed that on the later pages the site contains not really relevant data: non-developer jobs and offers with short and not always adequate descriptions. So I'm still considering crawling some other USA site for data. Anyway, here are the results. Only 69 of 1000 require TDD: 7%!

The technology breakdown is also a bit strange. Matching the technology was difficult for several reasons: the job headline usually contained a too-generic description (like Software Developer, Web Developer and so on), and the job description body contained multi-skill requirements (like C++/Perl, C#/Java or VB.NET/Java) that the current version of the crawler could not handle properly.

Conclusions

To be honest, I did not expect such data. I thought 40-60% would ask for TDD, but we see that it is less than 16% for the Ukrainian data source and less than 7% for the USA. For me, as a TDD follower, these are really disappointing results. I realize that these results are very simple and rough and could not be used for real-life analytics, but they give a vision, for sure.

I also plan to write several technical blog posts with details of the implementation of the Crawler I've created for this report.

Please let me know what you think about these results, what further improvements could be done for finer results, and what other data sources could be used.

Update: a follow-up post with data from StackOverflow Careers has been published.

Subtext: Open source blogging engine project

In one of my previous posts I mentioned Subtext as an open source project that I have been keeping an eye on recently. It was originally created by Phil Haack, one of the authors of the ASP.NET MVC framework. I had thought about contributing to some open source project for a long time, so the first time I saw Subtext I realized it could be a good one to try.

The project is hosted on Google Code, using SVN as the source control system, so it is easy to get read access to the Subtext repository. Currently Subtext is actively developed by Simone and managed by Phil. It is still supported by the community, so everyone is able to submit a patch.

What I liked about Subtext itself:

  • Easy to use. Clear installation procedure, clear interfaces. Easy because of simplicity.
  • Proven by time. Subtext is already at release 2.6. Quite mature, developed for about 5 years.
  • Used by the community. Many bloggers host their blogs on Subtext.

But of course I was attracted mostly by its code. I really like how the Subtext solution is done, and I try to extract some good practices and approaches into my personal knowledge base. The code quality is high; it is clearly seen what architectural approaches the author used and how it breaks down into components and layers. I was happy to see a lot of unit tests.

I'm hacking on Subtext now. I try to understand how it works, what technologies are used, what issues exist. I like how it goes, because I feel the "follow the master" concept while working on Subtext. Since this is one of my first experiences of working on open source projects, I would like to describe what my contribution is:

  • Finding new bugs. Yeap, I do a little tester job here: I click through the application and submit new issues to the tracker.
  • Little fixes. When I find some problem and it is quite clear, I submit a patch for it.
  • Verification of fixes. I try to look through the latest fixes and verify them.
  • Feature request proposals. When I see something lacking, it is possible to file a feature request. Sure, as soon as it is accepted, you are free to submit a patch with the implementation.

I like how it goes. I've already submitted several patches, and I hope it is not the end. Unfortunately, I cannot spend as much time on it as I want.. but still, I hope it will be a good experience, both for me and for Subtext.

JavaScript, HTML and CSS.. I need it!

I've never worked close to the UI during my career, neither as a Win32 developer nor as a Web developer. I was focused on business logic and architecture rather than user interfaces; moreover, developing UI was not among the things I liked to do.

At my previous job I met JavaScript for the first time. It wasn't in the context of a web application, but rather custom logic for desktop software. It was very unusual for me, since I had little experience with dynamic languages. But as I got to know JavaScript better, I liked it more and more. I find dynamic languages more flexible for describing objects. With no static checks you cannot rely on messages from the compiler; if you have a mistake in your code, you only catch it at runtime, so it forces you to write unit tests. Unit testing in dynamic languages is simpler and easier: you can do any type of mocking, construct test objects at runtime, and easily emulate user interface events.

Even if you are a good C# developer and understand ASP.NET/MVC, RDBMS and enterprise application design, your value is lower if you lack HTML/CSS when you do web development. I never treated HTML/CSS as an important skill, thinking that if I needed it I could learn it in one day. I was wrong; it is not as easy as I thought. Of course, they are declarative, with no logic, conditions or runtime, but they are languages and they require respect. Sure, it is not as difficult to start with HTML/CSS as with C#/ASP.NET, but as always, experience matters.

Some years ago I started to play guitar a bit. Tuning the guitar was the greatest problem I had before I was given a tuner device. With a little bit of experience I was able to tune a slightly untuned guitar: if some string was close to tune, I could adjust it. But if it was totally untuned, I just could not catch it. I'm having the same issue with HTML/CSS and JavaScript now. I can do fixes, create something simple, add something to an existing application. So, I can adjust. Problems begin when I try to do something from scratch. I spend too much time looking for examples and tutorials to do really simple things (especially with CSS). I cannot tune.

I want to improve my knowledge in this area; now I understand that I really need it.

So I want to ask you today: what resources, books and online tutorials do you use and recommend for the HTML/CSS/JavaScript area? What was your experience of learning it?