Friday, January 25, 2013

A simple way to clean data in Excel VBA

In data analysis, data cleaning is the act of detecting and either removing or correcting inaccurate records in a record set. When the data is fetched from a relational database management system (RDBMS), this means incorrect or inaccurate records in a table. For instance, in Excel 2007 and later you can fetch data from a DBMS such as SQL Server using the Get External Data group. The next step is removing or correcting the inaccurate records. A typical way to do this is to scan the Excel data sheet from top to bottom and from left to right, processing only the columns that store data validity information. Hence, you need a VBA function such as the following:

clean("FromSheet", "ToSheet", CellCondition, Condition, True)

i.e. a VBA function with a signature like

Function clean(FromSheet As String, ToSheet As String, CellCondition As Variant, Condition As Variant, Caption As Boolean) As Long   


A typical safe way to implement such an action is to copy the clean data to a new sheet. This works especially well when you refresh the data sheet periodically from an external data source. Here's the VBA code.


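    Function clean(FromSheet As String, ToSheet As String, CellCondition As Variant, Condition As Variant, Caption As Boolean) As Long

        Dim wsI As Worksheet, wsO As Worksheet
        Dim LastRow As Long, i As Long, j As Long, N As Long
        Dim ok As Boolean

        Set wsI = Sheets(FromSheet)
        Set wsO = Sheets(ToSheet)

        ' Last used row in column A of the input sheet
        LastRow = wsI.Range("A" & wsI.Rows.Count).End(xlUp).Row

        j = 1   ' next row to write in the output sheet
        With wsI
            For i = 1 To LastRow

                ' Keep the row if at least one checked column holds its expected value
                ok = False
                For N = LBound(CellCondition) To UBound(CellCondition)
                    If Trim(.Range(CellCondition(N) & i).Value) = Condition(N) Then
                        ok = True
                    End If
                Next N

                ' Always keep row 1 when it is a caption (header) row
                If Caption And i = 1 Then
                    ok = True
                End If

                If ok Then
                    .Rows(i).Copy wsO.Rows(j)
                    j = j + 1
                End If

            Next i
        End With

        ' Assumption: the Long result is the number of rows copied
        clean = j - 1

    End Function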
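
As a usage sketch, the function could be called as follows. The sheet names "RawData" and "CleanData", the column letters "C" and "D", and the expected values "OK" and "VALID" are all hypothetical, so adapt them to your workbook. The post does not say what the Long result means, so the sketch only prints it.

```vba
' Usage sketch. Sheet names, column letters and expected values are
' hypothetical placeholders, not taken from a real workbook.
Sub CleanExample()
    Dim cols As Variant, expected As Variant
    Dim result As Long

    ' Column letters holding validity flags, and the value each one is compared with
    cols = Array("C", "D")
    expected = Array("OK", "VALID")

    ' Start from an empty output sheet so rows from an earlier run do not linger
    Sheets("CleanData").Cells.Clear

    ' True = row 1 is a caption (header) row and is always copied
    result = clean("RawData", "CleanData", cols, expected, True)

    ' The meaning of the return value is not stated in the post; print it for inspection
    Debug.Print "clean returned: " & result
End Sub
```

Note that a row is kept as soon as any one of the listed columns matches its expected value. Clearing the output sheet first matters when the source sheet is refreshed periodically, because a shorter result would otherwise leave stale rows at the bottom.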



Monday, January 21, 2013

Trends: Italy / USA / UK / World





Data from World Bank




Friday, January 18, 2013

Why yet another blog about computing, and after so many years?

It was April 2005 when I started this blog. I was a Java-Oracle developer at the beginning of my career as a team leader at virgilio.it, at that time the #1 Italian web portal. I was amazed by what was then an upcoming revolution, the Web 2.0. I read and reread the article by Tim O'Reilly about Web 2.0. Starting a blog, or at least trying to, seemed a mandatory step. The next mandatory step was becoming the founder of an open source project. So Pippoproxy was born: a 100 percent pure Java HTTP proxy designed and implemented for Tomcat that can be used instead of standard Apache-Tomcat solutions.

That was before my MBA and my foray into the private equity arena where, I must confess, I lost some of my touch for technology and my attraction to SEXY TECHNOLOGY. I started to find discounted cash flow Excel models sexy, along with amazing PowerPoint presentations aimed at convincing investors to put money into some fund or listed company. And the more time passed, the more I was convinced that there was nothing new under the sun. Java, PHP, Apache projects ... the same stuff again and again...

Now I know I was wrong. At exactly that time Hadoop was born, as well as other innovative open source projects. A new revolution, nowadays known by the buzzword Big Data, was born. Now I feel as excited as I did back then. The same excitement as when I discovered a hack ... the same excitement as when I was a child and wrote a program to predict football matches on my mythical Commodore VIC-20. Just for fun!

In the next posts I'm going to analyze tools, open source projects, algorithms, statistical methods and products, and I'll give each of them a score from 1 to 5. No strict methodology, no committees, just personal judgment. Just for fun!

Thursday, January 17, 2013

If Facebook's new Graph Search is your personal "Big Data", why were Facebook's shares flat at $30.10 in early trading on Wednesday?


Last Tuesday Facebook announced a new way to "navigate connections and make them more useful": Graph Search (beta version).

Graph Search will allow users to ask real time questions to find friends and information within the Facebook universe. Searches like “find friends of friends who live in New York and went to Stanford” would come back with anyone who fit the bill, provided that information had been cleared to share by the users.

Graph Search will appear as a bigger search bar at the top of each page. When you search for something, that search not only determines the set of results you get, but also serves as a title for the page. You can edit the title - and in doing so create your own custom view of the content you and your friends have shared on Facebook.

The first version of Graph Search focuses on four main areas -- people, photos, places, and interests.
  • People: "friends who live in my city," "people from my hometown who like hiking," "friends of friends who have been to Yosemite National Park," "software engineers who live in San Francisco and like skiing," "people who like things I like," "people who like tennis and live nearby"
  • Photos: "photos I like," "photos of my family," "photos of my friends before 1999," "photos of my friends taken in New York," "photos of the Eiffel Tower"
  • Places: "restaurants in San Francisco," "cities visited by my family," "Indian restaurants liked by my friends from India," "tourist attractions in Italy visited by my friends," "restaurants in New York liked by chefs," "countries my friends have visited"
  • Interests: "music my friends like," "movies liked by people who like movies I like," "languages my friends speak," "strategy games played by friends of my friends," "movies liked by people who are film directors," "books read by CEOs"
Forbes talks about your Personal "Big Data".

Differences with web search

Graph Search and web search are very different. Web search is designed to take a set of keywords (for example: "hip hop") and provide the best possible results that match those keywords. With Graph Search you combine phrases (for example: "my friends in New York who like Jay-Z") to get that set of people, places, photos or other content that's been shared on Facebook. We believe they have very different uses.

Another big difference from web search is that every piece of content on Facebook has its own audience, and most content isn't public. We've built Graph Search from the start with privacy in mind, and it respects the privacy and audience of each piece of content on Facebook. It makes finding new things much easier, but you can only see what you could already view elsewhere on Facebook.

Lack of a timeline for the possible launch of Graph Search on mobile devices + lack of depth in review content = NO GOOGLE KILLER?


BofA Merrill Lynch analysts estimated Facebook could add $500 million in annual revenue if it can generate just one paid click per user per year, and raised its price target on the stock by $4 to $35.

Facebook's shares were flat at $30.10 in early trading on Wednesday. They have jumped about 50 percent since November to Tuesday's close after months of weakness following its bungled Nasdaq listing in May.

However, analysts at J.P. Morgan Securities said the lack of a timeline for the possible launch of graph search on mobile devices may weigh on the tool's prospects.

The success of the graph search, which will rely heavily on local information, depends on Facebook launching a mobile product, the analysts said. Half of all searches on mobile devices seek local information, according to Google.

Graph search also lacks the depth of review content of Yelp Inc, the analysts added.

Pivotal Research Group analyst Brian Wieser said monetization potential would be largely determined by Facebook's ability to generate a significant portion of search query share volumes and he expects that quantity to be relatively low.

"Consumers are likely to continue prioritizing other sources, i.e. Google. Advertisers would consequently only use search if they can, or are perceived to, satisfy their goals efficiently with Facebook," Wieser said.

NO GOOGLE KILLER

Analysts mostly agreed that Facebook's search tool was unlikely to challenge Google's dominance in web search at least in the near term.