Friday 30 May 2014

Google and the problem of personal information

Before I retired I was particularly interested in the problems of data protection and online communications, an interest that began long before the World Wide Web came into existence. In the 1980s I wrote a paper for a scientific conference about the problems of reconciling the Data Protection Act with the online task of organizing an international conference involving committee members scattered around the world - and I was delighted when it was republished by a legal journal.

The recent Court ruling that Google (and other search engines) should suppress links to "personal information" looks totally unworkable because in free text documents, such as old newspaper reports, it is impossible to draw a clear distinction between which links to suppress and which links, to the same news story, not to suppress. For my other blog/web site I frequently research local and family history (admittedly too old to be affected by this ruling) in old newspapers and on Google, and I know a lot of tricks for digging out hard-to-find information. With this in mind I posted the following on the British Computer Society web site as a comment on the news about the proposed "censorship".
As someone who retired from IT many years ago and is now active in historical research, I find the idea of manipulating historical records, and the tools for accessing them, very worrying and I can't see how the approach the court has proposed could possibly work.
I could think of hundreds of different scenarios - but I will mention just one. A footballer does not want anyone to know he might have been involved in match fixing, and the record shows that he played in a game which was fixed. There are a number of levels at which he might appear in the record - the protesting player was (1) convicted of fixing, (2) charged and appeared in court but was found not guilty, (3) named in the evidence given in court but not charged, (4) named as a player in a game known to be fixed, or (5) not named in the report of fixing, but the game is identified as being fixed - and other reports name him as a member of the team.
A search engine like Google will provide many pathways to find details of the match, the fact that it was fixed, and the players. Let us assume that a request was made to remove all links which might identify an individual because the facts about him were "time expired" personal information. As a historian (and I am sure investigative journalists are at least as good as me) I know lots of tricks for searching free text documents. But how could Google realistically hide this information without preventing access to other information relating to the game and the other players? For instance, should all the members of the team be indexed apart from the complaining member? Should Google Images suppress a picture of the whole team "involved in the fixing scandal", or should they have to block out the clearly recognisable face of the complainant?
If I were specifically researching the fixing of football matches and I came across a case where some of the players' names were not findable using Google, I would immediately be alerted to the fact that the unindexed players had something to hide ... which could encourage me to look specifically for what the complainant was trying to hide.
Information held as free text, such as old newspaper reports, brings lots of problems - for instance Google does not distinguish whether "Tottenham" is the name of a person, a football team, or a place - and most texts will refer to many different people, events, etc. (some of which must continue to be indexed) and not just the subset of news complained of.

Wednesday 21 May 2014

Two Limericks for the price of One!

Bow
I have somewhat neglected this blog recently - in part because of some problems with my asthma - but I have still had fun writing limericks for an online competition. My latest entry (slightly modified here to remove an explicit reference to the competition organizer) is given below.

It takes the form of two different limericks which - in writing - are identical apart from the last two syllables. To read each one you need to look at the end of the final line to get the right pronunciation for the ends of lines 1 through 4, noting that the limerick only makes sense if lines 3 and 4 do not rhyme with lines 1, 2 and 5.

The example's first line ends in "bow"
And the poet had rhymed it with "row"
'Cause if he rhymed "row"
Or misused the word "sow"
The judge would reject it and how.

The example's first line ends in "bow"
And the poet had rhymed it with "row"
'Cause if he rhymed "row"
Or misused the word "sow"
The judge would reject it also.