Wednesday, December 2, 2009
Declaration Logo
My new logo for a declaration in the GCC compiler introspector. The megaphone symbolizes the meaning of a declaration: it is a way to announce the availability of something in a program. It is the most basic form of data that the compiler deals with. Even if you write 1+1, you are using a statement declaration that contains an expression with constants and an operator. All these parts of a statement from the user are stored in declarations. Then, from the decls, a program has functions and datatypes defined for the layout of memory. These formats of data and behaviour are just templates or prototypes of things to be executed. Memes, you could say: things that are copied from program instance to instance.
Tuesday, December 1, 2009
Catalyst JQuery JSTree RDF
In my latest revision, on GitHub,
http://github.com/h4ck3rm1k3/LIbRdfCatalyst
you will find the new version of my browser for the gcc data in rdf format.
For example, if you load the uri of a node in tree view :
http://localhost:3000/statements/tree/55927761647701690?#
You will get the statements with that database ID, which Redland generated for a URI.
The trick is that only one level of data is loaded, and the other objects are loaded on demand by javascript via json :
Each node returned has an ID that is passed as a parameter to the JSON routine.
http://localhost:3000/jquery/json
That routine delivers one part of the tree to the jstree.
For example :
({ attributes: { "id" : "" },
   data: "node_title / ",
   state: "open",
   children: [
     /* an array of child node objects */
     ({ attributes: { id : "11955769873204335371" },
        data: { title: "node http://introspector.sf.net/2003/08/16/introspector.owl#tree_list / 11955769873204335371" },
        async : true,
        // Properties below are only used for NON-leaf nodes
        state: "open", // or "closed"
        opts : { method : "GET", url : "http://localhost:3000/jquery/json/11955769873204335371" } }),
     ({ attributes: { id : "9766105400819017523" },
        data: { title: "node http://introspector.sf.net/2003/08/16/introspector.owl#id-2215 / 9766105400819017523" },
        async : true,
        // Properties below are only used for NON-leaf nodes
        state: "open", // or "closed"
        opts : { method : "GET", url : "http://localhost:3000/jquery/json/9766105400819017523" } }),
     ({ attributes: { id : "2453797205026508228" },
        data: { title: "node http://introspector.sf.net/2003/08/16/introspector.owl#id-4321 / 2453797205026508228" },
        async : true,
        // Properties below are only used for NON-leaf nodes
        state: "open", // or "closed"
        opts : { method : "GET", url : "http://localhost:3000/jquery/json/2453797205026508228" } })
   ] })
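To make the one-level-at-a-time idea concrete, here is a minimal sketch in Python of how such a response could be assembled. It is not the code from the repository (that is a Catalyst application); the function names and the BASE_URL constant are assumptions made up for illustration, and only the field names are taken from the example above.

```python
import json

# Hypothetical base URL, mirroring the example responses above.
BASE_URL = "http://localhost:3000/jquery/json"


def child_node(node_id, label, state="open"):
    """Build one lazily loaded child entry in the shape shown above."""
    return {
        "attributes": {"id": node_id},
        "data": {"title": f"node {label} / {node_id}"},
        "async": True,
        # The example response uses "open"; this sketch simply mirrors it.
        "state": state,
        # The tree widget fetches the next level from this URL on demand.
        "opts": {"method": "GET", "url": f"{BASE_URL}/{node_id}"},
    }


def one_level(parent_id, title, children):
    """Return a single tree level: the parent plus its direct children only."""
    return {
        "attributes": {"id": parent_id},
        "data": f"{title} / {parent_id}",
        "state": "open",
        "children": [child_node(cid, label) for cid, label in children],
    }


if __name__ == "__main__":
    level = one_level(
        "",
        "node_title",
        [("11955769873204335371",
          "http://introspector.sf.net/2003/08/16/introspector.owl#tree_list")],
    )
    print(json.dumps(level, indent=2))
```

The point of the design is that each child carries only its own ID and the URL to ask for its children, so the server never has to serialize more than one level per request.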
Now the nice thing about this system is that we can construct a webpage from RDF fragments. They are loaded from the database.
I will be adding in the predicate (field) information next. Right now only the object data is shown.
This is the start of a nice GUI for the GCC introspector.
Saturday, November 14, 2009
GCC Rap
My fast AST is a blast from the past.
to collect the fruits from the tree of knowledge, is the mission.
The caste of those who are in the AST are vast.
From the oldest clay tablets to the newest bazaar branches, they are one and the same from our position.
Once amassed, the AST will last and serve as a mast,
upon which to hoist the flag of freedom.
The flag is raised on montag zig-zag when we enter into the conflagration of the compilation.
We are the warriors of geekdom!
Semantics of the steps are expressed in the tree,
oh say can you see, the GPL from America will set you free? Yes Way!
The Copyright is not our only salvation, thanks to microsoft, we now got the D M C A!
St. Aho, St. Sethi, and the holy Jeffrey D. Ullman will help us slay the dragon of complexity.
So we say, god bless Murray Hill and Summit New Jersey! Good old A T and T!
The compilation is a process that reduces the complexity of computation via concentration on the pedantic condemnation of error nodes in the code.
Just watch out or we will add your new fangled patterns to our database of lex!
The process of trial and error is replaced by a procedure of pure terror.
The gcc revolucionario is the bancario of the dictatario.
All you can do is get Vexed when we start to Flex.
The from terror petrified errors are removed in the SCENARIO of ontario!
The algorithm gives Rhythm to my Rhyme.
And kernighan and ritchie give us syntax on time!
Just don't let my Algorithm make you Argo!
The rule and the meter, are known only to st. peter.
Now peterpaul from senegaul likes to call the method via the this pointer.
to LOAD the reference of the methode from the register is the technique that is not unique.
We know from Von Neumann they are all just jumps over the bumps.
Abdul and Boole both knew the tools of the trade.
But the decomposition of the mathematician can be offset by the juxtaposition of the tree so that they don't fade.
The code is stored in the block! If you load the code via the node, you will get the whole flip mode squad.
But not at once, that would be too concrete, the meaning has to take the backseat.
When we compile doom, then you end up with a whole wad!
The meaning is just leaning on the screening, preening and even weaning of the encoding.
But in reality we are just convening on the meaning, and machining and sometimes throwing exceptions!
The monkey of the AST climbs and traverses the trees, jumping from branch to branch.
But the exceptions might make him fall as if stung by african killer bees!
The forest is sorerest where there are no fruit!
That comprises the content of the transmission and even the nutrition of the mathematician by definition!
Otherwise, the point is moot!
The blanch branch on the ranch is just one in the batch, and can be only found by the root.
The APPLICATION of the ration for the haitian is the damnation of his station!
So he has gotta get in there and just grab the loot!
The competition for the nutrition creates a partition, solved only by the patrician under admission and condition of contrition!
The scheduler or the arranger is a stranger! The ranger and changer of code is in perpetual danger of omission or malnutrition, and cannot rely on just intuition!
interpretative propagation of the representation through the programs organization serves the transformation and distributed interrogation, invalidation or even intimidation of the sign!
Therefore the basic block must be kept locked in stock! This forms a blockade which cannot be betrayed, yet arrayed and replayed in the arcade!
When your code dont compile, it should make you start to whine!
Thus the compiler should not be afraid! It is still going to get made.
The canada node-billionaire is quite good! Yet his trees are also just made of wood!
CodeSourcery can forcefully typecast the beast, yet that is not the least of the function, that is why Mark is so well paid!
Now, Node coloring does not get dullering! Just Don't start stullering or tullaring!
When this compiler gets released, It will cause a worldwide rambunction!
The Computations in constant time also deserve a rhyme!
Just remember all the complications and the combinations of constant folding transformations that are soo leet!
Don't replace a variable with a constant, unless you are sure of the relations.
Allocation is the low level causation, and the free function is the cessation of creation of crustacean quotations!
If you need to know more about that, then we need to ask Buddha!
He warned us always: Don't make too many choppy copies on floppies of the gloppy, groppy, sloppy jalopy!
The Central processing unit is not that central no more, so don't forget to lock the Mutex!
We got our Obstack ready to get poppy!
The definition of the functor is based on the strangers typed submission in signs!
We know how to deal with them, in case they get Stroppy!
The chore of the parameter is to partition the rhymes!
The direct threaded process is not multi headed!
The bore of the diameter, serves to push you backwards in time.
When you step to GCC hackers, you are going to get shredded!
We decline to redefine or even redesign the enshrined Stallman rhyme!
Because he is so leet, he will make you start dancing when the code hits your feet.
Watch out for the end users and other abusers! They will fill your cache with losers who just want new cruisers!
Keep the tree of knowledge pure and free of crufty codes, prune them off and kick them from the abode.
God is wise and allknowing, but he is almost never for the mundane showing!
So keep your benchmarks tight and your code size in cache, so that the computer don't thrash!
Follow these rules and you will start a knowing!
We will monitor the quality, and make sure the test cases run sweet!
Cause our deja gnu just keeps on blowing,
hot air through the pipes of the system,
and that gives us freedom to pursue the mission.
The object that is so sublime,
World wide respected and never neglected,
And it does not even take up all our time.
We can fork and clone the codespace upon demand, and each step moves us further from spam!
Cautionary branches are sometimes needed, and as long as the warnings are all heeded, the overall codebase is always clean!
Lean and mean is the state of the system, with no frills, bells or whistles to distract from the target.
But don't think that we don't know what we are doing,
we gotta keep this code stream moving!
foo boo bar and baz is the same to us,
food for fools and goo for the animals in the zoo! All strings are hidden away neatly in the string_cst node,
and that is what makes us robust!
The hashing function is the only drug that gets us high, so break us off a chunk of that table and pass the token around!
Around and around the function call passing, and when we are done, we will have your mplayer blastin'.
Linux kernel or hello world, there is no difference, apache and perl, porn server or curl, we got them all in the mix.
The GCC has a big bag o tricks!
swap it like it's hot,
When the page's on the disk bro
swap it like it's hot,
swap it like it's hot,
swap it like it's hot,
When the pigs try to get at yea
encrypt it like it's hot,
encrypt it like it's hot,
encrypt it like it's hot,
And if a hacker get an attitude
Pop it like it's hot,
Pop it like it's hot,
Pop it like it's hot,
I got the gcc on my usb and I'm pouring Chandon
And I slice the best hashtables cause I got it going on
The Compilation Will Not Be Youtubised
You will not be able to stay home, hacker.
You will not be able to play Halo, Doom, Spore, or Wii Tennis.
You will not be able to pass the dutch, light the J or get Stupid.
Skip out to wget porn during NOPS,
Because the compilation will not be youtubised.
The compilation will not be youtubised.
The compilation will not be brought to you by Apple
In a podcast without promoted videos and Banner Ads.
The compilation will not show you pictures of Bill Gosper
blowing a bugle and leading a charge by Russ Noftsker,
Tom Kiely, Robert P. Adams and Andrew Egendorf to program MACLISP
on a Lisp Machine.
The compilation will not be youtubised.
The compilation will not be brought to you by Google
and will not star Super Mario, Soulja Boy, or HotForWords.
The compilation will not give your website sex appeal.
The compilation will not make your conversion rate go up.
The compilation will not make your web site load faster,
because the compilation will not be youtubised, Hacker.
There will be no pictures of you and Greg Benson
pranking people ordering pizza,
or trying to slide that new iphone into your designer jeans.
AOL.com will not be able to predict the winner via podcast
or report from all districts.
The compilation will not be youtubised.
There will be no pictures of MSFT shooting down
hackers in the Suggested Videos Section.
There will be no pictures of MSFT shooting down
hackers in the Suggested Videos Section.
There will be no pictures of Stallman being
run out of MIT by the Symbolics.
There will be no slow motion or still life of Eben Moglen
strolling through Washington in a Red, White and
Blue liberation jumpsuit that he had been saving
For just the proper occasion.
sxephil, hotforwords, and sponge bob
will no longer be so damned relevant, and
women will not care if Mr Big finally gets down with
Carrie on Sex and the City because Hackers
will be in the Emacs looking for a brighter day.
The compilation will not be youtubised.
There will be no highlights on the Promoted Videos
and no pictures of Linus Torvalds and Eric Raymond project managing.
The theme song will not be written by Steven Jobs,
Bill Gates, nor sung by Stephen Fry, Jerry Seinfeld, Britney Spears, Beyonce, or Akon.
The compilation will not be youtubised.
The compilation will not be a banner Ad
about an iPod, iBook, or iMac.
You will not have to worry about an Intel in your
chip, a Vista in your OS, or the NVidia in your GPU.
The compilation will not go better with Blackberry.
The compilation will not fight the virus that may cause data loss.
The compilation will put you in the driver's seat.
The compilation will not be youtubised, will not be youtubised,
will not be youtubised, will not be youtubised.
The compilation will be no re-run hackers;
The compilation will be live.
Monday, November 9, 2009
Definition of Interoperability
"Interoperability is the ability of disparate and diverse organisations
to interact towards mutually beneficial and agreed common goals,
involving the sharing of information and knowledge between the organizations
via the business processes they support,
by means of the exchange of data between their respective information and
communication technology (ICT) systems."
In fact, interoperability is often confused with other, related concepts.
It can be therefore a useful exercise to observe explicitly what interoperability is NOT:
1.1 Interoperability is not Integration, which is a means of changing loosely coupled systems to
make them into more tightly coupled systems.
1.2 Interoperability is not Compatibility, which is more about
the interchangeability of tools in a particular context.
1.3 Interoperability is not Adaptability, which is a means of changing a tool,
adding additional capabilities as needed even on an ad-hoc basis,
whereas interoperability refers to inherent capabilities.
It is also worth noting that interoperability is neither ad-hoc,
nor unilateral (nor even bilateral) in nature.
Rather, it is best understood as a shared value of a community.
The final point to be made about interoperability from the definition standpoint,
is that it is also a quality that could be broken down into a series of
quantifiable characteristics (metrics) which could be assessed (measured) separately,
as the need arises.
3.3.1.2 Definition of PEGS (Pan-European eGovernment Services)
The following is a good working definition of PEGS:
"Cross-border public sector services supplied by either national public
administrations or EU public administrations provided to one another
and to European businesses and citizens, in order to implement
community legislation, by means of interoperable networks
between public administrations."
3.3.1.3 Definition of Interoperability Framework
An Interoperability Framework
describes the way in which organisations have agreed, or should agree,
to interact with each other, and how standards should be used.
In other words, it provides policies and guidelines that form the basis
for selection of standards. It may be contextualised (i.e., adapted)
according to the socio-economic, political, cultural, linguistic,
historical and geographical situation of its scope of applicability in
a specific circumstance/situation (a constituency, a country, a set of countries, etc).
Link to video: http://ia341342.us.archive.org/3/items/EuropeanInteroperabilityFrameworkEIFandtheArchitectureGuidelinesAG2/_PAGE_11.avi
Link to mp3: http://www.archive.org/download/EuropeanInteroperabilityFrameworkEIFandtheArchitectureGuidelinesAG2/Doc_page.11.mp3
Friday, November 6, 2009
Optimization and Introspector
I have been thinking today about the GCC introspector, and about the fact that I should optimize the time I spend on a job, not so much the execution time.
The human time spent, the skills needed to do a job, and the learning effort involved: these are the issues.
The program that results from the work is another issue. It also has a time frame: its power usage, its time to execute, and the space that the running program uses.
Also there are issues like downtime, programs breaking and security.
So, here are my ideas :
First, the compiler should be able to read some source code and enrich or replace the data from the Doxygen XML. That means that, given an introspection data dump, we should be able to read in the Doxygen HTML and add in the missing data from the compiler. We would also be able to use the Doxygen output as input to the GCC introspector directly.
Second, there is the human issue of types. The types of a program do not matter to the computer. Of course, when those types are used wrongly, the computer will crash. But let's get back to the idea. Types are basically data formats, but also meanings. We would like to be able to find similar types, compare types, and visualize the types.
Third, we would like to see the types and how they are related.
We can imagine an RDF datastore of a program as a graph of all connections in the program. We will be able to see how a type is used and the code that flows from it.
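As a rough sketch of what "seeing how a type is used" could look like, the program graph can be held as plain subject/predicate/object triples and indexed by object. The node and predicate names below are invented for illustration and are not the introspector's actual vocabulary; a real setup would query the RDF store instead of a Python list.

```python
from collections import defaultdict

# Hypothetical triples; the predicate and node names are made up for illustration.
TRIPLES = [
    ("decl:main", "rdf:type", "function_decl"),
    ("decl:main", "returns", "type:int"),
    ("decl:count", "rdf:type", "var_decl"),
    ("decl:count", "has_type", "type:int"),
    ("decl:name", "rdf:type", "var_decl"),
    ("decl:name", "has_type", "type:char_ptr"),
]


def index_by_object(triples):
    """Map each object to the (subject, predicate) pairs that point at it."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[obj].append((subject, predicate))
    return index


def uses_of(type_node, triples):
    """List every node that refers to the given type, and by which predicate."""
    return sorted(index_by_object(triples)[type_node])


if __name__ == "__main__":
    for subject, predicate in uses_of("type:int", TRIPLES):
        print(subject, predicate)
```

Walking the graph backwards from a type node like this is the simplest form of the "how is this type used" question; comparing or visualizing types would build on the same index.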
What is important in this equation is the runtime path. We want to see what paths are followed in the program, that is, the code that is executed.
This turns into a big debugging exercise. The debugger will be able to show us that. The profiler as well. DTrace on Solaris. Print statements in the log file. All of those things are indications of what a program does.
But to truly understand a program we need to know the following :
1. The specification of the program that defines the inputs and outputs.
2. The test cases that cover the entire functionality of that program.
3. The audit that shows how the source code relates to the specification.
4. The audit that shows how the test executions exercise that code.
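The two audits in the list above can be pictured as two mappings that are joined on the code they mention. The following is only a sketch under that assumption: the specification items, function names, and test names are hypothetical, and in practice the second mapping would come from a coverage or tracing tool.

```python
# Hypothetical audit data: which functions implement each specification item,
# and which functions each test case was observed to execute.
SPEC_TO_CODE = {
    "SPEC-1 read input": {"read_input"},
    "SPEC-2 write output": {"write_output", "flush"},
}
TEST_TO_CODE = {
    "test_roundtrip": {"read_input", "write_output"},
}


def untested_code(spec_to_code, test_to_code):
    """Return, per specification item, the functions no test executed."""
    executed = set().union(*test_to_code.values()) if test_to_code else set()
    gaps = {}
    for item, functions in spec_to_code.items():
        missing = functions - executed
        if missing:
            gaps[item] = sorted(missing)
    return gaps


if __name__ == "__main__":
    print(untested_code(SPEC_TO_CODE, TEST_TO_CODE))
```

Any specification item that shows up in the result is a place where the test cases do not yet cover the functionality they are supposed to.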
Now we want to get down to the level of instructions being executed on a machine. Let's say we have a virtual machine, and we can add in all types of data and annotate each instruction.
So we would have a record for each byte of the input data to the test case. I am thinking of a simple system that reads from stdin and writes to stdout, so we can trace each input that is read. We would assume for the moment that all reads of data are from the test environment. So we would have, at a given time, a read of some byte of information from a file at a position.
Now the contents of the file are only interesting in that we would like to trace, for each byte, how it is processed. For this we would define the information needed to understand the type of the data as the entire set of instructions executed on it, and all the data that is needed for this.
We would have a block of instructions, or even a graph of the instructions, with cycles if there is a loop. We would see the instructions executed, the registers used (where that data comes from), and the memory used. Cache pages accessed and all that. This could be provided by the virtual machine.
Let's imagine that we are running a version of QEMU or some similar tool with full debug information.
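A toy version of this per-byte tracing can be sketched without any real virtual machine: every value carries the input offsets it came from and the list of operations applied to it. This is only an illustration of the idea, not something QEMU provides in this form; the Traced class and the read_byte and add helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Traced:
    """A value plus the input offsets and instructions that produced it."""
    value: int
    sources: frozenset = frozenset()
    history: tuple = ()


def read_byte(data, offset):
    """Model a read of one byte from the test input at a given position."""
    return Traced(data[offset], frozenset({offset}), (f"read@{offset}",))


def add(a, b):
    """Model an ADD instruction; the result inherits both operands' traces."""
    return Traced(
        (a.value + b.value) & 0xFF,
        a.sources | b.sources,
        a.history + b.history + ("add",),
    )


if __name__ == "__main__":
    stdin_bytes = b"\x01\x02\x03"
    out = add(read_byte(stdin_bytes, 0), read_byte(stdin_bytes, 2))
    print(out.value, sorted(out.sources), out.history)
```

With this kind of bookkeeping, any byte written to stdout can be traced back to the input positions it depends on and the instructions that touched it, which is the record described above, only at toy scale.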
Now let's continue. I would like to define metadata as overhead: administration data that should be minimised. Metadata is like a key to a lock, separated from the data for some reason, but they belong together. Let's say that the universe contains the metadata, and we need to collect it to understand a given problem.
Now the domain-specific problem is not metadata. Let's say I am working on the problem of creating a video for YouTube from an MP3 and a JPG. That is the domain data. All of the data from outside that is processed in the program belongs to the runtime data. We can imagine a stream of data from the input files flowing to the output files. Then we have domain-specific information about the codecs; that is also part of the domain, but removed a level. For the movie, the parameters of the codec could be considered metadata. In full, the entire source code and all the processing of the program is the metadata. For example, if you want to know why there is a glitch in the movie at a certain point, you may also need to know what was going on at that point in the program. It might be an environmental issue, like the power being shut off.
So, we want to be able to trace the entire data flow from inside to outside. For full understanding, we want to trace how the data gets into the program. For example, if an integer is being loaded into a register, where does it come from? Who is the person who added that to the source code, and in what revision? What was the change supposed to do? What was the specification of it?
Now we would like to model the input data. Let's say that in our example we have frames of data in a movie. We want to be able to replace a given frame of data in memory with a set of frames. We can abstract that data by removing it, reducing it. We would say it is the nth frame of data from the audio input.
That is domain specific, and we would have to model the domain-specific types to say such things. We therefore need a specification of the program, a bug report, or some type of input as to what its meaning is.
Now we can imagine the pipeline of the processing of a program. We have the flow of input to output using registers and instructions on the way.
We have the flow of values from code being loaded.
Next we would look at the optimizations of the compiler: changes to the compiler flow into the instructions of the code. Different compiler switches flow into the instructions and registers used.
The code of the compiler also flows from the specification of the chips.
Sometimes we do not even have a public document of the chip or the language, so we would take the changes to the compiler as the public documentation.
This is, however, the core reason why the compiler is so cryptic: if the specification of the machine is secret, and the specification of the language as well, then why should the compiler be easy to understand? There is a definite conflict between the forces at play here. It is market economics meeting FLOSS.
So, for each byte of the output file, we can then have an entire trace back to all of its sources: source code, input files, environmental changes, and all.
For this to be processed efficiently we will have to come up with some real optimizations, but it is my basic vision of what the introspector is.
Mike
That I should optimize my time spent on a job, not really the execution time.
The Human time spent, and the skills needed to do a job, the learning effort involved. These are the issues.
The program that results from the work is another issue. It also has a timeframe: the power usage, the time to execute, and the space that the running program uses.
Also there are issues like downtime, programs breaking and security.
So, here are my ideas :
One, the compiler should be able to read some source code, or enrich or replace the data from the doxygen xml. That means that given an introspection data dump, we should be able to read in the doxygen html and add in what is missing from the compiler. We would also be able to use the doxygen output directly as input to the gcc introspector.
Secondly, there is the human issue of types. The types of a program do not matter to the computer. Of course, when those types are used wrongly, the computer will crash. But let's get back to the idea. Types are basically data formats, but also meanings. We would like to be able to find similar types, compare types and visualize the types.
Third, we would like to see the types and how they are related.
We can imagine an RDF datastore of a program as a graph of all connections in the program. We will be able to see how a type is used and the code that follows from it.
What is important in this equation is the runtime path. We want to see what paths are followed in the program, that means the code that is executed.
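As a rough illustration of asking such a graph how a type is used, here is a small Python sketch that stores triples as plain tuples. The node names and predicates (`has_param_type`, `has_field_type`) are invented for the example and are not the vocabulary of the introspector ontology.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples describing a tiny program.
triples = [
    ("func:decode_frame", "rdf:type", "function_decl"),
    ("func:decode_frame", "has_param_type", "type:frame_t"),
    ("func:write_frame", "rdf:type", "function_decl"),
    ("func:write_frame", "has_param_type", "type:frame_t"),
    ("type:frame_t", "rdf:type", "record_type"),
    ("type:frame_t", "has_field_type", "type:uint8_t"),
]


def index_by_object(triples):
    """Build a reverse index: object -> list of (subject, predicate)."""
    by_object = defaultdict(list)
    for subject, predicate, obj in triples:
        by_object[obj].append((subject, predicate))
    return by_object


def users_of(type_name, triples):
    """Return every subject that points at the given type, sorted by name."""
    return sorted(subject for subject, _ in index_by_object(triples)[type_name])


print(users_of("type:frame_t", triples))
```

A real store such as redland would answer the same question with a query instead of a Python loop, but the shape of the data is the same.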
This turns into a big debugging exercise, the debugger will be able to show us that. The profiler as well. Dtrace on solaris. Print statements in the log file. All of those things are indications of what a program does.
But to truly understand a program we need to know the following :
1. The specification of the program that defines the inputs and outputs.
2. The test cases that cover the entire functionality of that program.
3. The audit that shows how the source code relates to the specification.
4. The audit that shows how the test executions exercise that code.
Now we want to get down to the level of instructions being executed on a machine. Let's say we have a virtual machine, and we can add in all types of data and annotate each instruction.
So we would have a record for each byte of the input data to the test case. I am thinking of a simple system that reads from stdin and writes to stdout, so that we can trace each input that is read. We would assume for the moment that all reads of data come from the test environment. At a given time, then, we would have a read of some byte of information from a file at a position.
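The "read of some byte from a file at a position" can be sketched as a thin wrapper around an input stream. This is only an illustration in Python; the class name `TracingReader` and the stream name `stdin` are assumptions for the example, and a real trace would come from the virtual machine rather than from the program itself.

```python
import io


class TracingReader:
    """Wrap a binary stream and record (name, offset, byte) for every byte read."""

    def __init__(self, stream, name):
        self.stream = stream
        self.name = name
        self.offset = 0
        self.log = []

    def read(self, n=1):
        data = self.stream.read(n)
        for b in data:
            # One log entry per byte: where it came from and what it was.
            self.log.append((self.name, self.offset, b))
            self.offset += 1
        return data


# Simulate a test case feeding three bytes on stdin.
reader = TracingReader(io.BytesIO(b"hi\n"), "stdin")
while reader.read(1):
    pass

for entry in reader.log:
    print(entry)
```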
Now the contents of the file are only interesting in that we would like to trace for each byte how it is processed. For this we would define the information to understand the type of data as the entire set of instructions executed on it, and all the data that is needed for this.
We would have a block of instructions, or even a DAG with loops of the instructions if there is a loop. We would see the instructions executed, the registers used (where that data comes from) and memory used. Cache pages accessed and all that. This could be provided by the virtual machine.
Let's imagine that we are running a version of qemu or some similar tool with full debug information.
Now let's continue. I would like to define metadata as overhead, administration data that should be minimised. Metadata is like a key to a lock separated from the data for some reason, but they belong together. Let's say that the universe contains the metadata, and we need to collect it to understand a given problem.
Now the domain-specific problem is not metadata, so let's say I am working on the problem of creating a video for youtube from an mp3 and a jpg. That is the domain data. All of that data from outside that is processed in the program belongs to the runtime data. We can imagine a stream of data from the input files flowing to the output files. Then we have domain-specific information about the codecs; that is also part of the domain, but removed a level. For the movie, it could be considered metadata. The parameters of the codec. In full, the entire source code and all the processing of the program is the metadata. For example, if you want to know why there is a glitch in the movie at a certain point, you may also need to know what was going on at that point in the program. It might be an environmental issue, like the power being shut off.
ScanOCR Media/Wiki
Here is my idea for dealing with all the papers I need to handle for my taxes.
Create a way to import and process images: a document management system, but based on MediaWiki.
The scan would produce a webpage of the wikitext overlaid on top of the scanned image, and have a spell checker.
The user can then process the text pages as a wiki.
So it would be a document management system based on MediaWiki. The scans should also be easier to import; it should be tied into xsane/cups.
And of course, it should be able to store this data in git, but that is less of an issue, because I would not like to distribute my tax data to everyone.
mike
Sunday, November 1, 2009
Introspector Reader Update
It has been a while since I posted. I have been saving this blog for real updates, not progress reports.
Well, I have finally started to hack mencoder to produce the videos that I need in the right way. The existing reader script https://code.launchpad.net/~jamesmikedupont/introspectorreader/wikipedia-strategy uses a hack of creating symlinks to images to create frames, which creates thousands of files. Now my new mencoder hack produces much better results.
Here is my idea : I would like to have a version of espeak that can feed information directly into mencoder. It would be able to give the exact timing for each word and the pronunciation keys as well.
These would be emitted as usable subtitle tracks.
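Assuming we had the per-word timings, turning them into a subtitle track is the easy part. Here is a small Python sketch that writes SubRip (SRT) text from a list of (word, start, end) tuples. The timings are invented; getting them out of espeak is the part that is still missing.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(timings):
    """Turn (word, start_seconds, end_seconds) tuples into SRT text,
    one numbered subtitle block per word."""
    blocks = []
    for index, (word, start, end) in enumerate(timings, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{word}\n"
        )
    return "\n".join(blocks)


# Hypothetical timings, as if a speech synthesizer had reported them.
timings = [("Hello", 0.0, 0.42), ("world", 0.42, 1.0)]
print(words_to_srt(timings))
```

One block per word gives the bouncing-ball effect; grouping the words of a sentence into one block would give ordinary subtitles instead.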
In addition, I would like to be able to use this information to create frames of video per word where the word is highlighted on the screen. A bouncing ball or red highlighting.
This would be done by creating successive images from the pdf. The pdf file is text anyway, so it should be possible to hack the pdftoppm command to say : highlight this text, or emit the x,y coordinates of the words. I know that xpdf to xml can do this.
pdftohtml -xml is the command.
So now we have an xml file that gives each piece of text, for example SomeText, along with its position on the page.
That is enough information to then render it and highlight it.
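As a sketch of how the coordinates could be pulled out of that xml, here is a short Python example. The sample document is hand-written for illustration and only assumes the usual layout of pdftohtml -xml output (page elements containing text elements with top, left, width and height attributes); real output carries more detail, such as font specifications.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of `pdftohtml -xml` output (an assumption,
# not copied from a real run).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="72" width="80" height="16" font="0">SomeText</text>
<text top="100" left="160" width="40" height="16" font="0">more</text>
</page>
</pdf2xml>"""


def text_boxes(xml_string):
    """Return one dict per text element with its page number and bounding box."""
    root = ET.fromstring(xml_string)
    boxes = []
    for page in root.iter("page"):
        for text in page.iter("text"):
            boxes.append({
                "page": int(page.get("number")),
                "left": int(text.get("left")),
                "top": int(text.get("top")),
                "width": int(text.get("width")),
                "height": int(text.get("height")),
                # itertext() also picks up text inside nested tags such as <b>.
                "text": "".join(text.itertext()),
            })
    return boxes


for box in text_boxes(SAMPLE):
    print(box)
```

With the box for each piece of text in hand, a renderer can draw a highlight rectangle over the page image at that position.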
Alternatively we could just render that frame, highlighting the text and then pass that single text to espeak.
That would produce many small texts, but they could later be embedded in html pages, for example as mp3. The problem with that is link breaks, and how to deal with them.
So we have two types of processing ideas : one to mark up the output of espeak with timing information, the other to break down the input to espeak into smaller chunks.
We should look into both of these ideas, and be able to use them together.
Missing right now is the timing and subtitle information from espeak. That should be a quick win.
My vision is really to have a single sentence on the screen as it is spoken, and to have the words/parts of speech visible and highlighted in realtime.
Ideally we would be able to annotate such videos and feed that back to create better ones as well.
Also, ideally espeak would be able to do such things directly, like reading pdf files and mediawiki pages, and be able to produce interactive graphics as well.
This would be embedded in Firefox as a plugin, and it could also add in translation tools.
For example, I would like to be able to transform the English pronunciation into another language. How would an Albanian encode this word to produce the same sounds? A German? How would you pronounce this in Chinese?
That could be done with the phonetic information. Translations of the meaning of the text could also be brought in.
Additionally, hyperlinks on the words would be interesting. An SVG graphic of the text, or CSS highlighting, would be even better. It could all be done directly in the browser.
So those are my ideas for the introspector reader. Imagine what would happen if you were able to read a C++ program as well? Take the compiler intermediate data and be able to create videos? Well, that is the connection to the gcc introspector project I started ten years ago.
Mike
Tuesday, September 29, 2009
MA GOOGLE : Does it ring a bell?
MA GOOGLE
Does it ring a bell?
Google is a great company and I love their services.
But we have to be honest about a real danger to our freedoms that such a company poses.
I just read about Google issuing a cease and desist letter to an Android developer here:
http://www.linux-mag.com/cache/7544/1.html
My comment is this :
Google has never talked about GNU at all and does not recognize the FSF when talking about Android or the Google OS.
This is a sign that they are building an "open source" Linux system and not a free as in freedom GNU/Linux system.
The point is that there is no such thing as 90% free. It is free or it is not free, and that is pretty simple.
I feel that Google has increasing control over all the information that you see and all the emails that you read, and it is now taking more control over the applications.
There is a severe conflict of interest here between providing information (search) and providing a service, for example Google Map Maker or YouTube.
If you want to search for maps, they will lead you to Google Map Maker and not to OpenStreetMap, for example.
I have found the Google image search swept clean of images about Boycott Novell, for example, and you have to go to Yahoo to find any good stuff.
I think that Google is going to be the next Ma Bell, AT&T, and it will have to be split up in a similar fashion.
thanks for listening,
Mike
Monday, September 28, 2009
Life Long Learning Virtual Conference Dedication
I have started a new project to host a life long learning virtual conference.
http://lllvconf.ning.com/profiles/blogs/dedication-1
http://rdfintrospector2.blogspot.com/2009/09/life-long-learning-virtual-conference.html
Monopoly City Streets hall of shame
Thursday, September 24, 2009
3d Openstreetmap
OK,
Here is my idea :
Take the http://openstreetmap.org data.
Put it into a 3d game engine like OpenArena http://openarena.ws/.
This would allow an interactive markup of a city :
Allow people to walk around in the city.
Allow people to interact and tag things.
I got this idea thinking about how to partition the nodes of the graph.
Well, Quake uses BSP trees, so why don't we just put the nodes into that?
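To show what partitioning the nodes might look like, here is a minimal Python sketch. It uses a k-d style median split, which is an axis-aligned special case of a BSP tree, and not the actual map format of Quake or OpenArena. The node ids and coordinates are made up.

```python
def build_tree(nodes, depth=0, leaf_size=2):
    """Recursively split (id, lon, lat) nodes at the median,
    alternating between longitude and latitude at each level."""
    if len(nodes) <= leaf_size:
        return {"leaf": [n[0] for n in nodes]}
    axis = 1 + depth % 2  # tuple index: 1 = lon, 2 = lat
    ordered = sorted(nodes, key=lambda n: n[axis])
    mid = len(ordered) // 2
    return {
        "axis": "lon" if axis == 1 else "lat",
        "split": ordered[mid][axis],
        "below": build_tree(ordered[:mid], depth + 1, leaf_size),
        "above": build_tree(ordered[mid:], depth + 1, leaf_size),
    }


def find_leaf(tree, lon, lat):
    """Walk down the tree to the leaf whose region contains the point."""
    while "leaf" not in tree:
        value = lon if tree["axis"] == "lon" else lat
        tree = tree["below"] if value < tree["split"] else tree["above"]
    return tree["leaf"]


# Hypothetical map nodes: (id, lon, lat).
nodes = [
    (1, 19.80, 41.30),
    (2, 19.82, 41.33),
    (3, 19.85, 41.31),
    (4, 19.81, 41.35),
    (5, 19.90, 41.32),
]
tree = build_tree(nodes)
print(find_leaf(tree, 19.81, 41.35))
print(find_leaf(tree, 19.85, 41.31))
```

Each leaf holds the few nodes that are near each other, so a player walking around only needs the leaves close to the current position.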
http://openarena.wikia.com/wiki/DeveloperFAQ
http://openarena.wikia.com/wiki/Modelling_a_map
Mike
Links:
http://www.osm-3d.org/
Ok I cannot find any sources here.
http://wiki.openstreetmap.org/wiki/OSM-3D
"The developed software is currently not an open source product. "
http://igorbrejc.net/openstreetmap/openstreetmap-in-3d
http://www.alpix.com/3d/TerrainViewer/index.html
Ok Some windows stuff...
Tuesday, September 22, 2009
Nothing compares to GNU
A parody of Sinead O'Connor.
It's been twenty five years and 9 months,
Since they took the source away
I hack every night and work all day
Since they took the source away
Since windows been gone I hack do whatever I want
I can run whatever I choose
I can write my code in a fancy restaurant
But nothing
I said nothing can take away these bugs
`Cause nothing compares
Nothing compares to GNU
It's been so lonely without GNU here
Like a freeamp without an ogg
Nothing can stop these bugs from submitting
Tell me billy gates where did I go wrong
I could put my compiler around every code I see
But they'd only remind me of GNU
I went to the lawyer n'guess what he told me
Guess what he told me
He said hacker u better try to have freedom
No matter what you'll do
But he's a fool
`Cause nothing compares
Nothing compares to GNU
all the projects that you started, buddy
In the source forge,
All stopped when you went away
I know that coding with you RMS was sometimes hard
But I'm willing to give it another try
Nothing compares
Nothing compares to GNU
Nothing compares
Nothing compares to GNU
Nothing compares
Nothing compares to GNU




