Update: This project was dead, but it's back! Unfortunately my daytime job keeps me somewhat busy, so don't expect the project to emerge in a couple of years.
Saya is a programming project aimed to create a cross-platform, versatile, extensible Nonlinear video editor with features available in commercial editors like Adobe Premiere, Sony Vegas or Edius Pro. The project was started in June, 2008, died in 2009 and was revived in October 2010.

Saturday, May 31, 2008

After researching for a couple of days and designing for a whole day, I finally got a decent and pretty "new project" dialog. It's one of the most complicated things I've done for the project so far, because I needed to gather info about all the known editors and the standard video formats (thank you, Wikipedia!).

Also, Premiere shows more settings, while Edius Pro's dialog is more compact and "friendly", but in my opinion it was oversimplified. So I had to choose a compromise.

One of the things that annoyed me about Premiere was that you had to scroll and read the settings as you kept choosing, and if you didn't like one, you were presented with an overcomplicated screen of settings.

I managed to include all the important settings in one page, and they'll change as I switch between presets. If I choose "custom", the settings will become read-write (with a preset selected they should be read-only and greyed out), so I can just type everything right away.

I also managed to make them pretty and not bloated. This took me a while, but putting the video settings at the left and the audio settings at the right, just did the trick.

I would also like to thank the wxFormBuilder developers for making such a great tool.

Thursday, May 22, 2008

This time I've added some panels with wxAUI. The little light blue area is supposed to be a file tree for the resources directories.

The bottom part is going to be the timeline, and is fixed to the main window.

The good part is that the panels can be undocked and turned into separate windows.

A friend gave me an idea based on the first screenshot: The timeline should be on the top. And it makes sense, this way I can add the toolbar directly to the window, and there will be no problems with them being too far from the timeline.

Soon you'll be able to see actual resources loaded in the resource window. Then I'll start working on the timeline.

Wednesday, May 21, 2008

Hi everyone. I was thinking about a feature in Edius Pro that I want to implement in my editor: resolution independence... but I stumbled on a problem.

Let's say we have a sequence whose frames are made of 4 zoomed-out clips: A, B, C and D, as in a multicam display.

A B
C D

To process this, normally one would apply a "reduction" transformation to A, B, C and D, followed by a translation transformation. Then that would be rendered in the resulting clip, which we'll call E.

Now imagine E has similar sequences called F, G and H.

E F
G H

The result will be called I, which would have this structure:

I { E { ABCD } F { JKLM } G { NOPQ } H { RSTU } }

Now what happens when we zoom inside I? The result depends on the processing order of the transformations. If we put this in layers (tracks), we'll have something similar to this:

I
E F G H
ABCD JKLM NOPQ RSTU

If transformations (filters) are applied bottom-up, we would gain speed in processing, but resolution is lost, since the resulting images have a constant resolution. Zooming in would result in mosaic artifacts.

Instead, if transformations are applied top-down, processing complexity would be greater, but resolution would be preserved to the last pixel. This would allow us to zoom from I to E, to A and beyond.
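One way to picture the top-down approach: instead of rendering E at a fixed size and then shrinking it into I, the "reduction + translation" of each level is combined into a single placement, and the source clip is sampled once at whatever zoom is needed. The sketch below is only an illustration under that assumption; `Placement`, `compose` and `apply` are made-up names, and it only handles uniform scaling.

```cpp
// Hypothetical 2D placement: uniform scale followed by a translation.
struct Placement {
    double scale;   // e.g. 0.5 shrinks a clip to half size
    double dx, dy;  // offset inside the parent frame, in parent pixels
};

struct Point { double x, y; };

// Combine "child inside parent" (inner) with "parent inside grandparent"
// (outer). The result maps child coordinates straight to the grandparent.
Placement compose(const Placement& outer, const Placement& inner) {
    return { outer.scale * inner.scale,
             outer.scale * inner.dx + outer.dx,
             outer.scale * inner.dy + outer.dy };
}

// Map a point from clip coordinates to the coordinates of the target frame.
Point apply(const Placement& p, Point in) {
    return { p.scale * in.x + p.dx, p.scale * in.y + p.dy };
}
```

With B placed in E at half size and E placed in I at half size, `compose` gives a single quarter-size placement of B inside I, so no intermediate fixed-resolution image of E is ever needed.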

I was rewatching the Edius Pro ads, and I marvelled at the concept of fps and resolution-independent processing. I remember the first time that I tried to import a clip in Premiere. It complained about the fps of the video being different than the current project's.

I had forgotten about this until I rewatched this promo. Wait a sec, fps-independent? Resolution independent? Cool!

Then I thought about the difficulty of such an implementation. For now we can see timelines having HH:MM:SS:FF:MMM (hours, minutes, seconds, frames, and milliseconds - or was that milliframes? I don't think so.).

Currently in the timeline structures I have the tracks sorting the clips by frame. It's an integer number. But then I thought - hey, a millisecond is also an integer. So what's the limit for our videos? Let's see...

Assuming we have a limit of 2^31, we have 2147483648 milliseconds available for a production. If we divide that by 1000, we get 2147483.648 seconds. Let's round it down to 2147483 seconds. If we divide that by 3600, we get 596.52 hours, which is a bit more than 24 straight days. More than enough, I think :).
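The same arithmetic can be checked at compile time. This is just a sanity check of the numbers above; the constant names are made up. Note that a signed 32-bit integer actually tops out at 2^31 - 1, one millisecond less, which doesn't change the result.

```cpp
#include <cstdint>

// 2^31 milliseconds, computed in 64 bits so the value itself fits.
constexpr std::int64_t kMaxMs   = std::int64_t{1} << 31;  // 2147483648 ms
constexpr std::int64_t kSeconds = kMaxMs / 1000;          // whole seconds
constexpr std::int64_t kHours   = kSeconds / 3600;        // whole hours
constexpr std::int64_t kDays    = kHours / 24;            // whole days

static_assert(kSeconds == 2147483, "about 2.1 million seconds");
static_assert(kHours == 596, "about 596 hours");
static_assert(kDays == 24, "about 24 days");
```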

And due to the fact that we're using std::maps and not mere vectors, we don't have to worry about memory requirements: They're exactly the same.

Another thing I realized was that storing clips in a pool for later usage, is simply a waste of time and resources. Each time we copy/paste, we're gonna generate new clips. Cutting doesn't solve the problem because we could undo and yet retain the clip in the clipboard, so we'd need to process that. So, that would leave us with only more complexity and absolutely no storage gain.

If we get rid of the clips-in-storage-pool, we can store the clips (clip as in data-structure, not the movie, duh) directly in the timeline. Since it's an STL map, we don't have to worry about pointers either!
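A minimal sketch of what "clips stored directly in the timeline" could look like, assuming a track is an `std::map` keyed by the clip's start time in milliseconds. The `Clip` fields and the function names are placeholders for illustration, not Saya's actual structures.

```cpp
#include <cstdint>
#include <map>

struct Clip {                 // hypothetical fields, for illustration only
    int resourceId;           // which resource the clip plays
    std::int32_t durationMs;  // length on the timeline
};

using Track = std::map<std::int32_t, Clip>;  // key: start time in ms

// Pasting stores an independent copy: no shared pool entry, no pointer.
void pasteClip(Track& track, std::int32_t atMs, const Clip& source) {
    track[atMs] = source;
}

// Copy the clip covering time t into 'out'; returns false if t is in a gap.
bool clipAt(const Track& track, std::int32_t t, Clip& out) {
    auto it = track.upper_bound(t);        // first clip starting after t
    if (it == track.begin()) return false;
    --it;                                  // last clip starting at or before t
    if (t >= it->first + it->second.durationMs) return false;
    out = it->second;
    return true;
}
```

Because the map only holds entries for clips that exist, switching the key from frames to milliseconds doesn't make it any bigger.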

Sunday, May 18, 2008

My bittorrent download of QsKKing's tutorial for Sony Vegas 7.0 just finished downloading. The bad news is that it was encoded with one of the latest Micr.... (UGH! I can't mention that word! It burns!) codecs, which I had to get from the internet. I couldn't watch it with VLC player (I could with mplayer, but I hate that software, it's quite unfriendly). So I tried recoding it with FFMPEG, to no avail. I then tried with mencoder, which worked.

A good tip for using mencoder is installing the rpm for KMediaGrab, which is a graphical frontend for mencoder. The KMediaGrab version I got is 0.3, and it works like a charm.

Anyway, since this is a Video Editing blog, I just wanted to recommend kmediagrab to you guys, in case you need to convert from one format to another.

Now, onto QsKKing's tutorial. Let me tell you man, this rocks. Thanks for encoding it in high resolution so I could read the menus and options clearly and replicate them. One thing I especially liked is the overlapping clips and automatic transitions, which I didn't know about in Premiere (well, with one or two months of barely using it, what can I say? Perhaps the option was hidden in there somewhere, and I really wouldn't remember anything if it wasn't for the Edicion de Video book).

Now I realize that my approach to making transitions was wrong. Transitions should happen automatically ALL the time, and all I would need to do is get the transition parameters for the clip, when ENDING. Why only the ending transition? Because transitions require by default TWO clips. Only one clip is needed for the information. And with the overlapping = transition, I don't need to define transition links for a clip. The transition parameters will just have to be embedded in the clip. Don't want a transition? Remove the overlap. Voila! :)
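Here's a rough sketch of the "overlap = transition" idea, assuming clips carry a start and a duration in milliseconds and that only the outgoing clip stores transition parameters. All the names (`TransitionParams`, `Overlap`, `transitionRegion`) are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

struct TransitionParams {     // hypothetical; embedded in the outgoing clip
    int effectId;             // which transition to use, e.g. a crossfade
};

struct Clip {
    std::int32_t startMs;
    std::int32_t durationMs;
    TransitionParams ending;  // only the ending transition is stored
};

struct Overlap {
    std::int32_t startMs;
    std::int32_t endMs;
    bool exists;              // false when the clips don't overlap
};

// The transition happens exactly where the two clips overlap.
Overlap transitionRegion(const Clip& outgoing, const Clip& incoming) {
    const std::int32_t outEnd = outgoing.startMs + outgoing.durationMs;
    const std::int32_t inEnd  = incoming.startMs + incoming.durationMs;
    const std::int32_t from   = std::max(outgoing.startMs, incoming.startMs);
    const std::int32_t to     = std::min(outEnd, inEnd);
    return { from, to, from < to };
}
```

A clip running from 0 to 5000 ms followed by one starting at 4000 ms gives a transition from 4000 to 5000 ms; move the second clip to 5000 ms and the transition simply disappears.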

Again, thank you, QsKKing. I really owe you one. Just a little favor. Next time use something more universal like AVI/XVID ;-)

Now *THIS* is something that I was talking about. I just downloaded a few tutorials and ads for Canopus' Edius Pro from youtube, and they have these cool features:

* The toolbox above the timeline has these little icons that, when you hover over them, show a tooltip with the brief description (as in the menu) AND the accelerator shortcut key!
* The resources window lets you scan a directory, and it recreates a directory tree with all the resources of each directory in another pane - with icons, of course.
* According to the ad, you can jog by just moving the mouse in a circular motion. I don't know whether this works by just clicking in the window or by using a knob-like jog dial.

But this is the kind of features that I kept asking for in overhyped editors like Jahshaka. What effort did these guys need to make a friggin' tooltip appear in front of a tool? Or a small context menu that explained to me what each button did? These things are CHILD'S PLAY to do, at least in wxWidgets. And why do programmers leave these things FOR THE END, if they're the EASIEST things to do?

So this is why I started my project. Instead of starting with the backend image processing stuff, I'll start with the front end and add all these basic features that make Video Editing "just work". If later some guy makes a super-duper Video Editor technology, he'll just have to use Saya's frontend, and no problem. Of course, provided that I have managed to write an extensible interface for making your own effect dialogs - which I intend to do. But I digress.

Now that I've seen how Edius works, I'm going to use Edius' resources Window as model, instead of Premiere's (and possibly I'll start searching for more Edius screencasts ;-) ). If only open source programmers learned not to reinvent the wheel...

I think I got it figured out. First of all, let's simplify audio/video clipboard pasting. Since tracks can only be either video or audio, each track needs to have a type flag telling which kind of track it is. This way it won't accept clips of a different kind. Ordering of tracks will be handled by the UI, so it doesn't matter if audio and video tracks are mixed: audio tracks will always appear at the bottom and video tracks at the top. However, for pasting purposes, having only the track's type is sufficient.

Declaring one clipboard type for single clips is annoying. Instead, let's add a function that detects whether a sequence has only a single clip. Or even better, just add a friggin' flag. This is OOP, we can derive a clipboard from a sequence.

With that, we can turn the clipboard into a full-fledged sequence and our copy/paste problems will be gone.

When pasting multiple tracks, we just have to use an identifier for the track's z-position, and act accordingly. If the topmost track is pasted below, we can either deny the paste or (at the user's choice in the options dialog), create new tracks to do the paste.

Having the z-position stored in the clipboard tracks allows for moving clips at will without any difficulties, even if sequence tracks have been added or deleted.
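To make this concrete, here's a minimal sketch of a clipboard derived from a sequence, with a type flag per track and a z-position per clipboard track. It assumes index 0 is the topmost track and that the clipboard's z-positions are stored in ascending order; the names are made up, and this version only denies a bad paste (creating new tracks on demand is left out).

```cpp
#include <cstddef>
#include <vector>

enum class TrackKind { Video, Audio };

struct Track {
    TrackKind kind;
    std::vector<int> clipIds;   // stand-in for the real clip data
};

struct Sequence {
    std::vector<Track> tracks;  // assumption: index 0 is the topmost track
};

// The clipboard is just a sequence plus some bookkeeping.
struct Clipboard : Sequence {
    bool singleClip = false;
    std::vector<std::size_t> zPos;  // original z-position of each track
};

// Paste so that the clipboard's topmost track lands on 'targetTop'.
// Returns false, changing nothing, if a track would fall outside the
// sequence or land on a track of the other kind.
bool paste(Sequence& dst, const Clipboard& clip, std::size_t targetTop) {
    if (clip.tracks.empty() || clip.zPos.size() != clip.tracks.size())
        return false;
    const std::size_t base = clip.zPos.front();
    for (std::size_t i = 0; i < clip.tracks.size(); ++i) {
        const std::size_t z = targetTop + (clip.zPos[i] - base);
        if (z >= dst.tracks.size()) return false;
        if (dst.tracks[z].kind != clip.tracks[i].kind) return false;
    }
    for (std::size_t i = 0; i < clip.tracks.size(); ++i) {
        Track& target = dst.tracks[targetTop + (clip.zPos[i] - base)];
        target.clipIds.insert(target.clipIds.end(),
                              clip.tracks[i].clipIds.begin(),
                              clip.tracks[i].clipIds.end());
    }
    return true;
}
```

Checking every track before touching the destination keeps a refused paste from leaving the sequence half-modified.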

Now the problem is nested tracks. How to handle them? Well, we have to analyse the global effect of having tracks nested. Let's say we have tracks A,B,C and D.

A
B
C
D

Now let's add some container tracks E and F, which will contain A and B, and C and D respectively. Let's call A and B "leaf" tracks, and E and F "branch" tracks.

E
- A
- B
F
- C
- D

In the end, it doesn't matter how nested they are, they keep the same order. If D is rendered and C on top of it, combined, they form a single track but the effect is exactly the same. Meaning that when the tree is expanded to its full extent, there will be no problem. We just have to take care of pasting into the "leaf" tracks and everything will be as usual.

This means container tracks can't have clips. They're only made for visualization and handling purposes. So that would leave us with having to use a "trackbase" class to define branch and leaf tracks. Tracks would then need a function to calculate their Z position based on their parent tracks' position. This can be expressed with the following functions:
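The functions themselves are missing from this copy of the post. What follows is only a guess at their general shape, not the original code: a `TrackBase` that is either a branch or a leaf, children stored as indexes into a track table, and a leaf's Z position defined as its rank among all leaves when the tree is fully expanded. Every name here is hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct TrackBase {
    bool isLeaf;                        // leaves hold clips; branches only group
    std::vector<std::size_t> children;  // indexes into the track table
};

// Walk the tree depth-first and list the leaf tracks in display order.
void collectLeaves(const std::vector<TrackBase>& tracks, std::size_t node,
                   std::vector<std::size_t>& out) {
    if (tracks[node].isLeaf) {
        out.push_back(node);
        return;
    }
    for (std::size_t child : tracks[node].children)
        collectLeaves(tracks, child, out);
}

// Z position of a leaf = its rank among all leaves below 'root'.
// Returns -1 for branch tracks or for tracks that are not under 'root'.
int zPosition(const std::vector<TrackBase>& tracks, std::size_t root,
              std::size_t leaf) {
    std::vector<std::size_t> order;
    collectLeaves(tracks, root, order);
    for (std::size_t i = 0; i < order.size(); ++i)
        if (order[i] == leaf) return static_cast<int>(i);
    return -1;
}
```

For the E/F example, A, B, C and D come out with Z positions 0 to 3 no matter how they are grouped, which is the "same order" property described above.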

Finally, to paste and move tracks, we only have to make sure that all tracks are fully expanded - otherwise, pasting (or moving) clips/selections will be forbidden.

The paste operation then becomes simple: it's just iterating over the tracks and pasting the contents. Note that audio tracks will also be copied/cut/pasted if they're linked to the video tracks. To do so, we need to add a "track" member to the clip, to know which track it belongs to. For that we also need a "position" member, to tell us the local position in the track. If the track's position slot doesn't match the clip's, the clip's position is invalidated and the track is rescanned to find the clip. If the clip isn't found, it's then searched for in all the tracks, and updated accordingly. If the clip still isn't found, it means we have an orphan clip and it will be deleted.

I just found a relatively new Nonlinear Editor for Windows. It's called Canopus Edius Pro.

The screenshots look awesome. I think I'm going to take some interface options from them. Hmmm what's this? They got a Render Menu in the menubar. It's worth taking a look...

Apparently there is no such thing as "nested tracks" in Edius... perhaps the idea of nesting tracks isn't really as good as I thought, it only complicates things more. The only advantage that I see in nesting tracks is making more space available for visualizing clips, and applying effects to various tracks at the same time. But if we have sequences that can be nested as clips, there's no need for nesting tracks. I can just select a bunch of clips, right click them and "combine into a single sequence". This would create a new sequence, add it to the resources and create a clip from it.

(Update: Yes, there is a need for nested tracks: to perform an operation on various tracks at the same time, or to mute or hide all of them at once. But in the end, they're nothing more than a list of tracks clumped together. What's the difficulty? It'll be just for grouping.)

Going back to Edius... what's this? They got bezier curves for editing effects? O.o Wow. When the Lumiera guys mentioned they wanted to add bezier curves for effects editing, they weren't kidding, there IS such a feature. And that color correction screen is really wonderful (I'd link or copy it but I don't want to get into copyright problems; the link is there for you to follow).

A problem I stumbled upon is... how to handle copying and pasting? Copying a single clip is no problem. Copying a piece of a track isn't a problem either, I only have to store the track's beginning offset and copy the clips' data.

However, how to handle nested tracks? Should I group tracks by video and audio?

I remember when I used Premiere a while ago, I could add audio and video tracks. What I don't remember is... could I add only audio tracks or only video tracks? But it seems you could add audio-only tracks, e.g. for adding background music to a video. Therefore, tracks shouldn't be grouped by video and audio.

How to handle then copying and pasting?

Here's an idea. Do a copy of the current timeline in case of multitrack clipboard, and paste into exactly the same tracks (no up/down possible). I'm thinking that this would have the same effect as dragging and dropping a selection. However, how to move multitrack clips up/down? I think I'd need to add a command for shifting clips up / down.

When I figure out how to solve this, I'll be able to implement cut/copy/paste effectively.

This is the first screenshot of Saya. So far it only consists of the menus, but the File Menu is 95% implemented (items like Capture / Batch Capture aren't updated yet because they depend on variables that I still haven't added to the application).

As you can see, the menu structure is nearly identical to Adobe Premiere's, although I've modified some things a little. The shortcut keys (the underlined letters) aren't identical since the book I purchased ("Edición de Video" by RedUsers - it's a Latin American book) doesn't include screenshots THAT detailed. But they do have a chapter on the menus, woo hoo! This is why I could copy the Premiere menus so easily. Some of the nested menus weren't in the same chapter, but I could easily compile them from the context menus in the various chapters.

The good news is that I got the project and timeline memory structure (even the undo history) completely designed now (which I'll explain in a later post). The fun part will be designing the timeline widgets, whee!

After I finish the code for enabling/disabling items on all menus, I'll start with the File Importing module. Then you'll get to see some dialogs done :)

I'll reserve the superpowerful magical swords for major releases, like ClaiohmSolais, Ascalon, Balmung or Excalibur. The earliest milestones will have wimpy names, like "baselard", "arrow", or "cutlass" :P

Wednesday, May 14, 2008

After searching the web, I finally managed to implement an undo/redo class for Saya. I never imagined that with simple use-cases, I'd be able to straighten my thoughts and get rid of all the complications in thinking up a good algorithm.

bool IsNextEof() {
    // Note that we use a double-ended queue instead of just a stack to be able to delete
    // old states in case memory usage exceeds a given limit.
    return curpos + 1 >= undostack.size();
}
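For context, here's a rough sketch of how such a check could fit into a small undo/redo class built on `std::deque`. This is not Saya's actual class: the name `UndoHistory`, storing each state as a string, and the `maxStates` limit (counted in states rather than in bytes of memory) are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <string>

class UndoHistory {
public:
    explicit UndoHistory(std::size_t maxStates)
        : maxStates_(maxStates == 0 ? 1 : maxStates) {}

    // Record a new state; anything that could have been redone is discarded.
    void Push(const std::string& state) {
        if (!states_.empty()) {
            states_.erase(
                states_.begin() + static_cast<std::ptrdiff_t>(curpos_ + 1),
                states_.end());
        }
        states_.push_back(state);
        curpos_ = states_.size() - 1;
        while (states_.size() > maxStates_) {  // drop the oldest states
            states_.pop_front();
            --curpos_;
        }
    }

    bool IsNextEof() const { return curpos_ + 1 >= states_.size(); }
    bool IsPrevBof() const { return states_.empty() || curpos_ == 0; }

    bool Undo() { if (IsPrevBof()) return false; --curpos_; return true; }
    bool Redo() { if (IsNextEof()) return false; ++curpos_; return true; }

    // Precondition: at least one state has been pushed.
    const std::string& Current() const { return states_[curpos_]; }

private:
    std::deque<std::string> states_;
    std::size_t curpos_ = 0;
    std::size_t maxStates_;
};
```

The deque is what makes the trimming cheap: old states leave from the front while new ones keep arriving at the back.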

I would like to download Adobe's official tutorials too, to study them and implement the features in Saya, but that would feel dirty, like buying Texas from Santa Anna with his own gold, or something. So I'll keep Creative Cow's screencasts for the use cases.

One question I've been asked more than once, is why I decided to start the design with the UI.

According to good programming practices, first you need to work on a good design, and then start the implementation.

Well, for now, the project will just focus on being an editor front-end. This means that the video processing stuff will be held back for later.

Since my short-term goal is to have a workable user interface replicating the menus and dialogs of commercial video applications, I can assume that the design for these applications was well-done, and the only thing I need to do for now is to implement the dialogs and menus. When I stumble upon a wall (i.e. a dialog requiring information that I don't have yet), then I'll develop the infrastructure as needed, with the condition that I'll make it extensible to avoid getting stuck with a specific data structure.

So, if you were worried because I don't have (yet) a working UML design, rest assured I'm also working on a good design for the infrastructure.

So far, we have a class I'll call ProjectManager, which will handle project saving / loading, exporting, asking the user questions, etc.

ProjectManager will have a member m_project which will be of the class VidProject. VidProject is a container, and will have one or more of the following:

* Project Properties (title, framerate, preferred export format, codecs used, etc.)
* A double-ended queue (std::deque) of undo/redo states
* An std::vector of Sequences (timelines), which will have a vector of tracks each.
* A vector of Clips, and a vector of Clip indexes (to reuse deleted clip slots)
* A vector of Resources, which are the actual video clips (to be more precise, the info on how to retrieve such clips, i.e. filename, starting / ending frame, etc.)
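The "vector of Clips plus a vector of Clip indexes" part of the list above could work roughly like this. It's a sketch under assumptions: the `ClipTable` name, the `inUse` flag and the single `resourceId` field are placeholders.

```cpp
#include <cstddef>
#include <vector>

struct Clip {
    int resourceId = 0;   // placeholder for the real clip data
    bool inUse = false;
};

class ClipTable {
public:
    // Returns the index (id#) of the new clip, reusing a freed slot if any.
    std::size_t Add(int resourceId) {
        std::size_t id;
        if (!freeSlots_.empty()) {
            id = freeSlots_.back();
            freeSlots_.pop_back();
        } else {
            id = clips_.size();
            clips_.push_back(Clip());
        }
        clips_[id].resourceId = resourceId;
        clips_[id].inUse = true;
        return id;
    }

    // Marks the slot as free; the vector itself never shrinks.
    void Remove(std::size_t id) {
        if (id < clips_.size() && clips_[id].inUse) {
            clips_[id].inUse = false;
            freeSlots_.push_back(id);
        }
    }

    std::size_t SlotCount() const { return clips_.size(); }
    const Clip& Get(std::size_t id) const { return clips_.at(id); }

private:
    std::vector<Clip> clips_;              // all clip slots, used or not
    std::vector<std::size_t> freeSlots_;   // indexes of deleted clips
};
```

Since other structures refer to clips by index, deleting a clip never shifts the ids of the remaining ones.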

Each clip will have (at least) the following information:

* id# for the origin (the resource used).
* Starting origin frame (in case of video; in case of audio we'll have samples - note that origin frames could also have starting / ending frames for the actual file used)
* Ending origin frame
* Loop count (negative for infinite loops; 0 for no loops)
* enum: what is shown before the first frame (black? transparent? a copy of the first frame?)
* enum: same for the last frame
* Changeable duration in timeline frames (for speed up / slowdown of scenes)
* A vector of effects (the effect id will be an id# in case of built-in effects, or a string in case of external plugins). To keep things simple, the effect parameters will be stored in an std::map.
* If it's an audio clip, the id# for the corresponding video clip, in case they're synchronized.
* the id# for the starting transition (use 0 for none)
* the id# for the ending transition (use 0 for none)
* In case of audio tracks, the channel # (0 for first channel, 1 for second, etc - this will be defined later as the implementation gets done). In case of stereo and multiple tracks, this will be a vector where the destination tracks will point to the source tracks. For remixing tracks, mixing to mono, etc, there will be also stackable audio effects.

Tracks will be stored in a tree structure (children will be stored in a vector of track id#'s) where the root of the tree will be the final rendering. Again, we won't use pointers but indexes. To prevent recursion, each track will also have a level indicator.

Each track will contain a map from frame# => clip id#. We can use the maps to construct a per-sequence set (a set is a map of booleans) of transition frames. With these transition frames we can construct in real-time a list of states which tell us which frame contains which clip. With this info we can now render clips in the timeline.
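Here's a small sketch of that idea, assuming that each map entry marks the frame where a clip starts and that clip id 0 marks the start of an empty gap (that convention is my assumption for the example, as are the function names).

```cpp
#include <map>
#include <set>
#include <vector>

using Track = std::map<int, int>;  // frame# -> clip id# (0 = empty gap)

// Every frame at which some track switches to a different clip.
std::set<int> changeFrames(const std::vector<Track>& tracks) {
    std::set<int> frames;
    for (const Track& t : tracks)
        for (const auto& entry : t)
            frames.insert(entry.first);
    return frames;
}

// Clip id shown by one track at a frame: the last entry at or before it.
int clipAt(const Track& t, int frame) {
    auto it = t.upper_bound(frame);
    if (it == t.begin()) return 0;   // nothing has started yet
    --it;
    return it->second;
}

// The "state" at a frame: which clip each track is showing.
std::vector<int> stateAt(const std::vector<Track>& tracks, int frame) {
    std::vector<int> state;
    for (const Track& t : tracks)
        state.push_back(clipAt(t, frame));
    return state;
}
```

Between two consecutive change frames the state is constant, so the renderer only needs to recompute it when playback crosses one of them.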

Note that I don't plan to use pointers AT ALL. By using standard containers and local indexes, I can serialize the sequences into easily-storable strings for undo / redo states, and it'll be also easier to serialize the whole project into an XML string.

So, what is Saya? Saya is a project that aims to become a professional cross-platform Non Linear Video Editor. By cross-platform I mean that it'll work in Windows, Linux and even MacOS.

Perhaps you're here wondering "Why reinvent the wheel?" "Aren't you doing more software fragmentation?" and "Do you really think you can do this on your own?"

Well, I'll answer these questions one by one.

Q: Why reinvent the wheel?

A: I'm *NOT* reinventing the wheel. It's the *OTHER* guys who are reinventing the wheel. And yet, nobody has done it in a successful, cross-platform way.

Allow me to explain further. Here are some features that the other Linux video editors are missing:

1. Cross-platform compatibility (and by that I mean Windows!)

As you can see, the Linux video editors are made for Linux, and Linux ONLY. While it's good to support a Free (as in Freedom) Operating System, it's not that good to leave the Windows people in the darkness. What would happen if OpenOffice.org was Linux only? What would happen if Firefox only ran on FreeBSD?

Installing a lot of extra software in Windows to be able to run the binary you downloaded is *NOT* acceptable.

Do you really think the average Windows user will agree to install and mess around with a virtual machine (and Linux inside it, which already includes a lot of headaches), or cygwin JUST to run (or try to run) a Video Editor?

How I'll succeed: With the wxWidgets library.

This is why I chose to use the wxWidgets library, which makes portability SIMPLE. And I'm following a successful editor (Audacity) which was done in C++ and wxWidgets.

2. Ease of installation in Windows (and that means absolutely NO python, and much less wxPython!)

The last time I tried to run a wxPython application I downloaded the wrong version (how is an end-user supposed to know?) and the program crashed. Wanna know how I felt?

Another complaint I have is that a lot of video editors depend on an interpreted language. Hello, I want a NATIVE application. If you want to use Python, please use a Python COMPILER. Thank you. I don't want to download an app and only realize it doesn't run after I decide to run it from the command line to see the Python error messages. Ugh.

How I'll succeed: By writing the program 100% in C++.

No interpreted language nightmares, no Python, no Ruby, no .NET / MONO, just plain and simple C++. With the use of STL, it's nearly child's play to write good programs. And with wxWidgets, make that double. Windows users can easily download an already-compiled binary (.exe - or even an installer), and the program will run. Was that so hard?

An advantage of writing the program in C++ and making it install / compile easily in Windows will be a greater project exposure. I can choose to limit myself to the still-small Linux community, or I can embrace (without extending and extinguishing ;-) ) a much wider audience: Windows users. As a matter of fact, I was a Windows user myself, and I was frustrated at the lack of Free (and uncrippled) Video Editors for Windows. I would have joined a project if there was one - and trust me, I've been waiting for one for more than five years. I trust that there are other Windows users like me, who are waiting for such a project to be born so they can join and participate in the writing.

3. Ease of installation in Linux

I like short projects, or projects that come with their own libraries, even if redundant. While it's OK not to bundle very large libraries like wxWidgets, it's *NOT* OK to ask for a dependency of an obscure video-processing library that also has dependencies on obscure math libraries that also have dependencies ad infinitum. What am I supposed to do if my distro isn't Ubuntu with their millionaire repository? I'm not asking for sharks with frickin' laser beams on their heads, I'm just asking to bundle the rare libraries so I don't need to download more stuff before doing the ./configure - make - make install ritual.

I really don't know what's the deal with some Linux packages, but I had this experience with Kdenlive, where the release they used was buggy and I couldn't export the AMV I was making. It had timing problems. Sheesh. So, I tried to compile it on my own and I was smacked on the face with another dependency hell.

How I'll succeed: I'll bundle the used libraries.

XML libraries, multimedia libraries, they'll be included in the source tree. And it's perfectly legal since they're OPEN SOURCE. Ease of installation also means increased exposure to the Linux audience - those people who know how to program but run a wide variety of Linux distributions.

4. Intuitive User Interface

Let me say this straight: It is my personal opinion (no offense intended) that some editors, like Cinelerra, simply have a hideous UI. In other words, it sucks (personal opinion), and the decision of some developers to make a new program from scratch instead of just forking confirms this (unfortunately, they'll make it for Linux ONLY. Why oh, why do they leave Windows users in limbo?). Cinelerra used a custom-made User Interface, which means that whenever I try to open or import a file, I'll experience a lot of bugs - an annoying one is that I can't easily change folders because it keeps appending that darn slash at the end, and I can't remove it!

Then it's another maze to get to import a file. First, you can't import Divx files that you downloaded from the internet. Oh no, you first have to use some obscure tool to convert it to DV or M-JPEG format before you import it. More downloading of stuff. (And don't even get me started on crashes and hangs, but that's another matter).

Another video editor I had tried was Jahshaka. Let me tell you that if something is counter-intuitive, Jahshaka is. They provide little or no help, no documentation, no tooltips, no right-click context menus, no nothing. You had to memorize some cryptic key shortcuts and whatnot. I wiped it a couple of hours after installing the thing.

Come on guys, is it really that hard to make an easy-to-use menu?

How I'll succeed: I'll copy what works.

I'll adopt existing user interfaces (i.e. menus, dialogs, widgets) from commercial projects like Adobe Premiere or Final Cut Pro. And don't worry about Trade Secrets - I'm not reverse engineering anything. This is why I'm creating the interface based only on existing documentation (i.e. books) and publicly available information (wikis, screenshots found on the web, tutorial videos, etc).

Q: Aren't you doing more software fragmentation?

A: No. I'm filling an unused niche (for the reasons explained above).

Q: Do you really think you can do this on your own?

A: No. This is why I'm making the webpage and posting the project on Berlios (I'd choose SourceForge, but I've had good experiences with Berlios, specifically in the Code::Blocks IDE project, in which I participated, BTW :) ).

Q: OK, you convinced me. How can I help you?

A: Since I'm doing this from scratch, I need users of commercial video editors to share their experiences and ask for features. It doesn't matter if you don't have the software right now, you can use MS-paint, right? Or you can record some screencasts (or flash animations, whatever) of your favorite video editors (even from memory, even if the drawings suck), e.g. "this is how you do 3-point editing" (that one would help a lot, btw), or "I like how this tool makes an icon for you" and all that. I'm not an expert in making videos, I didn't take a $1,200 course on -nameyoursoftwarehere-, so this is why I need help.

If you're a Cinelerra or Kino or whatever - user, you can also record your screen casts or take snapshots and say "I want this feature!". I'll add it as soon as I can.

If you're a C++ programmer with wxWidgets experience, YOU'RE MORE THAN WELCOME to join the project.