Today the industry received word of an innovative new technology—Ephesoft Universe. Universe is a tool that matches enterprise Big Data with intelligent capture, allowing you to process and analyze all of the content contained in your organization’s documents, even if unstructured. I have had the pleasure of early access to the product for evaluation and want to share my thoughts with you.
Universe allows business users to define the documents and data they would like to analyze in an easy and intuitive fashion. Universe automatically identifies large numbers of data fields such as addresses and amounts, and users can tag the values with the names used in the business (e.g., primary address, loan amount, SSN, employer). Universe then analyzes the values and determines the best ways to extract the data as more documents are processed. Once the definitions are set up, very large groups of documents can be processed in a very short time by utilizing Apache Spark clusters. This can be deployed on premises or in the cloud.
It’s all well and good to rattle off some technical specs and usage, but why should you, and I, care? While Ephesoft Enterprise for intelligent capture allows you to automate the intake of documents to drive business process and archival, it doesn’t harness the immense amount of data you already have in your enterprise. Imagine, as a mortgage insurance company, being able to set rates based on data you already have from the many closing packages you have processed. With Universe, this data is found by looking at defaults in various zip codes, size of homes, loan amounts, home values, and more. You can even look at the data you have and project it out for months. Consider a fraud detection company having the ability to determine how many loan applications have been filed under a particular name or Social Security number in a specified amount of time. Many companies pay other entities for this information, even though the data already exists in documents they process in their enterprise.
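To make the fraud detection scenario concrete, here is a minimal sketch of counting applications per Social Security number within a sliding time window. The record layout, the sample values, and the function name are assumptions made up for illustration; this is not Universe’s data model or API, only the kind of question extracted data lets you ask.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical extracted records: (ssn, application_date). Values are placeholders.
applications = [
    ("123-45-6789", date(2015, 1, 5)),
    ("123-45-6789", date(2015, 1, 20)),
    ("123-45-6789", date(2015, 6, 1)),
    ("987-65-4321", date(2015, 2, 1)),
]

def max_applications_in_window(records, window_days):
    """Per SSN, return the largest number of applications filed within any span of window_days."""
    by_ssn = defaultdict(list)
    for ssn, filed in records:
        by_ssn[ssn].append(filed)

    window = timedelta(days=window_days)
    result = {}
    for ssn, dates in by_ssn.items():
        dates.sort()
        best = 0
        start = 0
        for end in range(len(dates)):
            # Slide the start forward until the oldest date fits inside the window.
            while dates[end] - dates[start] > window:
                start += 1
            best = max(best, end - start + 1)
        result[ssn] = best
    return result

counts = max_applications_in_window(applications, window_days=30)
```

An SSN whose count exceeds whatever threshold the business chooses could then be flagged for review.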
We are excited about this innovative technology and the advanced solutions it will enable Zia to provide to our clients. Ephesoft Universe is going to save our customers time and money while lowering risk. Please contact us to discuss how this technology can help you.
– Pat Myers, EVP and Co-Founder of Zia Consulting
This is the second blog post in the series “From the Desk of Yoran”.
Yoran Sirkis, CEO of Covertix, is a seasoned executive with more than 20 years of experience in information security, specializing in data and physical risk management. He is also a frequent speaker at leading industry conferences.
The ROI from Data-Centric Security
When was the last time you counted the number of security tools in your organization? How many different vendors are involved? What are the maintenance and licensing costs?
I bet you lost count… anyone would.
Companies strive to define and apply security rules that will best protect data, based on their specific business needs. During that process, IT staff encounter evolving security needs and are exposed to an endless number of solutions, each addressing a valid, real-world security challenge.
As a part of the process of protecting their enterprise data, organizations end up dealing with a variety of vendors, pricey integrations, busy helpdesks, frustrated users, and continuously increasing expenses.
Doing More with Less
As I mentioned in my previous post, The Need for Data-Centric Security, the age of data-centric security transforms the security focus from top-down to bottom-up. It’s this reasoning that makes a data-centric solution more valuable than what it was originally implemented for.
Guided by the understanding and importance of offering a security solution that presents a clear ROI, we have developed a single data-centric solution that delivers much more than file protection.
How many unstructured data files do you think sit on your corporate network? Take a guess. It’s a scary thought, isn’t it? Before you make your team (and yourself) insane by protecting every single file individually, we recommend you take the data classification approach—that’s right, data-centric.
You need a system that easily lets you implement policies across document types, from CAD design files in R&D to files in the accounting department that contain a 16-digit number starting with 4128.
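As a rough sketch of what such a classification rule might look like, the snippet below labels a file by extension or by content. The file extensions, the label names, and the `classify` function are assumptions for illustration and do not describe any particular product’s policy engine.

```python
import re

# A 16-digit number starting with 4128, optionally grouped in fours by spaces or hyphens.
# The lookarounds keep the pattern from matching inside a longer run of digits.
CARD_LIKE = re.compile(r"(?<!\d)4128(?:[ -]?\d{4}){3}(?!\d)")

# Assumed CAD extensions; a real policy would list whatever formats R&D actually uses.
CAD_EXTENSIONS = (".dwg", ".dxf")

def classify(filename, text):
    """Return a hypothetical classification label for a file."""
    if filename.lower().endswith(CAD_EXTENSIONS):
        return "rd-cad-design"
    if CARD_LIKE.search(text):
        return "accounting-card-data"
    return "unclassified"
```

A policy can then be attached to each label once, rather than to every file individually.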
Having a file protection solution in your organization is critical, but most solutions provide limited protection. A data-centric security solution brings much more to the table. It enables organizations to protect, manage, and audit files internally AND externally, to share sensitive information with external users, and to protect information on different devices.
Encrypting files is necessary to ensure your data is protected and is used only by the people it was intended for. Most solutions burden users, forcing them to learn a new system and placing all the responsibility on them.
A data-centric solution removes that burden by offering a system that operates seamlessly and without affecting users’ behavior. This secures your data in all of the following cases:
- After the file has been opened (using any device or location)
- When content is copied/pasted to a new document
- Protection of the file’s metadata
- When sensitive files are shared with external users
Your files are protected, and you can audit and monitor the usage of their content no matter where the files actually reside—inside or outside the organization.
Confidential data is often placed within secure vaults. But even the best vaults will only keep your data secure while it is stored within them. A data-centric solution provides persistent security, keeping your data secure anytime and anywhere: from the moment a confidential document is created, through any transport, and even after it is downloaded to any device.
As it is transparently integrated into existing business driven processes with automated rules or manual override, a data-centric security solution will not impose on IT staff and is not dependent on a user’s actions.
It’s a given that assets residing in the cloud need to be protected. Because of this, cloud providers began offering their own security solutions as well as those from third parties. Of course, the costs begin to pile up and companies are often still uncertain about who else might have access to their cloud-based data.
Deploying a smart, data-centric security solution is the best way to protect your data anywhere—and even from cloud providers themselves.
Data Leak Prevention (DLP)
DLP solutions aim to prevent files from leaving your business unintentionally or through malicious actions. But how do you protect your data if it is leaked? A data-centric security solution continues to monitor your files even outside the organization and ensures the data they contain remains secure.
And most importantly….
Plenty of solutions secure your data at rest, or in motion, or when it goes to third parties, but only a data-centric solution can secure the file structure and the data it contains so you know that data is always protected.
If you have concerns about your confidential data when it’s in motion, at rest, or in use, or about whether it could be lost through a data breach, a stolen device, or some other unintentional or malicious act, we can give you peace of mind. You CAN have a system with a strong ROI, because you won’t find yourself facing lawsuits or losing customers and your reputation.
For more information, please visit www.covertix.com
by Bindu Wavell, Chief Architect at Zia Consulting
There are two reasons I decided to write this post. First, I want to acknowledge Alfresco for their recent investments in the developer ecosystem. The other reason is to explain where I think we are heading with our development efforts. My ulterior motive is to find people to collaborate with us on these efforts.
Since Thomas DeMeo joined Alfresco as VP of Product Management a bit over a year ago, we’ve noticed a dramatic increase in the focus on system integrators being key stakeholders for Alfresco, and not just based on expertise in sales and business development. After the release of Alfresco One 5.0 at the Alfresco Summit, we saw the likes of Peter Monks and Gabriele Columbro tapped to bring focus to user stories that are important for administrators and developers within the product management organization. Recently, Richard Esplin transitioned from the community lead to focusing on the Community Edition within product management. Alfresco hired Martin Bergljung and Ole Hejlskov to focus on developer tooling/evangelism and community outreach. Within weeks of starting, these individuals put together a new release of the SDK, incorporating contributions, adding new capabilities, and completely revamping the documentation. I’m thrilled that Alfresco is focusing resources in these areas because I think we will see resolution of a lot of technical debt, which allows for better solutions in less time, leading to a bigger and more vibrant community.
In the past year or so, Alfresco engineering has begun to reorganize into smaller, more agile scrum teams. This reorganization, along with the focus on product management, will drive initiatives like release agility to provide more frequent and better-tested releases of distinct products. It should also provide a platform for resolving technical debt in a more sustained and predictable fashion than we’ve seen in the past. We can also expect cool new products that are easier to integrate and customize: things like Activiti Enterprise, the integration between Activiti Enterprise and Share, enhanced Office services, reporting/analytics, media management, and even new Case Management features, not to mention significant improvements in the repository, Share, and records management.
As the Chief Architect at Zia, part of my mission is to facilitate improvements in developer productivity and satisfaction. In addition, I want to help the team find ways to improve project quality and consistency. I’d like to share where we are heading in these areas, but first let’s cover where we’ve been.
In the past year or so, most of our projects have been based on the third major revision of our development framework. We call the framework—the project structure and the associated tooling—Zia Alfresco Quickstart (for more information, watch this video). Quickstart includes a standard project structure that we evolved from the all-in-one archetype provided by the Alfresco 1.0 Maven SDK. It features reusable code, examples, best practices, and, to some extent, standardizes how we version and deliver our projects and reusable sub-projects.
With version 1.0 of the SDK, as well as our earlier project structures, we were seeing cycle times (from the point when we saved our code to when we were able to exercise the code) of between two and five minutes on very powerful laptops with lots of RAM and solid-state disks. One of the main reasons we started evolving the SDK was to reduce this cycle time. When we started using Quickstart for customer projects, we were able to reduce the cycle time for most edits to about 10 seconds. We did this by taking advantage of incremental compilation and hot deployment techniques. If I were writing this post a couple of years ago, it would have been all about flow. It was hard to experience flow when you had time for tea and a bagel after most code/config changes. Fortunately, this is not as much of an issue anymore. The Alfresco 1.1.x SDK made some similar techniques available for the wider developer community. With the 2.0 SDK, this has been improved even more, but there’s still work to be done.
One area where Quickstart enhances the SDK capabilities is an integration testing framework for repository customizations that also supports continuous integration and, to some extent, delivery. After we presented this framework during Tech Talk Live #69 (see video above), the 1.1 SDK added a similar capability; however, that solution has been a bit unreliable. We contributed the Quickstart testing framework to the SDK team and are hopeful it will be incorporated in the near future. We are excited that the 2.1.0 version includes support for the Share Page Object testing extensions to Selenium WebDriver that were, and continue to be, developed by Share engineering. This will make it much easier to create tests for UI customizations and to make sure our customizations don’t unexpectedly break existing capabilities provided with the products.
With the project structure we used before Quickstart, it often took us between four and eight hours to get a full development environment (just the Alfresco pieces) installed and configured. With Quickstart, we’ve reduced this to around two or three hours.
We often need to work on code for multiple projects in any given week. In order to handle this, and to accommodate customer variations, we usually set up our development environments in virtual machines (VMs). Nearly every time, we’ve had to start from a base OS machine.
Typically, one team member sets up the initial VM, installs the development tools, and sets up the project structure. Then the VM is shared with all of the team members. We make heavy use of VM snapshots and usually someone keeps a pristine copy of the VM that tracks releases. Should a new developer join the project, or an upgrade be performed, we utilize this pristine copy. Often these VMs are over 40GB, requiring a substantial amount of time just to copy the data.
At Zia, we’ve been testing a few different code review approaches. Some projects are doing regular reviews (weekly for example), others are focusing on reviewing each new significant feature. The ability to create pull requests from forks and branches in BitBucket and GitHub has provided enough of a framework for us so far—though we’d love to incorporate more tooling around code quality and coverage to provide consistent feedback to users.
The Path Ahead
Quickstart has been seen as a proprietary solution that allows us to complete projects faster and at lower cost than we were able to previously. One of the downsides to it being proprietary is that there is a smaller community for collaboration and support of the approach. The next version of our project structure is being developed in the open using the open source model we admire so much.
The Quickstart project structure is quite different than any of the official SDKs, and there are good reasons for the differences. In many cases, they improve on what is available in the community today. However, what we have is different enough from the standard that new team members often have a steep learning curve to become proficient and ultimately master the structure. So, while a seasoned practitioner will be very productive, newer folks require more time and support to become productive. This turns out to be detrimental to the goal of improving productivity for some team members.
With the next generation project structure, we plan to stay closer to the official SDK so that there is a much larger community for collaboration. While we still plan to include support for certain opinionated features, we will also support and default to using more traditional Alfresco implementation approaches. Our hope is that this change in direction will facilitate quicker onboarding and allow SDK and Alfresco upgrades to be handled more expeditiously.
With Quickstart and all of the Alfresco SDKs to date, we have to duplicate a large portion of the boilerplate code for common Alfresco customizations such as web scripts, actions, behaviors, jobs, and workflows via cut and paste. While most of these aren’t difficult, they do tend to be error prone.
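To illustrate why generating this boilerplate beats cut and paste, here is a small sketch that stamps out a web script descriptor from a template. It is written in Python purely for illustration and is not the Yeoman generator discussed in this post; the `scaffold_webscript` helper is hypothetical, and the descriptor shown is an assumed minimal shape that real projects would likely extend.

```python
from string import Template
from xml.sax.saxutils import escape

# Assumed minimal descriptor layout; real descriptors often carry more elements.
DESCRIPTOR = Template("""<webscript>
  <shortname>$shortname</shortname>
  <description>$description</description>
  <url>$url</url>
  <format default="json"/>
  <authentication>user</authentication>
</webscript>
""")

def scaffold_webscript(name, method, url, description):
    """Return {filename: content} for a web script descriptor stub."""
    # Follows the <id>.<method>.desc.xml naming pattern for descriptors.
    filename = f"{name}.{method.lower()}.desc.xml"
    content = DESCRIPTOR.substitute(
        shortname=escape(name),
        description=escape(description),
        url=escape(url),
    )
    return {filename: content}

files = scaffold_webscript("hello", "GET", "/zia/hello", "Says hello & goodbye")
```

Because every stub comes from the same template, a typo gets fixed once in the template instead of in every copy.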
The development VMs we’ve been using are difficult to version control, slow to copy, and frankly, take significant CPU, disk, and memory resources that we’d prefer to allocate to development and runtime tasks.
We’ve been toying with setting up our development environments using devops tools such as Ansible, Chef, VMware, Vagrant, and Docker. Using Docker, we have been able to spin up and exercise clustered Alfresco environments on a single machine for testing and POC activities. We’ve also used Vagrant and Ansible to get about a 40% head start on our development VMs. The hope is to script 90% of the project setup efforts, to reduce project setup time, and to increase consistency between our projects. We also hope to utilize Docker or other lightweight container solutions to reduce the overhead of our environments.
To date, we’ve had mixed success using these tools to set up our development environments. It often takes a significant amount of time to create and refine the devops scripts, and we don’t expect to see a return on our investment until we’ve utilized and stabilized these tools with a number of projects. Fortunately, we have worked with a few customers to create production-quality release/delivery substrates using these tools. Our hope is to incorporate our experiences from these projects into the developer tooling with an eye toward standardizing how we install and configure Alfresco solutions in all environments. We feel that by utilizing these techniques, developers will be able to rebuild small, containerized environments from scratch when needed, rather than maintaining and sharing monolithic VMs. This approach will be much easier to version control, easier to upgrade, easier to share, and will be lighter on resources.
An area we are also exploring is the use of cloud development infrastructure (e.g. Codenvy) to develop, run, and test our projects. We’d like to utilize our devops work and create containers that we can use during development and testing and potentially as a vehicle for delivering projects as well. It would be great if this allowed additional interactivity and collaboration during code reviews, while fixing bugs, and for training/coaching users one-on-one or in groups. We’d also like to reduce expenditures on hardware for developers and to deliver progressive capacity to our engineering organization. The ultimate goal is to work smarter with our in-house, remote, and offshore team members.
Our first usable effort in the area of cloud development is the contribution workflow for our new Yeoman generator. By clicking a button on the project GitHub page, we can provision a development environment that has access to the project source code and a Docker container that has been set up with the appropriate versions of Java, Maven, Node, and Yeoman. The container also has its local Maven repository pre-seeded with the assets needed for compiling and running the Alfresco projects we build while testing out the generators. Someone wishing to make a contribution can start developing and testing in under a minute and can send us a pull request directly from the generated project on Codenvy.
We’d like to invite you to collaborate on these ideas and deliverables. Currently, we are focused on completing our first pass on the Yeoman generator and some high-value sub-generators. We’d love to collaborate in order to continue evolving the developer/implementer experience for Alfresco extensions. If you are interested, please leave a comment, send an email, or ping me here on IRC. Once the generator is in good shape, we’ll likely set up a cloud-based development experience. This will be driven by the generators and backed by pre-packaged containers that can be used in the cloud, on our development machines, and possibly in customer dev, stage, and prod environments. Imagine quickly packaging your configuration and customizations with Alfresco into an all-dependencies-included container. You could then run tests against the container, deploy that tested container to stage, perform UAT and, assuming everything is accepted, promote the exact same (tested and accepted) container to production.
Now that’s the future of Alfresco development.
How did we get here?
It’s taken nearly 10 years to arrive at our current state of Content Chaos, perhaps starting back in 2007, when managing compliance/risk began its steady decline as the primary business driver for investments in ECM systems. At the same time, collaboration (simplified sharing of documents both internally and externally) began its rapid growth as the leading reason for new ECM investments.
If we define content chaos as the inability to properly find, manage, and secure documents and records, it’s clear from virtually every metric that most organizations (if not all) are facing content chaos in 2015. The evidence includes the amount of time each day that knowledge workers spend searching for documents, the number of times the wrong version of a document is used, and the significant investments that companies are forced to make in human capital to staff information governance or records management groups because technology has failed to address these areas. On top of that, virtually every week the news brings another Sony Pictures or Anthem, where data or content security is the headline for another enterprise.
So how did we get here? Today we’ll look at three key areas that helped us create our world of content chaos: ECM Avoidance, the Dropbox Problem, and SharePoint Sprawl.
It’s interesting to consider that all of the “find, manage, and secure” issues of today could possibly have been avoided if the legacy ECM vendors of the past had focused on one simple issue—user adoption. Instead, we saw an almost myopic focus by users on ECM avoidance, looking for any way to avoid logging into complex and time-consuming ECM systems. Across virtually every industry, surveys show less than 50% of content is being managed in ECM systems, with utilization numbers of 10% (or less) being not uncommon.
From our own experience, when we started working with one of the world’s largest corporate legal departments, they had nearly all of their content stored in either emails or shared drives. This was because users simply wouldn’t utilize the legal DMS systems that were delivered to them by IT.
Ironically, perhaps the best description of this ECM Avoidance issue comes from Box.net in a corporate datasheet from way back in 2011:
Connecting to the ECM system, however, is not all that employees need. Workers want to easily find, access, and leverage current, relevant content. They don’t want to work on a sales proposal, marketing collateral, or contract, only to discover a more up-to-date version is out there in email. And if a system isn’t easy to use and intuitive, email is exactly the place people go first to share their information updates.
Shouldn’t the ECM system have been the exact place that people go to find, access, and leverage current, relevant content? Of course—but only if it’s intuitive and easy to use.
The Dropbox Problem
Dropbox and Box.net are obviously another key element of today’s content chaos, a product of the incredible technology investments and advancements made around simplifying the way that documents are shared, particularly for collaboration outside of your organization.
Reviewing the story that is told around the founding of Dropbox, it’s said that the founder developed it while a student at MIT after repeatedly forgetting to bring his USB drive to class. He tried existing file sharing services but they were too slow, complex or error-prone. He then formally founded Dropbox in 2007—the same pivotal year noted at the outset when managing compliance/risk started to decline in importance and simplified collaboration began its march.
While these technologies are indeed incredibly easy to use and certainly address the need for simple collaboration, particularly outside the organization, unfortunately this ease of use makes it simple to share virtually anything outside the organization. Hence, the Dropbox problem became part of the ECM lexicon. A recent report showed over 35 billion Office documents are stored on Dropbox. Where did they come from, and who are they shared with?
According to Dropbox support documentation: “other users can’t see your files in Dropbox unless you deliberately share links to files or share folders”. As we’ve seen and heard from many organizations, “deliberately” can also mean “accidentally”. And whether deliberate or accidental, this simple ability to share large amounts of corporate data (including entire folders or drives), via Dropbox, Box.net, or other similar technologies contributes to the content chaos issues of how to find, manage, or secure content. Or, put another way from a leading security researcher, “the problem is not a security flaw as such, but instead an unexpected consequence of user behavior.”
During a recent AIIM Survey, only 7% of organizations responding stated that they did not use SharePoint in some way. Even assuming some bias in the response rates, it’s clear from this survey and virtually every other metric available that SharePoint has achieved a level of pervasiveness that few would have predicted back in 2007. And, as with Dropbox, the SharePoint sprawl problem is not so much a technology issue as a user behavior issue, and an issue of organizations attempting to use a technology for something it was not designed to do. When companies describe scenarios where they have an average of two SharePoint sites per employee, clearly there is a problem.
From the same AIIM survey, now published as an AIIM Industry Watch paper titled Connecting and Optimizing SharePoint, some interesting themes emerge:
- Only 11% of respondents see their SharePoint deployment as a success
- Most organizations use SharePoint primarily for collaboration—with only 30% using it widely for document management and only 11% using it widely for records management
- Only 13% say SharePoint aligns with their information governance policies
- Only 6% have true federated search—the ability to search across both SharePoint and other document repositories and silos
At the same time, the commitment to SharePoint remains strong, so there is a clear need to coexist in the future while still addressing the areas described above, given that:
- Over 60% of respondents are already using or planning to use SharePoint as the search/access portal to multiple ECM repositories
- Over 75% still have a “strong commitment” to SharePoint
So how do you address all of the topics above? You’ll have to wait and see! Stay tuned for part two of this blog, where we’ll introduce Adhere, our solution to solving content chaos. Coming soon!
Phil Robinson, SVP at Zia Consulting
Ephesoft Smart Capture has a new release – Version 3.1. EVP and Capture Practice Lead Pat Myers from Zia Consulting walks through some of the new features of the release in this short demo video.
In the first segment, Pat walks through some of the new Advanced Key Value Extraction rules, including:
- Fuzzy Percentage
- Case Sensitivity/Insensitivity
- Setting pages and zones to search for values (first page, last page, right side, left side, and more)
- Generating and testing regular expressions and selecting pre-populated expressions
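To give a feel for how options like these can interact, here is a minimal sketch of key-value extraction with a fuzzy-match threshold and a case-sensitivity switch. It uses Python’s `difflib` as a stand-in similarity measure; the `extract_value` function, its parameters, and the sample OCR lines are assumptions for illustration and do not represent how Ephesoft implements these rules.

```python
import re
from difflib import SequenceMatcher

def extract_value(lines, key, value_pattern, fuzzy_pct=80, case_sensitive=False):
    """Find words resembling `key` and return the first value_pattern match in the rest of that line."""
    wanted = key if case_sensitive else key.lower()
    key_words = len(key.split())
    for line in lines:
        words = line.split()
        # Compare the key against every run of words of the same length as the key.
        for i in range(len(words) - key_words + 1):
            candidate = " ".join(words[i:i + key_words])
            probe = candidate if case_sensitive else candidate.lower()
            score = SequenceMatcher(None, wanted, probe).ratio() * 100
            if score >= fuzzy_pct:
                rest = " ".join(words[i + key_words:])
                match = re.search(value_pattern, rest)
                if match:
                    return match.group(0)
    return None

# Sample lines with a simulated OCR error ("Amounl" instead of "Amount").
ocr_lines = ["Borrower: Jane Doe", "Loan Amounl: $250,000.00"]
amount = extract_value(ocr_lines, "Loan Amount:", r"\$[\d,]+\.\d{2}", fuzzy_pct=80)
```

Lowering the threshold tolerates more OCR noise at the cost of more false matches, which is why being able to test rules before running a batch matters.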
In the second segment, Pat discusses the previous options for error alerting and logging, and shows the new emailing and downloading capabilities.
Zia Consulting Co-Founder and EVP Pat Myers had the chance to sit down with Ephesoft CTO Ike Kavas to discuss Ike’s presentation at the Zia Content Connected Summit 2013 and what it’s like to be a partner of Zia.
Partner Interview – Paul St. John, VP-Americas at Alfresco
Our CEO, Mike Mahon, had the chance to interview Paul at the Second Annual Zia Summit to discuss why he joined Alfresco, his view on Enterprise Content Management strategies for the future, and what it’s like to partner with Zia.
In this episode of Alfresco’s Tech Talk series, Chief Architect Bindu Wavell from Zia Consulting joined the Alfresco team to show what’s possible when you combine the Alfresco Maven SDK with JRebel and some custom code. He was able to reduce development iterations to mere seconds, hot-deploying things like Alfresco Actions and Behaviors. Midway through we were joined by Gab Columbro, the leader of the effort to create the Maven artifacts for Alfresco as well as the Maven archetypes.
View the Tech Talk now:
As a massive amount of web content is produced on a daily basis, it’s no surprise that consumers are demanding a custom, personalized experience when they interact with your website in order to help weed through and digest information. If they frequent your site, they may expect it will recognize them and display content relevant to them. Visitors also expect that your site will have extensive features and functionality no matter what device they are viewing it on.
At the same time, companies are looking to align and leverage content across a variety of communication channels both internally and externally, including how content is published on company websites, saving time and money, as well as improving corporate control and compliance. Finally, the goal of any content management solution, be it a web experience management tool like Crafter, a repository like Alfresco, or capture software like Ephesoft, must be to deliver a system that people use. In order to do this, using these tools MUST be easy.
Unfortunately, the legacy of Enterprise Content Management (ECM) systems is one of complexity, both for implementers and users alike. If these systems don’t incorporate the tools employees use in their everyday work habits, they won’t be used–which is why for most organizations, email applications and shared drives are still the dominant form of “content management” and why many ECM technologies are best known as “shelfware”. But there is an alternative, offering users a content management system that works the way they do today without the need for changes in business process or the use of multiple disparate applications to solve business problems.
Crafter Software is built on Alfresco, the open platform for business-critical content management and collaboration. It’s a powerful content hub that allows users to create, edit, review, and approve content. It offers version control, security, and audit trails. It’s an open platform with open standards that allows custom solutions to be built on top of it. Packaged solutions often fall short because each organization’s needs are unique. We believe building custom solutions that integrate best-of-breed open source technologies like Alfresco, Ephesoft, and Crafter creates a truly complete solution.
Alfresco is reliable, secure, and scalable… all features that enterprises require. However, as we mentioned, employees want EASY. They want simple collaboration; they want basic file sharing both internally and externally; and they want business solutions that are integrated with their most widely used tools like email and other Office products. They don’t want to learn several new systems, save and upload content to multiple locations, or constantly log in to multiple tools.
Zia provides users with Easy ECM Solutions that “work the way they do, using the tools they use today,” so users aren’t forced to use alternatives that don’t fit their company’s IT strategy. Our Easy ECM Solutions leverage tools like Office Integration and Cloud Sync to deliver content management systems that work–and since they are built on a single Content Hub, information is available where it’s needed, when it’s needed. Alfresco software can be leveraged on premises, in the cloud, or in a hybrid model.
With Crafter, Alfresco, and Easy ECM Solutions from Zia, your organization can become a “Content Connected Enterprise”. Review the recently recorded webinar on this topic for more information here.
We invite you to attend the Second Annual Zia Content Connected Summit to learn more on September 12 in Boulder, Colorado. We’ll highlight our Easy ECM and Document Automation solutions that integrate with Crafter Software and Alfresco. More information about this free 1-day event can be found at www.ziasummit.com.
Ephesoft Intelligent Document Capture is the ideal tool for processing mortgage documents. Ephesoft ingests documents, classifies and separates them, extracts the data, and then allows you to export it in a number of ways. During this presentation, Zia’s Jon Solove reviews the Ephesoft administration panel and demonstrates how to utilize Ephesoft for mortgage processing. While legacy capture systems use “zonal” technology for extraction, Ephesoft looks at actual text. Using a series of batch classes, Jon will walk through document processing by creating extraction and classification rules.